Optimizing g_utf8_offset_to_pointer

Hey Federico,

I’m most likely missing something extremely important here, but simply doing

gchar *
g_utf8_offset_to_pointer (const gchar *str,
                          glong        offset)
{
  const gchar *s = str;
  while (offset--)
    if (*(const guchar *) s < 191) s++;  /* gchar may be signed, so compare as unsigned */
    else s = g_utf8_next_char (s);

  return (gchar *)s;
}

makes it a little bit faster (on my system). Oh, and that way the utf8_skip_data table can also be 191 bytes smaller, of course (adjust the macro to subtract 191 from (*s)). The idea here is that arithmetic operations are less expensive (on my CPU/architecture/system) than memory lookups. Just compile test.c using gcc -S test.c with and without -DOPT and check the differences in the resulting test.s file.
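To make the smaller-table idea concrete, here is a rough standalone sketch that does not use GLib at all. The names skip_from_191 and offset_to_pointer_sketch are made up for illustration, and the table values are my assumption of what the tail of a 256-entry skip table would look like once the first 191 entries are dropped. Like the function above, it trusts the caller to pass an offset that lies inside the string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced skip table: entry i is the sequence length for a
   byte with value (191 + i). Byte 191 (0xBF) is a continuation byte, so
   it simply advances by one. */
static const unsigned char skip_from_191[65] = {
  1,                                 /* 191 */
  2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,   /* 192..207 */
  2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,   /* 208..223 */
  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,   /* 224..239 */
  4,4,4,4,4,4,4,4,                   /* 240..247 */
  5,5,5,5,                           /* 248..251 */
  6,6,                               /* 252..253 */
  1,1                                /* 254..255, never valid in UTF-8 */
};

/* Advance `offset` characters from `str`. No NUL check is done, so the
   caller must make sure the string holds at least `offset` characters. */
static const char *
offset_to_pointer_sketch (const char *str, long offset)
{
  /* Read through unsigned char so bytes >= 128 compare as 128..255. */
  const unsigned char *s = (const unsigned char *) str;

  assert (str != NULL);
  while (offset-- > 0)
    {
      if (*s < 191)
        s++;                            /* fast path: no table lookup */
      else
        s += skip_from_191[*s - 191];   /* slow path: 65-entry table */
    }
  return (const char *) s;
}
```

Calling offset_to_pointer_sketch (text, 3) on the bytes "a", C3 A9, E2 82 AC, "z" should land on the "z", six bytes in.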

4 thoughts on “Optimizing g_utf8_offset_to_pointer”

  1. next char, please

    Federico found g_utf8_offset_to_pointer() showing up in profiles. I remembered that we copied some of the UTF-8 code for …

  2. * even smaller table
     * no if-then-else

    static const uchar utf8_skip[32] = {
      1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
      2,2,2,2,2,2,2,2,3,3,3,3,4,4,5,0
    };

    const uchar * const g_utf8_skip = utf8_skip-(192/2);

    uchar *
    utf8_offset_to_pointer (const uchar *str,
                            long         offset)
    {
      while (offset--) {
        uchar c = (*(uchar *)(str++));
        if (c >= 192)
          str += g_utf8_skip[c>>1];
      }
      return (uchar *)str;
    }
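
    For anyone who wants to try the 32-entry variant outside GLib, here is a sketch that should compile on its own. It assumes uchar means unsigned char, and it subtracts 96 from the index instead of pre-biasing the table pointer, since a pointer to before the start of an array is not something the C standard guarantees to work. The names extra_bytes and offset_to_pointer_small_table are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char uchar;   /* assumption: uchar is unsigned char */

/* Number of bytes still to skip after the lead byte has been consumed,
   indexed by (lead_byte >> 1) - 96 for lead bytes 192..255. */
static const uchar extra_bytes[32] = {
  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,   /* 192..223: 2-byte sequences */
  2,2,2,2,2,2,2,2,                   /* 224..239: 3-byte sequences */
  3,3,3,3,                           /* 240..247: 4-byte sequences */
  4,4,                               /* 248..251 */
  5,                                 /* 252..253 */
  0                                  /* 254..255, never valid in UTF-8 */
};

/* Advance `offset` characters; the caller must keep `offset` within
   the string because no NUL check is done. */
static const uchar *
offset_to_pointer_small_table (const uchar *str, long offset)
{
  assert (str != NULL);
  while (offset-- > 0)
    {
      uchar c = *str++;                       /* always consume one byte */
      if (c >= 192)
        str += extra_bytes[(c >> 1) - 96];    /* then the trailing bytes */
    }
  return str;
}
```

    Halving the byte value works because every pair of lead bytes 2k and 2k+1 in the 192..255 range shares the same sequence length, so one table entry can serve both.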

Comments are closed.