Hey Federico,
I’m most likely missing something extremely important here, but simply doing this:
gchar *
g_utf8_offset_to_pointer (const gchar *str, glong offset)
{
  const gchar *s = str;

  while (offset--)
    if ((guchar) *s < 191)   /* cast: gchar is signed on many platforms */
      s++;                   /* single-byte step, no table lookup */
    else
      s = g_utf8_next_char (s);

  return (gchar *) s;
}
Makes it a little faster on my system. Oh, and that way the utf8_skip_data table can also be 191 bytes smaller, of course (adjust the g_utf8_next_char macro to subtract 191 from *s before the lookup). The idea is that on my CPU/architecture a compare and an increment are cheaper than a table lookup in memory. Just compile test.c with gcc -S test.c, once with and once without -DOPT, and compare the resulting test.s files.
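A self-contained sketch of this suggestion, with the shrunken table written out. It assumes glib's skip values for bytes 0xBF to 0xFF (2 for 0xC0 to 0xDF, then 3, 4, 5 and 6 for the longer lead bytes, and 1 for the invalid 0xFE and 0xFF). The names `skip_from_191` and `offset_to_pointer_fast` are hypothetical, not glib API, and only non-negative offsets are handled. Bytes are read as `unsigned char` so the `< 191` test also works where `char` is signed.

```c
#include <assert.h>

/* Skip lengths for bytes 0xBF..0xFF only. Bytes below 191 (0xBF) never
 * reach the table, so it is 191 entries shorter than a 256-entry table.
 * Values assumed to follow glib's utf8_skip_data. */
static const unsigned char skip_from_191[65] = {
    1,                                              /* 0xBF (continuation byte) */
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, /* 0xC0-0xCF */
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, /* 0xD0-0xDF */
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, /* 0xE0-0xEF */
    4, 4, 4, 4, 4, 4, 4, 4,                         /* 0xF0-0xF7 */
    5, 5, 5, 5,                                     /* 0xF8-0xFB */
    6, 6,                                           /* 0xFC-0xFD */
    1, 1                                            /* 0xFE-0xFF (invalid) */
};

/* Return a pointer `offset` characters into a valid UTF-8 string
 * (hypothetical helper, not part of glib). */
const char *offset_to_pointer_fast(const char *str, long offset)
{
    const unsigned char *s = (const unsigned char *) str;

    assert(offset >= 0);  /* negative offsets are not handled in this sketch */
    while (offset-- > 0) {
        if (*s < 191)
            s++;                          /* ASCII: one byte, no table lookup */
        else
            s += skip_from_191[*s - 191]; /* lead byte: index into the small table */
    }
    return (const char *) s;
}
```

Whether the extra branch actually beats a plain table lookup depends on the CPU and on how much of the text is ASCII, so it is worth measuring rather than assuming.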
next char, please
Federico found g_utf8_offset_to_pointer() showing up in profiles. I remembered that we copied some of the UTF-8 code for …
* even smaller table
* no if-then-else
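typedef unsigned char uchar;  /* assumed definition; not shown in the post */

/* Extra bytes to skip after a lead byte c >= 192, indexed by c>>1 */
static const uchar utf8_skip[32] = {
  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  /* 0xC0-0xDF */
  2,2,2,2,2,2,2,2,3,3,3,3,4,4,5,0   /* 0xE0-0xFF */
};
/* Bias the pointer so that g_utf8_skip[c>>1] reads utf8_skip[(c>>1) - 96] */
const uchar * const g_utf8_skip = utf8_skip-(192/2);

uchar *
utf8_offset_to_pointer (
  const uchar *str,
  long offset)
{
  while (offset--) {
    uchar c = *str++;         /* always step past the current byte */
    if (c >= 192)             /* lead byte: also skip its continuation bytes */
      str += g_utf8_skip[c>>1];
  }
  return (uchar *)str;
}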
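A quick way to check the 32-entry table: because the loop has already stepped past the lead byte with `str++`, each entry must be one less than the length of the sequence that byte starts. The sketch below compares the loop's total step against that length for all 256 byte values, assuming (as the table does) sequences of 1 to 6 bytes and a single-byte step for 0xFE and 0xFF. `seq_len`, `step` and `check_compact_table` are hypothetical helper names. It indexes with `(c >> 1) - 96` rather than the biased pointer `utf8_skip-(192/2)`, because forming a pointer before the start of an array is undefined behavior in ISO C, even though common compilers handle it as expected.

```c
#include <assert.h>

typedef unsigned char uchar;

/* Same 32 entries as in the post: extra bytes after a lead byte c >= 192. */
static const uchar utf8_skip[32] = {
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  /* 0xC0-0xDF: 2-byte sequences */
    2,2,2,2,2,2,2,2,                  /* 0xE0-0xEF: 3-byte sequences */
    3,3,3,3,                          /* 0xF0-0xF7: 4-byte sequences */
    4,4,                              /* 0xF8-0xFB: 5-byte sequences */
    5,                                /* 0xFC-0xFD: 6-byte sequences */
    0                                 /* 0xFE-0xFF: not lead bytes */
};

/* Sequence length implied by the high bits of the first byte. */
static int seq_len(uchar c)
{
    if (c < 0xC0) return 1;  /* ASCII or continuation byte */
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    if (c < 0xF8) return 4;
    if (c < 0xFC) return 5;
    if (c < 0xFE) return 6;
    return 1;                /* 0xFE, 0xFF: stepped over as one byte */
}

/* Bytes the loop body advances for a character starting with c. */
static int step(uchar c)
{
    int n = 1;                              /* the str++ */
    if (c >= 192)
        n += utf8_skip[(c >> 1) - 96];      /* index instead of a biased pointer */
    return n;
}

/* Asserts that step() and seq_len() agree for every byte value. */
void check_compact_table(void)
{
    for (int c = 0; c < 256; c++)
        assert(step((uchar) c) == seq_len((uchar) c));
}
```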
I spy an evil magic number. What is this 191?
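192 is the number of leading 1’s in the full 256-entry skip table: six lines of 32 entries in the code. So 191 (0xBF) is the last byte value before the first lead byte, 0xC0 (192).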
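The count can be checked mechanically. This sketch builds a 256-entry table of UTF-8 sequence lengths with the same values as the full skip table discussed here (assumed: 1 for 0x00 to 0xBF, 2 to 6 for the lead-byte ranges, 1 for 0xFE and 0xFF) and measures both the run of 1’s at its start and the total number of 1’s. The function names are hypothetical. The run is 6 × 32 = 192 entries, while the whole table holds 194 ones, because the two invalid bytes at the end are 1 as well.

```c
#include <assert.h>

/* Fill `table` with the byte length of a UTF-8 character starting with
 * each byte value (glib-style values, assumed). */
void build_skip_table(unsigned char table[256])
{
    for (int c = 0; c < 256; c++) {
        if (c < 0xC0)      table[c] = 1;  /* ASCII and continuation bytes */
        else if (c < 0xE0) table[c] = 2;
        else if (c < 0xF0) table[c] = 3;
        else if (c < 0xF8) table[c] = 4;
        else if (c < 0xFC) table[c] = 5;
        else if (c < 0xFE) table[c] = 6;
        else               table[c] = 1;  /* 0xFE, 0xFF: invalid */
    }
}

/* Length of the run of 1 entries at the start of the table. */
int leading_ones(const unsigned char table[256])
{
    int n = 0;
    while (n < 256 && table[n] == 1)
        n++;
    return n;
}

/* Total number of 1 entries anywhere in the table. */
int count_ones(const unsigned char table[256])
{
    int n = 0;
    for (int c = 0; c < 256; c++)
        n += (table[c] == 1);
    return n;
}

/* Largest byte value in the leading run of 1s. */
int last_one_byte_value(const unsigned char table[256])
{
    int n = leading_ones(table);
    assert(n > 0);
    return n - 1;
}
```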