It’s time for some monthly bashing of others about memory wasting. Moehaha! But hey, Camel was not done that badly. This is a friendly one. Some people in the past had this strange idea that I was questioning the competence of those who’ve built Camel. That’s absolutely not true. There’s a reason why I use it myself, don’t forget that (and some day, people will bash the fuck out of tinymail, right? Good! keeps us going).
Some developers enjoy the fact that glib makes their lives a lot easier, it seems. And I agree with them! You should leverage the fact that glib is a well-tested library. Who wants to implement his own hashtable or doubly-linked list for each project he starts? I honestly don’t.
However. In the CamelFolderSummary, the number one piece of code that consumes the most memory in E-mail clients that are built on top of Camel (that includes Evolution and tinymail with its camel-lite), we surprisingly see a GHashTable being used in parallel with a GPtrArray! Both hold the exact same instances?!
Somebody like me … questions that. Because both in memory and in performance this (in this case) is a lose-lose situation: adding an item to the hashtable takes time, and searching a GPtrArray is (in this case) not going to take a lot longer (read below for more information on the “in this case” situation).
Consider the memory consumed by each item you add to the GHashTable:
struct _GHashNode {
    gpointer key;
    gpointer value;
    GHashNode *next;
    guint key_hash;
};
That’s on a typical 32-bit x86 architecture 4 + 4 + 4 + 4 bytes, right? 16 bytes per item (Or am I calculating this wrong? There’s no waste caused by memory alignment here I think, and it uses g_slice_new so there’s also not a lot of heap administration overhead). Multiply that by 10,000 headers and that’s 160,000 bytes, or about 156KB of memory. Nothing, you say? Okay, I agree (well, except that for a mobile application this is a significant memory improvement).
However, consider that each and every summary item that you can see in your Evolution consumes this. With the help of some mailing list subscriptions, I use Evolution to manage up to 1,000,000 messages that way: 15,625KB or 15MB of GHashNode instances (gah, I must be miscalculating because that’s much more than what I expected).
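The arithmetic can be checked with a few lines of C. This is only a sketch: HashNodeMirror and hash_node_overhead are hypothetical names, and the struct mirrors the four quoted fields with plain C types so that it compiles without glib. It counts the nodes only, not the bucket array or the slice allocator's own bookkeeping.

```c
#include <stddef.h>

/* Mirror of the four fields of _GHashNode quoted above, using plain C
 * types instead of glib typedefs. The name is hypothetical. */
typedef struct HashNodeMirror {
    void *key;
    void *value;
    struct HashNodeMirror *next;
    unsigned int key_hash;
} HashNodeMirror;

/* Bytes spent on hash nodes alone for n_items entries.
 * Pass 16 as node_size for the 32-bit x86 figure used in the text, or
 * sizeof (HashNodeMirror) for the platform this is compiled on. */
size_t
hash_node_overhead (size_t n_items, size_t node_size)
{
    return n_items * node_size;
}
```

With a node size of 16, 10,000 items give 160,000 bytes (about 156KB) and 1,000,000 items give 16,000,000 bytes (15,625KB). On a typical 64-bit build the pointers are 8 bytes each and the struct is padded, so sizeof (HashNodeMirror) is usually 32 and the figures double.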
Because this:
static CamelMessageInfo*
find_message_info_with_uid (CamelFolderSummary *s, const char *uid)
{
    CamelMessageInfo *retval = NULL;
    guint i;

    /* Linear scan over the array. strcmp gives an exact match; strncmp
     * bounded by strlen (uid) would also accept longer uids that merely
     * start with uid. */
    for (i = 0; i < s->messages->len; i++) {
        CamelMessageInfo *info = s->messages->pdata[i];
        if (info && !strcmp (info->uid, uid)) {
            retval = info;
            break;
        }
    }
    return retval;
}
Is more difficult than this:
g_hash_table_lookup(s->messages_uid, uid)
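One detail matters when writing such a scan by hand: bounding strncmp by the length of the uid you are searching for turns the comparison into a prefix match, so a lookup for "12" can return the item with uid "123". The sketch below shows both variants side by side. It uses a plain C array in place of the GPtrArray, and MsgInfo, find_exact and find_prefix are hypothetical names made up for this illustration, not Camel API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CamelMessageInfo; only the uid matters here. */
typedef struct {
    const char *uid;
} MsgInfo;

/* Linear scan with an exact comparison. Returns NULL if nothing matches. */
const MsgInfo *
find_exact (const MsgInfo *const *items, size_t n, const char *uid)
{
    for (size_t i = 0; i < n; i++)
        if (items[i] && strcmp (items[i]->uid, uid) == 0)
            return items[i];
    return NULL;
}

/* Same scan, but the comparison is bounded by strlen (uid): this returns
 * the first item whose uid starts with the given uid. */
const MsgInfo *
find_prefix (const MsgInfo *const *items, size_t n, const char *uid)
{
    size_t len = strlen (uid);

    for (size_t i = 0; i < n; i++)
        if (items[i] && strncmp (items[i]->uid, uid, len) == 0)
            return items[i];
    return NULL;
}
```

Given an array holding the uids "123" and "12" in that order, find_exact for "12" returns the second item, while find_prefix returns the first. Either way the scan is O(n) per lookup, which is why it only pays off when lookups by uid are rare.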
I don’t whine without a patch, right? right!
Update: and on the tinymail front, I removed the need for often calling this function (update: which is an important pre-condition for applying patches like this). This made loading large folders a lot faster when using the framework for this. You can svn diff -r 1451:1452 to check what I changed for that.
Update on CPU consumption (because some people have questions about that): The hashtable lookup would indeed be faster, but the hashtable lookup can be (and usually is) avoided in Camel. Not sure whether it’s also avoided (and avoidable) in Evolution code (that’s a lot of code to check), but all occurrences where the hashtable lookup is needed are avoidable and are therefore avoided by tinymail. In other words: looking up by uid is not often needed. But you are right that a hashtable lookup is faster than that find_message_info_with_uid implementation.