camel-folder-summary.c

I’m searching for a few holy knights to help me replace the fopen()/fread() implementation in camel-folder-summary.c at GUADEC. Such a replacement would (I think) dramatically improve the memory usage of Evolution (and tinymail). On small devices the improvement would be even more noticeable.

The current implementation copies each string after fread()ing it; it also searches a hash table to avoid duplicates and NUL-terminates the copy. The strings in the summary file aren’t NUL-terminated C strings but Pascal-like strings (with the length in front of the string), while the infrastructure that uses the information expects C strings. I therefore think the best solution is to replace the file format with a more mmap()-friendly one: for example, one that has both the length in front of the string and a NUL terminator byte at the end of it.
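A minimal sketch of what such a record could look like, assuming a 4-byte little-endian length, then the string bytes, then a single NUL byte. The layout and the function names (summary_put_string, summary_get_string) are made up for illustration; this is not the current Camel format. The point is that the reader can hand out a pointer into the buffer (or the mapping) without copying anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout (NOT the existing Camel summary format):
 *   4-byte little-endian length | string bytes | one NUL byte
 */

/* Append one record at offset `off`; returns the offset just past it.
 * The caller must make sure `buf` is large enough. */
static size_t summary_put_string(unsigned char *buf, size_t off, const char *s)
{
    uint32_t len = (uint32_t) strlen(s);

    buf[off + 0] = (unsigned char) (len & 0xff);
    buf[off + 1] = (unsigned char) ((len >> 8) & 0xff);
    buf[off + 2] = (unsigned char) ((len >> 16) & 0xff);
    buf[off + 3] = (unsigned char) ((len >> 24) & 0xff);
    memcpy(buf + off + 4, s, len);
    buf[off + 4 + len] = '\0';

    return off + 4 + (size_t) len + 1;
}

/* Return the string stored at `*off` in place (no copy, no malloc)
 * and advance `*off` to the next record. */
static const char *summary_get_string(const unsigned char *buf, size_t *off)
{
    const unsigned char *p = buf + *off;
    uint32_t len = (uint32_t) p[0]
                 | ((uint32_t) p[1] << 8)
                 | ((uint32_t) p[2] << 16)
                 | ((uint32_t) p[3] << 24);
    const char *s = (const char *) (p + 4);

    /* The terminator is what lets existing C-string code use `s` directly. */
    assert(s[len] == '\0');

    *off += 4 + (size_t) len + 1;
    return s;
}
```

The length prefix keeps skipping over records cheap, and the terminator means the pointer can be passed to code that expects an ordinary char pointer.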

The reason why I think it would dramatically improve the situation is that with mmap(), the kernel gets to decide whether or not to keep the memory in its buffers/cache. Note that access to the information should, however, stay fast. Sorting headers, for example, depends on this, and the access while sorting is random (qsort or mergesort or something like that). I don’t want to change thousands of lines of Evolution for just this optimization.
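A rough sketch of the idea, assuming the file already contains NUL-terminated strings: map the file read-only and sort an array of pointers into the mapping with qsort(). The helper names (summary_map, summary_sort) are hypothetical and not part of Camel. The mapped data itself is never copied or modified; only the pointer array is reordered.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure (or if the file
 * is empty); on success stores the mapping size in `*size`. */
static const char *summary_map(const char *path, size_t *size)
{
    struct stat st;
    void *mem;
    int fd;

    assert(size != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    mem = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (mem == MAP_FAILED)
        return NULL;

    *size = (size_t) st.st_size;
    return mem;
}

/* qsort() comparator for an array of `const char *`. */
static int cmp_strings(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Sort pointers into the mapping; pages are faulted in as they are touched. */
static void summary_sort(const char **strings, size_t n)
{
    qsort(strings, n, sizeof *strings, cmp_strings);
}
```

When the folder is no longer needed, munmap() releases the mapping again.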

Using valgrind I measured that quite a large part of the total amount of memory allocated during one Evolution (or tinymail) session goes to this summary information. Once it is mmap()ed, I think this data (mostly being buffer/cache) wouldn’t hurt as much. We would have to drop the hash-table trick that avoids duplication, or we would have to implement it in such a way that duplication is also avoided in the file itself (which isn’t the case at the moment).
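One way duplication could be avoided in the file itself is a string table: every distinct string is stored once, and records refer to it by offset. The sketch below is only an illustration under that assumption; StringTable and string_table_intern are invented names, the fixed 4096-byte buffer is arbitrary, and the linear scan stands in for the hash table a real writer would probably use while building the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string table: NUL-terminated strings packed back to back.
 * Records in the summary would store offsets into `data`. */
typedef struct {
    char data[4096];
    size_t used;
} StringTable;

/* Return the offset of `s` in the table, adding it only if it is not
 * there yet. Returns (size_t) -1 when the table is full. */
static size_t string_table_intern(StringTable *t, const char *s)
{
    size_t off = 0;
    size_t len = strlen(s) + 1; /* include the NUL terminator */

    /* Walk the packed strings; each step lands on the start of the next one. */
    while (off < t->used) {
        if (strcmp(t->data + off, s) == 0)
            return off; /* duplicate: reuse the existing copy */
        off += strlen(t->data + off) + 1;
    }

    if (len > sizeof t->data - t->used)
        return (size_t) -1;

    memcpy(t->data + t->used, s, len);
    t->used += len;
    return off; /* `off` equals the old value of `used` here */
}
```

With a layout like this, two messages from the same sender would share one address string in the file, much like the in-memory hash table does today.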

I think it would be a nice temporary solution until the disk-summary branch of camel (or libspruce) is finished.

4 thoughts on “camel-folder-summary.c”

  1. > The reason why I think it would dramatically improve the situation is that with mmap(), the kernel gets to decide whether or not to keep the memory in its buffers/cache

    I thought that chunks of mmap()ed memory were just loaded into memory when used. Are the chunks also put back onto disk or virtual memory when not used for a while? If not, it seems like it might force evo to hold on to a lot of non-cacheable RAM.

    Reducing frequent memory copying does seem like a nice performance aim.

  2. In fact, in libetpan, using mmap for mbox and keeping all mailboxes open turned out to be a problem. The process had a huge virtual memory (and resident memory) size. This made other applications swap a lot.
    I guess that we can improve the behavior by unmapping files when they have not been needed for a while, but I don’t know how the kernel behaves. Does it keep the most frequently used blocks of the file in memory, or are they flushed and reloaded? In the latter case the performance would be bad.

  3. Hey hoa, note that camel wouldn’t need to keep entire mbox files open. Camel keeps just the summary info in a file. Such a file is ~ 200kb in size for a 1,000 headers folder.

    It would be possible (for tinymail) to close the mmap each time the folder is not active anymore. It wouldn’t be possible to set this behaviour in Evolution. The vfolders feature of Evolution requires the folders (and their summary mmap()s) to remain open.

  4. Also note that the current implementation keeps the entire current summary info in memory (at all times) too. So the same amount of memory per folder would be used, except that now it would be in the form of an mmap() rather than a series of malloc()s filled in with the results of an fread().
