I’m searching for a few holy knights to help me replace the fopen()/fread() implementation in camel-folder-summary.c at GUADEC. Such a replacement would (I think) dramatically improve the memory usage of Evolution and tinymail; on small devices the improvement would be even more noticeable.
The current implementation copies each string after fread()ing it, NUL-terminates the copy, and looks it up in a hash table to avoid keeping duplicates. The strings in the summary file aren’t NUL-terminated C strings but Pascal-like strings (with the length in front of the string), while the infrastructure that uses this information expects C strings. That is why I think the best solution is to replace the file format with a more mmap()-friendly one, for example one with both the length in front of the string and a NUL terminator byte at the end of it.
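To make that idea concrete, here is a minimal sketch of such a record: a 4-byte length, the string bytes, then one NUL byte. The layout, the host byte order of the length field and the names summary_put_string() and summary_get_string() are my assumptions for illustration; none of this is the existing Camel format or API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout (NOT the current Camel summary format):
 *   4-byte length (host byte order) | string bytes | one '\0' byte
 */

/* Append one record to buf at offset off. The caller must make sure the
 * buffer is large enough. Returns the offset just past the record. */
size_t summary_put_string(unsigned char *buf, size_t off, const char *s)
{
    uint32_t len = (uint32_t) strlen(s);

    memcpy(buf + off, &len, sizeof len);
    memcpy(buf + off + sizeof len, s, (size_t) len + 1); /* copies the '\0' too */
    return off + sizeof len + len + 1;
}

/* Return a pointer to the string stored at off, without copying it.
 * *next receives the offset of the following record. */
const char *summary_get_string(const unsigned char *buf, size_t off, size_t *next)
{
    uint32_t len;
    const char *s;

    memcpy(&len, buf + off, sizeof len); /* memcpy avoids an unaligned read */
    s = (const char *) (buf + off + sizeof len);
    assert(s[len] == '\0');              /* the terminator is part of the record */
    *next = off + sizeof len + len + 1;
    return s;
}
```

The pointer returned by summary_get_string() points straight into the buffer, so if that buffer is an mmap()ed file the existing code could keep using plain C strings without a copy.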
The reason why I think it would dramatically improve the situation is that with mmap(), the kernel gets to decide whether or not to keep the memory in its buffers/cache. Note that access to the information should, however, stay fast. Sorting headers, for example, depends on this, and the access pattern while sorting is random (qsort, mergesort or something like that). I don’t want to change thousands of lines of Evolution code for just this optimization.
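As a rough sketch of what I mean, assuming a POSIX system: map the whole file read-only and sort an array of pointers into the mapping with qsort(). The names summary_map() and summary_cmp() are made up for this example, and a real summary would of course sort on more than a plain strcmp().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure or for an empty file.
 * The caller releases the mapping with munmap(). */
const char *summary_map(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t) st.st_size;
    return p;
}

/* qsort() comparator for an array of pointers into the mapping.
 * Only the pointers get moved around; the mapped strings stay in place. */
int summary_cmp(const void *a, const void *b)
{
    const char *sa = *(const char *const *) a;
    const char *sb = *(const char *const *) b;

    return strcmp(sa, sb);
}
```

The sort touches the mapped pages in random order, so how fast it stays depends on those pages still being in the kernel's cache.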
Using valgrind I measured that quite a large part of the total amount of memory allocated during one Evolution (or tinymail) session goes to this summary information. Once mmap()ed, I think this data (being mostly buffer/cache) wouldn’t hurt as much. We would either have to drop the hash-table trick that avoids duplication, or implement it in such a way that duplication is also avoided in the file itself (which isn’t the case at the moment).
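Avoiding duplication in the file itself could look roughly like the sketch below: the writer stores every distinct string once in a pool and hands out its offset, so records would refer to strings by offset. StringPool, pool_init() and pool_intern() are hypothetical names, and the linear search is only there to keep the example short; a real writer would use a hash table for the lookup.

```c
#include <stdint.h>
#include <string.h>

#define POOL_MAX_STRINGS 1024      /* arbitrary limit for this sketch */
#define POOL_FULL        UINT32_MAX

/* Hypothetical write-side string pool: each distinct string is stored once. */
typedef struct {
    char    *data;                 /* caller-provided output buffer */
    size_t   used;                 /* bytes written so far */
    size_t   capacity;             /* size of data */
    uint32_t offsets[POOL_MAX_STRINGS];
    size_t   count;                /* number of distinct strings */
} StringPool;

void pool_init(StringPool *pool, char *buf, size_t capacity)
{
    pool->data = buf;
    pool->used = 0;
    pool->capacity = capacity;
    pool->count = 0;
}

/* Return the offset of s inside the pool, appending it (with its '\0')
 * only if it is not stored yet. Returns POOL_FULL when out of room. */
uint32_t pool_intern(StringPool *pool, const char *s)
{
    size_t n = strlen(s) + 1;
    uint32_t off;
    size_t i;

    /* Linear search for brevity; a hash table would replace this loop. */
    for (i = 0; i < pool->count; i++)
        if (strcmp(pool->data + pool->offsets[i], s) == 0)
            return pool->offsets[i];

    if (pool->count == POOL_MAX_STRINGS || n > pool->capacity - pool->used)
        return POOL_FULL;

    off = (uint32_t) pool->used;
    memcpy(pool->data + off, s, n);
    pool->used += n;
    pool->offsets[pool->count++] = off;
    return off;
}
```

Because every stored string keeps its NUL terminator, a reader that has the pool mmap()ed can turn an offset back into a C string by simply adding it to the base address.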
I think it would be a nice temporary solution until the disk-summary branch of camel (or libspruce) is finished.