A few interesting things that happened today

I decided to just do it and throw together .NET bindings for Tinymail. They are nearly finished and looking great. I might never have told anybody, but a really sweet looking E-mail API for my favorite programming language, C#, was my original reason for starting Tinymail.

While I was doing Tinymail I of course learned about D, so now C# as a programming language might have to compete with D. Meanwhile Jürg did Vala too. To make sure OLPC would have a reason to take a look at Tinymail I fiddled around with Python bindings too. Mark Doffman made these bindings actually usable. Meanwhile Jürg made the Vala bindings for Tinymail. Just for fun.

That brings us to three bindings for Tinymail: .NET, Python and Vala. I’m hoping to add C++ to that list soon. I’m planning to do this in a way that makes the library sensible to use for Qt developers. This of course implies dealing with mainloop integration and providing a few standard models and views to get a typical Qt E-mail client developer going quickly. I’m not a big fan of C++, but I was pragmatic about GObject and C when I started Tinymail too.

I have some sort of goal in my mind and if I have to do twice as much work to get it working without my favorite tools, then well, that’s what it takes. Politics and religion just make it more difficult to reach your goals, so I’m not very interested in .NET fans, Python fans, C++ fans, Qt fans and GNOME fans cheerleading because their favorite tech is being used (which is not the point of technology and innovation anyway). I do care about the technical implications of depending on something like a virtual machine, though. Especially since Tinymail has a focus on mobile.

Mono has an ARM port, so it’s usually not really a technical limitation. On top of that I found out about GC.AddMemoryPressure in .NET. Tinymail holds significantly large resources that your garbage collector doesn’t know about. With GC.AddMemoryPressure you can easily teach the .NET garbage collector about these non-managed resources.

Something like GC.AddMemoryPressure is really a missing feature in Python in my opinion. When your E-mail client uses the managed higher-level programming language mostly just for invoking functionality implemented in C, not a lot of Python code will execute during runtime. This means that Python’s garbage collector is only rarely called to do a collect. Unless you do gc.collect() at an intelligent location, your Python E-mail client will keep instances around for a very long time. In .NET, with GC.AddMemoryPressure, you can tell the virtual machine’s garbage collector that an external resource is significant enough to care about it more. It’s not as drastic as doing a GC.Collect() in .NET. GC.Collect is not recommended by Microsoft, since the .NET garbage collector is supposed to learn from the past to try to predict the future. If you GC.Collect(), you circumvent this. Using GC.AddMemoryPressure you don’t really do anything bad for the garbage collector. You just inform it about the significance of an external resource.
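To make the gc.collect() remark a bit more concrete, here is a minimal sketch. It assumes CPython, where most objects are freed by reference counting and only reference cycles have to wait for the cyclic collector, which is triggered by allocation counts and knows nothing about the size of native resources. The classes `Folder` and `NativeHandle` and both functions are hypothetical stand-ins, not part of the Tinymail bindings.

```python
import gc
import weakref


class NativeHandle:
    """Hypothetical small wrapper around a large resource allocated in C."""

    def __init__(self, native_bytes):
        self.native_bytes = native_bytes  # size the Python collector cannot see
        self.owner = None                 # back-reference that will form a cycle


class Folder:
    """Hypothetical owner object; folder and handle reference each other."""

    def __init__(self, native_bytes):
        self.handle = NativeHandle(native_bytes)
        self.handle.owner = self


def open_and_drop_folder(native_bytes=10 * 1024 * 1024):
    """Create a folder, drop it, and return a weak reference to its handle."""
    folder = Folder(native_bytes)
    ref = weakref.ref(folder.handle)
    del folder  # unreachable now, but the cycle keeps both objects alive
    return ref


def release_unreachable():
    """Force a full collection, e.g. right after the user closes a folder."""
    return gc.collect()


if __name__ == "__main__":
    handle_ref = open_and_drop_folder()
    print("alive before collect:", handle_ref() is not None)
    release_unreachable()
    print("alive after collect:", handle_ref() is not None)
```

The "intelligent location" in this sketch is the moment a folder gets closed: few Python allocations happen around it, so the automatic collector may not run for a long time on its own.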

Regretfully, and I already asked Miguel, the Mono runtime doesn’t care about GC.AddMemoryPressure. But the API is available for when Mono’s garbage collector gets adapted to care about it. Microsoft’s virtual machine does care, by the way.

It would be nice to port GObject to WinCE and then bring Tinymail to that platform. Making a Tinymail based E-mail client with either Compact Framework .NET or Mono sounds like a fun project.

Meanwhile, Modest is doing great too. I’m still waiting a little bit with Tinymail’s first release. When the Modest team releases their first version, I would like to give them Tinymail’s 1.0 release as a gift. I think we are nearing that moment.

Finally an opportunity!

So, if I get this right … Phones will use Qt, tablets will stay Maemo. That’s finally an opportunity for us to work together with the Qt people! I’m looking forward to a future where Nokia will fund all this jing jang. Great!

I guess I’ll soon be preparing Tinymail to be very usable for Qt developers too. Except for a few isolated places where it now requires GMainLoop integration, which I guess on devices will be shared for both Qt and Glib anyway, this should be possible already. The API just needs some C++ wrapping and perhaps an implementation of libtinymailui that uses Qt’s components.

I also started putting .NET bindings in place for the API. These bindings ain’t finished yet. Mike’s Gapi2 code for generating GTypeInterface bindings works very nicely. Just a few compilation errors that I need to fix now (check gtk-sharp’s mailing list).

I sensed some enthusiasm in Rob’s post about all this Nokia and Trolltech stuff. My guess is that Rob is trying to tell us: “jeej! stuff is moving! Let’s go for it!”, but I’m not sure of course. But if that’s what you mean Rob, let’s indeed go for it and make the things in technology move!

All your privacy are belong to me!

I’ve been using Google Analytics for Tinymail.org for a few months now.

I was mostly interested in results per city. As expected, Helsinki is scoring high. Since a lot of Modest’s developers live in Spain there are a few cities with a lot of visits in Spain too. Now I know where you guys live!

Nothing surprising. Except maybe the visitors from the Indian cities Hyderabad and Bangalore. I wonder what Indian company is working on a mobile E-mail client? The visitors from South America are interesting too! Are you guys working on one for OLPC?

I also have a lot of Brooklyn and Tempe visitors. That’s Red Hat, right?

What Nokia division can we find in Oulu by the way? And Sydney, is that jdub visiting?

Cute, and I guess typical, are the European cities. All major cities in Europe had visitors, just never really a lot, unless they are located in Finland and are called either Helsinki or Oulu.

With one single exception for Europe: a city in my own country, Heist-Op-Den-Berg. So, who’s that Tinymail fan in Heist-Op-Den-Berg? Let’s get a drink somewhere? What about FOSDEM this year? It was not me: my own home city scored like all other European cities.

Disappointing is Russia. The visits for all of Russia compare to those of one European city, and all Russian visitors came from either Moscow, Tula or Lisichansk. In Russia E-mail libraries code you?

I had three visits from Honolulu!

What is strange is that Google Analytics’ count of visitors doesn’t really match my actual Apache logs if I count them manually. Google Analytics reports something like 60% fewer unique visits. I wonder at what point Google Analytics starts grouping the hits of a user into an actual visit?

Shaken or stirred?

I just released Tinymail’s pre-release 0.0.7. This release is mostly a bugfix release. We’ve been fixing memory problems and a few crashers in features that we introduced in pre 0.0.6. And because I have been telling people this about every pre release, this one too is probably one of the last pre releases before a final release!

The release notes can be found here. Downloads here. API documentation here. And the friendly guys doing Modest have made a beta release based on Tinymail’s trunk of today (which is the same version of the code as this pre-release 0.0.7). Install Modest on your N800 and N810 by tapping here.

The new Air thing!

This guy from Igalia has the new Air laptop thing already, go check it out!

Video by this guy.

And of course …

The same rfc2047 decoder fixes that Jeffrey did for upstream Camel are of course ported to Tinymail’s parsing code. So E-mail clients like Modest will also parse those broken E-mails correctly (once you update your Modest packages on your Nokia devices, of course).

Of course Nokia’s testers are testing the application with such broken E-mails. We are indeed seeing that more E-mails can be displayed correctly with Jeffrey’s new rfc2047 decoder. It is usually spam that now decodes successfully more often; legitimate E-mails are less frequently broken. I guess spammers want to fool weak rfc2047 decoder implementations in spam detection software.
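For readers who have never looked at what an rfc2047 decoder deals with, here is a small illustration of well-formed encoded-words. It uses the decoder from Python’s standard library, not Camel’s or Tinymail’s, and it says nothing about how either handles the broken variants; the helper name `decode_rfc2047` is made up for this sketch.

```python
from email.header import decode_header, make_header


def decode_rfc2047(raw_value):
    """Decode RFC 2047 encoded-words in a header value into a plain string."""
    return str(make_header(decode_header(raw_value)))


if __name__ == "__main__":
    # The same name in the Q (quoted-printable like) and B (base64) encodings.
    print(decode_rfc2047("=?ISO-8859-1?Q?J=FCrg?="))
    print(decode_rfc2047("=?UTF-8?B?SsO8cmc=?="))
```

A header is "broken" in this context when it deviates from that `=?charset?encoding?text?=` shape in some way, and decoders differ in how much of that they tolerate.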

For the last few weeks I have been synchronizing the embedded Camel of Tinymail with Camel upstream. Other than bringing upstream’s bugfixes to Tinymail, this will of course make it easier to port features back to Camel upstream. I must stress again that a lot of the new features are specific for mobile use cases and that a lot of them are not done in such a way that they can easily be ported. Others are simply not very interesting for a desktop E-mail client, and some are.

Warning. This one is a little bit technical

First of all, a summary is the overview of your E-mail folder or mailbox. It shows you the cc, to, from, subject of each E-mail. In IMAP terminology people also call this the ENVELOPE of each E-mail. Showing all ENVELOPEs of a folder is showing the summary to the user. Some people want more than just the cc, to, from and subject to be visible. Most E-mail clients also indicate the read and importance status of the E-mails in this view. Some E-mail clients also show the size of the E-mail. Whatever yours shows, that is what I’ll here call … the summary.

About a year ago I was telling people about how little memory Tinymail consumed: it consumes less memory than most other E-mail clients because it maps the summary data. That’s still true and, in my opinion, a lot better than just copying the strings in memory. However …

The implementation didn’t care enough about the VmRss. The analysis of memory usage was focused on what valgrind’s massif tool showed. That of course shows just the heap and the stack. Both are relevant, but they are not the only numbers that you have to consider. The referred-to page explains this too (don’t worry, I was never trying to hide this fact).

What matters for a mobile device is the VmRss. The VmRss is a number that indicates how much of your data is in real memory modules. For a mobile device this is important because these devices often can’t have a swap partition, or only a slow one on a wear-levelling flash device, and have relatively little RAM installed, since more RAM consumes more battery and/or is more expensive per unit to produce. I know RAM is not important for your server, and I don’t care about your server.

Oh .. that desktop or laptop that you just bought? In terms of amount of memory, it’s a server too in my eyes. I don’t care about your desktop machine either.

Another thing that ain’t really good about the current implementation is that writing the file takes a long time, that the file grows relatively large, and that it can contain redundant information. This redundant information contributes to the VmSize growing.


Note: VmSize is the total amount of memory being used, including things that got swapped out, libraries, heap, stack, everything. As a number it’s good for making wannabe geeks scared about the memory consumption of your application. Other than that, it’s not interesting as a number unless you have a good idea what exactly it means (like, know how shared libraries are handled nowadays and things like that).
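On Linux both numbers can be read from /proc/<pid>/status. The following is a minimal sketch for looking at them from within a process; the function names are made up for this example and it assumes the usual "Name: value kB" layout of that file.

```python
def parse_vm_fields(status_text):
    """Return the Vm* fields of a /proc/<pid>/status dump as {name: kB}."""
    fields = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not name.startswith("Vm"):
            continue
        parts = rest.split()
        # Lines look like "VmRSS:      7604 kB"
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[name] = int(parts[0])
    return fields


def read_own_vm_fields(path="/proc/self/status"):
    """Read the Vm* fields of the current process (Linux only)."""
    with open(path, encoding="utf-8", errors="replace") as status_file:
        return parse_vm_fields(status_file.read())


if __name__ == "__main__":
    vm = read_own_vm_fields()
    print("VmSize:", vm.get("VmSize"), "kB")
    print("VmRSS: ", vm.get("VmRSS"), "kB")
```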

What is worse is that because of the redundancy of data the locality of the data that might be required becomes fragmented (you’ll get more page faults). This contributes to a growing VmRss, which is bad news for the mobile device that is short on memory. I’d love to explain why this contributes to VmRss, … it comes down to “a kernel can only page in using buffers/blocks of 4k”, “no matter how small your string is”, “so keep the ‘needed’ things close together and try to fit them in as few pages as possible”.
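The arithmetic behind that can be sketched in a few lines. This assumes a 4096 byte page size (the "4k" from above) and uses made-up helper names; it only counts which pages a set of strings would touch, it doesn’t measure anything.

```python
PAGE_SIZE = 4096  # assumed page size; the text above speaks of 4k blocks


def pages_touched(spans, page_size=PAGE_SIZE):
    """Return the set of page numbers covered by (offset, length) spans."""
    pages = set()
    for offset, length in spans:
        if length <= 0:
            continue
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))
    return pages


def packed_spans(count, length):
    """Place `count` strings of `length` bytes back to back."""
    return [(i * length, length) for i in range(count)]


def scattered_spans(count, length, stride):
    """Place the same strings `stride` bytes apart, with other data in between."""
    return [(i * stride, length) for i in range(count)]


if __name__ == "__main__":
    packed = pages_touched(packed_spans(100, 30))
    scattered = pages_touched(scattered_spans(100, 30, PAGE_SIZE))
    print("pages for packed strings:   ", len(packed))
    print("pages for scattered strings:", len(scattered))
```

A hundred strings of 30 bytes fit in a single page when they sit next to each other, but need a hundred pages when each of them lives on a page of its own.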

Average Joe Six Pack the kernel developer will simply translate that to: try to avoid page faults.

I started writing an experimental new summary storage engine which will in the future be used by Tinymail. This one will store duplicate strings uniquely, will sort the strings on reference count, will store lists of addresses (the to and cc fields) in a sorted way (in the hopes of creating more duplicate strings this way) and finally, to speed up the writing, it will store blocks of summary data rather than all summary of a folder in one big file. The reason for the blocks is that a summary is relatively read only and usually only grows in size. (ps. Desrt is the cool guy who came up with the idea of sorting the lists of addresses)
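A rough sketch of the string handling described above, in Python rather than the C of the experimental engine. The class `StringPool` and its methods are invented for this illustration; the real storage format isn’t shown here and will differ.

```python
from collections import Counter


class StringPool:
    """Stores each distinct string once and counts how often it is referenced."""

    def __init__(self):
        self._refcounts = Counter()

    def add(self, value):
        """Reference a string; duplicates only increase the reference count."""
        self._refcounts[value] += 1
        return value

    def add_address_list(self, addresses):
        """Store an address list sorted, so equal sets become equal strings."""
        normalized = ", ".join(sorted(a.strip() for a in addresses))
        return self.add(normalized)

    def refcount(self, value):
        return self._refcounts[value]

    def layout(self):
        """Distinct strings in write order: most referenced first."""
        return sorted(self._refcounts, key=lambda s: (-self._refcounts[s], s))


if __name__ == "__main__":
    pool = StringPool()
    pool.add_address_list(["bob@example.org", "alice@example.org"])
    pool.add_address_list(["alice@example.org", "bob@example.org"])
    pool.add("Re: lunch?")
    for text in pool.layout():
        print(pool.refcount(text), text)
```

Writing the most referenced strings first keeps the strings that many items share close together, which is the locality argument from earlier.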

Next to all these improvements, it’ll have a few new features like out of order adding and expunging of items using both the UID and the sequence number of the item. The reason for this flexibility in the API is that modern IMAP servers more or less require you to look items up using both the UID and the sequence number while handling EXPUNGE, FETCH and VANISHED responses.
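To illustrate what looking up by both keys means, here is a small sketch. It assumes the local index mirrors the complete folder on the server, so that the position in UID order is the IMAP sequence number. `SummaryIndex` and its method names are hypothetical and not the API of the new summary store.

```python
import bisect


class SummaryIndex:
    """Keeps summary items addressable by UID and by 1-based sequence number."""

    def __init__(self):
        self._uids = []   # sorted ascending; position + 1 is the sequence number
        self._items = {}  # uid -> item

    def add(self, uid, item):
        """Add an item; UIDs may arrive out of order."""
        if uid not in self._items:
            bisect.insort(self._uids, uid)
        self._items[uid] = item

    def by_uid(self, uid):
        return self._items.get(uid)

    def by_seq(self, seq):
        if 1 <= seq <= len(self._uids):
            return self._items[self._uids[seq - 1]]
        return None

    def seq_of(self, uid):
        pos = bisect.bisect_left(self._uids, uid)
        if pos < len(self._uids) and self._uids[pos] == uid:
            return pos + 1
        return None

    def expunge_seq(self, seq):
        """Handle an EXPUNGE response, which names a sequence number."""
        if not 1 <= seq <= len(self._uids):
            return None
        uid = self._uids.pop(seq - 1)  # later items move up by one
        return self._items.pop(uid)

    def vanished(self, uids):
        """Handle a VANISHED response, which names UIDs."""
        for uid in uids:
            seq = self.seq_of(uid)
            if seq is not None:
                self.expunge_seq(seq)
```

An EXPUNGE changes the sequence numbers of everything behind the removed item, which is why the index derives them from the position instead of storing them per item.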

These are the results of a summary with 50,000 completely unique summary items (each string in each field of the summary item is unique). It’s a worst-case scenario. Everybody probably understands that a large number of the strings in the summary view of his E-mail’s INBOX are duplicates. Right? The strings were around 20 to 30 bytes each. This is a low average, most E-mails have larger strings. (But most blog items have smaller texts, I know, patience, I’m almost finished)

Mmap file's size: 13 Mb

VmPeak:    22076 kB
VmSize:    22076 kB <-
VmLck:         0 kB
VmHWM:      7604 kB
VmRSS:      7604 kB <-
VmData:     7084 kB
VmStk:        88 kB
VmExe:        16 kB
VmLib:      2152 kB
VmPTE:        16 kB

We can see the large VmSize (large, as we expected, since the mapped file is 13 Mb in size). Interestingly, the VmRss is just around 8 Mb. These eight megs of mostly heap and stack are being used by pointers that point to the data in the mapping, hashtable nodes and admin info like the reference-count integer of the items. I did the measurement before touching any of the items' data. The kernel has therefore effectively not paged in any of the mapped file's data (on demand paging). This memory, however, is what I call "must have": you won't ever get rid of those eight megs with a folder that has 50,000 items loaded. If the VmRss later grows beyond that because pages of the mapped file get paged in, your other applications can get that memory back from the kernel when memory gets scarce (depending on how actively you are using the E-mail client's summary data at the moment, of course).
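The idea of keeping only pointers into the mapping in memory can be sketched like this. It is an illustration in Python with invented names (`write_strings`, `MappedStrings`) and an invented file layout of NUL-terminated strings; it isn't the format of the experimental summary store.

```python
import mmap
import os
import tempfile


def write_strings(path, strings):
    """Write NUL-terminated UTF-8 strings; return their (offset, length) pairs."""
    spans = []
    with open(path, "wb") as out:
        for text in strings:
            data = text.encode("utf-8")
            spans.append((out.tell(), len(data)))
            out.write(data + b"\0")
    return spans


class MappedStrings:
    """Holds only offsets in memory; the bytes themselves stay in the mapped file."""

    def __init__(self, path, spans):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._spans = spans

    def get(self, index):
        offset, length = self._spans[index]
        # Reading the slice touches the page(s) holding this string; the
        # kernel pages them in on demand and can drop them again later.
        return self._map[offset:offset + length].decode("utf-8")

    def close(self):
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "summary.mmap")
        spans = write_strings(path, ["Re: lunch?", "alice@example.org"])
        store = MappedStrings(path, spans)
        print(store.get(1))
        store.close()
```

The list of offsets plays the role of the "must have" memory from the measurement, while the mapped bytes only count towards the VmRss once somebody actually reads them.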

You can get the experimental summary store here (it's attached to the mail).