FYI, the bodystructure parser is now in an SVN repo

I just added the bodystructure parser that I mentioned a few hours ago to a Subversion repository. I also fixed a bunch of parsing mistakes and figured out a few of the unknown fields.

Edit

I just added error checking. The parser now supports recursive bodystructures for multipart/*, application/*, text/*, message/rfc822, audio/* and image/*. It will parse as many fields as possible (I think I’m only missing two or three fields right now, and they seem to be always NIL anyway). Fields include the parameters, content disposition, octets, lines, encoding, description, language, …
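To give an idea of what "recursive" means here, this is a minimal sketch of a tree type for parsed parts. It is not the code from the repository: the `BodyPart` struct, its field names and the `body_part_*` helpers are all made up for illustration, and it uses plain C instead of glib. It only carries a few of the fields listed above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type, one per MIME part. All names are illustrative. */
typedef struct BodyPart {
    char *type;                 /* e.g. "multipart", "text", "image" */
    char *subtype;              /* e.g. "mixed", "plain", "png" */
    char *encoding;             /* e.g. "base64"; NULL when the server sent NIL */
    char *description;          /* NULL when the server sent NIL */
    unsigned long octets;       /* size of the part in bytes */
    unsigned long lines;        /* only meaningful for text and message parts */
    struct BodyPart **children; /* nested parts of a multipart */
    size_t n_children;
} BodyPart;

/* Copy a string, or return NULL for NULL input or allocation failure. */
static char *dup_or_null(const char *s) {
    if (s == NULL) return NULL;
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}

/* Allocate a part with no children; the remaining fields start as 0/NULL. */
BodyPart *body_part_new(const char *type, const char *subtype) {
    BodyPart *p = calloc(1, sizeof *p);
    if (p == NULL) return NULL;
    p->type = dup_or_null(type);
    p->subtype = dup_or_null(subtype);
    return p;
}

/* Append child to parent. Returns 0 on success, -1 on allocation failure. */
int body_part_add_child(BodyPart *parent, BodyPart *child) {
    assert(parent != NULL && child != NULL);
    BodyPart **grown = realloc(parent->children,
                               (parent->n_children + 1) * sizeof *grown);
    if (grown == NULL) return -1;
    grown[parent->n_children++] = child;
    parent->children = grown;
    return 0;
}

/* Count this part plus everything nested below it. */
size_t body_part_count(const BodyPart *p) {
    if (p == NULL) return 0;
    size_t total = 1;
    for (size_t i = 0; i < p->n_children; i++)
        total += body_part_count(p->children[i]);
    return total;
}

/* Free the whole tree: children first, then the node's own strings. */
void body_part_free(BodyPart *p) {
    if (p == NULL) return;
    for (size_t i = 0; i < p->n_children; i++)
        body_part_free(p->children[i]);
    free(p->children);
    free(p->type);
    free(p->subtype);
    free(p->encoding);
    free(p->description);
    free(p);
}
```

A multipart/mixed with a text/plain and an image/png child would be three nodes, and one `body_part_free()` on the root releases all of them.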

BODYSTRUCTURE, a full parser in C

By requesting the BODYSTRUCTURE you can get a preview of the structure of a message without having to download it. Most E-mail clients parse this structure only to the point of what they need from it. I did some searching and, as I expected, few of the open source E-mail clients do a complete parse. Some just scan for certain words; others do a reasonably good job but skip information that they are simply not interested in. And that’s fine, if you don’t need it.
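For readers who have never looked at the wire format, here is a small sketch of how a client could build the request. The function name `build_bodystructure_fetch` and the tag "a001" are assumptions for illustration only; the command itself is the regular IMAP FETCH with the BODYSTRUCTURE data item. The sample response in the comment is the simple single-part body from the IMAP specification; real servers usually append extension fields.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "<tag> FETCH <seq> (BODYSTRUCTURE)\r\n" into buf.
 * Returns the length of the command, or -1 if buf is too small.
 *
 * A server could answer something like:
 *   * 12 FETCH (BODYSTRUCTURE ("TEXT" "PLAIN" ("CHARSET" "US-ASCII")
 *               NIL NIL "7BIT" 3028 92))
 */
int build_bodystructure_fetch(char *buf, size_t size,
                              const char *tag, unsigned long seq) {
    assert(buf != NULL && tag != NULL);
    int n = snprintf(buf, size, "%s FETCH %lu (BODYSTRUCTURE)\r\n", tag, seq);
    if (n < 0 || (size_t)n >= size)
        return -1; /* output error or truncated command */
    return n;
}
```

Only this short command and the parenthesized answer travel over the network, which is why it works as a cheap preview of a large message.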

However

Soon CONVERT will be unleashed by the IMAP server developers. CONVERT is very interesting for high-latency networks, mobile devices with limited display capabilities and other situations where you want to convert something in the E-mail to something else. A perfect example is scaling down the dimensions of an image that is embedded in an HTML message.

At the Lemonade interop that we did this week I met Roger Grönberg. The company where Roger works has an interesting but quite different solution for this. To me it’s interesting because it shows how much such a feature is wanted on mobile devices like phones. At the SMTP server they first somehow tag each image, then they rewrite it in the E-mail by scaling it to the perfect size for your mobile (they store your mobile’s capabilities at their SMTP server), and then the SMTP server delivers it to their IMAP server. Your phone’s standard E-mail client will now get a nicely formatted E-mail. Because they tagged the image, they can revert this change when you forward it to somebody else: they find the original image again and put it back into the forwarded E-mail.

This approach of course brings various difficulties, and it means changing the content of the E-mail as it is stored on the IMAP server, although the server can still recover from this change. With Lemonade we are, however, interested in making this possible without needing such tricks.

I started my blog item with BODYSTRUCTURE, didn’t I? Let me get back to that. You can probably imagine that if you want to instruct the IMAP server to convert a MIME part to something suitable for your situation, you’ll need to have a lot of information about the original MIME part before you start converting. Right? In the case of a bitmap you probably want to know the dimensions of the original, so that you can ask CONVERT to convert while maintaining the aspect ratio of the image.
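The aspect ratio part is simple arithmetic once the original dimensions are known. A minimal sketch, assuming the client already has the original width and height and knows the maximum size its display can show; the `Size` type and `fit_keep_aspect` are hypothetical names, and how the result is passed to CONVERT is left out on purpose.

```c
#include <assert.h>

typedef struct {
    unsigned width;
    unsigned height;
} Size;

/*
 * Largest size that fits inside max_w x max_h while keeping the aspect
 * ratio of src. Images that already fit are returned unchanged (no
 * upscaling). Results are rounded down, but never below 1 pixel.
 */
Size fit_keep_aspect(Size src, unsigned max_w, unsigned max_h) {
    assert(src.width > 0 && src.height > 0 && max_w > 0 && max_h > 0);
    if (src.width <= max_w && src.height <= max_h)
        return src;

    Size out;
    /* Compare max_w/src.width with max_h/src.height by cross-multiplying,
     * in 64 bits so the products cannot overflow. */
    unsigned long long by_width = (unsigned long long)max_w * src.height;
    unsigned long long by_height = (unsigned long long)max_h * src.width;
    if (by_width <= by_height) {
        /* The width is the limiting dimension. */
        out.width = max_w;
        out.height = (unsigned)(by_width / src.width);
    } else {
        /* The height is the limiting dimension. */
        out.height = max_h;
        out.width = (unsigned)(by_height / src.height);
    }
    if (out.width == 0) out.width = 1;
    if (out.height == 0) out.height = 1;
    return out;
}
```

For a 1000x500 image on a 320x240 display this picks the width as the limit and yields 320x160, which is the size you would then ask the server to produce.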

Well. For that we need a full BODYSTRUCTURE parser. Especially the content’s parameters are needed. I know I should probably have used sexp for parsing the S-expression-like structures, and I know that I probably got a bunch of things wrong. I even know that this current version leaks memory. For now it’s the parsing that matters most. Next on my list is hardening it against Courier-like IMAP servers. With some versions of Courier, you never know what you’ll get.
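To show what "S-expression-like" means in practice, here is a rough sketch of a tokenizer for such a structure, plus a sanity check that the parentheses balance. This is not the parser from the repository: `Token`, `bs_next_token` and `bs_max_depth` are illustrative names, it is plain C rather than glib, and it assumes the whole response is already in one NUL-terminated string with no IMAP literals ({n} followed by raw bytes) in it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TOK_OPEN, TOK_CLOSE, TOK_STRING, TOK_NIL, TOK_ATOM, TOK_END, TOK_ERROR
} TokType;

typedef struct {
    TokType type;
    const char *start; /* points into the input, not NUL-terminated */
    size_t len;        /* for TOK_STRING: text between the quotes, escapes kept */
} Token;

/* Read one token at *cursor and advance *cursor past it. */
Token bs_next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    while (*p == ' ') p++;
    Token t = { TOK_END, p, 0 };
    if (*p == '(') {
        t.type = TOK_OPEN; t.len = 1; p++;
    } else if (*p == ')') {
        t.type = TOK_CLOSE; t.len = 1; p++;
    } else if (*p == '"') {
        const char *q = p + 1;
        t.start = q;
        while (*q != '\0' && *q != '"') {
            if (*q == '\\' && q[1] != '\0') q++; /* skip the escaped char */
            q++;
        }
        if (*q == '"') {
            t.type = TOK_STRING; t.len = (size_t)(q - t.start); p = q + 1;
        } else {
            t.type = TOK_ERROR; p = q; /* unterminated quoted string */
        }
    } else if (*p != '\0') {
        const char *q = p;
        while (*q != '\0' && *q != ' ' && *q != '(' && *q != ')' && *q != '"')
            q++;
        t.len = (size_t)(q - p);
        /* Simplification: only the upper-case spelling is treated as NIL. */
        t.type = (t.len == 3 && strncmp(p, "NIL", 3) == 0) ? TOK_NIL : TOK_ATOM;
        p = q;
    }
    *cursor = p;
    return t;
}

/* Deepest nesting level of the lists, or -1 if the input is malformed. */
int bs_max_depth(const char *input) {
    int depth = 0, max = 0;
    for (;;) {
        Token t = bs_next_token(&input);
        if (t.type == TOK_ERROR) return -1;
        if (t.type == TOK_END) return depth == 0 ? max : -1;
        if (t.type == TOK_OPEN && ++depth > max) max = depth;
        if (t.type == TOK_CLOSE && --depth < 0) return -1;
    }
}
```

A real parser would build a tree from these tokens; the depth check alone is already a cheap way to reject truncated or unbalanced responses before doing anything else with them.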

I prepared the sample so that you can just run “./test_app.sh”. That script will fetch 10,000 BODYSTRUCTUREs from my test IMAP server and then it will start parsing them.

I hope other E-mail client authors will test this stuff, point me to bugs and send me patches, and perhaps use it in their E-mail clients as soon as it’s actually usable. If somebody wants to convert this away from glib, to no-strings-attached pure C with POSIX, or C with the Linux kernel’s library, then that’s fine and cute: send me the patch.