E-mail metadata, “E-mail as a (desktop) service”

On mobiles, though not on the desktop, I think the era of E-mail clients will soon be over, just like the era of file managers. A person who’s using a mobile or a phone doesn’t really want to start and stop applications. Those users don’t start and stop applications to receive phone calls and text messages. Why would they want to start and stop E-mail clients?

Not only that. People also want their text messages, E-mails, history of calls, meetings, contacts and photos to be integrated.

When we search for a meeting, we want to find the photos we took at that meeting. We also want to find the contacts that were at that meeting. We want to find the invitation E-mail and all the replies to the invitation. And when we select a contact, we want to see a tree of E-mail discussions that we once had with the contact.

On a mobile we don’t want one big application that does all this. Instead we want all applications to integrate with all this information. And we want it to be very easy for application developers to integrate with this system. An application framework.

This will be the purpose of Tracker on mobiles.

This means that the concept of an E-mail client will eventually be moved to the background of the mobile device. E-mail must just be there, not started. Something must communicate with the IMAP server whenever needed. Meanwhile all applications on the mobile need to have easy access to the metadata of the E-mails. It must be easy for them to get a MIME part of an E-mail, perhaps as an InputStream?
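
To make the "MIME part as a stream" idea concrete, here is a minimal sketch using only Python's standard `email` and `io` modules. The function name `open_mime_part` and the `SAMPLE` message are made up for illustration; a real service would read from its local cache or from the IMAP server rather than from a bytes literal.

```python
import io
from email import message_from_bytes

# A tiny hand-written sample message (illustrative data, not a real E-mail).
SAMPLE = (
    b"From: alice@example.org\r\n"
    b"To: bob@example.org\r\n"
    b"Subject: Meeting photo\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"See attachment.\r\n"
    b"--XX\r\n"
    b"Content-Type: image/jpeg\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"/9j/4A==\r\n"
    b"--XX--\r\n"
)


def open_mime_part(raw_message: bytes, content_type: str) -> io.BytesIO:
    """Return the first MIME part of the given type as a readable byte stream."""
    message = message_from_bytes(raw_message)
    for part in message.walk():
        if part.get_content_type() == content_type:
            # decode=True undoes base64 / quoted-printable transfer encoding;
            # it returns None for multipart containers, hence the fallback.
            payload = part.get_payload(decode=True)
            return io.BytesIO(payload or b"")
    raise LookupError(f"no {content_type} part in message")
```

An application would then call something like `open_mime_part(raw, "image/jpeg").read()` without knowing anything about MIME boundaries or transfer encodings.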

The combination of E-mail metadata querying and handling of the E-mail’s MIME parts is what I refer to as “E-mail as a service”. Some people in the past tried to explain to me that just putting JavaMail’s API over D-Bus would already solve “E-mail as a service”. I don’t think this is true. Camel, which had its API based on JavaMail, offers a truly weak query interface. It’s for example not possible to ask for MIME parts that are images that have a specific Exif tag set.

That’s of course because it’s not Camel’s purpose to do metadata indexing of the attachments. But that was yesterday. Yesterday was boring.

With modern IMAP servers that have the CONVERT capability it will be possible to ask for a converted MIME part of an E-mail. Converted to Exif plain-text data. Meaning that we don’t have to fetch 5MB of JPEG data just to read a few hundred bytes of Exif metadata.

Meanwhile normal IMAP servers already offer ENVELOPE and BODYSTRUCTURE, which of course give us a lot of metadata too.
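
As a rough sketch of what asking for only this metadata looks like, the following uses Python's standard `imaplib`. The helper name `fetch_structure` is an assumption made for this example, and the response is returned raw and unparsed; turning it into a usable structure is left out here.

```python
import imaplib
from typing import List


def fetch_structure(conn: imaplib.IMAP4, uid: str) -> bytes:
    """Fetch only ENVELOPE and BODYSTRUCTURE for one message, by UID.

    Returns the raw, unparsed server response; no message body is requested.
    """
    status, data = conn.uid("FETCH", uid, "(ENVELOPE BODYSTRUCTURE)")
    chunks: List[bytes] = []
    for item in data or []:
        # imaplib yields bytes, or (header, literal) tuples when the
        # server sends part of the response as an IMAP literal.
        if isinstance(item, tuple):
            chunks.extend(item)
        elif item is not None:
            chunks.append(item)
    if status != "OK" or not chunks:
        raise LookupError(f"no metadata returned for UID {uid}")
    return b"".join(chunks)


# Hypothetical usage against a real server (host and credentials are placeholders):
#   conn = imaplib.IMAP4_SSL("imap.example.org")
#   conn.login("user", "password")
#   conn.select("INBOX", readonly=True)
#   print(fetch_structure(conn, "7"))
```

The point of the sketch is that the request names only the two metadata items, so the size of the attachments never enters the picture.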

To assist people who want to write an “E-mail as a service” D-Bus service today I have decided to write a document that explains some of the capabilities of modern IMAP.

I think the future of E-mail “infrastructure” lies in:

  1. Using an RDF store that can be accessed using SPARQL, like Tracker. This stores the ENVELOPEs and BODYSTRUCTUREs of your E-mails next to the attachment’s other metadata triples. The query language can then be used to query against metadata found by analyzers like Tracker’s own extractors and/or Strigi’s StreamAnalyzer as well as metadata coming from IMAP itself.

    People who saw my presentation at FOSDEM already know that we are planning to push Tracker in the direction of SPARQL + Nepomuk as ontology. Meanwhile we are in discussion with the Xesam and Nepomuk people to change the Nepomuk Message Ontology to be suitable for this. As a result Evgeny Egorochkin made this proposal.

  2. Having a small service for dealing with E-mail specific things. Like getting the contents of the MIME parts as streams. Requesting them to be downloaded if they aren’t cached locally yet. Requesting a CONVERTed version of them.

    There are some experiments happening that will implement this capability. It’s all still very early. If I ever start a Tinymail 2.0, I will probably make it focus primarily on this.
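
To illustrate the kind of query an RDF store makes possible (the "images with a specific Exif tag" case that Camel cannot answer), here is a toy in-memory triple matcher in Python. It is only a sketch of the idea, not Tracker's API: the predicate names (`ex:hasPart`, `ex:mimeType`, `exif:Model`) and the sample triples are invented for this example and are not taken from the Nepomuk ontologies.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Made-up sample data: envelope-like and attachment metadata side by side.
TRIPLES: List[Triple] = [
    ("mail:1", "ex:subject", "Meeting photos"),
    ("mail:1", "ex:hasPart", "mail:1/part2"),
    ("mail:1/part2", "ex:mimeType", "image/jpeg"),
    ("mail:1/part2", "exif:Model", "ExampleCam"),
    ("mail:2", "ex:subject", "Agenda"),
    ("mail:2", "ex:hasPart", "mail:2/part2"),
    ("mail:2/part2", "ex:mimeType", "application/pdf"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def images_with_exif_tag(triples: List[Triple], tag: str) -> List[Triple]:
    """Return (mail, part, value) for image parts that carry the given tag.

    Roughly what a SPARQL query of this shape would express:
      SELECT ?mail ?part ?value WHERE {
        ?mail ex:hasPart ?part . ?part ex:mimeType ?m . ?part <tag> ?value .
        FILTER regex(?m, "^image/") }
    """
    results: List[Triple] = []
    for mail, _, part in match(triples, p="ex:hasPart"):
        mime_types = [m for _, _, m in match(triples, s=part, p="ex:mimeType")]
        if not any(m.startswith("image/") for m in mime_types):
            continue
        for _, _, value in match(triples, s=part, p=tag):
            results.append((mail, part, value))
    return results
```

With the sample data, `images_with_exif_tag(TRIPLES, "exif:Model")` finds the JPEG attachment of the first E-mail and skips the PDF. A real store answers the same question with one declarative query and an index instead of nested loops.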

I don’t think the conventional E-mail client will survive for very long. Especially not on mobiles where “integration” is far more important for the end-user.

Every (mobile) application can soon become as capable of handling E-mail as what we today call “the E-mail client”. At least from the point of view of the user. In reality a desktop service will solve the hard stuff.

11 thoughts on “E-mail metadata, “E-mail as a (desktop) service””

  1. Philip, you’re a bit selective in what you think users care about — they may indeed not care about e-mail clients, but you think they will care about what, say, CONVERT will bring them? As with many of the newer IMAP extensions, I don’t see much demand for that.

    About the metadata – I personally think the indexed contents of e-mail messages are the most interesting part. I see there is now at least ‘plainTextMessageContent’. Will I be able to search for messages by means of a couple of (not necessarily adjacent) words? That seems more useful than searching for Exif data of pictures I don’t have yet…

  2. @dirk: But you do have them (the images), they are just stored on your IMAP server. That means once found using a SPARQL query, the app that wants to display it can with an easy request to the desktop service (mentioned in #2) get it either in the original format, or in a converted format. For example if the phone’s LCD is not very big, there’s no point in requesting the 5MB-full-resolution version of the image. If the device has no Word File viewer, and you are trying to access a Word Document attachment, it can request a PDF version of it (if the IMAP server supports that conversion action). Maybe we can one day even use CONVERT to fetch just one page of a document that is an attachment of an E-mail? Convert Page 3 of a Word Document into a PDF, and send it to me BINARY.

    The user obviously won’t care whether or not IMAP’s CONVERT is used. The user cares about accessing his data for viewing purposes. Nor does the user want to install all sorts of software on his mobile in order to display strange document formats X and Y.

    The reason why IMAP’s new extensions are not yet being used a lot is the chicken & egg problem: IMAP server developers don’t always implement them because most E-mail clients are crappily written, and therefore won’t use them anyway. And most E-mail clients don’t consume them because most IMAP servers don’t offer them.

    But saying that because of this we can’t solve the problem smells a little bit too much like defeatism to me.

    Which is why I added the section “Controversial or drastic solutions” in the document about E-mail metadata fetching: let the company who sold you the mobile device provide proxy IMAP servers that are fully capable. Use that to solve the POP paradox (POP not being designed for more-than-one E-mail client) and the bad-IMAP-servers’ problem. And then develop modern E-mail clients that consume the proxy IMAP server.

    I’m not being paid by Isode, but I know some of the guys who work there indeed (no need to hide that I do, I do). But they have a product that scales to hundreds of thousands of simultaneous users and that implements IMAP + POP proxying while delivering probably the most modern IMAP server available (in terms of both capabilities and performance).

    Knowing them means I’m biased (I agree). But it’s also a matter of practical & pragmatic availability of a product that just matches the use-case and solves the problem.

    About the nie:plainTextContent field: we plan to index that indeed. That’s a matter of fetching all text/plain MIME parts of each message and storing them in the RDF store. This is definitely among the things we want E-mail clients and/or engines to store into Tracker’s Nepomuk-based RDF store. The field is also already queryable with SPARQL in our experimental branch of Tracker (so if you store it, you can use it already, today).

    You talk about demand for conversions of E-mails. Well, check out MoMail: http://www.momail.org/ . They are a successful company that solves a problem with E-mail on mobile devices. They convert the E-mails (at their server) into a format that your mobile device’s E-mail client can handle. It’s a pragmatic solution, but they are successful doing this. So there is demand for this.

  3. I think it looks very logical to have “email as a service”, making the “information” stored in the mailbox available to different applications, attachment info included.
    People want to have access to their personal information, no matter if it is on the IMAP server or locally.

    Keep going with your hard work, Philip!

  4. @Philip: thanks for the reply. I’m happy to hear there’s going to be content indexing.

    About the other things, well, sure with 5 10E+9 people on this planet, there will be demand for just about anything (like garden-gnome-shaped toothbrushes)… but you have to ask how big the demand is.

    There is *some* value in selectively downloading attachments – but this CONVERT business… your examples use the kind of special cases where your phone can view PDFs and not DOCs, or has a very small display, yet you want to view pictures and don’t have the bandwidth. I guess by the time you have convinced people they need CONVERT, phone capabilities will have increased enough to make it unneeded, with maybe some very small niches that would have a use for it.

    Anyway, prove me wrong :) it’s interesting technology, which is nice just because of that. So good luck with it — no defeatism, just some healthy skepticism. Maybe it’s just too cold here :)

  5. All this “index everything client-side” trend is great and all, but seriously doing this on a phone, considering our average mailstore size (and I’m sure my users aren’t the worst offenders here) does not make much sense to me.

    The benefits of what you describe are obvious to me from the user’s functionality POV, but trying to index 10’s of GBs of emails spread over 10’s of thousands of emails in dozens of folders … and all that over a pay-by-the-byte phone connection … WTF??

    Even skipping all the biggest binary blobs (and indexing only its metadata), and converting things to text only for document formats, it’s going to be an absurdly big amount of data to index for any email heavy user … which IMHO would be the most eager to get such functionality in the first place.

    Then doing just the same thing on the 3 desktops and 2 smartphones where a user might have configured his IMAP account … isn’t all this calling for a better server side IMAP search extension?

    As a side note, this resembles a lot the inability of desktop metadata extractors such as tracker to play nice when you try to deploy it in environments where you have a multi TB shared file store that you’d wish to index for a random number of desktops … except that users with multiple GBs of email are way more common.

    IMHO, all these metadata indexing things seem to be designed to not scale past the “bunch of emails” kind of user, or the “bunch of personal local files” kind of user. Which is sad as it leaves any corporate deployments out of the question.

  6. I hate email clients on the desktop too. I want what you describe on my desktop and laptop as well. I want all my data stored as I decide though, not hidden in ~/.evolution. I want its folders organised how I like, e.g. a Maildir at ~/comm/mail/foo@bar.com/INBOX or an mbox at ~/comm/mail/archives/bar.com/2007 or an Outlook .pst at ~/comm/mail/archives/ancient-uni-shit.pst

  7. I don’t want to have a framework that hides the fact that someone sent me a .DOC file. In the end, everybody has to use a toolchain of huge IMAP servers that try to do everything and you can’t access anything easily without having Tracker and everything as an e-mail client.

    Why not assert that the current IMAP server features are all you have and work from there? As another person already pointed out, having fast full-text search of a remote IMAP account is what most people want to do. Maybe have an option to search for all mails containing JPEG files. But don’t try to do everything.

    Do not over-engineer things. Make the simple things work tomorrow – don’t make the really hard things (Exif tags searching? – c’mon!) work “maybe” in three years. After the simple things work, you can always plug exiftool on top of the indexing engine and get the desired effect.

  8. @Angel Marin: Just downloading the metadata of those very large E-mail folders isn’t going to require even ten or twenty megabytes, let alone gigabytes.

    @thp: It’s not very complex and it already works in our experimental branch of Tracker, if you just devise a way to collect the metadata efficiently. With a modern IMAP server that has the CONVERT capability, you already have that efficient method.

    I don’t know what you mean with ‘hiding the fact that’. What I describe is of course mostly for mobile E-mail integration. You can still use your normal E-mail client to access your normal IMAP server too. Nobody is planning to make Tracker a required dependency for accessing your E-mails. That would of course be very silly.

  9. i tend to agree with the poster who suggests having a far more efficient search engine for the IMAP server … then any device or (uh oh) client can request a search and display the results as it sees fit … the huge indexes and such are kept on the server where gobs of storage and processing can live happily AND when the same user connects from different devices the searches would already exist in the server cache thus enabling the user to move seamlessly across devices without losing a train of thought so to speak

  10. @lauren: but IMAP doesn’t have a very powerful search engine specified. And if you specify one now it will take at least two or three years until even one IMAP service provider has it, and another two or three years before a majority of the IMAP service providers have it.

    That’s a bit too long for a lot of devices that need to ship within a year or two at most.

    The process of getting an agreement and an RFC at the level of IMAP would also take up a few months. But note that IMAP’s ESEARCH has been agreed and specified. It still lacks a few important things, though. But it’s an improvement over IMAP’s SEARCH.

    Note that a local index is neither very expensive nor very big at all. And note that IMAP can’t store search states. So a new connection to the IMAP server must start its searches from scratch. I also don’t think that you’d get much agreement among the IMAP server developers if you specified stateful searches in IMAP. In the end you need to convince them if you want to see it being implemented. Else your new capability will just be yet another one that only your IMAP server will have (if you develop one, else no server will have it).

    And finally you can’t include the E-mail results in a search whenever you are offline this way. The user might want to know about E-mails that match before going online (and go online only, for example, as soon as he wants to display the E-mail, after finding it using his just-metadata matching query). When consuming an IMAP SEARCH service, you can’t do this unless you require the user to go online each time he does any query. So eventually you’ll need a local index for these kinds of use-cases anyhow. And there are plenty of such use-cases (maybe you don’t need them, but yours alone aren’t representative of all users’ use-cases).

  11. Indeed.

    With Akonadi as the framework for actually accessing, transferring and caching data, any client can show and manipulate all kinds of PIM data without having to know the gory details of accessing all the various storage systems.
