On desktop data services

A few days ago a few students asked me to help them design a better EDS. I know some people are going to hate me for this blog item (because, well, I’m pointing out some flaws in the architecture of some components, and people don’t always want you to identify flaws).

Although I’m not very focused on calendaring and other things related to what EDS offers, here’s my take on the subject:

Evolution Data Server will, via the notify_matched_object_cb in the ECalData lib, issue a notifyObjectsAdded for all objects matched by your query (ECalView). It doesn’t seem to do that for just the ones that recently became visible.

+ e_cal_view_start
|- CalView_start (goes over the IPC)
| - impl_EDataCalView_start
|  + foreach: notify_matched_object_cb
|  |- notifyObjectsAdded (goes over the IPC)
|  | - impl_notifyObjectsAdded
|  |  - g_signal_emit "objects-added"
|  + end foreach
+ end e_cal_view_start

After that you will also receive notifications when an item that matches your query gets removed, changed, added, yadi yada.

Although that sounds reasonable on a desktop, there’s a problem with this on mobiles: unless you limit the query to exactly what you will see in the view, you’ll needlessly transfer a lot of iCal data over the IPC and, worse, you’ll need to store it in the memory of the user interface (the client).

Depending on the backend implementation of EDS, this means it’s in memory twice. Given that EDS is a locally running service, that’s a little bit stupid (if it were a service running over a slow GPRS connection, I would better understand the need to fully cache everything in the client).

Another reason why you want to keep the memory at the service is that the service is the centralized infrastructure: all clients using it share the same memory. If your clients always need their own copies, you are effectively doubling all memory consumption for calendaring.

Although the user will do queries that are much larger (like: give me all items of this month), a mobile device’s view will most often display only a few calendar items. Which of course makes a good developer think about the other ones: do you really need them in memory at all times? Or is a proxy at the client good enough? A proxy that will get the real thing from the service, by asking a factory, at the time the user starts using it.

Wouldn’t, for example, a cursor-style remote API be much better? The view’s model would get the currently visible item by simply iterating to it, and only then fetching it.

A cursor is quite simple and indeed looks a lot like a C++, Java or .NET iterator:

c = create_cursor (expression)
c.move_next, c.move_prev
c.get_current

The view would get a model that utilizes that cursor efficiently. For example: the view asks the model for the 100th item while the current position of the model is 80. So the model will do c.move_next 20 times and then give the view c.get_current. Finally, the view unreferences the instance as soon as the item is no longer visible.
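To make that seek-then-fetch behavior concrete, here is a minimal Python sketch of such a model. All names (CursorModel, ListCursor, get_item) are made up for illustration; the real thing would talk to a remote cursor, not an in-memory list.

```python
# Sketch (not a real EDS API): a model that maps view row indices
# onto cursor moves, fetching only the one item that is needed.

class CursorModel:
    def __init__(self, cursor):
        self.cursor = cursor
        self.position = 0  # where the cursor currently points

    def get_item(self, index):
        # Step to the requested row, then fetch just that one item.
        while self.position < index:
            self.cursor.move_next()
            self.position += 1
        while self.position > index:
            self.cursor.move_prev()
            self.position -= 1
        return self.cursor.get_current()

# In-memory stand-in for the remote cursor, just for demonstration.
class ListCursor:
    def __init__(self, items):
        self.items, self.i = items, 0
    def move_next(self): self.i += 1
    def move_prev(self): self.i -= 1
    def get_current(self): return self.items[self.i]

model = CursorModel(ListCursor([f"item-{n}" for n in range(200)]))
model.get_item(80)          # the model is now positioned at 80
item = model.get_item(100)  # 20 move_next calls, one get_current
```

The point is that only get_current touches the real data: the view scrolling from row 80 to row 100 costs twenty cheap position bumps and a single fetch.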

That iterator doesn’t have to be implemented using only remote calls. It can be emulated by storing the query result in the service for as long as the cursor is kept alive (or letting it die on a timeout or something), and implementing a get_current that takes a query id and an “nth” parameter. The move_next and move_prev are then implemented locally (just keeping a “current nth” or position status as an integer).
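A small Python sketch of that emulation, assuming a hypothetical service that keeps query results alive under a query id (QueryService and get_nth are invented names, not an existing protocol):

```python
# The service keeps the query result alive under a query id; the
# client keeps only an integer position. Only get_current() would
# cross the IPC boundary; everything else stays local.

class QueryService:
    """Pretend service side: stores results per query id."""
    def __init__(self):
        self.results = {}
    def create_query(self, query_id, items):
        self.results[query_id] = items
    def get_nth(self, query_id, nth):
        return self.results[query_id][nth]  # the one "IPC" round trip

class EmulatedCursor:
    def __init__(self, service, query_id):
        self.service = service
        self.query_id = query_id
        self.nth = 0       # "current nth", kept entirely client-side
    def move_next(self):
        self.nth += 1      # local, no IPC
    def move_prev(self):
        self.nth -= 1      # local, no IPC
    def get_current(self):
        return self.service.get_nth(self.query_id, self.nth)

service = QueryService()
service.create_query("q1", ["a", "b", "c"])
c = EmulatedCursor(service, "q1")
c.move_next()
current = c.get_current()
```

Moving the cursor around is free; you only pay for the rows you actually look at.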

Is this slow? Probably the experience will be a lot faster for the user than having to initially download the entire result set of a query. It’s true that the performance would suffer when a lot of items are visible: that’s because a lot of c.get_current calls would happen. But then again, most mobile devices have small screens and therefore can’t display a lot of calendar items in a meaningful way to the user.

Also, as a solution for that, you can make a proxy that caches the first 10 characters of each once-received item’s description. The model can then, instead of returning c.get_current, return a proxy. The view can, once the item becomes invisible, clear the real object from the proxy. If the view is set to display a lot of items, it would only ask for those first 10 characters: only the first time would the proxy need to get the real object to fulfill that API. Zooming in, though, would make the view ask the proxy for information that it doesn’t necessarily have (any more), so the proxy would ask the model (or a factory) to do a c.get_current (getting the real object again) to fulfill the interface of the type for the view.
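A minimal Python sketch of that proxy idea (ItemProxy, release and description_prefix are illustrative names; the fetch callable stands in for a c.get_current over IPC):

```python
# Keeps a cheap 10-character description prefix cached, and drops
# the heavy real item again when the view releases it.

class ItemProxy:
    PREFIX_LEN = 10

    def __init__(self, fetch_real):
        self._fetch_real = fetch_real  # e.g. a closure doing c.get_current
        self._real = None
        self._prefix = None

    def real(self):
        # Fetch the full item lazily, only when someone needs it.
        if self._real is None:
            self._real = self._fetch_real()
        return self._real

    def description_prefix(self):
        # Serve from the prefix cache; fetch the real item only once.
        if self._prefix is None:
            self._prefix = self.real()["description"][:self.PREFIX_LEN]
        return self._prefix

    def release(self):
        # Called when the item scrolls out of sight: the heavy real
        # object goes away, the cheap prefix stays behind.
        self._real = None

fetches = []
def fetch():
    fetches.append(1)
    return {"description": "Dentist appointment at 14:00"}

proxy = ItemProxy(fetch)
prefix = proxy.description_prefix()        # triggers one fetch
proxy.release()
prefix_again = proxy.description_prefix()  # cache hit, no fetch
```

After release(), a view that only shows prefixes never hits the service again; only zooming in (calling real()) would trigger a new fetch.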

But really: instead of an implementation like EDS, both KDE and GNOME experts should put their heads together and create a D-BUS specification for this. Perhaps one that incorporates that cursor idea?

Clearly, both teams are most likely not going to agree on sharing one implementation soon.

I can already see frightened people screaming and yelling after I just said that. That’s not necessary. See, guys, dear users: we developers do talk to each other at conferences. We love each other! We love competing! Competing makes both sides better and sharper. Don’t you sometimes engage in friendly competition with your partner?

With a good specification, we could (and eventually would too) compete on implementation. It’s like agreeing on the rules of a game of Pool with your partner. Or bowling.

That is why I told those students to focus on a very good D-BUS spec. Perhaps do an initial proof-of-concept implementation to stress-test your new D-BUS protocol?

Doing things in parallel, downloading messages while getting summary

A little bit more technical … some people like that, others don’t.

Today I did a cute hack on the embedded Camel of Tinymail, camel-lite: I altered the camel_folder_get_message implementation in such a way that it creates a new CamelImapStore instance.

The CamelImapStore is a type that derives from CamelService and holds the connection lock. It also has the pointers to two CamelStream instances that represent the access to the socket file descriptor: that is your connection with the IMAP server. The CamelStream abstracts away SSL and yadi yada, but the principle is the same: it’s the store that can perform only one procedure at a time in Camel (and therefore also in Evolution).

In Camel this meant a lot of locking. Regretfully, the IMAP implementation isn’t very fine-grained in its locking (and actually, it sucks a little bit). Nor does the IMAP implementation do pipelining or any other such neat tricks. It’s a simple “lock, send query, fetch result, unlock” concept put into practice. I have broken up some procedures, like getting the summary, into smaller queries, by looping until I have all of the summary. During that loop, the locks get unlocked. A get-message would therefore, in theory, get a chance to occur while the loop is happening in another thread.
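The shape of that loop can be sketched in a few lines of Python. This is not actual Camel code; fetch_chunk stands in for one small IMAP query, and the only point being illustrated is that the lock is taken per chunk rather than for the whole download:

```python
import threading

def fetch_summary_chunked(connection_lock, fetch_chunk, chunk_size=100):
    summary = []
    offset = 0
    while True:
        with connection_lock:   # lock, send query, fetch result ...
            chunk = fetch_chunk(offset, chunk_size)
        # ... unlocked here: a get-message running in another thread
        # gets a window to use the connection between chunks
        if not chunk:
            break
        summary.extend(chunk)
        offset += len(chunk)
    return summary

lock = threading.Lock()
mailbox = [f"msg-{n}" for n in range(250)]
result = fetch_summary_chunked(
    lock, lambda off, size: mailbox[off:off + size])
```

The trade-off is more lock churn and more round trips in exchange for responsiveness: no single operation can monopolize the connection for the duration of a full summary download.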

That theory actually does work in practice. However, it was a little bit difficult to get it to behave absolutely correctly. On top of that, Camel’s “design” is far from perfect. Therefore, instead of endlessly trying to get it correct, I decided to do a proof of concept that simply creates a new connection each time you download a message, and stores the message locally in the cache.

The final idea behind all this is a flexible queue mechanism that, for E-mail clients that want this functionality, will download (new) messages in the background while getting the summary in parallel. And if the user clicks a message, while the summary is being received or after it, the queue gets a high-priority item added that will first download the clicked message and display it in the message-view component.
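The queue part is standard priority-queue territory; here is a minimal Python sketch of the idea (the function names and the uid strings are made up, and a real implementation would of course run the consumer in a worker thread):

```python
import itertools
import queue

# Background prefetches go in at low priority; a clicked message
# jumps ahead of them. Lower number = served first.
HIGH, LOW = 0, 1
_seq = itertools.count()  # preserves FIFO order within one priority

downloads = queue.PriorityQueue()

def enqueue_background(uid):
    # a (new) message noticed while the summary is coming in
    downloads.put((LOW, next(_seq), uid))

def enqueue_clicked(uid):
    # the user clicked this message: download it first
    downloads.put((HIGH, next(_seq), uid))

def next_download():
    _prio, _n, uid = downloads.get()
    return uid

enqueue_background("uid-1")
enqueue_background("uid-2")
enqueue_clicked("uid-9")   # jumps ahead of the prefetches
order = [next_download() for _ in range(3)]
```

The sequence counter matters: without it, two entries with the same priority would be ordered by comparing their payloads instead of by arrival time.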

I know that this is the core of a lot of E-mail clients. It’s exactly what I want tinymail to provide within the framework, as yet another component.

Next to that, I will also implement a folder observer that will act on Push E-mail events by putting the request for getting the new E-mail on the queue. All of this will of course be optional behavior: on a GPRS network you specifically don’t want to retrieve all (new) messages. That would consume shitloads of bandwidth and cost you a lot of money. But before going offline, you might want to ask your E-mail client to indeed get all the messages and put them in the offline cache. While it’s doing this, you still want to work normally. And why not? Exactly. That’s why the second-connection proof of concept was done.