April 2007 – How easy it is to make people believe a lie, and how hard it is to undo that work again

API docs at 100%

Today Tinymail’s API reference manual covers 100% of the API. This means that no function exported by the Tinymail libraries is undocumented, none.

A few days ago a few students asked me to help them design a better EDS. I know some people are going to hate me for writing this blog item (because, well, I’m pointing out some flaws in the architecture of some components, and people don’t always want you to identify flaws).

Although I’m not very focused on calendaring and other things related to what EDS offers, here’s my try on the subject:

Via the notify_matched_object_cb in the ECalData lib, Evolution Data Server will issue a notifyObjectsAdded for every object that matches your query (ECalView). It seems it doesn’t limit that to just the ones that recently became visible.

+ e_cal_view_start
|- CalView_start (goes over the IPC)
| - impl_EDataCalView_start
|  + foreach: notify_matched_object_cb
|  |- notifyObjectsAdded (goes over the IPC)
|  | - impl_notifyObjectsAdded
|  |  - g_signal_emit "objects-added"
|  + end foreach
+ end e_cal_view_start

After that you will also receive notifications when an item that matches your query gets removed, changed, added, yadi yada.

Although that sounds reasonable on a desktop, there’s a problem with this on mobiles: unless you limit the query to exactly what you will see in the view, you’ll needlessly transfer a lot of iCal data over the IPC and, worse, you’ll need to store it in the memory of the user interface (the client).

Depending on the backend implementation of EDS, this means that it’s in memory twice. Given that EDS is a locally running service, that’s a little bit stupid (if it were a service running over a slow GPRS connection, I would better understand the need to fully cache everything in the client).

Another reason why you want to keep the memory at the service, is that the service is the centralized infrastructure. All clients using it, share the same memory. If your clients always need their own copies, you are effectively doubling all memory consumption for calendaring.

Although the user will do queries that are much larger (like: give me all items of this month), a mobile device’s view will most often display only a few calendar items. Which of course makes a good developer think about the other ones: do you really need them in memory at all times? Or is a proxy at the client good enough? A proxy that will get the real thing from the service, by asking a factory, at the time the user starts using it.

So wouldn’t, for example, a cursor-style remote API be much better? The model of the view would get the currently visible item by simply iterating to it, and only then fetching it.

A cursor is quite simple and looks a lot like a C++, Java or .NET iterator indeed:

c = create_cursor (expression)
c.move_next, c.move_prev
c.get_current

The view would get a model that utilizes that cursor efficiently. For example: the view asks the model for the 100th item, the current position of the model is 80. So the model will do a c.move_next 20 times and then give the view c.get_current. Finally the view unreferences the instance as soon as the item is not visible anymore.

That iterator doesn’t have to be implemented using only remote calls. It can be emulated by storing the query result as long as the cursor is kept alive (or let it die on timeout or something) in the service, and implementing a get_current that takes a query id and an “nth” parameter. The move_next and move_prev are implemented locally (just keeping a “current nth” or position status as an integer).

Is this slow? The experience for the user will probably be a lot faster than having to initially download the entire result-set of a query. It’s true that the performance would be slow when a lot of items are visible: that’s because a lot of c.get_current calls would happen. But then again, most mobile devices have small screens and therefore can’t display a lot of calendar items in a meaningful way to the user.

Also, as a solution for that, you can make a proxy that caches the first 10 characters of the description of each item it has received once. Instead of returning c.get_current, the model can now return a proxy. Once the item becomes invisible, the view can clear the real item from the proxy. If the view is set to display a lot of items, it would only ask for those first 10 characters: the proxy would need to get the real item only the first time to fulfill that API. Zooming in, though, would make the view ask the proxy for information that it doesn’t necessarily have (any more), so the proxy would ask the model (or a factory) to do a c.get_current (getting the real item) to fulfill the interface of the type for the view.
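A minimal sketch of such a proxy, again in Python and again with made-up names (ItemProxy, fetch_real, release_real). It assumes the real item is a dict with a "description" key and that fetch_real is whatever callable ends up doing the c.get_current.

```python
class ItemProxy:
    """Keeps a 10-character summary; fetches the real item only on demand."""

    SUMMARY_LENGTH = 10

    def __init__(self, fetch_real):
        self._fetch_real = fetch_real   # hypothetical factory callable
        self._real = None
        self._summary = None

    def _ensure_real(self):
        if self._real is None:
            self._real = self._fetch_real()
            if self._summary is None:
                # Remember the short summary so it survives release_real().
                text = self._real["description"]
                self._summary = text[:self.SUMMARY_LENGTH]
        return self._real

    @property
    def summary(self):
        # Needs the real item only the very first time.
        if self._summary is None:
            self._ensure_real()
        return self._summary

    @property
    def description(self):
        # The full text always needs the real item (e.g. after zooming in).
        return self._ensure_real()["description"]

    def release_real(self):
        # Called by the view once the item is no longer visible.
        self._real = None
```

After release_real, asking for the summary costs no remote call. Asking for the full description fetches the real item again.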

But really. Instead of an implementation like EDS, both KDE and GNOME experts should stick their heads together and create a D-BUS specification for this. Perhaps one that copes with that cursor idea?

Clearly, both teams are most likely not going to agree on sharing one implementation soon.

…

I see frightened people screaming and yelling after I just said that. That’s not necessary. See, guys, dear users, we developers do talk with each other at conferences. We love each other! We love competing! Competing makes both sides better and sharper. Don’t you sometimes do friendly competition with your partner?

With a good specification, we could (and eventually would too) compete on implementation. It’s like agreeing on the rules of a game of Pool with your partner. Or bowling.

That is why I told those students to focus on a very good D-BUS spec. Perhaps do a proof-of-concept initial implementation to test your new D-BUS protocol?

Doing things in parallel, downloading messages while getting summary

A little bit more technical … some people like that, others don’t.

Today I did a cute hack on the embedded camel of Tinymail, camel-lite: I altered the camel_folder_get_message implementation in such a way that it would create a new CamelImapStore instance.

The CamelImapStore is a type that derives from CamelService and holds the connection lock. It also has the pointers to two CamelStreams that represent the access to the socket file descriptor. That is your connection with the IMAP server. The CamelStream abstracts away SSL and yadi yada but the principle is the same: in Camel (and therefore also in Evolution), the store can only perform one procedure at a time.

In Camel this meant a lot of locking. Regretfully, the IMAP implementation isn’t very fine-grained in its locking (and actually, it sucks a little bit). Nor does the IMAP implementation do pipelining or any other such neat tricks. It’s a simple “lock, send query and fetch result, unlock” concept put in practice. I have broken up some procedures, like getting the summary, into smaller queries: by looping until I have all of the summary. During that loop, the locks get unlocked. A get-message would therefore, in theory, get a chance to occur while the loop is happening in another thread.

That theory actually does work in practice. However, it was a little bit difficult to get it to behave absolutely correctly. On top of that, Camel’s “design” is far from perfect. Therefore, instead of endlessly trying to get it correct, I decided to do a proof of concept that basically creates a new connection each time you download a message and stores the message locally in the cache.

The final idea for all this is to have a flexible queue mechanism that, for E-mail clients that want this functionality, will download (new) messages in the background while the summary is being retrieved in parallel. If the user clicks a message, during or after summary retrieval, the queue gets a high-priority item added, so that the clicked message is downloaded first and displayed in the message view component.
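To make the queue idea a bit more concrete, here is a small sketch in Python using the standard heapq module. It isn’t Tinymail code: DownloadQueue, BACKGROUND and USER_REQUEST are names invented for this example, and a real implementation would also need a worker thread and some way to avoid queuing the same message twice.

```python
import heapq
from itertools import count

BACKGROUND = 10     # messages fetched in the background
USER_REQUEST = 0    # a message the user just clicked on


class DownloadQueue:
    """Priority queue of message uids; a lower number is served first."""

    def __init__(self):
        self._heap = []
        self._order = count()   # tie-breaker keeps FIFO order per priority

    def add(self, uid, priority=BACKGROUND):
        heapq.heappush(self._heap, (priority, next(self._order), uid))

    def pop(self):
        """Return the next uid to download, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = DownloadQueue()
for uid in ("1", "2", "3"):
    queue.add(uid)                    # background downloads
queue.add("42", USER_REQUEST)         # the user clicked message 42
first = queue.pop()                   # the clicked message jumps the queue
```

The counter as second tuple element makes sure that items of equal priority come out in the order they were added.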

I know that this is the core of a lot of E-mail clients. It’s exactly what I want tinymail to provide within the framework, as yet another component.

Next to that I will also implement a folder observer that will act on Push E-mail events by putting the request for getting the new E-mail on the queue. All of this will of course be optional behavior: on a GPRS network you specifically don’t want to retrieve all (new) messages. That would consume shitloads of bandwidth and would cost you a lot of money. But before going offline, you might want to ask your E-mail client to indeed get all the messages and put them in the offline cache? While it’s doing this, you still want to work normally. And why not? Exactly. That’s why the second connection proof of concept was done.

The drugs did it

Gaphor is missing quite a lot of features to be called a usable UML editor … yet. Nevertheless I tried it. If you know how not to hate a work in progress, because you know people with passion are working on it, the tool is definitely worth a try.

You know … I just had such a moment where one little dude in my head nearly starved because of the ultra-high doses of code that I injected through my eyes straight into my brain. My amygdala got emotionally worried and understood the problem immediately, so she (uh .. or he) started instructing my hormone factories to start making drugs, so that I started to want to create a class diagram (you know, instead of coding).

Yes, I realize I’m quite crazy if that happens.

But don’t worry, we are under control. Just a little bit intoxicated. Usually that doesn’t turn the individual into a virus writing terrorist. Although last few years you didn’t have to be guilty for it to happen, just a non-Western person, you don’t have to send a CIA plane to Belgium to pick me up yet. Trust me, I’m not dangerous. And hey, Belgium, the city in Brussels, is a Western country! I’m a Western! Don’t! I mean, com’n, Belgians are adorable. We make beer and chocolate! I can’t be guilty!

I don’t know what the diagram really is about. I just started drawing some stuff because .. well, I already explained. It turned out it looks like how you could design a mail user agent on top of tinymail. Because Gaphor is missing a lot of features, the diagram is missing a lot of information.

Also, I stopped drawing because at this size, Gaphor started becoming slow on my dual p4 with 2 GB of RAM (that’s just amazing, how on earth do you get drawing ~25 rectangles to be slow on THAT machine?! People could go to Mars with far less! — bah, unfair comparison. I know –)

The class diagram in PNG

If you want it in Gaphor’s format, I will most likely create a wiki page on tinymail’s trac once it’s finished.

Merged folders, transport queues, searching and more

At some point it’s time to check whether your design did indeed do what it was supposed to do. Although there are a few discrepancies, it does. It turned out that implementing a TnyFolder was quite simple (I of course implemented the default one for tinymail, but this is the first one that is totally new).

The idea that I had to create was “merged folders”. There are a few reasons why an E-mail client might want to merge folders. Among the reasons are implementing one of IBM’s REMail ideas: not having folders anymore, but instead displaying everything as a flat E-mail account and labeling the messages with tags.

It’s also useful for visualizing search results. Although I haven’t yet focused on searching capabilities for tinymail, I know that I will get it as a feature request very soon. People usually want to search multiple folders.

Searching multiple folders will be implemented by proxying the search method to the merged mother folders of a TnyMergeFolder. The application developer gets the result as a TnyFolder instance.
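The proxying could be sketched like this. Mind that this is Python and not the Tinymail C API: the Folder and MergeFolder classes, add_folder and search are simplified stand-ins that I’m assuming here, and a folder is reduced to a list of subjects.

```python
class Folder:
    """Simplified stand-in for a folder: a name plus message subjects."""

    def __init__(self, name, subjects=()):
        self.name = name
        self.subjects = list(subjects)

    def search(self, text):
        # The result of a search is itself a folder.
        needle = text.lower()
        hits = [s for s in self.subjects if needle in s.lower()]
        return Folder("search: " + text, hits)


class MergeFolder(Folder):
    """Presents several mother folders as one folder."""

    def __init__(self, name):
        super().__init__(name)
        self._mothers = []

    def add_folder(self, folder):
        self._mothers.append(folder)
        self.subjects.extend(folder.subjects)

    def search(self, text):
        # Proxy the search to every mother folder and merge the results.
        hits = []
        for mother in self._mothers:
            hits.extend(mother.search(text).subjects)
        return Folder("search: " + text, hits)


inbox = Folder("inbox", ["Hello World", "Lunch?"])
sent = Folder("sent", ["Re: hello", "Invoice"])
merged = MergeFolder("everything")
merged.add_folder(inbox)
merged.add_folder(sent)
result = merged.search("hello")   # a plain Folder with the merged hits
```

Because the result is just another folder, the user interface can display it with the same view it already uses for normal folders.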

Right now, I created the merge feature for bringing together the sent and outbox folders of multiple TnySendQueues. A send-queue is an asynchronous queue for a transport like an SMTP account. It has its own sent and outbox folders. Mail user agents, however, want to display the two folders of each such queue as only two folders in the user interface. TnyMergeFolder can be used for this.

I know people are going to be confused now. It’s indeed not an account: it’s a queue. Think about it: an account represents something that connects to the SMTP server. A queue is code that embeds such an account, yes, but it’s a queue and not an account. It’s perfectly possible to have transport accounts that don’t require local queuing. It’s also possible to have a queue implementation that chooses between different transport accounts depending on the currently active network that got detected.

Can you always access your SMTP server on each and every network that you connect to with your mobile phone? Maybe the GPRS network provider will advertise these settings? In future maybe it’ll be an ACAP server?

Is it always a setting that is glued to your account? No it isn’t! Nearly all E-mail clients get this wrong, indeed.

Know what, back to the initial subject. Here’s a video demo showing the TnyMergeFolder feature. The code for this is in the tinymail repository already.

Video demo showing the TnyMergeFolder API feature

ps. We’ll have some very cool video demos of Modest on the N800 doing Push E-mail soon.

Proposal

Let’s make a GNOME OCR application on top of Ocropus. One where the user can select regions to scan for text, where those regions translate to XHTML DIV tags that are correctly positioned relative to each other, and where the user can select regions to simply copy as an image. Doesn’t sound terribly hard, or does it?

After that, let’s rethink some of the printer UIs and dialogs and/or integrate them with SANE a little bit: a lot of printers nowadays are so-called ‘multifunctionals’: they combine a flatbed scanner with a printer and have fancy features like: scan to your computer, scan to an MMC card, scan and print (make a copy).

It’s a little bit silly that I right now have to scan to an MMC card, put that MMC in my N800 because Linux doesn’t support the MMC slot of my laptop, wire things up with USB cables, copy a file called SCAN0016.JPG to a folder, open it with GIMP and dissect it into regions and do other conversions to the image that might improve OCR detection, manually create an XHTML document and manually measure the positions on the original, manually put that into relative positions in the XHTML file, etc etc. I mean, these are all tasks that can easily be automated.

And now we finally have a reasonably good OCR library or framework as underlying engine for this.

I’m sure Google would love projects like this for their Summer Of Code, for example. No?

Anyway, the Google Ocropus thingy works on most normal texts. I just printed out a few documents that also had some handwritten names and signatures on them, scanned the prints in and did an OCR scan on the scans. The handwritten parts caused some discrepancies in the detection, but the vast majority of the text got detected right. With maybe a few a-s that turned into o-s (well, that document’s font was quite hard for those two characters indeed). I’m quite sure the library will improve.

The getting started page talks about going into a release directory, right? Well, the page isn’t very clear about it (yet): you need to get both tesseract-ocr and ocropus itself (which is explained in the “Downloads” tab of the site). That release directory is your “ocropus” Subversion checkout, it seems. Well, that worked for me. You’ll also need to install jam, libtiff4-dev, libaspell-dev. All the other stuff was already installed on my typical “gnome-devel”-prepared Edgy.

Guademy slides

Sergio Villar gave me his Guademy presentation about tinymail.

It’s in Spanish. If somebody makes a translation: send it to me (or to Sergio) and I’ll put it online.

Update: English translation by Arien.

Oh, I’m also back from Paris (FOSTEL) yadi yada. It was very interesting and highly technical (which is the way I like conferences). I met a lot of very interesting people too. Guess I will be answering my E-mail of last week tomorrow.

Flattered

Andreas Proschofsky did an interview with Nat Friedman titled “Flamewars are part of the community culture”.

I titled this blog item “flattered” because, well. Read it. You’ll see. (thanks, btw)

Andreas .. hmm, is that the same reporter-dude who was in the same bungalow with me and MDK in Vilanova? *waves at Andreas*

Oh, other than being flattered .. I’m also packing for FOSTEL and a few days of Paris with Tinne after that. Last time I was in Paris I got very sick. Dear French people who are sick and in Paris: please don’t do this to me again.

