SPARQL, Nepomuk, StreamAnalyzer and Tracker

We at the Tracker team should in my opinion report more often in our blogs about our progress on things like Tracker’s SPARQL and Nepomuk support.

Let’s start with the awesome SPARQL support that Jürg has been working on. Just a few minutes ago when you made a SPARQL query that had a unknown predicate, Tracker returned an empty array over D-Bus.

dbus-send --print-reply --dest=org.freedesktop.Tracker
   --type=method_call /org/freedesktop/Tracker/Search
   org.freedesktop.Tracker.Search.SparqlQuery
   string:'SELECT ?title WHERE { ?s nie:ttle
      ?title FILTER regex(?title, ".*in.*") . }'

method return sender=:1.66 -> dest=:1.98 reply_serial=2
   array [
   ]

Leaving you in the unknown about your query being in error. Jürg fixed this and now you get something like this instead.

tracker-sparql --query="SELECT ?title WHERE { ?s nie:ttle
      ?title FILTER regex(?title, \".*in.*\") . }"
Could not query search, Unknown property `http://.../nie#ttle'

This way you can fix your query’s error and do something like this instead:

tracker-sparql --query="SELECT ?title WHERE { ?s nie:title
      ?title FILTER regex(?title, \".*in.*\") . }"

  The final metadata solution
  Tracker in gnome bugzilla

Today I migrated the code in Tracker that implements support for the metadata D-Bus API for E-mail to the Nepomuk Message Ontology. Meaning that Tracker will store the metadata it receives from E-mail clients like KMail and Evolution using the NMO ontology and that it’ll make this metadata available to the SPARQL query engine.

Great news that we got informed of this week is that a developer has started implementing the metadata D-Bus API for E-mail in Thunderbird. He left a pointer to his git repository on the wiki-page.

Meanwhile I have implemented the API in KMail. This patch is pending review. We are planning to add support for this in Modest soon too.

Next. We are migrating the indexers and extractors to Nepomuk. These tasks come with all sorts of extra work related to integrating with Nepomuk as ontology.

I have also implemented integration with Strigi’s truly awesome StreamAnalyzer. I have rarely seen such a beautifully designed piece of code that in my opinion outperforms whatever Tracker has at this moment for extracting metadata in several interesting ways.

I don’t know why we shouldn’t join Strigi on making StreamAnalyzer kick ass. I can find no reason why instead of trying to compete with it we shouldn’t integrate with it. I’m pushing our team to consider the integration option and so far they are enthusiastic about it.

StreamAnalyzer needs a migration from Xesam as ontology to Nepomuk. But Evgeny Egorochkin and Jos Vandenoever already told me that they have put this on their agenda. After that, with the integration that I did for Tracker, can StreamAnalyzer become the core analysis code that Tracker uses. Right now the plan is to let StreamAnalyzer be the first to run and then letting Tracker’s own extractors follow up.

Let’s make some more bridges with KDE projects. Why not!

E-mail metadata, “E-mail as a (desktop) service”

Not on the desktop but on mobiles I think the era of E-mail clients will soon be over. Just like the era of filemanagers will be over. A person who’s using a mobile or a phone doesn’t really want to start and stop applications. Those users don’t start and stop applications to receive phonecalls and text messages. Why would they want to start and stop E-mail clients?

Not only that. People also want their text messages, E-mails, history of calls, meetings, contacts and photos to be integrated.

When we search for a meeting, we want to find the photos we took at that meeting. We also want to find the contacts that were at that meeting. We want to find the invitation E-mail and all the replies to the invitation. And when we select a contact, we want to see a tree of E-mail discussions that we once had with the contact.

On a mobile we don’t want one big application that does all this. Instead we want all applications to integrate with all this information. And we want it to be very easy for application developers to integrate with this system. An application framework.

This will be the purpose of Tracker on mobiles.

This means that the concept of an E-mail client will eventually be moved to the background of the mobile device. E-mail must just be there, not started. Something must communicate with the IMAP server whenever needed. Meanwhile all applications on the mobile need to have easy access to the metadata of the E-mails. It must be easy for them to get a MIME part of an E-mail, perhaps as a InputStream?

The combination of E-mail metadata querying and handling of the E-mail’s MIME parts is what I refer to as “E-mail as a service”. Some people in the past tried to explain me that if they would just put JavaMail’s API over D-Bus that this would already solve “E-mail as a service”. I don’t think this is true. Camel, which had its API based on JavaMail, offers a truly weak query interface. It’s for example not possible to ask for MIME parts that are images that have a specific Exif tag set.

That’s of course because it’s not Camel’s purpose to do metadata indexing of the attachments. But that was yesterday. Yesterday was boring.

With modern IMAP servers that have the CONVERT capability it will be possible to ask for a converted MIME part of an E-mail. Converted to Exif plain-text data. Meaning that we don’t have to fetch 5MB of JPEG data just to read a few hundred bytes of Exif metadata.

Meanwhile normal IMAP servers already offer ENVELOPE and BODYSTRUCTURE which of course gives us a lot of metadata too.

To assist people who want to write a “E-mail as a service” D-Bus service today I have decided to write a document that explains some of the capabilities of modern IMAP.

I think the future of E-mail “infrastructure” lies in:

  1. Using an RDF store that can be accessed using SPARQL, like Tracker. This stores the ENVELOPEs and BODYSTRUCTUREs of your E-mails next to the attachment’s other metadata triples. The query language can then be used to query against metadata found by analyzers like Tracker’s own extractors and/or Strigi’s StreamAnalyzer as well as metadata coming from IMAP itself.

    People who saw my presentation at FOSDEM already know that we are planning to push Tracker in the direction of SPARQL + Nepomuk as ontology. Meanwhile we are in discussion with the Xesam and Nepomuk people to change the Nepomuk Message Ontology to be suitable for this. As a result Evgeny Egorochkin made this proposal.

  2. Having a small service for dealing with E-mail specific things. Like getting the contents of the MIME parts as streams. Requesting them to be downloaded if they aren’t cached locally yet. Requesting a CONVERTed version of them.

    There are some experiments happening that will implement this capability. It’s all still very early. If I ever start a Tinymail 2.0, I will probably make it focus primarily on this.

I don’t think the conventional E-mail client will survive for very long. Especially not on mobiles where “integration” is far more important for the end-user.

Every (mobile) application can soon become as capable of handling E-mail as what we today call “the E-mail client”. At least from the point of view of the user. In reality a desktop service will solve the hard stuff.

A generic E-mail metadata D-Bus API

A generic E-mail metadata API

You guys remember the Evolution Metadata API? No? I wrote about it a few days ago.

We now want Tracker to integrate with the KDE desktop too. I started implementing the D-Bus API in KMail. I have adapted the specification to become more generic and to split out specific pieces into a shared and a specific part. Like for example the ontology.

QT/C++

Yes, I’m doing some QT/C++ coding after three years of not having to touch a lot of C++. But I’ll manage. I mean, it can’t get worse than Python, PHP or Perl, right? I survived developing with those languages too and it’s not my first time that I had to touch C++ code. Also, I survive GObject without always having Vala available.

Well, QT/C++ is interesting. Firstly I refuse to call it normal C++, it’s not normal C++ at all. That’s not necessarily a bad thing about it, though. Secondly on top of the inhuman quagmire that C++ has become by itself, QT adds almost as much boilerplate and macros as GObject is doing on top of C

That’s not good.

In my opinion C++ is not a nice language at all. It’s a completely inconsistent quagmire in my opinion.

Freedesktop.Org

I’m not immediately proposing it as a Freedesktop.Org standard. My personal experience with F.D.O. is that unless you have a few implementations nothing much will happen with your proposal. At least not for several months. Maybe if I first finish the implementation for KMail while having the one for Evolution ready it’ll speeds things up?

Let’s first implement it for both E-mail clients.

Akonadi

This is a message for specifically Aaron Seigo, because I know he would otherwise be frantic about it.

I have been talking with the KMail developers and because KMail’s Akonadi integration will take too long to be finished we have agreed to start with this D-Bus API. We will most likely integrate Tracker with Akonadi as soon as it starts becoming consumable on people’s KDE desktops.

Birds will sing, the sun will shine. Don’t worry too much and especially don’t start punching the project teams who are actively but pragmatically trying to make things better.

We’ll get there but converting Evolution to use Akonadi is trying to take too large steps and is therefor not a useful proposal at this moment. While for Tracker, Akonadi needs to be ready *now*.

I propose that the Akonadi people develop a CamelProvider that talks with Akonadi, by the way. Then make an EPlugin that overrides the account management to always create such Akonadi accounts and develop working and well tested migration code for the data.

I think even if this is ever started as a project that it would be quite unlikely that this will be finished in time. So not very useful for Tracker right now.

Utilitarianism

Introduction

In a discussion some concluded that technology X is ‘more tied to GNOME’ than technology Y because ‘more [GNOME] people are helped by X’ due to dependencies for Y. Dependencies that might be unacceptable for some people.

This smells like utilitarianism and therefore it’s subject to criticism.

Utilitarianism is probably best described by Jeremy Bentham as:

Ethics at large may be defined, the art of directing men’s actions to the production of the greatest possible quantity of happiness.

— Bentham, Introduction to the Principles of Morals and Legislation

A situational example that, in my opinion, falsifies this:

You are standing near the handle of a railroad switch. Six people are attached to the rails. Five of them at one side of the switch, one at the other side of the switch. Currently the handle is set in such a way that five people will be killed. A train is coming. There’s no time to get help.

  • Is it immoral to use the handle and kill one person but save five others?
  • Is it immoral not to use the handle and let five people get killed?

The utilitarianist chooses the first option, right? He must direct his actions to the production of the greatest possible quantity of happiness.

Body of the discussion

Now imagine that you have to throw a person on the rails to save the lives of five others. The person would instantly get killed but the five others would be saved by you sacrificing one other.

A true utilitarianist would pick the first option in both exercises; he would use the handle and he would throw a person on the rails. In both cases he believes his total value of produced happiness is (+3) and he believes that in both situations picking the second option means his total value of produced happiness is (-4) + (+1) = (-3). The person who picks the second option is therefore considered ethically immoral by a true utilitarianist.

For most people that’s not what they meant the first time. Apparently ethics don’t allow you to always say (+4) + (-1) = (+3) about happiness. I’ll explain.

The essence of the discussion

Psychologically, less people will believe that throwing a person on the rails is morally the right thing to do. When we can impersonificate we make it more easy for our brains to handle such a decision. Ethically and morally the situation is the same. People feel filthy when they need to physically touch a person in a way that’ll get him killed. A handle makes it more easy to kill him.

Let’s get back to the Gnome technology discussion … If you consider pure utilitarianism as most ethical, then you should immediately stop developing for GNOME and start working at Microsoft: writing good Windows software at Microsoft would produce a greater possible quantity of happiness.

Please also consider reading criticism and defence of utilitarianism at wikipedia. Wikipedia is not necessarily a good source, but do click on some links on the page and you’ll find some reliable information.

Some scientists claim that we have a moral instinct, which is apparently programmed by our genes into our brains. I too believe that genetics probably explain why we have a moral system.

The developer of X built his case as following: My technology only promotes happiness. The technology doesn’t promote unhappiness.

It was a good attempt but there are multiple fallacies in his defense.

Firstly, in a similar way doesn’t technology Y promote unhappiness either. If this is assumed about X, neither promote unhappiness.

Secondly, how does the developer of X know that his technology promotes no unhappiness at all? Y also promotes some unhappiness and I don’t have to claim that it doesn’t. That’s a silly assumption.

Thirdly, let’s learn by example: downplaying the amount of unhappiness happens to be the exact same thing regimes having control over their media also did whenever they executed military action. The act of downplaying the amount of unhappiness should create a reason for the spectator to question it.

Finally, my opinion is that the very act of claiming that ‘X is more tied to GNOME’, will create unhappiness among the supporters of Y. Making the railroad example applicable anyway.

My conclusion and the reason for writing this

‘More’ and ‘less’ happiness doesn’t mean a lot if both are incommensurable. Valuations like “more tied to GNOME” and “less tied to GNOME” aren’t meaningful to me. That’s because I’m not a utilitarianist. I even believe that pure utilitarianism is dangerous for our species.

To conclude I think we should prevent that the GNOME philosophy is damaged by too much utilitarianism.

Moral indulgence

In the last few days people seemingly implying a descent from superiority of moral highground to me, have called upon me (in private conversations) to decide for my readers if the content that I write is morally acceptable for planet.gnome.org. Their reasoning is that I should feel an implied responsibility for the content of that website.

If I don’t take the responsibility that readers have themselves already, I’m to be considered a coward. That’s because, according to these people, I avoid the moral responsibility to uphold an imaginary highground reputation of the organization behind said website.

It needs no illustration that this is just the opinion of a group within the GNOME community. Not the entire community. Nonetheless this seemingly moral superiority is not to be mistaken with a condescending circus show.

The moral of respect for other opinions is a meme that for the last decades (and I hope in future too) has been a very successful one. I consider this meme to be the most important one humanity ever got convinced of.

Moral superiors do not need to present empirical proof of correctness in their Sophia. The truth of their moral values are unquestionable.

Let’s assume this to be the case: it’s immoral to only assume that your readers will make up their own minds about ideas that appear on websites like planet.gnome.org. Instead, it’s a necessity that each and every author of a blog, from which planet.gnome.org pulls content, is required to have a “responsibility of content”.

I conclude that it isn’t necessary that the audience of that website gets an honest illustration of who we are: human beings who are sometimes geniuses and sometimes idiots.

Instead it’s necessary that we are portrayed as good role models. Concepts such as good and bad are of course defined by the superiors. Those concepts are unquestionable.

Let me be clear that I disagree with this.

I questioned whether only intent can either be good or bad, but that question was refuted as irrelevant. For it’s the beholder who matters. Not the producer.

The reason for this irrelevance being that an audience doesn’t take the responsibility of trying to understand intent. I disagree with this conclusion. I think the audience does understand intent.

I have decided to tag my future posts as “condescending” in case I feel the content might be interpreted as showing superiority. Don’t be surprised if the majority of posts will be tagged as such.

The freedom to choose is morally more important to me than the necessity to mark responsible content. Therefore I ask my audience, and planet maintainers, to decide for themselves.