TreeModel ZERO, a taste of life as it should be

If bugmasters are allowed to blog wishlists, then developers should also be allowed to write them! Which is why I wrote my wishlist!

Gtk.TreeModel was, in my humble opinion, designed wrong. In API design, an interface should be just one thing.

A little bit of history

Many framework designers have repeated this in the past. Two of the best framework designers that we have on this planet, Krzysztof Cwalina and Brad Abrams from Microsoft, added the meme to one of their books. It would be unfair to only mention those two guys and not the other people at Microsoft, and before that at the Delphi team at Borland. Brian Pepin notes on page 83 of Framework Design Guidelines: “Another sign that you’ve got a well defined interface is that the interface does exactly one thing. If you have an interface that has a grab bag of functionality, that’s a warning sign.”

The problem

What are the things a Gtk.TreeModel is or represents?

  • It’s something that is iterable
  • It’s something that is an iterator
  • It’s apparently something that has columns, which should have been at the View’s side of the story
  • It’s something that can be a tree
  • It’s something that emits row changes

That’s not one thing, and therefore we have a warning sign. If I count it correctly that’s at least five things, so that’s a big warning sign.

I’m sure I can come up with a few other things that a Gtk.TreeModel actually represents. For example its unref_node and ref_node make me think that it’s a garbage collector or something, too.

This is absolutely not good. I believe it is what makes the interface shockingly complicated, because none of those five things can be made reusable this way.

What I think would be the right way

A prerequisite for this, and presumably also the reason why Gtk+ developers decided to do Gtk.TreeModel the way they did it a few years ago, is a collection framework.

Sadly, this proposal is being more or less ignored by the current GLib maintainers. That is understandable because everybody is overloaded and busy, but in my opinion it’s nonetheless blocking us from heading in the right direction.

There are by the way quite a lot of other reasons mentioned on the proposal. This is just one of them.

interface GLib.Iterable {
	Iterator iterator();
}

interface GLib.Iterator {
	bool next ();
	object current;
}

Next would be recursive iterators or trees. There are many ways to represent these, but I’ll just take a simple route. Remember that when picking an API design, the simplest idea is often the right one. But yeah, you can probably improve this.

interface GLib.TreeIterable : GLib.Iterable {
	GLib.TreeIterable get_children ();
	int n_children;
	bool has_child (GLib.TreeIterable e);
	GLib.TreeIterable parent;
}

In Gtk+ we would have the view, of course. It would hold the columns, as it should be.

class Gtk.TreeView {
	int n_columns;
	GLib.Type get_column_type (int n);
	GLib.TreeModel model;
	Gtk.TreeView (GLib.TreeModel m);
	GLib.ColumnBinding column_binding;
}

We don’t have guaranteed introspection in Gtk+. To do the binding between a column in the view and a property of an instance in the model we need some code. In Gtk.TreeModel this is the get_value function.

It shouldn’t be part of the Gtk.TreeModel: that way it ain’t reusable, and it will require each person implementing a Gtk.TreeModel to reinvent the code.

abstract class GLib.ColumnBinding {
	abstract GLib.Value get_value (GLib.TreeModel model,
	                               GLib.TreeIterable e,
	                               int column);
}

Let’s have some concrete column bindings:

class Gtk.TreeStoreColumnBinding : GLib.ColumnBinding {
}

class Gtk.ListStoreColumnBinding : Gtk.TreeStoreColumnBinding {
}

If we do have introspection we can do the same thing .NET offers: Link up the column number with a property name that can be found in the type of the instances that the model holds.

class GI.IntrospectColumnBinding : GLib.ColumnBinding {
	void add_column (int column, string prop_name);
}
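
As a rough illustration of the idea, here is a minimal Python sketch in which getattr plays the role of introspection. The names ColumnBinding, IntrospectColumnBinding, add_column and get_value mirror the proposal above, and the Contact class is made up for the example; none of this is an existing GLib or Gtk+ API.

```python
from abc import ABC, abstractmethod


class ColumnBinding(ABC):
    """Maps a (node, column) pair to a value. Hypothetical, mirrors the proposal."""

    @abstractmethod
    def get_value(self, model, node, column):
        ...


class IntrospectColumnBinding(ColumnBinding):
    """Binds column numbers to property names looked up on the node itself."""

    def __init__(self):
        self._props = {}

    def add_column(self, column, prop_name):
        self._props[column] = prop_name

    def get_value(self, model, node, column):
        # getattr stands in for real introspection in this sketch.
        return getattr(node, self._props[column])


class Contact:
    """An example item type that a model could hold (assumption)."""

    def __init__(self, name, email):
        self.name = name
        self.email = email


binding = IntrospectColumnBinding()
binding.add_column(0, "name")
binding.add_column(1, "email")

fred = Contact("Fred", "fred@example.org")
row = [binding.get_value(None, fred, column) for column in (0, 1)]
print(row)
```

The point of the design is that the model never needs to know about columns: the same Contact objects could be shown with a different binding in another view.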

These wouldn’t change at all, except that they implement GLib.TreeModel instead of Gtk.TreeModel:

class Gtk.TreeStore : GLib.TreeModel {
}

class Gtk.ListStore : GLib.TreeModel {
}

And then we are at Gtk.TreeModel, of course. Well, just take everything that we don’t do yet. That’s the row change emissions, right? Personally I think rows are too specific. A model is something that can be iterated. Being iterable doesn’t mean that you have rows, it just means that you have things that the consumer, the view in a model’s case, can iterate to. Let’s call them nodes.

Gtk.TreePath sounds to me like serializing and deserializing a location. It’s nothing special, just a way to formulate pointing to a node in the tree. It’s the model that exposes this capability.

I’m not sure about flags. Maybe they should just be moved to Gtk.TreeView. I don’t get the point of the flags anyway. Both ITERS_PERSIST and LIST_ONLY sound like implementation details to me: not something you want to expose in the API anyway. But fine, for the sake of completeness I’ll put them here.

interface GLib.TreeModel : GLib.TreeIterable {
	signal node_changed (GLib.TreeIterable e);
	signal node_inserted (GLib.TreeIterable e);
	signal node_deleted  (GLib.TreeIterable e);
	signal node_reordered (GLib.TreeIterable e);
	GLib.TreeModelFlags flags;
	GLib.TreePath get_path (GLib.TreeIterable e);
	GLib.TreeIterable get_node (GLib.TreePath p);
}
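
To make the “a path is just a serialized location” idea concrete, here is a small Python sketch. The Node type and the two functions are hypothetical stand-ins for GLib.TreeIterable, get_path and get_node from the proposal, and the colon-separated string is only modelled after the kind of index lists Gtk.TreePath uses.

```python
class Node:
    """A tree node that knows its parent and children (hypothetical type)."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def get_path(node):
    """Serialize a node's location as child indices counted from the root."""
    indices = []
    while node.parent is not None:
        indices.append(node.parent.children.index(node))
        node = node.parent
    return ":".join(str(i) for i in reversed(indices))


def get_node(root, path):
    """Deserialize a path produced by get_path back into a node."""
    node = root
    # An empty path has no indices and therefore points at the root itself.
    for index in filter(None, path.split(":")):
        node = node.children[int(index)]
    return node


root = Node("root")
a = Node("a", root)
b = Node("b", root)
b1 = Node("b1", b)

path = get_path(b1)
print(path, get_node(root, path).value)
```

Nothing here depends on rows or columns, which is the reason for keeping this capability on the model and away from the view.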

Who’ll start GLib 4.0? Let’s do this stuff while the desktop guys play with GNOME 3.0? Why not?

The impact of a highly improbable event

Being in free time mode today I decided to continue reading Nassim Nicholas Taleb’s Black Swan book. Nassim Taleb is about as arrogant as I am, so I’m enjoying reading his book a lot.

I for example enjoyed reading how he’s pissed at today’s academic philosophers for having become exercisers in linguistics rather than getting to the point of thinking. Nassim Taleb himself needed about 350 pages of text to basically say that the Gauss curve is useless in extremistan, but usually useful in mediocristan, and that Mandelbrot’s fractals are a little bit more useful for extremistan. But not really.

Anyway, he also wrote things that we should consider thinking about. For example: We no longer believe in papal infallibility; we seem to believe in the infallibility of the Nobel prize winners. That’s a good point.

One more chapter and I’m relieved of this book. Apparently I enjoy the distress of reading Nassim Taleb’s books.

He’s going to tell me in this chapter how to deal with these highly improbable events that have a great impact, referred to as black swans. I have to congratulate Nassim Taleb for succeeding in making me a hyperskeptic, which was a highly improbable event. But then again, being a software developer I’m into bottom-up acquisition of knowledge, which means that for me it’s easier to be Fat Tony than to be Dr. John. You need to read the book’s chapter 17 to understand Fat Tony. I was quite a skeptic before I started reading the book, but not (always) about the kind of things Nassim Taleb asks us to be skeptical about.

Found this while surfing the internets

The Theory of Interstellar Trade. A paper by Paul Krugman, July 1978.

It should be noted that, while the subject of this paper is silly, the analysis actually does make sense. This paper, then, is a serious analysis of a ridiculous subject, which is of course the opposite of what is usual in economics.

Pat Condell on ultra tolerant liberal left people

Not watching youtubers very often I almost forgot about Pat Condell’s video blog. Today I decided to take a look at his latest video material.

Pat Condell is, just like me, an outspoken atheist who enjoys exercising his freedom of speech to criticize various religions. Fairly often he criticizes Islam.

Before I continue I’ll remind people that, like Pat Condell, I have nothing in particular against Islam. I don’t have anything against peaceful people in general. Christian, Muslim, atheist, Buddhist or whatever: I don’t care that much. I don’t believe any of those fairy tales, but it’s your freedom to do so! I do care about it when, in for example Western countries, countless Christians try to expunge you from society because “you don’t believe in anything”. For many of them not believing is worse than believing in the wrong God, or being a Satanist, or being a sadist. I want to criticize religions and I want to stress the importance of having the right to criticize religions.

Pat takes on the ultra tolerant liberal left people in this video. Just like Pat I used to be on the liberal left. And just like Pat, because I believe in things like social justice, tolerance and respect, I am no longer on the liberal left. Here’s a quote from the video:

You people have certainly reminded me, as if I needed reminding, why my political views have changed in recent years. You see.. foolishly, perhaps, I used to take freedom for granted.

But now thanks to ultra tolerant self-hating multicultural lemmings like you, I don’t.

Politically I used to always be on the liberal left. Because I believe in things like social justice, tolerance and respect. You know, the good things in life. I still believe in those things, which is why I’m no longer on the liberal left.

Apologists for evil

In this video Pat talks about banning the burka. Given that wearing a burka in Western countries is most definitely only done to make a pathetic political statement, I think it is indeed a good idea to ban burkas. Besides, you’re not allowed to wear ski masks when you enter a bank either. You’re not allowed to walk naked in the streets. Yet countless people are trying to claim that these women should have a right to wear burkas. Framing it that way is of course utter bullshit: the debate isn’t about women’s rights at all. Claiming that it is, is being intellectually dishonest. The debate is about the right of an Islamist husband to claim ownership over a woman or a girl. This isn’t a right in Western countries. The fact that it isn’t, is a good thing.

Pat also points out that Western feminists are rather silent about women’s rights in Islam. Usually feminists are assertive and confident but this time, apparently, feminists are muted on the issue. Why is that? Where are they?

Ban the burka

For the person who recently debated religion with me (you know who you are): I recently read “Letter to a Christian Nation” by Sam Harris. Very interesting read. I recommend it!

SPARQL’s str() function in Tracker

Today I implemented the str() function for our SPARQL engine.

This makes it possible to use a <subject> just like a string.

Let’s first insert some data into our SPARQL store.

tracker-sparql -u -q \
   "INSERT { <urn:baaa> a rdfs:Resource }"

The following query doesn’t work, because the variable ?s is bound to an rdfs:Resource here, not to an xsd:string.

tracker-sparql -q \
"SELECT ?s WHERE {
	?s a rdfs:Resource .
	FILTER REGEX (?s, '.*baaa', 's')
}"

This version works, because we introduce the str() function.

tracker-sparql -q \
"SELECT ?s WHERE {
	?s a rdfs:Resource .
	FILTER REGEX (str(?s), '.*baaa', 's')
}"
  urn:uuid:94baaa45-99a6-e0f4-0bd9-f83ca90a9039
  urn:uuid:6e909006-a6ac-baaa-2ae4-cc01adcd5de7
  urn:baaa

You can also use a direct match, of course.

tracker-sparql -q \
"SELECT ?s WHERE {
	?s a rdfs:Resource .
	FILTER (str(?s) = 'urn:baaa')
}"
  urn:baaa

By the way, Ivan made a cute tool in Python for typing in your queries.

It even does some code completion. If you type nco:[TAB] it’ll show you the NCO ontology. Nice!

Async with the mainloop

A technique that we started using in Tracker is utilizing the mainloop to implement asynchronous functions. We decided that avoiding threads is often not a bad idea.

Instead of instantly falling back to throwing work to a worker thread we try to encapsulate the work into a GSource’s callback, then we let the callback happen until all of the work is done.

An example

You probably know sqlite3’s backup API? If not, it’s fairly simple: you do sqlite3_backup_init, followed by a bunch of sqlite3_backup_step calls, finalizing with sqlite3_backup_finish. How does that work if we don’t want to block the mainloop?

I removed all error handling to keep the code snippet short. If you want that you can take a look at the original code.

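static gboolean
backup_file_step (gpointer user_data)
{
  BackupInfo *info = user_data;
  int i;

  /* Copy at most 100 batches of 5 pages per idle callback */
  for (i = 0; i < 100; i++) {
    info->result = sqlite3_backup_step (info->backup, 5);
    if (info->result != SQLITE_OK)
        return FALSE; /* Done (SQLITE_DONE) or failed: remove the source */
  }
  return TRUE; /* More pages left: get called again by the mainloop */
}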

static void
backup_file_finished (gpointer user_data)
{
  BackupInfo *info = user_data;
  GError *error = NULL;
  if (info->result != SQLITE_DONE) {
    g_set_error (&error, _DB_BACKUP_ERROR,
                 DB_BACKUP_ERROR_UNKNOWN,
                 "%s", sqlite3_errmsg (
                    info->backup_db));
  }
  if (info->finished)
    info->finished (error, info->user_data);
  if (info->destroy)
    info->destroy (info->user_data);
  g_clear_error (&error);
  sqlite3_backup_finish (info->backup);
  sqlite3_close (info->db);
  sqlite3_close (info->backup_db);
  g_free (info);
}

void
my_function_make_backup (const gchar *dbf, OnBackupFinished finished,
                         gpointer user_data, GDestroyNotify destroy)
{
  BackupInfo *info = g_new0(BackupInfo, 1);
  info->user_data = user_data;
  info->destroy = destroy;
  info->finished = finished;
  sqlite3_open_v2 (dbf, &info->db, SQLITE_OPEN_READONLY, NULL);
  sqlite3_open ("/tmp/backup.db", &info->backup_db);
  info->backup = sqlite3_backup_init (info->backup_db, "main",
                                      info->db, "main");
  g_idle_add_full (G_PRIORITY_DEFAULT, backup_file_step,
                   info, backup_file_finished);
}

Note that I’m not suggesting that you throw away all your threads and GThreadPool uses now.
Note also that, just like with threads, you have to be careful about shared data: this way you allow other events on the mainloop to interleave with your backup procedure. This is async(ish); it’s precisely what you want, of course.
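
For readers who don’t have GLib at hand, the pattern itself can be sketched with a toy loop in Python. MiniLoop, idle_add and copy_step are invented for this illustration; they only imitate the “return TRUE to be called again, FALSE to stop” convention of a GSource callback and are not GLib’s API.

```python
from collections import deque


class MiniLoop:
    """A toy main loop: runs queued callbacks round-robin until none remain."""

    def __init__(self):
        self._sources = deque()

    def idle_add(self, callback, data, on_destroy=None):
        self._sources.append((callback, data, on_destroy))

    def run(self):
        while self._sources:
            callback, data, on_destroy = self._sources.popleft()
            if callback(data):
                # The callback wants to continue: requeue it behind the others.
                self._sources.append((callback, data, on_destroy))
            elif on_destroy is not None:
                on_destroy(data)


def copy_step(job):
    """Copy a small batch of items, then give control back to the loop."""
    for _ in range(2):
        if not job["todo"]:
            return False  # nothing left: remove this source
        job["done"].append(job["todo"].pop(0))
    return bool(job["todo"])


log = []
loop = MiniLoop()
job = {"todo": [1, 2, 3, 4, 5], "done": []}
loop.idle_add(copy_step, job, on_destroy=lambda j: log.append("finished"))
loop.idle_add(lambda _: log.append("other event") or False, None)
loop.run()
print(job["done"], log)
```

The other event gets handled between two batches of the copy job, which is exactly the interleaving to watch out for when the job and the other callbacks share data.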

More introduction to RDF and SPARQL

Introduction

I plan to give an introduction to features like COUNT, FILTER REGEX and GROUP BY which are supported by Tracker’s SPARQL engine. We support more such features but I have to start the introduction somewhere. And overloading people with introductions to all features won’t help me much with explaining things.

Since my last introduction to RDF and SPARQL I have added a few relationships and actors to the game.

We have Morrel, Max and Sasha being dogs, Sheeba and Query are cats, Picca is still a parrot, Fred and John are contacts. Fred claims that John is his friend. I changed the ontology to allow friendships between the animals too: Sasha claims that Morrel and Max are her friends. Sheeba claims Query is her friend. John bought Query. Fred being inspired by John decided to also get some pets: Morrel, Sasha and Sheeba.

Ontology

Let’s put this story in Turtle:

<test:Picca> a test:Parrot, test:Pet ;
	test:name "Picca" .

<test:Max> a test:Dog, test:Pet ;
	test:name "Max" .

<test:Morrel> a test:Dog, test:Pet ;
	test:name "Morrel" ;
	test:hasFriend <test:Max> .

<test:Sasha> a test:Dog, test:Pet ;
	test:name "Sasha" ;
	test:hasFriend <test:Morrel> ;
	test:hasFriend <test:Max> .

<test:Sheeba> a test:Cat, test:Pet ;
	test:name "Sheeba" ;
	test:hasFriend <test:Query> .

<test:Query> a test:Cat, test:Pet ;
	test:name "Query" .

<test:John> a test:Contact ;
	test:owns <test:Max> ;
	test:owns <test:Picca> ;
	test:owns <test:Query> ;
	test:name "John" .

<test:Fred> a test:Contact ;
	test:hasFriend <test:John> ;
	test:name "Fred" ;
	test:owns <test:Morrel> ;
	test:owns <test:Sasha> ;
	test:owns <test:Sheeba> .

Querytime!

Let’s first start with all friend relationships:

SELECT ?subject ?friend
WHERE { ?subject test:hasFriend ?friend }

  test:Morrel, test:Max
  test:Sasha, test:Morrel
  test:Sasha, test:Max
  test:Sheeba, test:Query
  test:Fred, test:John

Just counting these is pretty simple. In SPARQL all selectable fields must have a variable name, so we add the “AS c” here.

SELECT COUNT (?friend) AS c
WHERE { ?subject test:hasFriend ?friend }

  5

We counted friend relationships, of course. Let’s say we want to count how many friends each subject has. This is a more interesting query than the previous one.

SELECT ?subject COUNT (?friend) AS c
WHERE { ?subject test:hasFriend ?friend }
GROUP BY ?subject

  test:Fred, 1
  test:Morrel, 1
  test:Sasha, 2
  test:Sheeba, 1
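
If GROUP BY is new to you, the following Python sketch shows what the query computes, using the friendship triples from the Turtle data above. It only imitates the grouping and counting with a Counter; it says nothing about how Tracker’s SPARQL engine implements it.

```python
from collections import Counter

# (subject, predicate, object) triples taken from the Turtle data above.
triples = [
    ("test:Morrel", "test:hasFriend", "test:Max"),
    ("test:Sasha", "test:hasFriend", "test:Morrel"),
    ("test:Sasha", "test:hasFriend", "test:Max"),
    ("test:Sheeba", "test:hasFriend", "test:Query"),
    ("test:Fred", "test:hasFriend", "test:John"),
]

# WHERE { ?subject test:hasFriend ?friend } GROUP BY ?subject, COUNT (?friend)
friend_counts = Counter(s for s, p, o in triples if p == "test:hasFriend")

for subject, count in sorted(friend_counts.items()):
    print(subject, count)
```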

Actually, we’re only interested in the human friends:

SELECT ?subject COUNT (?friend) AS c
WHERE { ?subject test:hasFriend ?friend .
        ?friend a test:Contact
} GROUP BY ?subject

  test:Fred, 1

No no, we are only interested in friends that are either cats or dogs:

SELECT ?subject COUNT (?friend) AS c
WHERE { ?subject test:hasFriend ?friend .
       ?friend a ?type .
       FILTER ( ?type = test:Dog || ?type = test:Cat)
} GROUP BY ?subject

  test:Morrel, 1
  test:Sasha, 2
  test:Sheeba, 1

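Now we are only interested in friends that are either a cat or a dog, and only for subjects whose own name starts with an ‘S’. Note that ?n is bound to the name of ?subject here, not to the name of the friend.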

SELECT ?subject COUNT (?friend) AS c
WHERE { ?subject test:hasFriend ?friend ;
                 test:name ?n .
       ?friend a ?type .
       FILTER ( ?type = test:Dog || ?type = test:Cat) .
       FILTER REGEX (?n, '^S', 'i')
} GROUP BY ?subject

  test:Sasha, 2
  test:Sheeba, 1

Conclusions

Should we stop talking about ontologies and start talking about searchboxes and user interfaces instead? Although I certainly agree more UI-stuff is needed, I’m not sure yet. RDF and SPARQL are also about relationships and roles. Not just about matching stuff. Whenever we explain the new Tracker to people, most are stuck with ‘matching’ in their mind. They don’t think about a lot of other use-cases.

Such a search is just one use-case starting point: the user enters a random search string and gives zero other meaning about what he needs. Many more situations can be starting points: when I select a contact in a user interface designed to show an archive of messages that he once sent to me, the search becomes much more narrow, much more helpful.

As soon as you have RDF and SPARQL, and with Tracker you do, an application developer can start taking into account relationships between resources: The relationship between a contact in Instant Messaging and the attachments in an E-mail that he as a person has sent to you. Why not combine it with friendship relationships synced from online services?

With a populated store you can make the relationship between a friend who joined you on a trip, and photos of a friend of your friend who suggested the holiday location.

With GeoClue integration we could link his photos up with actual location markers. You’d find these photos that came from the friend of your friend, and we could immediately feed the location markers to the GPS software on your phone.

I really hope application developers have more imagination than just global searchboxes.

And this is just a use-case that is technically already possible with today’s high-end phones.

Introduction to RDF and SPARQL

Let’s start with a relatively simple graph. The graph shows the relationships between John, Fred, Max and Picca. John and Fred are humans who we’ll refer to as contacts. Max and Picca are pets. Max is a dog and Picca is a parrot. Both Picca and Max are owned by John. Fred claims that John is his friend.

If we want to represent this story semantically, we first need to make a dictionary that describes pets, contacts, dogs and parrots. The dictionary would also describe possible relationships like ownership of a pet and the friendship between two contacts. Don’t forget, making something semantic means that you want to give meaning to the things that interest you.

Giving meaning is exactly what we’ll start with. We will write the schema for making this story possible. We will call this an ontology.

We describe our ontology using the Turtle format. In Turtle you can have prefixes. The prefix test:, for example, is the same as writing <http://www.tracker-project.org/ontologies/test#>.

In Turtle you describe statements by giving a subject, a predicate and then an object. The subject is what you are talking about. The predicate is what you are saying about the subject. And finally the object is the value. This value can be a resource or a literal.

When you write a . (a dot) in Turtle it means that you end describing the subject. When you write a ; (semicolon) it means that you continue with the same subject, but will start describing a new predicate. When you write a , (comma) it means that you even continue with the same predicate. The same rules apply in the WHERE section of a SPARQL query. But first things first: the ontology.

Note that the “test” ontology is not officially registered at tracker-project.org. It serves merely as an example.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tracker: <http://www.tracker-project.org/ontologies/tracker#> .
@prefix test: <http://www.tracker-project.org/ontologies/test#> .

test: a tracker:Namespace ;
	tracker:prefix "test" .

test:Entity a rdfs:Class .

test:Contact a rdfs:Class ;
	rdfs:subClassOf test:Entity .

test:Pet a rdfs:Class ;
	rdfs:subClassOf test:Entity .

test:Dog a rdfs:Class ;
	rdfs:subClassOf test:Entity .

test:Parrot a rdfs:Class ;
	rdfs:subClassOf test:Entity .

test:name a rdf:Property ;
	rdfs:domain test:Entity ;
	rdfs:range xsd:string .

test:owns a rdf:Property ;
	rdfs:domain test:Contact ;
	rdfs:range test:Pet .

test:hasFriend a rdf:Property ;
	rdfs:domain test:Contact ;
	rdfs:range test:Contact .

Now that we have meaning, we will introduce the actors: Picca, Max, John and Fred. Put the ontology file from above in the share/tracker/ontologies directory and run tracker-processes -r before restarting tracker-store from master. Then copy the @prefix lines of the ontology file to the top of the data below, store the result as /tmp/import.ttl and run tracker-import /tmp/import.ttl; it should import just fine. After that you are ready to execute the queries below with the tracker-sparql -q '$query' command.

Note that tracker-processes -r destroys all your RDF data in Tracker. We don’t yet support adding custom ontologies at runtime, so for doing this test you have to start everything from scratch.

<test:Picca> a test:Parrot, test:Pet ;
	test:name "Picca" .

<test:Max> a test:Dog, test:Pet ;
	test:name "Max" .

<test:John> a test:Contact ;
	test:owns <test:Max> ;
	test:owns <test:Picca> ;
	test:name "John" .

<test:Fred> a test:Contact ;
	test:hasFriend <test:John> ;
	test:name "Fred" .

Let’s do some simple SPARQL queries. You can execute these queries this way:

tracker-sparql -q "SELECT ?subject WHERE { ?subject a test:Parrot }"

In this query we ask for the subject of each entity that is a parrot. The query will yield test:Picca because Picca is the only parrot in our situation.

  test:Picca

Usually we aren’t interested in the subject, but in a real property of the parrot. We can ask for such a property this way:

SELECT ?subject ?name WHERE { ?subject a test:Parrot ; test:name ?name}
  test:Picca, Picca

Another simple example, give me all the contacts:

SELECT ?subject WHERE { ?subject a test:Contact }
  test:John
  test:Fred

Just the contacts doesn’t illustrate much. Give me all contacts that have a friend. And display the contact and the friend’s names:

SELECT ?name ?friend
WHERE { ?subject test:hasFriend ?f ;
                 test:name ?name .
        ?f test:name ?friend }
  Fred, John

Let’s ask for all the pets that are owned:

SELECT ?subject WHERE { ?unknown test:owns ?subject }
  test:Max
  test:Picca

Oh, not the subject. The names. How did we do that again? Right:

SELECT ?name
WHERE { ?unknown test:owns ?subject .
        ?subject test:name ?name }
  Max
  Picca

This will of course yield the same results in our situation:

SELECT ?name
WHERE { <test:John> test:owns ?subject .
        ?subject test:name ?name }
  Max
  Picca

But this won’t: Fred doesn’t own any pets. Only John owns pets.

SELECT ?name
WHERE { <test:Fred> test:owns ?subject .
        ?subject test:name ?name }

Let’s print the owner’s and the pet’s names:

SELECT ?owner ?name
 WHERE { ?unknown test:owns ?subject ;
                  test:name ?owner .
         ?subject test:name ?name }
  John, Max
  John, Picca

Still with me? Let’s now conclude with requesting the names of the contacts who are a friend of the person who owns Picca:

SELECT ?name
WHERE { ?subject test:owns <test:Picca> .
        ?unknown test:hasFriend ?subject ;
                 test:name ?name }
  Fred

Invitation for Jürg and Rob: How about you guys writing an introduction to OPTIONAL, SUM, COUNT, GROUP BY and FILTER, etc. in SPARQL? :-) The more advanced stuff.

The subject of a resource, Nepomuk’s isStoredAs

After the many discussions the Tracker team had at the Desktop Summit in Gran Canaria I think a lot of people will start trying out Tracker’s master. We will indeed start making 0.7.x releases sometime this month or next.

Meanwhile I’d like to point out that among the decisions that we made during the meetings and at the Ontology BOFs is that we won’t use the URL of resources as the RDF’s subject field anymore. Instead we’ll use the nie:isStoredAs predicate for storing the URL.

Right now we already set nie:isStoredAs, but we still use the URL as subject. This will change, though. Just assume the subject to be something you should only use as a unique piece of data about the resource, pointing at it (in the RDF store). More details can be found here. If you want the thing itself (the file, the E-mail, the .desktop file, the website’s URL), ask for nie:isStoredAs.

For example:

<file:///tmp/myfile.png> a nfo:FileDataObject .
<urn:nepomuk:file:d7ea...> a nfo:Image ;
	nie:isStoredAs <file:///tmp/myfile.png> .

And to query:

tracker-sparql -q "SELECT ?url WHERE { ?subject a nfo:Image ; nie:isStoredAs ?url }"

We know that many people want these 0.7.x releases to happen soon. I can only invite those people to just join coding. Awesome stuff is indeed taking place, but at the same time there is a lot of work and decision making to do.

That includes things like a user interface similar to the T-S-T (Tracker Search Tool) from Tracker 0.6, and documentation with a lot of examples. SPARQL, SPARQL Update and Nepomuk all have quite a lot of documentation by themselves. But people are still asking for even more examples. Anybody interested in making that? Maybe somebody who was at Rob Taylor’s BOF could write down his and Jürg’s lectures on RDF and SPARQL? I think they explained it all very well.

A ridiculously small shellscript

Now, we can finally replace Richard Stallman with a small shellscript

— Alp Toker, Gran Canaria at the Igalia party, 06 juli 2009

I’ll write it in C#

public void ActCrazy () {
   while (true) {
      be incorrect about Mono
   }
}

Tracker experimental merged to main development tree, Ivan’s presentation

I’m currently involved in the Tracker project and our project will be presented by Ivan Frade at the Desktop Summit this Sunday.

We merged our experimental branch tracker-store to master. This means that our rearchitecture plans for Tracker have mostly been implemented and are being pushed forward into the main development tree.

I will start with a comparison with Tracker’s 0.6.x series.

Tracker master:

  • Uses SPARQL as query language
  • Uses Nepomuk for its base ontologies
  • Supports SPARQL Update
  • Supports aggregates COUNT, AVG, SUM, MIN and MAX in SPARQL
  • Operates for all its storage functionality as a separate binary
  • Operates all its indexing, crawling and monitoring functionalities in a separately packagable binary

Tracker 0.6.9x:

  • Uses RDFQuery as query language
  • Has its own ontology
  • Has very limited support for storing your own data
  • Supports several aggregate functions in its query language
  • Operates for all its storage functionality in the indexer
  • Operates for all its query functionality in the permanent daemon
  • Does file monitoring and crawling in the permanent daemon
  • Operates all its indexing functionality in a separately packagable binary

Tracker master:

Architecture

The storage service uses the Nepomuk ontologies as schema. It allows you to both query and insert or delete data.

The fs-miner finds, crawls and monitors file resources. It also analyses those files and extracts the metadata. It instructs the storage service to store the metadata.

External applications and other such miners are allowed to both query and insert using the storage service. Priority is given to queries over stores.

Plugins that run in the application’s own process can push information into Tracker. We indeed don’t try to scan Evolution’s cache formats; we have a plugin that gets the data out of Evolution and into Tracker.

Storage service’s API and IPC

The storage service gives priority to SELECT queries to ensure that apps in need of metadata get serviced quickly.

INSERT and DELETE calls get queued. SELECT ones get executed immediately. For apps that require consistency and/or insertion speed we provide a batch mode that has a commit barrier. When the commit calls back you know that everything that came before it, is in a consistent shape. We don’t support full transactions with rollback.
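
The scheduling idea can be sketched as a priority queue in Python. This is only an illustration under simplifying assumptions: RequestQueue, the SELECT and UPDATE constants and the "commit" marker are made up here and do not describe how tracker-store is actually implemented.

```python
import heapq
import itertools

SELECT, UPDATE = 0, 1  # the lower number is served first


class RequestQueue:
    """Serves SELECTs before queued updates, keeping FIFO order per class."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves FIFO order

    def push(self, priority, request):
        heapq.heappush(self._heap, (priority, next(self._order), request))

    def drain(self):
        while self._heap:
            _, _, request = heapq.heappop(self._heap)
            yield request


queue = RequestQueue()
queue.push(UPDATE, "INSERT { <urn:a> a rdfs:Resource }")
queue.push(UPDATE, "INSERT { <urn:b> a rdfs:Resource }")
queue.push(UPDATE, "commit")  # barrier: all updates queued before it are done
queue.push(SELECT, "SELECT ?s WHERE { ?s a rdfs:Resource }")

served = list(queue.drain())
print(served)
```

Because updates keep their relative order, the commit marker is only reached after every update that was queued before it, which is all that a commit barrier without rollback has to guarantee.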

The standard API operates over DBus. This means while using it you are subject to DBus’s performance limitations. In SPARQL Update it is possible to group a lot of writes. Due to DBus’s latency overhead this is recommended when inserting larger sets of data. We’re experimenting with a custom IPC system, based on unix sockets, to get increased throughput for apps that want to put a lot of INSERTs onto our queue.

We provide a feature that signals on changes happening to certain types. You can see this as a poor man’s live search. Full live search for SPARQL is fairly complicated. Maybe in future we’ll implement something like that.

Ontology

We support the majority of the Nepomuk base ontologies and our so called filesystem miners will store found metadata using Nepomuk’s ontologies. We support static custom ontologies right now. This means that it’s impossible to dynamically add a new ontology unless you reset the entire database first.

We’re planning to support dynamically adding and removing ontologies. The ontology format that we use is Turtle.

Backup and import

Right now we support loading data into our database using SPARQL Update, using an experimental unix-socket based IPC, or by passing us a Turtle file.

We currently have no support for making a backup. Support for this is high on our priority list. It will write a Turtle file (which can be loaded afterward).

Backup and import of ontology specific metadata

When we introduce support for custom ontologies it’ll be useful for apps that provided their own custom ontology to get a backup of just the data that has relevance to said ontology. We plan to provide a method to do that.

Volume support

Because we have a static custom ontology for volume support, volumes and their status are queryable over SPARQL. File resources also get linked to said volumes. This makes it possible to get the availability of a file resource. For example: return metadata about all photos that are located on a specific camera, even though the camera isn’t connected to this device.

Volume support is a work in progress at this moment.

By the way

Tinymail isn’t a sleeping project. I just stopped blogging about it. José Dapena Paz and Sergio Villar Senin are working very hard making it rock solid. Having worked together with Sergio a lot, I trust him. So a few months ago I made him co-maintainer of the project. He’ll probably perform the first release (or decide to do a few more pre-releases first). Being Modest’s technical maintainer, Sergio has worked hard on and contributed a lot to Tinymail. For the last few weeks José Dapena Paz is the guy who apparently is on fire, writing patches like a madman.

And it looks like there’s no stopping José! Maybe GUADEC will stop him for at least a few days? Maybe I should help Sergio a bit with reviewing all that stuff?

As far as I know, Modest will be the default E-mail client on Maemo’s Fremantle device. It has been available for the N810 for some months of course, but for the Fremantle release I’m sure the guys have improved the user interface a lot. I, personally, have been working on Tracker and didn’t focus much on Tinymail. And of course I’m already thinking about how we can make E-mail part of that RDF platform. But that’s another story (and I think I wrote two articles on that already).

Anyway, just letting everybody know: people are still working on Tinymail. They just don’t blog about it as much as I used to do. No worries, though. They are doing great stuff.

Finite resources, infinite growth

For some people this post can be controversial. I added a category “controversial” to my blog for people who prefer to filter it.

We start an imaginary experiment with a bottle filled up with food and room left for exactly two worms. We assume worms replicate at a doubling time of one minute. We observed in a previous experiment that the bottle is filled up in exactly one hour. They eat the food as they double themselves, etc (use your imagination).

At 11 o’clock in the morning we place two worms in the bottle. At what time will the bottle be full (easy)? At what time will the bottle be half full? At what time is the bottle only 3% filled up?

Humans have a global population growth of about 1.2% per year. It’s about 1% in wealthy countries and about 2-3% in poor countries. If you want to calculate a doubling time you take 70 and you divide it by the growth percentage. Which means that at our current growth rate, we’ll double our total population in about 60 years.
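The rule of thumb above is easy to check. This is a minimal sketch that compares the "70 divided by the growth percentage" shortcut with the exact doubling time for compound growth. The function names are mine, invented for illustration.

```python
import math


def doubling_time_rule_of_70(growth_percent):
    """Approximate doubling time in years: 70 divided by the growth percentage."""
    return 70.0 / growth_percent


def doubling_time_exact(growth_percent):
    """Exact doubling time in years for compound annual growth."""
    return math.log(2) / math.log(1 + growth_percent / 100.0)


# Compare both for the growth rates mentioned in the text.
for rate in (1.0, 1.2, 2.0, 3.0):
    approx = doubling_time_rule_of_70(rate)
    exact = doubling_time_exact(rate)
    print(f"{rate}% per year: rule of 70 gives {approx:.1f} years, exact is {exact:.1f} years")
```

For 1.2% both land at roughly 58 years, which is where the "about 60 years" comes from.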

In 1950 there were about 2.7 thousand million of us; in 1990 there were 5 thousand million. In 2050 there will be 10 thousand million. Infinite growth isn’t possible with finite resources. In 2400 years, at the current growth rate, the total amount of human flesh will in theory be roughly equal to the earth’s mass.

The main question is, how big is our bottle? Let’s go back to the worms. For the worms the bottle is about 3% filled up at 11:55. It’s half full at 11:59. It’s overpopulated at 12:00. When three new bottles are found and pipes are connected with the first, the three new bottles will be filled up at 12:02. After that, four new bottles will be filled up at 12:03. After that you need eight new bottles to survive minute 12:04. In minute 12:05 it starts taking on crazy proportions.
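If you don’t want to do the worm arithmetic in your head, here is a small sketch of it. It assumes nothing more than one doubling per minute and a bottle that is exactly full at 12:00; the function names and the arbitrary date are only there for illustration.

```python
from datetime import datetime, timedelta

# Arbitrary date; only the time of day matters for the experiment.
FULL_AT = datetime(2000, 1, 1, 12, 0)


def fill_fraction(minutes_before_full):
    """Fraction of the first bottle that is filled, with one doubling per minute."""
    return 0.5 ** minutes_before_full


def bottles_needed(minutes_after_full):
    """Total number of bottles needed to hold the worms after 12:00."""
    return 2 ** minutes_after_full


for m in (5, 1, 0):
    t = FULL_AT - timedelta(minutes=m)
    print(f"{t:%H:%M} the bottle is {fill_fraction(m):.1%} full")

for m in range(1, 5):
    t = FULL_AT + timedelta(minutes=m)
    new = bottles_needed(m) - bottles_needed(m - 1)
    print(f"{t:%H:%M} needs {bottles_needed(m)} bottles in total, {new} of them new")
```

Five minutes before noon the fraction is 1/32, which is the "about 3%" in the text.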

Even if our bottle is only 3% filled up now, then still at our retirement age we will inevitably be at 50% capacity. During those retirement years we’ll see the population grow at an enormous speed to maximum capacity within a few years.

I’m among the people who believe that we’re already at 70% capacity of our planet. I think we have about 30 years of finite resources left: doubling the population to 10 thousand million people is impossible (not unreasonable to think). Moving to another bottle will take us at least several more centuries of top notch space science (so this solution is not applicable). And that’s assuming we can leverage the resources of another planet. Moving to another star is simply out of the question unless we invent technology that allows us to let a huge mass travel at the speed of light (again, the solution isn’t applicable).

A solution that I have in mind? Genetically modifying newborn humans to have an annual fertility frequency and having their fertility enabled at a mature age. Instead of being fertile based on the phase of the moon, women would be fertile only once per year. And instead of at the average age of 12, women would start becoming fertile at the average age of, for example, 25.

Is genetic modification immoral? Being an atheist I don’t have any belief system that forbids me to tamper with species. It’s indeed still immoral because we don’t know what we are doing, yet. No, morality is not divinely injected by a God. Atheists are born with morals, too.

But if we have to choose between living with each other under the condition of having insufficient resources, or making a change to our species, I know which of the two I will prefer.

Now, if you do believe in a God, then you must also acknowledge that your God’s intention was for us to become intelligent enough to genetically modify our species. If not, why ain’t it stopping us? We, for example, have successfully been genetically selecting dogs for centuries. And we have started genetically modifying them (active modification: interfering with the egg and sperm cells).

Mankind will have to open this difficult discussion sooner or later.

FWD: Entrepreneurs can change the world

Link for planets

Rearchitecting Tracker

Jürg and I have started working on the rearchitecture plans that we have for Tracker. You can follow the code being changed here and here.

What is finished?

  • Jürg took all database code out of the indexer. The indexer is now a consumer of tracker-store like any other. It commands tracker-store to store metadata. The indexer now also queries tracker-store for things like the modification time. Currently it has no direct access to the database. This might change for performance reasons; we’re not sure about that yet.
  • The trackerd process got renamed to tracker-store.
  • The DBus object in tracker-store now executes the SPARQL Update requests itself. It used to send this request to tracker-indexer.

  • Jürg moved the watching and crawling code that used to be in the daemon to the indexer. This means that tracker-store doesn’t depend on inotify anymore. This work made it possible to make your own indexer or not to have an indexer at all. This was quite a big task and got pushed today. This is of course being tested as we speak.

  • I wrote an internal API to queue database store requests, making it possible to asynchronously deal with large amounts of data when multiple metadata deliverers will be giving tracker-store commands to store their metadata.
  • I also ported existing code to use this internal API. This task item is ongoing and being tested. For example the Turtle Import, support for removable device caches in Turtle, Push modules (support for E-mail clients) and the DBus SPARQL Update API are affected by this.
  • The class signals feature, which now doesn’t require involvement of the indexer, got fixed.
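To give an idea of what queueing store requests means, here is a minimal sketch in Python. It is not Tracker’s code (which is C on top of GLib) and every name in it, like StoreQueue and push, is hypothetical. It only shows the pattern: several metadata deliverers hand over their requests and return immediately, while a single worker applies the requests in the order they arrived.

```python
import queue
import threading


class StoreQueue:
    """Hypothetical stand-in for a store request queue; not Tracker's real API."""

    def __init__(self, execute):
        self._execute = execute            # callable that applies one request
        self._requests = queue.Queue()     # FIFO of pending requests
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def push(self, sparql_update, on_done=None):
        """Queue a request and return immediately."""
        self._requests.put((sparql_update, on_done))

    def _run(self):
        # Single worker: requests are applied one at a time, in arrival order.
        while True:
            sparql_update, on_done = self._requests.get()
            try:
                self._execute(sparql_update)
                if on_done is not None:
                    on_done(sparql_update)
            finally:
                self._requests.task_done()

    def wait(self):
        """Block until every queued request has been processed."""
        self._requests.join()


# Usage: three imaginary deliverers queue work, the list stands in for the database.
applied = []
store = StoreQueue(applied.append)
for producer in ("turtle-import", "push-module", "dbus-client"):
    store.push(f"# update from {producer}")
store.wait()
print(applied)
```

The callers never wait for the database; only wait() blocks, and a real service would use callbacks on its main loop instead.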

What is left to do?

Right now the indexer will instruct an extractor process to extract metadata from a file. This extractor process communicates the metadata first to the indexer, which in turn communicates the same metadata to tracker-store. This can be done more efficiently by letting the extractor communicate the metadata directly to tracker-store.

We also have quite a few other plans for the indexer’s code. Those plans are a bit longer term. For example, splitting support for the removable devices and the normal filesystem into two processes.

E-mail as a desktop service, this is how it should be done

While developing Tinymail, a library for writing E-mail clients, I was convinced that the storage of the summary was something Tinymail itself must handle. Back then there was, even pragmatically, nothing that could cope with the requirements of E-mail on mobile devices for this task.

Meanwhile I got the opportunity to work on the Tracker project. Using the Nepomuk Ontology, I made sure that the message ontology that Tracker uses can actually handle these requirements. I believe that the adoption of Nepomuk and SPARQL evolved Tracker from something that isn’t useful for E-mail software to something that should be involved when writing a desktop service for E-mail today.

Ryan, a pioneer in experimenting with E-mail as a desktop service, advised me to be careful with my bias for RDF and SPARQL. I’ll keep it in mind! However…

I believe such a desktop service for E-mail should:

  • Download metadata by getting and parsing ENVELOPE and BODYSTRUCTURE using the FETCH command of an IMAP server, as explained in this document.
  • Give priority to downloading metadata of those E-mails nearby the user’s scroll position.
  • Use IMAP’s pipelining. It gives the user the feeling that his technology operates faster than his human brain, even when on high latency connections like GPRS.
  • Cache the information, using Tracker’s Nepomuk Message Ontology as schema.
  • Make it possible to fetch just one particular MIME part and not the entire message.
  • Enable it to create a new message, consisting of individual MIME parts.
  • Make it possible for those MIME parts to have their source in existing messages on an IMAP server when creating a message. When the IMAP server supports CATENATE, it should be used for this purpose.
  • Make applications use SPARQL with Tracker’s NMO to query metadata about E-mails.
  • Provide a stream API to get access to the actual data of individual MIME parts. If not cached, the service should download the MIME part on demand. The DBus Stream API should look like GInputStream. Except for read(): I think that for the transfer of the chunks of data, Unix Sockets or named pipes are better than using D-Bus.
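A few of the points above can be sketched without even talking to a server. The following only builds the IMAP command lines; it is an illustration under my own assumptions, not the service’s code. The function names, the tag format and the batch size are made up; the commands themselves (UID FETCH with ENVELOPE, BODYSTRUCTURE and BODY.PEEK[section]) are standard IMAP.

```python
def prioritise(uids, scroll_index):
    """Order UIDs so that those nearest the user's scroll position come first."""
    indexed = sorted(enumerate(uids), key=lambda pair: abs(pair[0] - scroll_index))
    return [uid for _, uid in indexed]


def pipelined_fetches(uids, scroll_index, batch=2):
    """Build tagged metadata requests that can be written back to back.

    Pipelining means sending all of these without waiting for each reply;
    the tags let us match the responses afterwards.
    """
    ordered = prioritise(uids, scroll_index)
    commands = []
    for n, start in enumerate(range(0, len(ordered), batch), 1):
        uid_set = ",".join(str(uid) for uid in ordered[start:start + batch])
        commands.append(f"A{n:03d} UID FETCH {uid_set} (ENVELOPE BODYSTRUCTURE)")
    return commands


def fetch_part(tag, uid, section):
    """Request a single MIME part, without marking the message as read."""
    return f"{tag} UID FETCH {uid} (BODY.PEEK[{section}])"


# The user is looking at the third message of a folder with five messages.
for line in pipelined_fetches([101, 102, 103, 104, 105], scroll_index=2):
    print(line)
print(fetch_part("A004", 103, "1.2"))
```

A real service would write these lines to the socket in one go and parse the tagged responses as they come in.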

To this I would like to add that although many people falsely believe that E-mails are like files, E-mails are more like recursive directories (container MIME parts) with items: the E-mail’s MIME parts. Any API that doesn’t admit this, is incorrectly designed.
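To make the directory comparison concrete, here is a small sketch that uses Python’s standard email module to walk a made-up message. The message and the walk function are mine, for illustration only. The dotted numbers follow the way IMAP numbers the sections of a plain multipart message like this one (embedded message/rfc822 parts need extra care that this sketch skips).

```python
from email import message_from_string

# A made-up message: a text/plain + text/html alternative, plus an attachment.
RAW = """\
From: a@example.org
To: b@example.org
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain

Hello
--inner
Content-Type: text/html

<p>Hello</p>
--inner--
--outer
Content-Type: image/png
Content-Transfer-Encoding: base64

iVBORw0KGgo=
--outer--
"""


def walk(part, section=""):
    """Yield (section, content type) for a part and, recursively, its children."""
    yield (section or "message", part.get_content_type())
    if part.is_multipart():
        # A container part is like a directory: its payload is a list of parts.
        for i, child in enumerate(part.get_payload(), 1):
            yield from walk(child, f"{section}.{i}" if section else str(i))


for section, content_type in walk(message_from_string(RAW)):
    print(section, content_type)
```

The containers (multipart/mixed, multipart/alternative) are the directories; the leaves are the parts you actually want to fetch one by one.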

This goes all the way up to the protocol, where you fetch per MIME part. You don’t fetch entire messages. You can indeed do that but that doesn’t mean it isn’t wrong. IMAP is not POP3. It’s also better to design for IMAP, than to use IMAP as a POP3 service. Better have hacks to support POP3 in your model (I’m serious).

Please don’t make the same mistake nearly every newcomer to E-mail solutions makes. There’s plenty of rubbish already, seriously.

Tracker, our near future plans

Apparently

Apparently this hasn’t been echoed enough times. A lot of teams are still wondering what they should use if they want to store RDF metadata in Nepomuk and how to query it.

What happened before

We have been refactoring to bring Tracker’s codebase into a better state. This is being released as Tracker 0.6.9x. This one sentence is really not enough to describe the changes. We can’t continue talking about the past forever. Sorry guys.

We have introduced support for SPARQL and Nepomuk in Tracker. We also added the class-signals feature, Turtle import & export, and many other features like SPARQL UPDATE support. This makes the storage engine effectively a generic Nepomuk RDF store that can be used to store and query RDF data.

What will happen

We are at this moment planning to rearchitect Tracker a little bit.
Among our plans we want to make the RDF metadata store standalone. The store stores your metadata using Nepomuk as ontology and enables the application developer to query in SPARQL. This means that it’ll be possible to use this storage service without the indexer even installed. This is already possible but right now we do the crawling and monitoring in the storage service.

We plan to move the crawling and monitoring to the indexer. One idea is that the indexer will instruct the extractor to do an analysis and then the extractor will push the extracted metadata to the RDF storage service. This makes the indexer and extractor a provider & consumer like any other, and makes them optional and separately packageable.

This is because we get requests from other teams who don’t want the indexing. Modularizing is usually a good thing, so we now have plans to make this possible as a feature.

Other plans

Other plans that we haven’t thoroughly planned yet include support for custom ontologies. We have a good idea for this, though. We want to wait with it until after the rearchitecting. Support for custom ontologies will include removing ontologies, installing ontologies and asking for a backup that’ll contain the metadata that is specific to an installed ontology.

Support for custom ontologies doesn’t mean that application developers should all go spastic and start making ontologies. I know you guys! Don’t do it! We want applications to reuse as much of the Nepomuk set as possible. The more Nepomuk gets reused, the more interoperability between apps is possible.

Volume support in experimental Tracker

In Tracker’s trunk we have support for volumes. This means that we track removable devices appearing and disappearing. When a removable device disappears, you won’t see the resources that are on the disconnected device in your search results.

In trunk we keep a separate table with volume registrations for this.

In master we simply use the ontology. Which also means that we now make this information available to you as metadata in a clean way.

Some examples. List all the volumes that we know about:

tracker-sparql -q "SELECT ?o ?m ?z WHERE {
   ?o a tracker:Volume ;
   tracker:mountPoint ?m ;
   tracker:unmountDate ?z }"

That will return something like this (wrapping the line):

urn:nepomuk:datasource:/org/freed../Hal/devices/volume_uuid_XXX_ABCDE,
    file:///media/USBStick, 2009-04-01T13:38:20

The ones that we know about but aren’t mounted:

tracker-sparql -q "SELECT ?o ?m WHERE {
   ?o a tracker:Volume ;
   tracker:mountPoint ?m ;
   tracker:isMounted false } "

The ones that we know about and are mounted:

tracker-sparql -q "SELECT ?o ?m WHERE {
   ?o a tracker:Volume ;
   tracker:mountPoint ?m ;
   tracker:isMounted true } "

Let’s just for fun list the volumes that got unmounted before a specific date and didn’t get mounted anymore:

tracker-sparql -q "SELECT ?o ?m WHERE {
   ?o a tracker:Volume ;
   tracker:mountPoint ?m ;
   tracker:isMounted false ;
   tracker:unmountDate ?z .
   FILTER (?z < \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\") }"

I'd like to add that replacing HAL isn't our intention. Not at all. We depend on HAL to track this information ourselves. We need to know the availability of a volume, and because we link every file resource to the volume's resource, we also know the availability of each file resource.

It's under Tracker's own ontology prefix, and it's not really to be considered a decided or stable ontology API yet. We might change some things about it. It wasn't even a design goal to make this publicly accessible.

Why am I showing it? Well, because it's a nice way to explain one of our sprint tasks for the next two weeks. I promised you guys that I would talk about what we are up to at Tracker. So here you have it!

And a thank you to Sally Shapiro and Johan Agebjörn

Thanks for your music Johan Agebjörn and Sally Shapiro.

It’s wonderful!

Hey Karoliina, Sally Shapiro sounds a bit like the stuff you make. I don’t know, maybe it’s a completely different style. Who cares?

When is your next song ready, by the way? Aha! Train Tracks. Listening.

Don’t stop making music! We addicts need our drug.

Documentation. Documentation. Documentation!

As I was rereading Ivan Frade’s blog post about the class-signals feature that Ivan and I developed for Tracker last week, I got reminded of something I wrote a long time ago:

In my opinion, there’s nothing as important as developer documentation for a framework or library.

Ivan decided to ~ dump our internal planning document on GNOME’s Live wiki service. He didn’t write the document so he’s of course not to blame, but the document was in my opinion not suitable as end user framework documentation. Application developers should not be required to understand what goes on in our team’s minds. So I rewrote this.

You can find the document at the SignalsOnChanges page. Let’s see if starting next week I can convince the other team members to document their new features in a similar way. Many new things in Tracker’s experimental branch could use more clarification and examples.

I happen to believe that undocumented libraries and frameworks aren’t meaningful. That’s because by letting it remain undocumented, the developer renders his work utterly irrelevant for a serious application developer. I also believe that undocumented infrastructure software is worse than no infrastructure software.

I strongly recommend that application developers refrain from using any piece of undocumented library or framework. Don’t lower yourself to their standards. Demand documentation by refusing to use undocumented infrastructure. Doing it as free software is no excuse for delivering extremely low quality, which is what undocumented infrastructure is (in my opinion).

Thanks to Ivan’s blog post for reminding me of what I once wrote, and today still believe in, myself. While passionately coding new features you sometimes forget about this.

Which is unacceptable.