Supporting ontology changes in Tracker

It used to be in Tracker that you couldn’t just change the ontology. When you did, you had to reboot the database. This means loosing all the non-embedded data. For example your tags or other such information that’s uniquely stored in Tracker’s RDF store.

This was of course utterly unacceptable and this was among the reasons why we kept 0.8 from being released for so long: we were afraid that we would need to make ontology changes during the 0.8 series.

So during 0.7 I added support for what I call modest ontology changes. This means adding a class, adding a property. But just that. Not changing an existing property. This was sufficient for 0.8 because now we could at least do some changes like adding a property to a class, or adding a new class. You know, making implementing the standard feature requests possible.

Last two weeks I worked on supporting more intrusive ontology changes. The branch that I’m working on currently supports changing tracker:notify for the signals on changes feature, tracker:writeback for the writeback features and tracker:indexed which controls the indexes in the SQLite tables.

But also certain range changes are supported. For example integer to string, double and boolean. String to integer, double and boolean. Double to integer, string and boolean. Range changes will sometimes of course mean data loss.

Plenty of code was also added to detect an unsupported ontology change and to ensure that we just abort the process and don’t do any changes in that case.

It’s all quite complex so it might take a while before the other team members have tested and reviewed all this. It should probably take even longer before it hits the stable 0.8 branch.

We wont yet open the doors to custom ontologies. Several reasons:

  • We want more testing on the support for ontology changes. We know that once we open the doors to custom ontologies that we’ll see usage of this rather sooner than later.
  • We don’t yet support removing properties and classes. This would be easy (drop the table and columns away and log the event in the journal) but it’s not yet supported. Mostly because we don’t need it ourselves (which is a good reason).
  • We don’t want you to meddle with the standard ontologies (we’ll do that, don’t worry). So we need a bit of ontology management code to also look in other directories, etc.
  • The error handling of unsupported ontology changes shouldn’t abort like mentioned above. Another piece of software shouldn’t make Tracker unusable just because they install junk ontologies.
  • We actually want to start using OSCAF‘s ontology format. Perhaps it’s better that we wait for this instead of later asking everybody to convert their custom ontologies?
  • We’re a bunch of pussies who are afraid of the can of worms that you guys’ custom ontologies will open.

But yes, you could say that the basics are being put in place as we speak.

Zürichsee

Today after I brought Tinne to the airport I drove around Zürichsee. She can’t stay in Switzerland the entire month; she has to go back to school on Monday.

While driving on the Seestrasse I started counting luxury cars. After I reached two for Lamborgini and three for Ferrari I started thinking: Zimmerberg Sihltal and Pfannenstiel must be expensive districts tooAnd yes, they are.

I was lucky today that it was nice weather. But wow, what a nice view on the mountain tops when you look south over Zürichsee. People from Zürich, you guys are so lucky! Such immense calming feeling the view gives me! For me, it beats sauna. And I’m a real sauna fan.

I’m thinking to check it out south of Zürich. But not the canton. I think the house prices are just exaggerated high in the canton of Zürich. I was thinking Sankt Gallen, Toggenburg. I’ve never been there; I’ll check it out tomorrow.

Hmmr, meteoswiss gives rain for tomorrow. Doesn’t matter.

Actually, when I came back from the airport the first thing I really did was fix coping with property changes in ontologies for Tracker. Yesterday it wasn’t my day, I think. I couldn’t find this damn problem in my code! And in the evening I lost three chess games in a row against Tinne. That’s really a bad score for me. Maybe after two weeks of playing chess almost every evening, she got better than me? Hmmrr, that’s a troubling idea.

Anyway, so when I got back from the airport I couldn’t resist beating the code problem that I didn’t find on Friday. I found it! It works!

I guess I’m both a dreamer and a realist programmer. But don’t tell my customers that I’m such a dreamer.

Bern, an idyllic capital city

Today Tinne and I visited Switzerland’s capital, Bern.

We were really surprised; we’d never imagined that a capital city could offer so much peace and calm. It felt good to be there.

The fountains, the old houses, the river and the snowy mountain peaks give the city an idyllic image.

Standing on the bridge, you see the roofs of all these lovely small houses.

The bear is the symbol of Bern. Near the House of Parliament there was this statue of a bear. Tinne just couldn’t resist to give it a hug. Bern has also got real bears. Unfortunately, Tinne was not allowed to cuddle those bears.

The House of Parliament is a truly impressive building. It looks over the snowy mountains, its people and its treasury, the National Bank of Switzerland.


As you can imagine, the National Bank building is a master piece as well. And even more impressive; it issues a world leading currency.

On the market square in Oerlikon we first saw this chess board on the street; black and white stones and giant chess pieces. In Bern there was also a giant chess board in the backyard of the House of Parliament. Tinne couldn’t resist to challenge me for a game of chess. (*edit*, Armin noted in a comment that the initial position of knight and bishop are swapped. And OMG, he’s right!)

And she won!

At the House of Parliament you get a stunning, idyllic view on the mountains of Switzerland.


Bla bla bla, subqueries in SPARQL, bla bla

Coming to you in a few days is what Jürg has been working on for last week.

Yeah, you guess it right by looking at the query below: subqueries!

This example shows you the amount of E-mails each contact has ever sent to you:

SELECT ?address
    (SELECT COUNT(?msg) AS ?msgcnt WHERE { ?msg nmo:from ?from })
WHERE {
    ?from a nco:Contact ;
          nco:hasEmailAddress ?address .
}

The usual warnings apply here: I’m way early with this announcement. It’s somewhat implemented but insanely experimental. The SPARQL spec has something for this in a draft wiki page. Due to lack of error reporting and detection it’s easy to make stuff crash or to get it to generate wrong native SQL queries.

But then again, you guys are developers. You like that!

Why are we doing this? Ah, some team at an undisclosed company was worried about performance and D-Bus overhead: They had to do a lot of small queries after doing a parent query. You know, a bunch of aggregate functions for counts, showing the last message of somebody, stuff like that.

I should probably not mention this feature yet. It’s too experimental. But so exciting!

Anyway, here’s the messy branch and here’s the reviewed stuff for bringing this feature into master.

ps. I wish I could show you guys the query that we support for that team. It’s awesome. I’ll ask around.

Handling triplets arriving in tracker-store, CouchDB integration as use-case

At GCDS Jamie told us that he wants to make a plugin for tracker-store that writes all the triplets to a CouchDB instance.

Letting a CouchDB be a sort of offline backup isn’t very interesting. You want triples to go into the CouchDB at the moment of guaranteed storage: at commit time.

For the purpose of developing this we provide the following internal API.

typedef void (*TrackerStatementCallback) (const gchar *graph,
                                          const gchar *subject,
                                          const gchar *predicate,
                                          const gchar *object,
                                          GPtrArray   *rdf_types,
                                          gpointer     user_data);
typedef void (*TrackerCommitCallback)    (gpointer     user_data);

tracker_data_add_insert_statement_callback (TrackerStatementCallback callback,
                                            gpointer                 user_data);
tracker_data_add_delete_statement_callback (TrackerStatementCallback callback,
                                            gpointer                 user_data);
tracker_data_add_commit_statement_callback (TrackerCommitCallback callback,
                                            gpointer              user_data);

You’ll need to make a plugin for tracker-store and make the hook at the initialization of your plugin.

Current behaviour is when graph is NULL, it means that the default graph is being used. If it’s not NULL, it means that you probably don’t want the data in CouchDB: it’s data that’s coming from a miner. You probably only want to store data that is coming from the user. His applications won’t use FROM and INTO for their SPARQL Update queries, meaning that graph is NULL.

Very important is that your callback handler works with bottom halves: put your expensive task on a queue and handle the queued item somewhere else. You can for example use a GThreadPool or a GQueue plus a g_idle_add_full with G_PRIORITY_LOW callback picking items one by one on the mainloop. You should never have a TrackerStatementCallback or a TrackerCommitCallback that blocks. Not even a tiny tiny bit of blocking: it’ll bring everything in tracker-store on its knees. It’s why we aren’t giving you a public plugin API with a way to install your own plugins outside of the Tracker project.

By the way: we want to see code instead of talk before we further optimize things for this purpose.

Writeback, writing metadata back into your files

Today, I feel like exposing you to some bleeding edge development going on as we speak at the Tracker team. I know you’re scared of that and that’s precisely why I want to expose you! Hah.

We are prototyping writeback support for Tracker.

With writeback we mean writing metadata that the user passes to us via SPARQL UPDATE into the file that he’s describing.

This means that it must be about a thing that is stored, that it must update a property that we want to writeback and it means that we need to support the format.

OK, that’s three requirements before we write anything back. Let’s explain how this stuff works in the prototype!

In our prototype you mark properties that are eligible for being written into the files using tracker:writeback.

It goes like this:

nie:title a rdf:Property ;
   rdfs:label "Title" ;
   rdfs:comment "The title of the document" ;
   rdfs:subPropertyOf dc:title ;
   nrl:maxCardinality 1 ;
   rdfs:domain nie:InformationElement ;
   rdfs:range xsd:string ;
   tracker:fulltextIndexed true ;
   tracker:weight 10 ;
   tracker:writeback true .

Next you need a writeback module for tracker-writeback. We implemented a prototype one that can only write the title of MP3 files. It uses ID3lib‘s C API.

When the user is describing a file, the resource must have nie:isStoredAs. The property being changed ‘s tracker:writeback must be true. We want the value of the property too. That’s simple in SPARQL, right? Sure it is!

SELECT ?url ?predicate ?object {
    <$subject> ?predicate ?object ;
               nie:isStoredAs ?url .
    ?predicate tracker:writeback true
 }

You’ll find this query in the code, go look!

Now it’s simple: using ID3lib we map Nepomuk to ID3 and write it.

No don’t be afraid, we’re not going to writeback metadata that we found ourselves. We’ll only writeback data that the user provided in the form of a SPARQL Update on the default graph. No panic. Besides, using tracker-writeback is going to be completely optional (just don’t run it).

This is a prototype, I repeat, this is a prototype. No expectations yet please. Just feel exposed to scary stuff, get overly excited and then join us by contributing. It’s all public what we’re doing in the branch ‘writeback’.

ps. Whether this will be Maemo’s future metadata-write stuff? Hmm, I don’t know. Do you know? ;-)

Keeping the autotools guys happy with qmake

I’m still figuring out how to do the same thing with cmake, but various bloggers and comments appear to be promising that it’ll be even more easy.

But this is a message for probably all Nokia teams who are making Qt-based libraries:

First open your src/src.pro file and add this stuff:

CONFIG += create_pc create_prl
QMAKE_PKGCONFIG_REQUIRES = QtGui
QMAKE_PKGCONFIG_DESTDIR = pkgconfig
INSTALLS += target headers

Now open your debian/$package-dev.install file and add this line:

usr/lib/pkgconfig

You’ll be doing all the autotools people a tremendous favor.

Next, open the README file and document that you need to use qmake-qt4 on Debian or make either qmake-qt3 or qmake-qt4 work flawlessly with your build environment. Perhaps also mention how to set the install prefix, how to make qmake find and install .pc files in another location, stuff like that. I find that this is lacking for almost every Qt-based library.

You’ll be doing everybody who wants to use your software a tremendous favor.

Indentation

People,

Let’s all stop doing this:

static void
my_calling_function_wrong (void)
{
[tab]MyItem1 *item1;
[tab]MyItem2 *item2;
[tab]MyItem3 *item3;

[tab]my_long_funcion (item1,
[tab][tab][tab][tab]..item2,
[tab][tab][tab][tab]..item3);
}

And start doing this:

static void
my_calling_function_right (void)
{
[tab]MyItem1 *item1;
[tab]MyItem2 *item2;
[tab]MyItem3 *item3;

[tab]my_long_funcion (item1,
[tab].................item2,
[tab].................item3);
}

The former doesn’t make sense unless each and every code viewing text display understands Mode lines’ tab-width property. The latter just always works, with every normal text editor.

ps. The super cool guys at Anjuta have already fixed this for me. I’m sure the even more cool EMacsers and the uber cool vimers can also fix their text editors?

Unnecessary note: [tab] is a tab and . is a space in the examples.

A reason to get up in the morning

Ever since Nokia contacted me about improving Tinymail to make it suitable for their Modest E-mail client have they given me a reason to get up in the morning, to work on something of which I knew would someday kick ass.

With the Maemo5 based Nokia N900 device we’ll have Modest shipped by default, and Tracker being actively used by several of its softwares. Future is going to shine even brighter for Tracker. Hard to brag about it, Tracker is inherently a background thing. Ah, well, technical people know about it.

Having worked on Tracker for more than a year, I now understand Tracker’s potential. At first, while I was trying to make an API for- and store the summary of E-mail envelope headers, so that E-mail clients can access this in a memory efficient way, I was critical of this Tracker stuff.

But then I joined Ivan, Urho, Ottela, Martyn and Carlos who were working on Tracker. Later Jürg joined and at the Berlin Hackfest people like Rob Taylor, Jürg and Urho discussed replacing Tracker’s poorer own ontology with Nepomuk and replacing its query language with SPARQL.

Given the implied complexity I was again critical, but then that crazy Jürg guy in a few weeks time turned Tracker into 99.9% pure fine awesomeness. I quickly joined working on this crazy “vstore” branch. Since a few months we have convinced the other Tracker guys to just start calling it “master”.

Ever since I feel again like a student who is learning how to develop software. Jürg is utilizing so many good techniques and we’re implementing so many specifications that are just “the right thing to do”, that the beautify of it all could sometimes make me cry of happiness.

Thanks to creating the opportunity to develop on software that will be used on for example their N900 device, Nokia continues giving people like me a reason to get up in the morning.

Don’t tell the native Nokians, but that’s why the N900 announcements secretly also made me a little bit proud. To whoever of us that worked on this stuff: guys, we’re all doing a great job. Let’s make the next one even better!

As it should be

Last week I wrote down why I believe the model should not have anything about columns. In .NET many people only ever used DataTables as their models. Because of that they often believe that in .NET the model must contain the columns.

They forget that DataGridView ‘s DataSource doesn’t require a DataTable at all, DataTable just happens to implement what DataSource needs: IList. It’s correct that if the model has all information that .NET’s many databinding components will get all the information they need out of your model. But it ain’t true that this is the only way nor a by-design in .NET. In .NET the by-design is that the view has all this and the model *can* pass it, if it has it, but it doesn’t have to.

In this example I illustrate that in .NET you can do a databinding with a simple .NET array. In .NET simple arrays implement IList. When a column of the DataGridView isn’t ReadOnly the property setter of the instance in the array will be called after the user edited the cell. I’ll illustrate this in the dataGridView1_CellEndEdit method: the property setter of the property Age of the Person instance being edited will be called. The view will as a result of a Refresh fetch the model’s new values. The Changed property will be rendered as True, for the Person that got changed.

People with VS.NET can drag a DataGridView and a Button on a Windows Form, and copypaste the Person class, button1_Click’s and dataGridView1_CellEndEdit’s code over. It’ll work.

// No DataTable, I'm not even importing System.Data, IList is fine
using System;
using System.Windows.Forms;

public partial class Form1 : Form
{
    public Form1() [+]

    private void button1_Click(object sender, EventArgs e) {
        dataGridView1.AutoGenerateColumns = false;

        DataGridViewColumn column;

        column = new DataGridViewTextBoxColumn();

        column.DataPropertyName = "Name";
        column.HeaderText = "The name of the person";
        column.Width = 180;

        // As you can see we are not doing anything on the model
        // to tell the view what the columns are.

        dataGridView1.Columns.Add(column);

        column = new DataGridViewTextBoxColumn();
        column.DataPropertyName = "Age";
        column.HeaderText = "The age";
        column.Width = 70;

        // Let's make this one editable
        column.ReadOnly = false;

        // We're just telling the view about the properties it
        // needs to bind, using the DataPropertyName member of
        // a DataGridViewColumn

        dataGridView1.Columns.Add(column);

        // Let's add a column that will show us that the view
        // will fetch property values at refresh

        column = new DataGridViewTextBoxColumn();
        column.DataPropertyName = "Changed";
        column.HeaderText = "?";
        column.Width = 45;

        dataGridView1.Columns.Add(column);

        // This is a normal array in .NET: it implements IList.
        // An IList is a collection with a known order.

        Person[] people = new Person[2];

        // Let's create two people in this array

        people[0] = new Person();
        people[0].Name = "Jos";
        people[0].Age = 30;
        people[0].Changed = false;

        people[1] = new Person();
        people[1].Name = "Jan";
        people[1].Age = 25;
        people[1].Changed = false;

        // And let's set the model of the view to be that array

        dataGridView1.DataSource = people;
        dataGridView1.CellEndEdit += new DataGridViewCellEventHandler(dataGridView1_CellEndEdit);
    }

    void dataGridView1_CellEndEdit(object sender, DataGridViewCellEventArgs e)
    {
        // This makes the view refresh its currently visible values, by reading
        // them from the model again. This callback happens after the user is
        // done editing a cell.

        dataGridView1.Refresh();
    }
}

public class Person {
    private string name, city;
    private uint age;
    private bool changed;
    public string Name {
        get { return name; }
        set { name = value; }
    }
    public bool Changed {
        get { return changed; }
        set { changed = value; }
    }
    public uint Age {
        get { return age; }
        set {
            age = value;
            Changed = true;
        }
    }
    public string City {
        get { return city; }
        set { city = value; }
    }
}

You can compare a GtkTreeModel with a DataTable in .NET: it’s a model that has its own memory storage and it contains both rows and columns. This means that GtkTreeModel isn’t a generic model, like IList in .NET actually is. With GtkTreeModel you must always represent your data as rows and columns. Even if the data ain’t rows and columns.

I indeed believe that Microsoft got databinding right in their .NET platform, and that Gtk+’s GtkTreeView and GtkTreeModel got it wrong.

Also feel free to have a huge array of Person instances. It’ll only read property values of the visible ones (plus a few more, shouldn’t be much). Fun tip: write something to the console in the property getters of the Person class, and start scrolling. Now you can easily discover yourself how to do lazy loading tricks with MVC in .NET, and make things scale.

TreeModel ZERO, a taste of life as it should be

If bugmasters are allowed to blog wishlists, then developers should also be allowed to write them! Which is why I wrote my wishlist!

Gtk.TreeModel was in my humble opinion designed wrong. In API design should an interface be just one thing.

A little bit of history

Many framework designers have repeated this in the past. Two of the best framework designers that we have on this planet, Krysztof Cwalina and Brad Abrams from Microsoft, added the meme to one of their books. It would be unfair to only mention those two guys and not the other people at Microsoft, and before that at the Delphi team at Borland. Brian Pepin notes at page 83 of Framework Design Guidelines: “Another sign that you’ve got a well defined interface is that the interface does exactly one thing. If you have an interface that has a grab bag of functionality, that’s a warning sign.”

The problem

What are the things a Gtk.TreeModel are or represent?

  • It’s something that is iterable
  • It’s something that is an iterator
  • It’s apparently something that has columns, which should have been at the View’s side of the story
  • It’s something that can be a tree
  • It’s something that emits row changes

That’s not one thing, and therefore we have a warning sign. If I count it correctly that’s at least five things, so that’s a big warning sign.

I’m sure I can come up with a few other things that a Gtk.TreeModel actually represents. For example its unref_node and ref_node make me think that it’s a garbage collector or something, too.

This is absolutely not good. I believe it is what makes the interface shockingly complicated. Because none of those five things can be made reusable this way.

What I think would be the right way

A prerequisite for this, and presumably also the reason why Gtk+ developers decided to do Gtk.TreeModel the way they did it a few years ago, is a collection framework.

Sadly is this proposal being ~ignored by the current GLib maintainers. Understandable because everybody is overloaded and busy, but in my opinion it’s nonetheless blocking us from heading in the right direction.

There are by the way quite a lot of other reasons mentioned on the proposal. This is just one of them.

interface GLib.Iterable {
	Iterator iterator();
}

interface GLib.Iterator {
	bool next ();
	object current;
}

Next would be recursive iterators or trees. There are many ways to represent these, but I’ll just take a simple route. Remember that when picking an API design, the most simple idea is often the most right one. But yeah, you can probably improve this.

interface GLib.TreeIterable : GLib.Iterable {
	GLib.TreeIterable get_children ();
	int n_children;
	bool has_child (GLib.TreeIterable e);
	GLib.TreeIterable parent;
}

In Gtk+ we would have the view, of course. It would hold the columns, as it should be.

class Gtk.TreeView {
	int n_columns;
	GLib.Type get_column_type (int n);
	GLib.TreeModel model;
	Gtk.ColumnBinding binding;
	Gtk.TreeView (GLib.TreeModel m);
	GLib.ColumnBinding column_binding;
}

We don’t have guaranteed introspection in Gtk+. To do the binding between a column in the view and a property of an instance in the model we need some code. In Gtk.TreeModel this is the get_value function.

It shouldn’t be part of the Gtk.TreeModel: That way it ain’t reusable and will it require each person implementing a Gtk.TreeModel to reinvent the code.

abstract class GLib.ColumnBinding {
	abstract GLib.Value get_value (GLib.TreeModel model,
	                               GLib.TreeIterable e,
	                               int column);
}

Let’s have some concrete column bindings:

class Gtk.TreeStoreColumnBinding : GLib.ColumnBinding {
}

class Gtk.ListStoreColumnBinding : Gtk.TreeStoreColumnBinding {
}

If we do have introspection we can do the same thing .NET offers: Link up the column number with a property name that can be found in the type of the instances that the model holds.

class GI.IntrospectColumnBinding : GLib.ColumnBinding {
	void add_column (int column, string prop_name);
}

These wouldn’t change at all, except that they implement GLib.TreeModel instead of Gtk.TreeModel

class Gtk.TreeStore : GLib.TreeModel {
}

class Gtk.ListStore : GLib.TreeModel {
}

And then we are at Gtk.TreeModel, of course. Well just take everything that we don’t do yet. That’s the row change emissions, right? Personally I think rows are too specific. A model is something that can be iterated. Being iterable doesn’t mean that you have rows, it just means that you have things that the consumer, the view in a model’s case, can iterate to. Let’s call them nodes.

Gtk.TreePath sounds to me like serializing and deserializing a location. It’s nothing special, just a way to formulate pointing to a node in the tree. It’s the model that exposes this capability.

I’m not sure about flags. Maybe it should just be moved to Gtk.TreeView. I don’t get the point of the flags anyway. Both ITERS_PERSIST and LIST_ONLY sound like an implementation detail to me: not something you want to expose to the API anyway. But fine, for sake of completeness I’ll put it here.

interface GLib.TreeModel : GLib.TreeIterable {
	signal node_changed (GLib.TreeIterable e);
	signal node_inserted (GLib.TreeIterable e);
	signal node_deleted  (GLib.TreeIterable e);
	signal node_reordered (GLib.TreeIterable e);
	GLib.TreeModelFlags flags;
	GLib.TreePath get_path (GLib.TreeIterable e);
	GLib.TreeIterable get_node (GLib.TreePath p);
}

Who’ll start GLib 4.0? Let’s do this stuff while the desktop guys play with GNOME 3.0? Why not?

Async with the mainloop

A technique that we started using in Tracker is utilizing the mainloop to do asynchronous functions. We decided that avoiding threads is often not a bad idea.

Instead of instantly falling back to throwing work to a worker thread we try to encapsulate the work into a GSource’s callback, then we let the callback happen until all of the work is done.

An example

You probably know sqlite3’s backup API? If not, it’s fairly simple: you do sqlite3_backup_init, followed by a bunch of sqlite3_backup_step calls, finalizing with sqlite3_backup_finish. How does that work if we don’t want to block the mainloop?

I removed all error handling for keeping the code snippet short. If you want that you can take a look at the original code.

static gboolean
backup_file_step (gpointer user_data)
{
  BackupInfo *info = user_data; int i;
  for (i = 0; i < 100; i++) {
    if ((info->result = sqlite_backup_step(info->backup_db, 5)) != SQLITE_OK)
        return FALSE;
  }
  return TRUE;
}

static void
backup_file_finished (gpointer user_data)
{
  BackupInfo *info = user_data;
  GError *error = NULL;
  if (info->result != SQLITE_DONE) {
    g_set_error (&error, _DB_BACKUP_ERROR,
                 DB_BACKUP_ERROR_UNKNOWN,
                 "%s", sqlite3_errmsg (
                    info->backup_db));
  }
  if (info->finished)
    info->finished (error, info->user_data);
  if (info->destroy)
    info->destroy (info->user_data);
  g_clear_error (&error);
  sqlite3_backup_finish (info->backup);
  sqlite3_close (info->db);
  sqlite3_close (info->backup_db);
  g_free (info);
}

void
my_function_make_backup (const gchar *dbf, OnBackupFinished finished,
                         gpointer user_data, GDestroyNotify destroy)
{
  BackupInfo *info = g_new0(BackupInfo, 1);
  info->user_data = user_data;
  info->destroy = destroy;
  info->finished = finished;
  info->db = db;
  sqlite3_open_v2 (dbf, &info->db, SQLITE_OPEN_READONLY, NULL);
  sqlite3_open ("/tmp/backup.db", &info>backup_db);
  info->backup = sqlite3_backup_init (info->backup_db, "main",
                                      info->db, "main");
  g_idle_add_full (G_PRIORITY_DEFAULT, backup_file_step,
                   info, backup_file_finished);
}

Note that I’m not suggesting to throw away all your threads and GThreadPool uses now.
Note that just like with threads you have to be careful about shared data: this way you’ll allow that other events on the mainloop will interleave your backup procedure. This is async(ish), it’s precisely what you want, of course.

More introduction to RDF and SPARQL

Introduction

I plan to give an introduction to features like COUNT, FILTER REGEX and GROUP BY which are supported by Tracker‘s SPARQL engine. We support more such features but I have to start the introduction somewhere. And overloading people with introductions to all features wont help me much with explaining things.

Since my last introduction to RDF and SPARQL I have added a few relationships and actors to the game.

We have Morrel, Max and Sasha being dogs, Sheeba and Query are cats, Picca is still a parrot, Fred and John are contacts. Fred claims that John is his friend. I changed the ontology to allow friendships between the animals too: Sasha claims that Morrel and Max are her friends. Sheeba claims Query is her friend. John bought Query. Fred being inspired by John decided to also get some pets: Morrel, Sasha and Sheeba.

Ontology

Let’s put this story in Turtle:

<test:Picca> a test:Parrot, test:Pet ;
	test:name "Picca" .

<test:Max> a test:Dog, test:Pet ;
	test:name "Max" .

<test:Morrel> a test:Dog, test:Pet ;
	test:name "Morrel" ;
	test:hasFriend <test:Max> .

<test:Sasha> a test:Dog, test:Pet ;
	test:name "Sasha" ;
	test:hasFriend <test:Morrel> ;
	test:hasFriend <test:Max> .

<test:Sheeba> a test:Cat, test:Pet ;
	test:name "Sheeba" ;
	test:hasFriend <test:Query> .

<test:Query> a test:Cat, test:Pet ;
	test:name "Query" .

<test:John> a test:Contact ;
	test:owns <test:Max> ;
	test:owns <test:Picca> ;
	test:owns <test:Query> ;
	test:name "John" .

<test:Fred> a test:Contact ;
	test:hasFriend <test:John> ;
	test:name "Fred" ;
	test:owns <test:Morrel> ;
	test:owns <test:Sasha> ;
	test:owns <test:Sheeba> .

Querytime!

Let’s first start with all friend relationships:

SELECT ?subject ?friend
WHERE { ?subject test:hasFriend ?friend }

  test:Morrel, test:Max
  test:Sasha, test:Morrel
  test:Sasha, test:Max
  test:Sheeba, test:Query
  test:Fred, test:John

Just counting these is pretty simple. In SPARQL all selectable fields must have a variable name, so we add the “as c” here.

SELECT COUNT (?friend) AS c
WHERE { ?subject test:hasFriend ?friend }

  5

We counted friend relationships, of course. Let’s say we want to count how many friends each subject has. This is a more interesting query than the previous one.

SELECT ?subject COUNT (?friend) AS c
WHERE { ?subject test:hasFriend ?friend }
GROUP BY ?subject

  test:Fred, 1
  test:Morrel, 1
  test:Sasha, 2
  test:Sheeba, 1

Actually, we’re only interested in the human friends:

SELECT ?subject COUNT (?friend) AS c
WHERE { ?subject test:hasFriend ?friend .
        ?friend a test:Contact
} GROUP BY ?subject

  test:Fred, 1

No no, we are only interested in friends that are either cats or dogs:

SELECT ?subject COUNT (?friend) AS c
WHERE { ?subject test:hasFriend ?friend .
       ?friend a ?type .
       FILTER ( ?type = test:Dog || ?type = test:Cat)
} GROUP BY ?subject"

  test:Morrel, 1
  test:Sasha, 2
  test:Sheeba, 1

Now we are only interested in friends that are either a cat or a dog, but whose name starts with a ‘S’.

SELECT ?subject COUNT (?friend) as c
WHERE { ?subject test:hasFriend ?friend ;
                 test:name ?n .
       ?friend a ?type .
       FILTER ( ?type = test:Dog || ?type = test:Cat) .
       FILTER REGEX (?n, '^S', 'i')
} GROUP BY ?subject

  test:Sasha, 2
  test:Sheeba, 1

Conclusions

Should we stop talking about ontologies and start talking about searchboxes and user interfaces instead? Although I certainly agree more UI-stuff is needed, I’m not sure yet. RDF and SPARQL are also about relationships and roles. Not just about matching stuff. Whenever we explain the new Tracker to people, most are stuck with ‘matching’ in their mind. They don’t think about a lot of other use-cases.

Such a search is just one use-case starting point: user entered a random search string and gives zero other meaning about what he needs. Many more situations can be starting points: When I select a contact in a user interface designed to show an archive of messages that he once sent to me, the searchbox becomes much more narrow, much more helpful.

As soon as you have RDF and SPARQL, and with Tracker you do, an application developer can start taking into account relationships between resources: The relationship between a contact in Instant Messaging and the attachments in an E-mail that he as a person has sent to you. Why not combine it with friendship relationships synced from online services?

With a populated store you can make the relationship between a friend who joined you on a trip, and photos of a friend of your friend who suggested the holiday location.

With GeoClue integration we could link his photos up with actual location markers. You’d find these photos that came from the friend of your friend, and we could immediately feed the location markers to the GPS software on your phone.

I really hope application developers have more imagination than just global searchboxes.

And this is just a use-case that is technically already possible with today’s high-end phones.