Database cursors used in Tracker

A cursor on a query of a database is a finger pointing to the current row. Most databases do this without pulling the entire resultset into memory. It’s indeed much like a C/C++ pointer, except that a pointer can only point to memory in your process’ virtual memory. A cursor is a bit more abstract.

POSIX developers can compare a cursor with a pointer to a region in an mmap. For people who don’t know about mmap, mmap can be used to map a file into your process’ memory. You get a C/C++ pointer back, from which you can read the data as-if it’s in memory. With mmap, when you create a pagefault, the kernel will pull pages into your memory (from the file, or whichever resource is behind the mapping).

In Tracker all database operations used to be much like how using g_file_get_contents works: you read the entire thing into memory, and then you operate on that memory. Internally it used the database’s cursor API too, of course. The sqlite3_step is sqlite’s cursor API too.

First the database has filled up its pagecache with this data, then you copied it to your application’s memory, then you used it, then you freed it.

That’s kinda silly! Why not use it straight from the database’s caches instead? That’s what you use a DB cursor for.

The result is less copying of memory. This means less memory fragmentation and fewer memory operations to perform (which should result in a small performance improvement).

This effort is ongoing but a lot of Tracker’s internal loops over resultsets are now using a cursor instead of a in-memory result-set.

A reason to get up in the morning

Ever since Nokia contacted me about improving Tinymail to make it suitable for their Modest E-mail client have they given me a reason to get up in the morning, to work on something of which I knew would someday kick ass.

With the Maemo5 based Nokia N900 device we’ll have Modest shipped by default, and Tracker being actively used by several of its softwares. Future is going to shine even brighter for Tracker. Hard to brag about it, Tracker is inherently a background thing. Ah, well, technical people know about it.

Having worked on Tracker for more than a year, I now understand Tracker’s potential. At first, while I was trying to make an API for- and store the summary of E-mail envelope headers, so that E-mail clients can access this in a memory efficient way, I was critical of this Tracker stuff.

But then I joined Ivan, Urho, Ottela, Martyn and Carlos who were working on Tracker. Later Jürg joined and at the Berlin Hackfest people like Rob Taylor, Jürg and Urho discussed replacing Tracker’s poorer own ontology with Nepomuk and replacing its query language with SPARQL.

Given the implied complexity I was again critical, but then that crazy Jürg guy in a few weeks time turned Tracker into 99.9% pure fine awesomeness. I quickly joined working on this crazy “vstore” branch. Since a few months we have convinced the other Tracker guys to just start calling it “master”.

Ever since I feel again like a student who is learning how to develop software. Jürg is utilizing so many good techniques and we’re implementing so many specifications that are just “the right thing to do”, that the beautify of it all could sometimes make me cry of happiness.

Thanks to creating the opportunity to develop on software that will be used on for example their N900 device, Nokia continues giving people like me a reason to get up in the morning.

Don’t tell the native Nokians, but that’s why the N900 announcements secretly also made me a little bit proud. To whoever of us that worked on this stuff: guys, we’re all doing a great job. Let’s make the next one even better!

As it should be

Last week I wrote down why I believe the model should not have anything about columns. In .NET many people only ever used DataTables as their models. Because of that they often believe that in .NET the model must contain the columns.

They forget that DataGridView ‘s DataSource doesn’t require a DataTable at all, DataTable just happens to implement what DataSource needs: IList. It’s correct that if the model has all information that .NET’s many databinding components will get all the information they need out of your model. But it ain’t true that this is the only way nor a by-design in .NET. In .NET the by-design is that the view has all this and the model *can* pass it, if it has it, but it doesn’t have to.

In this example I illustrate that in .NET you can do a databinding with a simple .NET array. In .NET simple arrays implement IList. When a column of the DataGridView isn’t ReadOnly the property setter of the instance in the array will be called after the user edited the cell. I’ll illustrate this in the dataGridView1_CellEndEdit method: the property setter of the property Age of the Person instance being edited will be called. The view will as a result of a Refresh fetch the model’s new values. The Changed property will be rendered as True, for the Person that got changed.

People with VS.NET can drag a DataGridView and a Button on a Windows Form, and copypaste the Person class, button1_Click’s and dataGridView1_CellEndEdit’s code over. It’ll work.

// No DataTable, I'm not even importing System.Data, IList is fine
using System;
using System.Windows.Forms;

public partial class Form1 : Form
{
    public Form1() [+]

    private void button1_Click(object sender, EventArgs e) {
        dataGridView1.AutoGenerateColumns = false;

        DataGridViewColumn column;

        column = new DataGridViewTextBoxColumn();

        column.DataPropertyName = "Name";
        column.HeaderText = "The name of the person";
        column.Width = 180;

        // As you can see we are not doing anything on the model
        // to tell the view what the columns are.

        dataGridView1.Columns.Add(column);

        column = new DataGridViewTextBoxColumn();
        column.DataPropertyName = "Age";
        column.HeaderText = "The age";
        column.Width = 70;

        // Let's make this one editable
        column.ReadOnly = false;

        // We're just telling the view about the properties it
        // needs to bind, using the DataPropertyName member of
        // a DataGridViewColumn

        dataGridView1.Columns.Add(column);

        // Let's add a column that will show us that the view
        // will fetch property values at refresh

        column = new DataGridViewTextBoxColumn();
        column.DataPropertyName = "Changed";
        column.HeaderText = "?";
        column.Width = 45;

        dataGridView1.Columns.Add(column);

        // This is a normal array in .NET: it implements IList.
        // An IList is a collection with a known order.

        Person[] people = new Person[2];

        // Let's create two people in this array

        people[0] = new Person();
        people[0].Name = "Jos";
        people[0].Age = 30;
        people[0].Changed = false;

        people[1] = new Person();
        people[1].Name = "Jan";
        people[1].Age = 25;
        people[1].Changed = false;

        // And let's set the model of the view to be that array

        dataGridView1.DataSource = people;
        dataGridView1.CellEndEdit += new DataGridViewCellEventHandler(dataGridView1_CellEndEdit);
    }

    void dataGridView1_CellEndEdit(object sender, DataGridViewCellEventArgs e)
    {
        // This makes the view refresh its currently visible values, by reading
        // them from the model again. This callback happens after the user is
        // done editing a cell.

        dataGridView1.Refresh();
    }
}

public class Person {
    private string name, city;
    private uint age;
    private bool changed;
    public string Name {
        get { return name; }
        set { name = value; }
    }
    public bool Changed {
        get { return changed; }
        set { changed = value; }
    }
    public uint Age {
        get { return age; }
        set {
            age = value;
            Changed = true;
        }
    }
    public string City {
        get { return city; }
        set { city = value; }
    }
}

You can compare a GtkTreeModel with a DataTable in .NET: it’s a model that has its own memory storage and it contains both rows and columns. This means that GtkTreeModel isn’t a generic model, like IList in .NET actually is. With GtkTreeModel you must always represent your data as rows and columns. Even if the data ain’t rows and columns.

I indeed believe that Microsoft got databinding right in their .NET platform, and that Gtk+’s GtkTreeView and GtkTreeModel got it wrong.

Also feel free to have a huge array of Person instances. It’ll only read property values of the visible ones (plus a few more, shouldn’t be much). Fun tip: write something to the console in the property getters of the Person class, and start scrolling. Now you can easily discover yourself how to do lazy loading tricks with MVC in .NET, and make things scale.

TreeModel ZERO, a taste of life as it should be

If bugmasters are allowed to blog wishlists, then developers should also be allowed to write them! Which is why I wrote my wishlist!

Gtk.TreeModel was in my humble opinion designed wrong. In API design should an interface be just one thing.

A little bit of history

Many framework designers have repeated this in the past. Two of the best framework designers that we have on this planet, Krysztof Cwalina and Brad Abrams from Microsoft, added the meme to one of their books. It would be unfair to only mention those two guys and not the other people at Microsoft, and before that at the Delphi team at Borland. Brian Pepin notes at page 83 of Framework Design Guidelines: “Another sign that you’ve got a well defined interface is that the interface does exactly one thing. If you have an interface that has a grab bag of functionality, that’s a warning sign.”

The problem

What are the things a Gtk.TreeModel are or represent?

  • It’s something that is iterable
  • It’s something that is an iterator
  • It’s apparently something that has columns, which should have been at the View’s side of the story
  • It’s something that can be a tree
  • It’s something that emits row changes

That’s not one thing, and therefore we have a warning sign. If I count it correctly that’s at least five things, so that’s a big warning sign.

I’m sure I can come up with a few other things that a Gtk.TreeModel actually represents. For example its unref_node and ref_node make me think that it’s a garbage collector or something, too.

This is absolutely not good. I believe it is what makes the interface shockingly complicated. Because none of those five things can be made reusable this way.

What I think would be the right way

A prerequisite for this, and presumably also the reason why Gtk+ developers decided to do Gtk.TreeModel the way they did it a few years ago, is a collection framework.

Sadly is this proposal being ~ignored by the current GLib maintainers. Understandable because everybody is overloaded and busy, but in my opinion it’s nonetheless blocking us from heading in the right direction.

There are by the way quite a lot of other reasons mentioned on the proposal. This is just one of them.

interface GLib.Iterable {
	Iterator iterator();
}

interface GLib.Iterator {
	bool next ();
	object current;
}

Next would be recursive iterators or trees. There are many ways to represent these, but I’ll just take a simple route. Remember that when picking an API design, the most simple idea is often the most right one. But yeah, you can probably improve this.

interface GLib.TreeIterable : GLib.Iterable {
	GLib.TreeIterable get_children ();
	int n_children;
	bool has_child (GLib.TreeIterable e);
	GLib.TreeIterable parent;
}

In Gtk+ we would have the view, of course. It would hold the columns, as it should be.

class Gtk.TreeView {
	int n_columns;
	GLib.Type get_column_type (int n);
	GLib.TreeModel model;
	Gtk.ColumnBinding binding;
	Gtk.TreeView (GLib.TreeModel m);
	GLib.ColumnBinding column_binding;
}

We don’t have guaranteed introspection in Gtk+. To do the binding between a column in the view and a property of an instance in the model we need some code. In Gtk.TreeModel this is the get_value function.

It shouldn’t be part of the Gtk.TreeModel: That way it ain’t reusable and will it require each person implementing a Gtk.TreeModel to reinvent the code.

abstract class GLib.ColumnBinding {
	abstract GLib.Value get_value (GLib.TreeModel model,
	                               GLib.TreeIterable e,
	                               int column);
}

Let’s have some concrete column bindings:

class Gtk.TreeStoreColumnBinding : GLib.ColumnBinding {
}

class Gtk.ListStoreColumnBinding : Gtk.TreeStoreColumnBinding {
}

If we do have introspection we can do the same thing .NET offers: Link up the column number with a property name that can be found in the type of the instances that the model holds.

class GI.IntrospectColumnBinding : GLib.ColumnBinding {
	void add_column (int column, string prop_name);
}

These wouldn’t change at all, except that they implement GLib.TreeModel instead of Gtk.TreeModel

class Gtk.TreeStore : GLib.TreeModel {
}

class Gtk.ListStore : GLib.TreeModel {
}

And then we are at Gtk.TreeModel, of course. Well just take everything that we don’t do yet. That’s the row change emissions, right? Personally I think rows are too specific. A model is something that can be iterated. Being iterable doesn’t mean that you have rows, it just means that you have things that the consumer, the view in a model’s case, can iterate to. Let’s call them nodes.

Gtk.TreePath sounds to me like serializing and deserializing a location. It’s nothing special, just a way to formulate pointing to a node in the tree. It’s the model that exposes this capability.

I’m not sure about flags. Maybe it should just be moved to Gtk.TreeView. I don’t get the point of the flags anyway. Both ITERS_PERSIST and LIST_ONLY sound like an implementation detail to me: not something you want to expose to the API anyway. But fine, for sake of completeness I’ll put it here.

interface GLib.TreeModel : GLib.TreeIterable {
	signal node_changed (GLib.TreeIterable e);
	signal node_inserted (GLib.TreeIterable e);
	signal node_deleted  (GLib.TreeIterable e);
	signal node_reordered (GLib.TreeIterable e);
	GLib.TreeModelFlags flags;
	GLib.TreePath get_path (GLib.TreeIterable e);
	GLib.TreeIterable get_node (GLib.TreePath p);
}

Who’ll start GLib 4.0? Let’s do this stuff while the desktop guys play with GNOME 3.0? Why not?