RE: Scudraketten

Wanneer Isabel Albers iets schrijft ben ik aandachtig: wij moeten investeren in infrastructuur.

Dit creëert welvaart, distribueert efficiënt het geld en investeert in onze kinderen hun toekomst: iets wat nodig is; en waar we voor staan.

De besparingsinspanningen kunnen we beperken wat betreft investeringen in infrastructuur; we moeten ze des te meer doorvoeren wat betreft andere overheidsuitgaven.

Misschien moeten we bepaalde scudraketten lanceren? Een scudraket op de overheidsomvang zou geen slecht doen.

Een week mediastorm meemaken over hoe hard we snoeien in bepaalde overheidssectoren: laten we dat hard en ten gronde doen.

Laten we tegelijk investeren in de Belgische infrastructuur. Laten we veel investeren.

Rusland

De cultuur die heerst, leeft,  ploert en zweet op het grondgebied van Rusland zal niet verdwijnen. Zelfs niet na een nucleaire aanval tenzij die echt verschrikkelijk wreedaardig is. En dan zou ik ons verdommen dat we dat gedaan hebben zoals ik hoor te doen. Wij, Europeanen, moeten daar mee leven zoals zij met ons moeten leven.

We delivered

Damned guys, we’re too shy about what we delivered. When the N900 was made public we flooded the planets with our blogs about it. And now?

I’m proud of the software on this device. It’s good. Look at what Engadget is writing about it! Amazing. We should all be proud! And yes, I know about the turbulence in Nokia-land. Deal with it, it’s part of our job. Para-commandos don’t complain that they might get shot. They just know. It’s called research and development! (I know, bad metaphor)

I don’t remember that many good reviews about even the N900, and that phone was by many of its owners seen as among the best they’ve ever owned. Now is the time to support Harmattan the same way we passionately worked on the N900 and its predecessor tablets (N810, N800 and 770). Even if the N9’s future is uncertain: who cares? It’s mostly open source! And not open source in the ‘Android way’. You know what I mean.

The N9 will be a good phone. The Harmattan software is awesome. Note that Tracker and QSparql are being used by many of its standard applications. We have always been allowed to develop Tracker the way it’s supposed to be done. Like many other similar projects: in upstream.

As for short term future I can announce that we’re going to make Michael Meeks happy by finally solving the ever growing journal problem. Michael repeatedly and rightfully complained about this to us at conferences. Thanks Michael. I’ll write about how we’ll do it, soon. We have some ideas.

We have many other plans for long term future. But let’s for now work step by step. Our software, at least what goes to Harmattan, must be rock solid and very stable from now on. Introducing a serious regression would be a catastrophe.

I’m happy because with that growing journal – problem, I can finally focus on a tough coding problem again. I don’t like bugfixing-only periods. But yeah, I have enough experience to realize that sometimes this is needed.

And now, now we’re going to fight.

IPC performance improvements for insert queries

Although with SQLite WAL we have direct-access now, we don’t support direct-access for insert and delete SPARQL queries. Those queries when made using libtracker-sparql still go over D-Bus using Adrien’s FD passing D-Bus IPC technique. The library will do that for you.

After investigating a performance analysis by somebody from Intel we learned that there is still a significant overhead per each IPC call. In the analysis the person made miner-fs combine multiple insert transactions together and then send it over as a single big transaction. This was noticeably faster than making many individual IPC requests.

The problem with this is that if one of the many insert queries fail, they all fail: not good.

We’re now experimenting with a private API that allows you to pass n individual insert transactions, and get n errors back, using one IPC call.

The numbers are promising even on Desktop D-Bus (the test):

$ cd tests/functional-tests/
$ ./update-array-performance-test
First run (first update then array)
Array: 0.103675, Update: 0.139094
Reversing run (first array then update)
Array: 0.290607, Update: 0.161749
$ ./update-array-performance-test
First run (first update then array)
Array: 0.105920, Update: 0.137554
Reversing run (first array then update)
Array: 0.118785, Update: 0.130630
$ ./update-array-performance-test
First run (first update then array)
Array: 0.108501, Update: 0.136524
Reversing run (first array then update)
Array: 0.117308, Update: 0.151192
$

We’re now deciding whether or not the API will become public; returning arrays of errors isn’t exactly ‘nice’ or ‘standard’.

LRU cache for prepared statements in Tracker’s RDF store

While trying to handle a bug that had a description like “if I do this, tracker-store’s memory grows to 80MB and my device starts swapping”, we where surprised to learn that a sqlite3_stmt consumes about 5 kb heap. Auwch.

Before we didn’t think that those prepared statements where very large, so we threw all of them in a hashtable for in case the query was ran again later. However, if you collect thousands of such statements, memory consumption obviously grows.

We decided to implement a LRU cache for these prepared statements. For clients that access the database using direct-access the cache will be smaller, so that max consumption is only a few megabytes. Because our INSERT and DELETE queries are more reusable than SELECT queries, we split it into two different caches per thread.

The implementation is done with a simple intrinsic linked ring list. We’re still testing it a little bit to get good cache-size numbers. I guess it’ll go in master soon. For your testing pleasure you can find the branch here.

Support for SPARQL IN and NOT IN, the new class signals

I made some documentation about our SPARQL-IN feature that we recently added. I added some interesting use-cases like doing an insert and a delete based on in values.

For the new class signal API that we’re developing this and next week, we’ll probably emit the IDs that tracker:id() would give you if you’d use that on a resource. This means that IN is very useful for the purpose of giving you metadata of resources that are in the list of IDs that you just received from the class signal.

We never documented tracker:id() very much, as it’s not an RDF standard; rather it’s something Tracker specific. But neither are the class signals a RDF standard; they are Tracker specific too. I guess here that makes it usable in combo and turns the status of ‘internal API’, irrelevant.

We’re right now prototyping the new class signals API. It’ll probably be a “sa(iii)a(iii)”:

That’s class-name and two arrays of subject-id, predicate-id, object-id. The class-name is to allow D-Bus filtering. The first array are the deletes and the second are the inserts. We’ll only give you object-ids of non-literal objects (literal objects have no internal object-id). This means that we don’t throw literals to you in the signal (you need to make a query to get them, we’ll throw 0 to you in the signal).

We give you the object-ids because of a use-case that we didn’t cover yet:

Given triple <a> nie:isLogicalPartOf <b>. When <a> is deleted, how do you know <b> during the signal? So the feature request was to do a select ?b { <a> nie:isLogicalPartOf ?b } when <a> is deleted (so the client couldn’t do that query anymore).

With the new signal we’ll give you the ID of <b> when <a> is deleted. We’ll also implement a tracker:uri(integer id) allowing you to get <b> out of that ID. It’ll do something like this, but then much faster: select ?subject { ?subject a rdfs:Resource . FILTER (tracker:id(?subject) IN (%d)) }

I know there will be people screaming for all objects, also literals, in the signals, but we don’t want to flood your D-Bus daemon with all that data. Scream all you want. Really, we don’t. Just do a roundtrip query.

Domain indexes finished, technical conclusions

The support for domain specific indexes is, awaiting review / finished. Although we can further optimize it now. More on that later in this post. Image that you have this ontology:

nie:InformationElement a rdfs:Class .

nie:title a rdf:Property ;
  nrl:maxCardinality 1 ;
  rdfs:domain nie:InformationElement ;
  rdfs:range xsd:string .

nmm:MusicPiece a rdfs:Class ;
  rdfs:subClassOf nie:InformationElement .

nmm:beatsPerMinute a rdf:Property ;
  nrl:maxCardinality 1 ;
  rdfs:domain nmm:MusicPiece ;
  rdfs:range xsd:integer .

With that ontology there are three tables called “Resource”, “nmo:MusicPiece” and “nie:InformationElement” in SQLite’s schema:

  • The “Resource” table has ID and the subject string
  • The “nie:InformationElement” has ID and “nie:title”
  • The “nmm:MusicPiece” one has ID and “nmm:beatsPerMinute”

That’s fairly simple, right? The problem is that when you ORDER BY “nie:title” that you’ll cause a full table scan on “nie:InformationElement”. That’s not good, because there are less “nmm:MusicPiece” records than “nie:InformationElement” ones.

Imagine that we do this SPARQL query:

SELECT ?title WHERE {
   ?resource a nmm:MusicPiece ;
             nie:title ?title
} ORDER BY ?title

We translate that, for you, to this SQL on our schema:

SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nie:InformationElement2"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1",
         "nie:InformationElement" AS "nie:InformationElement2"
  WHERE  "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
  AND    "title_u" IS NOT NULL
) ORDER BY "title_u"

OK, so with support for domain indexes we change the ontology like this:

nmm:MusicPiece a rdfs:Class ;
  rdfs:subClassOf nie:InformationElement ;
  tracker:domainIndex nie:title .

Now we’ll have the three tables called “Resource”, “nmo:MusicPiece” and “nie:InformationElement” in SQLite’s schema. But they will look like this:

  • The “Resource” table has ID and the subject string
  • The “nie:InformationElement” has ID and “nie:title”
  • The “nmm:MusicPiece” table now has three columns called ID, “nmm:beatsPerMinute” and “nie:title”

The same data, for titles of music pieces, will be in both “nie:InformationElement” and “nmm:MusicPiece”. We copy to the mirror column during ontology change coping, and when new inserts happen.

When now the rdf:type is known in the SPARQL query as a nmm:MusicPiece, like in the query mentioned earlier, we know that we can use the “nie:title” from the “nmm:MusicPiece” table in SQLite. That allows us to generate you this SQL query:

SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nmm:MusicPiece1"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1"
  WHERE  "title_u" IS NOT NULL
) ORDER BY "title_u"

A remaining optimization is when you request a rdf:type that is a subclass of nmm:MusicPiece, like this:

SELECT ?title WHERE {
  ?resource a nmm:MusicPiece, nie:InformationElement ;
            nie:title ?title
} ORDER BY ?title

It’s still not as bad as now the “nie:title” is still taken from the “nmm:MusicPiece” table. But the join with “nie:InformationElement” is still needlessly there (we could just do the earlier SQL query in this case):

SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nmm:MusicPiece1"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1",
         "nie:InformationElement" AS "nie:InformationElement2"
  WHERE  "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
  AND    "title_u" IS NOT NULL
) ORDER BY "title_u"

We will probably optimize this specific use-case further later this week.

Friday’s performance improvements in Tracker

The crawler’s modification time queries

Yesterday we optimized the crawler’s query that gets the modification time of files. We use this timestamp to know whether or not a file must be reindexed.

Originally, we used a custom SQLite function called tracker:uri-is-parent() in SPARQL. This, however, caused a full table scan. As long as your SQL table for nfo:FileDataObjects wasn’t too large, that wasn’t a huge problem. But it didn’t scale linear. I started with optimizing the function itself. It was using a strlen() so I replaced that with a sqlite3_value_bytes(). We only store UTF-8, so that worked fine. It gained me ~ 10%; not enough.

So this commit was a better improvement. First it makes nfo:belongsToContainer an indexed property. The x nfo:belongsToContainer p means x is in a directory p for file resources. The commit changes the query to use the property that is now indexed.

The original query before we started with this optimization took 1.090s when you had ~ 300,000 nfo:FileDataObject resources. The new query takes about 0.090s. It’s of course an unfair comparison because now we use an indexed property. Adding the index only took a total of 10s for a ~ 300,000 large table and the table is being queried while we index (while we insert into it). Do the math, it’s a huge win in all situations. For the SQLite freaks; the SQLite database grew by 4 MB, with all items in the table indexed.

PDF extractor

Another optimization I did earlier was the PDF extractor. Originally, we used the poppler-glib library. This library doesn’t allow us to set the OutputDev at runtime. If compiled with Cairo, the OutputDev is in some versions a CairoOutputDev. We don’t want all images in the PDF to be rendered to a Cairo surface. So I ported this back to C++ and made it always use a TextOutputDev instead. In poppler-glib master this appears to have improved (in git master poppler_page_get_text_page is always using a TextOutputDev).

Another major problem with poppler-glib is the huge amount of copying strings in heap. The performance to extract metadata and content text for a 70 page PDF document without any images went from 1.050s to 0.550s. A lot of it was caused by copying strings and GValue boxing due to GObject properties.

Table locked problem

Last week I improved D-Bus marshaling by using a database cursor. I forgot to handle SQLITE_LOCKED while Jürg and Carlos had been introducing multithreaded SELECT support. Not good. I fixed this; it was causing random Table locked errors.

Performance DBus handling of the query results in Tracker’s RDF service

Before

For returning the results of a SPARQL SELECT query we used to have a callback like this. I removed error handling, you can find the original here.

We need to marshal a database result_set to a GPtrArray because dbus-glib fancies that. This is a lot of boxing the strings into GValue and GStrv. It does allocations, so not good.

static void
query_callback(TrackerDBResultSet *result_set,GError *error,gpointer user_data)
{
  TrackerDBusMethodInfo *info = user_data;
  GPtrArray *values = tracker_dbus_query_result_to_ptr_array (result_set);
  dbus_g_method_return (info->context, values);
  tracker_dbus_results_ptr_array_free (&values);
}

void
tracker_resources_sparql_query (TrackerResources *self, const gchar *query,
                                DBusGMethodInvocation *context, GError **error)
{
  TrackerDBusMethodInfo *info = ...; guint request_id;
  TrackerResourcesPrivate *priv= ...; gchar *sender;
  info->context = context;
  tracker_store_sparql_query (query, TRACKER_STORE_PRIORITY_HIGH,
                              query_callback, ...,
                              info, destroy_method_info);
}

After

Last week I changed the asynchronous callback to return a database cursor. In SQLite that means an sqlite3_step(). SQLite returns const pointers to the data in the cell with its sqlite3_column_* APIs.

This means that now we’re not even copying the strings out of SQLite. Instead, we’re using them as const to fill in a raw DBusMessage:

static void
query_callback(TrackerDBCursor *cursor,GError *error,gpointer user_data)
{
  TrackerDBusMethodInfo *info = user_data;
  DBusMessage *reply; DBusMessageIter iter, rows_iter;
  guint cols; guint length = 0;
  reply = dbus_g_method_get_reply (info->context);
  dbus_message_iter_init_append (reply, &iter);
  cols = tracker_db_cursor_get_n_columns (cursor);
  dbus_message_iter_open_container (&iter, DBUS_TYPE_ARRAY,
                                    "as", &rows_iter);
  while (tracker_db_cursor_iter_next (cursor, NULL)) {
    DBusMessageIter cols_iter; guint i;
    dbus_message_iter_open_container (&rows_iter, DBUS_TYPE_ARRAY,
                                      "s", &cols_iter);
    for (i = 0; i < cols; i++, length++) {
      const gchar *result_str = tracker_db_cursor_get_string (cursor, i);
      dbus_message_iter_append_basic (&cols_iter,
                                      DBUS_TYPE_STRING,
                                      &result_str);
    }
    dbus_message_iter_close_container (&rows_iter, &cols_iter);
  }
  dbus_message_iter_close_container (&iter, &rows_iter);
  dbus_g_method_send_reply (info->context, reply);
}

Results

The test is a query on 13500 resources where we ask for two strings, repeated eleven times. I removed a first repeat from each round, because the first time the sqlite3_stmt still has to be created. This means that our measurement would get a few more milliseconds. I also directed the standard out to /dev/null to avoid the overhead created by the terminal. The results you see below are the value for “real”.

There is of course an overhead created by the “tracker-sparql” program. It does demarshaling using normal dbus-glib. If your application uses DBusMessage directly, then it can avoid the same overhead. But since for both rounds I used the same “tracker-sparql” it doesn’t matter for the measurement.

$ time tracker-sparql -q "SELECT ?u  ?m { ?u a rdfs:Resource ;
          tracker:modified ?m }" > /dev/null

Without the optimization:

0.361s, 0.399s, 0.327s, 0.355s, 0.340s, 0.377s, 0.346s, 0.380s, 0.381s, 0.393s, 0.345s

With the optimization:

0.279s, 0.271s, 0.305s, 0.296s, 0.295s, 0.294s, 0.295s, 0.244s, 0.289s, 0.237s, 0.307s

The improvement ranges between 7% and 40% with average improvement of 22%.

Focus on query performance

Every (good) developer knows that copying of memory and boxing, especially when dealing with a large amount of pieces like members of collections and the cells in a table, are a bad thing for your performance.

More experienced developers also know that novice developers tend to focus on just their algorithms to improve performance, while often the single biggest bottleneck is needless boxing and allocating. Experienced developers come up with algorithms that avoid boxing and copying; they master clever pragmatical engineering and know how to improve algorithms. A lot of newcomers use virtual machines and script languages that are terrible at giving you the tools to control this and then they start endless religious debates about how great their programming language is (as if it matters). (Anti-.NET people don’t get on your horses too soon: if you know what you are doing, C# is actually quite good here).

We were of course doing some silly copying ourselves. Apparently it had a significant impact on performance.

Once Jürg and Carlos have finished the work on parallelizing SELECT queries we plan to let the code that walks the SQLite statement fill in the DBusMessage directly without any memory copying or boxing (for marshalling to DBus). We found the get_reply and send_reply functions; they sound useful for this purpose.

I still don’t really like DBus as IPC for data transfer of Tracker’s RDF store’s query results. Personally I think I would go for a custom Unix socket here. But Jürg so far isn’t convinced. Admittedly he’s probably right; he’s always right. Still, DBus to me doesn’t feel like a good IPC for this data transfer..

We know about the requests to have direct access to the SQLite database from your own process. I explained in the bug that SQLite3 isn’t MVCC and that this means that your process will often get blocked for a long time on our transaction. A longer time than any IPC overhead takes.

Supporting ontology changes in Tracker

It used to be in Tracker that you couldn’t just change the ontology. When you did, you had to reboot the database. This means loosing all the non-embedded data. For example your tags or other such information that’s uniquely stored in Tracker’s RDF store.

This was of course utterly unacceptable and this was among the reasons why we kept 0.8 from being released for so long: we were afraid that we would need to make ontology changes during the 0.8 series.

So during 0.7 I added support for what I call modest ontology changes. This means adding a class, adding a property. But just that. Not changing an existing property. This was sufficient for 0.8 because now we could at least do some changes like adding a property to a class, or adding a new class. You know, making implementing the standard feature requests possible.

Last two weeks I worked on supporting more intrusive ontology changes. The branch that I’m working on currently supports changing tracker:notify for the signals on changes feature, tracker:writeback for the writeback features and tracker:indexed which controls the indexes in the SQLite tables.

But also certain range changes are supported. For example integer to string, double and boolean. String to integer, double and boolean. Double to integer, string and boolean. Range changes will sometimes of course mean data loss.

Plenty of code was also added to detect an unsupported ontology change and to ensure that we just abort the process and don’t do any changes in that case.

It’s all quite complex so it might take a while before the other team members have tested and reviewed all this. It should probably take even longer before it hits the stable 0.8 branch.

We wont yet open the doors to custom ontologies. Several reasons:

  • We want more testing on the support for ontology changes. We know that once we open the doors to custom ontologies that we’ll see usage of this rather sooner than later.
  • We don’t yet support removing properties and classes. This would be easy (drop the table and columns away and log the event in the journal) but it’s not yet supported. Mostly because we don’t need it ourselves (which is a good reason).
  • We don’t want you to meddle with the standard ontologies (we’ll do that, don’t worry). So we need a bit of ontology management code to also look in other directories, etc.
  • The error handling of unsupported ontology changes shouldn’t abort like mentioned above. Another piece of software shouldn’t make Tracker unusable just because they install junk ontologies.
  • We actually want to start using OSCAF‘s ontology format. Perhaps it’s better that we wait for this instead of later asking everybody to convert their custom ontologies?
  • We’re a bunch of pussies who are afraid of the can of worms that you guys’ custom ontologies will open.

But yes, you could say that the basics are being put in place as we speak.

Zürichsee

Today after I brought Tinne to the airport I drove around Zürichsee. She can’t stay in Switzerland the entire month; she has to go back to school on Monday.

While driving on the Seestrasse I started counting luxury cars. After I reached two for Lamborgini and three for Ferrari I started thinking: Zimmerberg Sihltal and Pfannenstiel must be expensive districts tooAnd yes, they are.

I was lucky today that it was nice weather. But wow, what a nice view on the mountain tops when you look south over Zürichsee. People from Zürich, you guys are so lucky! Such immense calming feeling the view gives me! For me, it beats sauna. And I’m a real sauna fan.

I’m thinking to check it out south of Zürich. But not the canton. I think the house prices are just exaggerated high in the canton of Zürich. I was thinking Sankt Gallen, Toggenburg. I’ve never been there; I’ll check it out tomorrow.

Hmmr, meteoswiss gives rain for tomorrow. Doesn’t matter.

Actually, when I came back from the airport the first thing I really did was fix coping with property changes in ontologies for Tracker. Yesterday it wasn’t my day, I think. I couldn’t find this damn problem in my code! And in the evening I lost three chess games in a row against Tinne. That’s really a bad score for me. Maybe after two weeks of playing chess almost every evening, she got better than me? Hmmrr, that’s a troubling idea.

Anyway, so when I got back from the airport I couldn’t resist beating the code problem that I didn’t find on Friday. I found it! It works!

I guess I’m both a dreamer and a realist programmer. But don’t tell my customers that I’m such a dreamer.

Bern, an idyllic capital city

Today Tinne and I visited Switzerland’s capital, Bern.

We were really surprised; we’d never imagined that a capital city could offer so much peace and calm. It felt good to be there.

The fountains, the old houses, the river and the snowy mountain peaks give the city an idyllic image.

Standing on the bridge, you see the roofs of all these lovely small houses.

The bear is the symbol of Bern. Near the House of Parliament there was this statue of a bear. Tinne just couldn’t resist to give it a hug. Bern has also got real bears. Unfortunately, Tinne was not allowed to cuddle those bears.

The House of Parliament is a truly impressive building. It looks over the snowy mountains, its people and its treasury, the National Bank of Switzerland.


As you can imagine, the National Bank building is a master piece as well. And even more impressive; it issues a world leading currency.

On the market square in Oerlikon we first saw this chess board on the street; black and white stones and giant chess pieces. In Bern there was also a giant chess board in the backyard of the House of Parliament. Tinne couldn’t resist to challenge me for a game of chess. (*edit*, Armin noted in a comment that the initial position of knight and bishop are swapped. And OMG, he’s right!)

And she won!

At the House of Parliament you get a stunning, idyllic view on the mountains of Switzerland.


Writeback, writing metadata back into your files

Today, I feel like exposing you to some bleeding edge development going on as we speak at the Tracker team. I know you’re scared of that and that’s precisely why I want to expose you! Hah.

We are prototyping writeback support for Tracker.

With writeback we mean writing metadata that the user passes to us via SPARQL UPDATE into the file that he’s describing.

This means that it must be about a thing that is stored, that it must update a property that we want to writeback and it means that we need to support the format.

OK, that’s three requirements before we write anything back. Let’s explain how this stuff works in the prototype!

In our prototype you mark properties that are eligible for being written into the files using tracker:writeback.

It goes like this:

nie:title a rdf:Property ;
   rdfs:label "Title" ;
   rdfs:comment "The title of the document" ;
   rdfs:subPropertyOf dc:title ;
   nrl:maxCardinality 1 ;
   rdfs:domain nie:InformationElement ;
   rdfs:range xsd:string ;
   tracker:fulltextIndexed true ;
   tracker:weight 10 ;
   tracker:writeback true .

Next you need a writeback module for tracker-writeback. We implemented a prototype one that can only write the title of MP3 files. It uses ID3lib‘s C API.

When the user is describing a file, the resource must have nie:isStoredAs. The property being changed ‘s tracker:writeback must be true. We want the value of the property too. That’s simple in SPARQL, right? Sure it is!

SELECT ?url ?predicate ?object {
    <$subject> ?predicate ?object ;
               nie:isStoredAs ?url .
    ?predicate tracker:writeback true
 }

You’ll find this query in the code, go look!

Now it’s simple: using ID3lib we map Nepomuk to ID3 and write it.

No don’t be afraid, we’re not going to writeback metadata that we found ourselves. We’ll only writeback data that the user provided in the form of a SPARQL Update on the default graph. No panic. Besides, using tracker-writeback is going to be completely optional (just don’t run it).

This is a prototype, I repeat, this is a prototype. No expectations yet please. Just feel exposed to scary stuff, get overly excited and then join us by contributing. It’s all public what we’re doing in the branch ‘writeback’.

ps. Whether this will be Maemo’s future metadata-write stuff? Hmm, I don’t know. Do you know? ;-)

Keeping the autotools guys happy with qmake

I’m still figuring out how to do the same thing with cmake, but various bloggers and comments appear to be promising that it’ll be even more easy.

But this is a message for probably all Nokia teams who are making Qt-based libraries:

First open your src/src.pro file and add this stuff:

CONFIG += create_pc create_prl
QMAKE_PKGCONFIG_REQUIRES = QtGui
QMAKE_PKGCONFIG_DESTDIR = pkgconfig
INSTALLS += target headers

Now open your debian/$package-dev.install file and add this line:

usr/lib/pkgconfig

You’ll be doing all the autotools people a tremendous favor.

Next, open the README file and document that you need to use qmake-qt4 on Debian or make either qmake-qt3 or qmake-qt4 work flawlessly with your build environment. Perhaps also mention how to set the install prefix, how to make qmake find and install .pc files in another location, stuff like that. I find that this is lacking for almost every Qt-based library.

You’ll be doing everybody who wants to use your software a tremendous favor.

Indentation

People,

Let’s all stop doing this:

static void
my_calling_function_wrong (void)
{
[tab]MyItem1 *item1;
[tab]MyItem2 *item2;
[tab]MyItem3 *item3;

[tab]my_long_funcion (item1,
[tab][tab][tab][tab]..item2,
[tab][tab][tab][tab]..item3);
}

And start doing this:

static void
my_calling_function_right (void)
{
[tab]MyItem1 *item1;
[tab]MyItem2 *item2;
[tab]MyItem3 *item3;

[tab]my_long_funcion (item1,
[tab].................item2,
[tab].................item3);
}

The former doesn’t make sense unless each and every code viewing text display understands Mode lines’ tab-width property. The latter just always works, with every normal text editor.

ps. The super cool guys at Anjuta have already fixed this for me. I’m sure the even more cool EMacsers and the uber cool vimers can also fix their text editors?

Unnecessary note: [tab] is a tab and . is a space in the examples.

A reason to get up in the morning

Ever since Nokia contacted me about improving Tinymail to make it suitable for their Modest E-mail client have they given me a reason to get up in the morning, to work on something of which I knew would someday kick ass.

With the Maemo5 based Nokia N900 device we’ll have Modest shipped by default, and Tracker being actively used by several of its softwares. Future is going to shine even brighter for Tracker. Hard to brag about it, Tracker is inherently a background thing. Ah, well, technical people know about it.

Having worked on Tracker for more than a year, I now understand Tracker’s potential. At first, while I was trying to make an API for- and store the summary of E-mail envelope headers, so that E-mail clients can access this in a memory efficient way, I was critical of this Tracker stuff.

But then I joined Ivan, Urho, Ottela, Martyn and Carlos who were working on Tracker. Later Jürg joined and at the Berlin Hackfest people like Rob Taylor, Jürg and Urho discussed replacing Tracker’s poorer own ontology with Nepomuk and replacing its query language with SPARQL.

Given the implied complexity I was again critical, but then that crazy Jürg guy in a few weeks time turned Tracker into 99.9% pure fine awesomeness. I quickly joined working on this crazy “vstore” branch. Since a few months we have convinced the other Tracker guys to just start calling it “master”.

Ever since I feel again like a student who is learning how to develop software. Jürg is utilizing so many good techniques and we’re implementing so many specifications that are just “the right thing to do”, that the beautify of it all could sometimes make me cry of happiness.

Thanks to creating the opportunity to develop on software that will be used on for example their N900 device, Nokia continues giving people like me a reason to get up in the morning.

Don’t tell the native Nokians, but that’s why the N900 announcements secretly also made me a little bit proud. To whoever of us that worked on this stuff: guys, we’re all doing a great job. Let’s make the next one even better!

As it should be

Last week I wrote down why I believe the model should not have anything about columns. In .NET many people only ever used DataTables as their models. Because of that they often believe that in .NET the model must contain the columns.

They forget that DataGridView ‘s DataSource doesn’t require a DataTable at all, DataTable just happens to implement what DataSource needs: IList. It’s correct that if the model has all information that .NET’s many databinding components will get all the information they need out of your model. But it ain’t true that this is the only way nor a by-design in .NET. In .NET the by-design is that the view has all this and the model *can* pass it, if it has it, but it doesn’t have to.

In this example I illustrate that in .NET you can do a databinding with a simple .NET array. In .NET simple arrays implement IList. When a column of the DataGridView isn’t ReadOnly the property setter of the instance in the array will be called after the user edited the cell. I’ll illustrate this in the dataGridView1_CellEndEdit method: the property setter of the property Age of the Person instance being edited will be called. The view will as a result of a Refresh fetch the model’s new values. The Changed property will be rendered as True, for the Person that got changed.

People with VS.NET can drag a DataGridView and a Button on a Windows Form, and copypaste the Person class, button1_Click’s and dataGridView1_CellEndEdit’s code over. It’ll work.

// No DataTable, I'm not even importing System.Data, IList is fine
using System;
using System.Windows.Forms;

public partial class Form1 : Form
{
    public Form1() [+]

    private void button1_Click(object sender, EventArgs e) {
        dataGridView1.AutoGenerateColumns = false;

        DataGridViewColumn column;

        column = new DataGridViewTextBoxColumn();

        column.DataPropertyName = "Name";
        column.HeaderText = "The name of the person";
        column.Width = 180;

        // As you can see we are not doing anything on the model
        // to tell the view what the columns are.

        dataGridView1.Columns.Add(column);

        column = new DataGridViewTextBoxColumn();
        column.DataPropertyName = "Age";
        column.HeaderText = "The age";
        column.Width = 70;

        // Let's make this one editable
        column.ReadOnly = false;

        // We're just telling the view about the properties it
        // needs to bind, using the DataPropertyName member of
        // a DataGridViewColumn

        dataGridView1.Columns.Add(column);

        // Let's add a column that will show us that the view
        // will fetch property values at refresh

        column = new DataGridViewTextBoxColumn();
        column.DataPropertyName = "Changed";
        column.HeaderText = "?";
        column.Width = 45;

        dataGridView1.Columns.Add(column);

        // This is a normal array in .NET: it implements IList.
        // An IList is a collection with a known order.

        Person[] people = new Person[2];

        // Let's create two people in this array

        people[0] = new Person();
        people[0].Name = "Jos";
        people[0].Age = 30;
        people[0].Changed = false;

        people[1] = new Person();
        people[1].Name = "Jan";
        people[1].Age = 25;
        people[1].Changed = false;

        // And let's set the model of the view to be that array

        dataGridView1.DataSource = people;
        dataGridView1.CellEndEdit += new DataGridViewCellEventHandler(dataGridView1_CellEndEdit);
    }

    void dataGridView1_CellEndEdit(object sender, DataGridViewCellEventArgs e)
    {
        // This makes the view refresh its currently visible values, by reading
        // them from the model again. This callback happens after the user is
        // done editing a cell.

        dataGridView1.Refresh();
    }
}

public class Person {
    private string name, city;
    private uint age;
    private bool changed;
    public string Name {
        get { return name; }
        set { name = value; }
    }
    public bool Changed {
        get { return changed; }
        set { changed = value; }
    }
    public uint Age {
        get { return age; }
        set {
            age = value;
            Changed = true;
        }
    }
    public string City {
        get { return city; }
        set { city = value; }
    }
}

You can compare a GtkTreeModel with a DataTable in .NET: it’s a model that has its own memory storage and it contains both rows and columns. This means that GtkTreeModel isn’t a generic model, like IList in .NET actually is. With GtkTreeModel you must always represent your data as rows and columns. Even if the data ain’t rows and columns.

I indeed believe that Microsoft got databinding right in their .NET platform, and that Gtk+’s GtkTreeView and GtkTreeModel got it wrong.

Also feel free to have a huge array of Person instances. It’ll only read property values of the visible ones (plus a few more, shouldn’t be much). Fun tip: write something to the console in the property getters of the Person class, and start scrolling. Now you can easily discover yourself how to do lazy loading tricks with MVC in .NET, and make things scale.

TreeModel ZERO, a taste of life as it should be

If bugmasters are allowed to blog wishlists, then developers should also be allowed to write them! Which is why I wrote my wishlist!

Gtk.TreeModel was in my humble opinion designed wrong. In API design should an interface be just one thing.

A little bit of history

Many framework designers have repeated this in the past. Two of the best framework designers that we have on this planet, Krysztof Cwalina and Brad Abrams from Microsoft, added the meme to one of their books. It would be unfair to only mention those two guys and not the other people at Microsoft, and before that at the Delphi team at Borland. Brian Pepin notes at page 83 of Framework Design Guidelines: “Another sign that you’ve got a well defined interface is that the interface does exactly one thing. If you have an interface that has a grab bag of functionality, that’s a warning sign.”

The problem

What are the things a Gtk.TreeModel are or represent?

  • It’s something that is iterable
  • It’s something that is an iterator
  • It’s apparently something that has columns, which should have been at the View’s side of the story
  • It’s something that can be a tree
  • It’s something that emits row changes

That’s not one thing, and therefore we have a warning sign. If I count it correctly that’s at least five things, so that’s a big warning sign.

I’m sure I can come up with a few other things that a Gtk.TreeModel actually represents. For example its unref_node and ref_node make me think that it’s a garbage collector or something, too.

This is absolutely not good. I believe it is what makes the interface shockingly complicated. Because none of those five things can be made reusable this way.

What I think would be the right way

A prerequisite for this, and presumably also the reason why Gtk+ developers decided to do Gtk.TreeModel the way they did it a few years ago, is a collection framework.

Sadly is this proposal being ~ignored by the current GLib maintainers. Understandable because everybody is overloaded and busy, but in my opinion it’s nonetheless blocking us from heading in the right direction.

There are by the way quite a lot of other reasons mentioned on the proposal. This is just one of them.

interface GLib.Iterable {
	Iterator iterator();
}

interface GLib.Iterator {
	bool next ();
	object current;
}

Next would be recursive iterators or trees. There are many ways to represent these, but I’ll just take a simple route. Remember that when picking an API design, the most simple idea is often the most right one. But yeah, you can probably improve this.

interface GLib.TreeIterable : GLib.Iterable {
	GLib.TreeIterable get_children ();
	int n_children;
	bool has_child (GLib.TreeIterable e);
	GLib.TreeIterable parent;
}

In Gtk+ we would have the view, of course. It would hold the columns, as it should be.

class Gtk.TreeView {
	int n_columns;
	GLib.Type get_column_type (int n);
	GLib.TreeModel model;
	Gtk.ColumnBinding binding;
	Gtk.TreeView (GLib.TreeModel m);
	GLib.ColumnBinding column_binding;
}

We don’t have guaranteed introspection in Gtk+. To do the binding between a column in the view and a property of an instance in the model we need some code. In Gtk.TreeModel this is the get_value function.

It shouldn’t be part of the Gtk.TreeModel: That way it ain’t reusable and will it require each person implementing a Gtk.TreeModel to reinvent the code.

abstract class GLib.ColumnBinding {
	abstract GLib.Value get_value (GLib.TreeModel model,
	                               GLib.TreeIterable e,
	                               int column);
}

Let’s have some concrete column bindings:

class Gtk.TreeStoreColumnBinding : GLib.ColumnBinding {
}

class Gtk.ListStoreColumnBinding : Gtk.TreeStoreColumnBinding {
}

If we do have introspection we can do the same thing .NET offers: Link up the column number with a property name that can be found in the type of the instances that the model holds.

class GI.IntrospectColumnBinding : GLib.ColumnBinding {
	void add_column (int column, string prop_name);
}

These wouldn’t change at all, except that they implement GLib.TreeModel instead of Gtk.TreeModel

class Gtk.TreeStore : GLib.TreeModel {
}

class Gtk.ListStore : GLib.TreeModel {
}

And then we are at Gtk.TreeModel, of course. Well just take everything that we don’t do yet. That’s the row change emissions, right? Personally I think rows are too specific. A model is something that can be iterated. Being iterable doesn’t mean that you have rows, it just means that you have things that the consumer, the view in a model’s case, can iterate to. Let’s call them nodes.

Gtk.TreePath sounds to me like serializing and deserializing a location. It’s nothing special, just a way to formulate pointing to a node in the tree. It’s the model that exposes this capability.

I’m not sure about flags. Maybe it should just be moved to Gtk.TreeView. I don’t get the point of the flags anyway. Both ITERS_PERSIST and LIST_ONLY sound like an implementation detail to me: not something you want to expose to the API anyway. But fine, for sake of completeness I’ll put it here.

interface GLib.TreeModel : GLib.TreeIterable {
	signal node_changed (GLib.TreeIterable e);
	signal node_inserted (GLib.TreeIterable e);
	signal node_deleted  (GLib.TreeIterable e);
	signal node_reordered (GLib.TreeIterable e);
	GLib.TreeModelFlags flags;
	GLib.TreePath get_path (GLib.TreeIterable e);
	GLib.TreeIterable get_node (GLib.TreePath p);
}

Who’ll start GLib 4.0? Let’s do this stuff while the desktop guys play with GNOME 3.0? Why not?