Tracker this, Tracker that, everything Tracker

Busy handling

I made an article about reporting busy status in Tracker before.

But back then it wasn’t yet possible to queue a query while Tracker’s RDF store is busy. We’re making this possible in the next unstable release. Yeah, I know you guys hate that Tracker’s RDF store can be busy. But you tell us what else to do while restoring a backup, or while replaying a journal?

While we are replaying the journal, or restoring a backup, we’ll accept your result-hungry queries into our queue. Meanwhile you get progress and status indication over a DBus signal. Some documentation about this is available here.
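The queueing behaviour can be sketched in a few lines of Python. This is purely illustrative: none of these names are real Tracker API, and the real thing of course works over D-Bus rather than callbacks.

```python
# Illustrative sketch only, not actual Tracker API: queries that arrive
# while the store is busy are queued instead of refused, and run as soon
# as the store becomes available again.
from collections import deque

class RdfStore:
    def __init__(self):
        self._busy = False
        self._queue = deque()

    def set_busy(self, busy):
        # Busy while restoring a backup or replaying the journal.
        self._busy = busy
        if not busy:
            # Drain everything that queued up in the meantime.
            while self._queue:
                sparql, callback = self._queue.popleft()
                callback(self._run(sparql))

    def query(self, sparql, callback):
        if self._busy:
            self._queue.append((sparql, callback))  # accepted, not refused
        else:
            callback(self._run(sparql))

    def _run(self, sparql):
        return "results for: " + sparql
```

A caller simply never sees an error: the callback fires later instead of the query being rejected.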

SPARQL 1.1 Draft features: IN and NOT IN

We had a feature request for supporting SPARQL IN and NOT IN. As usual, we’re ahead of the SPARQL Draft specification. But I don’t think IN and NOT IN will look much different in the end. Anyway, it was straightforward, so I just implemented both.

It goes like this:

SELECT ?abc { ?abc a nie:InformationElement ;
                   nie:title ?title .
               FILTER (?title IN ('abc', 'def')) }
SELECT ?abc { ?abc a nie:InformationElement ;
                   nie:title ?title .
               FILTER (?title NOT IN ('xyz', 'def')) }

It’s particularly useful for getting metadata about a defined set of resources (give me the author of this, this and that file).
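Under the hood these filters map onto SQL’s IN and NOT IN. A rough sketch of that mapping with plain SQLite (a simplified table layout, not Tracker’s actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "nie:InformationElement" (ID INTEGER, "nie:title" TEXT)')
conn.executemany('INSERT INTO "nie:InformationElement" VALUES (?, ?)',
                 [(1, "abc"), (2, "def"), (3, "xyz")])

# FILTER (?title IN ('abc', 'def')) becomes a SQL IN:
matches = conn.execute('SELECT ID FROM "nie:InformationElement" '
                       'WHERE "nie:title" IN (?, ?)',
                       ("abc", "def")).fetchall()
# → [(1,), (2,)]

# FILTER (?title NOT IN ('xyz', 'def')) becomes a SQL NOT IN:
rest = conn.execute('SELECT ID FROM "nie:InformationElement" '
                    'WHERE "nie:title" NOT IN (?, ?)',
                    ("xyz", "def")).fetchall()
# → [(1,)]
```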

Direct access

This work is progressing nicely. Most of the guys on the team are working on this, and it’s going to be awesome thanks to SQLite’s WAL journal mode. SQLite’s WAL mode is still under development and probably unstable here and there, but we’re trusting the SQLite guys with this anyway.

What is left to do for direct-access is cleaning up a bit, getting the small nasty things right. You know. The basics are all in place now.

We’re doing most of the library code in Vala, but clever people can easily imagine the C API valac makes from the .vala files here. That’s the abstract API that client developers will use. Unless you use a higher level API like libqttracker, QSparql, Hormiga or sparql-glib.

All of which still need to be adapted to the direct-access work that we’re doing. But we’re in close contact with all of the developers involved in those libraries. And they’re all thrilled to implement backends for the new stuff.

Plans

We plan to change the signals-on-changes or class-signals feature a bit so that the three signals are merged into one. The problem with three is that you can’t reliably identify a change-transaction this way (a rename of a file, for example).

Another thing on our list is merging Zeitgeist’s ontology. To the other team members at Tracker: guys, Zeitgeist has been waiting for three months now. Let’s just get this done!

Oh there are a lot of plans, to be honest.

I wonder when, if ever, we’ll go into feature freeze. Hehe. I guess we’ll just have very short feature-freeze periods. Whatever, it’s fun.

MeeGo in cars

Hey BMW & co, if you guys want to learn how to write music players and playlists for car entertainment on MeeGo, get in touch! This Tracker that I’m talking about is on that MeeGo OS; serving as the music metadata database is among its purposes.

I can’t wait to have a better music player playlist in my car.

Or maybe some integration with the in-car GPS and the car owner’s appointments and meetings? With geo-tagged photos on the car owner’s phone? Automatic and instant synchronization with Nokia’s future phones? Sounds all very doable, even easy, to me. I’d want all that stuff. Use-cases!

Let’s talk!

Julian on TED

I try to avoid posting about the same subject twice in a row. But I also really think that Wikileaks is worth violating just about any such rule in existence. Maybe I should make a category on my blog just for Wikileaks?

So TED has decided to do an interview with Julian Assange:

I’d like to point out that I congratulate and thank everybody who’s involved, not just Julian. Thank you.

That today’s gonna be a good day

Today is the day the world is witnessing the most significant military leak in the history of mankind, so I have a feeling that today’s gonna be a good day.

To all the people at Wikileaks, and to all whistleblowers in past, present and future: you are heroes. Your ideas will be with us for centuries ahead of us. You’ll be remembered in history books. Let’s make sure you guys are.

Why make things complicated?

There are no open source companies. There are companies and there are open source projects.

Some companies work on open source projects, some parent open source projects, some don’t.

Some of those companies are good at fostering a community that contributes to these open source projects. Others are unwilling, and some don’t yet understand the process. And others still have some open source projects done by teams that do get it, and at the same time other projects done by teams that don’t. Actually, that last dual situation is the most common among the large companies. You know, the ones that often sponsor your community’s main conference and the ones that employ your heroes.

If you do a quick reality-check, you’ll conclude there are no black / white companies. Actually, nothing in life or ethics is black / white. Nothing at all.

What you do have is a small group of amazingly disturbing purists who do zero coding themselves (that is, near zero) but do think black / white, and consequently write a lot of absurd nonsense in blog-post comments (on Slashdot in particular), forums and mailing lists. These people are the reason numéro uno why many companies quit trying to understand open source.

It’s sad that the actual (open source) developers have to waste time explaining to the companies they consult for that these people can be ignored. It’s also sad that these purists have become so vocal, even violent, that they often can’t really be ignored anymore: people’s employers have been harassed.

“You have to fire somebody because he’s being unethical by disagreeing with my religious belief system that Microsoft is evil!” Maybe it’s just me who’s behind on ethics in this world? Well, those people can still get lost, because I, in ethics, disagree with them.

Now, let’s get back to the projects and away from the open source vs. open core debates. We have a lot of work to do. And a lot of companies to convince to open up their projects.

Open source developers succeeded in (for example) getting some software on phones. The people who did aren’t the religious black / white people. Maybe the media around open source should track down the people who did, and write quite a bit more about their work, ideas and passion?

Finally, the best companies are driven by the ideas and passions of their best employees. Those are the people you should admire. Not their company’s open core PR.

Neelie Kroes on open source


Video link

Wrapping up 4.57 billion years

In 4.57 billion years our solar system went from creating simple bacteria to hosting a large group of species, several of which are highly capable of making fairly intelligent decisions, and one of which is capable of having the indulgence of believing that it can think. That’s us.

The sun has an estimated 5 billion years to go before it turns into a Red Giant that in its very early stages will wipe out truly every single idea that exists inside at least our own solar system.

Unless the radio waves that our planet has been emitting since we invented radio are seen and understood (which requires a recipient in the first place), that will be the ultimate end of all of our ideas and culture. Unless we figure out a way to let the ideas cultivate outside of our solar system. Just the ideas would already be an insane achievement.

But imagine going from bacteria to beings, colonized by bacteria, that think that they can think, in far less time than the current age of our sun. Unless, of course, bacteria somehow arrived in our solar system from outside (unlikely, but perhaps just as unlikely as us ever exporting our ideas and culture to another solar system).

Imagine what could happen in the next 5 billion years …

Domain indexes finished, technical conclusions

The support for domain specific indexes is finished, awaiting review. Although we can further optimize it now; more on that later in this post. Imagine that you have this ontology:

nie:InformationElement a rdfs:Class .

nie:title a rdf:Property ;
  nrl:maxCardinality 1 ;
  rdfs:domain nie:InformationElement ;
  rdfs:range xsd:string .

nmm:MusicPiece a rdfs:Class ;
  rdfs:subClassOf nie:InformationElement .

nmm:beatsPerMinute a rdf:Property ;
  nrl:maxCardinality 1 ;
  rdfs:domain nmm:MusicPiece ;
  rdfs:range xsd:integer .

With that ontology there are three tables called “Resource”, “nmm:MusicPiece” and “nie:InformationElement” in SQLite’s schema:

  • The “Resource” table has ID and the subject string
  • The “nie:InformationElement” has ID and “nie:title”
  • The “nmm:MusicPiece” one has ID and “nmm:beatsPerMinute”

That’s fairly simple, right? The problem is that when you ORDER BY “nie:title” you’ll cause a full table scan on “nie:InformationElement”. That’s not good, because there are fewer “nmm:MusicPiece” records than “nie:InformationElement” ones.

Imagine that we do this SPARQL query:

SELECT ?title WHERE {
   ?resource a nmm:MusicPiece ;
             nie:title ?title
} ORDER BY ?title

We translate that, for you, to this SQL on our schema:

SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nie:InformationElement2"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1",
         "nie:InformationElement" AS "nie:InformationElement2"
  WHERE  "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
  AND    "title_u" IS NOT NULL
) ORDER BY "title_u"

OK, so with support for domain indexes we change the ontology like this:

nmm:MusicPiece a rdfs:Class ;
  rdfs:subClassOf nie:InformationElement ;
  tracker:domainIndex nie:title .

Now we’ll still have the three tables called “Resource”, “nmm:MusicPiece” and “nie:InformationElement” in SQLite’s schema. But they will look like this:

  • The “Resource” table has ID and the subject string
  • The “nie:InformationElement” has ID and “nie:title”
  • The “nmm:MusicPiece” table now has three columns called ID, “nmm:beatsPerMinute” and “nie:title”

The same data, for titles of music pieces, will be in both “nie:InformationElement” and “nmm:MusicPiece”. We copy the data to the mirror column while coping with the ontology change, and whenever new inserts happen.

When the rdf:type is known in the SPARQL query to be nmm:MusicPiece, like in the query mentioned earlier, we know that we can use the “nie:title” from the “nmm:MusicPiece” table in SQLite. That allows us to generate this SQL query for you:

SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nmm:MusicPiece1"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1"
  WHERE  "title_u" IS NOT NULL
) ORDER BY "title_u"
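The difference shows up in SQLite’s query plans. Here is a sketch with plain sqlite3 (a simplified schema, with an explicit CREATE INDEX standing in for the implicit index on the mirror column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
  CREATE TABLE "nie:InformationElement" (ID INTEGER PRIMARY KEY,
                                         "nie:title" TEXT);
  CREATE TABLE "nmm:MusicPiece" (ID INTEGER PRIMARY KEY,
                                 "nie:title" TEXT);
  CREATE INDEX "nmm:MusicPiece_nie:title"
      ON "nmm:MusicPiece" ("nie:title");
''')

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Without the domain index: join the two tables, then sort the result
# with a temporary b-tree ("USE TEMP B-TREE FOR ORDER BY").
joined = plan('SELECT "nie:InformationElement"."nie:title" '
              'FROM "nmm:MusicPiece", "nie:InformationElement" '
              'WHERE "nmm:MusicPiece".ID = "nie:InformationElement".ID '
              'ORDER BY 1')

# With the mirror column: the index already yields the titles in order,
# so no temporary b-tree shows up in the plan.
mirrored = plan('SELECT "nie:title" FROM "nmm:MusicPiece" '
                'ORDER BY "nie:title"')
```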

A remaining optimization is for when you request an rdf:type together with one of its superclasses, like nmm:MusicPiece here:

SELECT ?title WHERE {
  ?resource a nmm:MusicPiece, nie:InformationElement ;
            nie:title ?title
} ORDER BY ?title

It’s still not as bad as before, as the “nie:title” is now taken from the “nmm:MusicPiece” table. But the join with “nie:InformationElement” is still needlessly there (we could just generate the earlier SQL query in this case):

SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nmm:MusicPiece1"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1",
         "nie:InformationElement" AS "nie:InformationElement2"
  WHERE  "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
  AND    "title_u" IS NOT NULL
) ORDER BY "title_u"

We will probably optimize this specific use-case further later this week.

SQLite’s WAL, deleting a domain specific index

SQLite’s WAL

SQLite is working on WAL, which stands for Write Ahead Logging.

The new logging technique means that we can probably keep read statements open for multiple processes. It’s not full MVCC yet, as writes still can’t happen simultaneously. But in our use-case, reading from multiple processes is vastly more important anyway.
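The reads-don’t-block-writes property that WAL gives us can be sketched with Python’s sqlite3 module. Note that WAL needs a file-backed database, so no “:memory:” here, and the table name is just illustrative:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store.db")

# Autocommit mode, so we control transactions explicitly.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute('CREATE TABLE "Resource" (ID INTEGER PRIMARY KEY)')
writer.execute('INSERT INTO "Resource" VALUES (1)')

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")
count = reader.execute('SELECT COUNT(*) FROM "Resource"').fetchone()[0]
# The reader's snapshot is now pinned at one row.

# With WAL this write succeeds even though a read transaction is open;
# with the old rollback journal it would block on the reader's lock.
writer.execute('INSERT INTO "Resource" VALUES (2)')

# The open read transaction still sees its snapshot ...
still = reader.execute('SELECT COUNT(*) FROM "Resource"').fetchone()[0]
reader.execute("COMMIT")
# ... and a fresh read sees the new row.
fresh = reader.execute('SELECT COUNT(*) FROM "Resource"').fetchone()[0]
```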

We’re investigating WAL mode of SQLite thoroughly these next few days. Jürg is working most on this at the moment. If WAL is fit for our purpose then we’ll probably also start developing a direct-access library that’ll allow your process to connect directly with our SQLite database, avoiding any form of IPC.

Adrien‘s FD-passing is in master, though. And it’s performing quite well!

We’re thrilled that SQLite’s team is taking this direction with WAL. Very awesome guys!

Domain specific indexes

Yesterday I worked on support for deleting a domain specific index from the ontology. Because SQLite’s ALTER TABLE doesn’t support dropping a column, I had to do it by renaming the original table, recreating the table without the mirror column, copying the data over from the renamed table, and finally dropping the renamed table. It’s nasty, but it works. I think SQLite should just add DROP COLUMN to ALTER TABLE. Why is this so hard to add?
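The dance looks roughly like this (a sketch against a minimal stand-in schema, not Tracker’s real code; SQLite only grew ALTER TABLE … DROP COLUMN much later, in 3.35):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
  CREATE TABLE "nmm:MusicPiece" (ID INTEGER PRIMARY KEY,
                                 "nmm:beatsPerMinute" INTEGER,
                                 "nie:title" TEXT);
  INSERT INTO "nmm:MusicPiece" VALUES (1, 120, 'abc');

  -- rename the original table out of the way
  ALTER TABLE "nmm:MusicPiece" RENAME TO "nmm:MusicPiece_old";
  -- recreate it without the mirror column
  CREATE TABLE "nmm:MusicPiece" (ID INTEGER PRIMARY KEY,
                                 "nmm:beatsPerMinute" INTEGER);
  -- copy the surviving columns back over
  INSERT INTO "nmm:MusicPiece"
      SELECT ID, "nmm:beatsPerMinute" FROM "nmm:MusicPiece_old";
  -- and finally drop the renamed table
  DROP TABLE "nmm:MusicPiece_old";
''')

columns = [row[1] for row in
           conn.execute('PRAGMA table_info("nmm:MusicPiece")')]
# → ['ID', 'nmm:beatsPerMinute']
```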

I finally got it working; now it must of course be tested, and then tested again.

Next for the feature is adapting the SPARQL engine to start using the indexed mirror column and produce better performing SQL queries.

Working on domain specific indexes

So … what is involved in a “simple change” like what I wrote about yesterday?

First you add support for annotating the domain specific index in the ontology files. This is straightforward, as we of course have a generic Turtle parser, and it’s just a matter of adding properties to certain classes and filling in the values from the ontology in the instances of our in-memory representation of the ontology. You of course also need to change the CREATE TABLE statements. Trivial.

Then you implement detecting changes in the ontology. And, more complex, coping with the changes. This means doing ALTER on the SQL tables. You also need to copy from the InformationElement table to the MusicPiece table (I’m using MusicPiece to clarify; it’s of course generic) in case such a domain specific index is added during an ontology change, and put an implicit index on the column. After all, that index is why we’re doing this.
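Coping with an added domain index boils down to this kind of SQL. A sketch with plain sqlite3, with an explicit CREATE INDEX standing in for the implicit one we put on the column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
  CREATE TABLE "nie:InformationElement" (ID INTEGER PRIMARY KEY,
                                         "nie:title" TEXT);
  CREATE TABLE "nmm:MusicPiece" (ID INTEGER PRIMARY KEY,
                                 "nmm:beatsPerMinute" INTEGER);
  INSERT INTO "nie:InformationElement" VALUES (1, 'abc');
  INSERT INTO "nmm:MusicPiece" VALUES (1, 120);

  -- add the mirror column to the subclass table
  ALTER TABLE "nmm:MusicPiece" ADD COLUMN "nie:title" TEXT;
  -- copy the existing values over from the superclass table
  UPDATE "nmm:MusicPiece" SET "nie:title" =
      (SELECT "nie:title" FROM "nie:InformationElement"
       WHERE "nie:InformationElement".ID = "nmm:MusicPiece".ID);
  -- the index is the whole point of the exercise
  CREATE INDEX "nmm:MusicPiece_nie:title"
      ON "nmm:MusicPiece" ("nie:title");
''')

title = conn.execute(
    'SELECT "nie:title" FROM "nmm:MusicPiece"').fetchone()[0]
# → 'abc'
```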

I finished those two yesterday. I have not finished detecting a deletion of a domain specific index yet. That will have to ALTER the table with a DROP of the column. The most difficult part here is detecting the deletion itself. We don’t yet have any code to diff on multi-value properties in the ontology (the ontology is a collection of RDF statements like everything else, describing itself).

Today I finished the code that copies values to the MusicPiece table’s mirror column.

Next few days will be about adapting the SPARQL engine and of course coping with a deletion of a domain specific index. And then testing, and again testing. Mind that this has to work from a journal replay situation too. In which case no ontology is involved (it’s all stored in the history of the persistent journal).

Where’s my Redbull? Ah, waiting for me in the fridge. Good!