Tracker, our near future plans

Apparently this hasn’t been echoed enough times: a lot of teams are still wondering what they should use if they want to store RDF metadata using Nepomuk, and how to query it.

What happened before

We have been refactoring to bring Tracker’s codebase into a better state. This work is being released as Tracker 0.6.9x. One sentence is really not enough to describe the changes, but we can’t continue talking about the past forever. Sorry guys.

We have introduced support for SPARQL and Nepomuk in Tracker. We also added the class-signals feature, Turtle import & export, and many other features like SPARQL UPDATE support. This makes the storage engine effectively a generic Nepomuk RDF store that can be used to store and query RDF data.
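
To give an idea of what that looks like: with SPARQL UPDATE, storing a resource boils down to something like this (the URN and title are made up for illustration, and I’m assuming the Nepomuk prefixes such as nmm: and nie: are predefined, just like tracker: is in the queries further down):

INSERT { <urn:example:song:1> a nmm:MusicPiece ;  # made-up example URN
    nie:title "An example title" }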

What will happen

We are at this moment planning to rearchitect Tracker a little bit.
Among our plans we want to make the RDF metadata store standalone. The store keeps your metadata using Nepomuk as the ontology and lets application developers query it with SPARQL. This means that it’ll be possible to use this storage service without the indexer even being installed. This is already possible today, but right now we do the crawling and monitoring in the storage service.
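
An application that only wants the store can then talk plain SPARQL to it. For example, listing every resource that has a title (nie:title comes from the Nepomuk Information Element ontology):

tracker-sparql -q "SELECT ?s ?title WHERE {
   ?s nie:title ?title }"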

We plan to move the crawling and monitoring to the indexer. One idea is that the indexer will instruct the extractor to do an analysis, and that the extractor will then push the extracted metadata to the RDF storage service. That makes the indexer and extractor a provider & consumer like any other client, and it makes them optional and separately packageable.
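
In such a setup, the extractor’s push would conceptually be nothing more than a SPARQL update against the storage service, something like this (the file URL and property values are invented for illustration):

INSERT { <file:///media/USBStick/photo.jpg> a nfo:Image ;  # invented example
    nie:contentCreated "2009-04-01T13:38:20" }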

This is because we get requests from other teams who don’t want the indexing. Modularizing is usually a good thing, so we now have plans to make this possible as a feature.

Other plans

Other plans, which we haven’t thought through thoroughly yet, include support for custom ontologies. We have a good idea for this, though; we want to wait with it until after the rearchitecting. Support for custom ontologies will include installing ontologies, removing ontologies, and asking for a backup that’ll contain the metadata specific to an installed ontology.
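
To make that a bit more concrete: an installable ontology would essentially be a set of class and property definitions in Turtle, conceptually something like this (the myapp namespace, class and property are invented for illustration):

@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nie:   <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix myapp: <http://example.org/myapp#> .  # invented namespace

# an invented class that reuses Nepomuk by subclassing it
myapp:Recipe a rdfs:Class ;
    rdfs:subClassOf nie:InformationElement .

# an invented property for that class
myapp:cookingTime a rdf:Property ;
    rdfs:domain myapp:Recipe .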

Support for custom ontologies doesn’t mean that application developers should all rush off and start making their own ontologies. I know you guys! Don’t do it! We want applications to reuse as much of the Nepomuk set as possible. The more Nepomuk gets reused, the more interoperability between apps is possible.

Volume support in experimental Tracker

In Tracker’s trunk we have support for volumes. This means that we track removable devices appearing and disappearing. When a removable device disappears, you won’t see the resources that are on it in your search results anymore.

In trunk we keep a separate table with volume registrations for this.

In master we simply use the ontology, which also means that we now make this information available to you as metadata in a clean way.

Some examples. List all the volumes that we know about:

tracker-sparql -q "SELECT ?o ?m ?z WHERE {
   ?o a tracker:Volume ;
   tracker:mountPoint ?m ;
   tracker:unmountDate ?z }"

That will return something like this (wrapping the line):

urn:nepomuk:datasource:/org/freed../Hal/devices/volume_uuid_XXX_ABCDE,
    file:///media/USBStick, 2009-04-01T13:38:20

The ones that we know about but aren’t mounted:

tracker-sparql -q "SELECT ?o ?m WHERE {
   ?o a tracker:Volume ;
   tracker:mountPoint ?m ;
   tracker:isMounted false } "

The ones that we know about and are mounted:

tracker-sparql -q "SELECT ?o ?m WHERE {
   ?o a tracker:Volume ;
   tracker:mountPoint ?m ;
   tracker:isMounted true } "

Let’s, just for fun, list the volumes that got unmounted before a specific date and haven’t been mounted since:

tracker-sparql -q "SELECT ?o ?m WHERE {
   ?o a tracker:Volume ;
   tracker:mountPoint ?m ;
   tracker:isMounted false ;
   tracker:unmountDate ?z .
   FILTER (?z < \"`date -u +"%Y-%m-%dT%H:%M:%S%z"`\") }"

I'd like to add that replacing HAL isn't our intention. Not at all. We depend on HAL to track this information ourselves. We need to know the availability of a volume, and because we link every file resource to its volume's resource, we can know about the availability of a resource that way.
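
Because of that link you can also query it directly. Conceptually, asking for the files that sit on a volume that isn't mounted would look something like this (I'm using belongsToVolume as a hypothetical name for the linking property here; the actual property may differ):

tracker-sparql -q "SELECT ?f WHERE {
   ?f tracker:belongsToVolume ?v .  # hypothetical linking property
   ?v tracker:isMounted false }"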

It's under Tracker's own ontology prefix, and it's not to be considered a decided or stable ontology API yet. We might change some things about it. It wasn't even a design goal to make this publicly accessible.

Why am I showing it? Well, because it's a nice way to explain one of our sprint tasks for the next two weeks. I promised you guys that I would talk about what we are up to at Tracker. So here you have it!