Tracker experimental merged to main development tree, Ivan’s presentation

I’m currently involved in the Tracker project and our project will be presented by Ivan Frade at the Desktop Summit this Sunday.

We merged our experimental branch tracker-store to master. This means that our rearchitecture plans for Tracker have mostly been implemented and are now being pushed forward in the main development tree.

I will start with a comparison with Tracker’s 0.6.x series.

Tracker master:

  • Uses SPARQL as query language
  • Uses Nepomuk for its base ontologies
  • Supports SPARQL Update
  • Supports aggregates COUNT, AVG, SUM, MIN and MAX in SPARQL
  • Runs all of its storage functionality as a separate binary
  • Runs all of its indexing, crawling and monitoring functionality in a separately packageable binary
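To illustrate the aggregate support mentioned above, here is a sketch of a SPARQL query using COUNT. The prefix and class names follow the Nepomuk NFO ontology; treat the exact query as illustrative rather than copied from Tracker's test suite.

```sparql
# Count all file resources known to the store.
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>

SELECT COUNT(?file) WHERE {
  ?file a nfo:FileDataObject .
}
```

AVG, SUM, MIN and MAX can be used in the same position as COUNT.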

Tracker 0.6.9x:

  • Uses RDFQuery as query language
  • Has its own ontology
  • Has very limited support for storing your own data
  • Supports several aggregate functions in its query language
  • Runs all of its storage functionality in the indexer
  • Runs all of its query functionality in the permanent daemon
  • Does file monitoring and crawling in the permanent daemon
  • Runs all of its indexing functionality in a separately packageable binary

Tracker master:

Architecture

The storage service uses the Nepomuk ontologies as schema. It allows you to both query and insert or delete data.

The fs-miner finds, crawls and monitors file resources. It also analyses those files and extracts the metadata. It instructs the storage service to store the metadata.

External applications and other such miners are allowed to both query and insert using the storage service. Priority is given to queries over stores.

Plugins that run in-process in the application can push information into Tracker. For example, we don't try to scan Evolution's cache formats; instead, a plugin gets the data out of Evolution and into Tracker.

Storage service’s API and IPC

The storage service gives priority to SELECT queries to ensure that apps in need of metadata get serviced quickly.

INSERT and DELETE calls get queued; SELECT ones get executed immediately. For apps that require consistency and/or insertion speed we provide a batch mode that has a commit barrier. When the commit callback fires, you know that everything that came before it is in a consistent state. We don't support full transactions with rollback.

The standard API operates over DBus. This means that while using it you are subject to DBus's performance limitations. SPARQL Update makes it possible to group many writes in a single call; given DBus's latency overhead, this is recommended when inserting larger sets of data. We're experimenting with a custom IPC system, based on unix sockets, to get increased throughput for apps that want to put a lot of INSERTs onto our queue.
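Grouping writes looks something like the following sketch: several resources inserted in one SPARQL Update call, so only one DBus round trip is paid. The file URIs and title values are made up for illustration; the prefixes are the standard Nepomuk NIE/NFO ones.

```sparql
# One update call, several triples: amortises the DBus latency
# over many writes instead of paying it per INSERT.
PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>

INSERT {
  <file:///home/user/notes-a.txt> a nfo:FileDataObject ;
      nie:title "First document" .
  <file:///home/user/notes-b.txt> a nfo:FileDataObject ;
      nie:title "Second document" .
}
```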

We provide a feature that signals on changes happening to certain types. You can see this as a poor man’s live search. Full live search for SPARQL is fairly complicated. Maybe in future we’ll implement something like that.

Ontology

We support the majority of the Nepomuk base ontologies, and our so-called filesystem miners store the metadata they find using Nepomuk's ontologies. We only support static custom ontologies right now. This means that it's impossible to dynamically add a new ontology unless you reset the entire database first.

We’re planning to support dynamically adding and removing ontologies. The ontology format that we use is Turtle.
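For a feel of the format, here is a sketch of what a small custom ontology in Turtle could look like. The `ex:` namespace, the `Recipe` class and the `cookingTime` property are invented for illustration; only the rdf/rdfs/xsd prefixes and the Nepomuk NIE base class are real.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix nie:  <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix ex:   <http://example.org/cooking#> .

# A hypothetical application-specific class, rooted in Nepomuk.
ex:Recipe a rdfs:Class ;
    rdfs:subClassOf nie:InformationElement .

# A property of that class, with an explicit range.
ex:cookingTime a rdf:Property ;
    rdfs:domain ex:Recipe ;
    rdfs:range  xsd:integer .
```

With static ontologies, a file like this has to be present before the database is first created.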

Backup and import

Right now we support loading data into our database using SPARQL Update, using an experimental unix-socket based IPC, or by passing us a Turtle file.

We currently have no support for making a backup. Support for this is high on our priority list. It will write a Turtle file (which can be imported afterward).

Backup and import of ontology specific metadata

When we introduce support for custom ontologies, it'll be useful for apps that provide their own custom ontology to get a backup of just the data that is relevant to that ontology. We plan to provide a method to do that.

Volume support

Thanks to a static custom ontology for volume support, volumes and their status are queryable over SPARQL. File resources also get linked to these volumes. This makes it possible to determine the availability of a file resource. For example: return metadata about all photos that are located on a specific camera, even when that camera isn't connected to this device.
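A query for the camera example could be sketched roughly as follows. The volume URN and the use of `nie:dataSource` to link a file to its volume are assumptions made for illustration; the real property names in the volume ontology may differ.

```sparql
# List photos recorded as living on a given volume, whether or not
# that volume is currently mounted.
PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>

SELECT ?photo ?title WHERE {
  ?photo a nfo:Image ;
         nie:title ?title ;
         nie:dataSource <urn:example:volume:my-camera> .
}
```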

Volume support is a work in progress at this moment.

2 thoughts on “Tracker experimental merged to main development tree, Ivan’s presentation”

  1. Would it be possible to look at enhancing the performance of D-Bus in your use-cases, rather than having to write a new IPC system? That makes me very sad… :(

  2. @Robert McQueen: I think it’s important for me to point out that we’re not planning to switch to another IPC. The unix-socket one is just experimental and if it would ever be used, then it would only be for SPARQL-Update (not query).

    We’re also very interested in improving DBus itself, instead. We of course want to avoid having to implement our own IPC stuff (but note that the experimental ipc thingy is fairly simple in both code and purpose, you can find it at tracker-store-ipc branch btw).

    Anyway, right now the experiment is more for testing than serious purposes. Don’t worry too much, we’re not stupid enough to actively want to implement our own IPC :-)
