I’m currently involved in the Tracker project, which Ivan Frade will present at the Desktop Summit this Sunday.
We have merged our experimental tracker-store branch into master. This means that our rearchitecture plans for Tracker have mostly been implemented and are now being pushed forward in the main development tree.
I will start with a comparison with Tracker’s 0.6.x series.
Tracker master:
- Uses SPARQL as query language
- Uses Nepomuk for its base ontologies
- Supports SPARQL Update
- Supports aggregates COUNT, AVG, SUM, MIN and MAX in SPARQL
- Runs all of its storage functionality as a separate binary
- Runs all of its indexing, crawling and monitoring functionality in a separately packageable binary
Tracker 0.6.9x:
- Uses RDFQuery as query language
- Has its own ontology
- Has very limited support for storing your own data
- Supports several aggregate functions in its query language
- Runs all of its storage functionality in the indexer
- Runs all of its query functionality in the permanent daemon
- Does file monitoring and crawling in the permanent daemon
- Runs all of its indexing functionality in a separately packageable binary
Tracker master:
Architecture
The storage service uses the Nepomuk ontologies as its schema. It allows you to query the data as well as insert or delete it.
The fs-miner finds, crawls and monitors file resources. It also analyses those files, extracts their metadata, and instructs the storage service to store that metadata.
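Roughly, the kind of statement that ends up in the store for a newly found file could look like the sketch below; the file URI, the exact property names and the assumption that the Nepomuk prefixes (nie, nfo) are pre-declared are purely illustrative, and the Update syntax the store accepts may differ slightly.

  # Sketch: metadata for one newly discovered file
  INSERT {
    <file:///home/me/Documents/report.pdf> a nfo:FileDataObject, nfo:Document ;
      nie:url "file:///home/me/Documents/report.pdf" ;
      nfo:fileName "report.pdf" ;
      nfo:fileSize 94208 .
  }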
External applications and other such miners are allowed to both query and insert data using the storage service. Priority is given to queries over writes.
Plugins that run in the application’s own process can also push information into Tracker. For example, we don’t try to scan Evolution’s cache formats ourselves; instead a plugin gets the data out of Evolution and into Tracker.
Storage service’s API and IPC
The storage service gives priority to SELECT queries to ensure that apps in need of metadata get serviced quickly.
INSERT and DELETE calls get queued; SELECT ones get executed immediately. For apps that require consistency and/or insertion speed we provide a batch mode with a commit barrier. When the commit calls back, you know that everything that came before it is in a consistent state. We don’t support full transactions with rollback.
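To give an idea, a typical SELECT, here using the COUNT aggregate mentioned in the feature list above, could look like this minimal sketch (the nfo class name is an illustrative Nepomuk term and the exact aggregate syntax may differ slightly):

  # How many documents does the store currently know about?
  SELECT COUNT(?doc) WHERE {
    ?doc a nfo:Document .
  }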
The standard API operates over DBus, which means that while using it you are subject to DBus’s performance limitations. SPARQL Update makes it possible to group a lot of writes in a single call; given DBus’s latency overhead, this is recommended when inserting larger sets of data. We’re experimenting with a custom IPC system, based on unix sockets, to get increased throughput for apps that want to put a lot of INSERTs onto our queue.
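In practice that means putting several resources into one Update string and sending it in a single call, roughly like this (the URIs and property names are again only illustrative):

  # One call carrying several inserts instead of one call per file
  INSERT {
    <file:///home/me/Music/a.mp3> a nfo:FileDataObject ; nfo:fileName "a.mp3" .
    <file:///home/me/Music/b.mp3> a nfo:FileDataObject ; nfo:fileName "b.mp3" .
    <file:///home/me/Music/c.mp3> a nfo:FileDataObject ; nfo:fileName "c.mp3" .
  }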
We provide a feature that signals changes to certain types. You can see this as a poor man’s live search. Full live search for SPARQL is fairly complicated; maybe in the future we’ll implement something like that.
Ontology
We support the majority of the Nepomuk base ontologies, and our so-called filesystem miners store the metadata they find using Nepomuk’s ontologies. Right now we only support static custom ontologies: you cannot add a new ontology dynamically without resetting the entire database first.
We’re planning to support adding and removing ontologies dynamically. The ontology format that we use is Turtle.
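To give an idea of the format, a minimal ontology fragment in Turtle could define a class and a property as in the sketch below; the example namespace and names are made up, and Tracker’s real ontology files may carry additional Tracker-specific annotations on top of plain RDFS.

  @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
  @prefix ex:   <http://example.org/myapp#> .

  # A custom class and a string-valued property for it
  ex:Note a rdfs:Class ;
      rdfs:label "Note kept by my application" .

  ex:noteText a rdf:Property ;
      rdfs:domain ex:Note ;
      rdfs:range  xsd:string .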
Backup and import
Right now we support loading data into our database using SPARQL Update, an experimental unix-socket based IPC, or by passing us a Turtle file.
We currently have no support for making a backup, but it is high on our priority list. The backup will be written as a Turtle file (which can be loaded again afterward).
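The nice thing about Turtle is that import and backup share the same shape: a plain list of resources and their properties. A file you could feed to the import path might look like this sketch (the contact URI and the nco terms, including the namespace URI, are illustrative):

  @prefix nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> .

  # One contact, ready to be loaded into the store
  <urn:contact:jane> a nco:PersonContact ;
      nco:fullname "Jane Example" .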
Backup and import of ontology specific metadata
When we introduce support for custom ontologies, it will be useful for apps that provide their own ontology to get a backup of just the data that is relevant to that ontology. We plan to provide a method to do that.
Volume support
Using a static custom ontology for volume support, volumes and their status are queryable over SPARQL. File resources also get linked to those volumes, which makes it possible to determine the availability of a file resource. For example: return metadata about all photos that are located on a specific camera, even though that camera isn’t connected to this device.
Volume support is a work in progress at this moment.
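To make the camera example above concrete, such a query could end up looking something like the sketch below. How a file resource gets linked to its volume (here via nie:dataSource pointing at a volume URI) is an assumption on my part and, since this is still in flux, likely to change.

  # All photos that live on a specific (possibly disconnected) camera volume
  SELECT ?photo ?name WHERE {
    ?photo a nfo:Image ;
           nfo:fileName ?name ;
           nie:dataSource <urn:volume:my-camera> .
  }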