Tracker’s write back support now in master

Whoohoo!

We just committed the support for write back in master.

What is it?

Tracker has a limited capability to write metadata back into the data resource. In case of a file that means writing it back into the file. For example writing some of the metadata the user sets using a SPARQL Update back into an MP3 file as ID3 tags.

Which ones do we support already?

Right now the write back capability is under development and only supports a bunch of fields for a few XMP formats (JPEG, PNG and TIFF) and the Title of MP3 files. In near future we will start supporting increasingly more fields.

Documentation?

For people who want to write support for their properties and file formats, read the documentation.

Party like it’s 2009!

Handling triplets arriving in tracker-store, CouchDB integration as use-case

At GCDS Jamie told us that he wants to make a plugin for tracker-store that writes all the triplets to a CouchDB instance.

Letting a CouchDB be a sort of offline backup isn’t very interesting. You want triples to go into the CouchDB at the moment of guaranteed storage: at commit time.

For the purpose of developing this we provide the following internal API.

typedef void (*TrackerStatementCallback) (const gchar *graph,
                                          const gchar *subject,
                                          const gchar *predicate,
                                          const gchar *object,
                                          GPtrArray   *rdf_types,
                                          gpointer     user_data);
typedef void (*TrackerCommitCallback)    (gpointer     user_data);

tracker_data_add_insert_statement_callback (TrackerStatementCallback callback,
                                            gpointer                 user_data);
tracker_data_add_delete_statement_callback (TrackerStatementCallback callback,
                                            gpointer                 user_data);
tracker_data_add_commit_statement_callback (TrackerCommitCallback callback,
                                            gpointer              user_data);

You’ll need to make a plugin for tracker-store and make the hook at the initialization of your plugin.

Current behaviour is when graph is NULL, it means that the default graph is being used. If it’s not NULL, it means that you probably don’t want the data in CouchDB: it’s data that’s coming from a miner. You probably only want to store data that is coming from the user. His applications won’t use FROM and INTO for their SPARQL Update queries, meaning that graph is NULL.

Very important is that your callback handler works with bottom halves: put your expensive task on a queue and handle the queued item somewhere else. You can for example use a GThreadPool or a GQueue plus a g_idle_add_full with G_PRIORITY_LOW callback picking items one by one on the mainloop. You should never have a TrackerStatementCallback or a TrackerCommitCallback that blocks. Not even a tiny tiny bit of blocking: it’ll bring everything in tracker-store on its knees. It’s why we aren’t giving you a public plugin API with a way to install your own plugins outside of the Tracker project.

By the way: we want to see code instead of talk before we further optimize things for this purpose.

Writeback, writing metadata back into your files

Today, I feel like exposing you to some bleeding edge development going on as we speak at the Tracker team. I know you’re scared of that and that’s precisely why I want to expose you! Hah.

We are prototyping writeback support for Tracker.

With writeback we mean writing metadata that the user passes to us via SPARQL UPDATE into the file that he’s describing.

This means that it must be about a thing that is stored, that it must update a property that we want to writeback and it means that we need to support the format.

OK, that’s three requirements before we write anything back. Let’s explain how this stuff works in the prototype!

In our prototype you mark properties that are eligible for being written into the files using tracker:writeback.

It goes like this:

nie:title a rdf:Property ;
   rdfs:label "Title" ;
   rdfs:comment "The title of the document" ;
   rdfs:subPropertyOf dc:title ;
   nrl:maxCardinality 1 ;
   rdfs:domain nie:InformationElement ;
   rdfs:range xsd:string ;
   tracker:fulltextIndexed true ;
   tracker:weight 10 ;
   tracker:writeback true .

Next you need a writeback module for tracker-writeback. We implemented a prototype one that can only write the title of MP3 files. It uses ID3lib‘s C API.

When the user is describing a file, the resource must have nie:isStoredAs. The property being changed ‘s tracker:writeback must be true. We want the value of the property too. That’s simple in SPARQL, right? Sure it is!

SELECT ?url ?predicate ?object {
    <$subject> ?predicate ?object ;
               nie:isStoredAs ?url .
    ?predicate tracker:writeback true
 }

You’ll find this query in the code, go look!

Now it’s simple: using ID3lib we map Nepomuk to ID3 and write it.

No don’t be afraid, we’re not going to writeback metadata that we found ourselves. We’ll only writeback data that the user provided in the form of a SPARQL Update on the default graph. No panic. Besides, using tracker-writeback is going to be completely optional (just don’t run it).

This is a prototype, I repeat, this is a prototype. No expectations yet please. Just feel exposed to scary stuff, get overly excited and then join us by contributing. It’s all public what we’re doing in the branch ‘writeback’.

ps. Whether this will be Maemo’s future metadata-write stuff? Hmm, I don’t know. Do you know? ;-)