Bypassing Tracker’s file system miner, for example for MTP daemons

Following up on my last blog article: I worked a bit on this concept over the weekend.

When a program is responsible for delivering a file to the file system, that program knows precisely when the rename syscall that completes the file transfer transaction takes place.

An example of such a program is an MTP daemon. I quote from Wikipedia: “A main reason for using MTP rather than, for example, the USB mass-storage device class (MSC) is that the latter operates at the granularity of a mass storage device block (usually in practice, a FAT block), rather than at the logical file level.”

One solution for metadata extraction for those files is to have file monitoring on the target storage directory with Tracker’s FS miner. The unfortunate thing with such a solution is that file monitoring will inevitably trigger after the rename syscall. This means that the system can update the RDF storage only moments after the transfer has completed. Not during, and not just in time.

With this new feature I plan to allow software like an MTP daemon to be ahead of that: for example while the file is being transferred, or just in time before and/or just after the rename syscall, depending on the use case and how the developer plans to use the feature.

The API might still change. For example, I plan to allow passing the value of tracker:available, among other useful properties whose values an MTP daemon might want to safely tamper with (edit: this is done and the API in this blog article is adapted). The tracker:available property can be used to indicate the availability of a file to other software. For example, while the file is being transferred you could set it to false, and right after the rename you set it to true.

When you are building a device that has no entry points for user files or documents other than MTP, this feature lets you turn off Tracker’s FS miner completely. This could be ideal for certain tablets and phones.

Currently it looks like this. Branch is available here:

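/* Also needed: the header from the branch that declares
 * tracker_extract_get_sparql() and tracker_extract_get_sparql_finish(). */
#include <time.h>
#include <glib.h>
#include <gio/gio.h>

/* Called when the extraction has finished: print the SPARQL or the error. */
static void
on_finished (GObject *none, GAsyncResult *result, gpointer user_data) {
    GMainLoop *loop = user_data;
    GError *error = NULL;
    gchar *sparql = tracker_extract_get_sparql_finish (result, &error);
    if (error == NULL) {
        g_print ("%s", sparql);
        g_free (sparql);
    } else {
        /* g_error() would abort the program here, so only report the error */
        g_printerr ("%s\n", error->message);
    }
    g_clear_error (&error);
    g_main_loop_quit (loop);
}

int main (int argc, char **argv) {
    /* Temporary location of the file being transferred */
    const gchar *file = "/tmp/file.png";
    /* URL the file will have after the rename */
    const gchar *dest = "file:///home/pvanhoof/Documents/Photos/photo.png";
    const gchar *graph = "urn:mygraph";
    GMainLoop *loop;
    g_type_init (); /* no longer needed since GLib 2.36 */
    loop = g_main_loop_new (NULL, FALSE);
    tracker_extract_get_sparql (file, dest, graph, time (0), time (0),
                                TRUE, on_finished, loop);
    g_main_loop_run (loop);
    /* GMainLoop is not a GObject, so it has its own unref function */
    g_main_loop_unref (loop);
    return 0;
}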

This will result in something like this:

INSERT SILENT { GRAPH <urn:mygraph> {
    _:file a nfo:FileDataObject , nie:InformationElement ;
        nfo:fileName "photo.png" ;
        nfo:fileSize 38155 ;
        nfo:fileLastModified "2012-12-17T09:20:18Z" ;
        nfo:fileLastAccessed "2012-12-17T09:20:18Z" ;
        nie:isStoredAs _:file ;
        nie:url "file:///home/pvanhoof/Documents/Photos/photo.png" ;
        nie:mimeType "image/png" ;
        a nfo:FileDataObject ;
        nie:dataSource <urn:nepomuk:datasource:9291a450-etc-etc> ;
        tracker:available true .
    _:file a nfo:Image , nmm:Photo ;
        nfo:width 150 ;
        nfo:height 192 ;
        nmm:dlnaProfile "PNG_LRG" ;
        # more extracted metadata
        nmm:dlnaMime "image/png" .
} }

As usual with stuff that I blog about: this feature isn’t finished, it’s not in master yet, not even reviewed. The API might change. All the usual stuff.