Putting an LRU in your code

For the ones who didn’t find the LRU in Tracker’s code (and for the ones who where lazy).

Let’s say we will have instances of something called a Statement. Each of those instances is relatively big. But we can recreate them relatively cheap. We will have a huge amount of them. But at any time we only need a handful of them.

The ones that are most recently used are most likely to be needed again soon.

First you make a structure that will hold some administration of the LRU:

typedef struct {
	Statement *head;
	Statement *tail;
	unsigned int size;
	unsigned int max;
} StatementLru;

Then we make the user of a Statement (a view or a model). I’ll be using a Map here. You can in Qt for example use QMap for this. Usually I want relatively fast access based on a key. You could also each time loop the stmt_lru to find the instance you want in useStatement based on something in Statement itself. That would rid yourself of the overhead of a map.

class StatementUser
{
	StatementUser();
	~StatementUser();
	void useStatement(KeyType key);
private:
	StatementLru stmt_lru;
	Map<KeyType, Statement*> stmts;
	StatementFactory stmt_factory;
}

Then we will add to the private fields of the Statement class the members prev and next: We’ll make a circular doubly linked list.

class Statement: QObject {
	Q_OBJECT
    ...
private:
	Statement *next;
	Statement *prev;
};

Next we initialize the LRU:

StatementUser::StatementUser() 
{
	stmt_lru.max = 500;
	stmt_lru.size = 0;		
}

Then we implement using the statements

void StatementUser::useStatement(KeyType key)
{
	Statement *stmt;

	if (!stmts.get (key, &stmt)) {

		stmt = stmt_factory.createStatement(key);

		stmts.insert (key, stmt);

		/* So the ring looks a bit like this: *
		 *                                    *
		 *    .--tail  .--head                *
		 *    |        |                      *
		 *  [p-n] -> [p-n] -> [p-n] -> [p-n]  *
		 *    ^                          |    *
		 *    `- [n-p] <- [n-p] <--------'    */

		if (stmt_lru.size >= stmt_lru.max) {
			Statement *new_head;

		/* We reached max-size of the LRU stmt cache. Destroy current
		 * least recently used (stmt_lru.head) and fix the ring. For
		 * that we take out the current head, and close the ring.
		 * Then we assign head->next as new head. */

			new_head = stmt_lru.head->next;
			auto to_del = stmts.find (stmt_lru.head);
			stmts.remove (to_del);
			delete stmt_lru.head;
			stmt_lru.size--;
			stmt_lru.head = new_head;
		} else {
			if (stmt_lru.size == 0) {
				stmt_lru.head = stmt;
				stmt_lru.tail = stmt;
			}
		}

	/* Set the current stmt (which is always new here) as the new tail
	 * (new most recent used). We insert current stmt between head and
	 * current tail, and we set tail to current stmt. */

		stmt_lru.size++;
		stmt->next = stmt_lru.head;
		stmt_lru.head->prev = stmt;

		stmt_lru.tail->next = stmt;
		stmt->prev = stmt_lru.tail;
		stmt_lru.tail = stmt;

	} else {
		if (stmt == stmt_lru.head) {

		/* Current stmt is least recently used, shift head and tail
		 * of the ring to efficiently make it most recently used. */

			stmt_lru.head = stmt_lru.head->next;
			stmt_lru.tail = stmt_lru.tail->next;
		} else if (stmt != stmt_lru.tail) {

		/* Current statement isn't most recently used, make it most
		 * recently used now (less efficient way than above). */

		/* Take stmt out of the list and close the ring */
			stmt->prev->next = stmt->next;
			stmt->next->prev = stmt->prev;

		/* Put stmt as tail (most recent used) */
			stmt->next = stmt_lru.head;
			stmt_lru.head->prev = stmt;
			stmt->prev = stmt_lru.tail;
			stmt_lru.tail->next = stmt;
			stmt_lru.tail = stmt;
		}

	/* if (stmt == tail), it's already the most recently used in the
	 * ring, so in this case we do nothing of course */
	}

	/* Use stmt */

	return;
}

In case StatementUser and Statement form a composition (StatementUser owns Statement, which is what makes most sense), don’t forget to delete the instances in the destructor of StatementUser. In the example’s case we used heap objects. You can loop the stmt_lru or the map here.

StatementUser::~StatementUser()
{
	Map<KeyType, Statement*>::iterator i;
    	for (i = stmts.begin(); i != stmts.end(); ++i) {
		delete i.value();
	}
}

Secretly reusing my own LRU code

Last week, I secretly reused my own LRU code in the model of the editor of a CNC machine (has truly huge files, needs a statement editor). I rewrote my own code, of course. It’s Qt based, not GLib. Wouldn’t work in original form anyway. But the same principle. Don’t tell Jürg who helped me write that, back then.

Extra points and free beer for people who can find it in Tracker’s code.

Up next: PADI master diver

A year or so after my PADI rescue, my diving club has convinced me to wake up about twelve times early in the morning on Saturday, to get me trained to become a PADI master diver.

The jokers in my club told me it’s not so hard as PADI Rescue training. The only hard part is Saturday morning. We’ll see, I wonder. heh.

Oh, and to our security services: well done catching those guys from Molenbeek and Vorst. Good job!

Oh, I got invited to NLGG to talk about Tracker in Utrecht tomorrow.

Hey guys

Have you guys stopped debating systemd like a bunch of morons already? Because I’ve been keeping myself away from the debate a bit: the amount of idiot was just too large for my mind.People who know me also know that quite a bit of idiot fits into it.

I remember when I was younger, somewhere in the beginning of the century, that we first debated ORBit-2, then Bonobo, then software foolishly written with it like Evolution, Mono (and the idea of rewriting Evolution in C#. But first we needed a development environment MonoDevelop to write it in – oh the gnomes). XFree86 and then the X.Org fork. Then Scaffolding and Anjuta. Beagle and Tracker (oh the gnomes). Rhythmbox versus Banshee (oh the gnomes). Desktop settings in gconf, then all sorts of gnome services, then having a shared mainloop implementation with Qt.

Then god knows what. Dconf, udev, gio, hal, FS monitoring: a lot of things that were silently but actually often bigger impact changes than systemd is because much much more real code had to be rewritten, not just badly written init.d-scripts. The Linux eco-system has reinvented itself several times without most people having noticed it.

Then finally D-Bus came. And yes, evil Lennart was there too. He was also one of those young guys working on stuff. I think evil evil pulseaudio by that time (thank god Lennart replaced the old utter crap we had with it). You know, working on stuff.

D-Bus’s debate began a bit like systemd’s debate: Everybody had feelings about their own IPC system being better because of this and that (most of which where really bad copies of xmms’s remote control infrastructure). It turned out that KDE got it mostly right with DCOP, so D-Bus copied a lot from it. It also opened a lot of IPC author’s eyes that message based IPC, uniform activation of services, introspection and a uniform way of defining the interface are all goddamned important things. Also other things, like tools for monitoring and debugging plus libraries for all goddamn popular programming environments and most importantly for IPC their mainloops, appeared to be really important. The uniformity between Qt/KDE and Gtk+/GNOME application’s IPC systems was quite a nice thing and a real enabler: suddenly the two worlds’ softwares could talk with each other. Without it, Tracker could not have happened on the N900 and N9. Or how do you think qt/qsparql talks with it?

Nowadays everybody who isn’t insane or has a really, really, really good reason (like awesome latency or something, although kdbus solves that too), and with exception of all Belgian Linux programmers (who for inexplicable reasons all want their own personal IPC – and then endlessly work on bridges to all other Belgian Linux programmer’s IPC systems), won’t write his own IPC system. They’ll just use D-Bus and get it over with (or they initially bridge to D-Bus, and refactor their own code away over time).

But anyway.

The D-Bus debate was among software developers. And sometimes teh morons joined. But they didn’t understand what the heck we where doing. It was easy to just keep the debate mostly technical. Besides, we had some (for them) really difficult to understand stuff to reply like “you have file descriptor passing for that”, “study it and come back”. Those who came back are now all expert D-Bus users (I btw think and/or remember that evil Lennart worked on FD passing in D-Bus).

Good times. Lot’s of debates.

But the systemd debate, not the software, the debate, is only moron.

Recently I’ve seen some people actually looking into it and learning it. And then reporting about what they learned. That’s not really moron of course. But then their blogs get morons in the comments. Morons all over the place.

Why aren’t they on their fantastic *BSD or Devuan or something already?

ps. Lennart, if you read this (I don’t think so): I don’t think you are evil. You’re super nice and fluffy. Thanks for all the fish!

Maalwarkstrodon

It’s a mythical beast that speaks in pornographic subplots and maintains direct communication with your girlfriends every wants and desires so as better to inform you on how to best please her. It has the feet of bonzi buddy, the torso of that man who uses 1 weird trick to perfect his abs, and the arms of the scientists that hate her. Most impressively, Maalwarkstrodon has a skull made from a Viagra, Levitra, Cialis, and Propecia alloy. This beast of malware belches sexy singles from former east-bloc soviet satellite states and is cloaked in the finest fashions from paris and milan, imported directly from Fujian china.

Maalwarkstrodon is incapable of offering any less than the best deals at 80% to 90% off, and will not rest until your 2 million dollar per month work-at-home career comes to fruition and the spoils of all true nigerian royalty are delivered unto those most deserving of a kings riches.

Maalwarkstrodon will also win the malware arms race.

nrl:maxCardinality one-to-many ontology changes

I added support for changing the nrl:maxCardinality property of an rdfs:Property from one to many. Earlier Martyn Russel reverted such an ontology change as this was a blocker for the Debian packaging by Michael Biebl.

We only support going from one to many. That’s because going from many to one would obviously imply data-loss (a string-list could work with CSV, but an int-list can’t be stored as CSV in a single-value int type – instead of trying to support nonsense I decided to just not do it at all).

More supported ontology changes can be found here.

Not sure if people care but this stuff was made while listening to Infected Mushroom.

Tracker supports volume management under a minimal environment

While Nemo Mobile OS doesn’t ship with udisks2 nor with the GLib/GIO GVfs2 modules that interact with it, we still wanted removable volume management working with the file indexer.

It means that types like GVolume and GVolumeMonitor in GLib’s GIO will fall back to GUnixVolume and GUnixVolumeMonitor using GUnixMount and GUnixMounts instead of using the more competent GVfs2 modules.

The GUnixMounts fallback uses the _PATH_MNTTAB, which generally points to /proc/mounts, to know what the mount points are.

Removable volumes usually aren’t configured in the /etc/fstab file, which would or could affect /proc/mounts, plus if you’d do it this way the UUID label can’t be known upfront (you don’t know which sdcard the user will insert). Tracker’s FS miner needs this label to uniquely identify a removable volume to know if a previously seen volume is returning.

If you look at gunixvolume.c’s g_unix_volume_get_identifier you’ll notice that it always returns NULL in case the UUID label isn’t set in the mtab file: the pure-Unix fall back implementations aren’t fit for non-typical desktop usage; it’s what udisks2 and GVfs2 normally provide for you. But we don’t have it on the Nemo Mobile OS.

The mount_add in miners/fs/tracker-storage.c luckily has an alternative that uses the mountpoint’s name (line ~592). We’ll use this facility to compensate for the lacking UUID.

Basically, we add the UUID of the device to the mountpoint’s directory name and Tracker’s existing volume management will generate a unique UUID using MD5 for each unique mountpoint directory. What follows is specific for Nemo Mobile and its systemd setup.

We added some udev rules to /etc/udev/rules.d/90-mount-sd.rules:

SUBSYSTEM=="block", KERNEL=="mmcblk1*", ACTION=="add", MODE="0660", TAG+="systemd", 
  ENV{SYSTEMD_WANTS}="mount-sd@%k.service", ENV{SYSTEMD_USER_WANTS}="tracker-miner-fs.service
  tracker-store.service"

We added /etc/systemd/system/mount-sd@.service:

[Unit]
Description=Handle sdcard
After=init-done.service dev-%i.device
BindsTo=dev-%i.device

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/mount-sd.sh add %i
ExecStop=/usr/sbin/mount-sd.sh remove %i

And we created mount-sd.sh:

if [ "$ACTION" = "add" ]; then
    eval "$(/sbin/blkid -c /dev/null -o export /dev/$2)"
    test -d $MNT/${UUID} || mkdir -p $MNT/${UUID}
    chown $DEF_UID:$DEF_GID $MNT $MNT/${UUID}
    touch $MNT/${UUID}
    mount ${DEVNAME} $MNT/${UUID} -o $MOUNT_OPTS || /bin/rmdir $MNT/${UUID}
    test -d $MNT/${UUID} && touch $MNT/${UUID}
else
    DIR=$(mount | grep -w ${DEVNAME} | cut -d \  -f 3)
    if [ -n "${DIR}" ] ; then
        umount $DIR || umount -l $DIR
    fi
fi

Now we just have to configure Tracker right:

gsettings set org.freedesktop.Tracker.Miner.Files index-removable-devices true

Let’s try that:

# Insert sdcard
[nemo@Jolla ~]$ mount | grep sdcard
/dev/mmcblk1 on /media/sdcard/F6D0-FC42 type vfat (rw,nosuid,nodev,noexec,...
[nemo@Jolla ~]$ 

[nemo@Jolla ~]$ touch  /media/sdcard/F6D0-FC42/test.txt
[nemo@Jolla ~]$ tracker-sparql -q "select tracker:available(?s) nfo:fileName(?s) \
     { ?s nie:url 'file:///media/sdcard/F6D0-FC42/test.txt' }"
Results:
  true, test.txt

# Take out the sdcard

[nemo@Jolla ~]$ mount | grep sdcard
[nemo@Jolla ~]$ tracker-sparql -q "select tracker:available(?s) nfo:fileName(?s) \
     { ?s nie:url 'file:///media/sdcard/F6D0-FC42/test.txt' }"
Results:
  (null), test.txt
[nemo@Jolla ~]$

FOSDEM presentation about Metadata Tracker

I will be doing a presentation about Tracker at FOSDEM this year.

Metadata Tracker is now being used not only on GNOME, the N900 and N9, but is also being used on the Jolla Phone. On top a software developer for several car brands, Pelagicore, claims to be using it with custom made ontologies; SerNet told us they are integrating Tracker for use as search engine backend for Apple OS X SMB clients and last year Tracker integration with Netatalk was done by NetAFP. Other hardware companies have approached the team about integrating the software with their products. In this presentation I’d like to highlight the difficulties those companies encountered and how the project deals with them, dependencies to get a minimal system up and running cleanly, recent things the upstream team is working on and I’d like to propose some future ideas.

Link on fosdem.org

Mr. Dillon; smartphone innovation in Europe ought to be about people’s privacy

Dear Mark,

Your team and you yourself are working on the Jolla Phone. I’m sure that you guys are doing a great job and although I think you’ve been generating hype and vaporware until we can actually buy the damn thing, I entrust you with leading them.

As their leader you should, I would like to, allow them to provide us with all of the device’s source code and build environments of their projects so that we can have the exact same binaries. With exactly the same I mean that it should be possible to use MD5 checksums. I’m sure you know what that means and you and I know that your team knows how to provide geeks like me with this. I worked with some of them together during Nokia’s Harmattan and Fremantle and we both know that you can easily identify who can make this happen.

The reason why is simple: I want Europe to develop a secure phone similar to how, among other open source projects, the Linux kernel can be trusted. By peer review of the source code.

Kind regards,

A former Harmattan developer who worked on a component of the Nokia N9 that stores the vast majority of user’s privacy.

ps. I also think that you should reunite Europe’s finest software developers and secure the funds to make this workable. But that’s another discussion which I’m eager to help you with.

A use-case for SPARQL and Nepomuk

As I got contacted by two different companies last few days who both had questions about integrating Tracker into their device, I started thinking that perhaps I should illustrate what Tracker can already do today.

I’m going to make a demo for the public transportation industry in combination with contacts and places of interest. Tracker’s ontologies cross many domains, of course (this is just an example).

I agree that in principle what I’m showing here isn’t rocket science. You can do this with almost any database technology. What is interesting is that as soon as many domains start sharing the ontology and store their data in a shared way, interesting queries and use-cases are made possible.

So let’s first insert a place of interest: the Pizza Hut in Nossegem

tracker-sparql -uq "
INSERT { _:1 a nco:PostalAddress ; nco:country 'Belgium';
               nco:streetAddress 'Weiveldlaan 259 Zaventem' ;
               nco:postalcode '1930' .
        _:2 a slo:Landmark; nie:title 'Pizza Hut Nossegem';
              slo:location [ a slo:GeoLocation;
                  slo:latitude '50.869949'; slo:longitude '4.490477';
                  slo:postalAddress _:1 ];
              slo:belongsToCategory slo:predefined-landmark-category-food-beverage  }"

And let’s add some busstops:

tracker-sparql -uq "
INSERT { _:1 a nco:PostalAddress ; nco:country 'Belgium';
               nco:streetAddress 'Leuvensesteenweg 544 Zaventem' ;
               nco:postalcode '1930' .
         _:2 a slo:Landmark; nie:title 'Busstop Sint-Martinusweg';
               slo:location [ a slo:GeoLocation;
                   slo:latitude '50.87523'; slo:longitude '4.49426';
                   slo:postalAddress _:1 ];
               slo:belongsToCategory slo:predefined-landmark-category-transport  }"
tracker-sparql -uq "
INSERT  { _:1 a nco:PostalAddress ; nco:country 'Belgium';
                nco:streetAddress 'Leuvensesteenweg 550 Zaventem' ;
                nco:postalcode '1930' .
          _:2 a slo:Landmark; nie:title 'Busstop Hoge-Wei';
                slo:location [ a slo:GeoLocation;
                    slo:latitude '50.875988'; slo:longitude '4.498208';
                    slo:postalAddress _:1 ];
                slo:belongsToCategory slo:predefined-landmark-category-transport  }"
tracker-sparql -uq "
INSERT  { _:1 a nco:PostalAddress ; nco:country 'Belgium';
                nco:streetAddress 'Guldensporenlei Turnhout' ;
                nco:postalcode '2300' .
          _:2 a slo:Landmark; nie:title 'Busstop Guldensporenlei';
                slo:location [ a slo:GeoLocation;
                    slo:latitude '51.325463'; slo:longitude '4.938047';
                    slo:postalAddress _:1 ];
                slo:belongsToCategory slo:predefined-landmark-category-transport  }"

Let’s now get all the busstops nearby the Pizza Hut in Nossegem:

tracker-sparql -q "
SELECT ?name ?lati ?long WHERE {
   ?p slo:belongsToCategory slo:predefined-landmark-category-food-beverage;
       slo:location [ slo:latitude ?plati; slo:longitude ?plong ] .
   ?b slo:belongsToCategory slo:predefined-landmark-category-transport ;
       slo:location [ slo:latitude ?lati; slo:longitude ?long ] ;
      nie:title ?name .
   FILTER (tracker:cartesian-distance (?lati, ?plati, ?long, ?plong) < 1000)
}"
Results:
  Busstop Sint-Martinusweg, 50.87523, 4.49426
  Busstop Hoge-Wei, 50.875988, 4.498208

This of course was an example with only slo:Landmark. But that slo:location property can be placed on any nie:InformationElement. Meaning that for example a nco:PersonContact can also be involved in such a cartesian-distance query (which is of course just an example).

Let’s make an example use-case: We want contact details of friends (with publicized coordinates) who are nearby a slo:Landmark that is in a food and beverage landmark category, so that the messenger application can prepare a text message window where you’ll type that you want to get together to get lunch at the Pizza Hut.

Ok, so let’s add some nco:PersonContact to our SPARQL endpoint who are nearby the Pizza Hut:

tracker-sparql -uq "
INSERT { _:1 a nco:PersonContact ; nco:fullname 'John Carmack';
               slo:location [ a slo:GeoLocation;
                   slo:latitude '51.325413'; slo:longitude '4.938037' ];
               nco:hasEmailAddress [ a nco:EmailAddress;
                 nco:emailAddress 'john.carmack@somewhere.com'] }"
tracker-sparql -uq "
INSERT { _:1 a nco:PersonContact ; nco:fullname 'Greg Kroah-Hartman';
               slo:location [ a slo:GeoLocation;
                   slo:latitude '51.325453'; slo:longitude '4.938027' ];
               nco:hasEmailAddress [ a nco:EmailAddress;
                 nco:emailAddress 'greg.kroah@somewhere.com'] }"

And let’s add one person who isn’t nearby the Pizza Hut in Nossegem:

tracker-sparql -uq "
INSERT { _:1 a nco:PersonContact ; nco:fullname 'Jean Pierre';
               slo:location [ a slo:GeoLocation;
                   slo:latitude '50.718091'; slo:longitude '4.880134' ];
               nco:hasEmailAddress [ a nco:EmailAddress;
                 nco:emailAddress 'jean.pierre@somewhere.com'] }"

And now, the query:

tracker-sparql -q "
SELECT ?name ?email ?lati ?long WHERE {
   ?p slo:belongsToCategory slo:predefined-landmark-category-food-beverage;
       slo:location [ slo:latitude ?plati; slo:longitude ?plong ] ;
      nie:title ?pname .
   ?b a nco:PersonContact;
        slo:location [ slo:latitude ?lati; slo:longitude ?long ] ;
      nco:fullname ?name ; nco:hasEmailAddress [ nco:emailAddress ?email ].
   FILTER (tracker:cartesian-distance (?lati, ?plati, ?long, ?plong) < 10000)
}"
Results:
  Greg Kroah-Hartman, greg.kroah@somewhere.com, 50.874715, 4.49158
  John Carmack, john.carmack@somewhere.com, 50.874715, 4.49154

These use-cases of course only illustrate the simplified location ontology in combination with the Nepomuk contacts ontology. There are many such domains in Nepomuk and when defining your own platform and/or a new domain on the desktop you can add (your own) ontologies. Mind that for the desktop you should preferably talk to Nepomuk first.

The strength of such a platform is also its weakness: if no information sources put their data into the SPARQL endpoint, no information sink can do queries that’ll yield meaningful results. You of course don’t have this problem in a contained environment where you define what does and what doesn’t get stored and where, like an embedded device.

A desktop like KDE or GNOME shouldn’t have this problem either, if only everybody would agree on the technology and share the ontologies. Which isn’t necessarily happening (fair point), although both KDE with Nepomuk-KDE and GNOME with Tracker share most of Nepomuk.

But indeed; if you don’t store anything in Tracker, it’s useless. That’s why Tracker comes with a file system miner and provides a framework for writing your own miners. The idea is that with time more and more applications will use Tracker, making it increasingly useful. Hopefully.

 

Bypassing Tracker’s file system miner, for example for MTP daemons

Recapping from my last blog article; I worked a bit on this concept during the weekend.

When a program is responsible for delivery of a file to the file system that program knows precisely when the rename syscall, completing the file transfer transaction, takes place.

An example of such a program is an MTP daemon. I quote from wikipedia: A main reason for using MTP rather than, for example, the USB mass-storage device class (MSC) is that the latter operates at the granularity of a mass storage device block (usually in practice, a FAT block), rather than at the logical file level.

One solution for metadata extraction for those files is to have file monitoring on the target storage directory with Tracker’s FS miner. The unfortunate thing with such a solution is that file monitoring will inevitably always trigger after the rename syscall. This means that only moments after the transfer has completed, the system can update the RDF storage. Not during and not just in time.

With this new feature I plan to allow a software like an MTP daemon to be ahead of that. For example while the file is being transferred or just in time upfront and / or just after the rename syscall depending on the use-case and how the developer plans to use the feature.

The API might still change. I plan to for example allow passing the value of tracker:available among other useful properties for which a MTP daemon might want to safely tamper with the values (edit: this is done and API in this blog article is adapted). The tracker:available property can be used to indicate to other software the availability of a file. For example while the file is being transferred you could set it to false and right after the rename you set it to true.

When you are building a device that has no other entry points for user files or documents than MTP, this feature helps you turning off Tracker’s FS miner completely. This could be ideal for certain tablets and phones.

Currently it looks like this. Branch is available here:

static void
on_finished (GObject *none, GAsyncResult *result, gpointer user_data) {
    GMainLoop *loop = user_data;
    GError *error = NULL;
    gchar *sparql = tracker_extract_get_sparql_finish (result, &error);
    if (error == NULL) {
        g_print ("%s", sparql);
        g_free (sparql);
    } else
        g_error("%s", error->message);
    g_clear_error (&error);
    g_main_loop_quit (loop);
}   

int main (int argc, char **argv) {
    const gchar *file = "/tmp/file.png";
    const gchar *dest = "file:///home/pvanhoof/Documents/Photos/photo.png"
    const gchar *graph = "urn:mygraph"
    GMainLoop *loop;
    g_type_init();
    loop = g_main_loop_new (NULL, FALSE);
    tracker_extract_get_sparql (file, dest, graph, time(0), time(0),
                                TRUE, on_finished, loop);
    g_main_loop_run (loop);
    g_object_unref (loop);
}

This will result in something like this:

INSERT SILENT { GRAPH  <urn:mygraph> {
    _:file a nfo:FileDataObject , nie:InformationElement ;
	 nfo:fileName "photo.png" ;
	 nfo:fileSize 38155 ;
	 nfo:fileLastModified "2012-12-17T09:20:18Z" ;
	 nfo:fileLastAccessed "2012-12-17T09:20:18Z" ;
	 nie:isStoredAs _:file ;
	 nie:url "file:///home/pvanhoof/Documents/Photos/photo.png" ;
	 nie:mimeType "image/png" ;
	 a nfo:FileDataObject ;
	 nie:dataSource <urn:nepomuk:datasource:9291a450-etc-etc> ;
	 tracker:available true .
    _:file a nfo:Image , nmm:Photo ;
	 nfo:width 150 ;
	 nfo:height 192 ;
	 nmm:dlnaProfile "PNG_LRG" ;
         # more extracted metadata
	 nmm:dlnaMime "image/png" .
  } }

As usual with stuff that I blog about: this feature isn’t finished, it’s not in master yet, not even reviewed. The API might change. All the usual stuff.

Battery drain on N9 caused by a combination of Battery-Icon, Tracker and Smartsearch

Tired of the fact that my N9 had few battery time I decided to “as a developer” investigate my device a little bit. Last time I did that I was still contracted by Nokia and a few days later I had to fly to Helsinki to help fix a Tracker in combination with contactsd bug. I’m btw. no longer working for Nokia since a few months. So this time I can’t fix it for everyone. Lemme write it here instead.

It’s pretty funny what is going on: I installed Battery-Icon at some point. The software is writing periodically to /usr/share/applications/battery-icon.desktop. Having been a developer at Nokia for the metadata subsystem I know that tracker-miner-fs will reindex .desktop files that change. You don’t really need to be a developer to know that: Tracker’s FS miner is, among other things, responsible for keeping up to date a list of known applications.

Because of Battery-Icon, which people are probably installing to monitor their battery, tracker-miner-fs wakes up to update the metadata. That in turn wakes up tracker-store to store the metadata. That in turn wakes up smartsearch which will fetch from Tracker some textual data. All three will consume power periodically because of this .desktop file write trigger. I’m guessing the power consumption is triggering Battery-Icon to update the .desktop file. And circular power consumption was born.

I guess I should file a bug on Battery-Icon and tell its author to update the .desktop file less often. I think he could  for example wait ten minutes before doing that write. Or is the user really interested in accurate battery information each and every second? Looks like Battery-Icon is even writing to the file more frequent every hour. Interesting behavior for a tool monitoring battery to do things in a way that influences power consumption significantly.

Btw, while it’s not fixed: devel-su (enable developer mode, install terminal and password for devel-su is rootme) on your N9 and chmod -x /usr/bin/smartsearch, reboot, then uninstall Battery-Icon and your battery will last longer. I know the guys who were or are on the smartsearch team are going to hate me for that advice. Sorry guys.

Avoiding duplicate album art storage on the N9

At Tracker (core component of Nokia N9‘s MeeGo Harmattan’s Content Framework) we extract album art out of music files like MP3s, and we do a heuristic scan in the same directory of the music files for files like cover.jpg.

Right now we use the media art storage spec which we at a Boston Summit a few years ago, together with the Banshee guys, came up with. This specification allows for artist + album media art.

This is a bit problematic now on the N9 because (embedded) album art is getting increasingly bigger. We’ve seen music stores with album art of up to 2MB. The storage space for this kind of data isn’t unlimited on the device. In particular is it a problem that for an album with say 20 songs by 20 different artists, with each having embedded album art, 20 times the same album art is stored. Just each time for a different artist-album combination.

To fix this we’re working on a solution that compares the MD5 of the image data of the file album-md5(space)-md5(album).jpg with the MD5 of the image data of the file album-md5(artist)-md5(album).jpg. If the contents are the same we will make a symlink from the latter to the former instead of creating a normal new album art file.

When none exist yet, we first make album-md5(space)-md5(album).jpg and then symlink album-md5(artist)-md5(album).jpg to it. And when the contents aren’t the same we create a normal file called album-md5(artist)-md5(album).jpg.

Consumers of the album art can now choose between using a space for artist if they are only interested in ‘just album’ album art, or filling in both artist and album for artist-album album art.

This is a first idea to solve this issue, we have some other ideas in mind for in case this solution comes with unexpected problems.

I usually blog about unfinished stuff. Also this time. You can find the work in progress here.

Null support for INSERT OR REPLACE available in master

About

Last week I wrote about adding a feature to our SPARQL Update’s INSERT OR REPLACE. With that feature it’s not needed to put a DELETE upfront the INSERT to clear a field. This makes our SPARQL-ish INSERT OR REPLACE in some ways more powerful than SQL’s UPDATE. Note, however, that all of INSERT OR REPLACE is non-standard in the SPARQL language. And this new null support certainly isn’t.

Support for null with INSERT OR REPLACE is now available in Tracker‘s master branch. How to use it is illustrated in the functional test. I’ll briefly explain the test.

For single value properties:

This is of course very simple.

INSERT { <subject> nie:title 'new test' }
INSERT OR REPLACE { <subject> nie:title null }

If you now select nie:title for <subject> then of course you’ll get that its nie:title field is unset.

For multi value properties:

Begin situation:

INSERT { <subject> a nie:DataObject, nie:InformationElement }
INSERT { <ds1> a nie:DataSource }
INSERT { <ds2> a nie:DataSource }
INSERT { <ds3> a nie:DataSource }
INSERT { <subject> nie:dataSource <ds1>, <ds2>, <ds3> }

This will be the test query I’ll use for all cases:

SELECT ?ds WHERE { <subject> nie:dataSource ?ds }

For the begin situation that of course gives us <ds1>, <ds2> and <ds3>.

With null upfront, reset of list, rewrite of new list:

INSERT OR REPLACE { <subject> nie:dataSource null, <ds1>, <ds2> }

This will give us <ds1> and <ds2> for the test query. The first null resets the existing list, then <ds1> and <ds2> are added. This is probably the most sensible one to use for multi value properties.

With null in the middle, rewrite of new list:

INSERT OR REPLACE { <subject> nie:dataSource <ds1>, null, <ds2>, <ds3> }

This also gives us <ds2> and <ds3>. First <ds1> is added, but the null that follows clears it again. Then <ds2> and <ds3> get added. So the <ds1> there doesn’t make much sense, indeed.

With null at the end:

INSERT OR REPLACE { <subject> nie:dataSource <ds1>, <ds2>, <ds3>, null }

This one doesn’t make much sense either. The <ds1>, <ds2> and <ds3> get cleared by the null at the end. So the query gives us zero results.

With null as only element:

INSERT OR REPLACE { <subject> nie:dataSource null }

This one makes sense, you can use it to clear a multi value property of a resource. The query gives us zero results.

Multiple nulls:

INSERT OR REPLACE { <subject> nie:dataSource null, <ds1>, null, <ds2>, <ds3> }

Again doesn’t make much sense. First the list is cleared, then <ds1> is added, then it’s again cleared, then <ds2> and <ds3> are added. So the query gives <ds2> and <ds3>.

Support for null with Tracker’s INSERT OR REPLACE feature.

I believe it was the QtContacts Tracker team who requested this feature. When they have to unset the value of a resource’s property and at the same time set a bunch of other properties, they need to use a DELETE statement upfront an INSERT OR REPLACE. The DELETE increases the amount of queries and introduces a SQL SELECT internally for solving the SPARQL DELETE’s WHERE.

Instead of that they wanted a way to express this in the INSERT OR REPLACE, and that way gain a bit of performance. Today I implemented this.

So let’s say we start with:

INSERT { <subject> a nie:InformationElement ; nie:title 'test' }

And then we replace the nie:title:

INSERT OR REPLACE { <subject> nie:title 'new test' }

Then of course we get ‘new test’ for the nie:title of the resource:

SELECT ?title { <subject> nie:title ?title }

Then let’s say we want to unset the nie:title, we can either use:

DELETE { <subject> nie:title ?title } WHERE { <subject> nie:title ?title }

or we can now also use this (and avoid an extra internal SQL SELECT to solve the SPARQL DELETE’s WHERE):

INSERT OR REPLACE { <subject> nie:title null }

For multi value properties will a null object in INSERT OR REPLACE results in a reset of the entire list of objects. There is still a SQL SELECT happening internally to get the so called old values, but that one is sort of unavoidable and is also used by a normal DELETE. I hope this feature helps the QtContacts Tracker team gain performance for their contact synchronization use cases.

You can find this in a branch, it might take some time before it reaches master as most of the Tracker team is at the Berlin Desktop Summit; it must be reviewed, of course. Since it doesn’t really change any of the existing APIs, as it only adds a feature, we might also bring it to 0.10. Although now that we started with 0.11, I think it probably belongs in 0.11 only. Distributions should probably just upgrade, wait for the new features until they decide to bump the version of their packages, or backport features themselves.