Avoiding duplicate album art storage on the N9

At Tracker (core component of Nokia N9‘s MeeGo Harmattan’s Content Framework) we extract album art out of music files like MP3s, and we do a heuristic scan in the same directory of the music files for files like cover.jpg.

Right now we use the media art storage spec which we at a Boston Summit a few years ago, together with the Banshee guys, came up with. This specification allows for artist + album media art.

This is a bit problematic now on the N9 because (embedded) album art is getting increasingly bigger. We’ve seen music stores with album art of up to 2MB. The storage space for this kind of data isn’t unlimited on the device. In particular is it a problem that for an album with say 20 songs by 20 different artists, with each having embedded album art, 20 times the same album art is stored. Just each time for a different artist-album combination.

To fix this we’re working on a solution that compares the MD5 of the image data of the file album-md5(space)-md5(album).jpg with the MD5 of the image data of the file album-md5(artist)-md5(album).jpg. If the contents are the same we will make a symlink from the latter to the former instead of creating a normal new album art file.

When none exist yet, we first make album-md5(space)-md5(album).jpg and then symlink album-md5(artist)-md5(album).jpg to it. And when the contents aren’t the same we create a normal file called album-md5(artist)-md5(album).jpg.

Consumers of the album art can now choose between using a space for artist if they are only interested in ‘just album’ album art, or filling in both artist and album for artist-album album art.

This is a first idea to solve this issue, we have some other ideas in mind for in case this solution comes with unexpected problems.

I usually blog about unfinished stuff. Also this time. You can find the work in progress here.

Null support for INSERT OR REPLACE available in master

About

Last week I wrote about adding a feature to our SPARQL Update’s INSERT OR REPLACE. With that feature it’s not needed to put a DELETE upfront the INSERT to clear a field. This makes our SPARQL-ish INSERT OR REPLACE in some ways more powerful than SQL’s UPDATE. Note, however, that all of INSERT OR REPLACE is non-standard in the SPARQL language. And this new null support certainly isn’t.

Support for null with INSERT OR REPLACE is now available in Tracker‘s master branch. How to use it is illustrated in the functional test. I’ll briefly explain the test.

For single value properties:

This is of course very simple.

INSERT { <subject> nie:title 'new test' }
INSERT OR REPLACE { <subject> nie:title null }

If you now select nie:title for <subject> then of course you’ll get that its nie:title field is unset.

For multi value properties:

Begin situation:

INSERT { <subject> a nie:DataObject, nie:InformationElement }
INSERT { <ds1> a nie:DataSource }
INSERT { <ds2> a nie:DataSource }
INSERT { <ds3> a nie:DataSource }
INSERT { <subject> nie:dataSource <ds1>, <ds2>, <ds3> }

This will be the test query I’ll use for all cases:

SELECT ?ds WHERE { <subject> nie:dataSource ?ds }

For the begin situation that of course gives us <ds1>, <ds2> and <ds3>.

With null upfront, reset of list, rewrite of new list:

INSERT OR REPLACE { <subject> nie:dataSource null, <ds1>, <ds2> }

This will give us <ds1> and <ds2> for the test query. The first null resets the existing list, then <ds1> and <ds2> are added. This is probably the most sensible one to use for multi value properties.

With null in the middle, rewrite of new list:

INSERT OR REPLACE { <subject> nie:dataSource <ds1>, null, <ds2>, <ds3> }

This also gives us <ds2> and <ds3>. First <ds1> is added, but the null that follows clears it again. Then <ds2> and <ds3> get added. So the <ds1> there doesn’t make much sense, indeed.

With null at the end:

INSERT OR REPLACE { <subject> nie:dataSource <ds1>, <ds2>, <ds3>, null }

This one doesn’t make much sense either. The <ds1>, <ds2> and <ds3> get cleared by the null at the end. So the query gives us zero results.

With null as only element:

INSERT OR REPLACE { <subject> nie:dataSource null }

This one makes sense, you can use it to clear a multi value property of a resource. The query gives us zero results.

Multiple nulls:

INSERT OR REPLACE { <subject> nie:dataSource null, <ds1>, null, <ds2>, <ds3> }

Again doesn’t make much sense. First the list is cleared, then <ds1> is added, then it’s again cleared, then <ds2> and <ds3> are added. So the query gives <ds2> and <ds3>.

Support for null with Tracker’s INSERT OR REPLACE feature.

I believe it was the QtContacts Tracker team who requested this feature. When they have to unset the value of a resource’s property and at the same time set a bunch of other properties, they need to use a DELETE statement upfront an INSERT OR REPLACE. The DELETE increases the amount of queries and introduces a SQL SELECT internally for solving the SPARQL DELETE’s WHERE.

Instead of that they wanted a way to express this in the INSERT OR REPLACE, and that way gain a bit of performance. Today I implemented this.

So let’s say we start with:

INSERT { <subject> a nie:InformationElement ; nie:title 'test' }

And then we replace the nie:title:

INSERT OR REPLACE { <subject> nie:title 'new test' }

Then of course we get ‘new test’ for the nie:title of the resource:

SELECT ?title { <subject> nie:title ?title }

Then let’s say we want to unset the nie:title, we can either use:

DELETE { <subject> nie:title ?title } WHERE { <subject> nie:title ?title }

or we can now also use this (and avoid an extra internal SQL SELECT to solve the SPARQL DELETE’s WHERE):

INSERT OR REPLACE { <subject> nie:title null }

For multi value properties will a null object in INSERT OR REPLACE results in a reset of the entire list of objects. There is still a SQL SELECT happening internally to get the so called old values, but that one is sort of unavoidable and is also used by a normal DELETE. I hope this feature helps the QtContacts Tracker team gain performance for their contact synchronization use cases.

You can find this in a branch, it might take some time before it reaches master as most of the Tracker team is at the Berlin Desktop Summit; it must be reviewed, of course. Since it doesn’t really change any of the existing APIs, as it only adds a feature, we might also bring it to 0.10. Although now that we started with 0.11, I think it probably belongs in 0.11 only. Distributions should probably just upgrade, wait for the new features until they decide to bump the version of their packages, or backport features themselves.