June 2010 – How easy it is to make people believe a lie, and how hard it is to undo that work again

We store our data in a decomposed way. For single value properties we create a table per class and have a column per property. Multi value properties go in a separate table. For now I’ll focus on those single value properties.

Imagine you have a MusicPiece. In Nepomuk that’s a subclass of InformationElement. InformationElement adds properties like title and subject. MusicPiece has performer, which is a Contact, and duration, an integer. A Contact has a fullname.

Alright, that looks like this in our internal storage.

Querying that in SPARQL goes like this. I’ll add the Nepomuk prefixes.

SELECT ?musicpiece ?title ?subject ?performer {
   ?musicpiece a nmm:MusicPiece ;
               nmm:performer ?p ;
               nie:title ?title ;
               nie:subject ?subject .
   ?p nco:fullname ?performer .
} ORDER BY ?title

A problem if you ORDER BY the title field is that Tracker needs to make a join and a full table scan with that InformationElement table.

So we’re working on what we’ll call domain specific indexes. It means that we’ll for certain properties have a redundant mirror column, on which we’ll place the index. The native SQL query will be generated to use that mirror column instead. A good example is nie:title for nmm:MusicPiece.

ps. A normal triple store has instead a huge table with just three columns: subject, predicate and object. That wouldn’t help you much with optimizing of course.

M	T	W	T	F	S	S
« May				Jul »
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Month: June 2010

Domain specific indexes