The support for domain specific indexes is, awaiting review / finished. Although we can further optimize it now. More on that later in this post. Image that you have this ontology:
nie:InformationElement a rdfs:Class .
nie:title a rdf:Property ;
nrl:maxCardinality 1 ;
rdfs:domain nie:InformationElement ;
rdfs:range xsd:string .
nmm:MusicPiece a rdfs:Class ;
rdfs:subClassOf nie:InformationElement .
nmm:beatsPerMinute a rdf:Property ;
nrl:maxCardinality 1 ;
rdfs:domain nmm:MusicPiece ;
rdfs:range xsd:integer .
With that ontology there are three tables called “Resource”, “nmo:MusicPiece” and “nie:InformationElement” in SQLite’s schema:
- The “Resource” table has ID and the subject string
- The “nie:InformationElement” has ID and “nie:title”
- The “nmm:MusicPiece” one has ID and “nmm:beatsPerMinute”
That’s fairly simple, right? The problem is that when you ORDER BY “nie:title” that you’ll cause a full table scan on “nie:InformationElement”. That’s not good, because there are less “nmm:MusicPiece” records than “nie:InformationElement” ones.
Imagine that we do this SPARQL query:
SELECT ?title WHERE {
?resource a nmm:MusicPiece ;
nie:title ?title
} ORDER BY ?title
We translate that, for you, to this SQL on our schema:
SELECT "title_u" FROM (
SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
"nie:InformationElement2"."nie:title" AS "title_u"
FROM "nmm:MusicPiece" AS "nmm:MusicPiece1",
"nie:InformationElement" AS "nie:InformationElement2"
WHERE "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
AND "title_u" IS NOT NULL
) ORDER BY "title_u"
OK, so with support for domain indexes we change the ontology like this:
nmm:MusicPiece a rdfs:Class ;
rdfs:subClassOf nie:InformationElement ;
tracker:domainIndex nie:title .
Now we’ll have the three tables called “Resource”, “nmo:MusicPiece” and “nie:InformationElement” in SQLite’s schema. But they will look like this:
- The “Resource” table has ID and the subject string
- The “nie:InformationElement” has ID and “nie:title”
- The “nmm:MusicPiece” table now has three columns called ID, “nmm:beatsPerMinute” and “nie:title”
The same data, for titles of music pieces, will be in both “nie:InformationElement” and “nmm:MusicPiece”. We copy to the mirror column during ontology change coping, and when new inserts happen.
When now the rdf:type is known in the SPARQL query as a nmm:MusicPiece, like in the query mentioned earlier, we know that we can use the “nie:title” from the “nmm:MusicPiece” table in SQLite. That allows us to generate you this SQL query:
SELECT "title_u" FROM (
SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
"nmm:MusicPiece1"."nie:title" AS "title_u"
FROM "nmm:MusicPiece" AS "nmm:MusicPiece1"
WHERE "title_u" IS NOT NULL
) ORDER BY "title_u"
A remaining optimization is when you request a rdf:type that is a subclass of nmm:MusicPiece, like this:
SELECT ?title WHERE {
?resource a nmm:MusicPiece, nie:InformationElement ;
nie:title ?title
} ORDER BY ?title
It’s still not as bad as now the “nie:title” is still taken from the “nmm:MusicPiece” table. But the join with “nie:InformationElement” is still needlessly there (we could just do the earlier SQL query in this case):
SELECT "title_u" FROM (
SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
"nmm:MusicPiece1"."nie:title" AS "title_u"
FROM "nmm:MusicPiece" AS "nmm:MusicPiece1",
"nie:InformationElement" AS "nie:InformationElement2"
WHERE "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
AND "title_u" IS NOT NULL
) ORDER BY "title_u"
We will probably optimize this specific use-case further later this week.