The support for domain specific indexes is, awaiting review / finished. Although we can further optimize it now. More on that later in this post. Image that you have this ontology:
nie:InformationElement a rdfs:Class . nie:title a rdf:Property ; nrl:maxCardinality 1 ; rdfs:domain nie:InformationElement ; rdfs:range xsd:string . nmm:MusicPiece a rdfs:Class ; rdfs:subClassOf nie:InformationElement . nmm:beatsPerMinute a rdf:Property ; nrl:maxCardinality 1 ; rdfs:domain nmm:MusicPiece ; rdfs:range xsd:integer .
With that ontology there are three tables called “Resource”, “nmo:MusicPiece” and “nie:InformationElement” in SQLite’s schema:
- The “Resource” table has ID and the subject string
- The “nie:InformationElement” has ID and “nie:title”
- The “nmm:MusicPiece” one has ID and “nmm:beatsPerMinute”
That’s fairly simple, right? The problem is that when you ORDER BY “nie:title” that you’ll cause a full table scan on “nie:InformationElement”. That’s not good, because there are less “nmm:MusicPiece” records than “nie:InformationElement” ones.
Imagine that we do this SPARQL query:
SELECT ?title WHERE { ?resource a nmm:MusicPiece ; nie:title ?title } ORDER BY ?title
We translate that, for you, to this SQL on our schema:
SELECT "title_u" FROM ( SELECT "nmm:MusicPiece1"."ID" AS "resource_u", "nie:InformationElement2"."nie:title" AS "title_u" FROM "nmm:MusicPiece" AS "nmm:MusicPiece1", "nie:InformationElement" AS "nie:InformationElement2" WHERE "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID" AND "title_u" IS NOT NULL ) ORDER BY "title_u"
OK, so with support for domain indexes we change the ontology like this:
nmm:MusicPiece a rdfs:Class ; rdfs:subClassOf nie:InformationElement ; tracker:domainIndex nie:title .
Now we’ll have the three tables called “Resource”, “nmo:MusicPiece” and “nie:InformationElement” in SQLite’s schema. But they will look like this:
- The “Resource” table has ID and the subject string
- The “nie:InformationElement” has ID and “nie:title”
- The “nmm:MusicPiece” table now has three columns called ID, “nmm:beatsPerMinute” and “nie:title”
The same data, for titles of music pieces, will be in both “nie:InformationElement” and “nmm:MusicPiece”. We copy to the mirror column during ontology change coping, and when new inserts happen.
When now the rdf:type is known in the SPARQL query as a nmm:MusicPiece, like in the query mentioned earlier, we know that we can use the “nie:title” from the “nmm:MusicPiece” table in SQLite. That allows us to generate you this SQL query:
SELECT "title_u" FROM ( SELECT "nmm:MusicPiece1"."ID" AS "resource_u", "nmm:MusicPiece1"."nie:title" AS "title_u" FROM "nmm:MusicPiece" AS "nmm:MusicPiece1" WHERE "title_u" IS NOT NULL ) ORDER BY "title_u"
A remaining optimization is when you request a rdf:type that is a subclass of nmm:MusicPiece, like this:
SELECT ?title WHERE { ?resource a nmm:MusicPiece, nie:InformationElement ; nie:title ?title } ORDER BY ?title
It’s still not as bad as now the “nie:title” is still taken from the “nmm:MusicPiece” table. But the join with “nie:InformationElement” is still needlessly there (we could just do the earlier SQL query in this case):
SELECT "title_u" FROM ( SELECT "nmm:MusicPiece1"."ID" AS "resource_u", "nmm:MusicPiece1"."nie:title" AS "title_u" FROM "nmm:MusicPiece" AS "nmm:MusicPiece1", "nie:InformationElement" AS "nie:InformationElement2" WHERE "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID" AND "title_u" IS NOT NULL ) ORDER BY "title_u"
We will probably optimize this specific use-case further later this week.