Introduction to RDF and SPARQL

Let’s start with a relatively simple graph. The graph shows the relationships between John, Fred, Max and Picca. John and Fred are humans who we’ll refer to as contacts. Max and Picca are pets. Max is a dog and Picca is a parrot. Both Picca and Max are owned by John. Fred claims that John is his friend.

If we want to represent this story semantically, we first need to make a dictionary that describes pets, contacts, dogs and parrots. The dictionary also describes the possible relationships, like the ownership of a pet and the friendship between two contacts. Don't forget: making something semantic means giving meaning to the things that interest you.

Giving meaning is exactly where we'll start. We will write the schema that makes this story possible; such a schema is called an ontology.

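We describe our ontology using the Turtle format. In Turtle you can declare prefixes. In the ontology below, for example, the prefix test: stands for <http://www.tracker-project.org/ontologies/test#>, so writing test:Pet is the same as writing <http://www.tracker-project.org/ontologies/test#Pet>.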
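To make prefix expansion concrete, here is a small Python sketch. It is not how Tracker parses Turtle: the `expand` helper and the `PREFIXES` table are hypothetical, with the table copied from the @prefix lines of the ontology below. It also shows a detail that matters for the data further down: a term in angle brackets is an IRI written out in full and is not prefix-expanded, so <test:Picca> is simply the IRI test:Picca.

```python
# Hypothetical prefix table, copied from the @prefix lines of the test ontology.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "tracker": "http://www.tracker-project.org/ontologies/tracker#",
    "test": "http://www.tracker-project.org/ontologies/test#",
}


def expand(term: str) -> str:
    """Return the full IRI that a Turtle term stands for.

    Simplified: ignores @base, relative IRIs and escape sequences.
    """
    if term == "a":                                # Turtle keyword for rdf:type
        return PREFIXES["rdf"] + "type"
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]                          # written out in full: no expansion
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"not a known prefixed name: {term!r}")
    return PREFIXES[prefix] + local                # prefixed name: namespace + local part


print(expand("test:Picca"))    # http://www.tracker-project.org/ontologies/test#Picca
print(expand("<test:Picca>"))  # test:Picca
```

The bare test: used as a subject in the ontology expands to the namespace IRI itself, because its local part is empty.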

In Turtle you describe statements by giving a subject, a predicate and then an object. The subject is what you are talking about. The predicate is what you are saying about the subject. And finally, the object is the value. This value can be a resource or a literal.

When you write a . (a dot) in Turtle, it means that you have finished describing the subject. When you write a ; (semicolon), you continue with the same subject but start describing a new predicate. When you write a , (comma), you continue with the same subject and the same predicate, and only give another object. The same rules apply in the WHERE clause of a SPARQL query. But first things first: the ontology.
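These punctuation rules can be sketched as a tiny state machine in Python. This is an illustration only, not Tracker's parser: the hypothetical `turtle_to_triples` function handles a deliberately small subset of Turtle in which every token, punctuation included, is separated by whitespace. It expands two spellings of John's description from the data below, one repeating test:owns after a semicolon and one listing both pets after a comma, into the same list of triples.

```python
def turtle_to_triples(text):
    """Expand the '.', ';' and ',' shorthand of a tiny Turtle subset into triples.

    Assumes whitespace around every token (punctuation included) and no
    @prefix lines, blank nodes or literals containing spaces.
    """
    triples = []
    subject = predicate = None
    expect = "subject"                    # which part the next term fills
    for tok in text.split():
        if tok == ".":                    # finished with this subject
            subject = predicate = None
            expect = "subject"
        elif tok == ";":                  # same subject, new predicate
            expect = "predicate"
        elif tok == ",":                  # same subject and predicate, another object
            expect = "object"
        elif expect == "subject":
            subject, expect = tok, "predicate"
        elif expect == "predicate":
            predicate, expect = tok, "object"
        elif expect == "object":
            triples.append((subject, predicate, tok))
            expect = "separator"          # only '.', ';' or ',' may follow an object
        else:
            raise ValueError(f"expected '.', ';' or ',' before {tok!r}")
    return triples


with_semicolons = turtle_to_triples(
    '<test:John> a test:Contact ; test:owns <test:Max> ; test:owns <test:Picca> ; test:name "John" .'
)
with_commas = turtle_to_triples(
    '<test:John> a test:Contact ; test:owns <test:Max> , <test:Picca> ; test:name "John" .'
)
print(with_semicolons == with_commas)  # True
for triple in with_commas:
    print(*triple)
```

Both spellings produce the same four triples; the comma is only a shorter way to repeat a subject and predicate.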

Note that the “test” ontology is not officially registered at tracker-project.org. It serves merely as an example.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tracker: <http://www.tracker-project.org/ontologies/tracker#> .
@prefix test: <http://www.tracker-project.org/ontologies/test#> .

test: a tracker:Namespace ;
	tracker:prefix "test" .

test:Entity a rdfs:Class .

test:Contact a rdfs:Class ;
	rdfs:subClassOf test:Entity .

test:Pet a rdfs:Class ;
	rdfs:subClassOf test:Entity .

test:Dog a rdfs:Class ;
	rdfs:subClassOf test:Entity .

test:Parrot a rdfs:Class ;
	rdfs:subClassOf test:Entity .

test:name a rdf:Property ;
	rdfs:domain test:Entity ;
	rdfs:range xsd:string .

test:owns a rdf:Property ;
	rdfs:domain test:Contact ;
	rdfs:range test:Pet .

test:hasFriend a rdf:Property ;
	rdfs:domain test:Contact ;
	rdfs:range test:Contact .
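Note that test:Dog and test:Parrot are declared as subclasses of test:Entity, not of test:Pet: not every dog is necessarily a pet. Under the RDFS rule that a resource of class C also belongs to every superclass of C, the subClassOf statements alone therefore do not make Max a test:Pet when he is only declared a test:Dog, which is why the data below lists both types. The Python sketch below computes that subclass closure for this ontology. It illustrates RDFS semantics and makes no claim about what inference Tracker itself performs; the function names are hypothetical.

```python
# The rdfs:subClassOf statements of the test ontology, as (subclass, superclass).
SUBCLASS_OF = [
    ("test:Contact", "test:Entity"),
    ("test:Pet", "test:Entity"),
    ("test:Dog", "test:Entity"),
    ("test:Parrot", "test:Entity"),
]


def superclasses(cls):
    """Every class that cls is a subclass of, directly or transitively."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in found:
                found.add(sup)
                todo.append(sup)
    return found


def entailed_types(asserted):
    """Asserted types plus the superclasses the subclass rule adds to them."""
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(cls)
    return types


print(sorted(entailed_types({"test:Dog"})))              # ['test:Dog', 'test:Entity']
print(sorted(entailed_types({"test:Dog", "test:Pet"})))  # ['test:Dog', 'test:Entity', 'test:Pet']
```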

Now that we have meaning, we will introduce the actors: Picca, Max, John and Fred. To try this yourself, put the ontology file in the share/tracker/ontologies directory and run tracker-processes -r before restarting tracker-store from master. Then copy the @prefix lines of the ontology above to the top of the data below, store the result as /tmp/import.ttl and run tracker-import /tmp/import.ttl; it should import just fine. After that, the queries below can be executed with the tracker-sparql -q '$query' command.

Note that tracker-processes -r destroys all your RDF data in Tracker. We don't yet support adding custom ontologies at runtime, so for this test you have to start everything from scratch.

<test:Picca> a test:Parrot, test:Pet ;
	test:name "Picca" .

<test:Max> a test:Dog, test:Pet ;
	test:name "Max" .

<test:John> a test:Contact ;
	test:owns <test:Max> ;
	test:owns <test:Picca> ;
	test:name "John" .

<test:Fred> a test:Contact ;
	test:hasFriend <test:John> ;
	test:name "Fred" .

Let’s do some simple SPARQL queries. You can execute these queries this way:

tracker-sparql -q "SELECT ?subject WHERE { ?subject a test:Parrot }"

In this query we ask for the subject of each entity that is a parrot. The query will yield test:Picca because Picca is the only parrot in our situation.

  test:Picca
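Conceptually, answering this query means comparing the pattern ?subject a test:Parrot with every triple in the store: terms starting with ? are variables that match anything, and everything else must match exactly. The following Python sketch shows that idea under simplifying assumptions: only the type triples of the example data are included, hand-expanded, with IRIs as plain strings, and `match` and `select` are hypothetical helpers, not part of Tracker.

```python
# Hypothetical sketch: the type ("a") triples of the example data, hand-expanded.
TYPE_TRIPLES = [
    ("test:Picca", "a", "test:Parrot"),
    ("test:Picca", "a", "test:Pet"),
    ("test:Max", "a", "test:Dog"),
    ("test:Max", "a", "test:Pet"),
    ("test:John", "a", "test:Contact"),
    ("test:Fred", "a", "test:Contact"),
]


def match(pattern, triple):
    """Return variable bindings if pattern fits triple, otherwise None.

    Terms starting with '?' are variables; all other terms must be equal.
    """
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if bindings.get(p, t) != t:   # a repeated variable must agree
                return None
            bindings[p] = t
        elif p != t:
            return None
    return bindings


def select(pattern, triples):
    """Scan all triples and collect the bindings of every match."""
    return [b for b in (match(pattern, t) for t in triples) if b is not None]


print(select(("?subject", "a", "test:Parrot"), TYPE_TRIPLES))
# [{'?subject': 'test:Picca'}]
print([b["?subject"] for b in select(("?subject", "a", "test:Contact"), TYPE_TRIPLES)])
# ['test:John', 'test:Fred']
```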

Usually we aren't interested in the subject itself, but in an actual property of the parrot, such as its name. We can ask for such a property this way:

SELECT ?subject ?name WHERE { ?subject a test:Parrot ; test:name ?name}
  test:Picca, Picca

Another simple example, asking for all the contacts:

SELECT ?subject WHERE { ?subject a test:Contact }
  test:John
  test:Fred

Listing just the contacts doesn't illustrate much. Let's ask for all contacts that have a friend, and display the names of both the contact and the friend:

SELECT ?name ?friend
WHERE { ?subject test:hasFriend ?f ;
                 test:name ?name .
        ?f test:name ?friend }
  Fred, John
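When a WHERE clause holds several patterns, their solutions must agree on every shared variable: here ?subject links the contact to its name, and ?f links the friend to the friend's name. One simple way to evaluate this is a backtracking nested-loop join, sketched below in Python. The `solve` and `match` helpers are hypothetical, and the data is the example data hand-expanded into plain-string triples; real SPARQL engines typically rely on indexes and smarter join ordering instead of scanning every triple for every pattern.

```python
# Hypothetical sketch: the example data hand-expanded into (s, p, o) triples.
# IRIs and literals are both plain strings here, a simplification.
TRIPLES = [
    ("test:Picca", "a", "test:Parrot"), ("test:Picca", "a", "test:Pet"),
    ("test:Picca", "test:name", "Picca"),
    ("test:Max", "a", "test:Dog"), ("test:Max", "a", "test:Pet"),
    ("test:Max", "test:name", "Max"),
    ("test:John", "a", "test:Contact"), ("test:John", "test:owns", "test:Max"),
    ("test:John", "test:owns", "test:Picca"), ("test:John", "test:name", "John"),
    ("test:Fred", "a", "test:Contact"), ("test:Fred", "test:hasFriend", "test:John"),
    ("test:Fred", "test:name", "Fred"),
]


def match(pattern, triple):
    """Return the variable bindings that make pattern equal triple, or None."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if bindings.get(p, t) != t:
                return None
            bindings[p] = t
        elif p != t:
            return None
    return bindings


def solve(patterns, bindings=None):
    """Yield every binding set that satisfies all patterns, left to right."""
    if bindings is None:
        bindings = {}
    if not patterns:
        yield bindings
        return
    # Substitute the variables bound by earlier patterns into the next one.
    pattern = tuple(bindings.get(term, term) for term in patterns[0])
    for triple in TRIPLES:
        new = match(pattern, triple)
        if new is not None:
            # Try the remaining patterns; when that is exhausted, the loop
            # backtracks to the next candidate triple.
            yield from solve(patterns[1:], {**bindings, **new})


# The friend query: ?subject test:hasFriend ?f ; test:name ?name . ?f test:name ?friend
friends = [(b["?name"], b["?friend"]) for b in solve([
    ("?subject", "test:hasFriend", "?f"),
    ("?subject", "test:name", "?name"),
    ("?f", "test:name", "?friend"),
])]
print(friends)  # [('Fred', 'John')]

# Friends of Picca's owner: ?subject test:owns <test:Picca> .
#                           ?unknown test:hasFriend ?subject ; test:name ?name
picca_friends = [b["?name"] for b in solve([
    ("?subject", "test:owns", "test:Picca"),
    ("?unknown", "test:hasFriend", "?subject"),
    ("?unknown", "test:name", "?name"),
])]
print(picca_friends)  # ['Fred']
```

The same `solve` call covers the other queries in this post as well; asking what Fred owns, for instance, yields no solutions at all.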

Let’s ask for all the pets that are owned:

SELECT ?subject WHERE { ?unknown test:owns ?subject }
  test:Max
  test:Picca

Oh, not the subject. The names. How did we do that again? Right:

SELECT ?name
WHERE { ?unknown test:owns ?subject .
        ?subject test:name ?name }
  Max
  Picca

This will of course yield the same results in our situation:

SELECT ?name
WHERE { <test:John> test:owns ?subject .
        ?subject test:name ?name }
  Max
  Picca

But this won't: Fred doesn't own any pets. Only John owns pets, so the following query returns no results:

SELECT ?name
WHERE { <test:Fred> test:owns ?subject .
        ?subject test:name ?name }

Let’s print the owner’s and the pet’s names:

SELECT ?owner ?name
WHERE { ?unknown test:owns ?subject ;
                 test:name ?owner .
        ?subject test:name ?name }
  John, Max
  John, Picca

Still with me? Let's conclude by requesting the names of the contacts who are friends of the person who owns Picca:

SELECT ?name
WHERE { ?subject test:owns <test:Picca> .
        ?unknown test:hasFriend ?subject ;
                 test:name ?name }
  Fred

Invitation for Jürg and Rob: how about you guys writing an introduction to OPTIONAL, SUM, COUNT, GROUP BY, FILTER and so on in SPARQL? The more advanced stuff. :-)

12 thoughts on “Introduction to RDF and SPARQL”

  1. Wow, it’s Prolog! Only more wordy! And probably slower! And almost 40 years late!

    Sometimes I really believe the old saying “There are no new ideas in computer science”.

  2. @choeger: Not every dog is necessarily a pet. So no. But that doesn’t mean that a resource can’t be both a Pet and a Dog at the same time, in RDF.

  3. @Mark: it looks like Prolog if you think about it as “give me the values of these variables that match a certain graph”. But SPARQL is using triplets: no backtracking, no inference, just direct match.

  4. Shouldn’t the range for hasFriend be test:Contact?

    test:hasFriend a rdf:Property ;
    rdfs:domain test:Contact ;
    rdfs:range test:Contact .

  5. @gnublade: oh, yes, that must have been a copy-paste error. I made the blog post a bit shorter afterwards by removing some stuff from that ontology. It must have gone wrong at that point.

    Thanks for pointing it out! I will correct it immediately.

  6. Nice down to earth introduction to rdf/rdfs!

    One notational thing: I think that <test:Picca> in Turtle means the rdf node with URI identifier

    test:Picca

    rather than the rdf node with URI identifier

    http://www.tracker-project.org/ontologies/test#Picca

    which I think is what you want and which is exactly what test:Picca (without the angle brackets) means given your @prefix test: definition. In other words, the angle brackets in Turtle (and SPARQL) are like quotes and allow you to use arbitrary URI’s for labeling identifiers for nodes in the rdf graph [1]. Check out

    http://www.w3.org/TeamSubmission/turtle/

    in particular section 2.1.

    By the way there is a convention to use lowercase identifiers for instances (e.g. test:picca instead of test:Picca) and identifiers starting with a capital for classes but that is just a convention.

    [1] People usually use http URL’s. One can also use URI’s which are not URL’s, e.g. is a reasonable way to refer to the rdf node representing the mailbox of John Smith. Personally, I prefer to use non URL’s for “things” like parrots that cannot be obviously retrieved over the internet i.e. I like to use for a nicely namespaced, uniquely labelled rdf node for a parrot Picca, an rdf node for his homepage and for the rdf node of his mailbox. I can then say

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    a test:Parrot;
    rdfs:comment "the rdf node for Picca the Parrot";
    foaf:homepage ;
    foaf:mailbox .

    to claim that Picca the parrot has a homepage and a mailbox (note that the URI labels of the rdf nodes have no meaning other than giving them a unique name). One could then say other things about that home page or the mailbox e.g.

    a foaf:Document;
    rdfs:comment "the rdf node for the home page of Picca the Parrot";
    rdfs:seeAlso "http://www.tracker.org/homepage/Picca"^^xsd:anyURI;
    rdfs:seeAlso "http://www.pvanhoof.com/Pets/Picca"^^xsd:anyURI;
    foaf:maker [ a foaf:Person;
    foaf:nick "pvanhoof"].

    to claim that the homepage is a document created by a person with nickname pvanhoof, and that you may want to check the web address (rather than the rdf node) http://www.tracker.org/homepage/Picca and/or http://www.pvanhoof.com/Pets/Picca, if you are interested in that document node. YMMV.

  7. @IvanFrade: Using triplets doesn’t save you from having to do backtracking or something similar (probably less efficient). Consider, for example, this graph: http://gist.github.com/149936 . It’s 21 nodes linked in a linear fashion from n0 to n20, including backlinks. To find the correct way from n0 to n20 you’ll use this query: http://gist.github.com/149938 . Finding N1 is easy, because there’s only one node adjacent to n0. But N2 could be n0 or n2. The only way to know whether it is one or the other (or both) is to try out all possibilities for the rest of the query. Of course you could do this without backtracking by, for example, first generating all possible assignments (the Cartesian product) and then eliminating the ones which contain non-existent triplets. That would be less efficient and would cost much more memory, though.

    As it happens, I’ve tried running that query through rdfproc: it took about 50 seconds to complete. The same query in Prolog (http://gist.github.com/149940), using SWI-Prolog (which is not known for its performance), took a bit more than 100 milliseconds, including start-up, compilation and shut-down.

  8. @Rogier Brussee: Yeah, I was aware of that. But while I was making the examples I preferred these URIs, because they keep the examples short on the blog. I hope it didn’t confuse people too much, though. If people simply leave out the < and > around the subjects, they get the full URI as subject instead.

  9. Mark, this is definitely an interesting find, although I suspect you can find examples where Prolog would be slower. I hope SPARQL backend writers are aware of this approach also. If not, you’d better let them know :)

  10. You should probably not use bound variables (but anonymous ones) in your examples if you don’t plan to SELECT those variables…
