meego – Page 3 – How easy it is to make people believe a lie, and how hard it is to undo that work again

Less exciting features also need to be done, return types

We have a feature request to support return types and to give back variable names. We currently return an array (of array) of just strings, with no typing. This doesn’t work very well for knowing whether a cell is (for example) unbound. Empty string isn’t the same as unbound. So what can you do?

With direct-access the implementation is easy, we’ll just read it from the SPARQL engine. We have all this info already anyway. For filedescriptor passing with D-Bus we need to marshal it over the protocol.

Although we might come back to this decision short term, we wont yet do it for our “normal” (non-FD passing) D-Bus query method. SPARQL’s type system is different from D-Bus’s, so we shouldn’t try to match them somehow. Any custom format that we’d come up with, would be arbitrary.

Maybe someday we’ll add another “normal” D-Bus method that gives you a big string with SPARQL Query Results in JSON or SPARQL Query Results in XML back. Right now this has no priority for us, plus it would be a lot slower due to serialization. Post 0.9 everybody should be using libtracker-sparql and that’ll select either FD passing or direct-access.

Anyway, this will likely be the API for Sparql.Cursor. The methods get_value_type and get_variable_name got added.

public enum Tracker.Sparql.ValueType {
	UNBOUND, URI, STRING, INTEGER,
	DOUBLE, DATETIME, BLANK_NODE
}

public abstract class Tracker.Sparql.Cursor : Object {
	public Connection connection { get; }
	public abstract int n_columns { get; }
+	public abstract ValueType get_value_type (int column);
+	public abstract unowned string? get_variable_name (int column);
	public abstract unowned string? get_string (int column, out long length = null);
	public abstract bool next (Cancellable? cancellable = null) throws GLib.Error;
	public async abstract bool next_async (Cancellable? cancellable = null) throws GLib.Error;
	public abstract void rewind ();
}

I usually post about work in progress, not about something that is done. Same this time, of course. You can find the branch where we’re working on this here.

Tracker’s new class signal system being developed

Tracker 0.8’s situation

In Tracker 0.8 we have a signal system that causes quite a bit of overhead. The overhead comes from:

Having to store the URIs of the resources involved in a changeset in tracker-store‘s memory;
Having to store the predicates involved in a changeset in tracker-store‘s memory (less severe than A because we can store a pointer to an instance instead of a string);
Having to UTF-8 validate the strings when we emit them over D-Bus (D-Bus does this implicitly);
DBus’s own copying and handling of string data;
Heavy traffic on D-Bus;
Context switching between tracker-store and dbus-daemon;
We have to wait with turning on the D-Bus objects until after we have the latest ontology. So after journal replay. And we need to reset the situation after a backup restore. Complex!

Not all aggregators show this list as A, B, C, D, E, F and G. Sorry for that. I’ll nevertheless refer to the items as such later in this article.

Consumer’s problems with Tracker 0.8’s signal

Aforementioned overhead: consumes a lot of D-Bus traffic. This is caused by sending over URLs for the subjects and the predicates;
Doesn’t make it possible, in case of a delete of <a>, to know  in <a> nfo:isLogicalPartOf , as <a> is removed at the point of signal emission;
Round trips to know the literals create more D-Bus traffic;
Transactional changes can’t be reliably identified with SubjectsAdded, SubjectsChanged and SubjectsRemoved being separate signals;
A lot of D-Bus objects, instead of letting clients use D-Bus’s filtering system.

The solution that we’re developing for Tracker 0.9

Direct access

With direct-access we remove most of the round-trip cost of a query coming from a consumer that wants a literal object involved in a changeset: by utilizing the TrackerSparqlCursor API with direct-access enabled, you end up doing sqlite3_step() in your own process, directly on meta.db.

For the consumers of the signal, this removes 3.

Sending integer IDs instead of string URIs

A while ago we introduced the SPARQL function tracker:id(resource uri). The tracker:id(resource uri) function gives you a unique number that Tracker’s RDF store uses internally.

Each resource, each class and each predicate (latter are resources like any other) have such an unique internal ID.

Given that Tracker’s class signal system is specific anyway, we decided not to give you subject URL strings. Instead, we’ll give you the integer IDs.

The Writeback signal also got changed to do this, for the same reasons. But this API is entirely internal and shouldn’t be used outside of the project.

This for us removes A, B, C, D and E. For the consumers of the signal, this removes 1.

Merge added, changed and removed into the one signal

We give you two arrays in one signal: inserts and deletes.

For consumers of the signal, this removes 4.

Add the class name to the signal

This allows you to use a string filter on your signal subscription in D-Bus.

For us this removes G. For consumers of the signal, this removes 5.

Pass the object-id for resource objects

You’ll get a third number in the inserts and deletes arrays: object-id. We don’t send object literals, although for integral objects we’re still discussing this. But for resource objects we give without much extra cost the object-id.

For consumers of the signal, this removes 2.

SPARQL IN, tracker:id(resource uri) and tracker:uri(int id)

We recently added support for SPARQL IN, we already had tracker:id(resource uri) and I implemented tracker:uri(int id).

This makes things like this possible:

SELECT ?t { ?r nie:title ?t .
            FILTER (tracker:id(?r) IN (800, 801, 802, 807)) }

Where 800, 801, 802 and 807 will be the IDs that you receive in the class signal. And with tracker:uri(int id) it goes like:

SELECT tracker:uri (800) tracker:uri (801)
       tracker:uri (802) tracker:uri (807) { }

For consumers this removes most of the burden introduced by the IDs.

Context switching of processes

What is left is context switching between tracker-store and dbus-daemon, F. Mostly important for mobile targets (ARM hardware). We reduce them by grouping transactions together and then bursting larger sets. It’s both timeout and data-size based (after either a certain amount of time, or a certain memory limit, we emit). We’re still testing what the most ideal timeouts and sizes are on target hardware.

Where is the stuff?

The work isn’t yet reviewed nor thoroughly tested. This will happen next few days and weeks.

Anyway, here’s the branch, documentation, example in Plain C, example in Vala

Support for SPARQL IN and NOT IN, the new class signals

I made some documentation about our SPARQL-IN feature that we recently added. I added some interesting use-cases like doing an insert and a delete based on in values.

For the new class signal API that we’re developing this and next week, we’ll probably emit the IDs that tracker:id() would give you if you’d use that on a resource. This means that IN is very useful for the purpose of giving you metadata of resources that are in the list of IDs that you just received from the class signal.

We never documented tracker:id() very much, as it’s not an RDF standard; rather it’s something Tracker specific. But neither are the class signals a RDF standard; they are Tracker specific too. I guess here that makes it usable in combo and turns the status of ‘internal API’, irrelevant.

We’re right now prototyping the new class signals API. It’ll probably be a “sa(iii)a(iii)”:

That’s class-name and two arrays of subject-id, predicate-id, object-id. The class-name is to allow D-Bus filtering. The first array are the deletes and the second are the inserts. We’ll only give you object-ids of non-literal objects (literal objects have no internal object-id). This means that we don’t throw literals to you in the signal (you need to make a query to get them, we’ll throw 0 to you in the signal).

We give you the object-ids because of a use-case that we didn’t cover yet:

Given triple <a> nie:isLogicalPartOf . When <a> is deleted, how do you know  during the signal? So the feature request was to do a select ?b { <a> nie:isLogicalPartOf ?b } when <a> is deleted (so the client couldn’t do that query anymore).

With the new signal we’ll give you the ID of  when <a> is deleted. We’ll also implement a tracker:uri(integer id) allowing you to get out of that ID. It’ll do something like this, but then much faster: select ?subject { ?subject a rdfs:Resource . FILTER (tracker:id(?subject) IN (%d)) }

I know there will be people screaming for all objects, also literals, in the signals, but we don’t want to flood your D-Bus daemon with all that data. Scream all you want. Really, we don’t. Just do a roundtrip query.

Domain indexes finished, technical conclusions

The support for domain specific indexes is, awaiting review / finished. Although we can further optimize it now. More on that later in this post. Image that you have this ontology:

nie:InformationElement a rdfs:Class .

nie:title a rdf:Property ;
  nrl:maxCardinality 1 ;
  rdfs:domain nie:InformationElement ;
  rdfs:range xsd:string .

nmm:MusicPiece a rdfs:Class ;
  rdfs:subClassOf nie:InformationElement .

nmm:beatsPerMinute a rdf:Property ;
  nrl:maxCardinality 1 ;
  rdfs:domain nmm:MusicPiece ;
  rdfs:range xsd:integer .

With that ontology there are three tables called “Resource”, “nmo:MusicPiece” and “nie:InformationElement” in SQLite’s schema:

The “Resource” table has ID and the subject string
The “nie:InformationElement” has ID and “nie:title”
The “nmm:MusicPiece” one has ID and “nmm:beatsPerMinute”

That’s fairly simple, right? The problem is that when you ORDER BY “nie:title” that you’ll cause a full table scan on “nie:InformationElement”. That’s not good, because there are less “nmm:MusicPiece” records than “nie:InformationElement” ones.

Imagine that we do this SPARQL query:

SELECT ?title WHERE {
   ?resource a nmm:MusicPiece ;
             nie:title ?title
} ORDER BY ?title

We translate that, for you, to this SQL on our schema:

SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nie:InformationElement2"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1",
         "nie:InformationElement" AS "nie:InformationElement2"
  WHERE  "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
  AND    "title_u" IS NOT NULL
) ORDER BY "title_u"

OK, so with support for domain indexes we change the ontology like this:

nmm:MusicPiece a rdfs:Class ;
  rdfs:subClassOf nie:InformationElement ;
  tracker:domainIndex nie:title .

Now we’ll have the three tables called “Resource”, “nmo:MusicPiece” and “nie:InformationElement” in SQLite’s schema. But they will look like this:

The “Resource” table has ID and the subject string
The “nie:InformationElement” has ID and “nie:title”
The “nmm:MusicPiece” table now has three columns called ID, “nmm:beatsPerMinute” and “nie:title”

The same data, for titles of music pieces, will be in both “nie:InformationElement” and “nmm:MusicPiece”. We copy to the mirror column during ontology change coping, and when new inserts happen.

When now the rdf:type is known in the SPARQL query as a nmm:MusicPiece, like in the query mentioned earlier, we know that we can use the “nie:title” from the “nmm:MusicPiece” table in SQLite. That allows us to generate you this SQL query:

SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nmm:MusicPiece1"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1"
  WHERE  "title_u" IS NOT NULL
) ORDER BY "title_u"

A remaining optimization is when you request a rdf:type that is a subclass of nmm:MusicPiece, like this:

SELECT ?title WHERE {
  ?resource a nmm:MusicPiece, nie:InformationElement ;
            nie:title ?title
} ORDER BY ?title

It’s still not as bad as now the “nie:title” is still taken from the “nmm:MusicPiece” table. But the join with “nie:InformationElement” is still needlessly there (we could just do the earlier SQL query in this case):

SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nmm:MusicPiece1"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1",
         "nie:InformationElement" AS "nie:InformationElement2"
  WHERE  "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
  AND    "title_u" IS NOT NULL
) ORDER BY "title_u"

We will probably optimize this specific use-case further later this week.

SQLite’s WAL, deleting a domain specific index

SQLite’s WAL

SQLite is working on WAL, which stands for Write Ahead Logging.

The new logging technique means that we can probably keep read statements open for multiple processes. It’s not full MVCC yet as writes are still not doable simultaneously. But in our use-case is reading with multiple processes vastly more important anyway.

We’re investigating WAL mode of SQLite thoroughly these next few days. Jürg is working most on this at the moment. If WAL is fit for our purpose then we’ll probably also start developing a direct-access library that’ll allow your process to connect directly with our SQLite database, avoiding any form of IPC.

Adrien‘s FD-passing is in master, though. And it’s performing quite well!

We’re thrilled that SQLite’s team is taking this direction with WAL. Very awesome guys!

Domain specific indexes

Yesterday I worked on support for deleting a domain specific index from the ontology. Because SQLite doesn’t support dropping a column with its ALTER support, I had to do it by renaming the original table, recreating the table without the mirror column, and then copying the data from the renamed table. And finally dropping the renamed table. It’s nasty, but it works. I think SQLite should just add DROP COLUMN to ALTER. Why is this so hard to add?

I finally got it working, now it must of course be tested and then again tested.

Next for the feature is adapting the SPARQL engine to start using the indexed mirror column and produce better performing SQL queries.

Working on domain specific indexes

So … what is involved in a “simple change” like what I wrote about yesterday?

First you add support for annotating the domain specific index in the ontology files. This is straight forward as we of course have a generic Turtle parser, and it’s just a matter of adding properties to certain classes, and filling the values from the ontology in in the instances in our in-memory representation of the ontology. You of course also need to change the CREATE-TABLE statements. Trivial.

Then you implement detecting changes in the ontology. And more complex; coping with the changes. This means doing ALTER on the SQL tables. You also need to copy from the InformationElement table to the MusicPiece table (I’m using MusicPiece to clarify, it’s of course generic) in case of such a domain specific index being added during an ontology change, and put an implicit index on the column. After all, that index is why we’re doing this.

I finished those two yesterday. I have not finished detecting a deletion of a domain specifix index yet. That will have to ALTER the table with a DROP of the column. The most difficult here is detecting the deletion itself. We don’t yet have any code to diff on multivalue properties in the ontology (the ontology is a collection of RDF statements like everything else, describing itself).

Today I finished writing copy values to the MusicPiece table’s mirror column

Next few days will be about adapting the SPARQL engine and of course coping with a deletion of a domain specific index. And then testing, and again testing. Mind that this has to work from a journal replay situation too. In which case no ontology is involved (it’s all stored in the history of the persistent journal).

Where’s my Redbull? Ah, waiting for me in the fridge. Good!

Domain specific indexes

We store our data in a decomposed way. For single value properties we create a table per class and have a column per property. Multi value properties go in a separate table. For now I’ll focus on those single value properties.

Imagine you have a MusicPiece. In Nepomuk that’s a subclass of InformationElement. InformationElement adds properties like title and subject. MusicPiece has performer, which is a Contact, and duration, an integer. A Contact has a fullname.

Alright, that looks like this in our internal storage.

Querying that in SPARQL goes like this. I’ll add the Nepomuk prefixes.

SELECT ?musicpiece ?title ?subject ?performer {
   ?musicpiece a nmm:MusicPiece ;
               nmm:performer ?p ;
               nie:title ?title ;
               nie:subject ?subject .
   ?p nco:fullname ?performer .
} ORDER BY ?title

A problem if you ORDER BY the title field is that Tracker needs to make a join and a full table scan with that InformationElement table.

So we’re working on what we’ll call domain specific indexes. It means that we’ll for certain properties have a redundant mirror column, on which we’ll place the index. The native SQL query will be generated to use that mirror column instead. A good example is nie:title for nmm:MusicPiece.

ps. A normal triple store has instead a huge table with just three columns: subject, predicate and object. That wouldn’t help you much with optimizing of course.

IPC performance, the report

The Tracker team will be doing a codecamp this month. Among the subjects we will address is the IPC overhead of tracker-store, our RDF query service.

We plan to investigate whether a direct connection with our SQLite database is possible for clients. Jürg did some work on this. Turns out that due to SQLite not being MVCC we need to override some of SQLite’s VFS functions and perhaps even implement ourselves a custom page cache.

Another track that we are investigating involves using a custom UNIX domain socket and sending the data over in such a way that at either side the marshalling is cheap.

For that idea I asked Adrien Bustany, a computer sciences student who’s doing an internship at Codeminded, to develop three tests: A test that uses D-Bus the way tracker-store does (by using the DBusMessage API directly), a test that uses an as ideal as possible custom protocol and technique to get the data over a UNIX domain socket and a simple program that does the exact same query but connects to SQLite by itself.

Here’s the report:

Exposing a SQLite database remotely: comparison of various IPC methods

By Adrien Bustany
Computer Sciences student
National Superior School of Informatics and Applied Mathematics of Grenoble (ENSIMAG)

This study aims at comparing the overhead of an IPC layer when accessing a SQLite database. The two IPC methods included in this comparison are DBus, a generic message passing system, and a custom IPC method using UNIX sockets. As a reference, we also include in the results the performance of a client directly accessing the SQLite database, without involving any IPC layer.

Comparison methodology

In this section, we detail what the client and server are supposed to do during the test, regardless of the IPC method used.

The server has to:

Open the SQLite database and listen to the client requests
Prepare a query at the client’s request
Send the resulting rows at the client’s request

Queries are only “SELECT” queries, no modification is performed on the database. This restriction is not enforced on server side though.

The client has to:

Connect to the server
Prepare a “SELECT” query
Fetch all the results
Copy the results in memory (not just fetch and forget them), so that memory pages are really used

Test dataset

For testing, we use a SQLite database containing only one table. This table has 31 columns, the first one is the identifier and the 30 others are columns of type TEXT. The table is filled with 300 000 rows, with randomly generated strings of 20 ASCII lowercase characters.

Implementation details

In this section, we explain how the server and client for both IPC methods were implemented.

Custom IPC (UNIX socket based)

In this case, we use a standard UNIX socket to communicate between the client and the server. The socket protocol is a binary protocol, and is detailed below. It has been designed to minimize CPU usage (there is no marshalling/demarshalling on strings, nor intensive computation to decode the message). It is fast over a local socket, but not suitable for other types of sockets, like TCP sockets.

Message types

There are two types of operations, corresponding to the two operations of the test: prepare a query, and fetch results.

Message format

All numbers are encoded in little endian form.

Prepare

Client sends:

Size	Contents
4 bytes	Prepare opcode (0x50)
4 bytes	Size of the query (without trailing \0)
…	Query, in ASCII

Server answers:

Size	Contents
4 bytes	Return code of the sqlite3_prepare_v2 call

Fetch

Client sends:

Size	Contents
4 bytes	Fetch opcode (0x46)

Server sends rows grouped in fixed size buffers. Each buffer contains a variable number of rows. Each row is complete. If some padding is needed (when a row doesn’t fit in a buffer, but there is still space left in the buffer), the server adds an “End of Page” marker. The “End of page” marker is the byte 0xFF. Rows that are larger than the buffer size are not supported.

Each row in a buffer has the following format:

Size	Contents
4 bytes	SQLite return code. This is generally SQLITE_ROW (there is a row to read), or SQLITE_DONE (there are no more rows to read). When the return code is not SQLITE_ROW, the rest of the message must be ignored.
4 bytes	Number of columns in the row
4 bytes	Index of trailing \0 for first column (index is 0 after the “number of columns” integer, that is, index is equal to 0 8 bytes after the message begins)
4 bytes	Index of trailing \0 for second column
…
4 bytes	Index of trailing \0 for last column
…	Row data. All columns are concatenated together, and separated by \0

For the sake of clarity, we describe here an example row

100 4 1 7 13 19 1\0aaaaa\0bbbbb\0ccccc\0

The first 100 is the return code, in this case SQLITE_ROW. This row has 4 columns. The 4 following numbers are the offset of the \0 terminating each column in the row data. Finally comes the row data.

Memory usage

We try to minimize the calls to malloc and memcpy in the client and server. As we know the size of a buffer, we allocate the memory only once, and then use memcpy to write the results to it.

DBus

The DBus server exposes two methods, Prepare and Fetch.

Prepare

The Prepare method accepts a query string as a parameter, and returns nothing. If the query preparation fails, an error message is returned.

Fetch

Ideally, we should be able to send all the rows in one batch. DBus, however, puts a limitation on the message size. In our case, the complete data to pass over the IPC is around 220MB, which is more than the maximum size allowed by DBus (moreover, DBus marshalls data, which augments the message size a little). We are therefore obliged to split the result set.

The Fetch method accepts an integer parameter, which is the number of rows to fetch, and returns an array of rows, where each row is itself an array of columns. Note that the server can return less rows than asked. When there are no more rows to return, an empty array is returned.

Results

All tests are ran against the dataset described above, on a warm disk cache (the database is accessed several time before every run, to be sure the entire database is in disk cache). We use SQLite 3.6.22, on a 64 bit Linux system (kernel 2.6.33.3). All test are ran 5 times, and we use the average of the 5 intermediate results as the final number.

For the custom IPC, we test with various buffer sizes varying from 1 to 256 kilobytes. For DBus, we fetch 75000 rows with every Fetch call, which is close to the maximum we can fetch with each call (see the paragraph on DBus message size limitation).

The first tests were to determine the optimal buffer size for the UNIX socket based IPC. The following graph describes the time needed to fetch all rows, depending on the buffer size:

The graph shows that the IPC is the fastest using 64kb buffers. Those results depend on the type of system used, and might have to be tuned for different platforms. On Linux, a memory page is (generally) 4096 bytes, as a consequence buffers smaller than 4kB will use a full memory page when sent over the socket and waste memory bandwidth. After determining the best buffer size for socket IPC, we run tests for speed and memory usage, using a buffer size of 64kb for the UNIX socket based method.

Speed

We measure the time it takes for various methods to fetch a result set. Without any surprise, the time needed to fetch the results grows linearly with the amount of rows to fetch.

IPC method	Best time
None (direct access)	2910 ms
UNIX socket	3470 ms
DBus	12300 ms

Memory usage

Memory usage varies greatly (actually, so much that we had to use a log scale) between IPC methods. DBus memory usage is explained by the fact that we fetch 75 000 rows at a time, and that it has to allocate all the message before sending it, while the socket IPC uses 64 kB buffers.

Conclusions

The results clearly show that in such a specialized case, designing a custom IPC system can highly reduce the IPC overhead. The overhead of a UNIX socket based IPC is around 19%, while the overhead of DBus is 322%. However, it is important to take into account the fact that DBus is a much more flexible system, offering far more features and flexibility than our socket protocol. Comparing DBus and our custom UNIX socket based IPC is like comparing an axe with a swiss knife: it’s much harder to cut the tree with the swiss knife, but it also includes a tin can opener, a ball pen and a compass (nowadays some of them even include USB keys).

The real conclusion of this study is: if you have to pass a lot of data between two programs and don’t need a lot of flexibility, then DBus is not the right answer, and never intended to be.

The code source used to obtain these results, as well as the numbers and graphs used in this document can be checked out from the following git repository: git://git.mymadcat.com/ipc-performance . Please check the various README files to see how to reproduce them and/or how to tune the parameters.

Friday’s performance improvements in Tracker

The crawler’s modification time queries

Yesterday we optimized the crawler’s query that gets the modification time of files. We use this timestamp to know whether or not a file must be reindexed.

Originally, we used a custom SQLite function called tracker:uri-is-parent() in SPARQL. This, however, caused a full table scan. As long as your SQL table for nfo:FileDataObjects wasn’t too large, that wasn’t a huge problem. But it didn’t scale linear. I started with optimizing the function itself. It was using a strlen() so I replaced that with a sqlite3_value_bytes(). We only store UTF-8, so that worked fine. It gained me ~ 10%; not enough.

So this commit was a better improvement. First it makes nfo:belongsToContainer an indexed property. The x nfo:belongsToContainer p means x is in a directory p for file resources. The commit changes the query to use the property that is now indexed.

The original query before we started with this optimization took 1.090s when you had ~ 300,000 nfo:FileDataObject resources. The new query takes about 0.090s. It’s of course an unfair comparison because now we use an indexed property. Adding the index only took a total of 10s for a ~ 300,000 large table and the table is being queried while we index (while we insert into it). Do the math, it’s a huge win in all situations. For the SQLite freaks; the SQLite database grew by 4 MB, with all items in the table indexed.

PDF extractor

Another optimization I did earlier was the PDF extractor. Originally, we used the poppler-glib library. This library doesn’t allow us to set the OutputDev at runtime. If compiled with Cairo, the OutputDev is in some versions a CairoOutputDev. We don’t want all images in the PDF to be rendered to a Cairo surface. So I ported this back to C++ and made it always use a TextOutputDev instead. In poppler-glib master this appears to have improved (in git master poppler_page_get_text_page is always using a TextOutputDev).

Another major problem with poppler-glib is the huge amount of copying strings in heap. The performance to extract metadata and content text for a 70 page PDF document without any images went from 1.050s to 0.550s. A lot of it was caused by copying strings and GValue boxing due to GObject properties.

Table locked problem

Last week I improved D-Bus marshaling by using a database cursor. I forgot to handle SQLITE_LOCKED while Jürg and Carlos had been introducing multithreaded SELECT support. Not good. I fixed this; it was causing random Table locked errors.

Performance DBus handling of the query results in Tracker’s RDF service

Before

For returning the results of a SPARQL SELECT query we used to have a callback like this. I removed error handling, you can find the original here.

We need to marshal a database result_set to a GPtrArray because dbus-glib fancies that. This is a lot of boxing the strings into GValue and GStrv. It does allocations, so not good.

static void
query_callback(TrackerDBResultSet *result_set,GError *error,gpointer user_data)
{
  TrackerDBusMethodInfo *info = user_data;
  GPtrArray *values = tracker_dbus_query_result_to_ptr_array (result_set);
  dbus_g_method_return (info->context, values);
  tracker_dbus_results_ptr_array_free (&values);
}

void
tracker_resources_sparql_query (TrackerResources *self, const gchar *query,
                                DBusGMethodInvocation *context, GError **error)
{
  TrackerDBusMethodInfo *info = ...; guint request_id;
  TrackerResourcesPrivate *priv= ...; gchar *sender;
  info->context = context;
  tracker_store_sparql_query (query, TRACKER_STORE_PRIORITY_HIGH,
                              query_callback, ...,
                              info, destroy_method_info);
}

After

Last week I changed the asynchronous callback to return a database cursor. In SQLite that means an sqlite3_step(). SQLite returns const pointers to the data in the cell with its sqlite3_column_* APIs.

This means that now we’re not even copying the strings out of SQLite. Instead, we’re using them as const to fill in a raw DBusMessage:

static void
query_callback(TrackerDBCursor *cursor,GError *error,gpointer user_data)
{
  TrackerDBusMethodInfo *info = user_data;
  DBusMessage *reply; DBusMessageIter iter, rows_iter;
  guint cols; guint length = 0;
  reply = dbus_g_method_get_reply (info->context);
  dbus_message_iter_init_append (reply, &iter);
  cols = tracker_db_cursor_get_n_columns (cursor);
  dbus_message_iter_open_container (&iter, DBUS_TYPE_ARRAY,
                                    "as", &rows_iter);
  while (tracker_db_cursor_iter_next (cursor, NULL)) {
    DBusMessageIter cols_iter; guint i;
    dbus_message_iter_open_container (&rows_iter, DBUS_TYPE_ARRAY,
                                      "s", &cols_iter);
    for (i = 0; i < cols; i++, length++) {
      const gchar *result_str = tracker_db_cursor_get_string (cursor, i);
      dbus_message_iter_append_basic (&cols_iter,
                                      DBUS_TYPE_STRING,
                                      &result_str);
    }
    dbus_message_iter_close_container (&rows_iter, &cols_iter);
  }
  dbus_message_iter_close_container (&iter, &rows_iter);
  dbus_g_method_send_reply (info->context, reply);
}

Results

The test is a query on 13500 resources where we ask for two strings, repeated eleven times. I removed a first repeat from each round, because the first time the sqlite3_stmt still has to be created. This means that our measurement would get a few more milliseconds. I also directed the standard out to /dev/null to avoid the overhead created by the terminal. The results you see below are the value for “real”.

There is of course an overhead created by the “tracker-sparql” program. It does demarshaling using normal dbus-glib. If your application uses DBusMessage directly, then it can avoid the same overhead. But since for both rounds I used the same “tracker-sparql” it doesn’t matter for the measurement.

$ time tracker-sparql -q "SELECT ?u  ?m { ?u a rdfs:Resource ;
          tracker:modified ?m }" > /dev/null

Without the optimization:

0.361s, 0.399s, 0.327s, 0.355s, 0.340s, 0.377s, 0.346s, 0.380s, 0.381s, 0.393s, 0.345s

With the optimization:

0.279s, 0.271s, 0.305s, 0.296s, 0.295s, 0.294s, 0.295s, 0.244s, 0.289s, 0.237s, 0.307s

The improvement ranges between 7% and 40% with average improvement of 22%.

Focus on query performance

Every (good) developer knows that copying of memory and boxing, especially when dealing with a large amount of pieces like members of collections and the cells in a table, are a bad thing for your performance.

More experienced developers also know that novice developers tend to focus on just their algorithms to improve performance, while often the single biggest bottleneck is needless boxing and allocating. Experienced developers come up with algorithms that avoid boxing and copying; they master clever pragmatical engineering and know how to improve algorithms. A lot of newcomers use virtual machines and script languages that are terrible at giving you the tools to control this and then they start endless religious debates about how great their programming language is (as if it matters). (Anti-.NET people don’t get on your horses too soon: if you know what you are doing, C# is actually quite good here).

We were of course doing some silly copying ourselves. Apparently it had a significant impact on performance.

Once Jürg and Carlos have finished the work on parallelizing SELECT queries we plan to let the code that walks the SQLite statement fill in the DBusMessage directly without any memory copying or boxing (for marshalling to DBus). We found the get_reply and send_reply functions; they sound useful for this purpose.

I still don’t really like DBus as IPC for data transfer of Tracker’s RDF store’s query results. Personally I think I would go for a custom Unix socket here. But Jürg so far isn’t convinced. Admittedly he’s probably right; he’s always right. Still, DBus to me doesn’t feel like a good IPC for this data transfer..

We know about the requests to have direct access to the SQLite database from your own process. I explained in the bug that SQLite3 isn’t MVCC and that this means that your process will often get blocked for a long time on our transaction. A longer time than any IPC overhead takes.

Supporting ontology changes in Tracker

It used to be in Tracker that you couldn’t just change the ontology. When you did, you had to reboot the database. This means loosing all the non-embedded data. For example your tags or other such information that’s uniquely stored in Tracker’s RDF store.

This was of course utterly unacceptable and this was among the reasons why we kept 0.8 from being released for so long: we were afraid that we would need to make ontology changes during the 0.8 series.

So during 0.7 I added support for what I call modest ontology changes. This means adding a class, adding a property. But just that. Not changing an existing property. This was sufficient for 0.8 because now we could at least do some changes like adding a property to a class, or adding a new class. You know, making implementing the standard feature requests possible.

Last two weeks I worked on supporting more intrusive ontology changes. The branch that I’m working on currently supports changing tracker:notify for the signals on changes feature, tracker:writeback for the writeback features and tracker:indexed which controls the indexes in the SQLite tables.

But also certain range changes are supported. For example integer to string, double and boolean. String to integer, double and boolean. Double to integer, string and boolean. Range changes will sometimes of course mean data loss.

Plenty of code was also added to detect an unsupported ontology change and to ensure that we just abort the process and don’t do any changes in that case.

It’s all quite complex so it might take a while before the other team members have tested and reviewed all this. It should probably take even longer before it hits the stable 0.8 branch.

We wont yet open the doors to custom ontologies. Several reasons:

We want more testing on the support for ontology changes. We know that once we open the doors to custom ontologies that we’ll see usage of this rather sooner than later.
We don’t yet support removing properties and classes. This would be easy (drop the table and columns away and log the event in the journal) but it’s not yet supported. Mostly because we don’t need it ourselves (which is a good reason).
We don’t want you to meddle with the standard ontologies (we’ll do that, don’t worry). So we need a bit of ontology management code to also look in other directories, etc.
The error handling of unsupported ontology changes shouldn’t abort like mentioned above. Another piece of software shouldn’t make Tracker unusable just because they install junk ontologies.
We actually want to start using OSCAF‘s ontology format. Perhaps it’s better that we wait for this instead of later asking everybody to convert their custom ontologies?
We’re a bunch of pussies who are afraid of the can of worms that you guys’ custom ontologies will open.

But yes, you could say that the basics are being put in place as we speak.

Zürichsee

Today after I brought Tinne to the airport I drove around Zürichsee. She can’t stay in Switzerland the entire month; she has to go back to school on Monday.

While driving on the Seestrasse I started counting luxury cars. After I reached two for Lamborgini and three for Ferrari I started thinking: Zimmerberg Sihltal and Pfannenstiel must be expensive districts too… And yes, they are.

I was lucky today that it was nice weather. But wow, what a nice view on the mountain tops when you look south over Zürichsee. People from Zürich, you guys are so lucky! Such immense calming feeling the view gives me! For me, it beats sauna. And I’m a real sauna fan.

I’m thinking to check it out south of Zürich. But not the canton. I think the house prices are just exaggerated high in the canton of Zürich. I was thinking Sankt Gallen, Toggenburg. I’ve never been there; I’ll check it out tomorrow.

Hmmr, meteoswiss gives rain for tomorrow. Doesn’t matter.

Actually, when I came back from the airport the first thing I really did was fix coping with property changes in ontologies for Tracker. Yesterday it wasn’t my day, I think. I couldn’t find this damn problem in my code! And in the evening I lost three chess games in a row against Tinne. That’s really a bad score for me. Maybe after two weeks of playing chess almost every evening, she got better than me? Hmmrr, that’s a troubling idea.

Anyway, so when I got back from the airport I couldn’t resist beating the code problem that I didn’t find on Friday. I found it! It works!

I guess I’m both a dreamer and a realist programmer. But don’t tell my customers that I’m such a dreamer.

Reporting busy status

We’re nearing our first release since very long, so I’ll do another technical blog post about Tracker ;)

When the RDF store is replaying its journal at startup and when the RDF store is restoring a backup it can be in busy state. This means that we can’t handle your DBus requests during that time; your DBus method will be returned late.

Because that’s not very nice from a UI perspective (the uh, what is going on?? -syndrome kicks in) we’re adding a signal emission that emits the progression and status. You can also ask it using DBus methods GetProgress and GetStatus.

The miners already had something like this, so I kept the API more or less the same.

signal sender=:1.99 -> dest=(null destination) serial=1454
  path=/org/freedesktop/Tracker1/Status;
  interface=org.freedesktop.Tracker1.Status; member=Progress
   string "Journal replaying"
   double 0.197824
signal sender=:1.99 -> dest=(null destination) serial=1455
  path=/org/freedesktop/Tracker1/Status;
  interface=org.freedesktop.Tracker1.Status; member=Progress
   string "Journal replaying"
   double 0.698153

Jürg just reviewed the SPARQL regex performance improvement of yesterday, so that’s now in master. If you want this busy status notifying today already you can test with the busy-notifications branch.

Performance improvements for SPARQL’s regex in Tracker

The original SPARQL regex support of Tracker is using a custom SQLite function. But of course back when we wrote it we didn’t yet think much about optimizing. As a result, we were using g_regex_match_simple which of course recompiles the regular expression each time.

Today Jürg and me found out about sqlite3_get_auxdata and sqlite3_set_auxdata which allows us to cache a compiled value for a specific custom SQLite function for the duration of the query.

This is much better:

static void
function_sparql_regex (sqlite3_context *context,
                       int              argc,
                       sqlite3_value   *argv[])
{
  gboolean ret;
  const gchar *text, *pattern, *flags;
  GRegexCompileFlags regex_flags;
  GRegex *regex;

  if (argc != 3) {
    sqlite3_result_error (context, "Invalid argument count", -1);
    return;
  }

  regex = sqlite3_get_auxdata (context, 1);
  text = sqlite3_value_text (argv[0]);
  flags = sqlite3_value_text (argv[2]);
  if (regex == NULL) {
    gchar *err_str;
    GError *error = NULL;
    pattern = sqlite3_value_text (argv[1]);
    regex_flags = 0;
    while (*flags) {
      switch (*flags) {
      case 's': regex_flags |= G_REGEX_DOTALL; break;
      case 'm': regex_flags |= G_REGEX_MULTILINE; break;
      case 'i': regex_flags |= G_REGEX_CASELESS; break;
      case 'x': regex_flags |= G_REGEX_EXTENDED; break;
      default:
        err_str = g_strdup_printf ("Invalid SPARQL regex flag '%c'", *flags);
        sqlite3_result_error (context, err_str, -1);
        g_free (err_str);
        return;
      }
      flags++;
    }
    regex = g_regex_new (pattern, regex_flags, 0, &error);
    if (error) {
      sqlite3_result_error (context, error->message, error->code);
      g_clear_error (&error);
      return;
    }
    sqlite3_set_auxdata (context, 1, regex, (void (*) (void*)) g_regex_unref);
  }
  ret = g_regex_match (regex, text, 0, NULL);
  sqlite3_result_int (context, ret);
  return;
}

Before (this was a test on a huge amount of resources):

$ time tracker-sparql -q "select ?u { ?u a rdfs:Resource . FILTER (regex(?u, '^titl', 'i')) }"
real	0m3.337s
user	0m0.004s
sys	0m0.008s

After:

$ time tracker-sparql -q "select ?u { ?u a rdfs:Resource . FILTER (regex(?u, '^titl', 'i')) }"
real	0m1.887s
user	0m0.008s
sys	0m0.008s

This will hit Tracker’s master today or tomorrow.

Invisible costs

We would rather suffer the visible costs of a few bad decisions than incur the many invisible costs that come from decisions made too slowly – or not at all – because of a stifling bureaucracy.

— Letter by Warren E. Buffett to the shareholders of Berkshire, February 26, 2010

FWD: [Tracker] tracker-miner-rss 0.3

This is the kind of stuff that needs a forward on the planets:

From: Roberto -MadBob- Guido

This is just an update about tracker-miner-rss effort, already mentioned in this list some time ago.

Website, SVN, Last release (0.3)

Since 0.2 we (Michele and me) have just dropped dependency from rss-glib due some limitation found, and created our own Glib-oriented feeds handling library, libgrss, starting from the code of Liferea and adding nice stuffs such as a PubSub subscriber implementation. At the moment it is shipped with tracker-miner-rss itself, in the future may be splitted so to easy usage by other developers.

Next will come integration with libchamplain to describe geographic points found in geo-rss enabled feeds, integration with libedataserver to better handle “person” rappresentation (suggestions for a better PIM-like shared library with useful objects?), and perhaps a first full-featured feed reader using Tracker as backend.

Enjoy :-)

Roberto is doing a demo on FSter at FOSDEM during our presentation. My role in the presentation will be light this year. I decided to give most of the talk away to Rob Taylor and Roberto. I will probably demo Debarshi Ray‘s Solang and if time permits his work on the Nautilus integration. Regretfully Debarshi can’t come and so he asked me to do the demo.

SPARQL subqueries

This style of subqueries will also work (you can do this one without a subquery too, but it’s just an example of course):

SELECT ?name COUNT(?msg)
WHERE {
	?from a nco:Contact  ;
	          nco:hasEmailAddress ?name . {
		SELECT ?from
		WHERE {
			?msg a nmo:Email ;
			         nmo:from ?from .
		}
	}
} GROUP BY ?from

The same query in QtTracker will look like this (I have not tested this, let me know if it’s wrong Iridian):

#include <QObject>
#include <QtTracker/Tracker>
#include <QtTracker/ontologies/nco.h>
#include <QtTracker/ontologies/nmo.h>

void someFunction () {
	RDFSelect outer;
	RDFVariable from;
	RDFVariable name = outer.newColumn<nco::Contact>("name");
	from.isOfType<nco::Contact>();
	from.property<nco::hasEmailAddress>(name);
	RDFSelect inner = outer.subQuery();
	RDFVariable in_from = inner.newColumn("from");
	RDFVariable msg;
	msg.property<nmo::from>(in_from);
	msg.isOfType<nmo::Email>();
	outer.addCountColumn("total messages", msg);
	outer.groupBy(from);
	LiveNodes from_and_count = ::tracker()->modelQuery(outer);
}

What you find in this branch already supports it. You can find early support for subqueries in QtTracker in this branch.

To quickly put some stuff about Emails into your RDF store, read this page (copypaste the turtle examples in a file and use the tracker-import tool). You can also enable our Evolution Tracker plugin, of course.

ps. Yes, somebody should while building a GLib/GObject based client library for Tracker copy ideas from QtTracker.

Bla bla bla, subqueries in SPARQL, bla bla

Coming to you in a few days is what Jürg has been working on for last week.

Yeah, you guess it right by looking at the query below: subqueries!

This example shows you the amount of E-mails each contact has ever sent to you:

SELECT ?address
    (SELECT COUNT(?msg) AS ?msgcnt WHERE { ?msg nmo:from ?from })
WHERE {
    ?from a nco:Contact ;
          nco:hasEmailAddress ?address .
}

The usual warnings apply here: I’m way early with this announcement. It’s somewhat implemented but insanely experimental. The SPARQL spec has something for this in a draft wiki page. Due to lack of error reporting and detection it’s easy to make stuff crash or to get it to generate wrong native SQL queries.

But then again, you guys are developers. You like that!

Why are we doing this? Ah, some team at an undisclosed company was worried about performance and D-Bus overhead: They had to do a lot of small queries after doing a parent query. You know, a bunch of aggregate functions for counts, showing the last message of somebody, stuff like that.

I should probably not mention this feature yet. It’s too experimental. But so exciting!

Anyway, here’s the messy branch and here’s the reviewed stuff for bringing this feature into master.

ps. I wish I could show you guys the query that we support for that team. It’s awesome. I’ll ask around.

Tracker’s write back support now in master

Whoohoo!

We just committed the support for write back in master.

What is it?

Tracker has a limited capability to write metadata back into the data resource. In case of a file that means writing it back into the file. For example writing some of the metadata the user sets using a SPARQL Update back into an MP3 file as ID3 tags.

Which ones do we support already?

Right now the write back capability is under development and only supports a bunch of fields for a few XMP formats (JPEG, PNG and TIFF) and the Title of MP3 files. In near future we will start supporting increasingly more fields.

Documentation?

For people who want to write support for their properties and file formats, read the documentation.

Party like it’s 2009!

August 2025
M	T	W	T	F	S	S
« May
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31