Performance DBus handling of the query results in Tracker’s RDF service

Before

To return the results of a SPARQL SELECT query, we used to have a callback like the one below. I removed the error handling; you can find the original here.

We had to marshal the database result_set into a GPtrArray because that is what dbus-glib expects. That means boxing a lot of strings into GValue and GStrv. All of that boxing allocates memory, which is not good for performance.

static void
query_callback (TrackerDBResultSet *result_set, GError *error, gpointer user_data)
{
  TrackerDBusMethodInfo *info = user_data;
  GPtrArray *values = tracker_dbus_query_result_to_ptr_array (result_set);
  dbus_g_method_return (info->context, values);
  tracker_dbus_results_ptr_array_free (&values);
}

void
tracker_resources_sparql_query (TrackerResources *self, const gchar *query,
                                DBusGMethodInvocation *context, GError **error)
{
  TrackerDBusMethodInfo *info = ...; guint request_id;
  TrackerResourcesPrivate *priv= ...; gchar *sender;
  info->context = context;
  tracker_store_sparql_query (query, TRACKER_STORE_PRIORITY_HIGH,
                              query_callback, ...,
                              info, destroy_method_info);
}
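
To get a feel for where the allocations come from, here is a minimal sketch in plain C. It does not use GLib or dbus-glib: BoxedResult, box_result() and free_boxed_result() are hypothetical names made up for this illustration, and the real tracker_dbus_query_result_to_ptr_array() differs in its details. The sketch only mimics the general shape of the boxing: one array of rows, one string vector per row, and one private copy per cell.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a boxed result: rows of heap-owned string copies. */
typedef struct {
    char ***rows;      /* rows[r][c] is a private copy of one cell */
    size_t n_rows;
    size_t n_cols;
    size_t n_allocs;   /* number of malloc calls the boxing needed */
} BoxedResult;

/* malloc wrapper that counts successful calls and aborts on failure. */
static void *counted_malloc(size_t size, size_t *n_allocs)
{
    void *p = malloc(size ? size : 1);
    if (p == NULL)
        abort();
    (*n_allocs)++;
    return p;
}

/* Deep-copies a row-major table of cells, as the boxed reply path has to. */
BoxedResult box_result(const char *const *cells, size_t n_rows, size_t n_cols)
{
    assert(cells != NULL);
    BoxedResult res = { NULL, n_rows, n_cols, 0 };

    res.rows = counted_malloc(n_rows * sizeof *res.rows, &res.n_allocs);
    for (size_t r = 0; r < n_rows; r++) {
        res.rows[r] = counted_malloc(n_cols * sizeof *res.rows[r],
                                     &res.n_allocs);
        for (size_t c = 0; c < n_cols; c++) {
            const char *src = cells[r * n_cols + c];
            size_t len = strlen(src) + 1;
            res.rows[r][c] = counted_malloc(len, &res.n_allocs);
            memcpy(res.rows[r][c], src, len);   /* one copy per cell */
        }
    }
    return res;
}

/* Releases everything box_result() allocated. */
void free_boxed_result(BoxedResult *res)
{
    for (size_t r = 0; r < res->n_rows; r++) {
        for (size_t c = 0; c < res->n_cols; c++)
            free(res->rows[r][c]);
        free(res->rows[r]);
    }
    free(res->rows);
    res->rows = NULL;
}
```

In this sketch a result of 2 rows by 2 columns costs 1 + 2 + 4 = 7 allocations. A result of 13500 rows by 2 columns would cost 1 + 13500 + 27000 = 40501, and all of them are freed again as soon as the reply has been sent.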

After

Last week I changed the asynchronous callback to hand over a database cursor instead. In SQLite terms, advancing the cursor is an sqlite3_step(), and the sqlite3_column_* APIs return const pointers to the data in the current cell.

This means that now we’re not even copying the strings out of SQLite. Instead, we’re using them as const to fill in a raw DBusMessage:

static void
query_callback (TrackerDBCursor *cursor, GError *error, gpointer user_data)
{
  TrackerDBusMethodInfo *info = user_data;
  DBusMessage *reply; DBusMessageIter iter, rows_iter;
  guint cols; guint length = 0;
  reply = dbus_g_method_get_reply (info->context);
  dbus_message_iter_init_append (reply, &iter);
  cols = tracker_db_cursor_get_n_columns (cursor);
  dbus_message_iter_open_container (&iter, DBUS_TYPE_ARRAY,
                                    "as", &rows_iter);
  while (tracker_db_cursor_iter_next (cursor, NULL)) {
    DBusMessageIter cols_iter; guint i;
    dbus_message_iter_open_container (&rows_iter, DBUS_TYPE_ARRAY,
                                      "s", &cols_iter);
    for (i = 0; i < cols; i++, length++) {
      const gchar *result_str = tracker_db_cursor_get_string (cursor, i);
      dbus_message_iter_append_basic (&cols_iter,
                                      DBUS_TYPE_STRING,
                                      &result_str);
    }
    dbus_message_iter_close_container (&rows_iter, &cols_iter);
  }
  dbus_message_iter_close_container (&iter, &rows_iter);
  dbus_g_method_send_reply (info->context, reply);
}
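
The same idea, stripped of D-Bus and SQLite, can be sketched in plain C. MockCursor, OutBuf and serialize_cursor() are hypothetical names used for this illustration only, and the length-prefixed output is a simplification, not the real D-Bus wire format. What it tries to show is that each cell is read through a borrowed const pointer and written straight into one growing buffer, with no per-cell allocation in between.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cursor over an in-memory table; stands in for TrackerDBCursor. */
typedef struct {
    const char *const *cells;  /* row-major, owned by the "database" */
    size_t n_rows, n_cols;
    size_t row;                /* 1-based current row; 0 = before first row */
} MockCursor;

/* Advances to the next row; returns 0 once the rows are exhausted. */
static int mock_cursor_next(MockCursor *cur)
{
    if (cur->row >= cur->n_rows)
        return 0;
    cur->row++;
    return 1;
}

/* Returns a borrowed pointer into the table; the caller must not free it. */
static const char *mock_cursor_get_string(const MockCursor *cur, size_t col)
{
    assert(cur->row >= 1 && col < cur->n_cols);
    return cur->cells[(cur->row - 1) * cur->n_cols + col];
}

/* Single growing output buffer; stands in for the body of the reply. */
typedef struct {
    unsigned char *data;
    size_t len, cap;
    size_t n_reallocs;         /* how often the buffer had to grow */
} OutBuf;

static void outbuf_append(OutBuf *out, const void *src, size_t n)
{
    if (out->len + n > out->cap) {
        size_t cap = out->cap ? out->cap : 64;
        while (cap < out->len + n)
            cap *= 2;
        unsigned char *p = realloc(out->data, cap);
        if (p == NULL)
            abort();
        out->data = p;
        out->cap = cap;
        out->n_reallocs++;
    }
    memcpy(out->data + out->len, src, n);
    out->len += n;
}

/* Writes every cell as a 32-bit length plus its bytes; returns the cell count. */
size_t serialize_cursor(MockCursor *cur, OutBuf *out)
{
    size_t n_cells = 0;

    while (mock_cursor_next(cur)) {
        for (size_t c = 0; c < cur->n_cols; c++, n_cells++) {
            const char *s = mock_cursor_get_string(cur, c);  /* borrowed */
            uint32_t len = (uint32_t) strlen(s);
            outbuf_append(out, &len, sizeof len);
            outbuf_append(out, s, len);
        }
    }
    return n_cells;
}
```

The only allocations left in this sketch are the occasional doublings of the output buffer, so their number grows with the logarithm of the reply size instead of with the number of cells.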

Results

The test is a query over 13500 resources that asks for two strings per resource, repeated eleven times. I left out the first run of each round, because on the first run the sqlite3_stmt still has to be created, which would add a few milliseconds to the measurement. I also redirected standard output to /dev/null to avoid the overhead of the terminal. The results you see below are the values for “real”.

There is of course also overhead in the “tracker-sparql” program itself, which demarshals the reply using plain dbus-glib. An application that uses DBusMessage directly can avoid that overhead on its side too. Since I used the same “tracker-sparql” for both rounds, it doesn’t affect the comparison.

$ time tracker-sparql -q "SELECT ?u  ?m { ?u a rdfs:Resource ;
          tracker:modified ?m }" > /dev/null

Without the optimization:

0.361s, 0.399s, 0.327s, 0.355s, 0.340s, 0.377s, 0.346s, 0.380s, 0.381s, 0.393s, 0.345s

With the optimization:

0.279s, 0.271s, 0.305s, 0.296s, 0.295s, 0.294s, 0.295s, 0.244s, 0.289s, 0.237s, 0.307s

The improvement ranges between 7% and 40%, with an average improvement of 22%.
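
For anyone who wants to redo the arithmetic, here is a small sketch in C. The helper names (mean, min_of, max_of, improvement) and the array names are made up for this example; the numbers are the timings listed above. It is only one way to arrive at figures like these.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double) n;
}

/* Smallest of n samples. */
double min_of(const double *v, size_t n)
{
    assert(n > 0);
    double m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] < m)
            m = v[i];
    return m;
}

/* Largest of n samples. */
double max_of(const double *v, size_t n)
{
    assert(n > 0);
    double m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

/* Relative improvement of `after` over `before`; 0.25 means 25% less time. */
double improvement(double before, double after)
{
    return (before - after) / before;
}

/* The "real" timings quoted above, in seconds. */
enum { N_RUNS = 11 };

const double without_opt[N_RUNS] = {
    0.361, 0.399, 0.327, 0.355, 0.340, 0.377,
    0.346, 0.380, 0.381, 0.393, 0.345
};

const double with_opt[N_RUNS] = {
    0.279, 0.271, 0.305, 0.296, 0.295, 0.294,
    0.295, 0.244, 0.289, 0.237, 0.307
};
```

Calling improvement(mean(without_opt, N_RUNS), mean(with_opt, N_RUNS)) gives roughly 0.22, in line with the average quoted above. How the minimum and maximum were paired is not stated here; comparing the extremes of the two series with min_of() and max_of() is one option, and it may not reproduce the quoted range to the exact percent.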

6 thoughts on “Performance DBus handling of the query results in Tracker’s RDF service”

tvst says:

April 26, 2010 at 6:11 am

This is great! I have been building a replacement for Gnome’s Application menu that includes search capabilities through tracker. Any improvement to query speed is very welcome! Keep it up :)

Now, a tangentially-related question: What is the best way to do a case-insensitive substring search over all file and folder names using tracker?

Right now I’m using “FILTER fn:contains”, but that is case-sensitive. If tracker supported “fn:lower-case” I could use that, but –alas– it does not.

The second option is to do “fts:match” but that seems like overkill since it does a full-text search. Plus I cannot get it to match some types of documents, like source code.

The third option is to use regex which, again, is overkill for me.

Any thoughts?

Mikkel Kamstrup Erlandsen says:

April 26, 2010 at 7:41 am

Wow. I did wonder precisely how much the boxing madness in DBus glib costs. Turns out the answer is “a lot”. Adding in the trick with the sqlite pointers is a nice touch. Cool!

pvanhoof says:

April 26, 2010 at 10:19 am

@tvst: I was just planning to write: “We’ll add support for that later”, but then I thought “oh comon, this is just five minutes of work”. So here is support for fn:lower-case() :

http://git.gnome.org/browse/tracker/commit/?id=67be56484f33c2feaec9031734d8e2a76f2a5857

Enjoy

tvst says:

April 26, 2010 at 6:00 pm

^ awesome! i will try it out

Mikhail Zabaluev says:

June 17, 2010 at 8:48 am

And of course it is madness to use DBus for retrieving the results in the first place.

pvanhoof says:

June 23, 2010 at 2:45 pm

@Mikhail Zabaluev: Check the FD-passing branches in GNOME’s git. We are working on transferring the data using file descriptor passing. It’s going into master this week.
