LRU cache for prepared statements in Tracker’s RDF store

While trying to handle a bug that had a description like “if I do this, tracker-store’s memory grows to 80MB and my device starts swapping”, we where surprised to learn that a sqlite3_stmt consumes about 5 kb heap. Auwch.

Before we didn’t think that those prepared statements where very large, so we threw all of them in a hashtable for in case the query was ran again later. However, if you collect thousands of such statements, memory consumption obviously grows.

We decided to implement a LRU cache for these prepared statements. For clients that access the database using direct-access the cache will be smaller, so that max consumption is only a few megabytes. Because our INSERT and DELETE queries are more reusable than SELECT queries, we split it into two different caches per thread.

The implementation is done with a simple intrinsic linked ring list. We’re still testing it a little bit to get good cache-size numbers. I guess it’ll go in master soon. For your testing pleasure you can find the branch here.

Less exciting features also need to be done, return types

We have a feature request to support return types and to give back variable names. We currently return an array (of array) of just strings, with no typing. This doesn’t work very well for knowing whether a cell is (for example) unbound. Empty string isn’t the same as unbound. So what can you do?

With direct-access the implementation is easy, we’ll just read it from the SPARQL engine. We have all this info already anyway. For filedescriptor passing with D-Bus we need to marshal it over the protocol.

Although we might come back to this decision short term, we wont yet do it for our “normal” (non-FD passing) D-Bus query method. SPARQL’s type system is different from D-Bus’s, so we shouldn’t try to match them somehow. Any custom format that we’d come up with, would be arbitrary.

Maybe someday we’ll add another “normal” D-Bus method that gives you a big string with SPARQL Query Results in JSON or SPARQL Query Results in XML back. Right now this has no priority for us, plus it would be a lot slower due to serialization. Post 0.9 everybody should be using libtracker-sparql and that’ll select either FD passing or direct-access.

Anyway, this will likely be the API for Sparql.Cursor. The methods get_value_type and get_variable_name got added.

public enum Tracker.Sparql.ValueType {
	UNBOUND, URI, STRING, INTEGER,
	DOUBLE, DATETIME, BLANK_NODE
}

public abstract class Tracker.Sparql.Cursor : Object {
	public Connection connection { get; }
	public abstract int n_columns { get; }
+	public abstract ValueType get_value_type (int column);
+	public abstract unowned string? get_variable_name (int column);
	public abstract unowned string? get_string (int column, out long length = null);
	public abstract bool next (Cancellable? cancellable = null) throws GLib.Error;
	public async abstract bool next_async (Cancellable? cancellable = null) throws GLib.Error;
	public abstract void rewind ();
}

I usually post about work in progress, not about something that is done. Same this time, of course. You can find the branch where we’re working on this here.