<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Replicating memes</title>
	<atom:link href="http://pvanhoof.be/blog/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://pvanhoof.be/blog</link>
	<description>From the mind of Philip</description>
	<pubDate>Fri, 27 Aug 2010 13:29:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>Tracker&#8217;s new class signal system being developed</title>
		<link>http://pvanhoof.be/blog/index.php/2010/08/24/trackers-new-class-signal-system-being-developed</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/08/24/trackers-new-class-signal-system-being-developed#comments</comments>
		<pubDate>Tue, 24 Aug 2010 22:49:53 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[maemo]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=583</guid>
		<description><![CDATA[Tracker 0.8&#8217;s situation
In Tracker 0.8 we have a signal system that causes quite a bit of overhead. The overhead comes from:

Having to store the URIs of the resources involved in a changeset in tracker-store&#8217;s memory;
Having to store the predicates involved in a changeset in tracker-store&#8217;s memory (less severe than A because we can store a [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Tracker 0.8&#8217;s situation</strong></p>
<p>In Tracker 0.8 we have a <a href="http://live.gnome.org/Tracker/Documentation/SignalsOnChanges#Tracker_0.8">signal system</a> that causes quite a bit of overhead. The overhead comes from:</p>
<ol type="A">
<li>Having to store the URIs of the resources involved in a changeset in <tt>tracker-store</tt>&#8217;s memory;</li>
<li>Having to store the predicates involved in a changeset in <tt>tracker-store</tt>&#8217;s memory (less severe than A because we can store a pointer to an instance instead of a string);</li>
<li>Having to UTF-8 validate the strings when we emit them over D-Bus (D-Bus does this implicitly);</li>
<li>D-Bus&#8217;s own copying and handling of string data;</li>
<li>Heavy traffic on D-Bus;</li>
<li>Context switching between <tt>tracker-store</tt> and <tt>dbus-daemon</tt>;</li>
<li>We have to wait to turn on the D-Bus objects until we have the latest ontology, that is, until after journal replay. We also need to reset the situation after a backup restore. Complex!</li>
</ol>
<p><small>Not all aggregators show this list as A, B, C, D, E, F and G. Sorry for that. I&#8217;ll nevertheless refer to the items as such later in this article.</small></p>
<p><strong>Consumer&#8217;s problems with Tracker 0.8&#8217;s signal</strong></p>
<ol type="1">
<li>Aforementioned overhead: the signal generates a lot of D-Bus traffic, caused by sending URIs for the subjects and the predicates;</li>
<li>Doesn&#8217;t make it possible, in case of a delete of <tt>&lt;a&gt;</tt>, to know <tt>&lt;b&gt;</tt> in <tt>&lt;a&gt; nie:isLogicalPartOf &lt;b&gt;</tt>, as <tt>&lt;a&gt;</tt> is already removed at the point of signal emission;</li>
<li>Round trips to know the literals create more D-Bus traffic;</li>
<li>Transactional changes can&#8217;t be reliably identified with <tt>SubjectsAdded</tt>, <tt>SubjectsChanged</tt> and <tt>SubjectsRemoved</tt> being separate signals;</li>
<li>A lot of D-Bus objects, instead of letting clients use D-Bus&#8217;s filtering system.</li>
</ol>
<p><strong>The <a href="http://git.gnome.org/browse/tracker/log/?h=class-signal">solution that we&#8217;re developing</a> for Tracker 0.9</strong></p>
<p><em><strong>Direct access</strong></em></p>
<p>With direct-access we remove most of the round-trip cost of a query coming from a consumer that wants a literal object involved in a changeset: by utilizing the <a href="http://git.gnome.org/browse/tracker/tree/src/libtracker-sparql/tracker-cursor.vala"><tt>TrackerSparqlCursor</tt></a> API with direct-access enabled, you end up doing <tt>sqlite3_step()</tt> in your own process, directly on meta.db.</p>
<p>For the consumers of the signal, this removes <strong>3</strong>.</p>
<p><em><strong>Sending integer IDs instead of string URIs</strong></em></p>
<p>A while ago we introduced the SPARQL function <tt>tracker:id(<span style="color: #0000ff;">resource</span> uri)</tt>. The <tt>tracker:id(<span style="color: #0000ff;">resource</span> uri)</tt> function gives you a unique number that Tracker&#8217;s RDF store uses internally.</p>
<p>Each resource, each class and each predicate (the latter two are resources like any other) has such a unique internal ID.</p>
<p>Given that Tracker&#8217;s class signal system is Tracker-specific anyway, we decided not to give you subject URI strings. Instead, we&#8217;ll give you the integer IDs.</p>
<p>The <tt>Writeback</tt> signal also got changed to do this, for the same reasons. But this API is entirely internal and shouldn&#8217;t be used outside of the project.</p>
<p>For us, this removes <strong>A</strong>, <strong>B</strong>, <strong>C</strong>, <strong>D</strong> and <strong>E</strong>. For the consumers of the signal, this removes <strong>1</strong>.</p>
<p><em><strong>Merge added, changed and removed into one signal</strong></em></p>
<p>We give you two arrays in one signal: <tt>inserts</tt> and <tt>deletes</tt>.</p>
<p>For consumers of the signal, this removes <strong>4</strong>.</p>
<p><em><strong>Add the class name to the signal</strong></em></p>
<p>This allows you to use a string filter on your signal subscription in D-Bus.</p>
<p>For us this removes <strong>G</strong>. For consumers of the signal, this removes <strong>5</strong>.</p>
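<p>A minimal sketch of what such a string filter could look like, written as a D-Bus match rule built in Python. Matching on the first argument with <tt>arg0</tt> is a standard part of D-Bus match rules; the interface name, the signal member and the class URI used below are assumptions for illustration only, so check the documentation of your Tracker version for the real ones.</p>

```python
def class_signal_match_rule(class_name,
                            interface="org.freedesktop.Tracker1.Resources",
                            member="GraphUpdated"):
    """Build a D-Bus match rule that only lets through signals whose
    first argument (arg0) equals the given class name.

    The interface and member defaults are assumptions, not confirmed
    by this article.
    """
    # An apostrophe inside a quoted match-rule value is written as '\''
    # (close the quote, escaped apostrophe, reopen the quote).
    escaped = class_name.replace("'", "'\\''")
    return ("type='signal',"
            "interface='{0}',"
            "member='{1}',"
            "arg0='{2}'").format(interface, member, escaped)


# Hypothetical class URI, used only as an example value.
rule = class_signal_match_rule(
    "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Audio")
print(rule)
```

<p>The resulting string is what a client would hand to its D-Bus binding when subscribing, so that <tt>dbus-daemon</tt> does the filtering instead of the client.</p>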
<p><em><strong>Pass the object-id for resource objects</strong></em></p>
<p>You&#8217;ll get a third number in the <tt>inserts</tt> and <tt>deletes</tt> arrays: <tt>object-id</tt>. We don&#8217;t send object literals, although for integral objects we&#8217;re still discussing this. But for resource objects we can give you the <tt>object-id</tt> without much extra cost.</p>
<p>For consumers of the signal, this removes <strong>2</strong>.</p>
<p><strong><em>SPARQL IN, tracker:id(<span style="color: #0000ff;">resource</span> uri) and tracker:uri(<span style="color: #0000ff;">int</span> id)</em></strong></p>
<p>We recently added support for SPARQL IN, we already had <tt>tracker:id(<span style="color: #0000ff;">resource</span> uri)</tt> and I implemented <tt>tracker:uri(<span style="color: #0000ff;">int</span> id)</tt>.</p>
<p>This makes things like this possible:</p>
<pre>SELECT ?t { ?r nie:title ?t .
            FILTER (tracker:id(?r) IN (800, 801, 802, 807)) }</pre>
<p>Where 800, 801, 802 and 807 will be the IDs that you receive in the class signal. And with <tt>tracker:uri(<span style="color: #0000ff;">int</span> id)</tt> it goes like:</p>
<pre>SELECT tracker:uri (800) tracker:uri (801)
       tracker:uri (802) tracker:uri (807) { }</pre>
<p>For consumers this removes most of the burden introduced by the IDs.</p>
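<p>As a rough sketch of how a consumer might turn the IDs from a signal into the two queries above, here is a small Python helper. The function names are made up for this example; only the SPARQL text they produce comes from this article.</p>

```python
def titles_query(ids):
    """Build a SELECT that fetches nie:title for the given internal IDs."""
    if not ids:
        raise ValueError("at least one ID is needed")
    # int() raises for non-numeric input, so no stray text can end up
    # in the query string.
    id_list = ", ".join(str(int(i)) for i in ids)
    return ("SELECT ?t { ?r nie:title ?t . "
            "FILTER (tracker:id(?r) IN (" + id_list + ")) }")


def uris_query(ids):
    """Build a SELECT that maps each internal ID back to its URI."""
    if not ids:
        raise ValueError("at least one ID is needed")
    columns = " ".join("tracker:uri ({0})".format(int(i)) for i in ids)
    return "SELECT " + columns + " { }"


ids_from_signal = [800, 801, 802, 807]
print(titles_query(ids_from_signal))
print(uris_query(ids_from_signal))
```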
<p><strong><em>Context switching of processes</em></strong></p>
<p>What is left is context switching between <tt>tracker-store</tt> and <tt>dbus-daemon</tt>, <strong>F</strong>. This matters mostly on mobile targets (ARM hardware). We reduce the number of context switches by grouping transactions together and then emitting larger sets in bursts. Emission is both timeout and data-size based: we emit after either a certain amount of time or a certain memory limit. We&#8217;re still testing what the best timeouts and sizes are on target hardware.</p>
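<p>The grouping idea can be sketched in a few lines of Python. This is not Tracker&#8217;s implementation: the class name, the limits and the use of an event count as a stand-in for a memory limit are all assumptions made for the example.</p>

```python
import time


class SignalBatcher:
    """Buffer change events and emit them in bursts.

    A burst is emitted when either max_events have been buffered or
    max_delay seconds have passed since the first buffered event.
    Both limits are placeholder values, not Tracker's real settings.
    """

    def __init__(self, emit, max_events=1000, max_delay=1.0,
                 clock=time.monotonic):
        self._emit = emit
        self._max_events = max_events
        self._max_delay = max_delay
        self._clock = clock
        self._events = []
        self._first_at = None

    def add(self, event):
        """Buffer one event and emit a burst if a limit is reached."""
        if not self._events:
            self._first_at = self._clock()
        self._events.append(event)
        self.flush_if_due()

    def flush_if_due(self):
        """Emit when the size limit or the time limit is reached."""
        if not self._events:
            return
        too_many = len(self._events) >= self._max_events
        too_old = self._clock() - self._first_at >= self._max_delay
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Emit everything that is buffered as one burst."""
        events, self._events = self._events, []
        self._first_at = None
        if events:
            self._emit(events)


# Usage with a fake clock so the example does not have to sleep.
bursts = []
now = [0.0]
batcher = SignalBatcher(bursts.append, max_events=3, max_delay=5.0,
                        clock=lambda: now[0])
batcher.add((800, 15, 0))
batcher.add((801, 15, 0))
batcher.add((802, 15, 0))   # third event reaches the size limit
now[0] = 10.0
batcher.add((807, 15, 0))   # buffered; its timer starts at 10.0
now[0] = 16.0
batcher.flush_if_due()      # six seconds later the time limit is reached
print(len(bursts))
```

<p>In a real daemon <tt>flush_if_due()</tt> would be driven by a main-loop timer rather than called by hand.</p>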
<p><strong><em>Where is the stuff?</em></strong></p>
<p>The work hasn&#8217;t been reviewed or thoroughly tested yet. That will happen over the next few days and weeks.</p>
<p>Anyway, here&#8217;s <a title="Warning: branch will be rebased soon" href="http://git.gnome.org/browse/tracker/log/?h=class-signal">the branch</a>, <a href="http://live.gnome.org/Tracker/Documentation/SignalsOnChanges#Tracker_0.9">documentation</a>, <a title="Warning: File might be moved around" href="http://git.gnome.org/browse/tracker/tree/examples/class-signal/class-signal.c?h=class-signal">example in Plain C</a>, <a title="Warning: File might be moved around" href="http://git.gnome.org/browse/tracker/tree/tests/functional-tests/class-signal-test.vala?h=class-signal">example in Vala</a></p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/08/24/trackers-new-class-signal-system-being-developed/feed</wfw:commentRss>
		</item>
		<item>
		<title>Support for SPARQL IN and NOT IN, the new class signals</title>
		<link>http://pvanhoof.be/blog/index.php/2010/08/11/support-for-sparql-in-and-not-in-the-new-class-signals</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/08/11/support-for-sparql-in-and-not-in-the-new-class-signals#comments</comments>
		<pubDate>Wed, 11 Aug 2010 13:01:43 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[maemo]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=582</guid>
		<description><![CDATA[I made some documentation about our SPARQL-IN feature that we recently added. I added some interesting use-cases like doing an insert and a delete based on IN values.
For the new class signal API that we&#8217;re developing this and next week, we&#8217;ll probably emit the IDs that tracker:id() would give you if you&#8217;d use that on [...]]]></description>
			<content:encoded><![CDATA[<p>I made <a href="http://live.gnome.org/Tracker/Documentation/Examples/SPARQL/InSupport">some documentation</a> about our <a title="One more time, the link. By now everybody should have read this!" href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a>-IN feature that we recently added. I added some interesting use-cases like doing an insert and a delete based on IN values.</p>
<p>For the new <a title="When I wrote this article, this link was pointing to the old API, not the new yet. Let us first develop it :-)" href="http://live.gnome.org/Tracker/Documentation/SignalsOnChanges">class signal API</a> that we&#8217;re developing this and next week, we&#8217;ll probably emit the IDs that <em>tracker:id()</em> would give you if you&#8217;d use that on a resource. This means that IN is <a href="http://live.gnome.org/Tracker/Documentation/Examples/SPARQL/InSupport#Use_tracker:id.28.29_with_IN">very useful</a> for the purpose of giving you metadata of resources that are in the list of IDs that you just received from the class signal.</p>
<p>We never documented <em>tracker:id()</em> very much, as it&#8217;s not an RDF standard; rather it&#8217;s something <a href="http://www.tracker-project.org/">Tracker</a> specific. But the class signals aren&#8217;t an RDF standard either; they are Tracker specific too. I guess that makes the two usable in combination, and makes the &#8216;internal API&#8217; status irrelevant.</p>
<p>We&#8217;re right now prototyping the new class signals API. It&#8217;ll probably be a <em>&#8220;sa(iii)a(iii)&#8221;</em>:</p>
<p>That&#8217;s the class-name and two arrays of (subject-id, predicate-id, object-id). The class-name is there to allow D-Bus filtering. The first array holds the deletes and the second holds the inserts. We&#8217;ll only give you object-ids of non-literal objects (literal objects have no internal object-id). This means that we don&#8217;t throw literals at you in the signal: you need to make a query to get them, and we&#8217;ll put 0 in the signal instead.</p>
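<p>To make the shape of that payload concrete, here is a small Python sketch of a handler for it. The handler name and the sample numbers are invented for the example, and the API is still a prototype, so treat this as an illustration of the layout rather than of the final signal.</p>

```python
def on_class_signal(class_name, deletes, inserts):
    """Collect the IDs a consumer would want to look up after one signal.

    deletes and inserts are sequences of (subject_id, predicate_id,
    object_id) tuples. An object_id of 0 means the object was a literal
    and was not sent in the signal.
    """
    changed_subjects = set()
    related_resources = set()
    for subject_id, predicate_id, object_id in list(deletes) + list(inserts):
        changed_subjects.add(subject_id)
        if object_id != 0:
            related_resources.add(object_id)
    return sorted(changed_subjects), sorted(related_resources)


# Made-up sample payload: one literal replaced on 800, one resource
# object added on 801.
subjects, resources = on_class_signal(
    "nfo:Audio",
    deletes=[(800, 15, 0)],
    inserts=[(800, 15, 0), (801, 22, 950)],
)
print(subjects)
print(resources)
```

<p>The subject IDs are the ones you would feed into an IN filter to fetch the literals you care about.</p>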
<p>We give you the object-ids because of a use-case that we didn&#8217;t cover yet:</p>
<p>Given the triple <em>&lt;a&gt; nie:isLogicalPartOf &lt;b&gt;.</em> When <em>&lt;a&gt;</em> is deleted, how do you know <em>&lt;b&gt;</em> during the signal? The feature request was to be able to do a <em>select ?b { &lt;a&gt; nie:isLogicalPartOf ?b }</em> when <em>&lt;a&gt;</em> is deleted, but at that point the triple is gone, so the client can&#8217;t do that query anymore.</p>
<p>With the new signal we&#8217;ll give you the ID of <em>&lt;b&gt;</em> when <em>&lt;a&gt;</em> is deleted. We&#8217;ll also implement a <em>tracker:uri(integer id)</em> allowing you to get &lt;b&gt; out of that ID. It&#8217;ll do something like this, but then much faster: <em>select ?subject { ?subject a rdfs:Resource . FILTER (tracker:id(?subject) IN (%d)) }</em></p>
<p>I know there will be people screaming for all objects, also literals, in the signals, but we don&#8217;t want to flood your D-Bus daemon with all that data. Scream all you want. Really, we don&#8217;t. Just do a roundtrip query.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/08/11/support-for-sparql-in-and-not-in-the-new-class-signals/feed</wfw:commentRss>
		</item>
		<item>
		<title>&#8220;You&#8217;re just making an excuse&#8221; is a relative phrase</title>
		<link>http://pvanhoof.be/blog/index.php/2010/08/10/youre-just-making-an-excuse-is-a-relative-phrase</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/08/10/youre-just-making-an-excuse-is-a-relative-phrase#comments</comments>
		<pubDate>Tue, 10 Aug 2010 19:39:43 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Art &#38; culture]]></category>

		<category><![CDATA[Personal]]></category>

		<category><![CDATA[Philosophy]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=581</guid>
		<description><![CDATA[I recently stumbled upon this marvelous piece. I title the quote &#8220;making an excuse&#8220;:
Saying that you&#8217;re forced to do something when you really aren&#8217;t is a failure to take responsibility for your actions.  I generally don&#8217;t think users of proprietary software are primarily to blame for the challenges of software freedom — nearly all [...]]]></description>
			<content:encoded><![CDATA[<p>I recently stumbled upon this marvelous piece. I title the quote &#8220;<a href="http://ebb.org/bkuhn/blog/2010/08/09/have-to-use.html">making an excuse</a>&#8220;:</p>
<blockquote><p>Saying that you&#8217;re forced to do something when you really aren&#8217;t is a failure to take responsibility for your actions.  I generally don&#8217;t think users of proprietary software are primarily to blame for the challenges of software freedom — nearly all the blame lies with those who write, market, and distribute proprietary software. However, I think that software users should be clear about why they are using the software.  It&#8217;s quite rare for someone to be compelled under threat of economic (or other) harm to use proprietary software.  Therefore, only rarely is it justifiable to say you have to use proprietary software. In most cases, saying so is just making an excuse.</p>
<p><a href="http://ebb.org/bkuhn/blog/2010/08/09/have-to-use.html">Bradley M. Kuhn - 2010, on his blog</a></p></blockquote>
<p>I&#8217;ll translate this for you to Catholicism. You can definitely adapt this to most religions (for some, add death penalties like stoning here and there):</p>
<blockquote><p>Saying that you&#8217;re forced by your nature to masturbate when you really aren&#8217;t is a failure to take responsibility for your actions. The church generally doesn&#8217;t think masturbaters are primarily to blame for the challenges of sexuality — nearly all the blame lies with pornography. However, I think that people who masturbate should be clear about why they have sex with themselves: It&#8217;s quite rare for someone to be compelled under the desire of sexual pleasure. Therefore, only rarely is it justifiable to say you have to masturbate. In most cases, saying so is just making an excuse.</p>
<p>The translation
</p></blockquote>
<p>There you go.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/08/10/youre-just-making-an-excuse-is-a-relative-phrase/feed</wfw:commentRss>
		</item>
		<item>
		<title>Tracker this, Tracker that, everything Tracker</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/30/tracker-this-tracker-that-everything-tracker</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/30/tracker-this-tracker-that-everything-tracker#comments</comments>
		<pubDate>Fri, 30 Jul 2010 13:45:47 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[maemo]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=580</guid>
		<description><![CDATA[Busy handling
I made an article about reporting busy status in Tracker before.
But then it wasn&#8217;t yet possible to queue a query while Tracker&#8217;s RDF store is busy. We&#8217;re making this possible after the next unstable release. Yeah, I know you guys hate that Tracker&#8217;s RDF store can be busy. But you tell us what else to [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Busy handling</strong></p>
<p>I made <a href="http://pvanhoof.be/blog/index.php/2010/03/26/reporting-busy-status">an article about reporting busy status</a> in <a href="http://www.tracker-project.org/">Tracker</a> before.</p>
<p>But then it wasn&#8217;t yet possible to queue a query while Tracker&#8217;s RDF store is busy. We&#8217;re <a href="http://git.gnome.org/browse/tracker/log/?h=busy-handling">making this possible</a> after the next unstable release. Yeah, I know you guys hate that Tracker&#8217;s RDF store <em>can</em> be busy. But you tell us: what else can we do while restoring a backup, or while replaying a journal?</p>
<p>While we are replaying the journal, or restoring a backup, we&#8217;ll accept your result-hungry queries into our queue. Meanwhile you get progress and status indication over a D-Bus signal. Some documentation about this is available <a href="http://live.gnome.org/Tracker/Documentation/BusyHandling">here</a>.</p>
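<p>The queueing behaviour described above can be modelled with a toy Python class. Everything in it (<tt>BusyAwareStore</tt>, <tt>set_busy</tt>, the callback style) is hypothetical and only mirrors the idea; it is not Tracker&#8217;s API.</p>

```python
from collections import deque


class BusyAwareStore:
    """Toy model of a store that queues queries while it is busy."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._busy = False
        self._pending = deque()

    def set_busy(self, busy):
        """Mark the store busy or available; drain the queue when freed."""
        self._busy = busy
        if not busy:
            # Answer queued queries in arrival order.
            while self._pending:
                query, callback = self._pending.popleft()
                callback(self._run_query(query))

    def query(self, query, callback):
        """Run the query now, or queue it if the store is busy."""
        if self._busy:
            self._pending.append((query, callback))
        else:
            callback(self._run_query(query))


results = []
store = BusyAwareStore(run_query=lambda q: "result of " + q)
store.set_busy(True)                       # e.g. journal replay starts
store.query("SELECT ...", results.append)  # accepted, but has to wait
waiting = list(results)                    # nothing has been answered yet
store.set_busy(False)                      # replay finished, queue drains
print(waiting, results)
```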
<p><strong>SPARQL 1.1 Draft features: IN and NOT IN</strong></p>
<p>We had feature requests for supporting SPARQL <a href="http://www.w3.org/TR/sparql11-query/#func-in">IN</a> and <a href="http://www.w3.org/TR/sparql11-query/#func-not-in">NOT IN</a>. As <a href="http://pvanhoof.be/blog/index.php/2009/12/09/sparql-subqueries">usual</a>, we&#8217;re ahead of the <a href="http://www.w3.org/TR/sparql11-query/">SPARQL Draft specification</a>. But I don&#8217;t think IN and NOT IN will look much different in the end. Anyway, it was straightforward so I just <a href="http://git.gnome.org/browse/tracker/commit/?id=fba8b7524b1a2b80425b1dcaf8a5be2377b86570">implemented</a> <a href="http://git.gnome.org/browse/tracker/commit/?id=1801c334db20f31ee7f0ce583a0c9580d05f9a0a">both</a>.</p>
<p>It goes like this:</p>
<pre>SELECT ?abc { ?abc a nie:InformationElement ;
                   nie:title ?title .
               FILTER (?title IN ('abc', 'def')) }</pre>
<pre>SELECT ?abc { ?abc a nie:InformationElement ;
                   nie:title ?title .
               FILTER (?title NOT IN ('xyz', 'def')) }</pre>
<p>It&#8217;s particularly useful for getting metadata about a defined set of resources (give me the author of this, this and that file).</p>
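<p>If you build such filters from values you get at runtime, the string literals need escaping. A minimal Python sketch, assuming the same query shape as above; the helper names are made up for the example.</p>

```python
def sparql_string(value):
    """Quote a Python string as a single-quoted SPARQL literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace("'", "\\'")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return "'" + escaped + "'"


def titles_filter_query(titles, negate=False):
    """Select nie:InformationElement resources by title membership."""
    if not titles:
        raise ValueError("at least one title is needed")
    operator = "NOT IN" if negate else "IN"
    values = ", ".join(sparql_string(t) for t in titles)
    return ("SELECT ?abc { ?abc a nie:InformationElement ; "
            "nie:title ?title . "
            "FILTER (?title " + operator + " (" + values + ")) }")


print(titles_filter_query(["abc", "def"]))
print(titles_filter_query(["xyz", "def"], negate=True))
```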
<p><strong>Direct access</strong></p>
<p><a href="http://git.gnome.org/browse/tracker/log/?h=direct-access">This work</a> is progressing nicely. Most of the guys on the team are working on this, and it&#8217;s going to be awesome thanks to <a title="And SQLite-has-no-MVCC critics, please educate yourself about this. I'm convinced that with WAL, SQLite covers our use-case just fine" href="http://www.sqlite.org/draft/wal.html">SQLite&#8217;s WAL journal mode</a>. SQLite&#8217;s WAL mode is still under development and probably unstable here and there, but we&#8217;re trusting the SQLite guys with this anyway.</p>
<p>What is left to do for direct-access is cleaning up a bit, getting the small nasty things right. You know. The basics are all in place now.</p>
<p>We&#8217;re doing most of the library code in <a href="http://live.gnome.org/Vala">Vala</a>, but clever people can easily imagine the C API valac makes from the .vala files <a href="http://git.gnome.org/browse/tracker/tree/src/libtracker-sparql?h=direct-access">here</a>. That&#8217;s the abstract API that client developers will use. Unless you use a higher level API like <a href="http://maemo.gitorious.org/maemo-af/libqttracker">libqttracker</a>, <a href="http://maemo.gitorious.org/maemo-af/qsparql">QSparql</a>, <a href="http://blogs.gnome.org/abustany/2010/07/22/hormiga-now-with-collections-too">Hormiga</a> or <a href="https://labs.codethink.co.uk/index.php/p/sparql-glib/">sparql-glib</a>.</p>
<p>All of which still need to be adapted to the direct-access work that we&#8217;re doing. But we&#8217;re in close contact with all of the developers involved in those libraries. And they&#8217;re all thrilled to implement backends for the new stuff.</p>
<p><strong>Plans</strong></p>
<p>We plan to change the <a href="http://live.gnome.org/Tracker/Documentation/SignalsOnChanges">signals-on-changes</a> or class-signals feature a bit so that the three signals are merged into one. The problem with three is that you can&#8217;t reliably identify a change-transaction this way (a rename of a file, for example).</p>
<p>Another thing on our list is merging <a href="http://git.gnome.org/browse/tracker/log/?h=zeitgeist">Zeitgeist&#8217;s ontology</a>. To the other team members at Tracker: guys, Zeitgeist has been waiting for three months now. Let&#8217;s just get this done!</p>
<p>Oh there are <a title="A very very short list of the plans" href="http://live.gnome.org/Tracker/Roadmap">a lot of plans</a>, to be honest.</p>
<p>I wonder when, if ever, we go in feature freeze. Hehe. I guess we&#8217;ll just have very short feature-freeze periods. Whatever, it&#8217;s fun.</p>
<p><strong>MeeGo in cars</strong></p>
<p><a href="http://www.genivi.org">Hey BMW &amp; co</a>, if you guys want to learn how to write music players and playlists for car entertainment on MeeGo, get in touch! This <a href="http://www.tracker-project.org/">Tracker</a> that I&#8217;m talking about is on that <a href="http://meego.com">MeeGo OS</a>; being the metadata database for music is among its purposes.</p>
<p>I can&#8217;t wait to have a better music player playlist in my car.</p>
<p>Or maybe some integration with the in-car GPS and the car owner&#8217;s appointments and meetings? With geo-tagged photos on the car owner&#8217;s phone? Automatic and instant synchronization with Nokia&#8217;s future phones? It all sounds very doable, even easy, to me. I&#8217;d want all that stuff. Use-cases!</p>
<p>Let&#8217;s talk!</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/30/tracker-this-tracker-that-everything-tracker/feed</wfw:commentRss>
		</item>
		<item>
		<title>The social contribution for the self-employed</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/29/de-sociale-bijdrage-voor-zelfstandigen</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/29/de-sociale-bijdrage-voor-zelfstandigen#comments</comments>
		<pubDate>Thu, 29 Jul 2010 12:49:53 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Finance]]></category>

		<category><![CDATA[Personal]]></category>

		<category><![CDATA[Politics]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=579</guid>
		<description><![CDATA[A self-employed person in Belgium is supposed to pay his social contribution in advance (e.g. per quarter). You&#8217;re an idiot if you don&#8217;t, because after four years they&#8217;ll charge you a nice lot of interest on the entire amount.
Everybody you run into when starting up your company will tell you so again, too. The [...]]]></description>
			<content:encoded><![CDATA[<p>A self-employed person in Belgium is supposed to pay his social contribution in advance (e.g. per quarter). You&#8217;re an idiot if you don&#8217;t, because after four years they&#8217;ll charge you a nice lot of interest on the entire amount.</p>
<p>Everybody you run into when starting up your company will tell you so again, too. The people at Unizo, at the bookkeeping course, my accountant, the people at the bank, and even my notary explained it at the incorporation. And all of them in an urgent tone: do this, don&#8217;t forget that. Really don&#8217;t forget it. Really!</p>
<p>So you are undesirably stupid if you still don&#8217;t do it. But well, that stupid people exist is nothing new.</p>
<p>What few people know is that you can even turn it around: unlike advance payments of corporate tax, advance payments of your social contribution do earn you interest on the overpaid amount.</p>
<p>And that interest is currently higher than what you get on a solid savings account.</p>
<p>Of course you have to guess what you will earn on average over four years. So of course you may estimate that fairly high. Do you happen to know exactly how much more profit you&#8217;ll make in a few years? Well, I don&#8217;t. And of course I pay myself more salary when there is more profit, Mr. Inspector. But I couldn&#8217;t have known that after four years there wouldn&#8217;t be that much profit after all! Oh well!</p>
<p>So you estimate that <em>fairly high</em>. And you pay too much social contribution for four years. After four years they refund the excess, with high interest.</p>
<p>Neat, isn&#8217;t it?</p>
<p>I think I&#8217;m going to have to celebrate this!</p>
<p>Now, not too many of you freelancers should start doing this, eh! I&#8217;d like to enjoy their &#8220;problem&#8221; for a few more years ;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/29/de-sociale-bijdrage-voor-zelfstandigen/feed</wfw:commentRss>
		</item>
		<item>
		<title>Julian on TED</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/27/julian-on-ted</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/27/julian-on-ted#comments</comments>
		<pubDate>Tue, 27 Jul 2010 19:12:34 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Art &#38; culture]]></category>

		<category><![CDATA[Personal]]></category>

		<category><![CDATA[Philosophy]]></category>

		<category><![CDATA[Politics]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=578</guid>
		<description><![CDATA[I try to avoid posting about the same subject twice in a row. But I also really think that Wikileaks is worth violating just about any such rule in existence. Maybe I should make a category on my blog just for Wikileaks?
So TED has decided to do an interview with Julian Assange:

I&#8217;d like to point out [...]]]></description>
			<content:encoded><![CDATA[<p>I try to avoid posting about the same subject twice in a row. But I also really think that <a href="http://www.wikileaks.org/">Wikileaks</a> is worth violating just about any such rule in existence. Maybe I should make a category on my blog just for Wikileaks?</p>
<p>So <a href="http://www.ted.com">TED</a> has decided to do <a href="http://www.ted.com/talks/julian_assange_why_the_world_needs_wikileaks.html">an interview with Julian Assange</a>:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="446" height="326" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="wmode" value="transparent" /><param name="bgColor" value="#ffffff" /><param name="flashvars" value="vu=http://video.ted.com/talks/dynamic/JulianAssange_2010G-medium.flv&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/JulianAssange-2010G.embed_thumbnail.jpg&amp;vw=432&amp;vh=240&amp;ap=0&amp;ti=918&amp;introDuration=15330&amp;adDuration=4000&amp;postAdDuration=830&amp;adKeys=talk=julian_assange_why_the_world_needs_wikileaks;year=2010;theme=media_that_matters;theme=war_and_peace;theme=new_on_ted_com;theme=a_taste_of_tedglobal_2010;event=TEDGlobal+2010;&amp;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><param name="src" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" /><embed type="application/x-shockwave-flash" width="446" height="326" src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" flashvars="vu=http://video.ted.com/talks/dynamic/JulianAssange_2010G-medium.flv&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/JulianAssange-2010G.embed_thumbnail.jpg&amp;vw=432&amp;vh=240&amp;ap=0&amp;ti=918&amp;introDuration=15330&amp;adDuration=4000&amp;postAdDuration=830&amp;adKeys=talk=julian_assange_why_the_world_needs_wikileaks;year=2010;theme=media_that_matters;theme=war_and_peace;theme=new_on_ted_com;theme=a_taste_of_tedglobal_2010;event=TEDGlobal+2010;&amp;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" bgcolor="#ffffff" wmode="transparent" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>I&#8217;d like to congratulate and thank everybody who&#8217;s involved: not just Julian, but him too. Thank you.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/27/julian-on-ted/feed</wfw:commentRss>
		</item>
		<item>
		<title>That today &#8217;s gonna be a good day</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/26/that-today-s-gonna-be-a-good-day</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/26/that-today-s-gonna-be-a-good-day#comments</comments>
		<pubDate>Mon, 26 Jul 2010 11:56:09 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Art &#38; culture]]></category>

		<category><![CDATA[Personal]]></category>

		<category><![CDATA[Philosophy]]></category>

		<category><![CDATA[Politics]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=577</guid>
		<description><![CDATA[Today is the day the world is witnessing the most significant military leak in the history of mankind, so I have a feeling that today&#8217;s gonna be a good day.
To all the people at Wikileaks, and to all whistle blowers in past, present and future: you are heroes. You guys&#8217; ideas will be with [...]]]></description>
			<content:encoded><![CDATA[<p>Today is the day the world is witnessing the most significant military leak in the history of mankind, so I have a feeling that today&#8217;s gonna be a good day.</p>
<p>To all the people at Wikileaks, and to all whistle blowers in past, present and future: you are heroes. You guys&#8217; ideas will be with us for centuries to come. You&#8217;ll be remembered in history books. Let&#8217;s make sure you guys will.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/26/that-today-s-gonna-be-a-good-day/feed</wfw:commentRss>
		</item>
		<item>
		<title>Manderlay, I always wanted to write about it</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/22/manderlay-i-always-wanted-to-write-about-it</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/22/manderlay-i-always-wanted-to-write-about-it#comments</comments>
		<pubDate>Thu, 22 Jul 2010 14:33:49 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Art &#38; culture]]></category>

		<category><![CDATA[Personal]]></category>

		<category><![CDATA[Philosophy]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=576</guid>
		<description><![CDATA[I&#8217;ve been into Lars von Trier&#8217;s movies these last few days. First Dear Wendy, then The Boss of It All, and yesterday I watched Manderlay together with a girlfriend.
It wasn&#8217;t the first time that I saw the movie; I think the third time or something. But I&#8217;m still convinced that the movie is even better [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been into <a href="http://en.wikipedia.org/wiki/Lars_von_Trier">Lars von Trier</a>&#8217;s movies these last few days. First <a href="http://en.wikipedia.org/wiki/Dear_Wendy">Dear Wendy</a>, then <a href="http://en.wikipedia.org/wiki/The_Boss_of_It_All">The Boss of It All</a>, and yesterday I watched <a href="http://en.wikipedia.org/wiki/Manderlay">Manderlay</a> together with a girlfriend.</p>
<p>It wasn&#8217;t the first time that I saw the movie; I think the third time or something. But I&#8217;m still convinced that the movie is even better than <a href="http://en.wikipedia.org/wiki/Dogville">Dogville</a>, about which <a href="http://pvanhoof.be/blog/index.php/2007/03/26/dogville-assertiveness">I wrote</a> <a href="http://pvanhoof.be/blog/index.php/2006/12/25/dogville">a few years ago</a> that it&#8217;s the best movie I ever saw.</p>
<p>Don&#8217;t listen to the U.S. critics. Struggling to see the world from anything other than a U.S. point of view, they don&#8217;t understand what it&#8217;s about (it&#8217;s not really about slavery). I guess Lars von Trier carefully selects his audience.</p>
<p>The movie Manderlay, like Dogville, has a (hidden) morality. Even more than Dogville, which <a href="http://pvanhoof.be/blog/index.php/2007/03/26/dogville-assertiveness">is basically about the moral necessity of assertiveness</a>, Manderlay is a movie that tries to make you think. In my case about the failure of only using assertiveness to educate people (about) a new reality. Also about the failure of using democratic voting for every issue (ownership of a tool). And about the necessity of a law system: no matter how moral or immoral, it&#8217;s still better than absolute freedom (people need a law). But with &#8220;freedom&#8221; being some sort of piece of shit ideological word among many readers of my blog, I&#8217;m sure many won&#8217;t understand what I mean by that. I try to carefully select my audience. <small>I&#8217;m not against &#8220;freedom&#8221;, just against its naive interpretations. Especially the &#8220;anarchy&#8221;-ones.</small></p>
<p>So Manderlay dances with the morals in Dogville. Both movies are part of a <a href="http://en.wikipedia.org/wiki/Lars_von_Trier#Trilogies">trilogy</a>, so I guess that makes sense.</p>
<p>I&#8217;m grateful that Lars carefully selects his audience. You don&#8217;t create art by appeasement.</p>
<p>Looking forward to <a href="http://en.wikipedia.org/wiki/Wasington">Wasington</a>, the last part of this trilogy.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/22/manderlay-i-always-wanted-to-write-about-it/feed</wfw:commentRss>
		</item>
		<item>
		<title>Why make things complicated?</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/19/why-make-things-complicated</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/19/why-make-things-complicated#comments</comments>
		<pubDate>Mon, 19 Jul 2010 16:05:19 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Philosophy]]></category>

		<category><![CDATA[Politics]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=575</guid>
		<description><![CDATA[There are no open source companies. There are companies and there are open source projects.
Some companies work on open source projects, some parent open source projects, some don&#8217;t.
Some of those companies are good at fostering a community that contributes to these open source projects. Others are unwilling and some don&#8217;t yet understand the process. And [...]]]></description>
			<content:encoded><![CDATA[<p>There are no open source companies. There are companies and there are open source projects.</p>
<p>Some companies work on open source projects, some parent open source projects, some don&#8217;t.</p>
<p>Some of those companies are good at fostering a community that contributes to these open source projects. Others are unwilling and some don&#8217;t yet understand the process. Still others have many open source projects run by teams that do get it, alongside other projects run by teams that don&#8217;t. Actually, that last dual situation is the most common among the large companies. You know, the ones that often sponsor your community&#8217;s main conference and the ones that employ your heroes.</p>
<p>If you do a quick reality-check then you&#8217;ll conclude there are no black / white companies. Actually, nothing in life nor in ethics is black / white. Nothing at all.</p>
<p>What you do have is a small group of amazingly disturbing purists who do zero coding themselves (that is, near zero) but do think black / white, and consequently write a lot of absurd nonsense in blog comments, forums and mailing lists, and on Slashdot in particular. These people are reason numero uno why many companies quit trying to understand open source.</p>
<p>It&#8217;s sad that the actual (open source) developers have to waste time explaining to the companies they consult for that these people can be ignored. It&#8217;s also sad that these purists have turned so vocal, even violent, that they often can&#8217;t really be ignored anymore: people&#8217;s employers have been harassed.</p>
<p>&#8220;You have to fire somebody because he&#8217;s being unethical by disagreeing with my religious belief system that Microsoft is evil!&#8221;. Maybe it&#8217;s just me who&#8217;s behind on ethics in this world? Well, those people can still get lost because I, in ethics, disagree with them.</p>
<p>Now, let&#8217;s get back to the projects and away from the open source vs. open core debates. We have a lot of work to do. And a lot of companies to convince to open their projects.</p>
<p>Open source developers succeeded in (for example) getting some software on phones. The people who did aren&#8217;t the religious black / white people. Maybe the media around open source should track down the people who did, and write quite a bit more about their work, ideas and passion?</p>
<p>Finally, the best companies are driven by the ideas and passions of their best employees. Those are the people who you should admire. Not their company&#8217;s open core PR.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/19/why-make-things-complicated/feed</wfw:commentRss>
		</item>
		<item>
		<title>Neelie Kroes on open source</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/15/neelie-kroes-on-open-source</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/15/neelie-kroes-on-open-source#comments</comments>
		<pubDate>Thu, 15 Jul 2010 10:18:53 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Philosophy]]></category>

		<category><![CDATA[Politics]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[maemo]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=574</guid>
		<description><![CDATA[
Video link
]]></description>
			<content:encoded><![CDATA[<p><object width="640" height="385"><param name="movie" value="http://www.youtube.com/v/ok100U4Fo3Y&amp;hl=en_US&amp;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/ok100U4Fo3Y&amp;hl=en_US&amp;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object><br />
<a href="http://www.youtube.com/watch?v=ok100U4Fo3Y">Video link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/15/neelie-kroes-on-open-source/feed</wfw:commentRss>
		</item>
		<item>
		<title>Wrapping up 4.57 billion years</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/09/wrapping-up-457-billion-years</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/09/wrapping-up-457-billion-years#comments</comments>
		<pubDate>Fri, 09 Jul 2010 13:51:58 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Art &#38; culture]]></category>

		<category><![CDATA[Personal]]></category>

		<category><![CDATA[Philosophy]]></category>

		<category><![CDATA[Science]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=573</guid>
		<description><![CDATA[In 4.57 billion years our solar system went from creating simple bacteria to a large group of species, several of which are highly capable of making fairly intelligent decisions, and one of which has the indulgence of believing that it can think. That&#8217;s us.
The sun has an estimated 5 billion years to go before it [...]]]></description>
			<content:encoded><![CDATA[<p>In 4.57 billion years our solar system went from creating simple bacteria to a large group of species, several of which are highly capable of making fairly intelligent decisions, and one of which has the indulgence of believing that it can think. That&#8217;s us.</p>
<p>The sun has an estimated 5 billion years to go before it turns into a red giant that in its very early stages will wipe out truly every single idea that exists inside at least our own solar system.</p>
<p>Unless the radio waves that our planet has been emitting since we invented radio are seen and understood (which requires a recipient in the first place), that will be the ultimate end of all of our ideas and culture. Unless we figure out a way to let the ideas live on outside of our solar system. Just the ideas would already be an insane achievement.</p>
<p>But imagine going from bacteria to beings, colonized by bacteria, that think that they can think, in far less time than the current age of our sun. Unless, of course, bacteria somehow arrived in our solar system from outside (unlikely, but perhaps as unlikely as us ever exporting our ideas and culture to another solar system).</p>
<p>Imagine what could happen in the next 5 billion years &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/09/wrapping-up-457-billion-years/feed</wfw:commentRss>
		</item>
		<item>
		<title>Domain indexes finished, technical conclusions</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/07/domain-indexes-finished-technical-conclusions</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/07/domain-indexes-finished-technical-conclusions#comments</comments>
		<pubDate>Wed, 07 Jul 2010 10:30:03 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[maemo]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=572</guid>
		<description><![CDATA[Support for domain specific indexes is finished and awaiting review, although we can still optimize it further. More on that later in this post. Imagine that you have this ontology:
nie:InformationElement a rdfs:Class .

nie:title a rdf:Property ;
  nrl:maxCardinality 1 ;
  rdfs:domain nie:InformationElement ;
  rdfs:range xsd:string .

nmm:MusicPiece a rdfs:Class ;
  rdfs:subClassOf nie:InformationElement [...]]]></description>
			<content:encoded><![CDATA[<p>Support for domain specific indexes is finished and <a href="http://git.gnome.org/browse/tracker/log/?h=domain-specific-indexes-review">awaiting review</a>, although we can still optimize it further. More on that later in this post. Imagine that you have this ontology:</p>
<pre>nie:InformationElement a rdfs:Class .

nie:title a rdf:Property ;
  nrl:maxCardinality 1 ;
  rdfs:domain nie:InformationElement ;
  rdfs:range xsd:string .

nmm:MusicPiece a rdfs:Class ;
  rdfs:subClassOf nie:InformationElement .

nmm:beatsPerMinute a rdf:Property ;
  nrl:maxCardinality 1 ;
  rdfs:domain nmm:MusicPiece ;
  rdfs:range xsd:integer .</pre>
<p>With that ontology there are three tables called <em>&#8220;Resource&#8221;</em>, <em>&#8220;nmm:MusicPiece&#8221;</em> and <em>&#8220;nie:InformationElement&#8221;</em> in SQLite&#8217;s schema:</p>
<ul>
<li>The <em>&#8220;Resource&#8221;</em> table has <em>ID</em> and the <em>subject</em> string</li>
<li>The <em>&#8220;nie:InformationElement&#8221;</em> has <em>ID</em> and <em>&#8220;nie:title&#8221;</em></li>
<li>The <em>&#8220;nmm:MusicPiece&#8221;</em> one has <em>ID</em> and <em>&#8220;nmm:beatsPerMinute&#8221;</em></li>
</ul>
<p>That&#8217;s fairly simple, right? The problem is that when you ORDER BY <em>&#8220;nie:title&#8221;</em> you&#8217;ll cause a full table scan on <em>&#8220;nie:InformationElement&#8221;</em>. That&#8217;s not good, because there are fewer <em>&#8220;nmm:MusicPiece&#8221;</em> records than <em>&#8220;nie:InformationElement&#8221;</em> ones.</p>
<p>Imagine that we do this <a href="http://www.w3.org/TR/rdf-sparql-query">SPARQL</a> query:</p>
<pre>SELECT ?title WHERE {
   ?resource a nmm:MusicPiece ;
             nie:title ?title
} ORDER BY ?title</pre>
<p>We translate that, for you, to this SQL on our schema:</p>
<pre>SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nie:InformationElement2"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1",
         "nie:InformationElement" AS "nie:InformationElement2"
  WHERE  "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
  AND    "title_u" IS NOT NULL
) ORDER BY "title_u"</pre>
<p>OK, so with support for domain indexes we change the ontology like this:</p>
<pre>nmm:MusicPiece a rdfs:Class ;
  rdfs:subClassOf nie:InformationElement ;
  tracker:domainIndex nie:title .</pre>
<p>Now we&#8217;ll still have the three tables called <em>&#8220;Resource&#8221;</em>, <em>&#8220;nmm:MusicPiece&#8221;</em> and <em>&#8220;nie:InformationElement&#8221;</em> in SQLite&#8217;s schema. But they will look like this:</p>
<ul>
<li>The <em>&#8220;Resource&#8221;</em> table has <em>ID</em> and the <em>subject</em> string</li>
<li>The <em>&#8220;nie:InformationElement&#8221;</em> has ID and <em>&#8220;nie:title&#8221;</em></li>
<li>The <em>&#8220;nmm:MusicPiece&#8221;</em> table now has three columns called <em>ID</em>, <em>&#8220;nmm:beatsPerMinute&#8221;</em> and <em>&#8220;nie:title&#8221;</em></li>
</ul>
<p>The same data, for titles of music pieces, will be in both <em>&#8220;nie:InformationElement&#8221;</em> and <em>&#8220;nmm:MusicPiece&#8221;</em>. We copy to the mirror column while coping with an ontology change, and whenever new inserts happen.</p>
<p>When the SPARQL query makes the rdf:type known to be nmm:MusicPiece, like in the query mentioned earlier, we know that we can use the <em>&#8220;nie:title&#8221;</em> from the <em>&#8220;nmm:MusicPiece&#8221;</em> table in SQLite. That allows us to generate this SQL query for you:</p>
<pre>SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nmm:MusicPiece1"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1"
  WHERE  "title_u" IS NOT NULL
) ORDER BY "title_u"</pre>
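<p>To play with the idea outside of Tracker, here is a minimal sketch using Python&#8217;s standard sqlite3 module. It is not Tracker&#8217;s code: the index name, the helper names and the sample rows are made up for illustration, and the schema is only assumed to match the tables described above.</p>

```python
import sqlite3


def build_demo():
    """Create the two class tables, with nie:title mirrored into nmm:MusicPiece."""
    db = sqlite3.connect(":memory:")
    db.executescript('''
        CREATE TABLE "nie:InformationElement" (
            ID INTEGER PRIMARY KEY, "nie:title" TEXT);
        CREATE TABLE "nmm:MusicPiece" (
            ID INTEGER PRIMARY KEY, "nmm:beatsPerMinute" INTEGER, "nie:title" TEXT);
        -- The index lives on the mirror column (index name is hypothetical).
        CREATE INDEX "nmm:MusicPiece_nie:title" ON "nmm:MusicPiece" ("nie:title");
    ''')
    songs = [(1, "Yesterday", 96), (2, "Blackbird", 94), (3, "Help", 95)]
    for id_, title, bpm in songs:
        # Every insert writes the title to both the parent table and the mirror column.
        db.execute('INSERT INTO "nie:InformationElement" VALUES (?, ?)', (id_, title))
        db.execute('INSERT INTO "nmm:MusicPiece" VALUES (?, ?, ?)', (id_, bpm, title))
    # An information element that is not a music piece: parent table only.
    db.execute('INSERT INTO "nie:InformationElement" VALUES (?, ?)', (4, "A document"))
    db.commit()
    return db


def titles_of_music_pieces(db):
    """Sorted titles, read from the mirror column without joining the parent table."""
    sql = ('SELECT "nie:title" FROM "nmm:MusicPiece" '
           'WHERE "nie:title" IS NOT NULL ORDER BY "nie:title"')
    return [row[0] for row in db.execute(sql)]
```

<p>Prefixing that SELECT with EXPLAIN QUERY PLAN is one way to check whether SQLite actually picks the index for the ORDER BY on your data.</p>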
<p>A remaining optimization is when you also request an rdf:type that is a superclass of nmm:MusicPiece, like this:</p>
<pre>SELECT ?title WHERE {
  ?resource a nmm:MusicPiece, nie:InformationElement ;
            nie:title ?title
} ORDER BY ?title</pre>
<p>It&#8217;s not that bad, as the <em>&#8220;nie:title&#8221;</em> is still taken from the <em>&#8220;nmm:MusicPiece&#8221;</em> table. But the join with <em>&#8220;nie:InformationElement&#8221;</em> is still needlessly there (we could just do the earlier SQL query in this case):</p>
<pre>SELECT   "title_u" FROM (
  SELECT "nmm:MusicPiece1"."ID" AS "resource_u",
         "nmm:MusicPiece1"."nie:title" AS "title_u"
  FROM   "nmm:MusicPiece" AS "nmm:MusicPiece1",
         "nie:InformationElement" AS "nie:InformationElement2"
  WHERE  "nmm:MusicPiece1"."ID" = "nie:InformationElement2"."ID"
  AND    "title_u" IS NOT NULL
) ORDER BY "title_u"</pre>
<p>We will probably optimize this specific use-case further later this week.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/07/domain-indexes-finished-technical-conclusions/feed</wfw:commentRss>
		</item>
		<item>
		<title>SQLite&#8217;s WAL, deleting a domain specific index</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/03/sqlites-wal-deleting-a-domain-specific-index</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/03/sqlites-wal-deleting-a-domain-specific-index#comments</comments>
		<pubDate>Sat, 03 Jul 2010 15:29:18 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[maemo]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=571</guid>
		<description><![CDATA[SQLite&#8217;s WAL
SQLite is working on WAL, which stands for Write Ahead Logging.
The new logging technique means that we can probably keep read statements open for multiple processes. It&#8217;s not full MVCC yet as writes are still not doable simultaneously. But in our use-case reading with multiple processes is vastly more important anyway.
We&#8217;re investigating WAL mode [...]]]></description>
			<content:encoded><![CDATA[<p><strong>SQLite&#8217;s WAL</strong></p>
<p>SQLite is working on <a href="http://www.sqlite.org/draft/wal.html">WAL</a>, which stands for Write Ahead Logging.</p>
<p>The new logging technique means that we can probably keep read statements open for multiple processes. It&#8217;s not full MVCC yet as writes are still not doable simultaneously. But in our use-case reading with multiple processes is vastly more important anyway.</p>
<p><a href="http://git.gnome.org/browse/tracker/log/?h=wal">We&#8217;re investigating WAL mode of SQLite thoroughly these next few days</a>. <a href="http://blogs.gnome.org/juergbi">Jürg</a> is working most on this at the moment. If WAL is fit for our purpose then we&#8217;ll probably also start developing a direct-access library that&#8217;ll allow your process to connect directly with our SQLite database, avoiding any form of IPC.</p>
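<p>A small sketch of what WAL mode gives a reader, using Python&#8217;s standard sqlite3 module with two connections standing in for two processes. The table and function names are made up for illustration, and it assumes the database file sits on a file system where SQLite can use WAL.</p>

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    # fetchall() runs the statement to completion, which ends the read transaction.
    return conn.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]


def wal_demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path)
    # The pragma returns the journal mode that is now in effect.
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
    writer.execute("CREATE TABLE t (v INTEGER)")
    writer.execute("INSERT INTO t VALUES (1)")
    writer.commit()

    reader = sqlite3.connect(path)
    # Start a write and leave it uncommitted for a moment.
    writer.execute("INSERT INTO t VALUES (2)")
    # The reader is not blocked: it sees the last committed state.
    seen_during_write = count_rows(reader)
    writer.commit()
    seen_after_commit = count_rows(reader)

    reader.close()
    writer.close()
    return mode, seen_during_write, seen_after_commit
```

<p>Writes still go one at a time in this mode, so a second writer would have to wait for the first one to commit.</p>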
<p><a href="http://blogs.gnome.org/abustany">Adrien</a>&#8217;s <a href="http://blogs.gnome.org/abustany/2010/05/20/ipc-performance-the-return-of-the-report/">FD-passing</a> is in <a href="http://git.gnome.org/browse/tracker/log/">master</a>, though. And it&#8217;s performing quite well!</p>
<p>We&#8217;re thrilled that SQLite&#8217;s team is taking this direction with WAL. Very awesome guys!</p>
<p><strong>Domain specific indexes</strong></p>
<p>Yesterday I worked on support for deleting a <a href="http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes">domain specific</a> <a href="http://pvanhoof.be/blog/index.php/2010/06/30/domain-specific-indexes">index</a> from the ontology. Because SQLite&#8217;s ALTER TABLE doesn&#8217;t support dropping a column, I had to do it by renaming the original table, recreating the table without the mirror column, copying the data from the renamed table, and finally dropping the renamed table. It&#8217;s nasty, but it works. I think SQLite should just add DROP COLUMN to ALTER. Why is this so hard to add?</p>
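<p>The four steps can be sketched like this with Python&#8217;s standard sqlite3 module. This is an illustration and not the code in Tracker: the table layout, the temporary table name and the helper names are assumptions.</p>

```python
import sqlite3


def make_table_with_mirror():
    """A MusicPiece table that still carries the nie:title mirror column."""
    db = sqlite3.connect(":memory:")
    db.execute('CREATE TABLE "nmm:MusicPiece" ('
               'ID INTEGER PRIMARY KEY, "nmm:beatsPerMinute" INTEGER, "nie:title" TEXT)')
    db.execute('INSERT INTO "nmm:MusicPiece" VALUES (?, ?, ?)', (1, 96, "Yesterday"))
    db.execute('INSERT INTO "nmm:MusicPiece" VALUES (?, ?, ?)', (2, 94, "Blackbird"))
    db.commit()
    return db


def drop_mirror_column(db):
    """Rename, recreate without the column, copy the data, drop the renamed table."""
    db.executescript('''
        BEGIN;
        ALTER TABLE "nmm:MusicPiece" RENAME TO "nmm:MusicPiece_old";
        CREATE TABLE "nmm:MusicPiece" (
            ID INTEGER PRIMARY KEY, "nmm:beatsPerMinute" INTEGER);
        INSERT INTO "nmm:MusicPiece" (ID, "nmm:beatsPerMinute")
            SELECT ID, "nmm:beatsPerMinute" FROM "nmm:MusicPiece_old";
        DROP TABLE "nmm:MusicPiece_old";
        COMMIT;
    ''')


def column_names(db):
    # Column 1 of each table_info row is the column name.
    return [row[1] for row in db.execute('PRAGMA table_info("nmm:MusicPiece")')]
```

<p>Any index that was placed on the mirror column goes away together with the renamed table when it is dropped.</p>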
<p>I finally got it working, now it must of course be tested and then again tested.</p>
<p>Next for the feature is adapting the SPARQL engine to start using the indexed mirror column and produce better performing SQL queries.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/03/sqlites-wal-deleting-a-domain-specific-index/feed</wfw:commentRss>
		</item>
		<item>
		<title>Working on domain specific indexes</title>
		<link>http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes#comments</comments>
		<pubDate>Thu, 01 Jul 2010 15:01:33 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[maemo]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=570</guid>
		<description><![CDATA[So &#8230; what is involved in a &#8220;simple change&#8221; like what I wrote about yesterday?
First you add support for annotating the domain specific index in the ontology files. This is straight forward as we of course have a generic Turtle parser, and it&#8217;s just a matter of adding properties to certain classes, and filling the [...]]]></description>
			<content:encoded><![CDATA[<p>So &#8230; what is involved in a &#8220;simple change&#8221; like what <a href="http://pvanhoof.be/blog/index.php/2010/06/30/domain-specific-indexes">I wrote about yesterday</a>?</p>
<p>First you add support for annotating the domain specific index in the ontology files. This is straightforward as we of course have a generic Turtle parser, and it&#8217;s just a matter of adding properties to certain classes, and filling in the values from the ontology in the instances of our in-memory representation of the ontology. You of course also need to change the CREATE-TABLE statements. Trivial.</p>
<p>Then you implement detecting changes in the ontology. And, more complex, coping with the changes. This means doing ALTER on the SQL tables. You also need to copy from the InformationElement table to the MusicPiece table (I&#8217;m using MusicPiece to clarify, it&#8217;s of course generic) in case of such a domain specific index being added during an ontology change, and put an implicit index on the column. After all, that index is why we&#8217;re doing this.</p>
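<p>Coping with an added domain specific index boils down to three SQL steps: add the mirror column, copy the existing values into it, and index it. A minimal sketch with Python&#8217;s standard sqlite3 module, where the schema, the index name and the helper names are assumptions for illustration rather than what Tracker generates:</p>

```python
import sqlite3


def make_tables():
    """Parent and subclass tables before the domain index exists."""
    db = sqlite3.connect(":memory:")
    db.execute('CREATE TABLE "nie:InformationElement" ('
               'ID INTEGER PRIMARY KEY, "nie:title" TEXT)')
    db.execute('CREATE TABLE "nmm:MusicPiece" ('
               'ID INTEGER PRIMARY KEY, "nmm:beatsPerMinute" INTEGER)')
    for id_, title, bpm in [(1, "Yesterday", 96), (2, "Blackbird", 94)]:
        db.execute('INSERT INTO "nie:InformationElement" VALUES (?, ?)', (id_, title))
        db.execute('INSERT INTO "nmm:MusicPiece" VALUES (?, ?)', (id_, bpm))
    db.commit()
    return db


def add_domain_index(db):
    """ALTER the subclass table, copy titles from the parent, then index the mirror."""
    db.executescript('''
        BEGIN;
        ALTER TABLE "nmm:MusicPiece" ADD COLUMN "nie:title" TEXT;
        UPDATE "nmm:MusicPiece" SET "nie:title" = (
            SELECT ie."nie:title" FROM "nie:InformationElement" AS ie
            WHERE ie.ID = "nmm:MusicPiece".ID);
        CREATE INDEX "nmm:MusicPiece_nie:title" ON "nmm:MusicPiece" ("nie:title");
        COMMIT;
    ''')


def mirrored_titles(db):
    sql = 'SELECT ID, "nie:title" FROM "nmm:MusicPiece" ORDER BY ID'
    return db.execute(sql).fetchall()
```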
<p>I finished those two yesterday. I have not finished detecting a deletion of a domain specific index yet. That will have to ALTER the table with a DROP of the column. The most difficult part here is detecting the deletion itself. We don&#8217;t yet have any code to diff on multivalue properties in the ontology (the ontology is a collection of RDF statements like everything else, describing itself).</p>
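<p>Conceptually, that diff is a set difference between the domain indexes of the stored ontology and those of the new one. A minimal sketch in Python, assuming each ontology&#8217;s tracker:domainIndex statements have already been collected into a set of (class, property) pairs; that representation and the function name are hypothetical, not Tracker&#8217;s internal one:</p>

```python
def diff_domain_indexes(old, new):
    """Return (added, removed) domain indexes as sorted lists of (class, property)."""
    added = sorted(new - old)    # present now, absent before: add the mirror column
    removed = sorted(old - new)  # present before, absent now: drop the mirror column
    return added, removed


# Example: nie:title is no longer indexed for nmm:MusicPiece, nie:subject is new.
old_ontology = {("nmm:MusicPiece", "nie:title")}
new_ontology = {("nmm:MusicPiece", "nie:subject")}
added, removed = diff_domain_indexes(old_ontology, new_ontology)
```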
<p>Today I <a href="http://git.gnome.org/browse/tracker/log/?h=domain-specific-indexes">finished</a> writing the code that copies values to the MusicPiece table&#8217;s mirror column.</p>
<p>The next few days will be about adapting <a href="http://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-pattern.vala?h=domain-specific-indexes">the SPARQL engine</a> and of course coping with a deletion of a domain specific index. And then testing, and again testing. Mind that this has to work from a journal replay situation too, in which case no ontology is involved (it&#8217;s all stored in the history of the persistent journal).</p>
<p>Where&#8217;s my Redbull? Ah, waiting for me in the fridge. Good!</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes/feed</wfw:commentRss>
		</item>
		<item>
		<title>Domain specific indexes</title>
		<link>http://pvanhoof.be/blog/index.php/2010/06/30/domain-specific-indexes</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/06/30/domain-specific-indexes#comments</comments>
		<pubDate>Wed, 30 Jun 2010 13:51:38 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[maemo]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=568</guid>
		<description><![CDATA[We store our data in a decomposed way. For single value properties we create a table per class and have a column per property. Multi value properties go in a separate table. For now I&#8217;ll focus on those single value properties.
Imagine you have a MusicPiece. In Nepomuk that&#8217;s a subclass of InformationElement. InformationElement adds properties [...]]]></description>
			<content:encoded><![CDATA[<p>We store our data in a decomposed way. For single value properties we create a table per class and have a column per property. Multi value properties go in a separate table. For now I&#8217;ll focus on those single value properties.</p>
<p>Imagine you have a MusicPiece. In Nepomuk that&#8217;s a subclass of InformationElement. InformationElement adds properties like title and subject. MusicPiece has performer, which is a Contact, and duration, an integer. A Contact has a fullname.</p>
<p>Alright, that looks like this in our internal storage.</p>
<p><img src="http://pvanhoof.be/files/domain-index/normal.png" alt="" /></p>
<p>Querying that in <a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a> goes like this. I&#8217;ll add the <a href="http://www.semanticdesktop.org/ontologies/">Nepomuk</a> prefixes.</p>
<pre>SELECT ?musicpiece ?title ?subject ?performer {
   ?musicpiece a nmm:MusicPiece ;
               nmm:performer ?p ;
               nie:title ?title ;
               nie:subject ?subject .
   ?p nco:fullname ?performer .
} ORDER BY ?title</pre>
<p>A problem if you ORDER BY the title field is that <a href="http://www.tracker-project.org/">Tracker</a> needs to make a join and a full table scan with that InformationElement table.</p>
<p>So <a href="http://git.gnome.org/browse/tracker/log/?h=domain-specific-indexes">we&#8217;re working on</a> what we&#8217;ll call domain specific indexes. It means that for certain properties we&#8217;ll have a redundant mirror column, on which we&#8217;ll place the index. The native SQL query will be generated to use that mirror column instead. A good example is nie:title for nmm:MusicPiece.</p>
<p><img src="http://pvanhoof.be/files/domain-index/domain-index.png" alt="" /></p>
<p>P.S. A normal triple store instead has a huge table with just three columns: subject, predicate and object. That wouldn&#8217;t help you much with optimizing, of course.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/06/30/domain-specific-indexes/feed</wfw:commentRss>
		</item>
		<item>
		<title>Smile or Die</title>
		<link>http://pvanhoof.be/blog/index.php/2010/05/28/smile-or-die</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/05/28/smile-or-die#comments</comments>
		<pubDate>Fri, 28 May 2010 17:19:49 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Art &#38; culture]]></category>

		<category><![CDATA[Personal]]></category>

		<category><![CDATA[Philosophy]]></category>

		<category><![CDATA[Politics]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[extremely personal]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=566</guid>
		<description><![CDATA[As a follow-up to the RSA animation videos, here&#8217;s the original talk by Barbara Ehrenreich titled Smile or Die.

I think part of GNOME&#8217;s crisis is caused by the same atmosphere of &#8220;go with the program, don&#8217;t complain, or you&#8217;re out&#8221;. I wrote about this before:
It’s not popular to be critical about a (the leader of a) [...]]]></description>
			<content:encoded><![CDATA[<p>As a follow-up to <a href="http://pvanhoof.be/blog/index.php/2010/05/27/fwd-the-secret-powers-of-time">the RSA animation videos</a>, <a href="http://www.youtube.com/watch?v=PJGMFu74a70">here&#8217;s the original talk by Barbara Ehrenreich titled Smile or Die</a>.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/PJGMFu74a70&amp;hl=en_US&amp;fs=1&amp;rel=0" /><embed type="application/x-shockwave-flash" width="640" height="385" src="http://www.youtube.com/v/PJGMFu74a70&amp;hl=en_US&amp;fs=1&amp;rel=0" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>I think part of <a title="Go ahead, deny it. What crisis? Think happy!">GNOME&#8217;s crisis</a> is caused by the same atmosphere of <em>&#8220;go with the program, don&#8217;t complain, or you&#8217;re out&#8221;</em>. I <a href="http://pvanhoof.be/blog/index.php/2010/01/29/tough-talk">wrote about this before</a>:</p>
<blockquote><p>It’s not popular to be critical about a (the leader of a) popular idea. This is illustrated by the intellectually absurd criticisms David Schlesinger receives.</p>
<p>Yet is the critic who monitors the organs of a society key to that organ either producing for its stakeholders, or failing and dragging the entire society it serves down with it.</p></blockquote>
<p>Acknowledging the problem and changing course is what I seek in a candidate this year.</p>
<p><small>OK, two is enough. Back to technical articles.</small></p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/05/28/smile-or-die/feed</wfw:commentRss>
		</item>
		<item>
		<title>FWD: The Secret Powers of Time</title>
		<link>http://pvanhoof.be/blog/index.php/2010/05/27/fwd-the-secret-powers-of-time</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/05/27/fwd-the-secret-powers-of-time#comments</comments>
		<pubDate>Thu, 27 May 2010 09:07:15 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Art &#38; culture]]></category>

		<category><![CDATA[Personal]]></category>

		<category><![CDATA[Philosophy]]></category>

		<category><![CDATA[Politics]]></category>

		<category><![CDATA[Science]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=565</guid>
		<description><![CDATA[

Video link



Video link
]]></description>
			<content:encoded><![CDATA[<p><object width="640" height="385"><param name="movie" value="http://www.youtube.com/v/A3oIiH7BLmg&#038;hl=en_US&#038;fs=1&#038;rel=0"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/A3oIiH7BLmg&#038;hl=en_US&#038;fs=1&#038;rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object><br />
<br />
<a href="http://www.youtube.com/watch?v=A3oIiH7BLmg">Video link</a></p>
<p>
<object width="640" height="385"><param name="movie" value="http://www.youtube.com/v/u5um8QWWRvo&#038;hl=en_US&#038;fs=1&#038;rel=0"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/u5um8QWWRvo&#038;hl=en_US&#038;fs=1&#038;rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object><br />
<br />
<a href="http://www.youtube.com/watch?v=u5um8QWWRvo">Video link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/05/27/fwd-the-secret-powers-of-time/feed</wfw:commentRss>
		</item>
		<item>
		<title>IPC performance, the report</title>
		<link>http://pvanhoof.be/blog/index.php/2010/05/13/ipc-performance-the-report</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/05/13/ipc-performance-the-report#comments</comments>
		<pubDate>Thu, 13 May 2010 19:58:22 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Science]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[maemo]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=563</guid>
		<description><![CDATA[The Tracker team will be doing a codecamp this month. Among the subjects we will address is the IPC overhead of tracker-store, our RDF query service.
We plan to investigate whether a direct connection with our SQLite database is possible for clients. Jürg did some work on this. Turns out that due to SQLite not being [...]]]></description>
			<content:encoded><![CDATA[<p>The Tracker team will be doing a codecamp this month. Among the subjects we will address is the IPC overhead of tracker-store, our RDF query service.</p>
<p>We plan to investigate whether a direct connection with our SQLite database is possible for clients. <a href="http://blogs.gnome.org/juergbi/">Jürg</a> <a href="http://git.gnome.org/browse/tracker/log/?h=sqlite-batch-locking">did some work on this</a>. It turns out that, because SQLite is not <a href="http://en.wikipedia.org/wiki/Multiversion_concurrency_control">MVCC</a>, we need to override some of SQLite&#8217;s VFS functions and perhaps even implement a custom page cache ourselves.</p>
<p>Another track that we are investigating involves using a custom UNIX domain socket and sending the data over in such a way that at either side the marshalling is cheap.</p>
<p>For that idea I asked Adrien Bustany, a computer science student who&#8217;s doing an internship at <a href="http://codeminded.be">Codeminded</a>, to develop <a href="http://git.mymadcat.com/index.php/p/ipc-performance/source/tree/master/">three tests</a>: a test that uses D-Bus the way tracker-store does (by using the DBusMessage API directly), a test that uses a custom protocol and technique, as close to ideal as possible, to get the data over a UNIX domain socket, and a simple program that does the exact same query but connects to SQLite by itself.</p>
<p><a href="http://git.mymadcat.com/index.php/p/ipc-performance/source/tree/master/report/report.odt">Here&#8217;s the report</a>:</p>
<p><strong>Exposing a SQLite database remotely: comparison of various IPC methods</strong></p>
<p><small>By Adrien Bustany<br />
Computer Sciences student<br />
National Superior School of Informatics and Applied Mathematics of Grenoble (ENSIMAG)</small></p>
<p>This study aims at comparing the overhead of an IPC layer when accessing a SQLite database. The two IPC methods included in this comparison are DBus, a generic message passing system, and a custom IPC method using UNIX sockets. As a reference, we also include in the results the performance of a client directly accessing the SQLite database, without involving any IPC layer.</p>
<p><strong>Comparison methodology</strong></p>
<p>In this section, we detail what the client and server are supposed to do during the test, regardless of the IPC method used.</p>
<p>The server has to:</p>
<ol>
<li>Open the SQLite database and listen to the client requests</li>
<li>Prepare a query at the client&#8217;s request</li>
<li>Send the resulting rows at the client&#8217;s request</li>
</ol>
<p>Queries are only &#8220;SELECT&#8221; queries; no modification is performed on the database. This restriction is not enforced on the server side, though.</p>
<p>The client has to:</p>
<ol>
<li>Connect to the server</li>
<li>Prepare a &#8220;SELECT&#8221; query</li>
<li>Fetch all the results</li>
<li>Copy the results into memory (not just fetch and forget them), so that memory pages are really used</li>
</ol>
<p><strong>Test dataset</strong></p>
<p>For testing, we use a SQLite database containing only one table. This table has 31 columns: the first one is the identifier and the 30 others are columns of type TEXT. The table is filled with 300 000 rows of randomly generated strings of 20 ASCII lowercase characters.</p>
<p><strong>Implementation details</strong></p>
<p>In this section, we explain how the server and client for both IPC methods were implemented.</p>
<p><em><strong>Custom IPC (UNIX socket based)</strong></em></p>
<p>In this case, we use a standard UNIX socket to communicate between the client and the server. The socket protocol is a binary protocol, and is detailed below. It has been designed to minimize CPU usage (there is no marshalling/demarshalling on strings, nor intensive computation to decode the message). It is fast over a local socket, but not suitable for other types of sockets, like TCP sockets.</p>
<p><em>Message types</em></p>
<p>There are two types of operations, corresponding to the two operations of the test: prepare a query, and fetch results.</p>
<p><em>Message format</em></p>
<p>All numbers are encoded in little endian form.</p>
<p><em>Prepare</em></p>
<p>Client sends:</p>
<table border="1" cellspacing="0" cellpadding="4" width="496" bordercolor="#000000"><col width="96"></col>
<col width="400"></col>
<tbody>
<tr valign="TOP">
<td width="96"><strong>Size</strong></td>
<td width="400"><strong>Contents</strong></td>
</tr>
<tr valign="TOP">
<td width="96">4 bytes</td>
<td width="400">Prepare opcode (0x50)</td>
</tr>
<tr valign="TOP">
<td width="96">4 bytes</td>
<td width="400">Size of the query (without trailing \0)</td>
</tr>
<tr valign="TOP">
<td width="96">&#8230;</td>
<td width="400">Query, in ASCII</td>
</tr>
</tbody>
</table>
<p>Server answers:</p>
<table border="1" cellspacing="0" cellpadding="4" width="496" bordercolor="#000000"><col width="96"></col>
<col width="400"></col>
<tbody>
<tr valign="TOP">
<td width="96"><strong>Size</strong></td>
<td width="400"><strong>Contents</strong></td>
</tr>
<tr valign="TOP">
<td width="96">4 bytes</td>
<td width="400">Return code of the sqlite3_prepare_v2 call</td>
</tr>
</tbody>
</table>
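<p>The Prepare exchange can be sketched in a few lines of Python. This is only an illustration of the byte layout in the two tables above, not the code used for the tests; the function names <code>encode_prepare</code> and <code>decode_prepare_reply</code> are made up for this example.</p>

```python
PREPARE_OPCODE = 0x50  # from the table above


def encode_prepare(query):
    """Build a Prepare request: opcode, query size, then the ASCII query."""
    payload = query.encode("ascii")
    return (
        PREPARE_OPCODE.to_bytes(4, "little")
        + len(payload).to_bytes(4, "little")  # size excludes any trailing \0
        + payload
    )


def decode_prepare_reply(reply):
    """Return the sqlite3_prepare_v2 return code carried in the 4-byte reply."""
    if len(reply) != 4:
        raise ValueError("a Prepare reply is exactly 4 bytes")
    return int.from_bytes(reply, "little")


request = encode_prepare("SELECT 1")
```

<p>Both integers are written in little endian form, as stated under the message format. A reply of four zero bytes decodes to 0, which is SQLITE_OK.</p>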
<p><em>Fetch</em></p>
<p>Client sends:</p>
<table border="1" cellspacing="0" cellpadding="4" width="496" bordercolor="#000000"><col width="96"></col>
<col width="400"></col>
<tbody>
<tr valign="TOP">
<td width="96"><strong>Size</strong></td>
<td width="400"><strong>Contents</strong></td>
</tr>
<tr valign="TOP">
<td width="96">4 bytes</td>
<td width="400">Fetch opcode (0x46)</td>
</tr>
</tbody>
</table>
<p>The server sends rows grouped in fixed-size buffers. Each buffer contains a variable number of rows. Each row is complete. If some padding is needed (when a row doesn&#8217;t fit in a buffer, but there is still space left in the buffer), the server adds an &#8220;End of Page&#8221; marker. The &#8220;End of Page&#8221; marker is the byte 0xFF. Rows that are larger than the buffer size are not supported.</p>
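<p>A minimal Python sketch of this paging scheme follows. It treats each row as an already encoded byte string. Two details are assumptions of the sketch and are not stated in the report: the bytes after the 0xFF marker are zero filled, and the last buffer is padded like any other. The names <code>pack_rows</code> and <code>_finish</code> are hypothetical.</p>

```python
END_OF_PAGE = 0xFF  # marker byte from the text above


def pack_rows(rows, buffer_size):
    """Group already-encoded rows into buffers of exactly buffer_size bytes."""
    buffers = []
    current = bytearray()
    for row in rows:
        if len(row) > buffer_size:
            raise ValueError("rows larger than the buffer are not supported")
        if len(current) + len(row) > buffer_size:
            # The row does not fit any more: close this buffer, start a new one.
            buffers.append(_finish(current, buffer_size))
            current = bytearray()
        current += row
    if current:
        buffers.append(_finish(current, buffer_size))
    return buffers


def _finish(current, buffer_size):
    """Pad a partially filled buffer: one End of Page byte, then zero fill."""
    free = buffer_size - len(current)
    if free:
        # Assumption: the bytes after the marker are zero; the report does not say.
        current += bytes([END_OF_PAGE]) + bytes(free - 1)
    return bytes(current)


pages = pack_rows([b"a" * 6, b"b" * 6, b"c" * 4], buffer_size=10)
```

<p>A reader can tell the marker apart from a row because a row starts with a little endian SQLite return code, whose first byte is 0x64 for SQLITE_ROW and never 0xFF.</p>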
<p>Each row in a buffer has the following format:</p>
<table border="1" cellspacing="0" cellpadding="4" width="496" bordercolor="#000000"><col width="96"></col>
<col width="400"></col>
<tbody>
<tr valign="TOP">
<td width="96"><strong>Size</strong></td>
<td width="400"><strong>Contents</strong></td>
</tr>
<tr valign="TOP">
<td width="96">4 bytes</td>
<td width="400">SQLite return code. This is generally SQLITE_ROW (there is a row to read), or SQLITE_DONE (there are no more rows to read). When the return code is not SQLITE_ROW, the rest of the message must be ignored.</td>
</tr>
<tr valign="TOP">
<td width="96">4 bytes</td>
<td width="400">Number of columns in the row</td>
</tr>
<tr valign="TOP">
<td width="96">4 bytes</td>
<td width="400">Index of trailing \0 for first column (index 0 is the first byte after the &#8220;number of columns&#8221; integer, that is, 8 bytes after the message begins)</td>
</tr>
<tr valign="TOP">
<td width="96">4 bytes</td>
<td width="400">Index of trailing \0 for second column</td>
</tr>
<tr valign="TOP">
<td width="96">&#8230;</td>
<td width="400"></td>
</tr>
<tr valign="TOP">
<td width="96">4 bytes</td>
<td width="400">Index of trailing \0 for last column</td>
</tr>
<tr valign="TOP">
<td width="96">&#8230;</td>
<td width="400">Row data. All columns are concatenated together, and separated by \0</td>
</tr>
</tbody>
</table>
<p>For the sake of clarity, here is an example row:</p>
<pre>100 4 1 7 13 19 1\0aaaaa\0bbbbb\0ccccc\0</pre>
<p>The first 100 is the return code, in this case SQLITE_ROW. This row has 4 columns. The 4 following numbers are the offsets of the \0 terminating each column in the row data. Finally comes the row data.</p>
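<p>The example row can be reproduced with a short Python sketch. It assumes, as the example numbers suggest, that the offsets count from the first byte of the row data; the function names <code>encode_row</code> and <code>decode_row</code> are invented for this illustration and are not taken from the test programs.</p>

```python
SQLITE_ROW = 100  # SQLite result code: a row is available


def encode_row(columns):
    """Encode one row: code, column count, offset table, NUL-terminated data."""
    data = bytearray()
    offsets = []
    for text in columns:
        data += text.encode("ascii")
        offsets.append(len(data))  # position of the \0 that ends this column
        data.append(0)
    header = [SQLITE_ROW, len(columns)] + offsets
    return b"".join(n.to_bytes(4, "little") for n in header) + bytes(data)


def decode_row(message):
    """Decode one row; returns (return_code, columns)."""
    code = int.from_bytes(message[0:4], "little")
    if code != SQLITE_ROW:
        return code, []  # the rest of the message must be ignored
    count = int.from_bytes(message[4:8], "little")
    data_start = 8 + 4 * count  # row data follows the offset table
    columns = []
    begin = 0
    for i in range(count):
        end = int.from_bytes(message[8 + 4 * i:12 + 4 * i], "little")
        columns.append(message[data_start + begin:data_start + end].decode("ascii"))
        begin = end + 1  # skip the terminating \0
    return code, columns


encoded = encode_row(["1", "aaaaa", "bbbbb", "ccccc"])
```

<p>For the four columns above, the six header integers come out as 100, 4, 1, 7, 13 and 19, matching the example. Because the offsets are known up front, the decoder never has to scan the data for \0 bytes.</p>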
<p><em>Memory usage</em></p>
<p>We try to minimize the calls to malloc and memcpy in the client and server. As we know the size of a buffer, we allocate the memory only once, and then use memcpy to write the results to it.</p>
<p><em><strong>DBus</strong></em></p>
<p>The DBus server exposes two methods, Prepare and Fetch.</p>
<p><em>Prepare</em></p>
<p>The Prepare method accepts a query string as a parameter, and returns nothing. If the query preparation fails, an error message is returned.</p>
<p><em>Fetch</em></p>
<p>Ideally, we should be able to send all the rows in one batch. DBus, however, puts a limitation on the message size. In our case, the complete data to pass over the IPC is around 220MB, which is more than the maximum size allowed by DBus (moreover, DBus marshals data, which increases the message size a little). We are therefore obliged to split the result set.</p>
<p>The Fetch method accepts an integer parameter, which is the number of rows to fetch, and returns an array of rows, where each row is itself an array of columns. Note that the server can return fewer rows than requested. When there are no more rows to return, an empty array is returned.</p>
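<p>The client side of this contract is a simple loop. The Python sketch below shows the control flow only and makes no D-Bus calls: <code>fetch</code> stands for whatever callable wraps the remote Fetch method, and <code>make_fake_fetch</code> is a hypothetical in-memory stand-in used to exercise the loop.</p>

```python
def fetch_all(fetch, batch_size):
    """Call fetch(batch_size) until it returns an empty array; collect rows."""
    rows = []
    while True:
        batch = fetch(batch_size)
        if not batch:  # an empty array means there are no more rows
            return rows
        rows.extend(batch)  # a batch may hold fewer rows than requested


def make_fake_fetch(table, server_limit):
    """Stand-in for the remote Fetch method (hypothetical, for illustration)."""
    position = 0

    def fetch(count):
        nonlocal position
        batch = table[position:position + min(count, server_limit)]
        position += len(batch)
        return batch

    return fetch


table = [["id%d" % i, "text"] for i in range(10)]
result = fetch_all(make_fake_fetch(table, server_limit=3), batch_size=4)
```

<p>The loop stops on the empty array only, never on a short batch, since the server is allowed to return fewer rows than requested.</p>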
<p><strong>Results</strong></p>
<p>All tests are run against the dataset described above, on a warm disk cache (the database is accessed several times before every run, to be sure the entire database is in the disk cache). We use SQLite 3.6.22, on a 64 bit Linux system (kernel 2.6.33.3). All tests are run 5 times, and we use the average of the 5 intermediate results as the final number.</p>
<p>For the custom IPC, we test with various buffer sizes varying from 1 to 256 kilobytes. For DBus, we fetch 75000 rows with every Fetch call, which is close to the maximum we can fetch with each call (see the paragraph on DBus message size limitation).</p>
<p>The first tests were to determine the optimal buffer size for the UNIX socket based IPC. The following graph describes the time needed to fetch all rows, depending on the buffer size:</p>
<p><a href="http://git.mymadcat.com/index.php/p/ipc-performance/source/tree/master/report/report_numbers.ods"><img src="http://pvanhoof.be/ipc_report/ipc_report_html_2de71214.gif" border="0" alt="" width="505" height="265" align="LEFT" /></a></p>
<p>The graph shows that the IPC is the fastest using 64 kB buffers. Those results depend on the type of system used, and might have to be tuned for different platforms. On Linux, a memory page is (generally) 4096 bytes; as a consequence, buffers smaller than 4 kB will use a full memory page when sent over the socket and waste memory bandwidth. After determining the best buffer size for socket IPC, we run tests for speed and memory usage, using a buffer size of 64 kB for the UNIX socket based method.</p>
<p><em><strong>Speed</strong></em></p>
<p>We measure the time it takes for various methods to fetch a result set. Unsurprisingly, the time needed to fetch the results grows linearly with the number of rows to fetch.</p>
<p align="CENTER"><a href="http://git.mymadcat.com/index.php/p/ipc-performance/source/tree/master/report/report_numbers.ods"><img src="http://pvanhoof.be/ipc_report/ipc_report_html_7e12e9f7.gif" border="0" alt="" width="422" height="265" align="BOTTOM" /></a></p>
<p align="CENTER"><a href="http://git.mymadcat.com/index.php/p/ipc-performance/source/tree/master/report/report_numbers.ods"><img src="http://pvanhoof.be/ipc_report/ipc_report_html_m58a9c41c.gif" border="0" alt="" width="493" height="265" align="MIDDLE" /></a></p>
<table style="height: 113px;" border="1" cellspacing="0" cellpadding="4" align="center" width="406" bordercolor="#000000"><col width="155"></col>
<col width="112"></col>
<tbody>
<tr valign="TOP">
<td width="155"><strong>IPC method</strong></td>
<td width="112"><strong>Best time</strong></td>
</tr>
<tr valign="TOP">
<td width="155">None (direct access)</td>
<td width="112">2910 ms</td>
</tr>
<tr valign="TOP">
<td width="155">UNIX socket</td>
<td width="112">3470 ms</td>
</tr>
<tr valign="TOP">
<td width="155">DBus</td>
<td width="112">12300 ms</td>
</tr>
</tbody>
</table>
<p><em><strong>Memory usage</strong></em></p>
<p align="CENTER"><a href="http://git.mymadcat.com/index.php/p/ipc-performance/source/tree/master/report/report_numbers.ods"><img src="http://pvanhoof.be/ipc_report/ipc_report_html_6b9de253.gif" border="0" alt="" width="432" height="265" align="BOTTOM" /></a></p>
<p align="CENTER"><a href="http://git.mymadcat.com/index.php/p/ipc-performance/source/tree/master/report/report_numbers.ods"><img src="http://pvanhoof.be/ipc_report/ipc_report_html_m6fead15f.gif" border="0" alt="" width="452" height="265" align="BOTTOM" /></a></p>
<p>Memory usage varies greatly (actually, so much that we had to use a log scale) between IPC methods. DBus memory usage is explained by the fact that we fetch 75 000 rows at a time, and that it has to allocate the whole message before sending it, while the socket IPC uses 64 kB buffers.</p>
<p><strong>Conclusions</strong></p>
<p>The results clearly show that in such a specialized case, designing a custom IPC system can greatly reduce the IPC overhead. The overhead of a UNIX socket based IPC is around 19%, while the overhead of DBus is 322%. However, it is important to take into account the fact that DBus is a much more flexible system, offering far more features than our socket protocol. Comparing DBus and our custom UNIX socket based IPC is like comparing an axe with a Swiss Army knife: it&#8217;s much harder to cut the tree with the Swiss Army knife, but it also includes a tin can opener, a ball pen and a compass (nowadays some of them even include USB keys).</p>
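<p>The two overhead figures follow from the best times in the results table, taking direct access as the baseline. A small Python sketch of that arithmetic (the dictionary keys and the function name are only labels chosen for this example):</p>

```python
# Best times in milliseconds, copied from the results table.
best_time_ms = {"direct": 2910, "unix_socket": 3470, "dbus": 12300}


def overhead_percent(method):
    """Extra time relative to direct SQLite access, as a percentage."""
    baseline = best_time_ms["direct"]
    return 100.0 * (best_time_ms[method] - baseline) / baseline


socket_overhead = overhead_percent("unix_socket")  # about 19.2
dbus_overhead = overhead_percent("dbus")           # about 322.7
```

<p>In other words, the socket adds roughly 0.56 s to a 2.91 s query, while DBus adds roughly 9.39 s.</p>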
<p>The real conclusion of this study is: if you have to pass a lot of data between two programs and don&#8217;t need a lot of flexibility, then DBus is not the right answer, and was never intended to be.</p>
<p>The source code used to obtain these results, as well as the numbers and graphs used in this document, can be checked out from the following git repository: <a href="http://git.mymadcat.com/index.php/p/ipc-performance/source/tree/master/">git://git.mymadcat.com/ipc-performance</a>. Please check the various README files to see how to reproduce them and/or how to tune the parameters.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/05/13/ipc-performance-the-report/feed</wfw:commentRss>
		</item>
		<item>
		<title>Friday&#8217;s performance improvements in Tracker</title>
		<link>http://pvanhoof.be/blog/index.php/2010/05/01/fridays-performance-improvements-in-tracker</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/05/01/fridays-performance-improvements-in-tracker#comments</comments>
		<pubDate>Sat, 01 May 2010 13:32:07 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[maemo]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=559</guid>
		<description><![CDATA[The crawler&#8217;s modification time queries
Yesterday we optimized the crawler&#8217;s query that gets the modification time of files. We use this timestamp to know whether or not a file must be reindexed.
Originally, we used a custom SQLite function called tracker:uri-is-parent() in SPARQL. This, however, caused a full table scan. As long as your SQL table for [...]]]></description>
			<content:encoded><![CDATA[<p><b>The crawler&#8217;s modification time queries</b></p>
<p>Yesterday we optimized the crawler&#8217;s query that gets the modification time of files. We use this timestamp to know whether or not a file must be reindexed.</p>
<p>Originally, we used a custom SQLite function called tracker:uri-is-parent() in SPARQL. This, however, caused a full table scan. As long as your SQL table for <a href="http://www.semanticdesktop.org/ontologies/nfo/#FileDataObject">nfo:FileDataObject</a>s wasn&#8217;t too large, that wasn&#8217;t a huge problem. But it didn&#8217;t scale linearly. I started with optimizing the function itself. It was using a strlen() so I replaced that with a <a href="http://www.sqlite.org/c3ref/value_blob.html">sqlite3_value_bytes()</a>. We only store UTF-8, so that worked fine. It gained me ~ 10%; not enough.</p>
<p>So <a href="http://git.gnome.org/browse/tracker/commit/?id=4890d1c2f0a561a9d4aa746008c64ecde386ce42">this commit</a> was a better improvement. First it makes <a href="http://www.semanticdesktop.org/ontologies/nfo/#belongsToContainer">nfo:belongsToContainer</a> an indexed property. The <i>x nfo:belongsToContainer p</i> means <i>x is in a directory p</i> for file resources. The commit changes the query to use the property that is now indexed.</p>
<p>The original query before we started with this optimization took 1.090s when you had ~ 300,000 nfo:FileDataObject resources. The new query takes about 0.090s. It&#8217;s of course an unfair comparison because now we use an indexed property. Adding the index only took a total of 10s for a ~ 300,000 large table and the table is being queried while we index (while we insert into it). Do the math, it&#8217;s a huge win in all situations. For the SQLite freaks; the SQLite database grew by 4 MB, with all items in the table indexed.</p>
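<p>You can see the difference between the two approaches with plain SQLite. The Python sketch below is not Tracker&#8217;s schema or query: the <code>files</code> table, its columns and the <code>uri_is_parent</code> stand-in are all hypothetical, and it only compares the query plans SQLite picks.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema; Tracker's real tables look different.
conn.execute("CREATE TABLE files (uri TEXT, container TEXT, mtime INTEGER)")


def uri_is_parent(parent, uri):
    """Illustrative stand-in for a custom SQL function: is uri directly in parent?"""
    return int(uri.startswith(parent + "/") and "/" not in uri[len(parent) + 1:])


conn.create_function("uri_is_parent", 2, uri_is_parent)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("file:///home",)).fetchall()
    return " ".join(row[-1] for row in rows)


# Filtering through a function: SQLite has to call it for every row.
scan_plan = plan("SELECT uri, mtime FROM files WHERE uri_is_parent(?, uri)")

# Filtering on an indexed column: SQLite can look the rows up directly.
conn.execute("CREATE INDEX files_container ON files (container)")
index_plan = plan("SELECT uri, mtime FROM files WHERE container = ?")
```

<p>The first plan reports a scan of the whole table, the second one a search that uses the index. The exact wording of the plan text depends on the SQLite version.</p>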
<p><b>PDF extractor</b></p>
<p>Another <a href="http://git.gnome.org/browse/tracker/log/?h=pdfmem-for-master">optimization</a> I did earlier was the PDF extractor. Originally, we used the <a href="http://cgit.freedesktop.org/poppler/poppler/tree/glib">poppler-glib</a> library. This library doesn&#8217;t allow us to set the OutputDev at runtime. If compiled with Cairo, the OutputDev is in some versions a CairoOutputDev. We don&#8217;t want all images in the PDF to be rendered to a Cairo surface. So I ported this back to C++ and made it always use a TextOutputDev instead. In poppler-glib master this appears to have improved (in git master poppler_page_get_text_page is always using a TextOutputDev).</p>
<p>Another major problem with poppler-glib is the huge amount of string copying on the heap. The time to extract metadata and content text for a 70 page PDF document without any images went from 1.050s to 0.550s. A lot of it was caused by copying strings and GValue boxing due to GObject properties.</p>
<p><b>Table locked problem</b></p>
<p>Last week I <a href="http://pvanhoof.be/blog/index.php/2010/04/25/performance-dbus-handling-of-the-query-results-in-trackers-rdf-service">improved D-Bus marshaling</a> by using a database cursor. I forgot to handle SQLITE_LOCKED while <a href="http://blogs.gnome.org/juergbi">Jürg</a> and <a href="http://blogs.gnome.org/carlosg">Carlos</a> had been introducing multithreaded SELECT support. Not good. I fixed this; it was causing random <i>Table locked</i> errors.</p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/05/01/fridays-performance-improvements-in-tracker/feed</wfw:commentRss>
		</item>
		<item>
		<title>RDF propaganda, time for change</title>
		<link>http://pvanhoof.be/blog/index.php/2010/04/27/rdf-propaganda-time-for-change</link>
		<comments>http://pvanhoof.be/blog/index.php/2010/04/27/rdf-propaganda-time-for-change#comments</comments>
		<pubDate>Tue, 27 Apr 2010 21:06:30 +0000</pubDate>
		<dc:creator>pvanhoof</dc:creator>
		
		<category><![CDATA[Informatics and programming]]></category>

		<category><![CDATA[Tracker]]></category>

		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[condescending]]></category>

		<category><![CDATA[controversial]]></category>

		<category><![CDATA[english]]></category>

		<category><![CDATA[extremely condescending]]></category>

		<category><![CDATA[extremely controversial]]></category>

		<category><![CDATA[maemo]]></category>

		<category><![CDATA[very condescending]]></category>

		<guid isPermaLink="false">http://pvanhoof.be/blog/?p=558</guid>
		<description><![CDATA[I&#8217;m not supposed to but I&#8217;m proud. It&#8217;s not only me who&#8217;s doing it.
Adrien is one of the new guys on the block. He&#8217;s working on integration with Tracker&#8217;s RDF service and various web services like Flickr, Facebook, Twitter, picasaweb and RSS. This is the kind of guy several companies should be afraid of. His [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m not supposed to but I&#8217;m proud. It&#8217;s not only me who&#8217;s doing it.</p>
<p><a href="http://blogs.gnome.org/abustany">Adrien</a> is one of the new guys on the block. He&#8217;s working on integration with Tracker&#8217;s RDF service and various web services like <a href="http://git.gnome.org/browse/tracker/log/?h=miner-flickr-review">Flickr</a>, <a title="Due to licensing of Facebook's data, we don't publish Facebook's integration yet">Facebook</a>, <a href="http://git.gnome.org/browse/tracker/log/?h=miner-twitter">Twitter</a>, <a href="http://git.gnome.org/browse/tracker/log/?h=miner-gdata">picasaweb</a> and <a href="http://git.gnome.org/browse/tracker/log/?h=miner-rss">RSS</a>. This is the kind of guy several companies should be afraid of. His work is competing with what they are trying to do: integrating the social web with mobile.</p>
<p>Oh come on Steve, stop pretending that you aren&#8217;t. And you better come up with something good, because we are.</p>
<p>Not only that, Adrien is implementing so-called writeback. It means that when you change a local resource&#8217;s properties, this integration will update Flickr, Facebook, picasaweb and Twitter.</p>
<p>You change a piece of info about a photo on your phone, and it&#8217;ll be replicated to Flickr. It&#8217;ll also be synchronized onto your phone as soon as somebody else made a change.</p>
<p>This is the future of computing and information technology. Integration with social networking and the phone is what people want. Dear Mark, it&#8217;s unstoppable. You better keep your eyes open, because we are going fast. Faster than your business.</p>
<p>I&#8217;m not somebody trying to guess how technology will look in a few years. I try to be in the middle of the technical challenge of actually doing it. Talking about it is telling history before your lip&#8217;s muscles moved.</p>
<p>At the Tracker project we are building a SPARQL endpoint that uses D-Bus as IPC. This is ideal on Nokia&#8217;s Meego. It&#8217;ll be a centerpiece for information gathering. On Meego you won&#8217;t ask the filesystem; instead you&#8217;ll ask Tracker using SPARQL and RDF.</p>
<p>To be challenged is likely the most beautiful state of mind.</p>
<p>I invite everybody to watch <a href="http://vimeo.com/11270890">this demo by Adrien</a>. It&#8217;s just the beginning. It&#8217;s going to get better.</p>
<p><object width="400" height="245"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=11270890&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=11270890&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="245"></embed></object>
<p><a href="http://vimeo.com/11270890">Tracker writeback &#038; web service integration demo / MeegoTouch UI</a> from <a href="http://vimeo.com/abustany">Adrien Bustany</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
<p><small>I tagged this as &#8216;extremely controversial&#8217;. That&#8217;s fine, Adrien told me that &#8220;people are used to me anyway&#8221;.</small></p>
]]></content:encoded>
			<wfw:commentRss>http://pvanhoof.be/blog/index.php/2010/04/27/rdf-propaganda-time-for-change/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
