controversial – Page 9 – How easy it is to make people believe a lie, and how hard it is to undo that work again

Foreseen

Now the rest of the country

Power generator

RE: Scud missiles

When Isabel Albers writes something, I pay attention: we must invest in infrastructure.

This creates prosperity, distributes money efficiently and invests in our children’s future: something that is needed, and something we stand for.

We can limit the austerity efforts when it comes to infrastructure investment; all the more reason to push them through on other government spending.

Maybe we should launch a few Scud missiles? A Scud missile aimed at the size of government wouldn’t do any harm.

Let’s live through a week-long media storm about how hard we are cutting in certain government sectors: let’s cut hard and thoroughly.

At the same time, let’s invest in Belgian infrastructure. Let’s invest a lot.

The culture that reigns, lives, toils and sweats on Russian territory will not disappear. Not even after a nuclear attack, unless that attack is truly horrifically cruel. And then I would damn us for having done it, as I ought to. We Europeans must live with that, just as they must live with us.

Let’s make things better

Matthew gets that developers need good equipment.

Glade, Scaffolding (DevStudio), Scintilla & GtkSourceView, Devhelp, gnome-build and Anjuta also got it earlier.

I think that with GNOME focusing on this, and a bit less on women’s outreach programs, we could make a difference this year.

Luckily our code is good enough to be reused for what is relevant today.

It’s all about what we focus on.

Can we please go back to making software now?

ps. I’ve been diving in Croatia, in Trogir. It was fantastic. I’ve built up some new mental reserves.

ps. Although we’re very different, I have a lot of respect for your point of view, Matthew.

Abstractionitis

Realizing I’ll be wrong again in the future, I think I’m finally cured of abstractionitis. Thanks to Miguel for the diagnosis, and especially to Rob, who continuously pointed out to me that you can’t escape the responsibility of actually having to implement. Thanks, guys.

Dnsmasq’s code quality

Had to adapt dnsmasq’s code today. My god, that tiny application has shitty code quality. I now feel bad knowing that so many systems depend on this stuff.

Mr. Dillon: smartphone innovation in Europe ought to be about people’s privacy

Dear Mark,

You and your team are working on the Jolla Phone. I’m sure you guys are doing a great job, and although I think you’ve been generating hype and vaporware until we can actually buy the damn thing, I trust you to lead them.

As their leader you should (I would like you to) allow them to provide us with all of the device’s source code and the build environments of their projects, so that we can build the exact same binaries. By exactly the same I mean that it should be possible to compare MD5 checksums. I’m sure you know what that means, and you and I both know that your team knows how to provide geeks like me with this. I worked with some of them during Nokia’s Harmattan and Fremantle, and we both know that you can easily identify who can make this happen.
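What “the exact same binaries” means in practice can be sketched in a few lines of Python: rebuild, hash both files, compare the digests. The file names in the demo are hypothetical, and MD5 is used only because the letter names it; for tamper resistance a stronger hash such as SHA-256 is the usual choice.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(local_build: str, shipped_binary: str) -> bool:
    """True when both files have the same MD5 digest (barring collisions: identical bytes)."""
    return file_md5(local_build) == file_md5(shipped_binary)


def demo_reproducible_build():
    # Hypothetical stand-ins for a locally rebuilt binary, the binary taken
    # from the device image, and a binary that was modified along the way.
    contents = {
        "local.bin": b"\x7fELF identical bytes",
        "shipped.bin": b"\x7fELF identical bytes",
        "modified.bin": b"\x7fELF different bytes",
    }
    with tempfile.TemporaryDirectory() as d:
        paths = {name: os.path.join(d, name) for name in contents}
        for name, data in contents.items():
            with open(paths[name], "wb") as f:
                f.write(data)
        return (same_binary(paths["local.bin"], paths["shipped.bin"]),
                same_binary(paths["local.bin"], paths["modified.bin"]))
```

If the vendor publishes the build environment, anyone can run this comparison against the binaries on their own device.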

The reason why is simple: I want Europe to develop a secure phone similar to how, among other open source projects, the Linux kernel can be trusted. By peer review of the source code.

Kind regards,

A former Harmattan developer who worked on a component of the Nokia N9 that stores the vast majority of the user’s privacy-sensitive data.

ps. I also think that you should reunite Europe’s finest software developers and secure the funds to make this workable. But that’s another discussion which I’m eager to help you with.

In agreement

Surprisingly I’m in agreement with what Richard has to say in this specific context.

Why do you need Tracker?

(Or why our project’s name wasn’t wrong after all)

First and foremost, because the Internet isn’t available everywhere, all the time. To put it simply: 3G (and 4G) suck. The latency is a joke even in the most modern countries, even at the center of their capital cities. Reliable availability of The Internet simply doesn’t exist for most people. Not even after a decade of promises and billion-dollar investments from advertising firms like Google. Balloons that bring The Internet everywhere? Yeah, sure.

I invite you to try a serious Google Maps use case in a Swiss tunnel: the kind of use case that nineties software like Microsoft Automap Streets & Trips easily handled when planning vacations and trips across Europe, decades ago (without The Internet). Or how about reading the newspaper on the airplane? Technology from before the year 1700 could do it. Today Google’s tablets and glasses can’t, because I have no reliable Internet (the flight attendant will reliably give me a newspaper, though; go ahead, try it, Sergey). And if Google installed 3G routers in those tunnels, airplanes, forests, all third world countries, seas and the truly remote areas of the planet, I could still come up with a lot more places. Everybody can. By the time the Googles of today are finished, we’ll all travel to Mars with latencies of up to an hour, while Google’s data still only travels at the speed of light (that, or they fix quantum entanglement to be workable and, more importantly, scalable to billions of users).

It’s great that sometimes we can use Google Maps, sure. But if I can’t rely on it always and everywhere, then in embedded terms it doesn’t exist. I want my Boeing 747’s landing software to work (oh, and when we land on Mars, too). Always. I want software to land us safely. Same for my car’s ABS and the subway’s and train’s braking systems. I’ll probably be dead if those services don’t operate when I need them. In those moments I don’t care about Google fans and their Google HTML5 religion. Screw HTML5, JavaScript, WebSocket and 3G, and thank you, interrupt-based real-time kernels that open my airbag, stop the tram and land the airplane.

For the car industry it’s probably cheaper to provide a storage hardware upgrade when the car is serviced than to sell your company’s soul to privacy-invading cloud hosting services. Because in the future I would like you to provide Facebook-like services in my car, working as reliably as my airbag. Without the Internet being everywhere. I want you to deliver them to me when I’m visiting Mars. And my kids… who knows what they’ll want?!

Your embedded technology needs to provide graph data about the users’ activity to services that your business wants to share the data with. I’ll illustrate this with This-is-Possible-Today use cases:

– The fridge contains no more milk. While walking down the street looking at his smartphone, the user opens a recipe for a meal that requires milk. When the user is at the supermarket, the technology that will in future be installed on supermarket shopping carts (or his glasses) needs to show the recipe and its ingredients, and highlight that the fridge doesn’t contain one of the required ingredients (milk, sugar, butter). And, if the user allows this, advertise different brands of milk, sugar and butter based on who paid most plus his wife’s buying habits.

– Your kid talked with a school friend on Facebook about an amusement park (De Efteling! Phantasialand!). Your wife decides that because the weather is good this weekend, the family should (will) go to an amusement park (yes, she’s the boss. And that’s fine: you own the car. Don’t worry, she’ll drive the way back so you can have your weekend nap). So the entire family gets in the (your) car and you ask your son: what amusement park shall I drive to?! Your kid opens the infotainment system in the back seat and sees what he has been interested in over the last few weeks. After privacy authorization (or not), you as the driver see the list on the dashboard infotainment system. You select the park and the navigation software of the (your) car navigates to it. What a dad! Meanwhile the front passenger seat’s infotainment system opens the amusement park’s ticket ordering website. What a mom! Advertising related to amusement parks and ticket vending is shown. Of course! Phantasialand!

That is why you need Tracker’s Nepomuk based storage with its SPARQL querying and updating capability.

It lets your embedded appliance do what Facebook does, but in a lightweight way, isolated (or not) from the rest of the world. You decide what happens with the data and who receives it, which lets you build a relationship of trust with your customers and consumers. You are the industry providing those cars, fridges and TV sets.

As a BMW driver myself, I would stop buying BMW cars as soon as I learned that BMW sells my driving habits (or whatever) to Google or Facebook (or the NSA). Today I’d trust BMW to integrate those habits into the infotainment system of my next car. IF I can trust BMW. I think that in the future this will be the difference between succeeding and failing as an industry.

With Tracker you get the use cases and features. But it’s not free: you must hire brains instead of paying Google’s or Facebook’s marketing boys. Does this come as a surprise? It has always been that way in tech: brains and hard work innovate. Ask Wernher von Braun. His rockets landed on the wrong planet, but in the sixties and seventies they got us to the moon.

A car is a car. A fridge is a fridge. A TV set is a TV set. They shouldn’t be Google’s or Facebook’s data mining devices. Besides, why would you give the data away? Your appliances collected it, not theirs. Talk with your customers fairly and openly on how it can and how it can’t be used.

As managers in these industries, it’s up to you to solve the crisis of social features vs. privacy mining.

Kind regards, from one of the guys who developed such technology for Nokia’s N9.

Warming up

Hey, former Harmattan peeps. How about we do a little bit of this Jolla stuff after hours and see where it goes? You never know, and none of the technologies and improvements we made for Nokia ever harmed us either. It’s at #jollamobile on FreeNode. Btw, ping me if you are going to FOSDEM. Maybe we can discuss how to revive some of our Harmattan projects? Personally, I’m thinking about reducing the role of Tracker’s FS miner in Jolla by first refactoring libtracker-extract and then adapting buteo to request metadata extraction itself, instead of letting miner-fs pick up the newly added files. Death to file system monitoring on phones!

At the same time, I’ve been working with Calligra a lot lately, which is awesome stuff, by the way. Can’t choose.

Morals? Forbidding stuff?

It isn’t freedom to be obliged to choose Richard Stallman’s world view. It isn’t ‘freedom’ to be called immoral just because you choose another ethic. It isn’t freedom when a single person or group with a single view on morality tries to forbid you something based on just their point of view.

For example, Stallman has repeatedly said that Trusted Computing (which he, in a childish way, apparently calls Treacherous Computing) ‘should be illegal’ (that’s a quote from official FSF and GNU pages). I also recall Stallman trying to forbid blog posts about proprietary software (it was about VMWare) on planet-gnome (original thread here).

Richard Stallman and some of his followers don’t seem to understand that it isn’t necessarily moral to impose your world view, about morality, on everybody else by claiming, for example, that the other’s view ‘should be illegal’ or ‘is immoral’ (these are terms that he and some of his followers frequently use).

Firstly, something should only be illegal when all procedures for making a new law in a country have been followed. In most democratic countries that means getting a majority in parliament, but also getting advice from your country’s judges and from experts in the field who’ll be affected by your new law. So not just by blindly listening to and following a guy like Richard Stallman. This is why I was very much against a rule for planet-gnome that would forbid posts about proprietary software that uses GNOME: neither the majority of GNOME Foundation members, nor all experts in the field who’d be affected by that new law, nor all the maintainers of planet-gnome (its judges) followed Richard’s opinion.

In this new situation it also isn’t only Richard Stallman who should be blindly followed. Ubuntu needs to take into account all stakeholders and not just Stallman and his followers.

Secondly, morality is defined by a person’s own views and in large part by that person’s culture. ‘How we ought to live’ is (also) a question at the individual level. It isn’t by definition answered by Richard Stallman alone. Although, sure, it can be one’s choice to strictly copy Richard’s morals. Morality is not necessarily a single option, nor is it necessarily written in a single book.

For me it’s not fine when your morality includes forcing others to copy your morals exactly. To put it in a way that Richard’s strict followers might understand: for me morality isn’t like the GPL; agreeing with some of Stallman’s morals does not mean having to copy them all puristically.

Allowing local cults of personality in open source

Hey Aaron. I mostly agree with your post. I don’t fully agree, however, with “We needed Android because we couldn’t do it ourselves”:

Mostly Qt (and also KDE) developers, and some GNOME developers who were still developing for Nokia since the N900 and earlier, made the Nokia N9 Swipe phone. Technically the product is a success; look at the N9’s reviews to verify that. Marketing-wise it’s sort of a failure due to, in my humble opinion, a CEO switch at the wrong time, and because the new CEO didn’t have enough time to learn how good the phone actually was. But even without much marketing, the product is being sold as we speak.

I do agree if what you mean with your blog post is that, for example, the N9 happened thanks to local leadership. The leadership that made it happen was employed at Nokia, though, and wasn’t really a person in either the Qt or the GNOME camp. Rather, it was a group of passionate, leadership-taking people at Nokia.

It might also have played a role that these technical leaders didn’t see how strong they could have been together during the CEO switch, when Ari Jaaksi left Nokia as soon as Stephen Elop’s plans became clear. I’m not sure.

I think what we can learn from the episode is to put more trust in the person, and the leadership-taking people, who lead the next product developed the way the N9 was developed. Give those people more time onstage at open source conferences.

I’m also sick and tired of Free Software being inefficient and self-destructive due to internal schism. It’s one of the reasons why I’m not working much on Free Software nowadays. As I’m not much of a leader myself, I silently hope some local leader would change this. Maybe somebody at Digia? Jolla? If I can help, let me know.

How I think companies like Jolla should do it

I’ll focus on the technical stuff; I think I would only Peter Principle myself if I tried giving management advice.

What I’ve seen too often are community projects, companies or groups who think that Harmattan was synchronized well with Moblin or MeeGo to make what is now the OS on the N9. Luckily, Jolla is hiring Harmattan staff, so they understand the situation.

For me it was always clear that “MeeGo” was a more or less failed PR thing between Intel and Nokia. By the time the N9 was first released, Harmattan technically wasn’t synchronized very much with Moblin or MeeGo. And after several updates of Harmattan, it still isn’t.

The situation on the N9 now is an OS that bears relatively little technical resemblance to “MeeGo”. To me, the N9’s software is Harmattan, or Maemo 6. It’s the continuation of the software on the N900: Maemo 5, or Fremantle (after roughly two or three rather big rewrites, that much is true). That the rewrites happened doesn’t mean that Harmattan suddenly became MeeGo during those rewrites. MeeGo is, in other words, a different platform.

A successful project will have to work with what Harmattan is, and not try to replace it with what MeeGo is today. If they do want to end up with “MeeGo” on an N9, they will have to progressively improve Harmattan towards that goal: for example by asking Nokia to open closed components, by developing fixes for software that is already open source (a lot is), by repackaging it, and by explaining to N9 owners how to add a repository and how to upgrade their phone safely.

I understand the idea isn’t to deploy on an N9, but if you want a new phone or device that resembles the N9, keep in mind that the N9’s software is, in my opinion, not MeeGo but Harmattan. Rewrites have happened too often already. It’s my opinion that yet another rewrite of Harmattan isn’t a good idea at all.

For example replacing the Debian package management system with RPM doesn’t sound like a viable option to me at all. Nor is replacing any of the major middleware really doable within the timeframe you’d have to deliver to be relevant.

Instead, improve the phone’s OS software project by software project. Kind of like how Ximian did Red Carpet many years ago (which also supported multiple package management systems).

No more big rewrites, no more starting from scratch. No more politics about how it should have been done. Start with the platform as it is. There are reasons why the OS is good, and among the reasons is that good middleware choices and compromises were made.

Kind regards, good luck.

Avoiding duplicate album art storage on the N9

At Tracker (a core component of the Nokia N9’s MeeGo Harmattan Content Framework) we extract album art out of music files like MP3s, and we do a heuristic scan of the music files’ directory for files like cover.jpg.

Right now we use the media art storage spec, which we came up with together with the Banshee guys at a Boston Summit a few years ago. This specification allows for artist + album media art.

This is a bit problematic on the N9 now, because (embedded) album art is getting increasingly bigger. We’ve seen music stores with album art of up to 2MB. The storage space for this kind of data on the device isn’t unlimited. In particular, it’s a problem that for an album with, say, 20 songs by 20 different artists, each with embedded album art, the same album art is stored 20 times, each time for a different artist-album combination.

To fix this we’re working on a solution that compares the MD5 of the image data in the file album-md5(space)-md5(album).jpg with the MD5 of the image data that would be stored as album-md5(artist)-md5(album).jpg. If the contents are the same, we make album-md5(artist)-md5(album).jpg a symlink to album-md5(space)-md5(album).jpg instead of creating a normal new album art file.

When none exist yet, we first make album-md5(space)-md5(album).jpg and then symlink album-md5(artist)-md5(album).jpg to it. And when the contents aren’t the same we create a normal file called album-md5(artist)-md5(album).jpg.

Consumers of the album art can now choose between using a space for artist if they are only interested in ‘just album’ album art, or filling in both artist and album for artist-album album art.
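A minimal Python sketch of the storage rule described above. Assumptions: the helper names are hypothetical, the artist and album strings are hashed as-is (the media art spec’s normalization step is left out), and the symlink is relative because both files live in the same cache directory. It illustrates the rule; it is not Tracker’s implementation.

```python
import hashlib
import os
import tempfile


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def art_path(cache_dir: str, artist: str, album: str) -> str:
    """Build album-md5(artist)-md5(album).jpg (raw strings hashed; no normalization)."""
    a = md5_hex(artist.encode("utf-8"))
    b = md5_hex(album.encode("utf-8"))
    return os.path.join(cache_dir, f"album-{a}-{b}.jpg")


def store_album_art(cache_dir: str, artist: str, album: str, image: bytes) -> str:
    album_only = art_path(cache_dir, " ", album)   # "just album" art: a space as artist
    target = art_path(cache_dir, artist, album)
    if os.path.lexists(target):
        return target                              # already stored for this pair
    if not os.path.exists(album_only):
        # None exist yet: write the album-only file first ...
        with open(album_only, "wb") as f:
            f.write(image)
        if target != album_only:
            # ... then symlink the artist-album name to it.
            os.symlink(os.path.basename(album_only), target)
        return target
    with open(album_only, "rb") as f:
        existing = f.read()
    if md5_hex(existing) == md5_hex(image):
        os.symlink(os.path.basename(album_only), target)   # same image: no extra copy
    else:
        with open(target, "wb") as f:                      # different image: normal file
            f.write(image)
    return target


def demo_album_art():
    with tempfile.TemporaryDirectory() as d:
        cover = b"same embedded cover"
        a = store_album_art(d, "Artist A", "Compilation", cover)
        b = store_album_art(d, "Artist B", "Compilation", cover)
        c = store_album_art(d, "Artist C", "Compilation", b"another cover")
        shared = art_path(d, " ", "Compilation")
        with open(b, "rb") as f:
            via_link = f.read()
        return (os.path.islink(a), os.path.islink(b), os.path.islink(c),
                os.path.islink(shared), via_link == cover, len(os.listdir(d)))
```

In the demo, two artists share one real file through symlinks, while the third artist with a different cover gets its own regular file.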

This is a first idea for solving this issue; we have some other ideas in mind in case this solution comes with unexpected problems.

I usually blog about unfinished stuff, and this time is no exception. You can find the work in progress here.

Refactoring our writeback system

Tracker writes certain metadata back to your files. For example, it writes the title of a JPEG file back into XMP, among other fields that XMP supports.

We had a service that runs in the background waiting for signals coming from the RDF store that tell it to perform a writeback.

To prevent our FS miner from picking up the changes that the writeback service made, and indexing the file again, we introduced a D-Bus API for our FS miner called IgnoreNextUpdate. When the API is called, the FS miner ignores the next filesystem event on that specific file that it would otherwise handle.

That API is now among our biggest sources of race conditions. Although we won’t remove it from 0.10 due to API promises, we don’t like it and want to get rid of it, or at least replace all its users.

To get rid of it we of course had to change the writeback service in a way that it wouldn’t need the API call on the FS miner any longer.

The solution we came up with was to move the handling of the signal, and the queuing, to the FS miner’s process. There we have all the control we need.

The original reason why writeback was done as a service was robustness against the libraries used for the actual writeback crashing or hanging. We wanted to keep this capability, so, just like the extractor, a portion of the writeback system is going to run outside the FS miner’s process.

When a queued writeback task is to be run, an IPC call to a writeback process is made and returns only when it’s finished. Then the next task in the queue, in the FS miner, is selected. A lot like how the extracting of metadata works.
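The queue-then-call flow can be sketched in Python. This is a minimal illustration under assumptions, not Tracker’s code: a child Python process stands in for the out-of-process writeback helper (in Tracker the call is IPC to a writeback process), the task is just “append a title to a file”, and an empty title simulates a crashing writeback library. What it shows is that tasks run one at a time, each call blocks until the helper is done, and a crash or hang only fails that one task.

```python
import os
import subprocess
import sys
import tempfile
from collections import deque

# Hypothetical helper program run out of process for each writeback task.
HELPER = (
    "import sys\n"
    "path, title = sys.argv[1], sys.argv[2]\n"
    "if not title:\n"
    "    sys.exit(1)  # simulate a crashing writeback library\n"
    "with open(path, 'a', encoding='utf-8') as f:\n"
    "    f.write(title)\n"
)


class WritebackQueue:
    """Queue owned by the FS miner's process; tasks run strictly one at a time."""

    def __init__(self, timeout: float = 10.0):
        self.tasks = deque()
        self.timeout = timeout

    def enqueue(self, path: str, title: str) -> None:
        self.tasks.append((path, title))

    def run_next(self) -> bool:
        path, title = self.tasks.popleft()
        try:
            # Returns only when the helper process has finished.
            done = subprocess.run([sys.executable, "-c", HELPER, path, title],
                                  timeout=self.timeout)
            return done.returncode == 0
        except subprocess.TimeoutExpired:
            return False  # a hanging helper is killed; only this task fails

    def run_all(self) -> list:
        results = []
        while self.tasks:
            results.append(self.run_next())   # then the next queued task is selected
        return results


def demo_writeback():
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "photo-a.txt"), os.path.join(d, "photo-b.txt")
        queue = WritebackQueue()
        queue.enqueue(a, "Title A")
        queue.enqueue(b, "")          # this task's helper "crashes"
        queue.enqueue(b, "Title B")
        results = queue.run_all()
        with open(a, encoding="utf-8") as fa, open(b, encoding="utf-8") as fb:
            return results, fa.read(), fb.read()
```

Because the queue lives in the miner’s own process, the miner always knows which file is being written back at any moment, which is the kind of control the D-Bus round trip could not give.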

We have been working, and will keep working over the next few days, on this in the writeback-refactor branches.

The ever growing journal problem

Current upstream situation

In Tracker’s RDF store we journal all inserts and deletes. When we replay the journal, we replay every event that ever happened. That way you end up in precisely the same situation as when the last journal entry was appended. We use the journal also for making a backup. At restore we remove the SQLite database, put your backup file where the journal belongs, and replay it.

We also use the journal to cope with ontology changes. When an ontology change takes place that we can’t support with SQLite’s limited ALTER, we replay the journal over a new SQLite database schema. While we replay, we ignore errors; some ontology changes can cause loss of data (i.e. removal of a property or class).
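A toy model makes the replay idea concrete. This is a minimal Python sketch, not Tracker’s journal format: it assumes a journal that is simply a list of (“insert” or “delete”, triple) events, and it models an ontology change with a hypothetical `accept` filter that skips events the new schema can’t hold, the way replay ignores errors.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Event = Tuple[str, Triple]  # ("insert" | "delete", triple)


class JournaledStore:
    """Toy store: every update is appended to a journal and then applied."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.journal: List[Event] = []

    def insert(self, triple: Triple) -> None:
        self.journal.append(("insert", triple))
        self.triples.add(triple)

    def delete(self, triple: Triple) -> None:
        self.journal.append(("delete", triple))
        self.triples.discard(triple)


def replay(journal: List[Event],
           accept: Callable[[Triple], bool] = lambda t: True) -> Set[Triple]:
    """Rebuild the database by replaying every event that ever happened.

    `accept` is a hypothetical stand-in for the new schema: events it rejects
    are skipped, like errors that are ignored during an ontology change.
    """
    triples: Set[Triple] = set()
    for op, triple in journal:
        if not accept(triple):
            continue
        if op == "insert":
            triples.add(triple)
        else:
            triples.discard(triple)
    return triples


def demo_journal():
    store = JournaledStore()
    store.insert(("<Max>", "a", "<Dog>"))
    store.insert(("<Max>", "<hasAge>", "10"))
    store.insert(("<Mimi>", "a", "<Dog>"))
    store.delete(("<Mimi>", "a", "<Dog>"))
    backup = list(store.journal)   # a backup is a copy of the journal
    restored = replay(backup)      # restore: empty database plus full replay
    # New ontology without <hasAge>: replay and drop what no longer fits.
    migrated = replay(backup, accept=lambda t: t[1] != "<hasAge>")
    return store, restored, migrated
```

Note that the deleted `<Mimi>` triple is still sitting in the journal after the delete; that leftover is exactly what the problems below are about.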

This journal has a few problems:

First, the obvious space problem: when you insert a lot of data and later remove it all, the database is empty, yet instead of consuming no space at all, the journal still holds both the inserts and the deletes, roughly twice the space the data originally took. Unless you remove the journal, you can’t get that space back. It’s all textual data, so even when trying really, really hard you won’t consume gigabytes that way, and nowadays typical hard drives are several hundred gigabytes in size. But yes, it’s definitely not nice.

The second problem is less obvious, but far worse: your privacy. When you delete data you expect it to be gone, especially since a lot of desktop interaction involves inserting or deleting data with Tracker, for example recently visited websites. When a user wants to permanently remove his browser history, he doesn’t want us to keep a copy of the insert and the delete of that information. With some effort it’s still retrievable. That’s not only bad, it’s a defect!

This was indeed not acceptable for Nokia’s N9. We decided to come up with an ad-hoc solution which we plan to someday replace with a permanent solution. I’ll discuss the permanent solution last.

The ad-hoc solution for the N9

For the N9 we decided to add a compile option to disable our own journal and instead use SQLite’s synchronous journaling. In this mode SQLite guarantees safe writes using fsync.

Previously we didn’t use SQLite’s synchronous journaling; we had replaced it with our own journal for the features mentioned earlier (backup, coping with ontology changes) but also, more importantly, because the N9’s storage hardware has a high latency on fsync: we wanted to take full control by using our own journal. Also, at first we were told that it wouldn’t be possible to force-shutdown the device, and then suddenly this was possible again in some ways: we needed high performance, plus we don’t want to lose your data, ever.

The storage space issue was less severe: the device’s storage capacity is huge compared to the significance of that problem. However, we did not want the privacy issue, so I managed to get this problem the right priority before the N9 launched.

The performance was significantly worse with SQLite’s synchronous journaling, so we implemented manual checkpointing in a background thread for our usage of SQLite. With this we have more control over when fsync happens on SQLite’s WAL journal. After some tuning we got comparable performance figures even with our high latency storage hardware.
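To see the knobs involved, here is a minimal sketch using Python’s built-in sqlite3 module rather than Tracker’s C code: WAL journaling, automatic checkpoints switched off, and a background thread that issues `PRAGMA wal_checkpoint` itself. The pragma values and the checkpoint interval are assumptions for illustration, not the settings used on the N9.

```python
import os
import sqlite3
import tempfile
import threading


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")      # write-ahead log journaling
    conn.execute("PRAGMA synchronous=FULL")      # sync the WAL on every commit
    conn.execute("PRAGMA wal_autocheckpoint=0")  # no automatic checkpoints on this connection
    return conn


def checkpoint_worker(path: str, stop: threading.Event, interval: float) -> None:
    """Background thread that decides when the WAL is copied into the database."""
    conn = sqlite3.connect(path)  # sqlite3 connections are not shared across threads
    try:
        while not stop.wait(interval):
            conn.execute("PRAGMA wal_checkpoint(PASSIVE)")  # does not wait for readers/writers
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")     # final checkpoint on shutdown
    finally:
        conn.close()


def demo_checkpointing():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "meta.db")
        conn = open_store(path)
        mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
        conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
        stop = threading.Event()
        worker = threading.Thread(target=checkpoint_worker, args=(path, stop, 0.01))
        worker.start()
        with conn:  # one transaction, committed on exit
            conn.executemany("INSERT INTO triples VALUES (?, ?, ?)",
                             [(f"<s{i}>", "a", "<Thing>") for i in range(100)])
        stop.set()
        worker.join()
        count = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
        conn.close()
        return mode, count
```

The design point is that the writer never pays for a checkpoint inline; when the expensive copy-and-fsync of the WAL happens is up to the background thread.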

Of course we also changed backup / restore to simply copy the SQLite database using SQLite’s backup API.
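Python’s sqlite3 module wraps the same SQLite online backup API, so a minimal sketch of backup and restore looks like this (file and table names are hypothetical; this is not Tracker’s code):

```python
import os
import sqlite3
import tempfile


def backup_store(db_path: str, backup_path: str) -> None:
    """Copy a (possibly live) SQLite database with SQLite's online backup API."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def restore_store(backup_path: str, db_path: str) -> None:
    """Restore is the same copy in the other direction."""
    backup_store(backup_path, db_path)


def demo_backup():
    with tempfile.TemporaryDirectory() as d:
        db, bak = os.path.join(d, "meta.db"), os.path.join(d, "meta.db.bak")
        conn = sqlite3.connect(db)
        with conn:
            conn.execute("CREATE TABLE t (s TEXT, p TEXT, o TEXT)")
            conn.execute("INSERT INTO t VALUES ('<Max>', 'a', '<Dog>')")
        conn.close()
        backup_store(db, bak)
        conn = sqlite3.connect(db)
        with conn:
            conn.execute("DELETE FROM t")  # lose the data...
        conn.close()
        restore_store(bak, db)             # ...and get it back from the backup
        conn = sqlite3.connect(db)
        rows = conn.execute("SELECT s, p, o FROM t").fetchall()
        conn.close()
        return rows
```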

Above solution means that we lost an important feature: coping with certain ontology changes. It’s true that the N9 will not cope with just any ontology change, whereas upstream Tracker does cope with more kinds of ontology changes.

The solution for the N9 will be pragmatic: we won’t do any ontology changes, on any future release that is to be deployed on the phone, that we can’t cope with, unless the new ontology gets shipped alongside a new release of Tracker that is specifically adapted and tested to cope with that ontology change.

Planned permanent solution for upstream

The permanent solution will probably be one where the custom journal isn’t disabled and periodically gets truncated to have a first transaction that contains an entire copy of the SQLite database. This doesn’t completely solve the privacy issue, but we can provide an API to make the truncating happen at a specific time, wiping deleted information from the journal.
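The truncation idea can be sketched with a toy model as well. Under stated assumptions (a journal held as a list of transactions, and a hypothetical `truncate()` standing in for the API mentioned above), rewriting the journal as one snapshot transaction drops the history of deleted data while replay still yields the same database:

```python
class CompactingJournal:
    """Toy journal that can be truncated into a single snapshot transaction."""

    def __init__(self) -> None:
        self.triples = set()      # current database contents
        self.transactions = []    # each: list of ("insert" | "delete", triple)

    @staticmethod
    def _apply(state, op, triple) -> None:
        if op == "insert":
            state.add(triple)
        else:
            state.discard(triple)

    def commit(self, ops) -> None:
        ops = list(ops)
        self.transactions.append(ops)
        for op, triple in ops:
            self._apply(self.triples, op, triple)

    def truncate(self) -> None:
        """Hypothetical API: the first (and only) transaction becomes a full copy."""
        snapshot = [("insert", t) for t in sorted(self.triples)]
        self.transactions = [snapshot] if snapshot else []

    def replay(self):
        state = set()
        for ops in self.transactions:
            for op, triple in ops:
                self._apply(state, op, triple)
        return state

    def mentions(self, value: str) -> bool:
        """Does any journal entry still contain this value?"""
        return any(value in triple for ops in self.transactions for _, triple in ops)


def demo_truncate():
    journal = CompactingJournal()
    secret = "'http://example.org/private'"
    visit = ("<visit1>", "<url>", secret)
    journal.commit([("insert", ("<me>", "a", "<Person>")), ("insert", visit)])
    journal.commit([("delete", visit)])   # the user clears the browser history
    before = journal.mentions(secret)
    journal.truncate()
    after = journal.mentions(secret)
    return before, after, journal.replay() == journal.triples, len(journal.transactions)
```

Between two truncations the deleted data is still in the journal, which is why this only partly solves the privacy issue.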

We delivered

Damn, guys, we’re too shy about what we delivered. When the N900 was made public we flooded the planets with our blogs about it. And now?

I’m proud of the software on this device. It’s good. Look at what Engadget is writing about it! Amazing. We should all be proud! And yes, I know about the turbulence in Nokia-land. Deal with it, it’s part of our job. Para-commandos don’t complain that they might get shot. They just know. It’s called research and development! (I know, bad metaphor)

I don’t remember that many good reviews about even the N900, and that phone was by many of its owners seen as among the best they’ve ever owned. Now is the time to support Harmattan the same way we passionately worked on the N900 and its predecessor tablets (N810, N800 and 770). Even if the N9’s future is uncertain: who cares? It’s mostly open source! And not open source in the ‘Android way’. You know what I mean.

The N9 will be a good phone. The Harmattan software is awesome. Note that Tracker and QSparql are being used by many of its standard applications. We have always been allowed to develop Tracker the way it’s supposed to be done. Like many other similar projects: in upstream.

As for short term future I can announce that we’re going to make Michael Meeks happy by finally solving the ever growing journal problem. Michael repeatedly and rightfully complained about this to us at conferences. Thanks Michael. I’ll write about how we’ll do it, soon. We have some ideas.

We have many other plans for long term future. But let’s for now work step by step. Our software, at least what goes to Harmattan, must be rock solid and very stable from now on. Introducing a serious regression would be a catastrophe.

I’m happy because, with that growing journal problem, I can finally focus on a tough coding problem again. I don’t like bugfixing-only periods. But yeah, I have enough experience to realize that sometimes they are needed.

And now, now we’re going to fight.

INSERT OR REPLACE explained in more detail

A few weeks ago we were asked to improve data entry performance of Tracker’s RDF store.

From earlier investigations we knew that a large share of the RDF store’s update time went to the application first having to delete triples, and, internally, to the insert having to look up preexisting values.

For this reason we came up with the idea of providing a replace feature on top of standard SPARQL 1.1 Update.

When working with triples, a feature like replace is of course a bit ambiguous. I’ll first briefly explain how triples are used to describe things. When I want to describe a person, Mark, who has two dogs, I could do it like this:

Max is a Dog
Max is 10 years old
Mimi is a Dog
Mimi is 11 years old
Mark is a Person
Mark is 30 years old
Mark owns Max
Mark owns Mimi

If you look at those descriptions, you can simplify each by writing exactly three things: the subject, the property and the value.

In RDF we call these three the subject, the predicate and the object. Subjects and predicates are resources; objects can be either a resource or a literal. You wrap resources in angle brackets (< and >).

You continue talking about the same resource using a semicolon, and you continue talking about the same predicate (adding another object) using a comma. When you want to finish talking about a resource, you write a dot. Now you know how the Turtle format works.
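The semicolon and comma shortcuts are easy to see in code. A minimal Python sketch, assuming resources are passed as strings already wrapped in angle brackets and literals already quoted; the function name `to_turtle` is made up for this example:

```python
def to_turtle(triples):
    """Write (subject, predicate, object) tuples using Turtle's ';' and ',' shortcuts."""
    grouped = {}  # dicts keep insertion order (Python 3.7+)
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    blocks = []
    for s, predicates in grouped.items():
        # ',' separates objects of one predicate; ';' separates predicates of
        # one subject; ' .' ends the statements about that subject.
        parts = [f"{p} {', '.join(objects)}" for p, objects in predicates.items()]
        blocks.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(blocks)


print(to_turtle([
    ("<Mark>", "a", "<Person>"),
    ("<Mark>", "<hasAge>", "30"),
    ("<Mark>", "<owns>", "<Max>"),
    ("<Mark>", "<owns>", "<Mimi>"),
]))
# prints:
# <Mark> a <Person> ;
#     <hasAge> 30 ;
#     <owns> <Max>, <Mimi> .
```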

In SPARQL Update you insert data with INSERT { Turtle formatted data }. Let’s translate that to Mark’s story:

INSERT {
  <Max> a <Dog> ;
        <hasName> 'Max' ;
        <hasAge> 10 .
  <Mimi> a <Dog> ;
         <hasName> 'Mimi' ;
         <hasAge> 11 .
  <Mark> a <Person> ;
         <hasName> 'Mark' ;
         <hasAge> 30 ;
         <owns> <Max>, <Mimi>
}

In the example we are using both single value property and multiple value properties. You can have only one name and one age, so <hasName> and <hasAge> are single value properties. But you can own more than one dog, so <owns> is a multiple value property.

The ambiguity with a replace feature for SPARQL Update is at multiple value properties. Does it need to replace the entire list of values? Does it need to append to the list? Does it need to update just one item in the list? And which one? This probably explains why it’s not specified in SPARQL Update.

For single value properties there’s no ambiguity. For multiple value properties on a resource where the particular triple already exists, there’s also no ambiguity: RDF doesn’t allow duplicate triples. This means that in RDF you can’t own <Max> twice. This is also true for separate insert executions.

In the next two examples the first query is equivalent to the second query. Keep this in mind because it will matter for our replace feature:

INSERT { <Mark> <owns> <Max>, <Max>, <Mimi> }

Is the same as

INSERT { <Mark> <owns> <Max>, <Mimi> }
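Modelling the store as a Python set is a quick way to see why these two inserts end up identical (a toy illustration of RDF’s no-duplicates rule, not how Tracker stores triples; the `insert` helper is hypothetical):

```python
def insert(store: set, subject: str, predicate: str, objects: list) -> set:
    """Add subject-predicate-object triples to a store modelled as a Python set."""
    for obj in objects:
        store.add((subject, predicate, obj))  # adding an existing triple changes nothing
    return store


first = insert(set(), "<Mark>", "<owns>", ["<Max>", "<Max>", "<Mimi>"])
second = insert(set(), "<Mark>", "<owns>", ["<Max>", "<Mimi>"])
# A separate, later insert of an existing triple is also a no-op.
later = insert(set(second), "<Mark>", "<owns>", ["<Max>"])
```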

There is no ambiguity for single value properties so we can implement replace for single value properties:

INSERT OR REPLACE {
  <Max> a <Dog> ;
        <hasName> 'Max' ;
        <hasAge> 11 .
  <Mimi> a <Dog> ;
         <hasName> 'Mimi' ;
         <hasAge> 12 .
  <Mark> a <Person> ;
         <hasName> 'Mark' ;
         <hasAge> 31 ;
         <owns> <Max>, <Mimi>
}

As mentioned earlier, RDF doesn’t allow duplicate triples, so nothing changes about Mark’s ownerships. However, had we added a new dog, it would be added to Mark’s ownerships just as if OR REPLACE were not there. The following example actually adds Morm to Mark’s dogs (unlike the single value properties, which are overwritten instead).

INSERT OR REPLACE {
  <Morm> a <Dog> ;
         <hasName> 'Morm' ;
         <hasAge> 2 .
  <Max> a <Dog> ;
        <hasName> 'Max' ;
        <hasAge> 12 .
  <Mimi> a <Dog> ;
         <hasName> 'Mimi' ;
         <hasAge> 13 .
  <Mark> a <Person> ;
         <hasName> 'Mark' ;
         <hasAge> 32 ;
         <owns> <Max>, <Mimi>, <Morm>
}

We know that this looks a bit strange, but in RDF it kinda makes sense too. Note again that our replace feature is not part of standard SPARQL 1.1 Update (and will probably never be).
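To make the rule concrete, here is a toy model in Python, assuming a hypothetical ontology in which `<hasName>` and `<hasAge>` are the only single value properties. It sketches the semantics described above, not Tracker’s implementation: single value properties are overwritten, multiple value properties are only ever added to.

```python
SINGLE_VALUED = {"<hasName>", "<hasAge>"}  # hypothetical ontology


class TripleStore:
    """Toy model of INSERT OR REPLACE semantics over a set of triples."""

    def __init__(self) -> None:
        self.triples = set()

    def insert_or_replace(self, triples) -> None:
        for s, p, o in triples:
            if p in SINGLE_VALUED:
                # Single value property: drop the old value so the new one replaces it.
                self.triples = {t for t in self.triples if (t[0], t[1]) != (s, p)}
            # Then add the triple. For a multiple value property the existing
            # values stay, and a repeated value is not duplicated.
            self.triples.add((s, p, o))

    def values(self, s, p):
        return sorted(o for (s2, p2, o) in self.triples if (s2, p2) == (s, p))


def demo_replace():
    store = TripleStore()
    store.insert_or_replace([("<Mark>", "<hasAge>", "31"),
                             ("<Mark>", "<owns>", "<Max>"),
                             ("<Mark>", "<owns>", "<Mimi>")])
    store.insert_or_replace([("<Mark>", "<hasAge>", "32"),
                             ("<Mark>", "<owns>", "<Max>"),
                             ("<Mark>", "<owns>", "<Mimi>"),
                             ("<Mark>", "<owns>", "<Morm>")])
    return store.values("<Mark>", "<hasAge>"), store.values("<Mark>", "<owns>")
```

After the second call Mark’s age is 32 only, while his dogs are Max, Mimi and Morm.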

If for some reason you want to completely overwrite Mark’s ownerships then you need to precede the insert with a delete. If you also want to remove the dogs from the store (let’s say because, however unfortunate, they died), then you also have to remove their rdfs:Resource type:

DELETE { <Mark> <owns> ?dog . ?dog a rdfs:Resource }
WHERE { <Mark> <owns> ?dog }
INSERT OR REPLACE {
  <Fred> a <Dog> ;
         <hasName> 'Fred' ;
         <hasAge> 1 .
  <Mark> a <Person> ;
         <hasName> 'Mark' ;
         <hasAge> 32 ;
         <owns> <Fred> .
}
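A toy Python model of that DELETE-then-INSERT OR REPLACE sequence, to show why the preceding delete is what resets the multi value property. Assumptions: the graph is a set of triples, hasName and hasAge are the only single value properties, and removing a dog's rdfs:Resource type is modeled as dropping every triple about that dog (a simplification, not a statement about Tracker internals).

```python
# Illustrative model: a graph is a set of (subject, predicate, object) tuples.
SINGLE_VALUED = {"hasName", "hasAge"}  # hypothetical schema for the example


def insert_or_replace(graph, triples):
    """Single value properties overwrite, multi value properties are added."""
    for s, p, o in triples:
        if p in SINGLE_VALUED:
            graph.difference_update({t for t in graph if t[:2] == (s, p)})
        graph.add((s, p, o))


def delete_owned(graph, owner):
    """Sketch of
         DELETE { <owner> <owns> ?dog . ?dog a rdfs:Resource }
         WHERE  { <owner> <owns> ?dog }
    The WHERE solutions are computed first, then the deletes are applied.
    Removing ?dog's rdfs:Resource type is modeled as removing every
    triple about ?dog (assumption of this toy model)."""
    dogs = {o for (s, p, o) in graph if (s, p) == (owner, "owns")}
    graph.difference_update({t for t in graph
                             if t[:2] == (owner, "owns") or t[0] in dogs})


graph = {("Mark", "hasAge", 32),
         ("Mark", "owns", "Max"), ("Mark", "owns", "Mimi"), ("Mark", "owns", "Morm"),
         ("Max", "hasAge", 12), ("Mimi", "hasAge", 13), ("Morm", "hasAge", 2)}

delete_owned(graph, "Mark")
insert_or_replace(graph, [("Fred", "a", "Dog"), ("Fred", "hasName", "Fred"),
                          ("Fred", "hasAge", 1), ("Mark", "hasAge", 32),
                          ("Mark", "owns", "Fred")])

print(sorted(o for (s, p, o) in graph if (s, p) == ("Mark", "owns")))  # ['Fred']
```

Without the delete_owned step, Mark would end up owning all four dogs, because owns is multi valued.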

We don't plan to add syntax for overwriting, adding or deleting individual items or entire lists of a multiple value property at this time (other than with the preceding delete). There are technical reasons for this, but I will spare you the details. You can find the code that implements replace in the sparql-update branch, where it's awaiting review and a merge to master.

We saw performance improvements of 30% and more, although this depends greatly on the use case. One use case we tested in particular was synchronizing contact data: the original query took between 17s and 23s for 1000 contacts, while with the replace feature it takes around 13s for 1000 contacts. For more information on this performance test, read this mailing list thread and experiment yourself with this example.
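To put those contact-sync timings side by side, a small Python calculation. The post doesn't say whether "30% and more" means time saved or higher throughput, so this sketch simply shows both readings of the quoted numbers:

```python
# Timings quoted above for synchronizing 1000 contacts, in seconds.
OLD_FASTEST, OLD_SLOWEST = 17.0, 23.0
NEW = 13.0  # "around 13s" with the replace feature


def time_saved(old, new):
    """Fraction of the original run time that is saved."""
    return (old - new) / old


def speedup(old, new):
    """Throughput gain: old time divided by new time, minus 1."""
    return old / new - 1


for old in (OLD_FASTEST, OLD_SLOWEST):
    print(f"{old:.0f}s -> {NEW:.0f}s: {time_saved(old, NEW):.0%} less time, "
          f"{speedup(old, NEW):.0%} faster")
# 17s -> 13s: 24% less time, 31% faster
# 23s -> 13s: 43% less time, 77% faster
```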

The team working on qtcontacts-tracker, a backend for the QtContacts API that uses Tracker's RDF store, is integrating our replace feature. They promised me tests and numbers by next week.

A REPLACE extension for Tracker's SPARQL Update

SPARQL Update has INSERT and DELETE. To update an existing triple in RDF, you need to DELETE it first. Of course you already have our INSERT-SILENT, but that only ignores certain errors; it doesn't replace triples.

A performance problem is that each DELETE has to solve all possible solutions of its WHERE pattern, so every update you do with a 'DELETE-WHERE INSERT' construction costs you an extra query.

INSERT also checks for old values. It has to do this to implement SPARQL Update for single value properties, where you can't insert a triple with a different value than the old one: if the value of a triple is identical, the insert for that triple is ignored; if the triple didn't exist yet, it's inserted; if the values differ, an error is thrown and you need to use DELETE upfront.
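The three cases of that old-values check, as a minimal Python sketch. It assumes a toy store (a set of triples) and a hypothetical SINGLE_VALUED set standing in for the ontology; it is an illustration of the rule, not Tracker's code.

```python
# Illustrative model only: the store is a set of (subject, predicate, object)
# tuples, and SINGLE_VALUED is a hypothetical stand-in for the ontology's
# single value properties.
SINGLE_VALUED = {"nie:title"}


class ConstraintError(Exception):
    """An INSERT tried to give a single value property a second value."""


def insert_triple(store, s, p, o):
    """Old-values check for one triple, then insert it.

    Returns "ignored" if the identical triple is already there and
    "inserted" if it is new. Raises ConstraintError when a single value
    property already holds a different value (DELETE it first)."""
    if (s, p, o) in store:
        return "ignored"
    if p in SINGLE_VALUED and any(t[:2] == (s, p) for t in store):
        raise ConstraintError(f"multiple values for {s} and single valued property {p}")
    store.add((s, p, o))
    return "inserted"


store = set()
print(insert_triple(store, "r", "nie:title", "title"))  # inserted
print(insert_triple(store, "r", "nie:title", "title"))  # ignored
try:
    insert_triple(store, "r", "nie:title", "title new")
except ConstraintError as err:
    print("error:", err)
```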

Both the extra DELETE and the old-values check come at a performance price.

To solve this, we plan to provide Tracker-specific support for REPLACE. It'll be Tracker-specific simply because SPARQL Update doesn't specify it, and there's a probable reason for that:

Replacing or updating doesn't fit well in the RDF world. Updating properties that have multiple values, like nie:keyword, is ambiguous: does it need to replace the entire list of values, append to the list, or update just one item in the list, and if so, which one?

We decided that our REPLACE differs from INSERT only for single value properties. For multi value properties, our REPLACE behaves the same as a normal INSERT.

We haven't decided yet how a GraphUpdated signal triggered by a REPLACE should behave, especially which object ID to report for resource objects in the 'deletes' array. Having to look up the old ID kinda defeats the purpose of having a REPLACE: we'd still need to look it up, like an INSERT does, which destroys part of the performance gain.

Either way, let me show you some examples:

We start by inserting a resource with a single value property and two values for a multi value property:

INSERT { <r> a nie:InformationElement ;
             nie:title 'title';
             nie:keyword 'keyw1';
             nie:keyword 'keyw2' }

A quick query to verify, and yes it’s in:

SELECT ?t ?k { <r> nie:title ?t; nie:keyword ?k }
Results:
  title, keyw1
  title, keyw2

If we run the same insert a second time, the old-values check turns it into a no-op:

INSERT { <r> a nie:InformationElement ;
             nie:title 'title';
             nie:keyword 'keyw1';
             nie:keyword 'keyw2' }

A quick query to verify that, and indeed nothing has changed:

SELECT ?t ?k { <r> nie:title ?t; nie:keyword ?k }
Results:
  title, keyw1
  title, keyw2

If we run that last insert again but with different values, we get this:

INSERT { <r> a nie:InformationElement ;
             nie:title 'title new';
             nie:keyword 'keyw4';
             nie:keyword 'keyw3' }

SparqlError.Constraint: Unable to insert multiple values for subject
`r' and single valued property `dc:title' (old_value: 'title', new
 value: 'title new')

Note that the two nie:keyword triples would have worked on their own, but because each query is a transaction and the nie:title part failed, those two aren't written either.
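A toy Python model of that all-or-nothing behaviour. Assumptions: the store is a set of triples, nie:title is the only single value property involved, and a transaction is modeled by staging every triple on a scratch copy that replaces the store only when all triples pass (Tracker's real transaction mechanism is not shown here).

```python
SINGLE_VALUED = {"nie:title"}  # hypothetical stand-in for the ontology


class ConstraintError(Exception):
    """A single value property would end up with two values."""


def run_insert(store, triples):
    """Run one INSERT query as a single transaction (toy model).

    Every triple is staged on a scratch copy first; the store is only
    replaced by the copy when all triples passed the old-values check.
    Copying the whole store is just for illustration."""
    scratch = set(store)
    for s, p, o in triples:
        if (s, p, o) in scratch:
            continue  # identical triple: ignored
        if p in SINGLE_VALUED and any(t[:2] == (s, p) for t in scratch):
            raise ConstraintError(f"{p} of {s} already has a value")
        scratch.add((s, p, o))
    store.clear()
    store.update(scratch)  # commit


store = {("r", "nie:title", "title"),
         ("r", "nie:keyword", "keyw1"), ("r", "nie:keyword", "keyw2")}
before = set(store)
try:
    # The keywords are listed first so they are staged before the title fails.
    run_insert(store, [("r", "nie:keyword", "keyw4"), ("r", "nie:keyword", "keyw3"),
                       ("r", "nie:title", "title new")])
except ConstraintError as err:
    print("transaction rolled back:", err)

print(store == before)  # True
```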

Let’s now try the same with INSERT OR REPLACE (edit: changed from just REPLACE to INSERT OR REPLACE):

INSERT OR REPLACE { <r> a nie:InformationElement ;
                        nie:title 'title new';
                        nie:keyword 'keyw4';
                        nie:keyword 'keyw3' }

And a quick query now yields:

SELECT ?t ?k { <r> nie:title ?t; nie:keyword ?k }
Results:
  title new, keyw1
  title new, keyw2
  title new, keyw3
  title new, keyw4

You can see that it behaved differently for nie:title than for nie:keyword. That's because nie:title is a single value property and nie:keyword is a multi value property.

What if we do want to reset the multi value property and insert a completely new list? Simple: run the DELETE and the INSERT OR REPLACE together as a single query (space or newline delimited) (edit: changed to INSERT OR REPLACE from just REPLACE):

DELETE { <r> nie:keyword ?k } WHERE { <r> nie:keyword ?k }
INSERT OR REPLACE { <r> a nie:InformationElement ;
                        nie:title 'title new';
                        nie:keyword 'keyw4';
                        nie:keyword 'keyw3' }

And a quick query now yields:

SELECT ?t ?k { <r> nie:title ?t; nie:keyword ?k }
Results:
  title new, keyw3
  title new, keyw4

This work is in progress; you can find it in the sparql-update branch. It works, but the GraphUpdated part in particular is unfinished.

Also note that the final syntax may change.

Synchronizing your application’s data with Tracker’s RDF store

A few months ago we added the implicit tracker:modified property to all resources. Its value is an auto-increment, stored per resource. Until now it was incremented on roughly every SQL update query.

We are now changing this to be per transaction. A transaction in Tracker is one set of SPARQL Update INSERT or DELETE queries, and one such sentence can insert and delete data about multiple resources (a sentence can contain multiple space delimited Update queries). Everything related to ontology changes is an exception: ontology changes get the first increment as their tracker:modified value. This also applies to ontology changes that happen after the initial ontology transaction, which is made at the first start. We make this exception to support future ontology changes and the data conversions they may need.

The per-resource tracker:modified value is useful for synchronizing your application's data: compare the tracker:modified value your application stored against Tracker's current value, which always increases (except on integer overflow), to know whether your version is older.
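The comparison an application would do can be sketched in Python. Both inputs are hypothetical mappings from resource to tracker:modified value: what the application stored at its last sync, and what Tracker reports now (fetched, for instance, with a SELECT over tracker:modified; that query is left out here). Integer overflow is not handled.

```python
def stale_resources(app_seen, tracker_now):
    """Return the resources whose copy in the application is older than Tracker's.

    app_seen:    resource -> tracker:modified value stored at the last sync
    tracker_now: resource -> tracker:modified value Tracker reports now
    A resource the application has never seen also counts as stale."""
    return sorted(r for r, modified in tracker_now.items()
                  if r not in app_seen or app_seen[r] < modified)


# Hypothetical example data: contact:2 and contact:3 were changed in the
# same transaction, so they share one tracker:modified value.
app_seen = {"contact:1": 40, "contact:2": 57}
tracker_now = {"contact:1": 40, "contact:2": 63, "contact:3": 63}

print(stale_resources(app_seen, tracker_now))  # ['contact:2', 'contact:3']
```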

We are changing this to per-transaction because that way we can guarantee that the value is restored after a journal replay and/or a backup restore, without storing it in either the journal or the backup: replaying the journal re-executes the same transactions in the same order, so the counter ends up with the same values. This means that we now guarantee the value is restored without changing the format of either the backup or the journal.
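The key property is that a per-transaction counter depends only on the order of the transactions, which the journal already records. A minimal Python sketch of that idea, assuming a simplified, hypothetical journal format (a list of transactions, each tagged as ontology or data, listing the resources it touched); it is not Tracker's journal format or replay code.

```python
ONTOLOGY_VALUE = 1  # ontology changes always get the first increment


def replay(journal):
    """Assign tracker:modified values while replaying a journal (toy model).

    Every resource touched by the same data transaction gets the same value,
    and values depend only on transaction order, so replaying the same
    journal reproduces them exactly without storing them anywhere."""
    modified = {}
    counter = ONTOLOGY_VALUE
    for kind, resources in journal:
        if kind == "ontology":
            value = ONTOLOGY_VALUE
        else:
            counter += 1
            value = counter
        for resource in resources:
            modified[resource] = value
    return modified


journal = [("ontology", ["nie:title"]),
           ("data", ["contact:1", "contact:2"]),
           ("data", ["contact:2"]),
           ("ontology", ["nie:keyword"])]

live = replay(journal)
restored = replay(journal)  # e.g. after a journal replay or backup restore
print(live == restored)                      # True
print(live["contact:1"], live["contact:2"])  # 2 3
```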

Because the journal is persistent, a backup is actually just a fast file copy of the journal. But let this deception be known only to the people who care about the implementation. Sssht!

We already rotate the journal and compress the rotated chunks to reduce its size. This week we're working on not journaling data that is embedded in local files, since re-indexing such a file re-inserts that data anyway. This will significantly reduce the size of the journal too.
