Focus on query performance

Every (good) developer knows that copying memory and boxing values, especially across a large number of items such as the members of a collection or the cells of a table, is bad for performance.

More experienced developers also know that novices tend to focus only on their algorithms when improving performance, while the single biggest bottleneck is often needless boxing and allocating. Experienced developers come up with algorithms that avoid boxing and copying; they master clever, pragmatic engineering and know how to improve algorithms. A lot of newcomers use virtual machines and scripting languages that are terrible at giving you the tools to control this, and then they start endless religious debates about how great their programming language is (as if it matters). (Anti-.NET people, don’t get on your high horse too soon: if you know what you are doing, C# is actually quite good here.)

We were of course doing some silly copying ourselves. Apparently it had a significant impact on performance.

Once Jürg and Carlos have finished the work on parallelizing SELECT queries, we plan to let the code that walks the SQLite statement fill in the DBusMessage directly, without any memory copying or boxing (for marshalling to DBus). We found the get_reply and send_reply functions; they sound useful for this purpose.

I still don’t really like DBus as the IPC for transferring the query results of Tracker’s RDF store. Personally I think I would go for a custom Unix socket here. But Jürg so far isn’t convinced. Admittedly he’s probably right; he’s always right. Still, DBus doesn’t feel to me like a good IPC for this data transfer.

We know about the requests to have direct access to the SQLite database from your own process. I explained in the bug that SQLite3 doesn’t do MVCC, which means that your process will often be blocked for a long time on our transaction: longer than any IPC overhead would take.
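To make the blocking concrete, here is a minimal sketch using Python's standard sqlite3 module. It is not Tracker code: the table `t` and the function name are made up for illustration, and it assumes SQLite's rollback-journal mode (forced here with a PRAGMA). One connection holds an open write transaction while a second connection, standing in for "your own process", tries to read.

```python
import os
import sqlite3
import tempfile


def reader_blocked_by_writer(db_path):
    """Return True if a second connection cannot read while a
    write transaction is open on the same database file."""
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    writer = sqlite3.connect(db_path, isolation_level=None)
    # Assumption: rollback-journal mode, where writers exclude readers.
    writer.execute("PRAGMA journal_mode=DELETE")
    writer.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")  # hypothetical table
    writer.execute("BEGIN EXCLUSIVE")
    writer.execute("INSERT INTO t VALUES (1)")

    # timeout=0: fail immediately instead of waiting for the lock.
    reader = sqlite3.connect(db_path, timeout=0)
    try:
        reader.execute("SELECT COUNT(*) FROM t").fetchone()
        blocked = False
    except sqlite3.OperationalError:
        # SQLite reports "database is locked" to the reader.
        blocked = True
    finally:
        reader.close()
        writer.execute("COMMIT")
        writer.close()
    return blocked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(reader_blocked_by_writer(os.path.join(tmp, "demo.db")))
```

With a non-zero timeout the reader would not fail; it would simply wait until the writer commits, which is the kind of stall described above.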

9 thoughts on “Focus on query performance”

  1. pvanhoof Post author

    So you have reached my comment system. It’s possible that you are an anti-.NETter who wants to respond to what I wrote about C# and .NET.

    If you aren’t one of them then please, really, seriously, completely ignore all this nonsense. You are totally welcome and I do apologize deeply that I’m even typing this. It’s not directed at you, at all. I know it’s childish. I know it looks needless (but sadly, it isn’t). I know, I know (you should see the religious crap that we sometimes receive, it’s amazing).

    If you are one of those anti-.NET idiots:

    The situation is very simple: either you have technical arguments and you are emotionally intelligent enough to present them in such a way that I’ll accept them, or you don’t deserve even a single bit of storage on my blog infrastructure.

    I have read and removed plenty of nonsense coming from you guys. It has really been that bad.

    I don’t think stupidity is immoral, but I do think hatred is.

    I have no place for hatred. I host no platform for it. If hatred is your goal, you are neither invited nor welcome here. You are already banned.

    But I’m also not even going to waste my time with you until necessary. So if you repost it, I will block your entire ISP’s range for a few months (or whatever is most convenient for me).

    You don’t have any say in this. I do own this infrastructure. I do rent the bandwidth. I do decide and I am consciously assertive about it. That’s because I ethically refuse to be a coward and I do take a stance on this matter. Even if that means hate for me from you.

  2. Simon

    It’s an interesting discussion on the bug. From a user point of view, I find the Rhythmbox UI works pretty well most of the time. But it’s true, without any filters applied, the track listing is nearly useless – from both a technical and user perspective, fetching 100k items into that table is a waste of time.

    A question, though – like most music players, RB has a random function, picking a new random track every time it gets to the end of the current one. How would you implement that one – can you ask Tracker to “give me a random item matching some criteria”? Or would it have to be done client-side, meaning a full track list is needed, without the ability to read page-by-page?

  3. pvanhoof Post author

    @Simon: Don’t you just need LIMIT and COUNT() for random?

    @Ruben: I’d be interested in using Firebird for example. It advertises a small memory footprint too. But for now SQLite will do (maybe in a new major release cycle of Tracker).

  4. Jean

    And experts know that scripting languages can easily be extended with low-level languages like C or Java. So they start programming in a scripting language and, when necessary, optimize the critical parts in a low-level language.

  5. Simon

    @Philip – you mean pick a random number up to the number of items, then use the paging mechanism to ask for a 1-item page at that offset? Didn’t think of that – I was thinking in terms of how you’d request a single item by a known ID, but yeah, your way would work without any extra API…
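The COUNT() plus one-item-page idea from this exchange could look roughly like the sketch below. It uses plain SQL through Python's sqlite3 module rather than Tracker's own query interface, and the `tracks` table, its columns and the `random_track` helper are all hypothetical names chosen for the example.

```python
import random
import sqlite3


def random_track(conn, artist, rng=random):
    """Pick one random title matching a filter, without fetching the full list."""
    # Step 1: count the rows that match the criteria.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM tracks WHERE artist = ?", (artist,)
    ).fetchone()
    if count == 0:
        return None
    # Step 2: choose an offset in [0, count) and ask for a 1-item page there.
    offset = rng.randrange(count)
    row = conn.execute(
        "SELECT title FROM tracks WHERE artist = ? "
        "ORDER BY id LIMIT 1 OFFSET ?",
        (artist, offset),
    ).fetchone()
    return row[0]


def make_demo_db():
    """Build a small in-memory database with a hypothetical tracks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracks (id INTEGER PRIMARY KEY, artist TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO tracks (artist, title) VALUES (?, ?)",
        [("A", "a1"), ("A", "a2"), ("A", "a3"), ("B", "b1")],
    )
    return conn


if __name__ == "__main__":
    print(random_track(make_demo_db(), "A"))
```

The ORDER BY keeps the offset stable between the two queries; if rows are added or removed in between, the second query can come back empty, so a real client would want to handle that case.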

  6. pvanhoof Post author

    @Jos: Why don’t you make a simple layer that translates our D-Bus results to that HTTP protocol? Or make an HTTP socket service in src/tracker-store that does exactly this? Look at the src/tracker-store/tracker-resources.c file for an example. It’s really easy to do.

    We aim for desktop and mobile platforms, not for cloud platforms. We think that at this moment Virtuoso is probably a better RDF/SPARQL endpoint for use-cases where you have huge amounts of memory, electrical energy and CPU resources. Like servers. This ain’t what we are optimizing for.
