Archive for the ‘english’ Category

Performance improvements for SPARQL’s regex in Tracker

Thursday, March 25th, 2010

The original SPARQL regex support of Tracker is using a custom SQLite function. But of course back when we wrote it we didn’t yet think much about optimizing. As a result, we were using g_regex_match_simple which of course recompiles the regular expression each time.

Today Jürg and me found out about sqlite3_get_auxdata and sqlite3_set_auxdata which allows us to cache a compiled value for a specific custom SQLite function for the duration of the query.

This is much better:

static void
function_sparql_regex (sqlite3_context *context,
                       int              argc,
                       sqlite3_value   *argv[])
{
  gboolean ret;
  const gchar *text, *pattern, *flags;
  GRegexCompileFlags regex_flags;
  GRegex *regex;

  if (argc != 3) {
    sqlite3_result_error (context, "Invalid argument count", -1);
    return;
  }

  regex = sqlite3_get_auxdata (context, 1);
  text = sqlite3_value_text (argv[0]);
  flags = sqlite3_value_text (argv[2]);
  if (regex == NULL) {
    gchar *err_str;
    GError *error = NULL;
    pattern = sqlite3_value_text (argv[1]);
    regex_flags = 0;
    while (*flags) {
      switch (*flags) {
      case 's': regex_flags |= G_REGEX_DOTALL; break;
      case 'm': regex_flags |= G_REGEX_MULTILINE; break;
      case 'i': regex_flags |= G_REGEX_CASELESS; break;
      case 'x': regex_flags |= G_REGEX_EXTENDED; break;
      default:
        err_str = g_strdup_printf ("Invalid SPARQL regex flag '%c'", *flags);
        sqlite3_result_error (context, err_str, -1);
        g_free (err_str);
        return;
      }
      flags++;
    }
    regex = g_regex_new (pattern, regex_flags, 0, &error);
    if (error) {
      sqlite3_result_error (context, error->message, error->code);
      g_clear_error (&error);
      return;
    }
    sqlite3_set_auxdata (context, 1, regex, (void (*) (void*)) g_regex_unref);
  }
  ret = g_regex_match (regex, text, 0, NULL);
  sqlite3_result_int (context, ret);
  return;
}

Before (this was a test on a huge amount of resources):

$ time tracker-sparql -q "select ?u { ?u a rdfs:Resource . FILTER (regex(?u, '^titl', 'i')) }"
real	0m3.337s
user	0m0.004s
sys	0m0.008s

After:

$ time tracker-sparql -q "select ?u { ?u a rdfs:Resource . FILTER (regex(?u, '^titl', 'i')) }"
real	0m1.887s
user	0m0.008s
sys	0m0.008s

This will hit Tracker’s master today or tomorrow.

Working hard at the Tracker project

Wednesday, March 17th, 2010

Today we improved journal replaying from 1050s for my test of 25249 resources to 58s.

Journal replaying happens when your cache database gets corrupted. Also when you restore a backup: restore uses the same code the journal replaying uses, backup just makes a copy of your journal.

During the performance improvements we of course found other areas related to data entry. It looks like we’re entering a period of focus on performance, as we have a few interesting ideas for next week already. The ideas for next week will focus on performance of some SPARQL functions like regex.

Meanwhile are Michele Tameni and Roberto Guido working on a RSS miner for Tracker and has Adrien Bustany been working on other web miners like for Flickr, GData, Twitter and Facebook.

I think the first pieces of the RSS- and the other web miners will start becoming available in this week’s unstable 0.7 release. Martyn is still reviewing the branches of the guys, but we’re very lucky with such good software developers as contributors. Very nice work Michele, Roberto and Adrien!

The future of the European community, a European Monetary Fund.

Monday, March 8th, 2010

I’m worried about the EURO’s M3 if a European version of the IMF (a EMF) is to be installed.

Nonetheless, I think the European community should do it just to strengthen Europe’s economy. I’m not satisfied by Europe’s economic strength: I want it to be undefeatable.

We must not let the IMF solve our problems. Europe might be a political dwarf, but we Europeans should show that we will solve our own problems. We’re an adult composition of cultures with vast amounts of experience. We know how to solve any imaginable problem. And let’s not, in our defeatism, pretend we don’t.

A EMF is a commitment to future member states: Europe often asks them fundamental changes; economic strength is what Europe offers in return. This needs to come at a highest price: Greece will have to fix their deficit problem. Even if their entire population goes on strike. Greece will be an example for countries like my own: Belgium has to fix a serious deficit problem, too.

An EMF comes at an equally high price, and that frightens me a bit: I don’t want the ECB to go as ballistic on money creation as the FED has been last two years. I want the EURO to be the strongest relevant currency mankind has ever created. No matter how insane the rest of the world thinks that ambition is: I believe that keeping the EURO’s M3 in check is a key to creating a wealthy society in Europe.

Politically I want European nations to negotiate more and more often. The European Union is a political dwarf only because finding agreement is hard. But in the long run will our solution be the most negotiated, most tested on this planet.

Together we can deal with anything. That doesn’t mean it’ll be easy; it has never been easy: just seventy years ago we were still killing each other. We’re all guilty of that one way or another. And before that it wasn’t any better. Today, not that many people still care: “it wasn’t me”, right? So stop being a bitch about it, then.

It’s time to let it be. It’s time to start a new European century that will be better. With respect for all European cultures, languages, nations, nationalities, values, borders and interests.

But also a European century with economic responsibilities for each member. It’s our strength: we figured out how to keep our population wealthy: let’s continue doing so in the future.

Emotional (and social) intelligence

Sunday, March 7th, 2010

It was the dawn of the 1970s, at the height of worldwide student protests against the Vietnam War, and a librarian stationed at a U.S. Information Agency post abroad had received bad news: A student group was threatening to burn down her library.

But the librarian had friends among the group of student activists who made the threat. Her response on first glance might seem either naïve or foolhardy — or both: She invited the group to use the library facilities for some of their meetings.

But she also brought Americans living in the country there to listen to them — and so engineered a dialogue instead of a confrontation.

In doing so, she was capitalizing on her personal relationship with the handful of student leaders she knew well enough to trust — and for them to trust her. The tactic opened new channels of mutual understanding, and it strengthened her friendship with the student leaders. The library was never touched.

(More available at the flash preview widget’s page 21)

– Daniel Goleman, Working With Emotional Intelligence, Competencies of the stars. 1998

In Working with Emotional Intelligence, Daniel Goleman explains several practical methods to improve the social skills of people. Before I bought this book a year or two ago, I read Daniel’s first book Emotional Intelligence. This weekend I finally started reading Working With.

I recommend the section Some Misconceptions. Regretfully ain’t this section available for display in the flash preview widget. Instead of violating copyright laws by typing it down here, I’m recommending to just buy the book.

You can find audiobooks online. The section about misconceptions is at track three. Track five talks about two computer programmers, which is very illustrative for many of my blog’s readers (and possibly myself). I hope you wont illegally download using torrents. Instead, buy the material.

Also very interesting is this lecture by Daniel:

Part 1, Part 2, Part 3

Here you can also find a Authors@Google talk by Daniel Goleman:


What distinguishes Daniel Goleman from old line proponents of positive thinking, however, is his grounding in psychology and neuroscience. Armed with a Ph.D in psychology from Harvard and a first-grade journalism background at the New York Times, Dr. Goleman has authored half a dozen books that explore the physical and chemical workings on the brain and their relationship with what we experience as everyday life.

– Peter Allen, director of Google university, introduction to Daniel Goleman. August 3, 2007

I hope readers of my blog will shun away from pseudo science when it comes to emotional and social intelligence, but instead read and learn from authors like Daniel Goleman. I also (still) recommend the books available at The Moral Brain by for example Dr. Jan Verplaetse.

Tinymail 1.0!

Friday, March 5th, 2010

Tinymail‘s co-maintainer Sergio Villar just released Tinymail’s first release.

psst. I have inside information that I might not be allowed to share that 1.2 is being prepared already, and will have bodystructure and envelope summary fetch. And it’ll fetch E-mail body content per requested MIME part, instead of always entire E-mails. Whoohoo!

An ode to our testers

Tuesday, March 2nd, 2010

You know about those guys that use your software against huge datasets like their entire filesystem, with thousands of files?

We do. His name is Tshepang Lekhonkhobe and we owe him a few beers for reporting to us many scalability issues.

Today we found and fixed such a scalability issue: the update query to reset the availability of file resources (this is for support for removable media) was causing at least a linear increase of VmRss usage per amount of file resources. For Tshepang’s situation that meant 600 MB of VmRss. Jürg reduced this to 30 MB of peak VmRss in the same use-case, and a performance improvement from minutes to a second or two, three. Without memory fragmentation as glibc is returning almost all of the VmRss back to the kernel.

Thursday is our usual release day. I invite all of the 0.7 pioneers to test us with your huge filesystems, just like Tshepang always does.

So long and thanks for all the testing, Tshepang! I’m glad we finally found it.

Invisible costs

Monday, March 1st, 2010


We would rather suffer the visible costs of a few bad decisions than incur the many invisible costs that come from decisions made too slowly – or not at all – because of a stifling bureaucracy.

Letter by Warren E. Buffett to the shareholders of Berkshire, February 26, 2010

The Euro skeptics and pro Europeans are finally united in an opinion!

Thursday, February 25th, 2010

We both agree that Nigel Farage is a complete moron.

Perhaps we should put a damp rag like the one he mentions in his mouth next time he opens it?

Nigel Farage, you’re an disgrace to yourself. The European parliament is no place for personal attacks, and you aren’t fit to carry the title Member of the European Parliament. Please keep the honour to yourself and resign.

Every sensible person outside of the U.K. thinks you should. Even the Euro skeptics do. You’re an embarrassment for your country and its culture, so I hope for the people in the U.K. that they’ll kick you out of politics.

I fear you’re just playing the populist card, and that you’ll even get votes for this from other morons.

Working hard

Thursday, February 18th, 2010

I don’t decide about Tracker‘s release. The team of course does.

But when you look at our roadmap you notice one remaining ‘big feature’. That’s coping with modest ontology changes.

Right now if we’d make even a small ontology change all of our users would have to recreate their databases. We also don’t support restoring a backup of your metadata over a modified ontology.

This is about to change. This week I started working in a branch on supporting class and property ontology additions.

I finished it today and it appears to be working. The patches obviously need a thorough review by the other team members, and then testing of course. I invite all the contributors and people who have been testing Tracker 0.7′s releases to tryout the branch. It only supports additions, so don’t try to change properties or classes, or remove them. You can only add new ones. You might have noticed the nao:deprecated property in the ontology files? That’s what we do with deleted properties.

Anyway

Meanwhile are Martyn and Carlos working on a bugfix in the miner about duplicate entries for file resources and on a timeout for the extractor so that extraction of large or complicated documents doesn’t block the entire filesystem miner.

Jürg is working on timezone storage for xsd:dateTime fields and last few days he implemented limited support for named graphs.

By the looks of it, one would almost believe that Tracker’s first new stable release is almost ready!

Please don’t rewrite softwares (that are) written in .NET

Tuesday, February 9th, 2010

This (super) cool .NET developer and good friend came to me at the FOSDEM bar to tell me he was confused about why during the Tracker presentation I was asking people to replace F-Spot and Banshee.

I hope I didn’t say it like that, I would never intent to say that. But I’ll review the video of the presentation as soon as Rob publishes it.

Anyway, to ensure everybody understood correctly what I did wanted to say (whether or not I did, is another question):

The call was to inspire people to reimplement or to provide different implementations of F-Spot’s and Banshee’s data backends, so that they would use an RDF store like tracker-store instead of each app its own metadata database.

I think I also mentioned Rhythmbox in the same sentence because the last thing I would want is to turn this into a .NET vs. anti-.NET debate. It just happens to be that the best GNOME softwares for photo and music management are written in .NET (and that has a good reason).

People who know me also know that I think those anti-.NET people are disruptive ignorable people. I also actively and willingly ignore them (and they should know this). I’m actually a big fan of the Mono platform.

I’ll try to ensure that I don’t create this confusion during presentations anymore.

SMASHED at FOSDEM?

Wednesday, February 3rd, 2010

This is to let Rob Taylor and David Schlesinger know that they better start organizing S.M.A.S.H.E.D.

FWD: [Tracker] tracker-miner-rss 0.3

Wednesday, January 27th, 2010

This is the kind of stuff that needs a forward on the planets:

From: Roberto -MadBob- Guido

This is just an update about tracker-miner-rss effort, already mentioned in this list some time ago.

Website, SVN, Last release (0.3)

Since 0.2 we (Michele and me) have just dropped dependency from rss-glib due some limitation found, and created our own Glib-oriented feeds handling library, libgrss, starting from the code of Liferea and adding nice stuffs such as a PubSub subscriber implementation. At the moment it is shipped with tracker-miner-rss itself, in the future may be splitted so to easy usage by other developers.

Next will come integration with libchamplain to describe geographic points found in geo-rss enabled feeds, integration with libedataserver to better handle “person” rappresentation (suggestions for a better PIM-like shared library with useful objects?), and perhaps a first full-featured feed reader using Tracker as backend.

Enjoy :-)

Roberto is doing a demo on FSter at FOSDEM during our presentation. My role in the presentation will be light this year. I decided to give most of the talk away to Rob Taylor and Roberto. I will probably demo Debarshi Ray‘s Solang and if time permits his work on the Nautilus integration. Regretfully Debarshi can’t come and so he asked me to do the demo.

Solang, a photo manager

Monday, January 18th, 2010

For the last few weeks has Debarshi Ray contributed to Tracker’s Nautilus plugin and worked on Solang, a photo manager that will start using Tracker’s SPARQL capability to get a language to query for metadata about the photos and the photos themselves.

Debarshi explains it all very well himself on his own blog.

We’ll probably do a lightening demo during our Tracker presentation at FOSDEM about how Solang did this integration. We’re also planning to demo the code of a few other applications that are working on integrating with Tracker’s store.

Somebody should port Solang to the next version of Maemo!

The role of media in the USA

Monday, January 18th, 2010

Two posts ago I wrote that something like The Real news is quite unique in the U.S.’s completely broken media.

Today I found an interesting double interview on AlJazeeraEnglish by Riz Khan titled Has the mainstream media in the US replaced serious coverage with “junk news” and tabloidism?

Part 3, Zbigniew Brzezinski on Iran

Sunday, January 17th, 2010

Brzezinski

In the third segment of The Real News‘ interview with Dr. Brzezinski, Paul Jay asks him about Israel’s threat to bomb Iranian Nuclear facilities and the American strategy towards Iran.

Brzezinski talks about how this might force the U.S. out of the region in the short term, how it would affect the price of oil, how the U.S. would be militarily involved and how the U.S. would be alone in this. And what the fundamental consequences for Israel would be.

You can find all three parts of the interview and their transcripts here:

Part 1, Part 2, Part 3

Politics, skimming facebook

It’s Sunday so I skim Facebook a bit. I came across Lefty’s link to a 100 quotes every geek should know blog. Artwork like humor often represents a philosophy. I think this first quote on that blog is a very good meme, also for foundation boards:

Strange women lying in ponds distributing swords is no basis for a system of government. Supreme executive power derives from a mandate from the masses, not from some farcical aquatic ceremony.
— Dennis the Peasant, Monty Python and the Holy Grail

Brzezinski interview on the Afghan war

Thursday, January 14th, 2010

I’ve been watching The Real News for some time now. It claims to be “the real news” but the reality is that it’s fairly left-wing pro-unions most of the times. Most of their documentaries and interviews are very interesting, though. Nor do they make it difficult to filter out their own bias. It’s quite unique in the U.S.’s completely broken media to have something like The Real News.

This week they are interviewing Brzezinski. People who know Brzezinski, know that that’s a huge interview for them. Watch part one of the interview. Knowing The Real News, part two will probably be released in a week.

Youtube video

SPARQL subqueries

Wednesday, December 9th, 2009

This style of subqueries will also work (you can do this one without a subquery too, but it’s just an example of course):

SELECT ?name COUNT(?msg)
WHERE {
	?from a nco:Contact  ;
	          nco:hasEmailAddress ?name . {
		SELECT ?from
		WHERE {
			?msg a nmo:Email ;
			         nmo:from ?from .
		}
	}
} GROUP BY ?from  

The same query in QtTracker will look like this (I have not tested this, let me know if it’s wrong Iridian):

#include <QObject>
#include <QtTracker/Tracker>
#include <QtTracker/ontologies/nco.h>
#include <QtTracker/ontologies/nmo.h>

void someFunction () {
	RDFSelect outer;
	RDFVariable from;
	RDFVariable name = outer.newColumn<nco::Contact>("name");
	from.isOfType<nco::Contact>();
	from.property<nco::hasEmailAddress>(name);
	RDFSelect inner = outer.subQuery();
	RDFVariable in_from = inner.newColumn("from");
	RDFVariable msg;
	msg.property<nmo::from>(in_from);
	msg.isOfType<nmo::Email>();
	outer.addCountColumn("total messages", msg);
	outer.groupBy(from);
	LiveNodes from_and_count = ::tracker()->modelQuery(outer);
}

What you find in this branch already supports it. You can find early support for subqueries in QtTracker in this branch.

To quickly put some stuff about Emails into your RDF store, read this page (copypaste the turtle examples in a file and use the tracker-import tool). You can also enable our Evolution Tracker plugin, of course.

ps. Yes, somebody should while building a GLib/GObject based client library for Tracker copy ideas from QtTracker.

Bla bla bla, subqueries in SPARQL, bla bla

Tuesday, December 8th, 2009

Coming to you in a few days is what Jürg has been working on for last week.

Yeah, you guess it right by looking at the query below: subqueries!

This example shows you the amount of E-mails each contact has ever sent to you:

SELECT ?address
    (SELECT COUNT(?msg) AS ?msgcnt WHERE { ?msg nmo:from ?from })
WHERE {
    ?from a nco:Contact ;
          nco:hasEmailAddress ?address .
}

The usual warnings apply here: I’m way early with this announcement. It’s somewhat implemented but insanely experimental. The SPARQL spec has something for this in a draft wiki page. Due to lack of error reporting and detection it’s easy to make stuff crash or to get it to generate wrong native SQL queries.

But then again, you guys are developers. You like that!

Why are we doing this? Ah, some team at an undisclosed company was worried about performance and D-Bus overhead: They had to do a lot of small queries after doing a parent query. You know, a bunch of aggregate functions for counts, showing the last message of somebody, stuff like that.

I should probably not mention this feature yet. It’s too experimental. But so exciting!

Anyway, here’s the messy branch and here’s the reviewed stuff for bringing this feature into master.

ps. I wish I could show you guys the query that we support for that team. It’s awesome. I’ll ask around.

Tracker’s write back support now in master

Thursday, November 26th, 2009

Whoohoo!

We just committed the support for write back in master.

What is it?

Tracker has a limited capability to write metadata back into the data resource. In case of a file that means writing it back into the file. For example writing some of the metadata the user sets using a SPARQL Update back into an MP3 file as ID3 tags.

Which ones do we support already?

Right now the write back capability is under development and only supports a bunch of fields for a few XMP formats (JPEG, PNG and TIFF) and the Title of MP3 files. In near future we will start supporting increasingly more fields.

Documentation?

For people who want to write support for their properties and file formats, read the documentation.

Party like it’s 2009!

Hannah Arendt

Tuesday, November 24th, 2009

Looks like I found myself a book that I need to read someday:


But it could be that we, who are earth-bound creatures and have begun to act as though we were dwellers of the universe, will forever be unable to understand, that is, to think and speak about the things which nevertheless we are able to do. In this case, it would be as though our brain, which constitutes the physical, material condition of our thoughts, were unable to follow what we do, so that from now on we would indeed need artificial machines to do our thinking and speaking.

Hannah Arendt, The human condition (prologue)