A technique we have started using in Tracker is running asynchronous work on the mainloop. We decided that avoiding threads is often not a bad idea.
Instead of immediately handing work to a worker thread, we try to encapsulate it in a GSource callback and let the mainloop invoke that callback repeatedly until all of the work is done.
An example
You probably know sqlite3’s backup API. If not, it’s fairly simple: you call sqlite3_backup_init, followed by a series of sqlite3_backup_step calls, and finish with sqlite3_backup_finish. How does that work if we don’t want to block the mainloop?
I removed all error handling to keep the code snippet short. If you want to see it, take a look at the original code.
static gboolean
backup_file_step (gpointer user_data)
{
    BackupInfo *info = user_data;
    int i;

    /* Copy up to 100 batches of 5 pages per mainloop iteration. */
    for (i = 0; i < 100; i++) {
        info->result = sqlite3_backup_step (info->backup, 5);
        /* Anything other than SQLITE_OK (done or error) removes the source. */
        if (info->result != SQLITE_OK)
            return FALSE;
    }

    /* More pages are left: ask the mainloop to call us again. */
    return TRUE;
}

/* Runs as the GDestroyNotify once backup_file_step has returned FALSE. */
static void
backup_file_finished (gpointer user_data)
{
    BackupInfo *info = user_data;
    GError *error = NULL;

    if (info->result != SQLITE_DONE) {
        g_set_error (&error, _DB_BACKUP_ERROR, DB_BACKUP_ERROR_UNKNOWN,
                     "%s", sqlite3_errmsg (info->backup_db));
    }

    if (info->finished)
        info->finished (error, info->user_data);
    if (info->destroy)
        info->destroy (info->user_data);

    g_clear_error (&error);
    sqlite3_backup_finish (info->backup);
    sqlite3_close (info->db);
    sqlite3_close (info->backup_db);
    g_free (info);
}

void
my_function_make_backup (const gchar *dbf, OnBackupFinished finished,
                         gpointer user_data, GDestroyNotify destroy)
{
    BackupInfo *info = g_new0 (BackupInfo, 1);

    info->user_data = user_data;
    info->destroy = destroy;
    info->finished = finished;

    /* Source database is opened read-only; the copy goes to /tmp/backup.db. */
    sqlite3_open_v2 (dbf, &info->db, SQLITE_OPEN_READONLY, NULL);
    sqlite3_open ("/tmp/backup.db", &info->backup_db);
    info->backup = sqlite3_backup_init (info->backup_db, "main",
                                        info->db, "main");

    g_idle_add_full (G_PRIORITY_DEFAULT, backup_file_step, info,
                     backup_file_finished);
}
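If the contract between the step callback and the finished callback is not obvious from the snippet, here is a rough GLib-free model of it. Everything in it (IdleSource, toy_loop_iterate, toy_loop_run, CopyJob) is made up for illustration and is not GLib or Tracker API; it only mimics the rule that a callback returning true gets called again on the next iteration, while one returning false is removed and its notify function runs once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GSourceFunc and GDestroyNotify. */
typedef bool (*StepFunc) (void *user_data);
typedef void (*DoneFunc) (void *user_data);

typedef struct {
    StepFunc step;      /* called once per loop iteration while active */
    DoneFunc done;      /* called once, right after step returns false */
    void    *user_data;
    bool     active;
} IdleSource;

/* One pass of a toy main loop: dispatch every active source once.
 * Returns how many sources asked to be called again. */
static size_t
toy_loop_iterate (IdleSource *sources, size_t n_sources)
{
    size_t remaining = 0;

    for (size_t i = 0; i < n_sources; i++) {
        IdleSource *s = &sources[i];

        if (!s->active)
            continue;
        if (s->step (s->user_data)) {
            remaining++;
        } else {
            s->active = false;
            if (s->done)
                s->done (s->user_data);
        }
    }
    return remaining;
}

/* Iterate until no source is active; returns the number of iterations. */
static int
toy_loop_run (IdleSource *sources, size_t n_sources)
{
    int iterations = 0;

    do {
        iterations++;
    } while (toy_loop_iterate (sources, n_sources) > 0);
    return iterations;
}

/* Example work item: "copy" total pages, at most chunk per callback. */
typedef struct {
    int  total, copied, chunk, callbacks;
    bool finished;
} CopyJob;

static bool
copy_step (void *user_data)
{
    CopyJob *job = user_data;
    int left = job->total - job->copied;

    assert (job->chunk > 0);
    job->copied += left < job->chunk ? left : job->chunk;
    job->callbacks++;
    return job->copied < job->total;   /* true means "call me again" */
}

static void
copy_done (void *user_data)
{
    CopyJob *job = user_data;

    job->finished = true;
}
```

Putting two CopyJob sources into one array and calling toy_loop_run on it makes them advance in alternation, one chunk each per iteration. That is the same thing the real mainloop does with the backup source and whatever else is attached to it.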
Note that I’m not suggesting you throw away all your threads and GThreadPool uses now.
Note that, just like with threads, you have to be careful about shared data: other events on the mainloop will interleave with your backup procedure. It’s async(ish), and that interleaving is precisely what you want, of course.
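To make the shared-data caveat concrete, here is a small sketch in plain C, without GLib. All the names in it (Stats, Writer, Reader, run_interleaved) are hypothetical, and the loop simply alternates two callbacks the way a mainloop might. The point it tries to show: nothing runs in the middle of a callback, so no locks are needed, but anything can run between two callbacks, so each callback has to leave shared state consistent before it returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared state with the invariant: sum == count * 2. */
typedef struct { int count; int sum; } Stats;

typedef struct { Stats *stats; int remaining; int phase; } Writer;
typedef struct { Stats *stats; int checks; int violations; } Reader;

/* Safe: one logical update per callback, invariant holds on return. */
static bool
writer_step (Writer *w)
{
    w->stats->count += 1;
    w->stats->sum += 2;
    return --w->remaining > 0;
}

/* Unsafe: one logical update is split across two callbacks, so other
 * callbacks can observe count and sum out of step with each other. */
static bool
split_writer_step (Writer *w)
{
    if (w->phase == 0) {
        w->stats->count += 1;
        w->phase = 1;
        return true;
    }
    w->stats->sum += 2;
    w->phase = 0;
    return --w->remaining > 0;
}

/* Another event on the same loop that reads the shared state. */
static void
reader_step (Reader *r)
{
    r->checks++;
    if (r->stats->sum != r->stats->count * 2)
        r->violations++;
}

/* Alternate writer and reader callbacks until the writer has applied
 * all of its updates. Returns how often the reader saw a broken
 * invariant. */
static int
run_interleaved (bool (*step) (Writer *), int updates)
{
    Stats stats = { 0, 0 };
    Writer w = { &stats, updates, 0 };
    Reader r = { &stats, 0, 0 };
    bool more = true;

    while (more) {
        more = step (&w);
        reader_step (&r);
    }
    return r.violations;
}
```

With writer_step the reader never sees inconsistent data. With split_writer_step it does, once per update, even though there is only a single thread.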
This is the type of stuff I wrote libiris for. It has a work-stealing scheduler that is much faster than GThreadPool which helps when you try to start scaling the “push small work items” model.
You can also manage shared state easily using “IrisTask” which is like a python-twisted deferred. Or if you like message passing, use message passing and the coordination arbiter (which is like a reader-writer lock but asynchronous and can span many threads efficiently).
Personally, I find it a difficult question: what should we use nowadays, multi-threading or async programming?
Nowadays, “all” computers are multi-core, so multi-threading is really useful.
Guyou: maybe this question can be (over-)simplified to “performance or correctness” :-)
Threads can give better performance nowadays (if the task is CPU-bound), but IMO it’s really difficult to get them right. Personally I prefer to add async tasks to a mainloop because it’s much easier to avoid or find bugs (the behavior is more predictable than with multiple threads).
If you want to create a heavy number crunching application, obviously you’ll go for multiple threads when assuming a multi-core CPU.
But a supporting daemon like tracker, which should ideally stay out of the user’s way in its resource-hungriness, should IMHO try to avoid context switches (even lightweight inter-thread switches). That way, the kernel can decide to schedule on the same or a different CPU, depending on how much else is going on in the system. So as long as you can keep the internal response time low, AIO is the way to go. (Note that this would ideally mean sqlite generating query results asynchronously for the tracker process whenever disk reads finish; I’m not sure if this happens.)
Having said that, I can understand from an architecture standpoint the moving of the crawlers and indexers out-of-process. This increases separation and robustness to errors. However, keep in the back of your mind that storage of the extracted tuples can force additional context switches (unless you have multiple idle cores).
In the future, if tracker has established itself as the central tuple store and more and more applications become dependent on getting a steady stream of RDF tuples to avoid blocking on their work, it might make sense to re-think this decision and go heavily multi-threaded. Or if somebody wants to use tracker from within many web server threads, and the internal response time does not scale down well.
Let’s see. For now, I’m pretty confident that tracker is on the right path. :-)
@pixelpapst: thanks. And of course, tracker’s store is, as its name says, heavily I/O bound and not as much CPU bound. For example, the sqlite3-backup API is going to yield mostly I/O, and almost zero CPU.
Of course, being I/O bound, it’s easy for Tracker to block waiting for disk operations, which can very quickly have an impact on how interactive the GUI feels when sharing the mainloop this way.
@Jan: tracker-store has no UI, it’s a desktop service. Asynchronously calling DBus is the responsibility of the client in DBus.