Treeview conclusions

It looks like you guys are interested in loading three million rows in a GtkTreeView. Yesterday 512 (new) visitors visited our company Subversion service to check out the demo :-p. I think roughly 100 people checked it out completely (the less obvious files, like the Makefile.am files, got viewed about 90 times). That tops codegen’s first day. By the way, you can now put comments on my blog, so if you have questions about it: go ahead.

Note that I updated some files in the repository. It now uses a more correct way of implementing interfaces in C. The usage of the proxy pattern, and thus implementing an interface, is actually the point of the demo. Sure, the wow-cool thing about it is that you can load millions of rows in a view, but the point is the proxy pattern. So if the intention is to ‘show’ how to do it, it should be done correctly. Right? Note that the proxy technique can be used in all sorts of model-view-controller situations, not just for a treeview or datagrid.

In that demo, I should also do more with the proxy classes to show that you can treat them as if they are real subjects. That’s because they fulfill the contract (the interface) of the subject (a message header, in the case of the demo). Yet these proxy instances consume only 20 bytes each. By the way, my favorite programming technique is strategy (I promised somebody not to use the words “design pattern” anymore; he felt “design pattern” is a buzzword). I’m likely going to do/show something with strategy sooner or later :-p. When browsing free software code, I often see “less good” design decisions. Performance tweaking is very good, but it won’t help if application developers don’t use their brains before designing the application. The era of typical VB6 development should be over. A lot of ‘managers’ should stop whining about KISS and first learn what it really means. KISS, as most people understand KISS, sucks: it doesn’t scale, and it’s impossible to integrate unit testing and modern programming methods with it. The other KISS is a different story. Read Head First Design Patterns and learn all about it. There, I’ve said it.

As a consultant, I guess I got tired of clueless manager-type guys telling me to do everything KISS. Perhaps those guys should read about the Peter Principle?! Sorry for this opinionated entry. I know I shouldn’t.

Ouch, I made a big mistake:

$ svn diff -r 14
Index: src/msg-header-proxy.c
=============================================================
--- src/msg-header-proxy.c      (revision 14)
+++ src/msg-header-proxy.c      (working copy)
@@ -76,7 +76,7 @@
 MsgHeaderProxy**
 msg_header_proxy_new_alot (gint amount)
 {
-       MsgHeaderProxy **proxies = (MsgHeaderProxy **) g_new (MsgHeaderProxy, amount);
+       MsgHeaderProxy **proxies = (MsgHeaderProxy **) g_new (MsgHeaderProxy*, amount);
        gint i=0;

        for (i=0; i < amount; i++)
$

Wow .. that would have been a total waste of memory AND a huge leak! Note that doing one huge allocation instead of one allocation for a pointer-index table followed by many g_slice_alloc calls might improve CPU usage a little bit (fewer calls into the allocator). So perhaps it can be made even a little bit faster than the current result. Try it, send me a diff.
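A minimal plain-C sketch of that single-allocation variant. It assumes a hypothetical `Proxy` struct standing in for `MsgHeaderProxy` and uses `malloc` where the demo uses `g_new` and `g_slice_alloc`, so it compiles without GLib. The pointer table is sized with `sizeof (Proxy *)`, which is what the diff above fixes, and every pointer refers into one contiguous block of instances.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MsgHeaderProxy: an id plus a pointer to the
 * real subject, which stays NULL until the row is actually needed. */
typedef struct {
    int   id;
    void *real;
} Proxy;

/* Build an index of `amount` proxy pointers backed by ONE block of
 * instances: two allocator calls instead of amount + 1.
 * On success *block_out receives the block (free it with proxy_free_alot). */
static Proxy **proxy_new_alot(int amount, Proxy **block_out)
{
    Proxy **index;
    Proxy *block;
    int i;

    assert(amount > 0);
    /* The index holds pointers, so it is sized with sizeof (Proxy *). */
    index = malloc((size_t) amount * sizeof(Proxy *));
    /* The instances themselves sit side by side in one contiguous block. */
    block = malloc((size_t) amount * sizeof(Proxy));
    if (index == NULL || block == NULL) {
        free(index);
        free(block);
        return NULL;
    }
    for (i = 0; i < amount; i++) {
        block[i].id = i;
        block[i].real = NULL;   /* created lazily later */
        index[i] = &block[i];   /* pointer into the shared block */
    }
    *block_out = block;
    return index;
}

static void proxy_free_alot(Proxy **index, Proxy *block)
{
    free(block);   /* a single free releases every instance */
    free(index);
}
```

The trade-off of this shape is that individual instances can no longer be freed one by one; the whole block lives and dies together.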

Migrated to WordPress

Some people might have noticed on the blog aggregators that yesterday I switched from DotClear to WordPress. I guess some blog aggregators repeat old blog entries because they compare entries against their cache, and after a migration things like the unique IDs are often different. So no, there’s nothing wrong with my blog :-p!

I also installed a RewriteRule so that most old blog URLs will resolve to the new WordPress URL. I can’t make all of them work automatically, because DotClear and WordPress use different algorithms for forming the title part of the URL (called the post slug). But most work. You can tell me if a specific “old” URL isn’t working: WordPress allows me to set the “post slug” manually, so I can easily set it to the old slug, or to something my redirector resolves correctly.

Oh, and somebody tell the DotClear developers to get themselves a real anti-spam feature. Perhaps a captcha or something like that? For me it’s too late: goodbye DotClear.

I’m probably going to regret it, but after installing Bad Behavior on both my wiki and my blog, and Akismet on the blog as well, I re-enabled editing the wiki and posting comments on the blog. As I weed through thousands of moderated spam messages in my old blog’s database, I’ll try to recover the relevant comments and restore them in the database of my new WordPress blog.

Some code that might help you migrate DotClear to WordPress:

Three million rows in a GtkTreeView

Edit: the repository has since disappeared; you can find a Subversion dump of it here:

Three million rows (the size per cell doesn’t matter a lot) in a treeview, loaded in four seconds. Is that doable? Sure! You think the treeview will become very slow? Nope, it works as fast as any other (smaller) treeview. What would slow it down is the number of visible rows, and since most screens can’t show more than 500 rows (and showing more would be useless from a usability point of view), it’s fast.

I committed my performance tweaks to the demo repository. It includes using g_slice for allocating the real subject and replacing the GSList in the custom model with an implementation that uses a pointer position.

So I don’t have to depend on a slow linked list anymore. Instead I simply allocate a large block of proxy instances (three million proxy instances of 20 bytes each, about 60 MB, in one contiguous allocation) and inject that as the index in my custom treemodel.

Since those are proxy instances, each one checks whether its this->real pointer is NULL when it is needed, and creates the real subject if so. When a row becomes visible, its instance is needed for the from, id and to properties. When the row becomes invisible, the instance is no longer needed and its real subject should therefore be freed. But the unref_node thingy of GtkTreeView doesn’t work perfectly: when scrolling a little bit, around 200 instances are kept around for no reason. I’m going to try fixing that behaviour in GtkTreeView soon.
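A minimal sketch of that lazy instantiation in plain C, without GLib. `HeaderProxy`, `RealHeader`, `header_proxy_get_from` and `header_proxy_release` are hypothetical names, not the demo’s `MsgHeaderProxy` API; the release function is roughly what an `unref_node` implementation would call when a row leaves the view.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical real subject: the expensive object behind a row. */
typedef struct {
    char from[32];
    char to[32];
} RealHeader;

/* Hypothetical proxy: cheap; holds only an id and a lazily filled pointer. */
typedef struct {
    int         id;
    RealHeader *real;   /* NULL until a property is first requested */
} HeaderProxy;

static int live_real_subjects = 0;   /* bookkeeping for the example only */

/* Stands in for loading a header from the (slow) disk cache. */
static RealHeader *real_header_load(int id)
{
    RealHeader *r = malloc(sizeof *r);
    assert(r != NULL);
    snprintf(r->from, sizeof r->from, "from-%d", id);
    snprintf(r->to, sizeof r->to, "to-%d", id);
    live_real_subjects++;
    return r;
}

/* Proxy accessor: create the real subject on first use, then delegate. */
static const char *header_proxy_get_from(HeaderProxy *p)
{
    if (p->real == NULL)
        p->real = real_header_load(p->id);
    return p->real->from;
}

/* Roughly what an unref_node handler would do when the row leaves the view. */
static void header_proxy_release(HeaderProxy *p)
{
    if (p->real != NULL) {
        free(p->real);
        p->real = NULL;
        live_real_subjects--;
    }
}
```

A second `header_proxy_get_from` call on the same proxy reuses the existing real subject; only a release followed by another request loads it again.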

Most of the time is spent in the loop that prepares the proxy instances (msg_header_proxy_new_alot). Bringing the (visible) items to the treeview doesn’t take a lot of time, as GtkTreeView is smart enough not to load everything when fixed-height mode is on.

You don’t have to believe me: you can check out the code here. Compile it (autotools) and try.

Anyway, I’m convinced GtkTreeView by itself isn’t slow. But that doesn’t mean the way you use it can’t make it slow. I hope others will enjoy the demo as a starting point for optimizing their own use of GtkTreeView. For most use cases, a GSList or GList is the better technique: a linked list makes it easier to add new items to your model. Inserting and removing items would be a lot more difficult with the technique I used in the demo. That technique, however, is fast because you can allocate everything as one large block and exercise your high-school pointer knowledge on it.
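To make the insertion trade-off concrete, here is a hedged plain-C sketch of inserting into a pointer index like the one in the demo; `PtrIndex` and `ptr_index_insert` are hypothetical names. Every pointer after the insertion point must be shifted with `memmove`, whereas a `GList` insert only relinks neighbouring nodes once the position is known.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable index of item pointers (the "pointer position" model). */
typedef struct {
    void  **items;
    size_t  len;
    size_t  cap;
} PtrIndex;

/* Insert `item` at `pos`. All pointers from `pos` onward move one slot to
 * the right, so the cost grows with the number of rows after `pos`.
 * Returns 0 on success, -1 if growing the index failed. */
static int ptr_index_insert(PtrIndex *idx, size_t pos, void *item)
{
    assert(pos <= idx->len);
    if (idx->len == idx->cap) {
        size_t ncap = idx->cap ? idx->cap * 2 : 16;
        void **n = realloc(idx->items, ncap * sizeof *n);
        if (n == NULL)
            return -1;
        idx->items = n;
        idx->cap = ncap;
    }
    memmove(&idx->items[pos + 1], &idx->items[pos],
            (idx->len - pos) * sizeof *idx->items);
    idx->items[pos] = item;
    idx->len++;
    return 0;
}
```

In a real custom GtkTreeModel the paths of all rows after `pos` change as well, and the model has to tell the view about the new row (for example with `gtk_tree_model_row_inserted`), which is part of why insertion is more involved than with a list.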

Nevertheless, I swear the unref_node stuff isn’t working correctly! :-p Or I misunderstood its purpose.

Nooo! It’s that GtkTreeView proxy guy again!

And I’m not finished with the GtkTreeView. No I’m not. Moehaha!

I created this full sample that shows what I meant by custom treemodels and by only allocating the model items behind visible rows. It includes an autotools environment and a more or less good way of creating classes in C (except that I didn’t yet use GObject for MsgHeader or MsgHeaderProxy; feel free to send me a diff).
This is a Subversion repository. After installing Subversion, use “svn checkout” followed by the URL.

It uses the “unref_node” method of the GtkTreeModelIface interface, which gets triggered by gtk_tree_model_unref_node, to free the real subjects that aren’t visible in the view.

I noticed that it (“it” being the GtkTreeView stuff) sometimes “misses” rows that become invisible (if you scroll very fast). So I fear this unref_node method will need fixes and/or, if you use it, you’ll also need a background thread/procedure that checks for leftovers. This sample shows how you can walk the entire treemodel and do things with only the invisible ones. I know it’s ugly. IMHO the full demo isn’t, but regretfully the unref_node stuff doesn’t work perfectly.
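A hedged sketch of such a leftover check in plain C, assuming the proxies live in an array and that you already know the range of visible rows (with GTK 2.8 or later, `gtk_tree_view_get_visible_range` can provide it). `HeaderProxy`, `RealHeader` and `sweep_invisible` are hypothetical names; the function could be run periodically, for instance from an idle or timeout callback.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical real subject and proxy, as in the lazy-instantiation idea. */
typedef struct {
    char from[32];
} RealHeader;

typedef struct {
    int         id;
    RealHeader *real;   /* NULL when no real subject is loaded */
} HeaderProxy;

/* Free the real subject of every row outside [first_visible, last_visible],
 * catching rows whose unref_node notification was missed.
 * Returns how many real subjects were freed. */
static size_t sweep_invisible(HeaderProxy *rows, size_t n,
                              size_t first_visible, size_t last_visible)
{
    size_t i, freed = 0;

    for (i = 0; i < n; i++) {
        if (i >= first_visible && i <= last_visible)
            continue;                   /* still on screen: keep it */
        if (rows[i].real != NULL) {
            free(rows[i].real);
            rows[i].real = NULL;
            freed++;
        }
    }
    return freed;
}
```

Walking every row is O(n) per sweep, which is why it is better suited to an occasional background pass than to running on every scroll event.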

Feel free to request SVN accounts and/or send diffs if you want to experiment.

Good morning and by the way

Good morning (for people in Europe, of course) .. and by the way. About that GtkTreeView sample of last night:

Say you wanted to use a string value of the subject as a row value in a column of such a treeview? I didn’t show that yesterday. You can do that by converting it to a GValue and setting that as the “text” property of the cell renderer. You do it like this:

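/* Cell data function: GtkTreeView calls it for each row it renders or measures */
static void
msg_header_treeview_get_model_item (GtkTreeViewColumn *tree_c,
                                    GtkCellRenderer *cell,
                                    GtkTreeModel *tree_m,
                                    GtkTreeIter *iter, gpointer data)
{
	GValue val = {0,};
	IMsgHeader *header;

	/* Fetch the message header (a proxy, in this design) stored in this row */
	gtk_tree_model_get (tree_m, iter, COLUMN_HEADER,
			&header, -1);

	/* For a proxy, this first "from" request is what creates the real subject */
	g_value_init (&val, G_TYPE_STRING);
	g_value_set_string (&val,
		imsg_header_get_from (header));
	g_object_set_property (G_OBJECT (cell), "text", &val);
	g_value_unset (&val);
}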

  gtk_tree_view_column_set_cell_data_func (column, renderer,
	msg_header_treeview_get_model_item, NULL, ...);

What will happen? The proxy classes will be instantiated. Yes, all of them. If you don’t want that to happen, you will need to create a custom GtkTreeModel implementation or use ETable, which is available in gal. Evolution also uses the ETable widget and its models for displaying the headers of your big INBOX. However, the proxy classes are rather small. I didn’t do it here, but if you want to avoid memory fragmentation, there are tools in GLib (memory pools, etc.) for allocating lots of such instances. Or simply pass the amount as the second argument of g_new and increase your pointer each iteration: this gives you one large block of memory, which is likely to be less fragmented or not fragmented at all. Calling g_new (MsgHeaderProxy, 1) 10,000 times will cause memory fragmentation, and you don’t want that.

In the sample, only a g_strdup happens when the real subject is instantiated. In reality it’s often far worse. Don’t change your model items: view and model should be decoupled.

Instead, you simply create proxy classes for them. The first time those are needed (for example, when their rows become visible in the treeview, because one of their values is used to draw the row), they instantiate the real subject and use it to deliver the requested property. Note that this “first time they become visible” behaviour only holds for GtkTreeView when the fixed-height mode of the treeview widget and the fixed width of the columns are set. Otherwise a background procedure will fetch all rows to calculate the scrollbar (ask kris and jrb on IRC for more details about this).

You are, of course, responsible for cleaning them up (I didn’t show that either). With GObject you could, for example, override finalize (or dispose), check whether this->real is NULL, and free it if it isn’t. There are also other approaches, such as a factory that caches the real subject instances and always returns the same instance when the same id appears multiple times in your list model; in that case, freeing up the real subject instances might get more complicated. This depends on your application design, of course.
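A minimal sketch of the factory variant, assuming ids are small non-negative integers so that a plain array can serve as the cache (a real implementation would more likely use a hash table such as GHashTable). Reference counting is one way to handle the “freeing gets more complicated” part: a proxy releases its reference instead of freeing, and the shared instance only goes away when the last proxy lets go. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_SIZE 64   /* hypothetical upper bound on ids for this sketch */

/* Hypothetical shared real subject with a reference count. */
typedef struct {
    int id;
    int refcount;
} RealHeader;

/* The factory's cache: at most one live real subject per id. */
static RealHeader *cache[CACHE_SIZE];

/* Return the shared instance for `id`, creating it on first request. */
static RealHeader *factory_get(int id)
{
    RealHeader *r;

    assert(id >= 0 && id < CACHE_SIZE);
    r = cache[id];
    if (r == NULL) {
        r = malloc(sizeof *r);
        if (r == NULL)
            return NULL;
        r->id = id;
        r->refcount = 0;
        cache[id] = r;
    }
    r->refcount++;
    return r;
}

/* Drop one reference; the instance is freed when nobody uses it anymore. */
static void factory_release(RealHeader *r)
{
    assert(r != NULL && r->refcount > 0);
    if (--r->refcount == 0) {
        cache[r->id] = NULL;
        free(r);
    }
}
```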

I’m now searching for a technique to auto-free the real subjects behind the rows that aren’t visible. If you know one: tell me :-p. Note that when using something like this, you want to use a memory pool for the real subjects (otherwise: possible memory fragmentation).
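A minimal sketch of a fixed-size pool for the real subjects, in plain C: all slots come from one contiguous block and freed slots go onto a free list, so allocating and freeing never go back to `malloc` after setup. This is a generic free-list pool with hypothetical names, not a GLib API; within GLib, `g_slice_alloc` (which the demo already uses) serves a similar purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical real subject; any fixed-size struct works. */
typedef struct {
    int  id;
    char from[32];
} RealHeader;

/* A slot is either a live RealHeader or a link in the free list. */
typedef union PoolSlot {
    RealHeader      header;
    union PoolSlot *next_free;
} PoolSlot;

typedef struct {
    PoolSlot *slots;      /* one contiguous block */
    PoolSlot *free_list;  /* singly linked through the unused slots */
} HeaderPool;

static int header_pool_init(HeaderPool *pool, size_t capacity)
{
    size_t i;

    assert(capacity > 0);
    pool->slots = malloc(capacity * sizeof *pool->slots);
    if (pool->slots == NULL)
        return -1;
    for (i = 0; i + 1 < capacity; i++)
        pool->slots[i].next_free = &pool->slots[i + 1];
    pool->slots[capacity - 1].next_free = NULL;
    pool->free_list = pool->slots;
    return 0;
}

/* O(1): pop a slot off the free list; NULL when the pool is exhausted. */
static RealHeader *header_pool_alloc(HeaderPool *pool)
{
    PoolSlot *slot = pool->free_list;

    if (slot == NULL)
        return NULL;
    pool->free_list = slot->next_free;
    return &slot->header;
}

/* O(1): push the slot back; the memory stays inside the pool's block. */
static void header_pool_free(HeaderPool *pool, RealHeader *h)
{
    PoolSlot *slot = (PoolSlot *) h;   /* h points at a union member */

    slot->next_free = pool->free_list;
    pool->free_list = slot;
}

static void header_pool_destroy(HeaderPool *pool)
{
    free(pool->slots);
    pool->slots = pool->free_list = NULL;
}
```

Because the pool has a fixed capacity, it fits best when you can bound the number of live real subjects, for example by the number of rows a screen can show.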

Using GtkTreeView: Proxy classes and lazy instantiation

The technique of using proxy classes as items in a list model, applied to GtkTreeView, makes it possible, for instance, to display lists of many e-mail subjects while mostly instantiating only inexpensive stub objects to represent them, rather than real header objects. This is of immense value in memory-constrained scenarios such as embedded devices.
You can find it explained here. A sample e-mail application that could use it is osso-email. Follow the link for more information.

Edit: unbuzzworded :-p

“Free software” subjects for 2006

Part 1

These are the/my “free software” subjects that I have in mind for this year. I guess that in 2007 I will look back at this blog entry and laugh at the fact that none of my plans were achieved (I do have a professional career and a girlfriend, you know). Oh well.

  • Redesign and recreate osso-email. Let it use a very strict model-view-controller paradigm and develop a custom MsgHeader list model. Let the view become an observer of that model and let that view request only the “visible” e-mail headers from the model. The model will be smart enough not to load all the MsgHeader instances from the slow disk cache into a faster memory cache. Perhaps also reuse the e-mail header treeview of Evolution and talk with Harish about decoupling such parts from Evolution, a lot like what I did with EMsgComposer last year;
  • Replace the GtkHtml with Gecko in Evolution-ui. I adapted the EMsgComposer source code for this purpose last year (moved the struct to the c file so that it becomes really private outside of the implementation file);
  • Create a libmainloop. Patch glib to use libmainloop. Patch qt to use libmainloop. This depends on what the decisions on “shared mainloops” will become. Perhaps also patch D-BUS and stuff like Twisted. This is still in “concept/design” phase;
  • Help with dvfs/common-vfs/or FUSE integration in KIO and gnome-vfs. Or port KIO to C (yeah, I know I’m insane) or decouple it from Qt and develop a wrapper library for GLib. Perhaps also patch KIO in KDE to use the shared library. All this depends on what the decisions on “shared VFS” will be. Some concepts/designs also depend on a libmainloop.
  • Work on a shareable infrastructure for desktop configuration. For now I dubbed this as deconf-desk. This depends on the libmainloop stuff (else I would need to do the same tricks D-BUS did for the mainloop integration: this is insanely stupid code duplication).
  • Work on an infrastructure for remote desktop configuration. My plan is to use XMPP (Jabber) as the protocol for inter-process communication between service and clients (keeping clients informed about updates). There’s a JEP that proposes a standard for “offline messaging” in Jabber. I discussed this JEP with Peter Saint-Andre last year for this purpose;
  • I also have some ideas for gnome-schedule. Not sure about them yet (I know I initially wrote gnome-schedule (but Kristof and Gaute changed a lot, btw), but I dislike using Python; it’s not my programming language).

Feel free to e-mail me if you want to comment on my blog. I disabled comments because too many bots are trying to put spam on it. I counted at least 15 such bots! Insane. And no matter what PHP code I add, the bot operators adapt their bot code. So I’m waiting for some more intelligent anti-spam solutions. Perhaps I’ll soon migrate to WordPress. Note that I’d hate to make it more difficult for (for example) blind people to use my blog. Suggestions are welcome.

  • Oh .. well: Integrate spamassassin with some blog engines like WordPress. Or has that been done already? (edit: ‘ikke’ on IRC told me it is using some plugin)
  • Since I also have troubles with spambots on a wiki of mine, integrate spamassassin with MediaWiki.

Part 2

Mono and Fedora: FINALLY guys. I’m extremely happy that this decision has finally been made.

First real gnome-schedule release

Gaute Hope, my companion and the person to whom we (Kristof Vansant, myself and Gaute) gave the maintainership, decided to release the first 1.0 release of gnome-schedule (though I’ve already seen some important fixes going into CVS and Bugzilla).

The gnome-schedule tool can be used to configure your crontab and at services in a user interface oriented way.

At this moment gnome-schedule has been translated to +- 60 languages and has a manual written by Rodrigo Marcos Fombellida. It’s written in Python and uses gnome-python components like GConf and Glade.

You can check it out here. For the packagers interested in packaging gnome-schedule: please inform Gaute in detail about the many aspects of package building if you have any difficulties preparing your packages (use Bugzilla, of course).

I noticed Gaute opened a new bug that will depend on all the bugs that are important for the next (1.1) release. Note that it’s been a while since I last coded in the gnome-schedule sources (I mainly helped create the very first alpha versions and assembled its build environment). So for development questions you’d better ask Gaute.

Xen “ready to go” images for x86_64

You can find “ready to go” images for Xen 3.0 here. I created both a Debian testing and a Fedora Core 4 image.

Note that these images are x86_64 (or amd64) only!

Edge resistance in metacity and I/O scheduler

Edge resistance

Because otherwise people are most likely going to kill poor Elijah, I created a patch that makes the new edge resistance feature of metacity optional.

Paolo asked me to do it in such a way that you can specify how many pixels to resist, rather than completely disable it. Feel free to improve it, folks! :p
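The idea of a configurable resistance, where 0 switches it off, can be sketched like this. It is a hypothetical, single-edge illustration in plain C, not the code from metacity or from the patch: a window dragged past a screen edge is held at that edge until the user pushes more than `resistance` pixels beyond it.

```c
#include <assert.h>

/* Hypothetical helper for one (left) screen edge at x = edge_x.
 * old_x is the window's current left edge, proposed_x where the drag
 * would put it, and resistance the configurable pixel count.
 * Returns the x position to actually use. */
static int resist_edge(int old_x, int proposed_x, int edge_x, int resistance)
{
    int crossed_by;

    if (resistance <= 0)
        return proposed_x;                  /* feature switched off */
    if (old_x >= edge_x && proposed_x < edge_x) {
        crossed_by = edge_x - proposed_x;   /* how far past the edge */
        if (crossed_by < resistance)
            return edge_x;                  /* not pushed hard enough: hold */
    }
    return proposed_x;
}
```

Treating 0 as “disabled” lets a single pixel setting cover both the people who want the feature and the people who don’t.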

Kernel I/O scheduler

Oh, and because somebody was complaining about the kernel elevator, I started wondering how to switch the I/O scheduler for a specific block device. So I ended up reading block/scheduler.c and block/ll_rw_blk.c and found that you can set it using

echo <scheduler> > /sys/block/<device>/queue/scheduler

Cat the same file to get a list of available schedulers; the active one is shown in square brackets. Well, all I can say is that it’s not very well documented.
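When reading that file from a program, the active scheduler is the bracketed entry (for example `noop anticipatory deadline [cfq]`). A small plain-C sketch that extracts it; `active_scheduler` and `print_active_scheduler` are hypothetical helper names, and the device name you pass (such as `sda`) is up to you.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the active scheduler name out of the contents of
 * /sys/block/<device>/queue/scheduler, where the kernel marks the
 * active entry with square brackets.
 * Returns 0 on success, -1 if there is no bracketed entry or it doesn't fit. */
static int active_scheduler(const char *line, char *out, size_t out_len)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t n;

    if (open == NULL || close == NULL)
        return -1;
    n = (size_t) (close - open - 1);
    if (n + 1 > out_len)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the file for one device and print its active scheduler. */
static int print_active_scheduler(const char *device)
{
    char path[256], line[256], name[64];
    FILE *f;

    snprintf(path, sizeof path, "/sys/block/%s/queue/scheduler", device);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    if (active_scheduler(line, name, sizeof name) != 0)
        return -1;
    printf("%s: %s\n", device, name);
    return 0;
}
```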

Putting Xen 3.0 in production

Remember I talked about Xen last week? Well, I just started migrating my web data to a new Xen guest operating system. If you have difficulties viewing my web site(s), please do tell me about it.

Later this week, Postfix (SMTP) and Courier (IMAP) are going to be migrated. So people sending me e-mails might (but shouldn’t) run into difficulties, since it involves changing the IP address of the MX of my domains. Note that if this entry appears on p.g.o, p.g.o’s nameserver is in sync and the stuff worked. So I’m using p.g.o for testing. Hah!

So .. let’s now hope Xen on x86_64 is more stable than the release says it is!

Today I stepped in the shoes of a Linux administrator

I decided to install Xen 3.0 on the SuperMicro SuperServer 6014H-T with RAID 1 using Fedora Core 4 as operating system.

Sooner or later this device will be used as the host for some of my many virtual machines, one of which will run this little blog.

This device contains an ICH5 and a Marvell 88SX6541-BCZ SATA controller, two Intel Xeon(TM) 3.00GHz x86_64 CPUs and some other bla bla hardware.

I partly succeeded. I haven’t got the Marvell SATA controller working yet, since it isn’t supported by the kernel that Xen 3.0 uses (2.6.12). This was my procedure:

Buy yourself two SATA cables of 50 cm. The current ones are way too short to reach the ICH5 controller.

Open the device and connect your hard disks to the ICH5 controller. The SATA ports are right behind the standard PATA connectors (behind the blue and the black IDE connectors). See page 5-9 of your manual.

Insert the Fedora Core 4 x86_64 CD-ROM 1 and install using the following partitioning settings (don’t install much, it’s just your dom0):

/dev/sda1: /boot (100M)
/dev/sdb1: /boot_backup (100M)
/dev/sda2: Software RAID (2000M)
/dev/sdb2: Software RAID (2000M)
/dev/sdaX, /dev/sdbX: other partitions
/dev/sdaY, /dev/sdbY: swap
Software RAID: / (using sda2, sdb2)

[root@oceanus ~]# yum update && reboot

You’ll now have the sata_mv module, as it’s included in the 2.6.14 kernel that is available as a Fedora Core 4 update. If you load it, dmesg shows the (empty) Marvell 88SX6541-BCZ SATA controller.

Regretfully, the Xen 3.0 release uses Linux kernel 2.6.12, which doesn’t support the sata_mv driver yet. So we’re going to leave our hot-swap controller alone and hope that the next Xen release will use Linux kernel 2.6.14 or newer.

Download Xen 3.0 (the tarball release).
The Xen 3.0 Fedora Core 4 binary release will totally corrupt your x86_64 packages. If you forcefully attempt to install them, your mount and umount tools will no longer function after force-installing the e2fsutils. If you already did, you can recover using a Fedora Core 4 rescue disk.

Untar it and install it using the defaults. Don’t do the GRUB update yet, or do it but watch out! You need to add an initrd, and you should strip the leading /boot from each path (/boot/blabla becomes /blabla). That’s because we’ve installed /boot on sda1 as a separate partition containing only /boot.

[root@oceanus ~]# cd /root
[root@oceanus ~]# mkinitrd -v -f --with=ipv6 --with=e1000  --with=ext3 \
	--with=jbd --with=raid1 --with=sd_mod --with=scsi_mod \
	--builtin=ata_piix --builtin=sata_mv --builtin=dm_mod \
	initrd-2.6.12.6-xen.img 2.6.12.6-xen
[root@oceanus ~]# cp initrd-2.6.12.6-xen.img /boot
[root@oceanus ~]# echo >> /etc/grub.conf
[root@oceanus ~]# echo "title Xen 3.0 / XenLinux 2.6" >> /etc/grub.conf
[root@oceanus ~]# echo -e "\tkernel /xen-3.0.gz console=vga" >> /etc/grub.conf
[root@oceanus ~]# echo -e "\tmodule /vmlinuz-2.6-xen root=/dev/md0 ro console=tty0" >> /etc/grub.conf
[root@oceanus ~]# echo -e "\tmodule /initrd-2.6.12.6-xen.img" >> /etc/grub.conf
[root@oceanus ~]# reboot

[root@oceanus ~]# uname -a
Linux oceanus 2.6.12.6-xen #1 SMP Sun Dec 4 20:40:43 GMT 2005 x86_64 x86_64 x86_64 GNU/Linux
[root@oceanus ~]#

[root@oceanus ~]# mount
/dev/md0 on / type ext3 (rw)
...
[root@oceanus ~]#
[root@oceanus ~]# cat /proc/cpuinfo | grep processor
processor       : 0
processor       : 1
processor       : 2
processor       : 3
[root@oceanus ~]#

GParts and a shared mainloop. Will it happen?!

This is soooooooo damn cool. More information here, here, here, of course here and here, and finally here.

I’m most likely going to use my upcoming holiday to help the project. I hope other developers of the different free desktop environment communities will follow.

Rant and a UML class diagram for deconf.

I haven’t mentioned confuse or deconf nor the deconf specification lately.

That’s mainly because at this moment I’m focusing on other things. In a few weeks I’ll have a long holiday, and chances are high I’ll work on a few of the items on my growing to-do list.

Amongst them are further improving codegen and deconf-desk, an implementation of this desktop configuration standard which I wrote a few weeks ago.

However, since it’s good practice and since it might help interested people join the effort of implementing it, I created a UML class diagram of what is current and of what the idea is. If you don’t know how to interpret a class diagram, you can of course use codegen to generate code from it.

You can find it here. It obviously uses the observer/observable pattern a lot. That’s because successful existing configuration systems also use it (like GConf: you can view the desktop applications as the observers and the daemon as the observable). I’m also using the (remote) proxy pattern. You’ll notice that this “design pattern bla bla” is indeed just naming for something that is most likely trivial, that you most likely call something different, and that you already know. Yet it’s interesting to discover how these patterns are reused by programmers time after time, for totally different projects and scopes.

Anyway, as usual, this ain’t a promise that something usable will ever exist. I never make such promises for free software projects. And this one highly depends on the cooperation of an awful lot of other people. In fact, it’s nearly impossible to ever make this succeed, mainly because the community of people attempting to build a free software desktop is very bad at actually agreeing on desktop standards and shared desktop components.

To the Microsofts of this world: if you want to make sure we never succeed in selling our desktop, make sure we keep disagreeing forever, like we do now. The strength of Microsoft as a desktop software builder is that they do have decision-making leadership. Our failure is that we don’t, and that we can’t agree on the simplest, most basic things. I’ll keep repeating this until I die or until we solve the problem. We aren’t solving the real problems at this moment.

Note that the kernel folks do have decision-making leadership. And surprise, surprise: they are successful at selling it. This is indeed why I asked this year’s GNOME Foundation board candidates these additional questions. Perhaps now they’ll address this problem? I fear not. Sure, it’s not the purpose of that board. Whatever, it’s all we have a.t.m..

No, I’m far from satisfied with the achievements of the freedesktop.org movement. It’s not nearly enough. Agreed, it’s a small step in the right direction. And no, it’s not a big step for mankind. We need so much more. It’s unbelievable.

Note that if I were intelligent enough to have the solution, I’d propose it. I’m not. So yes, indeed, this is a rant. I know.

Observer/Observable in codegen, Improved Java class builder XSLT Template

Today I dramatically refactored codegen again. I removed the Hashtables from the Package and Project classes and replaced them with ILists. I also replaced all ArrayList references with ILists and added an Add and a Clear method for every many-relation in the classes. So make sure you update your checkout.

I did this because I wanted to implement Observer/Observable. If you don’t know what Observer/Observable is, read about it at Wikipedia. You can’t easily observe the Add method of the ArrayList (unless you extend that class, of course). However, now it’s possible to be an Observer of the Package, Project, Interface, Class, Operation, Attribute and Parameter instances. This means that if they change, you can get notified about it (and, for example, regenerate your code).
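Codegen itself is C#, but the mechanism is language-neutral. A minimal C sketch of the idea, with hypothetical names (`Package`, `package_add_class`, `regenerate_on_change`) and a fixed observer limit to keep it short: because every mutation goes through an Add-style method, that method can notify each registered observer, for example a generator that regenerates code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBSERVERS 8   /* hypothetical fixed limit for the sketch */

typedef struct Package Package;
typedef void (*ChangedFunc)(Package *pkg, const char *what, void *user_data);

/* Hypothetical observable: a package whose Add notifies its observers. */
struct Package {
    int         class_count;
    ChangedFunc observers[MAX_OBSERVERS];
    void       *observer_data[MAX_OBSERVERS];
    size_t      n_observers;
};

static int package_add_observer(Package *pkg, ChangedFunc fn, void *data)
{
    if (pkg->n_observers == MAX_OBSERVERS)
        return -1;
    pkg->observers[pkg->n_observers] = fn;
    pkg->observer_data[pkg->n_observers] = data;
    pkg->n_observers++;
    return 0;
}

/* The only way to add a class, so no change can bypass the observers. */
static void package_add_class(Package *pkg)
{
    size_t i;

    pkg->class_count++;
    for (i = 0; i < pkg->n_observers; i++)
        pkg->observers[i](pkg, "class-added", pkg->observer_data[i]);
}

/* Example observer: stands in for a generator that regenerates code. */
static void regenerate_on_change(Package *pkg, const char *what, void *user_data)
{
    int *regenerations = user_data;

    (void) pkg;
    (void) what;
    (*regenerations)++;
}
```

This is also why a raw ArrayList is hard to observe: callers can mutate it directly, while a dedicated Add method gives one place to raise the notification.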

This doesn’t have a use case yet, because the sample application uses the ISourceParser for parsing the UML class diagram source file and the IGenerator for generating the entire project, in two steps. The idea is to some day create an IGenerator that plays the role of an observer of those instances. This would allow regenerating code on the fly when deeply integrating codegen with an integrated development environment or code editing application; for example, regenerating code when your diagram changes.

Perhaps some day also regenerating or redrawing the diagram on the fly when the code changes (this ain’t current, it’s a futuristic idea). I’m still experimenting with my own ideas here. I got inspired about all this after reading this blog.

I’d like to point out that this is indeed within the scope of this project. I’m indeed attempting to build a code generation framework, not just a simple code generator. Of course, the only end-user use of codegen at this moment is “a simple code generator that is a little bit fancy because it can already handle multiple input formats and generate multiple programming languages”. I am, however, planning to do much cooler things with the concept of code generation in the future: mainly integration with integrated development environments, and really making Model Driven Development much easier and more pleasant. But for all those cool ideas, you first need to get the basics right. Right?

The first contribution from somebody other than me comes from my colleague Marien Johan, who greatly improved the Java class builder XSLT Template. He basically rewrote the stylesheet and added a huge amount of comments and documentation. Check it out if you’re planning to add support for other languages.

Short term plans for codegen: Create some NUnit tests, create PHP 5 and Python XSLT Templates. Also redoing the current Observable/Observer infrastructure. A.t.m. it’s junk. But I need to get some sleep now. Feel free to contact me if you’d like to help with these or any other subjects related to codegen.

Codegen and PHP5

I just committed a very simple set of XSLT Templates that will let codegen generate PHP5 classes and interfaces. I haven’t yet done it the way Marien Johan did it for Java. I know his is a better way of creating XSL stylesheets. I will improve this soon.

First signs of Java support in codegen

These XSLT Templates for simple Java interfaces and classes are the first clear signs that I’m really planning to some day support non-.NET languages in codegen, including Java, Perl, Python, C (GObject) and C++.

I’m working on it. You can help me (search for pvanhoof on the popular community IRC networks). Especially if you think that at this moment it sucks: I’m working on it. And you can help me. Okay?!

Oh, update. And I finally updated the UML class diagram.

Feature list and more OO support for codegen

I wrote a short list of features and planned features for codegen.

Our Subversion repository is back up, so you can start updating your checkout. I’d like to reiterate that my company isn’t the copyright owner. I might have scared people away by putting the code on that specific repository. I’m the copyright owner and yes, the project is fully LGPL licensed (and I’m not requiring copyright ownership reassignment).

I also added support for discovering which packages a class depends on. It can be used to create the “using”, “#include” or “imports” statements many programming languages use (the generator always uses the package name as the default namespace). If your target is .NET, the NETSupport.Fixer will also search for standard .NET packages and namespaces your class/interface code depends on (so that you can generate the project files and using clauses correctly; the default XSLT Template has some support for this as a sample).
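A hedged sketch of the dependency discovery in plain C, assuming member types are available as fully qualified `Package.Type` strings. It is not codegen’s actual (C#) implementation, and all names are hypothetical. It collects each distinct package once, in first-seen order, skipping the class’s own package: exactly the list a template would turn into `using`, `#include` or `import` lines.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPS 16

/* Collect the distinct packages that the given member types live in.
 * Types look like "Package.Type" (the package may itself contain dots);
 * unqualified names and the class's own package are skipped.
 * Writes at most max_deps names (each under 64 chars) and returns the count. */
static size_t collect_dependencies(const char *own_package,
                                   const char *const *types, size_t n_types,
                                   char deps[][64], size_t max_deps)
{
    size_t i, j, n_deps = 0;

    for (i = 0; i < n_types; i++) {
        const char *dot = strrchr(types[i], '.');
        size_t len;
        int seen = 0;

        if (dot == NULL)
            continue;                    /* built-in or unqualified type */
        len = (size_t) (dot - types[i]);
        if (len >= 64)
            continue;
        if (strlen(own_package) == len && strncmp(own_package, types[i], len) == 0)
            continue;                    /* same package: nothing to import */
        for (j = 0; j < n_deps; j++)
            if (strlen(deps[j]) == len && strncmp(deps[j], types[i], len) == 0)
                seen = 1;
        if (!seen && n_deps < max_deps) {
            memcpy(deps[n_deps], types[i], len);
            deps[n_deps][len] = '\0';
            n_deps++;
        }
    }
    return n_deps;
}
```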

On top of that, I added support for abstract classes and abstract attributes and operations. They’ll remain abstract if the class is abstract, but a class that ain’t abstract cannot have abstract attributes or operations upon generation (of course). Codegen now checks for that. I also added support for discovering when the overrides code attribute is needed.
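The abstract-member rule can be sketched as a simple validation pass. This is a plain-C illustration with a hypothetical, simplified class model, not codegen’s C# code, and it only detects the problem: a non-abstract class with an abstract member is reported, while abstract classes are left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a UML class for this check only. */
typedef struct {
    const char *name;
    int         is_abstract;
} Member;

typedef struct {
    const char   *name;
    int           is_abstract;
    const Member *members;
    size_t        n_members;
} ClassModel;

/* Returns the first abstract member of a non-abstract class (which cannot
 * be generated as-is), or NULL when the class is valid. */
static const Member *find_invalid_abstract_member(const ClassModel *cls)
{
    size_t i;

    if (cls->is_abstract)
        return NULL;                /* abstract classes may keep abstract members */
    for (i = 0; i < cls->n_members; i++)
        if (cls->members[i].is_abstract)
            return &cls->members[i];
    return NULL;
}
```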

If you check the default XSLT Template for generating a class, you’ll see that there’s now support for inheritance, implementing interfaces, abstract classes, abstract operations and typed attributes, private, public and protected (for the class, operations and typed attributes), typed operation parameters, namespaces and package dependencies. I’ve tested these XSLT Templates and so far haven’t managed to make them generate a class or interface that didn’t compile, unless I start using weird and/or nonexistent type names for the attributes, operations or parameters. If I’m still missing something: let me know. Also check the feature list for more information.

I’m going to start creating an XSLT Template set for PHP 5, Python classes and Java soon. If people want to help me with that, get in touch.

Technical documentation for codegen available

It looks like our company Subversion service is unavailable at this moment (update: my colleagues told me it’s most likely hanging at the RAID BIOS after a power failure in the building. It could be hanging there because one defective disk is being replaced, but since it’s Sunday and the new disk will arrive on Monday anyway, chances are high nobody is going to press [enter] until Monday morning: damned). But no worries, I’ve put a new snapshot online. Note that this one isn’t yet committed to the Subversion repository (as the service is also unavailable for me). So if you update your sources before I commit this new stuff, you’ll get less. If this unavailability happens often, I’ll put it on a better-known repository (like the ones you can get at Novell Forge). Feel free to make suggestions about this.

Note that I removed all the Subversion metadata so that nobody uses this snapshot as a starting point (that is, so nobody updates the snapshot code using Subversion). View this one as a sneak preview.

Again, quite a lot has changed. I refactored a lot of stuff and added support for protected class members and the visibility attribute for interfaces and classes. I also fixed things so that the default XSLT Templates now successfully transform into a complete VS.NET 2003 solution: correctly built classes, correctly built interfaces with abstract operations and attributes, correctly built project files and a correctly built solution file. Somebody should really repeat this XSLT Template work for Eclipse Java, Python, Perl, PHP, GObject, C++ etcetera. Now that we have a Free Software code generator framework, let’s make it greater than any existing one.

Therefore I created this technical documentation. It explains how to build your own code generator using codegen (which ain’t hard, and very little code is needed, don’t worry) and how to prepare your XSLT Templates so that they’ll generate code in your favourite programming language and environment. This lets you define exactly how the generated code looks, and it lets you choose how much house-style code gets generated.

Note to the people that hate code generators: codegen does not try to define or generate your implementations. It merely converts your UML Class diagram into skeleton code. This basically means that it’ll generate your classes and interfaces in the syntax of your favourite programming language. Codegen will not read your database or whatever. At the moment it only uses your UML Class diagram, and support for generating code from a database schema isn’t planned.

You have the very nice NHibernate for that. Technically speaking, though, NHibernate isn’t a code generator. It’s rather a very useful framework that makes it easier to implement your data access layer. Integration with NHibernate is a possible feature for a codegen user to create. I can imagine one could mark certain UML objects as implementations of an IPersistable interface, and tell codegen to generate, for each class that implements that interface, code that calls into the NHibernate framework. No such generator has been implemented for codegen yet. The main focus of codegen isn’t to generate this type of implementation code; often that’s the programmer’s task. But it could be a nice add-on, yes.
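A sketch of that hypothetical NHibernate add-on. It assumes the UML diagram marks persistable classes with an IPersistable interface (the name from the idea above, not an existing codegen or NHibernate type) and that the generated Save method receives an NHibernate ISession. ISession.SaveOrUpdate is a real NHibernate call; everything else here is invented for illustration.

```python
# Hypothetical marker interface name taken from the UML diagram.
PERSISTABLE = "IPersistable"


def persistence_stub(class_name):
    """C# method body that hands the object to an NHibernate session."""
    return "\n".join([
        f"    // Generated for {class_name} because it implements {PERSISTABLE}",
        "    public void Save(NHibernate.ISession session)",
        "    {",
        "        session.SaveOrUpdate(this);",
        "    }",
    ])


def persistence_stubs(classes):
    """classes: mapping of class name -> list of implemented interface names.

    Returns class name -> generated stub, only for persistable classes.
    """
    return {name: persistence_stub(name)
            for name, interfaces in classes.items()
            if PERSISTABLE in interfaces}


if __name__ == "__main__":
    stubs = persistence_stubs({"Customer": ["IPersistable"], "Report": []})
    for name, code in stubs.items():
        print(code)
```

Keeping the marker a plain interface in the diagram means the rest of codegen doesn’t need to know anything about NHibernate; only this optional generator does.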

Documentation about codegen and support for XSLT Templates

I wrote a document that describes codegen in detail (it’s not a technical description, though; for that there’s this simplified DIA UML Class diagram). I also created a snapshot of the current code for people who dislike using Subversion. There’s also a binary available of the sample console tool.

Note to readers: I hate managing projects with tools like SourceForge or Novell Forge. I hate creating versions; I just want to code. Stuff like that. If you’d like to help me with this: contact me. For now, don’t expect me to start throwing versions at you guys. I’m not a manager, just a simple programmer. Versions will happen sooner or later. Sure. Whatever.

Codegen now also supports XSLT Templates as secondary input. Using the default XSLT Template set, it now successfully builds a VS.NET solution with all classes and interfaces in separate files (correctly converted by the templates to usable stub code). I’d be happy to create a repository of contributed Template sets: for example, one that generates MonoDevelop project files instead, or NAnt build files, or XSLT Templates for generating other languages like Java, GObject, Python or Perl. Just contribute them.
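The templates do this splitting with XSLT. To show the shape of the job without an XSLT processor, here’s a Python sketch using xml.etree.ElementTree that writes one stub .cs file per class or interface. The model/class/interface element names with name and package attributes are a hypothetical stand-in for codegen’s intermediate XML, whose real format may differ.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical intermediate document; codegen's real format may differ.
INTERMEDIATE = """
<model>
  <class name="Customer" package="Shop.Model"/>
  <class name="Invoice" package="Shop.Billing"/>
  <interface name="IPersistable" package="Shop.Model"/>
</model>
"""


def write_one_file_per_type(xml_text, out_dir):
    """Write a C# stub file per class/interface; return the file names."""
    root = ET.fromstring(xml_text.strip())
    written = []
    for element in root:
        if element.tag not in ("class", "interface"):
            continue  # ignore anything this sketch doesn't understand
        name = element.get("name")
        package = element.get("package")
        source = (
            f"namespace {package}\n"
            "{\n"
            f"    public {element.tag} {name}\n"
            "    {\n"
            "    }\n"
            "}\n"
        )
        path = Path(out_dir) / f"{name}.cs"
        path.write_text(source)
        written.append(path.name)
    return written


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    print(write_one_file_per_type(INTERMEDIATE, out_dir))
    # ['Customer.cs', 'Invoice.cs', 'IPersistable.cs']
```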

It also supports writing the intermediate XML documents that the XSL Transformer uses internally to the filesystem. This makes it possible to apply your own XSL Transformations to the resulting XML files.
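Once the intermediate XML is on disk, any tool can post-process it. A minimal Python sketch, again assuming a hypothetical model/class/interface format: it dumps a tree to a file and then applies a home-made transformation that groups type names by package. In practice you’d more likely run your own XSLT over the file; ElementTree is used here only to keep the example dependency-free.

```python
import tempfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def dump_intermediate(root, path):
    """Write an intermediate XML tree to disk (hypothetical format)."""
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def types_per_package(path):
    """A home-made transformation: group type names in a dumped file by package."""
    index = defaultdict(list)
    for element in ET.parse(path).getroot():
        index[element.get("package")].append(element.get("name"))
    return dict(index)


if __name__ == "__main__":
    model = ET.Element("model")
    ET.SubElement(model, "class", name="Customer", package="Shop.Model")
    ET.SubElement(model, "class", name="Invoice", package="Shop.Billing")
    ET.SubElement(model, "interface", name="IPersistable", package="Shop.Model")
    path = Path(tempfile.mkdtemp()) / "intermediate.xml"
    dump_intermediate(model, path)
    print(types_per_package(path))
    # {'Shop.Model': ['Customer', 'IPersistable'], 'Shop.Billing': ['Invoice']}
```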

Other ideas, like support for reverse engineering code written in programming languages like Java, C# and VB.NET, are on the project’s to-do list. I’ll (or you’d) need to create a parser for the language first; I’m most likely going to take a look at the ones shipped with Mono for this. After that I’d (or you’d) need to create an XSLT Template that converts the intermediate internal XML document to, for example, XMI, which is a format readable by Rational Rose. Or create one that converts it to DIA. Creating such XSLT Templates would already add support for converting XMI to DIA and/or DIA to XMI (depending on which XSLT Template you made). You could also create a concrete IGenerator that does this (if XSLT isn’t suitable for it, but as far as I can tell … it’s suitable).

I’m not an XSL geek, so I’m silently hoping the world’s greatest XSLT dudes will offer me help with this. You can find sample XSLT Templates and the XML format to convert here. Information about both the XMI and the DIA format is available online.