Proposal

Let’s make a GNOME OCR application on top of Ocropus: one where the user can select regions to scan for text, where those regions translate into correctly positioned XHTML DIV tags (using relative positions), and where the user can select other regions to simply copy as images. That doesn’t sound terribly hard, or does it?

After that, let’s rethink some of the printer UIs and dialogs and/or integrate them a little with SANE: a lot of printers nowadays are so-called ‘multifunctionals’. They combine a flatbed scanner with a printer and have fancy features like scan to your computer, scan to an MMC card, and scan and print (make a copy).

It’s a little silly that right now I have to scan to an MMC card; put that card in my N800, because Linux doesn’t support the MMC slot of my laptop; wire things up with USB cables; copy a file called SCAN0016.JPG to a folder; open it with GIMP, dissect it into regions and apply other conversions that might improve OCR detection; manually create an XHTML document; manually measure the positions on the original and turn them into relative positions in the XHTML file; and so on. These are all tasks that can easily be automated.
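The position bookkeeping (measuring regions on the scan and turning them into relative positions in an XHTML file) is the easiest part to automate. Below is a minimal Python sketch, assuming the OCR engine or the user already supplies each region's pixel bounding box plus either its recognized text or a cropped image file. It does not call Ocropus, and `Region` and `regions_to_xhtml` are hypothetical names. It reads "relatively positioned" as coordinates expressed as percentages of the page size inside a `position: relative` container, so the layout scales with the window.

```python
from dataclasses import dataclass
from typing import List, Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Region:
    # Hypothetical region type: bounding box in scan pixels, origin top-left.
    x: int
    y: int
    width: int
    height: int
    text: Optional[str] = None        # OCR result, for a text region
    image_href: Optional[str] = None  # cropped image file, for an image region


def percent(value: int, total: int) -> str:
    """Express a pixel value as a CSS percentage of the page dimension."""
    return f"{100.0 * value / total:.2f}%"


def regions_to_xhtml(regions: List[Region], page_width: int, page_height: int) -> str:
    """Build an XHTML page whose DIVs mirror the regions selected on the scan."""
    divs = []
    for r in regions:
        style = (
            "position: absolute; "
            f"left: {percent(r.x, page_width)}; top: {percent(r.y, page_height)}; "
            f"width: {percent(r.width, page_width)}; "
            f"height: {percent(r.height, page_height)};"
        )
        if r.image_href is not None:
            # Image regions are copied as-is and stretched to fill their box.
            body = (f'<img src={quoteattr(r.image_href)} alt="" '
                    'style="width: 100%; height: 100%;" />')
        else:
            body = escape(r.text or "")
        divs.append(f"<div style={quoteattr(style)}>{body}</div>")

    # The page container keeps the scan's aspect ratio: percentage padding is
    # computed from the container's width, so padding-bottom = h / w * 100%.
    page_style = ("position: relative; width: 100%; height: 0; "
                  f"padding-bottom: {percent(page_height, page_width)};")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head><title>Scan</title></head>\n"
        f"<body>\n<div style={quoteattr(page_style)}>\n"
        + "\n".join(divs)
        + "\n</div>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Example: an A4 page scanned at 300 dpi (2480 x 3508 pixels).
    print(regions_to_xhtml(
        [
            Region(x=248, y=175, width=1984, height=300, text="Dear Sir & Madam,"),
            Region(x=1500, y=2900, width=700, height=250, image_href="signature.png"),
        ],
        page_width=2480,
        page_height=3508,
    ))
```

Percentages rather than pixels are a deliberate choice here: the same XHTML renders proportionally at any window width, which is what the manual "measure and convert" step was approximating.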

And now we finally have a reasonably good OCR library or framework to use as the underlying engine for this.

I’m sure Google would love projects like this for their Summer of Code, for example. No?

Anyway, the Google Ocropus thingy works on most normal text. I printed out a few documents that also had some handwritten names and signatures on them, scanned the printouts and ran OCR on the scans. The handwritten parts caused some errors in the detection, but the vast majority of the text was recognized correctly, apart from maybe a few a's that turned into o's (admittedly, that document's font made those two characters quite hard to tell apart). I’m quite sure the library will improve.

The Getting Started page talks about going into a release directory. The page isn’t very clear about it (yet): you need to get both tesseract-ocr and ocropus itself (as explained in the “Downloads” tab of the site), and that release directory appears to be your “ocropus” Subversion checkout. At least, that worked for me. You’ll also need to install jam, libtiff4-dev and libaspell-dev. Everything else was already installed on my typical “gnome-devel”-prepared Ubuntu Edgy.

2 thoughts on “Proposal”

sybille says:

April 11, 2007 at 9:03 pm

Hi,
Have you ever used gscan2pdf? It’s a very nice app for scanning and creating multi-page PDFs that could perhaps be extended to include (some of?) the ideas you mention. Currently OCR is implemented with gocr, which is not all that useful. Still, it’s a helpful frontend for scanimage and scanadf.

Links:
http://gscan2pdf.sourceforge.net/
http://sourceforge.net/projects/gscan2pdf/
http://ubuntuforums.org/showthread.php?t=62636

Anyway, I like your ideas very much, even though I have a scanner and a laser printer (i.e. two separate pieces of equipment as opposed to an all-in-one). For example, I’d love to have a way of including OCR data from scans in PDFs so that they could be found by an indexer like Tracker or Beagle.
sybille says:

April 11, 2007 at 9:06 pm

Hi,
Have you ever used gscan2pdf?

http://gscan2pdf.sourceforge.net/
http://sourceforge.net/projects/gscan2pdf/
http://ubuntuforums.org/showthread.php?t=62636

It’s a very nice frontend for scanimage and scanadf that could perhaps be extended to incorporate (some of?) the interesting ideas you’ve brought up in this post. =)
