
OK computer whizzes, now's the time to show us



With GOCR I'm not sure if there's some sort of wrapper that lets you "batch OCR" multiple images (e.g., all TIFFs in the folder my_book: /images/my_book/*.TIF) and stream them into ONE text file. On my platform I've been using tesseract in combination with ocube, which lets you do exactly that. There should be a tesseract binary for Windows, but I don't know if there's a wrapper equivalent to ocube. Maybe try a web search for "windows batch ocr" or something like that.
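If no ready-made wrapper turns up, the batch part is only a short loop. The ocube setup itself isn't shown here, so this is a rough Python sketch of the same idea, assuming the `tesseract` command-line binary is installed and on PATH (passing `stdout` as the output name makes it print the recognized text instead of writing a file). `batch_ocr` and `run_tesseract` are hypothetical helper names, and pages are taken in file-name order, so this assumes zero-padded names like p001.tif, p002.tif.

```python
import subprocess
from pathlib import Path
from typing import Callable


def run_tesseract(image: Path) -> str:
    """OCR one image with the tesseract CLI and return the recognized text.

    Assumes `tesseract` is on PATH; "stdout" as the output base tells it to
    print the text rather than write a .txt file.
    """
    result = subprocess.run(
        ["tesseract", str(image), "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def batch_ocr(image_dir: str, out_file: str,
              ocr: Callable[[Path], str] = run_tesseract) -> int:
    """OCR every .tif/.tiff file in image_dir into ONE text file.

    Files are processed in file-name order (case of the extension is ignored).
    Returns the number of pages processed.
    """
    pages = sorted(
        (p for p in Path(image_dir).iterdir()
         if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}),
        key=lambda p: p.name,
    )
    with open(out_file, "w", encoding="utf-8") as out:
        for page in pages:
            out.write(ocr(page))   # text for this page
            out.write("\n")        # blank separator before the next page
    return len(pages)
```

Usage would be something like `batch_ocr("/images/my_book", "my_book.txt")`. The `ocr` parameter is there so a different engine (for example a GOCR command) could be swapped in without touching the loop.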

This is a little out of my comfort level. But I do have a question about GOCR. Does it handle tables at all? Or allow part of a page to be saved as an image (and then embedded into the resulting document)? Those are things I absolutely need before I start monkeying around with learning a new OCR program. If not, I'll just stick with one of the commercial versions. Thanks.

Eric


Well, it looks like I may have to make a go of it myself - Kinko's would end up in the $700-$800 range and I can't do that -

I really do appreciate all the advice - BUT -

can we distill this to a single program that works with Windows XP, text only, one sheet at a time (I don't mind if it takes a few weeks to do this), and that will convert to an editable Word format? And one that I can buy easily?

Edited by AllenLowe

I don't think there is a single program, but like I said, you can output PDF files from a scanner such as mine (though a flatbed like mine would take a long time), then use the software I mentioned to convert PDF to Word. The only time-consuming part is that you'd have to open each PDF file separately to convert it, creating 350 Word docs. Then you'd have to copy and paste 349 times to bring them together into one Word doc.
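If each converted page is saved as plain text instead of a separate Word doc (text only is all that's needed here), the 349 copy-and-pastes can be replaced by a short script. A minimal Python sketch, assuming one file per page with the page number somewhere in its name (page1.txt ... page350.txt); `merge_pages` and `page_number` are hypothetical helpers. It sorts by the number rather than alphabetically, so page10 doesn't land before page2.

```python
import re
from pathlib import Path


def page_number(path: Path) -> int:
    """Return the first run of digits in the file name, e.g. 'page12.txt' -> 12."""
    match = re.search(r"\d+", path.stem)
    if match is None:
        raise ValueError(f"no page number in {path.name}")
    return int(match.group())


def merge_pages(page_dir: str, out_file: str) -> list[str]:
    """Concatenate every .txt file in page_dir into out_file in page-number order.

    Write out_file to a different folder so it isn't picked up on a re-run.
    Returns the file names in the order they were merged.
    """
    pages = sorted(Path(page_dir).glob("*.txt"), key=page_number)
    with open(out_file, "w", encoding="utf-8") as out:
        for page in pages:
            # Trim trailing newlines, then leave one blank line between pages.
            out.write(page.read_text(encoding="utf-8").rstrip("\n") + "\n\n")
    return [p.name for p in pages]
```

The combined text file can then be opened in Word and saved as a single Word document.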


If you simply commit to typing in 10 pages a day, you'd be done in 35 days, a little over a month.

5 pages a day = 70 days.

5 pages a day wouldn't be all that much typing, really. 30 minutes max per day, if that much?

Unless you're a two-finger typist.
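The schedule above is just the page count divided by the daily rate, rounded up so a partial last day still counts. A small Python sketch, assuming a 350-page book (the length implied by "10 pages a day ... 35 days"); `days_needed` is a hypothetical name.

```python
import math


def days_needed(total_pages: int, pages_per_day: int) -> int:
    """Days to finish if you type pages_per_day every day; a partial last day counts."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return math.ceil(total_pages / pages_per_day)


for rate in (10, 5):
    print(f"{rate} pages a day = {days_needed(350, rate)} days")
# 10 pages a day = 35 days
# 5 pages a day = 70 days
```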


Well, ABBYY FineReader Express (which I will probably order) costs about $50 and runs on XP.

A bunch of these OCR software packages give you a 15-day free trial, but you can only OCR one page at a time -- which sounds like exactly what you are trying to do, so you might go that route.

I still don't understand GOCR well enough to try it, but there is surely a way to get it working on your system as well, and it should be fine for text only -- and it is free.

Main bit of advice is to save often (not as big a deal when doing one page at a time).


When I was preparing to update and expand my Bessie Smith biography (which I originally wrote on a typewriter), I used OmniPage (a Mac version) to scan in all the pages from an original edition. It really didn't take long and it was pretty accurate (the software should be even better now, 7 years later). You have to go over it, page by page, to catch the inevitable misreads, but that is to your advantage, because you will always find something that cries out for change.

Were I to do the same thing today, I would probably read the book into my Mac. I find that to be even more accurate these days (I use MacSpeech Dictate). Much of what I post on my blog is dictated.

Yeah, I used to really like OmniPage, but Caere was bought out and the buzz is that the newest versions are buggy and when you need to download a driver or something you are out of luck. Oh, and they charge you $20 to speak to anyone in customer service when this happens.


Here's another *free* possibility: Google scans and OCRs all the PDFs it crawls. Allen, you would have to scan the book into PDF format and put it up on a web page for Google to crawl. Google will then crawl the document and output it as HTML for web viewing. At that point, you can open the HTML file in Word and save it as a Word file.

OR - Send me the unbound book. I'll scan them to PDF on the whiz-bang-copier-imager-printer-scanner and just convert the file to Word for you since I sent you on the $800 Kinko's OCR safari. PM me if interested.

Edited by spinlps