There are two reviews here, done about 2 years apart: Presto! OCR Pro 4.0 (April 2002) and Abbyy FineReader Professional 7.0 (October-December 2004).
Presto! OCR Pro 4.0
"OCR with Human-Eye Precision." What a huge load of crap!!
It is considerably better than the old version of Textbridge Pro that I was using previously, but it is still far from satisfactory. My major project has been scanning an 1890 book, a gazetteer, mediocrely printed. Presto does a fair job recognizing the average text, particularly with numbers. I'm impressed that it does a good job with words broken by a line-break, and at figuring out plurals and possessives.
But: it very frequently reads "the" as "tlie", but won't learn that, and won't even offer "the" as a possible correction; often reads "of" as "o£", and won't ask for confirmation on the "f", yet asks for confirmation of many other letters; inconsistently recognizes the "$" symbol; demands a language identification for some questionable words (even numbers) instead of defaulting (to English); only offers the options of specifying text as all-normal or all-potentially-variable, but I only want the options of normal vs italic (due to the quality of printing); my text has frequent diphthongs (e.g. the combination letters "æ" and "œ"), and Presto does not recognize them or allow for their insertion from an internal set of odd characters. In addition, my text has the first word of its thousands of paragraphs in type 2 columns high; Presto doesn't automatically handle that, and requires manual block layout to avoid gibberish.
Presto does a terrible job with tables; the instructions are unclear, and the results are not suitable for transfer to Excel or Word tables. Also, the numbers in the tables are in small type, and Presto does a poor job recognizing them.
Finally, I'm scanning a book that has thousands of new words (names, towns, rivers, villages, mountains, etc.); Presto (and the other OCRs) should use the available word processor dictionaries so the user does not have to authorize every new word twice.
Would I swear to every assertion here, that it couldn't be fixed if I managed to find the right switches? No, but this review is my experience as a beginner with the application.
I was trying to find a review site to submit this to, and came up dry, so here it is on my own site, in the faint hope someone will find it useful. [Several people have seen it and commented. Thanks, and I'm happy it was useful.]
Abbyy FineReader Professional 7.0
This review is still in progress.
It's 2 years later, and computers are much faster, and I have what counts for now as a pretty good one. Abbyy FineReader came out well in the magazine reviews, and offered a steep discount from the list price. My scanner is a Dell 940 printer-scanner-copier. Overall, the software is much better, more accurate, but still not at all satisfactory.
I don't know what the default settings are appropriate for—they sure don't work well for me, and I can't find a way to reset many of them. The settings are in multiple nested pages of check-off boxes. It is not clear what many of them do. Included Help is even more useless than usual. Often I have to use settings that take 1-2 minutes to scan and read a page reasonably well. The tutorial is only useful for the absolute basics. If you have an actual problem, the on-line tech desk may help, given a day or 3. For more complex problems, expect several exchanges with them to show you have a clue what you're doing and still have a problem, then they'll actually consider a personal answer.
Finereader draws boxes around blocks of material, guessing at its type: text, table or figure. A weird problem with it is that it seems to go out of its way to exclude the corner page numbers: it will draw a rectangular text box, except for a dogleg to exclude the number. On some of my texts it will draw boxes that are completely arbitrary, irregular shapes wrapping around each other. It should make rectangular boxes by default, or have a way of specifying whether it should. It wants to assign different fonts to headers and page numbers than to body text by default, even when it isn't correct. I most often have to tell it when there is a table, and its decisions about divider spacing are rather poor, and definitely difficult to adjust. The figures I scan are mostly gray-scale, yet Finereader saves them by default as B/W, and the settings for changing this are obscure to nonexistent.
Presto actually did a better job recognizing plurals, possessives and proper nouns, though at the expense of asking about each unfamiliar word. Abbyy does a good job with dictionary words broken by line breaks, but when it's a new or questionable word, it offers all sorts of ridiculous suggestions for the right word, but rarely the correct one. As with its predecessors, it does a poor job with the word "the," and doesn't even offer t-h-e as an option for the hundreds of times it sees "tlie," and "tlic" and several other common misreadings. Again, the OCR does not recognize diphthongs, nor accented letters or footnoting symbols, and has no set of non-keyboard symbols to offer. It will accept symbols pasted in from another application, but I shouldn't have to keep Word or Mozilla Composer open to find diphthongs, degrees and accents. How about using the Word set? And again, it requires duplication of dictionaries.
I'm still working with historical documents, often reprints, with many old words and spellings and names, with pages that vary in ink density, yet are easy enough to read by eye. Some of the texts I'm working with use the "long s" character ſ, that looks like an f. After several emails to Abbyy tech support, they finally admitted that they can't handle long s. They claim to handle hundreds of really obscure texts and languages, yet can't handle the most common English typeface of the 1700s. And apparently Finereader can't make a guess about words with more than 3 wrong letters; this is an annoyance when it won't read the long s correctly in articles about "Maſſachuſetts" and "Wiſcaſſet." (I see that the preceding quoted words don't come out right in some browsers: the long s characters in Massachusetts and Wiscasset are replaced with question marks.)
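For anyone cleaning up the output afterwards, here is a rough sketch in Python of how the long s problem could be patched outside the OCR program. It is not anything Finereader does. The word list is a tiny hypothetical stand-in for a real dictionary or gazetteer, and the function names are made up for illustration. The first function handles text where the ſ character survived; the second handles words where it was misread as f, by trying each f as an s until a known word turns up.

```python
from itertools import product

# Hypothetical stand-in for a real dictionary or gazetteer word list.
KNOWN_WORDS = {"massachusetts", "wiscasset", "first", "office"}


def normalize_long_s(text):
    """Replace the long s character (U+017F) with an ordinary s."""
    return text.replace("\u017f", "s")


def recover_misread_long_s(word, known_words=KNOWN_WORDS):
    """Try every combination of keeping each 'f' or turning it into 's'.

    Returns the first candidate found in known_words, or None.
    The all-'f' spelling is tried first, so genuine f-words are kept.
    """
    lower = word.lower()
    f_positions = [i for i, ch in enumerate(lower) if ch == "f"]
    for choice in product("fs", repeat=len(f_positions)):
        letters = list(lower)
        for pos, ch in zip(f_positions, choice):
            letters[pos] = ch
        candidate = "".join(letters)
        if candidate in known_words:
            # Restore the leading capital of the scanned word, if any.
            return candidate.capitalize() if word[:1].isupper() else candidate
    return None
```

The number of candidates doubles with each f in the word, which is fine for ordinary words but worth keeping in mind.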
A very major annoyance is that it will do a spell-check, but forgets all of it when you close out that window, even when you're still in the document. That is, it will remember that the words exist, but not how to handle them when they occur in the open document (or any other document). And even within the open spell-check, it often does not remember, even when "always replace" is keyed. I'm working mostly with documents that are 10-50 pages long, with hundreds of things to check, so this is a big deal. The replacements only happen as you get to the questionable words on a page; it doesn't search forward into the document to replace things beforehand.
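What I'd want from "always replace" can be sketched in a few lines of Python run over the exported text. This is only an illustration of the idea, not how Finereader works inside; the table contents are the misreadings mentioned on this page, and the names are hypothetical. The table is applied to the whole document in one pass, and could be saved and reused for the next document.

```python
import re

# Hypothetical corrections, each confirmed once by the user.
ALWAYS_REPLACE = {"tlie": "the", "tlic": "the", "o\u00a3": "of"}


def build_pattern(table):
    """Compile one pattern matching any misreading as a whole word."""
    keys = sorted(table, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    # Lookarounds are used instead of \b because the pound sign in
    # "o£" is not a word character, so \b would not match after it.
    return re.compile(r"(?<!\w)(" + alternation + r")(?!\w)", re.IGNORECASE)


def apply_always_replace(text, table=ALWAYS_REPLACE):
    """Replace every known misreading in the text, keeping a leading capital."""
    pattern = build_pattern(table)

    def fix(match):
        found = match.group(1)
        replacement = table[found.lower()]
        return replacement.capitalize() if found[0].isupper() else replacement

    return pattern.sub(fix, text)
```

For example, `apply_always_replace("Tlie town o£ Bath")` gives `"The town of Bath"`, while a word that merely contains a misreading, such as "tlien", is left alone for the spell-check to ask about.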
The little default window opened for spell-check is an annoyance too—it doesn't give enough context. I happen to have the luxury of dual monitors—and finally figured out that I could move and expand the window.
Another big annoyance with spell-check: it does not handle M-dashes correctly as punctuation. These are the long dashes, —, used as separators, somewhat like colons, without spaces between the dash and connected words. (They should be ignored, and the words flanking them should be checked separately.) My 19th century texts usually leave a space between a word and a following colon, semicolon, question mark and exclamation: the OCR should insert a "hard space" there as a document-wide option. Finereader does a poor job with italic words, with large fonts, and recognizing N, H and X.
Seems like spell-check handles hyphenated proper nouns incorrectly when it offers suggestions. For example, it won't recognize "New-Hampshire" correctly: the closest offering is "new-Hampshire" and it won't learn to offer it correctly. This is a common problem, not a rare one.
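Both requests are simple enough to sketch in Python as post-processing on exported text. This is a minimal illustration under my own assumptions (the function names are hypothetical, and a real spell-checker would tokenize more carefully): the first function treats the long dash as a word separator, and the second swaps the ordinary space before a colon, semicolon, question mark or exclamation mark for a non-breaking one.

```python
import re

NBSP = "\u00a0"          # non-breaking ("hard") space
TRIM = ".,;:?!\"'()"     # punctuation stripped from the ends of words


def words_to_check(text):
    """Split text into words for spell-checking.

    Whitespace and long dashes (U+2014) both count as separators,
    so the words on either side of a dash are checked separately.
    """
    pieces = re.split(r"[\s\u2014]+", text)
    return [p.strip(TRIM) for p in pieces if p.strip(TRIM)]


def harden_spaces(text):
    """Turn the ordinary space before : ; ? ! into a non-breaking space."""
    return re.sub(r" (?=[:;?!])", NBSP, text)
```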
There are many export options, mostly unexplained. I usually want HTML, and Finereader defaults to Internet Explorer, though this is not my preferred browser. Its default export type is CSS (cascading style sheets). I wouldn't mind that, except Finereader will then give me 50 or more "styles," when there should have been 3. It will decide that one paragraph is 11.83 point type, and another is 11.86 point, and some other piece is 11.79, on and on, giving them all a different "style," when those determinations are wrong to begin with, and make editing the resulting HTML file difficult. (I'm working in Mozilla Composer. I end up just deleting all the style specifications in the HTML source file.) If I export without font specification, Finereader also eliminates all the italic, bolding and underlining, which I want to keep.
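The style explosion could in principle be tamed by grouping near-identical measured sizes before any styles are written. Here is a small Python sketch of that idea; it is an assumption about what a fix might look like, not a description of Finereader's export. The 0.25 point tolerance and the rounding to the nearest half point are arbitrary choices for illustration.

```python
def _assign(cluster, mapping):
    """Give every size in the cluster one nominal size (mean, to 0.5 pt)."""
    mean = sum(cluster) / len(cluster)
    nominal = round(mean * 2) / 2
    for size in cluster:
        mapping[size] = nominal


def consolidate_sizes(sizes, tolerance=0.25):
    """Map each measured point size to a shared nominal size.

    Sizes are sorted, and a new cluster starts whenever the gap to the
    previous size exceeds the tolerance.
    """
    mapping = {}
    cluster = []
    for size in sorted(set(sizes)):
        if cluster and size - cluster[-1] > tolerance:
            _assign(cluster, mapping)
            cluster = []
        cluster.append(size)
    if cluster:
        _assign(cluster, mapping)
    return mapping
```

With the three sizes quoted above, `consolidate_sizes([11.83, 11.86, 11.79])` maps all of them to 12.0, so they would share one style instead of three.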
A helpful feature would be a way to automatically straighten figures. A degree or 2 from straight looks distinctly off, but it's often hard to align pictures when the book is thick, or even was printed off-kilter. This would seem to be simple, since it seems that the OCR must already do this for the text reading.
Nearly everything I scan is in English, and I've already gone into the problems of Finereader for that. But some of my texts have sections in Latin, and there are small sections of French and Greek (I don't claim to read them.) Supposedly Finereader can handle dozens of languages, but if there's a way to make it switch languages within an English text, I can't find it. The Latin alphabet isn't any extra problem, of course, but there should be separate dictionaries.
PS: a 12/2004 note from Jim, who read this page: "I'm using OmniPage Pro 14.0 for my occasional scanning needs and about the only thing it reads well is high quality text without graphics, symbols, or tables."