Enhancement of text in PDFs

 
By Lew, 3 Comments
Last comment 15 November 11, 02:57pm By Fiona (DigitalNZ)


I have a number of documents in PDF format that have been created from scans and OCR.

They are Vietnam era and were originally typed on thin paper and are not the most legible. Can your software enhance these PDF files to make them more legible?

Comments


Unfortunately, improving the legibility of digital scans is most easily done at the time of the scan. Depending on the type of issue, enlarging the image by increasing the resolution setting, placing blank paper behind the page, or adjusting the image settings of the scanner or camera to increase contrast can all assist. Quality checks on each scan are also important, particularly if the original may become hard to access again.

If you have access to professional desktop editing software such as Photoshop, it is possible to import and edit a PDF page by page as JPGs to increase legibility. However, the quality of the image can suffer in the conversion process, and if you have many pages it can be very time consuming.
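For anyone comfortable with a little scripting, the contrast adjustment an image editor performs can be sketched in a few lines. This is only a rough illustration in plain Python, assuming each page has already been exported as 8-bit greyscale pixel values (0 is black, 255 is white); the function name `stretch_contrast` and the 2%/98% cut-offs are made up for the example, and reading pixels out of a PDF would need an imaging library that is not shown here.

```python
def stretch_contrast(pixels, low_pct=2.0, high_pct=98.0):
    """Linearly rescale 8-bit greyscale values so that the range between
    the low_pct and high_pct percentiles is spread across 0..255."""
    if not pixels:
        return []
    ordered = sorted(pixels)
    last = len(ordered) - 1
    # Percentile cut-offs ignore a few stray very dark or very light pixels.
    lo = ordered[int(last * low_pct / 100)]
    hi = ordered[int(last * high_pct / 100)]
    if hi == lo:
        # A completely flat image has no contrast to stretch.
        return list(pixels)
    scale = 255.0 / (hi - lo)
    # Clamp so values outside the cut-offs stay within 0..255.
    return [min(255, max(0, round((p - lo) * scale))) for p in pixels]


# A faded page: grey typing (around 120) on greyish paper (around 180).
faded = [120, 130, 140, 150, 160, 170, 180]
print(stretch_contrast(faded, low_pct=0, high_pct=100))
```

After stretching, the darkest typing becomes black and the paper becomes white, which is roughly what a "levels" or "auto contrast" tool does. It cannot recover detail the scan never captured.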

Another option we have come across is to print out all the pages, photocopy them to increase contrast and re-scan the resulting copies.
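A photocopier raises contrast largely by pushing every mark towards either black or white. A rough digital counterpart is thresholding, and Otsu's method is one common way to choose the cut-off automatically. The sketch below is an illustration only, not something DigitalNZ software provides; it assumes 8-bit greyscale pixel values for a page are already available, and the function names are invented for the example.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0..255) that best separates dark and light
    pixels by maximising the between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their grey levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Turn every pixel at or below the threshold black, the rest white."""
    return [0 if p <= threshold else 255 for p in pixels]


# Faint typing (60-80) on thin, greyish paper (190-210).
page = [60, 70, 80] * 10 + [190, 200, 210] * 30
t = otsu_threshold(page)
clean = binarize(page, t)
print(t, clean.count(0), clean.count(255))
```

Thresholding works best when the typing is clearly darker than the paper; with heavy bleed-through, marks from the reverse side can end up black as well.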

While some OCR (optical character recognition) software can cope better with bleed-through and other page markings, almost all of the scanned output requires some form of re-formatting and manual correction to produce completely accurate text. So unfortunately there are no easy solutions there. There are some professional services that will do the correction for you, but they are likely to be an expensive option.

By ,
Tuesday 31 May, 2011 06:25pm

As well as professional services, some memory institutions offer the ability to crowdsource text correction from rough OCR. A good example is the Australian Newspapers project.

By ,
Thursday 02 June, 2011 10:06pm

The above-mentioned OCR correction project is part of Trove: http://trove.nla.gov.au/general/about

By Fiona (DigitalNZ),
Tuesday 15 November, 2011 02:57pm
