Digital newspapers your ancestors read

 
July 02, 2009 by Lewis Brown, DigitalNZ

A major update to Papers Past, New Zealand's largest free online digitised resource, quietly launched in the last week of June.  Papers Past, featuring newspapers from the nineteenth and early twentieth centuries, is now bigger, faster and fully text searchable thanks to Optical Character Recognition (OCR) technology.

Papers Past was first launched in 2001 with a quarter of a million digitised pages of historic New Zealand newspapers.  It now has about five times that many pages (1.3 million), covering 52 different New Zealand publications from as far back as 1839.  In fact, Papers Past currently has more digitised pages online than Chronicling America, an equivalent project in the U.S. for historic newspapers.

NZ Truth is one of 52 publications now available on Papers Past

Old newspapers make for a really interesting resource to digitise, and it's perhaps not surprising that they were among the earliest targets of digitisation efforts around the world.

Because of the poor quality of the paper they were printed on, newspapers are prime candidates for copying, so that readers can consult surrogates instead of the fragile originals.  Without microfilm and now digitisation, many old newspapers would simply not be available to view at all.

Many New Zealand newspapers have been microfilmed or are being microfilmed for preservation purposes.  Digitising microfilm is a lot simpler and cheaper than dealing with the paper originals.  It is only recently that directly digitising high-volume, large-format materials like newspapers has become possible.  The National Archives and Records Administration (NARA) in the United States has recently invested in 10 large-format scanners at a cost of around NZ$250,000 apiece, but it's likely to be some time before that kind of technology is widely available and affordable.  A lot of preparation work to sort, unbend and repair old newspapers is also required - work that has already been done where the paper has been microfilmed.

From a copyright perspective, old newspapers are often less complicated than other resources, as a large proportion of the articles from the nineteenth and early twentieth centuries are unattributed.  In New Zealand, where authors are unknown after reasonable enquiry, the copyright term for published works is only 50 years.

Being text-based and complex in structure, newspapers lend themselves extremely well to full text searching.  OCR is the quickest way to achieve this, and while by no means perfect, it can get very good results - certainly far better than scrolling through pages of microfilm.  As a way of trying to improve on OCR results, the National Library of Australia is testing out a very cool newspaper service that allows users to easily correct and tag newspaper content, so that search results improve over time.

Newspapers are likely to continue to be a highly prized target for digitisation, both public domain editions and more recent in-copyright ones.  Engaging with newspaper publishers and encouraging them to open up their more recent back catalogue for digitisation and public access is a challenge that still lies ahead of us.  In the meantime, there has already been considerable interest on our Make it Digital voting tool in having a variety of newspaper editions digitised.  We've invited the Papers Past manager to join in and post newspaper titles so you can have a say in what they should be digitising next.  Get voting now!

2 comments


Posted by Canterbury Heritage | 04 Jul 2009 06:12

Shame the Papers Past web site doesn't facilitate Boolean logic in search protocols.


Posted by Chelsea | 07 Jul 2009 11:29

Hi Canterbury Heritage,

Papers Past supports Boolean searching with the following operators: AND, OR, NOT, + and -. The site also allows wildcard searching using ? and *, as well as the proximity operator ~. Here are links to a few examples:

AND
http://tr.im/r9FO

OR
http://tr.im/r9G3

NOT
http://tr.im/r9Gf

+
http://tr.im/r9Gm

-
http://tr.im/r9Gu

?
http://tr.im/r9GJ

*
http://tr.im/r9GU

~
http://tr.im/r9Ha

Unfortunately, the richness of Boolean searching can be compromised when full-text transcriptions aren't 100% accurate. But hopefully you now have an idea of how Boolean operators work within Papers Past.

-Chelsea Hughes, Digital Services Manager, NLNZ