Ocropodium: open source OCR for small-scale historical archives

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

Large-scale digitization projects dealing with text-based historical material face challenges that are not well catered for by commercial software. This article discusses the results of a project to build a scalable OCR workflow for historical collections, based on open source tools and tailored in particular towards use in small-scale historical archives. It argues that open source tools allow for better customization to match these requirements, particularly with regard to character model training and per-project language modelling. We offer insights into our accuracy evaluation of various open source OCR tools, as well as a case study of the challenges and opportunities of open source OCR in historical archives.
Original language: English
Pages (from-to): 76-86
Number of pages: 11
Journal: Journal of Information Science
Volume: 38
Issue number: 1
Publication status: Published - Feb 2012
