Abstract
Large-scale digitization projects dealing with text-based historical material face challenges that are not well catered for by commercial software. This article discusses the results of a project to build a scalable OCR workflow for historical collections based on open source tools that is particularly tailored towards use in small-scale historical archives. It argues that open source tools allow for better customization to match these requirements, particularly with regard to character model training and per-project language modelling. We offer insights into our accuracy evaluation results of various open source OCR tools, as well as a case study about the challenges and opportunities of open source OCR in historical archives.
Original language | English |
---|---|
Pages (from-to) | 76 - 86 |
Number of pages | 11 |
Journal | JOURNAL OF INFORMATION SCIENCE |
Volume | 38 |
Issue number | 1 |
DOIs | |
Publication status | Published - Feb 2012 |
Title | Ocropodium: open source OCR for small-scale historical archives |