Abstract
We explore methods of automating the digital palaeographic process using a divide-and-conquer approach. First, image noise is reduced using a combination of colour removal and varied blurring and thresholding techniques. The system calculates initial values for these processes from the average greyscale value of the image when it is first imported. By combining these algorithms, the system is able to achieve high levels of noise reduction.
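As a rough illustration of the noise-reduction stage, the sketch below blurs a greyscale image and thresholds it at a level derived from the image's mean grey value. It is a minimal sketch under stated assumptions, not the system's implementation: the 3x3 box blur, the function names, and the `offset` parameter are all hypothetical, and the image is assumed to be a list of rows of 0-255 grey values.

```python
def mean_grey(image):
    """Average greyscale value (0-255) over all pixels."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count


def box_blur(image):
    """3x3 mean blur; border pixels average over the neighbours that exist."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def binarise(image, offset=0.0):
    """Return 1 for ink (darker than the threshold) and 0 for background.

    The threshold starts at the mean grey level of the unblurred image;
    `offset` is a hypothetical tuning knob, not a parameter from the paper.
    """
    threshold = mean_grey(image) + offset
    return [
        [1 if pixel < threshold else 0 for pixel in row]
        for row in box_blur(image)
    ]
```

Deriving the starting threshold from the image itself means pages of different overall darkness need no manual initial setting, which is the role the abstract gives to the average greyscale value.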
The process of segmenting the script into letters is also divided. First, blocks of text are detected in the noise-reduced image by measuring the proportion of black pixels within blocks of pixels of a predefined size and comparing these values to the average colour values of both the entire image and the surrounding blocks, which minimises the false-positive rate. These blocks of text are split into individual lines by detecting whitespace, and the lines are then further segmented into individual letters using a similar technique.
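The whitespace-based splitting of text into lines and then letters can be sketched with projection profiles: count the ink pixels in each row to find lines, then count the ink pixels in each column of a line to find letters. This is an assumed, simplified reading of the technique; the function names are hypothetical, the block-detection step is omitted, and the input is assumed to be a binary image where 1 marks ink.

```python
def ink_runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                  # a run of ink begins
        elif not value and start is not None:
            runs.append((start, i))    # whitespace ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_letters(binary):
    """Split a binary image (1 = ink) into boxes (top, bottom, left, right)."""
    boxes = []
    row_profile = [sum(row) for row in binary]
    for top, bottom in ink_runs(row_profile):
        line = binary[top:bottom]
        # Column profile of this line only, so gaps are judged per line.
        col_profile = [sum(column) for column in zip(*line)]
        for left, right in ink_runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes
```

A sketch like this treats any fully blank row or column as a separator, so touching letters stay merged and broken strokes are split, which is the kind of error a later width check would need to catch.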
To verify the integrity of the letters, the size of each segment is compared to the letter average, since most letters within a manuscript are of a similar width. Any letters that differ excessively from this average are re-checked by re-running the segmentation algorithms at those specific locations with thresholding set to both lighter and darker levels. The results of these segmentations are then merged, and each box is finally expanded to fit its letter more precisely.
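The width check can be illustrated by flagging boxes whose width is far from the mean width. The abstract does not say how "excessively" is defined, so the relative `tolerance` below is an assumption, as are the function name and the `(top, bottom, left, right)` box layout. The re-segmentation and merging steps are not shown.

```python
def flag_unusual_widths(boxes, tolerance=0.5):
    """Return indices of boxes whose width deviates from the mean width
    by more than `tolerance` times that mean.

    `tolerance` is a hypothetical parameter; boxes are
    (top, bottom, left, right) tuples with exclusive right edges.
    """
    widths = [right - left for (_top, _bottom, left, right) in boxes]
    if not widths:
        return []
    mean_width = sum(widths) / len(widths)
    return [
        index
        for index, width in enumerate(widths)
        if abs(width - mean_width) > tolerance * mean_width
    ]
```

The flagged indices mark the locations where segmentation would be re-run at lighter and darker thresholds.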
Original language | English |
---|---|
Type | Automated Image Segmentation Methods for Digitally-Assisted Palaeography of Medieval Manuscripts |
Media of output | |
Number of pages | 28 |
Publication status | Published - 2013 |