TY - CHAP
T1 - Creating an annotated corpus for extracting canonical citations from classics-related texts by using active annotation
AU - Romanello, Matteo
PY - 2013
Y1 - 2013
N2 - This paper describes the creation of an annotated corpus that supports the task of extracting information from Classics-related texts, particularly canonical citations, i.e. references to ancient sources. The corpus is multilingual and contains approximately 30,000 tokens of POS-tagged, cleanly transcribed text drawn from L'Année Philologique. The named entities needed to capture such citations were annotated using an annotation scheme devised specifically for this task. The contribution of the paper is twofold. First, it describes how the corpus was created using Active Annotation, an approach that combines automatic and manual annotation to optimize the use of the human resources required to create a corpus. Second, the performance of a named entity recognition (NER) classifier based on Conditional Random Fields is evaluated using the corpus as training and test sets, and the results obtained with three different feature sets are compared and discussed.
AB - This paper describes the creation of an annotated corpus that supports the task of extracting information from Classics-related texts, particularly canonical citations, i.e. references to ancient sources. The corpus is multilingual and contains approximately 30,000 tokens of POS-tagged, cleanly transcribed text drawn from L'Année Philologique. The named entities needed to capture such citations were annotated using an annotation scheme devised specifically for this task. The contribution of the paper is twofold. First, it describes how the corpus was created using Active Annotation, an approach that combines automatic and manual annotation to optimize the use of the human resources required to create a corpus. Second, the performance of a named entity recognition (NER) classifier based on Conditional Random Fields is evaluated using the corpus as training and test sets, and the results obtained with three different feature sets are compared and discussed.
UR - http://www.scopus.com/inward/record.url?scp=84875490838&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-37247-6_6
DO - 10.1007/978-3-642-37247-6_6
M3 - Chapter
AN - SCOPUS:84875490838
SN - 9783642372469
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 60
EP - 76
BT - Computational Linguistics and Intelligent Text Processing
A2 - Gelbukh, Alexander
PB - Springer Berlin Heidelberg
T2 - 14th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2013
Y2 - 24 March 2013 through 30 March 2013
ER -