Creating an annotated corpus for extracting canonical citations from classics-related texts by using active annotation

Matteo Romanello*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter

2 Citations (Scopus)

Abstract

This paper describes the creation of an annotated corpus that supports the task of extracting information from Classics-related texts, in particular canonical citations, that is, references to ancient sources. The corpus is multilingual and contains approximately 30,000 tokens of POS-tagged, cleanly transcribed text drawn from L'Année Philologique. The named entities needed to capture such citations were annotated using an annotation scheme devised specifically for this task. The contribution of the paper is twofold. First, it describes how the corpus was created using Active Annotation, an approach that combines automatic and manual annotation to optimize the human resources required to create a corpus. Second, it evaluates the performance of an NER classifier based on Conditional Random Fields, using the created corpus as training and test set; the results obtained with three different feature sets are compared and discussed.
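To make the idea of combining automatic and manual annotation more concrete, the following is a minimal sketch of one way such a loop can be organised: a model trained on the data labelled so far scores the remaining tokens, and only the tokens it is least sure about are passed to a human annotator. This is an illustration under stated assumptions, not the procedure used in the paper: the frequency-count model stands in for a real CRF, and the names `train`, `confidence`, `active_annotation`, the `oracle` callback, and the labels `AUTHOR`, `WORK` and `O` are all hypothetical.

```python
from collections import Counter, defaultdict


def train(labelled):
    """Count how often each token carries each label.

    Hypothetical stand-in for a real sequence model such as a CRF.
    """
    counts = defaultdict(Counter)
    for token, label in labelled:
        counts[token.lower()][label] += 1
    return counts


def confidence(model, token):
    """Share of the majority label for this token; 0.0 if the token is unseen."""
    seen = model.get(token.lower())
    if not seen:
        return 0.0
    return max(seen.values()) / sum(seen.values())


def active_annotation(seed, pool, oracle, rounds, batch_size):
    """Repeatedly send the least confident tokens to a human annotator.

    seed       -- list of (token, label) pairs annotated up front
    pool       -- list of tokens that are still unlabelled
    oracle     -- callable standing in for the human annotator (assumption)
    rounds     -- maximum number of annotation rounds
    batch_size -- number of tokens annotated manually per round
    """
    labelled = list(seed)
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labelled)
        # Stable sort: least confident tokens first, ties keep their order.
        pool.sort(key=lambda tok: confidence(model, tok))
        batch, pool = pool[:batch_size], pool[batch_size:]
        labelled.extend((tok, oracle(tok)) for tok in batch)
    return labelled, pool


if __name__ == "__main__":
    # Toy data; the labels are illustrative, not the paper's annotation scheme.
    gold = {"Hom.": "AUTHOR", "Il.": "WORK", "Verg.": "AUTHOR", "the": "O"}
    seed = [("Hom.", "AUTHOR"), ("the", "O")]
    pool = ["Hom.", "Il.", "the", "Verg."]
    labelled, remaining = active_annotation(seed, pool, gold.get, rounds=1, batch_size=2)
    print(labelled)
    print(remaining)
```

In this toy run the two tokens the model has never seen are the ones selected for manual annotation, while the tokens it already labels consistently stay in the pool for automatic treatment. A realistic setup would replace the count-based model with a trained sequence classifier and derive confidence from its label probabilities.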

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing
Subtitle of host publication: 14th International Conference, CICLing 2013, Samos, Greece, March 24-30, 2013, Proceedings, Part I
Editors: Alexander Gelbukh
Publisher: Springer Berlin Heidelberg
Pages: 60-76
Number of pages: 17
ISBN (Electronic): 9783642372476
ISBN (Print): 9783642372469
DOIs
Publication status: Published - 2013
Event: 14th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2013 - Samos, Greece
Duration: 24 Mar 2013 - 30 Mar 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 7816 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2013
Country/Territory: Greece
City: Samos
Period: 24/03/2013 - 30/03/2013
