Identifying Epochs in Text Archives

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review


This paper develops an automated approach to the 'distant reading' of textual archives in order to classify epochs in the use of language and examine their particular characteristics. It classifies epochs by applying a series of standardised dictionaries to map the semantics of government documents, using the changing frequency of terms in these dictionaries to identify moments of rupture in language. It then tests a variety of techniques, particularly topic models and word2vec word embeddings, to chart the relationship between the changing shape of individual linguistic elements and aggregate patterns. The result is a set of largely automated tools for understanding the structure of digital textual archives.
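The dictionary-frequency idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the dictionary, the toy corpus, and the function names `dictionary_share` and `largest_shift` are hypothetical, and the "rupture" rule used here (the largest year-on-year change in the share of dictionary terms) is an assumed simplification.

```python
from collections import Counter

# Hypothetical dictionary and toy corpus, for illustration only.
DICTIONARY = {"war", "army", "defence"}
DOCS_BY_YEAR = {
    1950: "trade and tariff policy",
    1951: "trade policy and tariff review",
    1952: "war army defence budget",
    1953: "army defence and war planning",
}


def dictionary_share(text, dictionary):
    """Return the fraction of whitespace-separated tokens found in the dictionary."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[term] for term in dictionary) / len(tokens)


def largest_shift(shares_by_year):
    """Return the year whose share differs most from the preceding year's share."""
    years = sorted(shares_by_year)
    if len(years) < 2:
        raise ValueError("need at least two years to compare")
    # Absolute change between each pair of consecutive years, keyed by the later year.
    diffs = {
        curr: abs(shares_by_year[curr] - shares_by_year[prev])
        for prev, curr in zip(years, years[1:])
    }
    return max(diffs, key=diffs.get)


shares = {year: dictionary_share(text, DICTIONARY) for year, text in DOCS_BY_YEAR.items()}
rupture_year = largest_shift(shares)
```

In this toy corpus the share of dictionary terms jumps between 1951 and 1952, so `rupture_year` is 1952. A real archive would need proper tokenisation, several dictionaries, and a more robust change-detection rule.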
Original language: English
Title of host publication: IEEE International Conference on Big Data Proceedings
Publication status: Accepted/In press - 2018


  • Computational Archives

