Finding connections between fundamentally different conceptions of “models”: Explorations of highly structured data in the context of Large Language Models

Research output: Contribution to conference types: Poster (peer-review)

Abstract

One of the major projects in DH has been to find meanings in text through digital means (text analysis), and the recent emergence of Large Language Models (LLMs) has presented a radically new perspective on this. In contrast, almost all of my professional DH life at KCL (in 25 substantial funded DH projects) has centered not on text analysis but on representation through highly structured data, usually in the form of the relational database. In addition, my Pliny project has explored how aspects of the more informal process of humanities research, which is at least in good part about finding new interpretations, could be usefully supported digitally. Both Pliny and most of the highly structured projects have text "nestled" within them. Both highly structured data and LLMs create models that express some of the semantics of their material, but an LLM's model is a very different kind of thing from the models represented by graph-oriented, highly structured data projects (and by Pliny). Is there any point of connection? To explore this question, we have built mechanisms to extract the textual fragments from the EMLoT resource of the Records of Early English Drama (University of Toronto) and from the Pliny dataset of a significant Pliny user. Text analysis techniques have then been applied to uncover structure. We began this work with Voyant and continued by applying basic LLMs, examining how these language models might characterize these texts. Text that operates semantically can be found in highly structured data, and particularly in Pliny. Hence, part of the semantic significance of these projects lies not only in their object structure but also in the text within these structures, and the semantics carried by the text and by the data structure are likely to be complementary. This poster presents some initial ways in which points of contact can be found between these data-structure and text visualizations. Do they together enable a fuller vision of what our materials represent than either does by itself?
Original language: English
Publication status: Published - 2024
Event: Digital Humanities 2024: Reinvention and Responsibility - George Mason University, Arlington, United States
Duration: 5 Aug 2024 to 10 Aug 2024
https://dh2024.adho.org/

Conference

Conference: Digital Humanities 2024
Abbreviated title: DH 2024
Country/Territory: United States
City: Arlington
Period: 5/08/2024 to 10/08/2024

Keywords

  • Large Language Models
  • Digital Humanities
  • Structured Data
