ChoCo: a Chord Corpus and a Data Transformation Workflow for Musical Harmony Knowledge Graphs

Jacopo de Berardinis*, Albert Merono Penuela, Andrea Poltronieri*, Valentina Presutti

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Various disconnected chord datasets are currently available for music analysis and information retrieval, but they are often limited by their size, lack of openness, lack of timed information, or limited interoperability. Together with the lack of overlapping repertoire coverage, this limits cross-corpus studies on harmony over time and across genres, and hampers research in computational music analysis (chord recognition, pattern mining, computational creativity), which needs access to large datasets. We address this gap by releasing the Chord Corpus (ChoCo), a large-scale dataset that semantically integrates harmonic data from 18 different sources that use heterogeneous representations and formats (Harte, Leadsheet, Roman numerals, ABC, etc.). We rely on JAMS (JSON Annotated Music Specification), a popular data structure for annotations in Music Information Retrieval, to represent and enrich chord-related information (chord, key, mode, etc.) in a uniform way. To achieve semantic integration, we design a novel ontology for modelling music annotations and the entities they involve (artists, scores, etc.), and we build a 30M-triple knowledge graph, including 4K+ links to other datasets (MIDI-LD, LED).
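To give a rough idea of what a uniform, JAMS-style representation of timed chord annotations can look like, the following is a minimal sketch using only the Python standard library. It is not taken from the ChoCo codebase: the helper names (`make_chord_annotation`, `make_jams_document`), the track metadata, and the chord sequence are hypothetical, and the dictionary layout is a simplified approximation of the JAMS structure (file metadata plus a list of annotations, each with a namespace and timed observations).

```python
import json


def make_chord_annotation(chords, namespace="chord_harte"):
    """Build a simplified JAMS-style annotation from (time, duration, label) tuples.

    Hypothetical helper for illustration; labels are assumed to be in Harte notation.
    """
    return {
        "namespace": namespace,
        "data": [
            {"time": t, "duration": d, "value": label, "confidence": None}
            for t, d, label in chords
        ],
        "annotation_metadata": {"data_source": "hypothetical example"},
        "sandbox": {},
    }


def make_jams_document(title, artist, duration, annotations):
    """Wrap annotations and track metadata in a simplified JAMS-style document."""
    return {
        "file_metadata": {"title": title, "artist": artist, "duration": duration},
        "annotations": annotations,
        "sandbox": {},
    }


# Made-up chord sequence: onset (s), duration (s), Harte-style chord label.
chords = [
    (0.0, 2.0, "C:maj"),
    (2.0, 2.0, "A:min7"),
    (4.0, 2.0, "F:maj"),
    (6.0, 2.0, "G:7"),
]

doc = make_jams_document(
    "Example Tune", "Unknown Artist", 8.0, [make_chord_annotation(chords)]
)
serialized = json.dumps(doc, indent=2)
```

Because every source format would be mapped onto the same timed-observation layout, annotations that started out as Harte labels, lead sheets, or Roman numerals could in principle be stored side by side and distinguished only by their namespace.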
Original language: English
Article number: 641
Journal: Scientific Data
Issue number: 1
Publication status: Published - 20 Sept 2023


  • Knowledge Engineering
  • music information retrieval
  • Knowledge graphs
  • Harmony

