TY - JOUR
T1 - ChoCo: a Chord Corpus and a Data Transformation Workflow for Musical Harmony Knowledge Graphs
AU - de Berardinis, Jacopo
AU - Merono Penuela, Albert
AU - Poltronieri, Andrea
AU - Presutti, Valentina
N1 - Funding Information:
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101004746. The authors acknowledge Nicolas Lazzari for his contribution to the data conversion module and the test JAMS dataset, and Elia Rizzetto for his contribution to the latter. We also thank Simon Holland and Naomi Barker for their role in the expert validation of the chord conversion module, and all the annotators who took part in this study. The authors also thank Andrew Choi for his contribution to the Band-in-a-Box parser, Mark Gotham for addressing queries on the When in Rome corpus, Mark Granroth-Wilding for providing details of the Jazz Corpus, Luigi Aspirino and Enrico Daga for their technical support with SPARQL-Anything, and Marilena Daquino for contributing to the deployment of the SPARQL endpoint. Finally, we would like to thank the anonymous reviewers for their constructive feedback, which helped improve the manuscript.
Publisher Copyright:
© 2023, Springer Nature Limited.
PY - 2023/9/20
Y1 - 2023/9/20
N2 - Various disconnected chord datasets are currently available for music analysis and information retrieval, but they are often limited by their size, non-openness, lack of timed information, and lack of interoperability. Together with the lack of overlapping repertoire coverage, this limits cross-corpus studies on harmony over time and across genres, and hampers research in computational music analysis (chord recognition, pattern mining, computational creativity), which needs access to large datasets. We help address this gap by releasing the Chord Corpus (ChoCo), a large-scale dataset that semantically integrates harmonic data from 18 different sources using heterogeneous representations and formats (Harte, Leadsheet, Roman numerals, ABC, etc.). We rely on JAMS (JSON Annotated Music Specification), a popular data structure for annotations in Music Information Retrieval, to represent and enrich chord-related information (chord, key, mode, etc.) in a uniform way. To achieve semantic integration, we design a novel ontology for modelling music annotations and the entities they involve (artists, scores, etc.), and we build a 30M-triple knowledge graph, including 4K+ links to other datasets (MIDI-LD, LED).
AB - Various disconnected chord datasets are currently available for music analysis and information retrieval, but they are often limited by their size, non-openness, lack of timed information, and lack of interoperability. Together with the lack of overlapping repertoire coverage, this limits cross-corpus studies on harmony over time and across genres, and hampers research in computational music analysis (chord recognition, pattern mining, computational creativity), which needs access to large datasets. We help address this gap by releasing the Chord Corpus (ChoCo), a large-scale dataset that semantically integrates harmonic data from 18 different sources using heterogeneous representations and formats (Harte, Leadsheet, Roman numerals, ABC, etc.). We rely on JAMS (JSON Annotated Music Specification), a popular data structure for annotations in Music Information Retrieval, to represent and enrich chord-related information (chord, key, mode, etc.) in a uniform way. To achieve semantic integration, we design a novel ontology for modelling music annotations and the entities they involve (artists, scores, etc.), and we build a 30M-triple knowledge graph, including 4K+ links to other datasets (MIDI-LD, LED).
KW - Knowledge Engineering
KW - music information retrieval
KW - Knowledge graphs
KW - Harmony
UR - http://www.scopus.com/inward/record.url?scp=85171809731&partnerID=8YFLogxK
U2 - 10.1038/s41597-023-02410-w
DO - 10.1038/s41597-023-02410-w
M3 - Article
VL - 10
JO - Scientific Data
JF - Scientific Data
IS - 1
M1 - 641
ER -