Abstract
Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets that are ready to be loaded into the most popular ML frameworks.
Original language | English |
---|---|
Pages | 1-6 |
Number of pages | 6 |
DOIs | |
Publication status | Published - 9 Jun 2024 |
Event | 8th Workshop on Data Management for End-to-End Machine Learning, DEEM 2024 - Santiago, Chile |
Duration | 9 Jun 2024 → … |
Conference
Conference | 8th Workshop on Data Management for End-to-End Machine Learning, DEEM 2024 |
---|---|
Country/Territory | Chile |
City | Santiago |
Period | 9 Jun 2024 → … |
Keywords
- discoverability
- ML datasets
- reproducibility
- responsible AI