Abstract
Open and practical exchange, dissemination, and reuse of specimens and data have become a fundamental requirement for life sciences research. The quality of the data obtained, and thus of the findings and knowledge derived from them, is significantly influenced by the quality of the samples, the experimental methods, and the data analysis. Therefore, comprehensive and precise documentation of the pre-analytical conditions, the analytical procedures, and the data processing is essential for assessing the validity of research results. With the increasing importance of the exchange, reuse, and sharing of data and samples, procedures are required that enable cross-organizational documentation, traceability, and non-repudiation. At present, information on the provenance of samples and data is mostly sparse, incomplete, or incoherent. Since there is no uniform framework, this information is usually provided only within an organization and not in an interoperable form. At the same time, the collection and sharing of biological and environmental specimens increasingly require the definition and documentation of benefit sharing and compliance with regulatory requirements rather than consideration of purely scientific needs. In this publication, we present an ongoing standardization effort to provide trustworthy, machine-actionable documentation of the lineage of data and specimens. We invite experts from the biotechnology and biomedical fields to contribute further to the standard.
Original language | English |
---|---|
Article number | e10365 |
Journal | Learning Health Systems |
Volume | 8 |
Issue number | 1 |
DOIs | 10.1002/lrh2.10365 |
Publication status | Published - Jan 2024 |
Keywords
- biotechnology
- International Organization for Standardization
- provenance information
- standardization
In: Learning Health Systems, Vol. 8, No. 1, e10365, 01.2024.
Research output: Contribution to journal › Comment/debate › peer-review
TY - JOUR
T1 - Toward a common standard for data and specimen provenance in life sciences
AU - Wittner, Rudolf
AU - Holub, Petr
AU - Mascia, Cecilia
AU - Frexia, Francesca
AU - Müller, Heimo
AU - Plass, Markus
AU - Allocca, Clare
AU - Betsou, Fay
AU - Burdett, Tony
AU - Cancio, Ibon
AU - Chapman, Adriane
AU - Chapman, Martin
AU - Courtot, Mélanie
AU - Curcin, Vasa
AU - Eder, Johann
AU - Elliot, Mark
AU - Exter, Katrina
AU - Goble, Carole
AU - Golebiewski, Martin
AU - Kisler, Bron
AU - Kremer, Andreas
AU - Leo, Simone
AU - Lin-Gibson, Sheng
AU - Marsano, Anna
AU - Mattavelli, Marco
AU - Moore, Josh
AU - Nakae, Hiroki
AU - Perseil, Isabelle
AU - Salman, Ayat
AU - Sluka, James
AU - Soiland-Reyes, Stian
AU - Strambio-De-Castillia, Caterina
AU - Sussman, Michael
AU - Swedlow, Jason R.
AU - Zatloukal, Kurt
AU - Geiger, Jörg
N1 - Funding Information: This work has been co-funded by EOSC-Life supported by EU Horizon 2020, grant agreement no. 824087; EJP-RD supported by EU Horizon 2020, grant agreement no. 825575; BioExcel-2 supported by EU Horizon 2020, grant agreement no. 823830; the PAM and the XDATA Projects, funded by the Sardinian Regional Authority. VC and MCh are supported by the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy's and St Thomas’ National Health Service (NHS) Foundation Trust and King's College London (RJ112/N027) and by NIHR Application Research Collaboration South London (ARC SL). TB and MCo acknowledge funding from EMBL-EBI Core Funds and the FAIRplus project (H2020 No 802750). MCo was supported by Wellcome Trust GA4GH award number 201535/Z/16/Z and the CINECA project (H2020 No 825775). AC was supported by EPSRC (EP/S028366/1). JS was supported by the US National Institutes of Health (U24 EB028887, R01 GM122424, and OT2OD026671), the US National Science Foundation (NSF 2054061), and the US EPA (RD840027). ME was supported by the Alan Turing Institute (ProvAnon). KZ was supported by the Bundesministerium für Bildung, Wissenschaft und Forschung (Federal Ministry of Education, Science and Research of Austria) (BMBWF-10.470/0010-V/3c/2018). CS was supported by NIH grant #U01CA200059 and by grant #2019-198155 (5022) awarded by the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation, as part of their Imaging Scientist Program. The opinions in this paper are those of the authors and do not necessarily reflect the opinions of the funders. Representation of communities: The co-author team represents a wide range of life-sciences communities. PH, RW, CM, FF, HM, MP, and JG come from human biobanking and biomolecular resources communities, BBMRI-ERIC Research Infrastructure, and are directly involved as experts in the ISO standardization process. 
KZ and JE come from cancer research, biobanking, and medical informatics and are long-term contributors to data quality standardization efforts. TB, MCo is a director of Ontario Institute for Cancer Research. IC and KE come from marine biology and EMBRC Research Infrastructure. CG and SSR have worked with bioinformatics, CWL, and RO-Crate. JRS and JM come from bio-imaging communities and EUBioImaging Research Infrastructure. VC and MCh come from health informatics. HN participates in the provenance standardization process as an expert from Japan, MS and JS as experts from the United States, and AK as an expert from Luxembourg. ME contributes to privacy protection and provenance aspects. FB is a biobanking expert and director of the microbiological resource center CRBIP, Institut Pasteur. AS is a biobanking expert and ESBB councilor. SL-G and CA are from NIST and are convenor and secretary of ISO/TC 276/WG 3 “Analytical Methods.” AM belongs to the tissue engineering and biomedical research community. MM is a standards expert in the digital media, genomic sequencing, and annotation data fields, and convenor of ISO/IEC SC29/WG 8 “MPEG Genomic Coding.” AC contributes to the capture and handling of provenance within large organizations. CS is a cell biologist actively engaged in the development of quality control and reproducibility specifications and tools for light microscopy as a member of the Data Coordination and Integration Center of the NIH-funded 4D Nucleome initiative, Chair of the Quality Control and Data Management WG of BioImaging North America, and Co-Chair of the WG on Metadata (WG7) of the QUality Assessment and REProducibility for Instruments and Images in Light-Microscopy (QUAREP-LiMI) initiative. SLe is a member of the RO-Crate community and co-chair of a working group for the development of an RO-Crate profile for capturing the provenance of scientific workflow executions. 
Publisher Copyright: © 2023 The Authors. Learning Health Systems published by Wiley Periodicals LLC on behalf of University of Michigan.
PY - 2024/1
Y1 - 2024/1
KW - biotechnology
KW - International Organization for Standardization
KW - provenance information
KW - standardization
UR - http://www.scopus.com/inward/record.url?scp=85153398558&partnerID=8YFLogxK
U2 - 10.1002/lrh2.10365
DO - 10.1002/lrh2.10365
M3 - Comment/debate
AN - SCOPUS:85153398558
SN - 2379-6146
VL - 8
JO - Learning Health Systems
JF - Learning Health Systems
IS - 1
M1 - e10365
ER -