King's College London

Research portal

SIMON: open-source knowledge discovery platform

Research output: Contribution to journal; Article; peer-review

A Tomic, I Tomic, L Waldron, L Geistlinger, M Kuhn, RL Spreng, LC Dahora, KE Seaton, G Tomaras, J Hill, Niharika Duggal, Ross Pollock, Norman R. Lazarus, Stephen Harridge, Janet Lord, P Khatri, AJ Pollard, MM Davis

Original language: English
Article number: 100178
Pages (from-to): 100178
Journal: Patterns
Volume: 2
Issue number: 1
Published: 8 Jan 2021

Bibliographical note

Funding Information: We are grateful to all the individuals who participated in the research studies. We appreciate the helpful discussions with and support from many members of the Davis, Y. Chien, and Pollard labs. The clinical study on cyclists was supported by the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. This work was supported by an NIH grant (U19 AI057229) and the Howard Hughes Medical Institute to M.M.D. and by the EU's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant (FluPRINT, project 796636) to A.T.

A.T. and I.T. designed and developed SIMON, performed the analysis, processed and analyzed the data, and wrote the manuscript. L.W. and L.G. helped with the analysis of the Zeller and LIHC datasets, advised on analysis design, and revised the manuscript. M.K. helped with the integration of the caret library and revised the manuscript. R.L.S., L.D., K.E.S., G.T., J.H., and A.J.P. conducted the VAST study, guided the analysis of the VAST dataset, pre-processed data for the analysis, and revised the manuscript. N.A.D., R.D.P., N.R.L., S.D.R.H., and J.M.L. performed the Cyclists study, provided the Cyclists data for the analysis, helped with the analysis, and revised the manuscript. P.K. guided the development of SIMON, supported standardization of the ML process in SIMON, and revised the manuscript. A.J.P. and M.M.D. supervised the study and revised and edited the manuscript.

The authors declare no competing interests.

Publisher Copyright: © 2020 The Authors

Copyright: Copyright 2021 Elsevier B.V., All rights reserved.

Abstract

Data analysis and knowledge discovery have become increasingly important in biology and medicine as biological datasets grow more complex, but the sophisticated programming skills and in-depth understanding of algorithms they require pose barriers for most biologists and clinicians. We have developed modular open-source software, SIMON, to facilitate the application of 180+ state-of-the-art machine-learning algorithms to high-dimensional biomedical data. With an easy-to-use graphical user interface, standardized pipelines, and an automated approach for machine learning and other statistical analysis methods, SIMON helps to identify optimal algorithms and provides a resource that empowers non-technical and technical researchers to identify crucial patterns in biomedical data.

Over the past years, technological advances have enabled the generation of large amounts of data at multiple scales. The integration of high-dimensional data is particularly important in biomedical sciences, as such data can be used to identify biological mechanisms and predict clinical outcomes well in advance of their occurrence. Because powerful analytical tools that the average biomedical researcher can use are lacking, translation of such knowledge has been extremely slow. We have developed open-source software, SIMON, to facilitate the application of machine learning to high-dimensional biomedical data. In SIMON, analysis is performed using an intuitive graphical user interface and a standardized, automated machine learning approach, allowing non-technical researchers to identify patterns and extract knowledge from high-dimensional data and build high-quality predictive models.

Tomic et al. developed SIMON, open-source software for applying machine learning algorithms to high-dimensional biomedical data ranging from the transcriptome to flow cytometry to the microbiome. Using a graphical user interface, standardized pipelines for predictive modeling, and automated machine learning, SIMON empowers non-technical biomedical researchers to identify patterns in their data and build high-quality predictive models.
