Identifying homogeneous subgroups of patients and important features: a topological machine learning approach

Ewan Carr, Mathieu Carrière, Bertrand Michel, Frédéric Chazal, Raquel Iniesta*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph.

Results: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper.

Conclusions: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline.
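For readers unfamiliar with Mapper, the following is a minimal sketch of the general construction the abstract refers to: cover the range of a filter function with overlapping intervals, cluster the points falling in each interval, and link clusters that share points. It is not the authors' pipeline and omits the bootstrap and feature-selection steps. The function names (`mapper_graph`, `_components`), the parameters (`n_intervals`, `overlap`, `eps`) and the choice of a simple distance-threshold clustering are all assumptions made for illustration.

```python
from itertools import combinations
import math


def _components(points, members, eps):
    """Connected components of the eps-neighbourhood graph on `members`.

    Hypothetical stand-in for the clustering step; any clustering
    algorithm could be used here instead.
    """
    unseen = set(members)
    comps = []
    while unseen:
        stack = [unseen.pop()]
        comp = set(stack)
        while stack:
            u = stack.pop()
            near = {v for v in unseen
                    if math.dist(points[u], points[v]) <= eps}
            unseen -= near
            comp |= near
            stack.extend(near)
        comps.append(comp)
    return comps


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.25, eps=1.0):
    """Toy Mapper: return (nodes, edges).

    nodes is a list of sets of point indices; edges is a set of
    (i, j) node-index pairs whose point sets intersect.
    """
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    # Fall back to unit length if the filter is constant.
    length = (hi - lo) / n_intervals or 1.0
    pad = overlap * length  # how far each interval extends past its core

    nodes = []
    for i in range(n_intervals):
        a = lo + i * length - pad
        b = lo + (i + 1) * length + pad
        members = [j for j, v in enumerate(values) if a <= v <= b]
        nodes.extend(_components(points, members, eps))

    # Two nodes are linked when they share at least one point.
    edges = {(s, t)
             for s, t in combinations(range(len(nodes)), 2)
             if nodes[s] & nodes[t]}
    return nodes, edges


# Usage: ten evenly spaced points on a line, filtered by x-coordinate.
pts = [(float(i), 0.0) for i in range(10)]
nodes, edges = mapper_graph(pts, lambda p: p[0],
                            n_intervals=3, overlap=0.25, eps=1.0)
```

In this usage example the three overlapping intervals each yield one cluster, and consecutive clusters share a point, so the resulting graph is a path of three nodes. This reflects the one-dimensional summary of the point cloud that the abstract describes.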

Original language: English
Article number: 449
Number of pages: 1
Journal: BMC Bioinformatics
Volume: 22
Issue number: 1
Early online date: 20 Sept 2021
Publication status: Published - Dec 2021

Keywords

  • Clustering
  • Machine learning
  • Topological data analysis
