Identifying homogeneous subgroups of patients and important features: a topological machine learning approach

Ewan Carr, Mathieu Carrière, Bertrand Michel, Frédéric Chazal, Raquel Iniesta*

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review


Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph.

Results: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper.

Conclusions: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at
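To make concrete what it means for Mapper to reduce a point cloud to a graph, the following is a minimal sketch of the generic Mapper construction, not the authors' pipeline. It assumes a one-dimensional filter, a cover of equal-length overlapping intervals, and a simple distance-threshold clustering; the function names (`connected_components`, `mapper_graph`) and all parameter values are hypothetical and chosen only for illustration. The bootstrap step and the statistical selection of features described in the abstract are not reproduced here.

```python
import numpy as np


def connected_components(points, eps):
    """Label points so that two points share a label when a chain of
    neighbours closer than eps links them (illustrative clustering choice)."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            dist = np.linalg.norm(points - points[i], axis=1)
            # Pull in every unlabelled point within eps of point i.
            for j in np.flatnonzero((dist < eps) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def mapper_graph(points, filter_values, n_intervals=6, overlap=0.3, eps=0.5):
    """Return (nodes, edges). Each node is a set of point indices; each edge
    is a pair of node ids whose clusters share at least one point."""
    lo, hi = filter_values.min(), filter_values.max()
    # Equal-length intervals covering [lo, hi] with fractional overlap.
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = hi if k == n_intervals - 1 else a + length
        idx = np.flatnonzero((filter_values >= a) & (filter_values <= b))
        if idx.size == 0:
            continue
        # Cluster the points whose filter value falls in this interval.
        labels = connected_components(points[idx], eps)
        for lab in np.unique(labels):
            nodes.append(set(idx[labels == lab].tolist()))
    edges = {
        (i, j)
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
        if nodes[i] & nodes[j]
    }
    return nodes, edges


# Toy input (an assumption for illustration): 200 evenly spaced points on
# the unit circle, filtered by their x coordinate.
theta = 2 * np.pi * np.arange(200) / 200
circle = np.column_stack([np.cos(theta), np.sin(theta)])
nodes, edges = mapper_graph(circle, circle[:, 0], n_intervals=6, overlap=0.3, eps=0.2)
print(len(nodes), "nodes,", len(edges), "edges")
```

On this toy input the resulting graph has as many edges as nodes, which is what a single closed loop looks like. A real analysis would use a dedicated Mapper implementation and patient-level data rather than this simplified construction.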

Original language: English
Article number: 449
Number of pages: 1
Journal: BMC Bioinformatics
Issue number: 1
Early online date: 20 Sept 2021
Publication status: Published - Dec 2021


  • Clustering
  • Machine learning
  • Topological data analysis
