Balancing accuracy and interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction

Diana Shamsutdinova*, Daniel Stamate, Daniel Stahl

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Accurate and interpretable models are essential for clinical decision-making, where predictions can directly impact patient care. Machine learning (ML) survival methods can handle complex multidimensional data and achieve high accuracy but require post-hoc explanations. Traditional models such as the Cox Proportional Hazards model (Cox-PH) are less flexible, but fast, stable, and intrinsically transparent. Moreover, ML does not always outperform Cox-PH in clinical settings, warranting diligent model validation. We aimed to develop a set of R functions to help explore the limits of Cox-PH compared to tree-based and deep learning survival models for clinical prediction modelling, employing ensemble learning and nested cross-validation.

Methods: We developed a set of R functions, publicly available as the package "survcompare". It supports Cox-PH and Cox-Lasso as traditional models, Survival Random Forest (SRF) and DeepHit as ML alternatives, and ensemble methods integrating Cox-PH with SRF or DeepHit, designed to isolate the marginal value of ML. The package performs repeated nested cross-validation and tests the statistical significance of ML's superiority using survival-specific performance metrics: the concordance index, time-dependent AUC-ROC, and calibration slope. To gain practical insights, we applied this methodology to clinical and simulated datasets of varying complexity and size.

Results: In simulated data with non-linearities or interactions, ML models outperformed Cox-PH at sample sizes ≥ 500. ML superiority was also observed in imaging and high-dimensional clinical data. However, for tabular clinical data, the performance gains of ML were minimal; in some cases, regularised Cox-Lasso recovered much of the ML's performance advantage with significantly faster computations. Ensemble methods combining Cox-PH and ML predictions were instrumental in quantifying Cox-PH's limits and improving ML calibration.

Conclusion: Traditional models such as Cox-PH or Cox-Lasso should not be overlooked when developing clinical prediction models from tabular data or data of limited size. Our package offers researchers a framework and practical tool for evaluating the accuracy-interpretability trade-off, helping them make informed decisions about model selection.
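To make the workflow concrete, the sketch below shows how the cross-validated comparison described in the Methods might be invoked in R. This is an illustrative assumption, not verified against the released package: the argument names (`predict_factors`, `repeat_cv`) and the expected `time`/`event` column naming are guesses based on the abstract and should be checked against the survcompare documentation on CRAN.

```r
# Illustrative sketch only: the survcompare() signature shown here is an
# assumption inferred from the abstract; consult the package docs for the
# actual argument names.
library(survcompare)
library(survival)  # provides the gbsg breast-cancer dataset used as example data

# Prepare a time-to-event data frame with assumed 'time' and 'event' columns.
df <- survival::gbsg
df$time <- df$rfstime
df$event <- df$status

# Compare Cox-PH (or Cox-Lasso) against Survival Random Forest via repeated
# nested cross-validation; the result is assumed to carry the survival-specific
# metrics named in the abstract (concordance index, time-dependent AUC-ROC,
# calibration slope) plus a significance test of the ML model's superiority.
result <- survcompare(
  df,
  predict_factors = c("age", "size", "grade", "nodes"),
  repeat_cv = 2
)
summary(result)
```

The repeated nested cross-validation keeps hyperparameter tuning (inner loop) separate from performance estimation (outer loop), which is what allows the package to test whether the ML model's advantage over Cox-PH is statistically significant rather than an artefact of overfitting.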

Original language: English
Article number: 105700
Journal: International Journal of Medical Informatics
Volume: 194
DOIs
Publication status: Published - Feb 2025

Keywords

  • Clinical prediction model
  • Ensemble methods
  • Internal validation
  • Interpretability
  • R
  • Survival analysis
