Predicting health outcomes such as disease onset, recovery or mortality is an important part of medical research. Classical methods of survival analysis, such as the Cox proportional hazards model, have been employed successfully and have proved robust and easy to interpret. Recent developments in computational methods and the digitalization of medical records have brought new tools to survival analysis that can handle large data sets with complex non-linear relationships. However, such methods often result in “black box” models that are hard to interpret. In this project we combine the Cox model with tree-based machine-learning algorithms to take advantage of the strengths of both approaches and to boost the overall predictive performance. Moreover, we aim to preserve the interpretability of the results, quantify the contributions of linear, non-linear and cross-term dependencies, and gain insight into potential non-linearity. The first method ensembles the Cox model with a survival random forest. The second employs a survival tree algorithm to cluster the data and then fits a separate Cox model in each cluster. The third uses the clusters obtained with a survival tree to identify interaction and non-linear terms and adds them as new terms to the Cox model. We tested the methods on simulated and real-life medical data and compared their internally validated discrimination and calibration. Our results show that classical models outperform the combined methods in data with predominantly linear relationships. The proposed methods were more effective in predicting survival outcomes with strong non-linear and inter-dependent relationships, and they provided insight into where the non-linearity lies.
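As an illustration of what comparing discrimination can involve, the sketch below computes Harrell's concordance index, a common discrimination measure for survival models. The abstract does not state which measure was used, so this choice is an assumption. The function name `concordance_index`, its arguments, and the toy data are hypothetical, and pairs with tied observed times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index (illustrative, not the authors' code).

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Simplification: pairs with tied times are skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event was ranked as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_ranking = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative predictions, and 0.0 to a fully reversed ranking. The same function could score any of the models described above, provided each one outputs a risk score per subject.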
|Title of host publication||Artificial Intelligence Applications and Innovations. AIAI 2022. IFIP Advances in Information and Communication Technology|
|Number of pages||12|
|Publication status||Published - 10 Jun 2022|
- Cox proportional hazards model
- Machine Learning
- prediction model