TY - JOUR
T1 - A divisive hierarchical clustering methodology for enhancing the ensemble prediction power in large scale population studies
T2 - the ATHLOS project
AU - Barmpas, Petros
AU - Tasoulis, Sotiris
AU - Vrahatis, Aristidis G.
AU - Georgakopoulos, Spiros V.
AU - Anagnostou, Panagiotis
AU - Prina, Matthew
AU - Ayuso-Mateos, José Luis
AU - Bickenbach, Jerome
AU - Bayes, Ivet
AU - Bobak, Martin
AU - Caballero, Francisco Félix
AU - Chatterji, Somnath
AU - Egea-Cortés, Laia
AU - García-Esquinas, Esther
AU - Leonardi, Matilde
AU - Koskinen, Seppo
AU - Koupil, Ilona
AU - Pająk, Andrzej
AU - Prince, Martin
AU - Sanderson, Warren
AU - Scherbov, Sergei
AU - Tamosiunas, Abdonas
AU - Galas, Aleksander
AU - Haro, Josep Maria
AU - Sanchez-Niubo, Albert
AU - Plagianakos, Vassilis P.
AU - Panagiotakos, Demosthenes
N1 - Funding Information:
This work is supported by the ATHLOS (Aging Trajectories of Health: Longitudinal Opportunities and Synergies) project, funded by the European Union’s Horizon 2020 Research and Innovation Program under Grant Agreement Number 635316.
Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Nature Switzerland AG.
PY - 2022/4/18
Y1 - 2022/4/18
N2 - The ATHLOS cohort is composed of several harmonized datasets of international groups related to health and aging. As a result, the Healthy Aging index has been constructed based on a selection of variables from 16 individual studies. In this paper, we consider additional variables found in ATHLOS and investigate their utilization for predicting the Healthy Aging index. For this purpose, motivated by the volume and diversity of the dataset, we focus our attention upon data clustering, where unsupervised learning is utilized to enhance prediction power. Thus, we show the predictive utility of exploiting hidden data structures. In addition, we demonstrate that imposed computation bottlenecks can be surpassed when using appropriate hierarchical clustering, within a clustering-for-ensemble-classification scheme, while retaining prediction benefits. We propose a complete methodology that is evaluated against baseline methods and the original concept. The results are very encouraging, suggesting further developments in this direction along with applications in tasks with similar characteristics. A straightforward open source implementation for the R project is also provided (https://github.com/Petros-Barmpas/HCEP).
AB - The ATHLOS cohort is composed of several harmonized datasets of international groups related to health and aging. As a result, the Healthy Aging index has been constructed based on a selection of variables from 16 individual studies. In this paper, we consider additional variables found in ATHLOS and investigate their utilization for predicting the Healthy Aging index. For this purpose, motivated by the volume and diversity of the dataset, we focus our attention upon data clustering, where unsupervised learning is utilized to enhance prediction power. Thus, we show the predictive utility of exploiting hidden data structures. In addition, we demonstrate that imposed computation bottlenecks can be surpassed when using appropriate hierarchical clustering, within a clustering-for-ensemble-classification scheme, while retaining prediction benefits. We propose a complete methodology that is evaluated against baseline methods and the original concept. The results are very encouraging, suggesting further developments in this direction along with applications in tasks with similar characteristics. A straightforward open source implementation for the R project is also provided (https://github.com/Petros-Barmpas/HCEP).
KW - ATHLOS cohort
KW - Clustering
KW - Ensemble methods
KW - Prediction enhancement
UR - http://www.scopus.com/inward/record.url?scp=85128326023&partnerID=8YFLogxK
U2 - 10.1007/s13755-022-00171-1
DO - 10.1007/s13755-022-00171-1
M3 - Article
AN - SCOPUS:85128326023
SN - 2047-2501
VL - 10
JO - Health Information Science and Systems
JF - Health Information Science and Systems
IS - 1
M1 - 6
ER -