Predicting type 2 diabetes prevalence for people with severe mental illness in a multi-ethnic East London population

Research output: Contribution to journal › Article › peer-review

Background and aims: The prevalence of type 2 diabetes mellitus (T2DM) in people with severe mental illness (SMI) is 2–3 times higher than in the general population. Predictive modelling has advanced greatly in the past decade, and it is important to apply cutting-edge methods to vulnerable groups. However, few T2DM prediction models account for the presence of mental illness, and none appear to have been developed specifically for people with SMI. Therefore, we aimed to develop and internally validate a T2DM prevalence model for people with SMI.

Methods: We utilised a large cross-sectional sample representative of a multi-ethnic population from London (674,000 adults); 10,159 people with SMI formed our analytical sample (1,513 T2DM cases). We fitted a linear logistic regression (LR) and XGBoost as stand-alone models and as a stacked ensemble. The predictors were age, sex, body mass index, ethnicity, area-based deprivation, past hypertension, cardiovascular diseases, prescribed antipsychotics, and SMI diagnosis.

Results: Logistic regression performed well in detecting T2DM presence in people with SMI: the area under the receiver operating characteristic curve (ROC-AUC) was 0.83 (95% CI 0.79–0.87). XGBoost and the LR-XGBoost ensemble performed equally well, with ROC-AUC 0.83 (95% CI 0.79–0.87), indicating a negligible contribution of non-linear terms to predictive power. Ethnicity was the most important predictor after age. We demonstrated how the derived models can be utilised and estimated a 2.14% (95% CI 2.03%–2.24%) increase in T2DM prevalence in the East London SMI population in 20 years' time, driven by the projected demographic changes.

Conclusions: Data from primary care, the setting where prediction models could be most fruitfully used, provide enough information for well-performing T2DM prevalence models for people with SMI. We demonstrated how thorough internal cross-validation of an ensemble of a linear and a machine-learning model can quantify the predictive value of non-linearity in the data.
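The ROC-AUC figures above are reported with 95% confidence intervals. As a rough illustration of how such a summary can be produced, the sketch below computes ROC-AUC as the probability that a randomly chosen case scores higher than a randomly chosen non-case, and attaches a percentile-bootstrap interval. The abstract does not say how the authors obtained their intervals, so the bootstrap is an assumption made here; the function names (`roc_auc`, `bootstrap_ci`, `make_synthetic`) and the synthetic data are hypothetical and unrelated to the study's data or code.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as P(score of a case > score of a non-case); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for ROC-AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples containing only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def make_synthetic(n, prevalence=0.15, seed=0):
    """Hypothetical data: cases get scores shifted upward by one standard deviation."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    return labels, scores


if __name__ == "__main__":
    y, s = make_synthetic(300, seed=42)
    auc = roc_auc(y, s)
    lo, hi = bootstrap_ci(y, s, n_boot=300)
    print(f"ROC-AUC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The pairwise comparison is quadratic in sample size, which is fine for a small illustration; for a sample the size of the one described above, a rank-based formulation or a library routine would be the more practical choice.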

Original language: English
Article number: 105019
Journal: International Journal of Medical Informatics
Publication status: Published - Apr 2023


  • Electronic health records
  • Physical and mental health
  • Prediction modelling
  • Schizophrenia
  • Severe mental illness
  • Type 2 diabetes

