Big data approaches to investigating Child Mental Health disorder outcomes

Student thesis: Doctoral Thesis, Doctor of Philosophy


Background: In the UK, administrative data resources continue to expand across publicly funded youth-orientated health, education and social services. Despite attempts to capture these data in structured formats, which are more accessible for analysis, most health information is stored as free-text entries in electronic records. Big data techniques that combine large-scale data linkage with automatic information extraction from free text, using Natural Language Processing (NLP), have considerable potential for enhancing the depth of information available from routinely collected public service data. Very few published studies have applied these big data techniques to questions relevant to child and adolescent psychiatry.
Methods: This thesis examined original and clinically relevant research questions using data from routinely collected electronic health records, enriched by NLP and linkages to external data sources. Five related studies were performed, all using data obtained from the SLaM BRC Case Record Information Search (CRIS) and extracted using NLP approaches, with two studies using external linkages with routinely collected national electronic datasets (NHS Hospital Episode Statistics and the DfE National Pupil Database, NPD).
Results: Using these data resources, I provide empirical support for the hypothesis that neurodevelopmental comorbidities increase children and adolescents’ risk for potentially more harmful treatments, greater treatment complexity and worse clinical outcomes. The NLP methods employed overcame limitations of structured data extraction, providing better assessment of a diverse range of symptom types, severity and related impairments, including suicidal risk, negative symptoms, antipsychotic treatment failure, and self-harm. External data linkages with the NPD enabled population level analyses by nesting clinical samples within their source population. NPD linkage also permitted the inclusion of education performance data, which were not routinely available within electronic health records.
Conclusion: The thesis illustrates that the legal, governance and technical challenges of linking NHS and Department for Education public service data are surmountable. It also demonstrates that NLP and data linkage of electronic health records have a clear role in clinical epidemiological studies of child and adolescent mental health. These tools, combined with the continued digitisation of public service activity, can unlock huge and detailed data resources for population-based analyses. However, current approaches have deficiencies, including limited accuracy, limited construct validity, and restrictions in the data available, which pose challenges for future research.
Date of Award: 2018
Original language: English
Awarding Institution
  • King's College London
Supervisors: Matthew Hotopf, Richard Hayes & Tamsin Ford
