Machine Learning Methods in Neuroimaging

Student thesis: Doctoral Thesis, Doctor of Philosophy


The use of Machine Learning (ML) in neuroscience has increased significantly in recent years. In contrast to previous methodologies, ML has important implications for clinical applications, as it allows predictions to be made for a single subject rather than requiring a group of subjects from which to draw inferences. More and more data is becoming available, and because ML can find patterns in a completely data-driven fashion, it is increasingly used to interpret that data.
In neuroscience in particular, ML is transforming the field and leading it towards a paradigm shift in which the focus is no longer on the explainability of models but on prediction. In this Thesis, I will discuss how neuroscience can profit from, and is being shaped by, ML methods.

As ML algorithms vary in their properties, complexity and assumptions, choosing the right model to analyse the problem at hand is a difficult task. Of the vast range of available ML algorithms, which one is the optimal model to predict the target variable? Which model will best fit the underlying statistical properties of the data? Which are the best parameter combinations for the chosen model? There are potentially infinite combinations of approaches, and the choice among them can significantly influence the performance of the model. In this Thesis, I will first assess the impact of feature selection and different inputs (i.e., voxel-based and region-based pre-processing of the structural MRI data) on the performance and generalisability of the three most widely used ML algorithms for brain-age prediction.
Because of the large availability of data and the presence of a known ground truth, predicting age based on changes obtained from neuroimaging datasets is becoming more popular as a way of benchmarking ML models in neuroscience.

I will also analyse the reliability of the brain-age predictions yielded by different models, including their stability over time. As many neuroimaging studies are now collaborative efforts and more data is being shared online, it is important to evaluate not only the performance of ML models on larger and more heterogeneous datasets but also how well they perform on the same subjects over time. For this reason, I will evaluate the reliability over time of different ML algorithms used to predict brain age, and also study the impact of different scanners on the prediction.
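One common way to quantify this kind of test-retest reliability is the intraclass correlation coefficient (ICC). The abstract does not state which reliability metric the thesis uses, so the following is only a minimal sketch under that assumption: it computes the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measurement) from repeated brain-age predictions. The function name `icc_absolute_agreement` and the toy values in `predicted_age` are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def icc_absolute_agreement(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` holds one row per subject; each row contains that subject's
    predicted brain age at every session (all rows have the same length).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects and the same >= 2 sessions for each")

    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]                        # per subject
    col_means = [mean(row[j] for row in scores) for j in range(k)]   # per session

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denom


# Hypothetical predictions (years) for three subjects scanned twice; the
# second session is shifted by a constant 5 years, as a scanner change might do.
predicted_age = [[60.0, 65.0], [70.0, 75.0], [80.0, 85.0]]
retest_icc = icc_absolute_agreement(predicted_age)
print(f"ICC(2,1) = {retest_icc:.3f}")
```

Because this variant measures absolute agreement, a systematic offset between sessions lowers the coefficient even when the ranking of subjects is perfectly preserved, which is why it can also be sensitive to scanner effects.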

Another way of cutting through the complexity of choosing the best algorithm is automated machine learning (autoML), which consists of automatically exploring different combinations of algorithms and their hyperparameters to find the best ones for the dataset under analysis. In this Thesis, I have evaluated the suitability of autoML for a neuroimaging problem. In particular, I used autoML to find the optimal pipeline to predict brain age.
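The core idea can be illustrated with a deliberately small sketch. It is not the autoML tool or search strategy used in the thesis, which the abstract does not name: here an exhaustive search over a handful of candidates stands in for it. The candidate models, the single synthetic `feature`, and the simulated `age` values are all assumptions made for illustration.

```python
import random
from statistics import mean


def linear_regressor():
    """Return a fit function for ordinary least squares on one feature."""
    def fit(xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        return lambda x: my + slope * (x - mx)
    return fit


def knn_regressor(k):
    """Return a fit function for k-nearest-neighbour regression on one feature."""
    def fit(xs, ys):
        def predict(x):
            nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
            return mean(y for _, y in nearest)
        return predict
    return fit


def cv_mae(fit, xs, ys, n_folds=5):
    """Mean absolute error over interleaved cross-validation folds."""
    errors = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(xs), n_folds))
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_idx]
        predict = fit([x for x, _ in train], [y for _, y in train])
        errors += [abs(predict(xs[i]) - ys[i]) for i in test_idx]
    return mean(errors)


def search(candidates, xs, ys):
    """Score every candidate and return the name of the best one plus all scores."""
    scores = {name: cv_mae(fit, xs, ys) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic stand-in data: one imaging-derived feature and a noisy "age".
rng = random.Random(0)
feature = [rng.uniform(0.0, 1.0) for _ in range(60)]
age = [20.0 + 60.0 * x + rng.gauss(0.0, 3.0) for x in feature]

# The search space: algorithm families crossed with their hyperparameters.
candidates = {"linear": linear_regressor()}
candidates.update({f"knn_k={k}": knn_regressor(k) for k in (1, 5, 15)})

best_name, scores = search(candidates, feature, age)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:10s} cross-validated MAE = {score:.2f} years")
print("selected:", best_name)
```

Note that the cross-validated score of the selected candidate is optimistically biased, because the same folds were used to choose it; an unbiased performance estimate would need data held out from the search itself, for example via nested cross-validation.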

Based on these results, which highlight the suitability of autoML for neuroimaging datasets, I propose a novel autoML framework that could be particularly useful for small-N datasets, such as those found in neuroimaging. I argue that autoML can be used as a data-driven approach to learn patterns in the data and to select the best models and hyperparameters automatically, free of researcher bias, which might help to avoid common pitfalls of ML algorithms such as overfitting.

As a whole, this Thesis provides an overview of the current state of machine learning in neuroscience and argues that autoML can advance and simplify the use of ML in the field.
Date of Award: 1 Nov 2020
Original language: English
Awarding Institution: King's College London
Supervisors: Pete Hellyer & Federico Turkheimer
