Novel Machine Learning Pipelines with Applications to Finance

Student thesis: Doctoral Thesis, Doctor of Philosophy


Machine learning is an artificial intelligence technique that automatically infers rules from data and applies those rules to perform tasks on unseen data. The technique is widely used in finance and other disciplines, and relies on the combination of massive amounts of data and the computing power of modern hardware. The surge of cryptocurrency markets, with their large price fluctuations, has challenged both traditional econometric tools, based on statistics and time series analysis, and machine learning. Analysing cryptocurrency markets and the emerging machine learning techniques applied to them therefore helps researchers compare market performance and technological innovation across traditional equity and bond markets and cryptocurrency markets.

The difficulty in this area, however, is the wide array of disciplines contributing to the field. Although there exists a wealth of surveys on blockchain and cryptocurrency research, none is truly comprehensive or cuts across different fields. The first part of the thesis focuses on a widely cited survey on cryptocurrency trading, which informs much academic and industry work in this area. The survey provides an in-depth analysis of the literature in terms of the distribution of research among properties, categories, technologies and datasets, as well as research trends and opportunities.

One of the research directions identified in the survey is the prediction of signals for cryptocurrency markets on live data. We address this research challenge in the second part of the thesis. An important finding of this work is that, by using multi-layer architectures, deep learning models and dynamic retraining methods, we can overcome the decay in predictive power on live data caused by non-stationary features of the order book. In this part, a new dynamic retraining structure is proposed and compared with existing training frameworks.
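
As a rough illustration of why periodic retraining can help on non-stationary data, the sketch below compares a model fitted once with the same model refitted on a trailing window. It is not the retraining structure proposed in the thesis: the one-parameter linear model, the synthetic regime-shift data, and the names `fit_slope`, `walk_forward_mse` and `make_regime_shift_data` are all assumptions made for this example.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y = b * x."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx if sxx else 0.0


def walk_forward_mse(xs, ys, window, retrain_every=None):
    """Mean squared one-step prediction error after the first `window` points.

    With retrain_every=None the model is fitted once on the first window.
    Otherwise it is refitted every `retrain_every` steps on the trailing
    `window` observations, using only data observed before time t.
    """
    slope = fit_slope(xs[:window], ys[:window])
    squared_error = 0.0
    count = 0
    for t in range(window, len(xs)):
        if retrain_every and (t - window) % retrain_every == 0:
            slope = fit_slope(xs[t - window:t], ys[t - window:t])
        squared_error += (ys[t] - slope * xs[t]) ** 2
        count += 1
    return squared_error / count


def make_regime_shift_data(n=400, seed=0):
    """Synthetic (assumed) data whose true slope flips from +1 to -1 halfway."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [
        (1.0 if i < n // 2 else -1.0) * x + rng.gauss(0, 0.1)
        for i, x in enumerate(xs)
    ]
    return xs, ys


xs, ys = make_regime_shift_data()
static_mse = walk_forward_mse(xs, ys, window=50)
dynamic_mse = walk_forward_mse(xs, ys, window=50, retrain_every=10)
print(f"static: {static_mse:.3f}  dynamic: {dynamic_mse:.3f}")
```

On this toy data the statically trained model keeps the pre-shift slope and its error grows after the regime change, while the periodically refitted model recovers once its window contains only post-shift observations.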

In the last part of the thesis, motivated by the success of regular retraining for cryptocurrency prediction, we look more closely at model selection. We study model selection from first principles and independently of the application domain, with the objective of finding alternatives to cross-validation (which often relies on the absence of temporal relationships in the data). We focus on tree models and use the dispersion of feature importance as a criterion for model selection. We show how this new method can help us choose models with better generalisation more efficiently.
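
To make the idea of a dispersion-based criterion concrete, the sketch below ranks candidate models by the spread of their feature-importance vectors. The abstract does not define the dispersion measure or say whether higher or lower dispersion is preferred, so both choices here are assumptions: the coefficient of variation is used as the measure, and the lowest value is selected purely for illustration. The importance vectors are made-up numbers and the model names are hypothetical.

```python
from statistics import mean, pstdev


def importance_dispersion(importances):
    """Coefficient of variation of a feature-importance vector.

    Assumed measure: the thesis may define dispersion differently.
    """
    mu = mean(importances)
    return pstdev(importances) / mu if mu else 0.0


# Hypothetical, made-up importance vectors for three candidate tree models.
candidates = {
    "model_a": [0.70, 0.20, 0.05, 0.05],
    "model_b": [0.40, 0.30, 0.20, 0.10],
    "model_c": [0.28, 0.26, 0.24, 0.22],
}

scores = {name: importance_dispersion(imp) for name, imp in candidates.items()}

# Assumption for illustration only: prefer the candidate with lowest dispersion.
selected = min(scores, key=scores.get)
print(scores, selected)
```

Unlike cross-validation, a criterion of this kind needs only the fitted models' importance vectors, which is one way such a method could avoid repeated train and validation splits.
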
Date of Award: 1 Nov 2023
Original language: English
Awarding Institution: King's College London
Supervisors: Carmine Ventre & Maria Polukarov
