Data reduction methods to study cancer susceptibility

    Student thesis: Doctoral Thesis, Doctor of Philosophy

    Abstract

    Background and Aims
    Cancer burden continues to increase in an ageing population, and with the advent of the omics sciences cancer data have evolved into complex, multidimensional datasets. Pathogenesis varies between patients and involves an intricate gene-environment interplay, which is reflected in the multifactorial character of population susceptibility to cancer.
    This thesis therefore aims to understand population susceptibility to cancer and the heterogeneity of the disease by investigating new statistical approaches to multidimensional cancer datasets, with the ultimate goal of developing effective stratification models for cancer risk that could improve cancer prevention and early detection.
    Methods
    The thesis is divided into two main areas of study: Population susceptibility to disease and Individuals’ susceptibility to disease.
    1. Population susceptibility to disease.
    The following projects utilised data from the Apolipoprotein MOrtality RISk (AMORIS) study:
    a. The Blood exposome
    A subset of the internal-external blood exposome components was evaluated by exploring the interrelations of 21 standard serum markers and 4 external factors in a four-step statistical analysis: correlation analysis, hierarchical clustering, principal component analysis and multivariable analysis of variance (n=154,207).
    b. Metabolic profiles to assess cancer risk and mortality
    To identify metabolic profiles linked to carcinogenesis and mortality and their intrinsic associations, latent class analyses followed by multivariate Cox regression analyses were performed to characterise subgroups of individuals based on 19 standard blood biomarker measurements to reflect population heterogeneity (n=13,615).
    2. Individuals’ susceptibility to disease.
    c. Discrimination of breast cancer tissue
    Imaging data generated by scanning 44 ex vivo breast tissue samples with a terahertz probe (n=257) were evaluated using a two-step statistical approach (Gaussian deconvolution processing followed by a Naïve Bayes classifier) to distinguish malignant from benign breast tissue, with the ultimate aim of identifying malignant tissue intraoperatively and so ensuring clear (negative) tumour margins in breast-conserving surgery.
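    The classification step in (c) can be illustrated with a minimal sketch. It assumes a Gaussian likelihood per feature, which the abstract does not specify, and it uses made-up toy values standing in for parameters extracted by the deconvolution step. The function names, feature meanings and numbers are hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate per-class log-priors, feature means and feature variances."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    n_total = len(X)
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor keeps the variance strictly positive for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / n_total), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Naive assumption: features are conditionally independent given the class.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=log_posterior)


# Hypothetical toy data: two illustrative features per scan (NOT thesis data).
X_train = [[1.0, 0.20], [1.2, 0.25], [0.9, 0.22],
           [2.0, 0.50], [2.2, 0.55], [1.9, 0.48]]
y_train = ["benign"] * 3 + ["malignant"] * 3

model = fit_gaussian_nb(X_train, y_train)
print(predict(model, [1.1, 0.23]))
print(predict(model, [2.1, 0.52]))
```

    Working in log space avoids numerical underflow when many features are multiplied together, which is one common reason to implement the classifier this way.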
    Results
    a) The analysis of the blood exposome subset in AMORIS showed a tight interaction between internal markers of related pathways, such as iron markers, whilst less well-known correlations also appeared (albumin and calcium). Among the external factors, male sex and lower education were associated with serum biomarker levels that might be indicative of worse health outcomes. The variability of the data was distributed among all the markers studied.
    b) The metabolic profiles analysis in AMORIS identified four LCA metabolic profiles within the population: (1) normal values for all markers (63% of population); (2) abnormal values for lipids (22%); (3) abnormal values for liver functioning (9%); (4) abnormal values for iron and inflammation metabolism (6%). All abnormal metabolic profiles (classes 2-4) were associated with an increased risk of cancer and mortality compared with class 1 (e.g. HR for overall death was 1.26 (95%CI: 1.16 - 1.37), 1.67 (95%CI: 1.47 - 1.90), and 1.21 (95%CI: 1.05 - 1.41) for classes 2, 3, and 4, respectively).
    c) The Bayesian classifier for tumour tissue discrimination, applied to the combined Gaussian derivatives, achieved an accuracy of 69%, sensitivity of 89%, specificity of 53%, positive predictive value of 60% and negative predictive value of 86%. In other words, tumour tissue was classified correctly in roughly nine out of ten cases (sensitivity of about 0.9), at an overall accuracy of about 0.7.
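    For readers less familiar with the five measures reported in (c), the sketch below shows how each is derived from a 2x2 confusion matrix. The counts are purely illustrative and are not the thesis data; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # malignant tissue classified as malignant
    fp: int  # benign tissue classified as malignant
    tn: int  # benign tissue classified as benign
    fn: int  # malignant tissue classified as benign


def diagnostic_metrics(cm):
    """Compute the standard diagnostic accuracy measures as fractions."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": cm.tp / (cm.tp + cm.fn),
        "specificity": cm.tn / (cm.tn + cm.fp),
        "ppv": cm.tp / (cm.tp + cm.fp),
        "npv": cm.tn / (cm.tn + cm.fn),
    }


# Illustrative counts only (NOT taken from the thesis).
example = ConfusionMatrix(tp=40, fp=20, tn=30, fn=10)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

    A classifier with high sensitivity and moderate specificity, as reported here, misses few malignant samples but flags a sizeable share of benign tissue as malignant, which lowers the positive predictive value.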
    Conclusion
    The subset of the blood exposome studied showed a complex synergy between internal and external components, which demonstrates the need for systemic approaches involving multiple markers, capable of evaluating both the internal biological and the external environment, when assessing health outcomes. Moreover, the latent class analysis indicated that internal blood markers, when assembled into meaningful metabolic profiles by optimised statistical methods, could help stratify the population for cancer risk and mortality and provide insight into cancer susceptibility and aetiology. Finally, the Bayesian classifier discriminated malignant from benign breast tissue using terahertz pulsed imaging (TPI) data with high sensitivity but only moderate specificity; this suggests the method could be clinically applicable for improving the adequacy of margin excision in breast-conserving surgery, provided the specificity can be optimised.
    Overall, the projects in this thesis demonstrate the capability of data reduction methods to explore cancer susceptibility and develop potentially effective stratification models, and highlight the importance of data-driven approaches in the assessment of multifactorial diseases such as cancer when supported by robust statistical analysis.
    Date of Award: 2018
    Original language: English
    Awarding Institution: King's College London
    Supervisors: Mieke Van Hemelrijck & Anita Grigoriadis
