King's College London

Research portal

Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework

Research output: Contribution to journal › Article

Standard

Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework. / Yang, Lingjian; Ainali, Chrysanthi; Tsoka, Sophia; Papageorgiou, Lazaros G.

In: BMC Bioinformatics, Vol. 15, No. 1, 390, 05.12.2014.

Harvard

Yang, L, Ainali, C, Tsoka, S & Papageorgiou, LG 2014, 'Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework', BMC Bioinformatics, vol. 15, no. 1, 390. https://doi.org/10.1186/s12859-014-0390-2

APA

Yang, L., Ainali, C., Tsoka, S., & Papageorgiou, L. G. (2014). Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework. BMC Bioinformatics, 15(1), [390]. https://doi.org/10.1186/s12859-014-0390-2

Vancouver

Yang L, Ainali C, Tsoka S, Papageorgiou LG. Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework. BMC Bioinformatics. 2014 Dec 5;15(1):390. https://doi.org/10.1186/s12859-014-0390-2

Author

Yang, Lingjian ; Ainali, Chrysanthi ; Tsoka, Sophia ; Papageorgiou, Lazaros G. / Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework. In: BMC Bioinformatics. 2014 ; Vol. 15, No. 1.

Bibtex

@article{3de9de83d9b949719acd3dc9d8254870,
title = "Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework",
abstract = "Background: Applying machine learning methods on microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches where expression of genes were treated independently suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in literature are either unsupervised or applied on two-class datasets, there is good scope to address such limitations by proposing novel methodologies. Results: A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of expression of its constituent genes. Gene weights are determined by the optimisation model, in a way that the resulting pathway activity has the optimal discriminative power with regards to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. Conclusions: The model was evaluated through a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well in multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.",
keywords = "Disease classification, Mathematical programming, Microarray, Optimisation, Pathway activity",
author = "Yang, Lingjian and Ainali, Chrysanthi and Tsoka, Sophia and Papageorgiou, {Lazaros G.}",
year = "2014",
month = "12",
day = "5",
doi = "10.1186/s12859-014-0390-2",
language = "English",
volume = "15",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",
pages = "390",
}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework

AU - Yang, Lingjian

AU - Ainali, Chrysanthi

AU - Tsoka, Sophia

AU - Papageorgiou, Lazaros G.

PY - 2014/12/5

Y1 - 2014/12/5

AB - Background: Applying machine learning methods on microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches where expression of genes were treated independently suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in literature are either unsupervised or applied on two-class datasets, there is good scope to address such limitations by proposing novel methodologies. Results: A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of expression of its constituent genes. Gene weights are determined by the optimisation model, in a way that the resulting pathway activity has the optimal discriminative power with regards to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. Conclusions: The model was evaluated through a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well in multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.

KW - Disease classification

KW - Mathematical programming

KW - Microarray

KW - Optimisation

KW - Pathway activity

UR - http://www.scopus.com/inward/record.url?scp=84923932604&partnerID=8YFLogxK

U2 - 10.1186/s12859-014-0390-2

DO - 10.1186/s12859-014-0390-2

M3 - Article

AN - SCOPUS:84923932604

VL - 15

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 390

ER -
