Community Structure Detection in Complex Biological Networks

Student thesis: Doctoral ThesisDoctor of Philosophy


With the advent of high-throughput technology there has been a large increase in the availability of biological data, such as interaction data of genes, proteins and metabolites. It is therefore necessary to develop ways in which these data can be efficiently modelled and analysed. Networks over a natural modelling framework for complex biological systems and as such, network theory and related computational approaches have proven important in bioinformatics. A particular facet of network theory that has been employed to analyse biological networks is community structure detection. Community structure is a modular network topology where communities are defined as groups nodes with dense intra-community connections and less dense inter-community connections.
Methods to uncover such communities in complex biological networks have
the potential to contribute towards a better understanding of the underlying organisation of a system. Consequently, this thesis focuses on the development of a series of mathematical programming models to address various manifestations of the community structure detection problem. The aim is to produce more information-rich models that can accurately represent the features of biological systems; with weighted and unweighted interactions, disjoint and overlapping communities and network dynamics all being considered.

First, the detection of disjoint communities in unweighted networks is approached through a two-stage procedure, known as iMod. A mixed integer nonlinear programming (MINLP) model optimises modularity to find an initial partition which is then improved by iteratively solving a mixed integer quadratic programming (MIQP) model. A comparative analysis shows that iMod finds globally optimal solutions for networks of up to 512 nodes and outperforms all other methods tested when applied to larger networks. Subsequently, the MINLP model is generalised to weighted networks, known accordingly as WeiMod. Competitive results are found when WeiMod is compared with several other well known methods from the literature. Next, the work on disjoint community structure
is extended to _nd overlapping modules. An MINLP model, known as OverMod, transforms
disjoint to overlapping communities through the optimisation of the community
strength metric. OverMod is compared with similar methods from the literature and is further assessed on protein-protein interaction (PPI) networks to test the method's ability to extract meaningful biological results. It is shown that proteins assigned to more than one module exhibit topological and functional properties indicative of their strategically important role in the organisation of the PPI networks. The work on disjoint and overlapping community structure concludes with the investigation of a network generated from sequence, protein interaction and co-expression data, for the fungal pathogen, Fusarium graminearum. The functional coherence of communities, properties of multiclustered genes and aspects of virulence-associated genes are all explored in an attempt to link topological and functional features.

Finally, the concept of community structure in dynamic networks is explored. Consensus clustering is tackled; de_ned as detecting a single partition of a dynamic network that is relevant across multiple snapshots. This is addressed by extending the previously proposed MIQP and MINLP models such that average modularity across network snapshots is now optimised. A comparison is made with a similar method from the literature showing that the proposed approaches achieve competitive results for small to medium sized networks. Overall, this thesis demonstrates that the exible nature of mathematical programming lends itself well to developing versatile solution procedures for community structure detection. The method evaluations show the proposed algorithms to be comparable to other approaches from the literature and able to detect meaningful results in biological applications. Finally, the methods described in this thesis have the potential to infer important topological-functional relationships and help to provide insight into the organisation and evolution of biological systems.
Date of Award1 Mar 2013
Original languageEnglish
Awarding Institution
  • King's College London
SupervisorSophia Tsoka (Supervisor) & Lazaros Papageorgiou (Supervisor)

Cite this