Abstract
The study of Complex Systems focuses on how interactions among the constituents of a system, individually or grouped into clusters, produce behavioral patterns locally or globally, and how these interact with the external environment. Over the last few decades interest in Complex Systems has grown steadily and today, given a sufficiently large data set, we are able to construct comprehensive models describing the emergent characteristics and properties of complex phenomena across the physical, biological and social sciences. Network theory, among other approaches, has proven a particularly good fit for describing static and dynamical correlations in complex data sets because of its ability to deal not only with deterministic quantities but also with probabilistic methods. A complex system is generally an open system that adapts flexibly to variable external conditions: it exchanges information with the environment and adjusts its internal structure through self-organization. Moreover, real-world phenomena represented by complex systems have been shown to display interesting statistical properties such as power-law distributions, long-range interactions, scale invariance, criticality, multifractality and hierarchical structure.
In the era of big data, where much effort goes into collecting large data sets that carry relevant information about the phenomena to be studied and analysed, the field of quantitative semantics, which deals with information expressed in natural language, is becoming increasingly relevant, particularly in the social sciences. However, recent studies are extending these techniques into a tool for structuring and organising information across a number of disparate disciplines.
In this Thesis I propose a methodology that (i) extracts a structured complex data set from large corpora of descriptive language sources and exploits quantitative semantics techniques to map the essence of a complex phenomenon into a network representation, and (ii) combines the induced knowledge network with a graph-theoretical framework, using a number of graph theory tools to study the emerging properties of complex systems. Thus, leveraging developments in Computational Linguistics and Network Theory, the proposed approach builds a graph representation of knowledge. This graph is analyzed to observe correlations between any two nodes or across clusters of nodes, and to highlight emerging properties by means of both topological structure analysis and dynamic evolution, i.e. the change in connectivity. Under this framework I provide two real-world applications:
- The first application builds a structured network of biomedical concepts from an unstructured corpus of biological text (peer-reviewed articles). It then retrieves known pathophysiological Modes of Action by applying a stochastic random-walk measure, and finds new ones by meaningfully selecting and aggregating contributions from known bio-molecular interactions. By exploiting the proposed graph-theoretic model, this approach has proven to be an innovative way to find emergent mechanisms of action aimed at drug repurposing, where existing biologic compounds originally intended for certain pathophysiologic actions are redirected to treat other types of clinical indications.
- The second application represents a financial and economic system as a network of interacting entities and devises a novel semantic index influenced by the topological properties of agglomerated information in a semantic graph. I have shown how it is possible to fully capture the dynamical aspects of the phenomena under investigation by identifying clusters carrying influential information and tracking them over time. By computing graph-based statistics over such clusters I turn the evolution of textual information into a mathematically well-defined, multivariate time series, where each time series encodes the evolution of particular structural, topological and semantic properties of the set of concepts previously extracted and filtered. Finally, an autoregressive model with vectorial exogenous inputs is defined, which linearly mixes previous values of an index with the evolution of other time series induced by the semantic information in the graph.
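As a rough illustration of the last step of the second application, an autoregressive model with vectorial exogenous inputs can be written as y[t] = c + a_1 y[t-1] + ... + a_p y[t-p] + b · x[t-1] + e[t], where y is the index and x collects the graph-induced time series. The minimal sketch below fits such a model by ordinary least squares with NumPy. The function name `fit_arx`, the variable names, the single lag on the exogenous inputs, the choice of least squares and the synthetic data are all assumptions made for illustration; the abstract does not specify how the model in the Thesis is estimated.

```python
import numpy as np


def fit_arx(index, exog, p=1):
    """Least-squares fit of an ARX model (illustrative sketch, not the Thesis code).

    Model: index[t] = c + sum_i a_i * index[t-i] + b . exog[t-1] + noise
    index: 1-D array of length T; exog: 2-D array of shape (T, k).
    Returns (c, a, b) with a of length p and b of length k.
    """
    index = np.asarray(index, dtype=float)
    exog = np.asarray(exog, dtype=float)
    rows = []
    for t in range(p, len(index)):
        lags = index[t - p:t][::-1]  # index[t-1], ..., index[t-p]
        # One design row: intercept, p lags of the index, exogenous inputs at t-1.
        rows.append(np.concatenate(([1.0], lags, exog[t - 1])))
    design = np.array(rows)
    coef, *_ = np.linalg.lstsq(design, index[p:], rcond=None)
    return coef[0], coef[1:1 + p], coef[1 + p:]


# Synthetic, noise-free data with known coefficients (hypothetical values).
rng = np.random.default_rng(0)
exog = rng.normal(size=(200, 2))  # stand-in for two graph-based statistics
index = np.zeros(200)
for t in range(1, 200):
    index[t] = 0.5 + 0.6 * index[t - 1] + exog[t - 1] @ np.array([0.3, -0.2])

intercept, ar_coefs, exog_coefs = fit_arx(index, exog, p=1)
```

On real data the exogenous columns would be the cluster-level graph statistics tracked over time, and the lag structure would need to be chosen and validated rather than fixed at one step as it is here.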
The methodology briefly described above constitutes the contribution of my research work in the field of Complex Systems. It has been instrumental in successfully defining a graph-theoretical model for the study of drug repurposing [1] and in constructing a framework for the analysis of financial and economic unstructured data (see chapter 6).
Date of Award | 2015 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Damiano Brigo (Supervisor) & Tiziana Di Matteo (Supervisor) |