Abstract:
Mathematical modelling of biological processes is important in systems biology because it facilitates understanding of the nature and behaviour of biological systems. This thesis is in two parts: the first on constructing and validating Gene Regulatory Network (GRN) models, and the second on mathematical model representation. A method, BaSeTraM, was developed for identifying transcription factor (TF) binding sites from position weight matrices. The sites identified were used to build a GRN by identifying the genes near each site. When validated against experimental data, BaSeTraM performed comparably to a widely used method, with the advantage that the selectivity-sensitivity trade-off is controlled by adjusting the posterior probability cut-off. A classifier was developed for detecting genes with missing regulators in a GRN model. It uses regression to convert a qualitative model into a quantitative one, together with an iterative method to predict expression levels in gene knock-out strains; the prediction errors for each gene were then used to predict which genes were missing regulators. When validated on its ability to detect regulators deleted from a yeast GRN model, the classifier out-performed random guessing. Finally, a method was developed for validating entire models by converting them to quantitative models and predicting gene expression from regulator levels. When models built using BaSeTraM were validated against human microarray data, degraded models had lower errors than the original in more than 50% of all predictions. The bimodal distribution of the per-gene proportion of higher errors suggests that the original model described some genes more accurately. This method provides a general framework for validating GRN models against genome-wide gene expression data sets. An API for working with CellML models was also developed, allowing applications to process mathematical models more easily.
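To illustrate how a posterior probability cut-off can control the selectivity-sensitivity trade-off when scanning with a position weight matrix, the following is a minimal sketch under stated assumptions. It is not BaSeTraM itself: it assumes a simple two-class model (binding site vs. uniform background) with independent positions and a hypothetical prior probability of a site. The matrix values and the names `site_posterior` and `scan` are illustrative only.

```python
import math

# Hypothetical 3-position PWM: per-position base probabilities under the
# "binding site" model (illustrative numbers, not real data).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
# Assumed uniform background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def site_posterior(window, pwm, background, prior):
    """Posterior probability that `window` is a site, given a prior
    probability of a site and a likelihood ratio (site vs. background)."""
    log_lr = sum(math.log(col[b]) - math.log(background[b])
                 for col, b in zip(pwm, window))
    lr = math.exp(log_lr)
    return prior * lr / (prior * lr + (1.0 - prior))


def scan(sequence, pwm, background, prior, threshold):
    """Return (offset, posterior) for each window whose posterior >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        p = site_posterior(sequence[i:i + width], pwm, background, prior)
        if p >= threshold:
            hits.append((i, p))
    return hits


if __name__ == "__main__":
    # Lower cut-off: more sensitive; higher cut-off: more selective.
    print(scan("TTAGCTT", PWM, BACKGROUND, prior=0.01, threshold=0.1))
    print(scan("TTAGCTT", PWM, BACKGROUND, prior=0.01, threshold=0.5))
```

With a cut-off of 0.1 the sketch reports only the `AGC` window at offset 2 (posterior of about 0.18). Raising the cut-off to 0.5 reports nothing, trading sensitivity for selectivity.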
A model representation language, ModML, was developed for representing models as a transformation from a domain-specific language (DSL) into a data structure describing differential-algebraic equations, along with tools for performing numerical simulations of models. Two DSLs based on ModML were developed: ModML Units, for equations with physical units, and ModML Reactions, for reaction systems. The utility of these DSLs has been demonstrated by expressing existing models in them. The development of ModML, and of DSLs built on top of it, means that models describing components of a system in different ways can be composed more easily, facilitating understanding of the system.
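The idea of compiling a reaction-system description into a data structure describing equations can be sketched as follows. This is a loose analogy under stated assumptions, not ModML's actual API or representation. It assumes mass-action kinetics and produces ordinary differential equations (a special case of differential-algebraic equations). The names `Reaction`, `to_ode_system` and `derivatives` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Reaction:
    """Hypothetical reaction description: rate constant plus stoichiometries."""
    rate_constant: float
    reactants: dict  # species -> stoichiometric coefficient
    products: dict   # species -> stoichiometric coefficient


def to_ode_system(reactions):
    """Compile reactions into {species: [(net stoichiometry, reaction index)]},
    i.e. a data structure describing d[species]/dt as a sum of rate terms."""
    terms = defaultdict(list)
    for idx, r in enumerate(reactions):
        net = defaultdict(int)
        for s, n in r.reactants.items():
            net[s] -= n
        for s, n in r.products.items():
            net[s] += n
        for s, n in net.items():
            if n != 0:  # species unchanged by a reaction (e.g. catalysts) drop out
                terms[s].append((n, idx))
    return dict(terms)


def mass_action_rate(reaction, conc):
    """Assumed mass-action kinetics: k times product of reactant concentrations."""
    rate = reaction.rate_constant
    for s, n in reaction.reactants.items():
        rate *= conc[s] ** n
    return rate


def derivatives(reactions, system, conc):
    """Evaluate d[species]/dt for each species at the given concentrations."""
    rates = [mass_action_rate(r, conc) for r in reactions]
    return {s: sum(n * rates[i] for n, i in ts) for s, ts in system.items()}


if __name__ == "__main__":
    rxns = [Reaction(2.0, {"A": 1, "B": 1}, {"C": 1})]  # A + B -> C
    system = to_ode_system(rxns)
    print(derivatives(rxns, system, {"A": 1.0, "B": 3.0, "C": 0.0}))
```

Keeping the compiled equation structure separate from its numerical evaluation mirrors the separation described above between the model representation and the simulation tools. The resulting `system` could, for example, be handed to any ODE solver.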