Abstract:
Genome studies have become an integral part of modern biology, creating a need for methods to analyse genomic data. One aspect of genomic research is the analysis of variation in the rate of evolution, both across a genome and between the genomes of different species. In this study we explore the relationship between different types of rate heterogeneity and develop several statistical methodologies to address issues associated with the phylogenomic analysis of genomic data. For lineage-specific variation in evolutionary rates, we present more efficient computational implementations of relaxed molecular clock models and propose new models of rate change across branches. We further examine the practical application of relaxed molecular clock models by proposing methods of model averaging and model selection based on Bayesian stochastic search variable selection. Results show that our method identifies the most appropriate model for the underlying distribution of rates across branches in both simulated and real data. Model averaging is particularly useful for preventing poor inference when the correct model is not known. We also examine the correlation in rates of substitution between functionally related genes that is caused by the co-evolution of genes. Previously, this correlation was thought to exist only between genes with physically interacting gene products. We demonstrate that these correlations are not limited to genes with protein interactions but often extend to functionally related genes. Such patterns of co-evolution are of concern for the multi-gene analysis of genomic data and for how species distances are estimated. Finally, we attempt to develop a high-throughput method for detecting lineage-specific selection by identifying changes in the rate of substitution. Results on simulated data indicate that this method had some success in characterising the variation in rate that occurs as a consequence of selection. Our methods were shown to provide significant speed benefits for phylogenomic analyses. The outcome of this research is an advance in methodologies for phylogenomic analysis, together with computer software that allows these methods to be used for understanding rate variation on a genomic scale.