Abstract:
Evolutionary inference from molecular sequences is a burgeoning field that
requires strict standards as well as pioneering ideas to fully interpret the
plethora of information being generated in molecular biology.
Phylogenetic methods attempt to resolve evolutionary histories from the molecular
data. The glut of available information poses new challenges to the field because in
many instances the data do not all support the same hypotheses (topologies). This
thesis is a simulation-based study of phylogenetic methods that assesses error rates in
some of the methods of topology hypothesis testing, and provides an exploration of
real data for developing new methods for model testing and model selection. Overall,
the simulation runs presented in this thesis number in the hundreds of
millions, including simulation of replicate data sets, resampling of sequence data to
generate pseudoreplicates, and phylogenetic estimation from these replicate data sets.
The work uses many currently available software packages, and the results were analysed
with a number of Perl scripts. My results demonstrate that different tests,
each essentially designed to test for a significant difference between topologies, yield
answers that do not always agree. The reasons behind this and the implications for
systematists wanting to use these tests are discussed. In addition, the type I error rate of a
widely used test, the Swofford-Olsen-Waddell-Hillis (SOWH) test, was estimated
and shown to be excessive under conditions of model violation. The strict and correct
use of these tests remains an issue, with significantly more simulation work still
to be done. Finally, the site-patterns of real pre-aligned data sets were analysed using
a parametric bootstrapping approach that allows one to compare models and to
assess hypotheses about the patterns of evolution amongst the taxa.