Abstract:
This thesis introduces sampled ancestor phylogenetic trees: trees in which sampled taxa can lie directly on branches and represent ancestors of other taxa that are sampled later. Phylogenetic trees in molecular biology were traditionally reconstructed from molecular data of contemporaneous species, among which sampled ancestors cannot occur. More recently, phylogenetic analyses have been performed on sequences from pathogens sampled from different patients at different times, and the possibility of sampled ancestors in transmission trees has become apparent. The sampled ancestors in this case are patients who transmitted a disease (either directly or through a chain of transmissions) to other sampled patients. Similarly, in palaeontology, cladograms of fossil taxa have traditionally not contained sampled ancestors. Recently, it has been recognised that the probability of sampled ancestors is not negligible among fossil taxa and that methods that use fossils to date phylogenies should account for this. In this thesis, I develop a Bayesian Markov chain Monte Carlo framework for inferring sampled ancestor phylogenies, investigate properties of the sampled ancestor tree space, test birth-death sampled ancestor models and apply these models to empirical datasets. I also consider the problem of dating phylogenies. I review Bayesian methods for dating phylogenies and address a computational problem connected to calibration methods, the most common dating methods of the past two decades. A more recent approach, called total-evidence dating, is a model-based statistical method that uses all available data (molecular, morphological and temporal fossil data) in one joint inference, in contrast to the sequential inference of calibration methods. I apply total-evidence dating, allowing for sampled ancestors, to a penguin dataset; the analysis reveals a crown penguin radiation that is very recent compared to previous estimates.