Abstract:
Coalescent-based estimation of population parameters from genetic data has proven effective and is widely used. Recently, the coalescent has been extended to Measurably Evolving Populations (MEPs), in which serially sampled sequences accumulate a significant number of substitutions over the sampling period. Drummond et al. (2002) developed a Bayesian Markov chain Monte Carlo (MCMC) method that simultaneously estimates mutation rate and population sizes from serial samples. We present a Bayesian inference approach that extends the method of Drummond et al. (2002) to estimate mutation rate, population sizes and migration rates simultaneously in an island-structured population, using temporally and spatially sampled sequence data. MCMC is used to draw samples from the posterior probability distribution. We demonstrate that the chain reaches equilibrium and recovers the true parameter values from simulated data. The method is then applied to a dataset of DNA sequences of Human Immunodeficiency Virus type 1 (HIV-1) from two demes, semen and blood, fitting asymmetric migration rates and different population sizes. This dataset yields a bimodal joint posterior distribution whose two modes favour opposite migration directions. The full dataset was then split by sampling time for further analysis. One subset behaved qualitatively like the full dataset, again giving a bimodal posterior, and the temporally split subsets gave significantly different posterior distributions and parameter estimates over time. Because demographic structure and population parameters are often not constant over evolutionary time, we also extend the method to incorporate changes in the number of demes and in patterns of colonization.
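To illustrate the island-model process behind these estimates, the following is a minimal sketch (not the implementation used in this work) that simulates a structured coalescent backward in time for two demes with asymmetric migration and different population sizes. It assumes contemporaneous sampling (no serial samples), constant parameters, and a hypothetical rate parameterization: k lineages in deme i coalesce at rate k(k-1)/(2θ_i), and each lineage in deme i moves to deme j at backward-in-time rate m_ij. The function name and argument names are illustrative only.

```python
import random

def simulate_structured_coalescent(counts, theta, mig, rng):
    """Simulate the event history of a structured coalescent, backward in time.

    counts: sampled lineages per deme at time 0, e.g. [4, 3]
    theta:  per-deme scaled population sizes (assumed parameterization)
    mig:    mig[i][j] = backward-in-time rate at which one lineage in deme i
            moves to deme j; mig[i][j] may differ from mig[j][i] (asymmetric)
    Returns (time, event, i, j) tuples, ending when one lineage remains.
    """
    counts = list(counts)
    n_demes = len(counts)
    t = 0.0
    events = []
    while sum(counts) > 1:
        # Enumerate every possible next event together with its rate.
        rates = []
        for i in range(n_demes):
            k = counts[i]
            if k >= 2:
                # Any of the k(k-1)/2 pairs of lineages in deme i may coalesce.
                rates.append((k * (k - 1) / 2.0 / theta[i], "coalescence", i, i))
            for j in range(n_demes):
                if j != i and k > 0 and mig[i][j] > 0:
                    rates.append((k * mig[i][j], "migration", i, j))
        total = sum(r[0] for r in rates)
        if total == 0:
            raise ValueError("lineages are isolated in separate demes with no migration")
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        # Choose which event occurs, with probability proportional to its rate.
        u = rng.random() * total
        acc = 0.0
        for rate, kind, i, j in rates:
            acc += rate
            if u < acc:
                break
        counts[i] -= 1          # a coalescence merges two lineages in deme i
        if kind == "migration":
            counts[j] += 1      # a migration moves one lineage from deme i to deme j
        events.append((t, kind, i, j))
    return events

rng = random.Random(1)
# Two demes (labelled e.g. semen = 0, blood = 1); parameter values are arbitrary.
history = simulate_structured_coalescent([4, 3], theta=[1.0, 0.5],
                                         mig=[[0.0, 0.8], [0.2, 0.0]], rng=rng)
print(sum(1 for e in history if e[1] == "coalescence"))  # 6: n - 1 coalescences for n = 7
```

Inference runs this logic in reverse: an MCMC sampler scores proposed genealogies and parameters (θ, migration rates, mutation rate) against the sequence data rather than simulating from fixed values.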
We extend the Bayesian MCMC method to allow step changes in mutation rate, migration rates and population sizes, as well as changes in the number of demes, with the times of these changes also estimated. We show that, for parameter ranges of interest, reliable estimates can often be obtained, including the times at which parameters changed. However, as other authors have reported, posterior densities of migration rates can be quite diffuse and their estimators somewhat biased. Finally, we consider the case in which a hidden deme cannot be sampled at one or more time points. Misspecifying the subpopulation model in this way causes significant bias in the mode estimators of population sizes. Adding the data for the final time point, however, improved inference considerably, removing this bias and allowing population sizes and migration rates to be estimated with modest accuracy. These results suggest that developing a Bayesian model averaging method, to address the question of what the population structure is, would make a significant contribution to the field. Further improvements could come from simultaneously estimating recombination rates or selection, although such methods would be computationally expensive.
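As a rough illustration of how an estimated change time enters the likelihood, here is a minimal sketch for a single deme with one step change in the scaled population size θ. It assumes contemporaneous sampling, time measured backward from the present, θ_recent before the change time τ and θ_ancient from τ onward; the function names and parameterization are hypothetical and cover only the population-size case, not changes in mutation, migration or deme number. For k lineages the coalescence hazard is C(k,2)/θ(t), so each inter-coalescent interval contributes the log hazard at the event time minus the integrated hazard, with the integral split at τ.

```python
import math

def step_theta(t, tau, theta_recent, theta_ancient):
    """Scaled population size: theta_recent before tau, theta_ancient from tau on."""
    return theta_recent if t < tau else theta_ancient

def integrated_rate(start, end, pairs, tau, theta_recent, theta_ancient):
    """Integral of pairs / theta(u) du over [start, end], split at the change time tau."""
    total = 0.0
    if start < tau:
        total += (min(end, tau) - start) * pairs / theta_recent
    if end > tau:
        total += (end - max(start, tau)) * pairs / theta_ancient
    return total

def coalescent_log_likelihood(n, coal_times, tau, theta_recent, theta_ancient):
    """Log-density of ascending coalescence times for n lineages sampled at time 0."""
    if len(coal_times) != n - 1:
        raise ValueError("expected n - 1 coalescence times")
    loglik = 0.0
    start = 0.0
    k = n
    for end in coal_times:
        pairs = k * (k - 1) / 2.0
        hazard = pairs / step_theta(end, tau, theta_recent, theta_ancient)
        loglik += math.log(hazard) - integrated_rate(start, end, pairs, tau,
                                                     theta_recent, theta_ancient)
        start = end
        k -= 1
    return loglik

# Toy coalescence times for n = 6 (illustrative values, not data).
times = [0.05, 0.12, 0.2, 1.5, 3.0]
for tau in (0.1, 0.5, 1.0, 2.0):
    print(tau, round(coalescent_log_likelihood(6, times, tau, 0.5, 2.0), 3))
```

In an MCMC setting, τ would be proposed and accepted or rejected alongside θ_recent and θ_ancient, so its posterior reflects how well each candidate change time explains the genealogy.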