Abstract:
In this thesis, we present a stochastic model of a coupled host and parasite population. The model is similar to the Moran model in population genetics. The parasite may be passed from parent to child at birth and between any two individuals at an infectious contact event. Hosts uninfected by the parasite may have a selective advantage. The dual of this process models the intertwined genealogies of the host and the parasite as subgraphs of a graph we call the dual infection and selection graph. We derive the diffusive limit of the dual process and discuss its close relationship to other genealogical models such as Kingman's coalescent and the ancestral selection graph. We show that the marginal viral and host genealogies in the diffusive limit are equivalent to the structured coalescent, where population size is determined by the prevalence of infection in the host population. We fit the dual graph process to data using a Bayesian framework and Markov chain Monte Carlo sampling techniques. The data types we consider are viral sequence data and partially observed host genealogies. The dual graph process defines a prior distribution over dual graphs and, therefore, over genealogies. The general time reversible model of mutation defines a likelihood for the viral genealogies. We set out in detail a Markov chain Monte Carlo sampling algorithm on the space of dual graphs and describe a test suite for debugging and verifying the algorithm. We run the algorithm on three data sets: data simulated according to the forward model, data from cougars with the endemic feline immunodeficiency virus, and data from inhabitants of the United Kingdom with the human immunodeficiency virus. Runs on the simulated data, where parameter values are chosen to be physically relevant, show that known host and viral genealogies can be informative about the contact and selection parameters. Runs on the cougar data show that, although these data are somewhat limited, they are informative about the contact parameter. Runs on the HIV data show that we are able to predict contact rates and the time of introduction of the different viral strains into the population.