Abstract:
The Bayes factor is a widely used tool for statistical model selection. Its computation is based on the marginal likelihood, an integral that can be hard to estimate depending on the complexity of the model. The models employed in phylogenetic inference are highly complex, so a direct computation of the Bayes factor is infeasible, and numerical methods or approximations are needed to estimate it. Model selection is an integral part of this field, but it tends to be hindered by the requirements of established marginal likelihood estimation methods, such as generalized steppingstone sampling. In this work, we introduce nested sampling to phylogenetics: a Bayesian algorithm that estimates the marginal likelihood and simultaneously samples from the posterior distribution. We study the behaviour of nested sampling in several statistical and phylogenetic scenarios and compare its performance to that of established estimation methods such as steppingstone sampling. We introduce and discuss extensions to the initial algorithm that allow for variable tree topology, estimate Bayes factors directly, and use importance sampling approaches to further improve its performance. Nested sampling has been shown to work in situations where most MCMC methods fail, e.g., when the target distribution is a mixture of well-separated distributions. We show that the algorithm and its extensions offer, unlike established methods, a relatively cheap way to estimate in a single run both the marginal likelihood and its uncertainty. They also yield samples from the posterior distribution at no extra cost. Overall, we establish nested sampling as a valuable alternative in Bayesian phylogenetics, in particular for model selection and parameter inference.
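To illustrate the single-run claim, here is a minimal sketch of the basic nested sampling loop on a toy problem rather than a phylogenetic model. The following are hypothetical and not taken from this work: a one-dimensional parameter with a Uniform(-5, 5) prior and a standard normal likelihood, chosen so that the marginal likelihood Z ≈ 0.1 is known analytically; the live-point count and stopping tolerance; and the names `nested_sampling`, `log_likelihood` and `log_add_exp`. The replacement step draws exactly from the prior restricted to higher likelihood, which works only because that region is an interval here. In tree space this step would need something like a short constrained MCMC run. The uncertainty uses the common sqrt(H/N) approximation, and the discarded points, weighted by p_j = L_j w_j / Z, double as posterior samples.

```python
import math
import random


def log_likelihood(theta: float) -> float:
    """Hypothetical toy log-likelihood: standard normal density in theta."""
    return -0.5 * theta * theta - 0.5 * math.log(2.0 * math.pi)


def log_add_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(n_live=200, lo=-5.0, hi=5.0, tol=1e-6, seed=1):
    """Return (log Z, its standard error, weighted posterior samples).

    Prior: Uniform(lo, hi) on theta (toy assumption, not a phylogenetic model).
    """
    rng = random.Random(seed)
    live = [rng.uniform(lo, hi) for _ in range(n_live)]
    live_logl = [log_likelihood(t) for t in live]
    dead = []          # (theta, logL, log prior-volume width) per discarded point
    log_z = -math.inf  # running log marginal likelihood
    log_x_prev = 0.0   # log of enclosed prior volume, starts at log(1)
    i = 0
    while True:
        i += 1
        worst = min(range(n_live), key=lambda k: live_logl[k])
        logl_min = live_logl[worst]
        # Deterministic shrinkage approximation: X_i = exp(-i / n_live)
        log_x = -i / n_live
        # Width w_i = X_{i-1} - X_i, computed in log space
        log_w = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        dead.append((live[worst], logl_min, log_w))
        log_z = log_add_exp(log_z, logl_min + log_w)
        log_x_prev = log_x
        # Draw from the prior restricted to L(theta) > L_min. For this toy
        # likelihood that region is the interval |theta| < |theta_worst|.
        r = abs(live[worst])
        new = rng.uniform(max(lo, -r), min(hi, r))
        live[worst], live_logl[worst] = new, log_likelihood(new)
        # Stop once the live points can add at most a fraction tol to Z
        if max(live_logl) + log_x < log_z + math.log(tol):
            break
    # Add the remaining live points, each with an equal share of the final volume
    for t, ll in zip(live, live_logl):
        log_w = log_x - math.log(n_live)
        dead.append((t, ll, log_w))
        log_z = log_add_exp(log_z, ll + log_w)
    # Posterior weights p_j and information H = sum_j p_j * log(L_j / Z)
    weights = [math.exp(ll + lw - log_z) for _, ll, lw in dead]
    h = sum(p * (ll - log_z) for p, (_, ll, _) in zip(weights, dead))
    log_z_err = math.sqrt(max(h, 0.0) / n_live)
    samples = [(t, p) for (t, _, _), p in zip(dead, weights)]
    return log_z, log_z_err, samples
```

For this toy setup the estimate should lie close to log(0.1) ≈ -2.30, because the normal mass inside [-5, 5] is essentially 1. The information is H ≈ log 10 − ½ log(2πe) ≈ 0.88, so with 200 live points the reported standard error should be roughly 0.07. The weighted dead points can be used directly for posterior summaries, such as the posterior mean of theta.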