Abstract:
Maximum likelihood estimation generally requires exact density or mass functions of probability distributions, which are often intractable for complicated statistical models. This PhD thesis shows that probability density approximation can be an effective tool for addressing this problem. Following this idea, we investigate two specific problems arising in ecology and population genetics.
In the first project, we investigate parameter estimation under latent multinomial models, in which the observed data are a linear transformation of a latent vector of counts arising from a multinomial distribution with unknown parameters. Currently, inference under these models relies primarily on Bayesian methods, which involve long computation times and often require expert implementation. In this thesis, we present a novel likelihood-based approach suitable for all models in the class, using likelihoods constructed by the saddlepoint approximation method. We validate the method by applying it to specific models for which exact or approximate likelihoods are available, by comparing it with other estimation approaches, and by simulation. The saddlepoint method consistently gives accurate inference while being considerably faster than Bayesian methods and more general than alternative estimation approaches. We show the generality of the approach by applying it to two new models for which no likelihood-based approach has previously been proposed.
In the second project, we propose a new method for estimating the evolutionary parameters of mutation rate and recombination rate from sample values of r², a common measure of linkage disequilibrium in population genetics. The probability density function of r² is an unknown and complicated function of the evolutionary parameters. We focus on the quantitative properties and sampling distribution of r². Under the diffusion approximation, we demonstrate that a finite sequence of moments of r² can be computed without knowing its probability distribution. From these moments, we construct an approximate probability density function of r² for a two-locus genetic model using the maximum entropy principle. This density is then used for parameter estimation. The performance of the proposed method is demonstrated by simulation studies and real data analysis.
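As a rough illustration of the last step, the sketch below builds a maximum entropy density on [0, 1] from a finite set of raw moments. It is not the thesis implementation. The function name `maxent_density`, the Gauss-Legendre quadrature, the BFGS optimiser, and the example moments (those of a Beta(2, 3) distribution) are all assumptions chosen for illustration. The density takes the standard maximum entropy form p(x) = exp(-Σ λ_k x^k) / Z, and the multipliers λ_k are found by minimising the convex dual log Z(λ) + Σ λ_k m_k.

```python
import numpy as np
from scipy.optimize import minimize


def maxent_density(moments, n_nodes=200):
    """Fit p(x) = exp(-sum_k lam_k x^k) / Z on [0, 1] to the given raw moments.

    `moments` holds m_1, ..., m_K (hypothetical inputs; in the thesis setting
    these would be moments of r^2 obtained under the diffusion approximation).
    """
    m = np.asarray(moments, dtype=float)
    K = m.size

    # Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1].
    t, w = np.polynomial.legendre.leggauss(n_nodes)
    x = 0.5 * (t + 1.0)
    w = 0.5 * w
    powers = np.vstack([x ** k for k in range(1, K + 1)])  # shape (K, n_nodes)

    def log_partition(lam):
        # Returns log Z and the normalised quadrature probabilities.
        expo = -lam @ powers
        shift = expo.max()  # subtract the max for numerical stability
        terms = w * np.exp(expo - shift)
        z = terms.sum()
        return np.log(z) + shift, terms / z

    def dual(lam):
        # Convex dual: log Z(lam) + lam . m, with gradient m - E_p[x^k].
        log_z, p = log_partition(lam)
        return log_z + lam @ m, m - powers @ p

    res = minimize(dual, np.zeros(K), jac=True, method="BFGS")
    lam = res.x
    log_z, _ = log_partition(lam)

    def pdf(y):
        y = np.asarray(y, dtype=float)
        expo = np.zeros_like(y)
        for k, l in enumerate(lam, start=1):
            expo -= l * y ** k
        return np.exp(expo - log_z)

    return pdf, lam


# Illustrative moments only: first two raw moments of a Beta(2, 3) variable.
pdf, lam = maxent_density([0.4, 0.2])
```

The fitted `pdf` can then be evaluated at observed values and its logarithm summed to give an approximate log-likelihood. In the setting described above, the moments would themselves depend on the evolutionary parameters, so the fit would be repeated for each candidate parameter value.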