Two-phase subsampling for DNA sequencing with application to endangered species

dc.contributor.advisor Lumley, Thomas
dc.contributor.advisor Stevenson, Ben
dc.contributor.author Luo, Pei (Zoe)
dc.date.accessioned 2024-02-26T20:51:12Z
dc.date.available 2024-02-26T20:51:12Z
dc.date.issued 2024 en
dc.identifier.uri https://hdl.handle.net/2292/67513
dc.description.abstract Whole-genome sequencing has been completed for the entire population of the kākāpō, an endangered New Zealand parrot species. Despite the decreasing cost of DNA sequencing, an effort of this scale is generally not feasible in conservation studies or large human cohorts. A cost-saving strategy is to obtain relatively inexpensive information, such as low-resolution genotype data, for the whole sample, then resequence a small subsample at higher resolution and use the combined data to make inferences about the whole sample. Such strategies are called two-phase sampling: the initial sampling of the cohort is followed by subsampling of the individuals chosen for resequencing. This thesis explores two classes of approaches to handling incomplete data in two-phase sampling designs under different situations. The first class is genotype imputation, the process of predicting missing genotypes from the low-resolution genotypes of the whole sample and the high-resolution genotypes of the subsample. However, genotype imputation is much more complicated for endangered species than for well-studied species such as humans, livestock and other model organisms. The second class, which I also explore, is statistical inference of model parameters by maximum likelihood approaches that account for the missing-data mechanism. In genetic association studies, the polygenic model is often used to describe the architecture of complex traits, as it allows for the possibility that thousands of variants contribute to the phenotypic variation in the population. Under this model, mixed models can be used to measure the genetic effect of a particular variant while attributing the remaining variation to the population correlation structure.
In this thesis, I propose a weighted maximum likelihood approach for fitting mixed models that takes advantage of the fact that the kākāpō population relatedness structure is known, making it possible to incorporate the population covariance matrix, rather than the sample covariance matrix, into the model. The performance of the proposed method is evaluated using the kākāpō data and simulated data with a population structure similar to that of humans. The method should therefore provide a general solution for fitting mixed models under two-phase sampling designs in both endangered species and human populations.
dc.publisher ResearchSpace@Auckland en
dc.relation.ispartof PhD Thesis - University of Auckland en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/nz/
dc.title Two-phase subsampling for DNA sequencing with application to endangered species
dc.type Thesis en
thesis.degree.discipline Statistics
thesis.degree.grantor The University of Auckland en
thesis.degree.level Doctoral en
thesis.degree.name PhD en
dc.date.updated 2024-02-21T22:25:11Z
dc.rights.holder Copyright: The author en
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess en

