Investigation into Left Censored Stutter Data for the D22 Locus

Gasston, Julia

Identifier: http://hdl.handle.net/2292/50725

Issue Date: 2020

Degree Grantor: The University of Auckland

Rights: Copyright: The author

Rights (URI): https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm

Abstract:

Understanding stutter behaviour is key to obtaining an accurate likelihood statistic from DNA analysis. Many methods have been explored for modelling stutter; however, none of them incorporates up-to-date statistical methods for left-censored data. The methods discussed in this research are a substitution method, censored regression with maximum likelihood estimation, and censored regression with Bayesian estimation. Stutter ratio and stutter height were used as the variables of interest, and the behaviour of both was analysed. D22 was selected as the locus of interest because its stutter behaviour differs from that of all other loci; most importantly, it tends to stutter more than other loci. Stutter behaviour for back stutter, double back stutter, and forward stutter was assessed and models were created. All censored stutter data were skewed with long tails, so a normal distribution was quickly ruled out. The substitution technique successfully modelled back stutter, where the substituted values were close to the values that would be expected. For forward and double back stutter the substituted values were not accurate, and hence the substitution technique was suboptimal. Overall, censored regression with maximum likelihood estimation performed best at parameter estimation and at modelling stutter behaviour. The Bayesian technique produced estimates that were close to the maximum likelihood estimates. For back stutter height, a Gamma distribution censored regression model that included both allele and allele height could model stutter behaviour. For forward stutter height, a Log Normal distribution censored regression model that included both allele and allele height could model stutter behaviour. Results for double back stutter should not be applied, as the percentage of censoring was too high (approximately 90%) to obtain meaningful results.

Overall, back stutter, forward stutter and double back stutter behaved similarly. Researchers should find that a simple Log Normal censored regression model with maximum likelihood estimates will appropriately model all three.
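
To illustrate what maximum likelihood estimation for left-censored Log Normal data involves, the following is a minimal sketch in Python and is not taken from the thesis. It assumes an intercept-only model (no allele or allele height covariates), a single known censoring threshold, and a simple hand-written pattern search in place of a production optimiser. The names neg_log_lik and fit_censored_lognormal are hypothetical. Observed heights contribute a density term to the likelihood, while each censored height contributes the probability of falling below the threshold.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the censored term


def neg_log_lik(mu, log_sigma, observed_logs, n_censored, log_threshold):
    """Negative log-likelihood of left-censored log-normal data.

    Works on the log scale, where the data are normal with mean mu and
    standard deviation exp(log_sigma). The Jacobian term (-log y) is
    constant in the parameters and is omitted.
    """
    sigma = math.exp(log_sigma)
    ll = 0.0
    # Observed values: normal log-density of log(height).
    for z in observed_logs:
        r = (z - mu) / sigma
        ll += -0.5 * r * r - log_sigma - 0.5 * math.log(2.0 * math.pi)
    # Censored values: probability of lying below the threshold.
    p_below = STD_NORMAL.cdf((log_threshold - mu) / sigma)
    ll += n_censored * math.log(max(p_below, 1e-300))
    return -ll


def fit_censored_lognormal(observed, n_censored, threshold):
    """Estimate (mu, sigma) by a basic coordinate pattern search.

    Assumes at least two distinct observed values at or above the
    threshold; censored values are known only by their count.
    """
    logs = [math.log(h) for h in observed]
    log_t = math.log(threshold)
    # Start from the (biased) moments of the observed values only.
    mean = sum(logs) / len(logs)
    var = sum((z - mean) ** 2 for z in logs) / len(logs)
    params = [mean, 0.5 * math.log(var)]
    best = neg_log_lik(params[0], params[1], logs, n_censored, log_t)
    step = 0.5
    while step > 1e-6:
        improved = False
        for i in (0, 1):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                value = neg_log_lik(trial[0], trial[1], logs, n_censored, log_t)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # no move helped, so refine the search
    return params[0], math.exp(params[1])
```

A regression version would replace the constant mu with a linear predictor in the covariates (for example allele and allele height), and a Gamma model would swap in the Gamma density and distribution function; both are left out here to keep the sketch short.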

Description:

Full Text is available to authenticated members of The University of Auckland only.

Files in this item

Name: whole.pdf

Size: 4.565Mb

Format: PDF

This item appears in the following Collection(s)

Masters Theses - Authenticated Access [6749]
