Diagnostic tools and confidence regions for multinomial models

Nyangoma, Stephen Odundo

dc.contributor.advisor	Seber, G. A. F. (George Arthur Frederick), 1938-	en
dc.contributor.author	Nyangoma, Stephen Odundo	en
dc.date.accessioned	2007-06-26T03:39:29Z	en
dc.date.available	2007-06-26T03:39:29Z	en
dc.date.issued	1999	en
dc.identifier	THESIS 00-451	en
dc.identifier.citation	Thesis (PhD--Statistics)--University of Auckland, 1999	en
dc.identifier.uri	http://hdl.handle.net/2292/551	en
dc.description	Full text is available to authenticated members of The University of Auckland only.	en
dc.description.abstract	This thesis is devoted to the study of the asymptotic theory for categorical data models with an emphasis on moderate or small sample sizes. The current diagnostic methods used to analyze multinomial data are generally based on first-order asymptotics. In doing so they sacrifice some information about the geometry of the model. In this thesis, we therefore concentrate on the second-order asymptotics and study the effect of ignoring the second-order terms. In particular, we study the asymptotic properties of residuals and maximum likelihood parameter estimates from the parametric multinomial family of models. In residual analysis, the goal is to use residuals which come close to behaving like their normal linear-theory counterparts. This led to the use of the so-called adjusted residuals (cf. Haberman, 1973; Rao, 1973). Unfortunately these diagnostics are valid only under the requirement of reasonably large sample sizes. One problem of using such methods is the lack of guidelines about the right sample size necessary to warrant their use. In the absence of such guidelines, the validity of these methods may be questionable and alternative residuals which do not depend on this requirement may be used. One of the aims of this thesis is to construct general multinomial residuals which not only behave like the linear regression residuals but which also can be used for moderate sample sizes. These residuals take into consideration the nature of the models used by incorporating the second-order information. The diagnostic methods discussed above are useful for finding general inadequacies in a multinomial model. In particular they are useful in detecting extreme multinomial cells. A related problem which cannot be easily addressed by those methods is that of stability, or the study of the variation in the results of the analysis when the problem formulation is modified.
For example, they cannot be used to assess the impact of individual cells on the various aspects of the fit, e.g. parameter estimates and goodness-of-fit statistics. An approach which attempts to quantify the effect of individual observations on the fit is the perturbation method. A common perturbation scheme in regression is that of case deletion. This works well as the observations are independent. In multinomial models, however, the terms in the log-likelihood function corresponding to the cells are not independent. In this case, it does not make sense merely to remove a term in the likelihood function. The diagnostics akin to the ones developed for the regression models may be derived by substituting the cell probability by the conditional probability given that the suspect cell is omitted and then forming the likelihood function from the remaining cells (cf. Andersen, 1992). Andersen used this idea to derive a scalar measure of Cook's distance for multinomial models. Related and important problems which he did not examine are the changes in the other diagnostics, such as the Pearson residuals, the deviance and the likelihood displacement resulting from his perturbation scheme. We develop a likelihood theory for the conditional model and study the impact of conditioning on these diagnostics. Moreover, it is not easy to interpret the numerical quantities resulting from Andersen's scalar measure. Consequently, we propose a new Cook's distance for multinomial models that can be interpreted in much the same way as the linear regression ones. A new perturbation scheme which includes the unconditional ("full") model as a special case is also proposed and used to derive further diagnostic measures which include influence curves. Its advantage over Andersen's scheme is that it allows simple perturbations of the cells of interest and can then be used to study the effects of infinitesimal changes in multinomial observations.
It enables us to unify the likelihood theory for multinomial models. We also study how the influential observations affect biases in the maximum-likelihood parameter estimators and Pearson residuals. In particular, we study how the biases change when one uses a conditional model instead of the unconditional model. It is shown that the vectors of biases are functions of weighted versions of the corresponding ones for the unconditional model. This means that the linear regression theory (cf. Cook and Weisberg, 1982) can be used to express them in terms of the well-known quantities for the unconditional model. The main achievement of the whole process is the extension of the differential geometric framework for multinomial models (cf. Wei, 1993; Wei and Shouye, 1995) to conditional multinomial models. This generalizes the results by the latter authors. Another problem of particular interest in this thesis is that of constructing confidence regions for the multinomial parameters. We use the asymptotic theory to construct new asymptotic regions for multinomial parameters and study the effect of including the second-order terms on them. We propose and study two competing methods of constructing these regions.	en
dc.language.iso	en	en
dc.publisher	ResearchSpace@Auckland	en
dc.relation.ispartof	PhD Thesis - University of Auckland	en
dc.relation.isreferencedby	UoA9992980514002091	en
dc.rights	Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated	en
dc.rights	Restricted Item. Available to authenticated members of The University of Auckland.	en
dc.rights.uri	https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm	en
dc.title	Diagnostic tools and confidence regions for multinomial models	en
dc.type	Thesis	en
thesis.degree.discipline	Statistics	en
thesis.degree.grantor	The University of Auckland	en
thesis.degree.level	Doctoral	en
thesis.degree.name	PhD	en
dc.rights.holder	Copyright: The author	en
dc.identifier.wikidata	Q111963792

Files in this item

Name: whole.pdf

Size: 23.05Mb

Format: PDF

This item appears in the following Collection(s)

Doctoral Theses - Authenticated Access [1680]
