Efficiency of the semi-parametric maximum likelihood estimator in generalized case-control studies
Abstract
In this thesis, we investigate the efficiency of the semi-parametric maximum likelihood estimator in generalized case-control studies. We introduce the idea of a multi-sample model and show that it provides a natural common framework for a number of variations of the basic case-control study. For example, data from a case-control study form a multi-sample, since they consist of two independent samples, one from the case population and one from the control population. Some missing-data problems can also be treated as multi-samples: full data are drawn from one population and partial data from the other. Moreover, data gathered using an outcome-dependent two-phase sampling design can also be regarded as a multi-sample. We show that the theory of M-estimation for i.i.d. models extends naturally to multi-sample models, and we treat maximum likelihood in these models as a special case of M-estimation. The efficiency of the maximum likelihood estimator can then be studied using the theory of M-estimators in a multi-sample model. Scott & Wild (1997, 2001) use a profile likelihood approach to calculate the semi-parametric maximum likelihood estimator in generalized case-control studies. The resulting estimating equations cannot be handled by standard M-estimator theory, because the estimating functions depend on the sample size. We extend the standard treatment of estimating functions to include the derivative of the profile log-likelihood, so that the maximum likelihood estimator, defined as the solution of the corresponding estimating equation, is a special case of an M-estimator. We then show that the semi-parametric MLE is the most efficient member of the class of extended M-estimators.
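To make the multi-sample M-estimation idea concrete, here is a minimal sketch, assuming a scalar parameter and a user-supplied estimating function for each sample. The function name `multi_sample_m_estimate`, its arguments, and the toy data are hypothetical illustrations, not the thesis's notation or code. Each sample k contributes its own estimating function psi_k, the pooled equation sum_k sum_i psi_k(x_ki, theta) = 0 is solved by Newton's method, and a sandwich variance is formed. When each psi_k is the log-likelihood score for its own sample, the solution is the maximum likelihood estimator, which illustrates maximum likelihood as a special case of M-estimation. The toy example uses two normal samples with a common mean theta and known variances 1 and 4. It does not implement the Scott & Wild profile likelihood.

```python
from typing import Callable, Sequence

# psi(x, theta) -> value of one sample's estimating function at observation x.
EstFn = Callable[[float, float], float]


def multi_sample_m_estimate(
    samples: Sequence[Sequence[float]],
    psis: Sequence[EstFn],
    theta0: float,
    tol: float = 1e-10,
    max_iter: int = 100,
    h: float = 1e-6,
) -> tuple[float, float]:
    """Solve sum_k sum_i psi_k(x_ki, theta) = 0 for a scalar theta.

    Hypothetical helper. Sample k is paired with psis[k].
    Returns (theta_hat, sandwich variance estimate of theta_hat).
    """

    def total(theta: float) -> float:
        # Pooled estimating equation: every sample contributes its own psi_k.
        return sum(psi(x, theta) for xs, psi in zip(samples, psis) for x in xs)

    def slope(theta: float) -> float:
        # Central-difference derivative of the pooled equation.
        return (total(theta + h) - total(theta - h)) / (2 * h)

    theta = theta0
    for _ in range(max_iter):
        step = total(theta) / slope(theta)  # Newton step
        theta -= step
        if abs(step) < tol:
            break
    else:
        raise RuntimeError("Newton iteration did not converge")

    # Sandwich variance B / A^2 for scalar theta. Because each sample size is
    # fixed by design, B sums squared deviations of psi_k from its
    # within-sample mean rather than from zero.
    b = 0.0
    for xs, psi in zip(samples, psis):
        vals = [psi(x, theta) for x in xs]
        m = sum(vals) / len(vals)
        b += sum((v - m) ** 2 for v in vals)
    return theta, b / slope(theta) ** 2


# Toy multi-sample: one sample from N(theta, 1) and one from N(theta, 4).
# Using each sample's likelihood score as psi_k makes theta_hat the MLE.
sample_a = [1.0, 2.0, 3.0]
sample_b = [4.0, 8.0]
scores = [lambda x, t: x - t, lambda x, t: (x - t) / 4.0]
theta_hat, var_hat = multi_sample_m_estimate([sample_a, sample_b], scores, theta0=0.0)
print(theta_hat, var_hat)  # approximately 18/7 = 2.571 and 10/49 = 0.204
```

Centering psi_k within each sample is the main departure from the i.i.d. sandwich formula. In a multi-sample design the sample sizes are fixed in advance, so the sampling variability comes only from within-sample fluctuation.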