Optimal sampling for design-based estimators of regression models in two-phase designs

Degree Grantor

The University of Auckland

Abstract

The two-phase design collects additional information on a subsample selected from the study cohort. It is a cost-effective sampling method when the covariates of interest are too expensive to measure for every individual in the cohort. With careful choices of stratification and phase-two sampling strategy, a two-phase design will be more efficient than simple random sampling. At the design stage, it is desirable to sample according to the optimal design, that is, the design that yields the most efficient estimator. We develop the optimal design for analysis via the inverse probability weighted (IPW) estimator. To approximate this optimal design, we propose a multiwave sampling framework that incorporates information from the whole cohort. We show that design efficiency can be further improved by multiwave sampling with informative priors. Generalized raking estimators form a class of design-based estimators that can be more efficient than the IPW estimator. We derive the optimal design for analysis via generalized raking estimators and compare it with the optimal design for the IPW estimator and with other two-phase designs in measurement-error settings. We show that the optimal design for the IPW estimator is not optimal for generalized raking estimation but typically gives near-optimal efficiency. It has previously been shown that semiparametric efficiency under two-phase sampling is not robust to contiguous model misspecification if the target of inference is defined by a hypothetical analysis of complete data. In two-phase studies, the optimal design for the efficient estimator is often very different from that for design-based estimators. We show that this design optimality can also be sensitive to contiguous model misspecification.
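The abstract does not spell out the form of the IPW-optimal design. As background (not taken from this abstract), a commonly used result is that, for a single target coefficient and stratified phase-two sampling, the IPW variance is minimized by Neyman allocation applied to the influence function: the phase-two size in stratum k is proportional to N_k S_k, where N_k is the cohort count and S_k the within-stratum standard deviation of the influence function. Because S_k depends on unknown quantities, it has to be estimated, which is one motivation for sampling in waves. The sketch below is a minimal illustration under stated assumptions: the helper name `neyman_allocation` is hypothetical, the S_k values are assumed to be supplied by the user, and the take-all capping and largest-remainder rounding are our own choices rather than the thesis's method.

```python
import math
from typing import List, Sequence


def neyman_allocation(N: Sequence[int], S: Sequence[float], n: int) -> List[int]:
    """Hypothetical helper: split n phase-two units across strata with
    n_k proportional to N_k * S_k, capping each n_k at N_k.

    N: phase-one (cohort) size of each stratum.
    S: assumed within-stratum SD of the influence function of the target
       coefficient (in practice an estimate, e.g. from a pilot or earlier wave).
    n: total phase-two sample size.
    """
    K = len(N)
    if len(S) != K:
        raise ValueError("N and S must have the same length")
    if any(s <= 0 for s in S):
        raise ValueError("this sketch assumes every S_k is positive")
    if not 0 <= n <= sum(N):
        raise ValueError("n must lie between 0 and the cohort size")

    # Continuous Neyman allocation; strata whose share exceeds N_k are
    # sampled in full ("take-all") and the remainder is re-allocated.
    capped = [False] * K
    raw = [0.0] * K
    while True:
        remaining = n - sum(N[k] for k in range(K) if capped[k])
        w = [0.0 if capped[k] else N[k] * S[k] for k in range(K)]
        total = sum(w)
        for k in range(K):
            if capped[k]:
                raw[k] = float(N[k])
            else:
                raw[k] = remaining * w[k] / total if total > 0 else 0.0
        over = [k for k in range(K) if not capped[k] and raw[k] > N[k]]
        if not over:
            break
        for k in over:
            capped[k] = True

    # Integer sizes: round down, then give leftover units to the strata
    # with the largest fractional parts (largest-remainder rounding).
    alloc = [math.floor(r) for r in raw]
    leftover = n - sum(alloc)
    order = sorted(range(K), key=lambda k: raw[k] - alloc[k], reverse=True)
    for k in order[:leftover]:
        alloc[k] += 1
    return alloc


if __name__ == "__main__":
    N = [1000, 50]
    S = [1.0, 40.0]
    n_k = neyman_allocation(N, S, 200)
    # IPW (Horvitz-Thompson) weight for a sampled unit in stratum k is N_k / n_k.
    weights = [Nk / nk for Nk, nk in zip(N, n_k)]
    print(n_k, weights)
```

In the usage example the small second stratum has a large assumed S_k, so its unconstrained share exceeds its size; it is taken in full and the remaining units go to the first stratum. In a multiwave scheme, one would re-estimate the S_k after each wave and allocate the next wave's units accordingly; that refinement is not shown here.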
