Abstract:
Linear mixed models have long been the method of choice for risk prediction analysis
on high-dimensional data, where random effect terms are used to capture predictive
effects from multiple markers. However, it remains computationally challenging to
simultaneously model a large number of variables that may be pure noise or may have
predictive effects of complex forms. In this thesis, we first develop a penalized linear mixed
model with generalized method of moments estimators for prediction analyses. The
proposed method uses generalized method of moments estimators to improve
computational efficiency and an L1 penalty to select predictors. We show that
the proposed estimators have oracle properties, including variable
selection consistency, estimation consistency, and asymptotic normality. We further
develop a hybrid screening rule that combines the sequential strong rule with the
enhanced dual polytope projection rule to reduce the data dimension and improve computational
efficiency. The proposed hybrid screening rule projects solutions of the
objective function of the proposed penalized linear mixed model into the dual space,
and then uses the sequential strong rule and the enhanced dual polytope projection
rule to detect inactive variables in that space. We show that the hybrid screening rule
aligns well with the proposed downstream prediction model and can correctly and
efficiently discard a large number of variables with no predictive effects in the corresponding
penalized linear mixed model. Lastly, we incorporate multiple kernels into the
proposed penalized linear mixed model to analyze high-dimensional multi-omics data,
capturing both the interactive roles of multi-omics data and their complex types of
predictive effects. Through extensive simulation studies, we demonstrate that
the proposed methods are computationally efficient and can be applied to genome-wide
data. They can capture predictive effects of complex forms and outperform competing
linear mixed models. In prediction analyses of PET imaging outcomes using
high-dimensional omics data, we find that the proposed method has better prediction
performance than commonly used methods, and our analyses show that genetic variants
in the APOE, APOC1, TOMM40, and FADS3 genes are highly predictive.
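
As a rough illustration of the screening idea mentioned above, the following is a minimal sketch of the sequential strong rule for the ordinary lasso, not for the penalized linear mixed model proposed in this thesis. It assumes the objective (1/2)||y - X b||^2 + lambda ||b||_1, and the function name, variable names, and simulated data are hypothetical. Predictor j is discarded at the new penalty value when |x_j' r| < 2 * lam_next - lam_prev, where r is the residual at the previous penalty value. The strong rule is a heuristic and can occasionally discard an active variable, so in practice it is paired with a check of the optimality conditions; the enhanced dual polytope projection rule, by contrast, is a safe rule.

```python
import numpy as np


def sequential_strong_rule(X, y, beta_prev, lam_prev, lam_next):
    """Return a boolean mask of predictors that survive screening.

    Assumes the lasso objective (1/2)||y - X b||^2 + lam * ||b||_1.
    beta_prev is the solution at lam_prev; lam_next <= lam_prev.
    """
    residual = y - X @ beta_prev          # residual at the previous penalty
    scores = np.abs(X.T @ residual)       # |x_j' r| for every predictor
    threshold = 2.0 * lam_next - lam_prev
    return scores >= threshold            # False means "discard"


# Simulated data (hypothetical): only the first predictor has an effect.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + 0.1 * rng.standard_normal(n)

# At lam_max the lasso solution is exactly zero, so it can seed the path.
lam_max = np.max(np.abs(X.T @ y))
beta_prev = np.zeros(p)
keep = sequential_strong_rule(X, y, beta_prev, lam_max, 0.9 * lam_max)
print(f"kept {keep.sum()} of {p} predictors")
```

Only the predictors flagged in keep would be passed to the solver at the next penalty value, which is where the computational savings come from.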