Abstract:
Accurate disease prediction that draws on emerging genetic findings and other established knowledge is expected to facilitate precision medicine (Ashley, 2015). While rare genetic variants, multi-omic information, and family structure provide unprecedented data resources for predictive studies, few analytical
approaches have been developed to use them in predicting the risk of complex traits.
In the first project, I developed a Bayesian linear mixed model (BLMM),
where genetic effects were modelled using a hybrid of sparse regression and a
linear mixed model with multiple random effects. The parameters in BLMM were
inferred through a computationally efficient variational Bayes algorithm. The
proposed method can approximate the shape of the true effect-size distributions,
capture the predictive effects of both common and rare variants, and remain robust
against various disease models. Through extensive simulations and the application
to a whole-genome sequencing dataset obtained from the Alzheimer’s Disease
Neuroimaging Initiative, I have demonstrated that BLMM has better prediction
performance than existing methods and can detect variables and/or genetic regions
that are predictive.
In the second project, I developed a Bayesian linear mixed model for the
prediction analysis of sequencing data obtained from family-based studies. The
method not only captures predictive effects from both common and rare variants
but also easily accommodates various disease model assumptions. It uses information
embedded in the study design to form surrogates through which the predictive effects of unmeasured or unknown genetic and environmental risk factors can be modelled.
Through extensive simulation studies and the analysis of sequencing data obtained
from the Michigan State University Twin Registry study, I have demonstrated
that the proposed method outperforms commonly adopted techniques.
In the third project, I proposed a two-step BLMM-based Bayesian framework
(TBLMM) for risk prediction with multi-level omics data. It not only captures
various types of effects from multi-omics data simultaneously but also models
complex interactions within and between omics layers through fused kernel functions.
Through extensive simulations and the application to PET-imaging outcomes
from the Alzheimer’s Disease Neuroimaging Initiative, I have demonstrated that
TBLMM can consistently outperform the existing method in predicting the risk of
complex traits.