Abstract:
Complex diseases, which arise from a combination of genetic and environmental factors, affect millions of people worldwide. Genome-wide association studies have identified many genetic variants associated with these diseases, and polygenic risk models built on those variants can predict individual disease risk with meaningful accuracy. However, neither association studies nor risk modelling can determine the genetic architecture underlying the associated variants. In this thesis, I have developed a computational approach that integrates complex disease variants, their related tissue-specific gene regulation information, and individual genotype data. The essential information was selected from the combined data using the Mann-Whitney U test and machine learning regularisation, then evaluated with a series of logistic regression predictor models of individual disease risk. After validation across multiple genotyped populations, the best predictor model was used to identify the regulatory elements most predictive of complex disease risk. Applied to type 1 diabetes (T1D) and Parkinson's disease (PD), the regularised predictor models revealed tissue-specific gene regulation that affects the risk of both diseases. The regularised logistic regression models provided a clear platform for interpreting the molecular mechanisms underlying the genetic components of the predictor model. These analyses provide important insights into the mechanisms that act in different tissues to modulate the onset and development of T1D and PD.
The novelty of the regularised predictor modelling approach is its ability to distinguish the trans and cis expression quantitative trait locus (eQTL) regulatory effects of disease-associated single-nucleotide polymorphisms (SNPs) across tissues. Using Mann-Whitney U test filtering controlled by the Benjamini-Yekutieli false discovery rate (FDR) procedure, together with machine learning regularisation, I can establish curated associations of eQTL regulatory effects in different tissues. Furthermore, my predictor models can estimate the risk contribution of each tissue-specific eQTL regulatory effect, identifying the crucial tissues and their essential SNP-modulated eQTL elements.
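The filter-then-regularise idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the data are synthetic, the helper names `by_adjust` and `filter_features` are hypothetical, and the L1 penalty, the `C=1.0` strength, and the 0.05 threshold are assumptions chosen only for illustration (the abstract does not specify them).

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import LogisticRegression


def by_adjust(pvals):
    """Benjamini-Yekutieli adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    c_m = np.sum(1.0 / ranks)  # harmonic factor that distinguishes BY from BH
    scaled = p[order] * m * c_m / ranks
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adj_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adj_sorted
    return adjusted


def filter_features(X, y, alpha=0.05):
    """Indices of columns whose case and control distributions differ."""
    cases, controls = X[y == 1], X[y == 0]
    pvals = [
        mannwhitneyu(cases[:, j], controls[:, j], alternative="two-sided").pvalue
        for j in range(X.shape[1])
    ]
    return np.flatnonzero(by_adjust(pvals) <= alpha)


# Synthetic stand-in for per-individual regulatory features:
# 200 individuals, 20 features, of which only the first two are informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)          # 100 controls, then 100 cases
X = rng.normal(size=(200, 20))
X[y == 1, :2] += 2.0                # shift features 0 and 1 in cases

kept = filter_features(X, y)

# Assumption: an L1 penalty stands in for "machine learning regularisation".
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(X[:, kept], y)
risk = model.predict_proba(X[:, kept])[:, 1]  # predicted risk per individual
```

In a sketch like this, the non-zero entries of `model.coef_` play the role of the per-feature risk contributions mentioned above. A real analysis would also evaluate the model on held-out populations rather than on the training data.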