With the development of high-throughput sequencing, whole-genome analysis, such as genomic prediction and genome-wide association studies (GWAS), plays an important role in animal and plant breeding. Under the infinitesimal model, complex traits are assumed to be affected by many genes with small additive effects, and the relationship between genotypes and phenotypes is linear. In most GWAS and genomic prediction studies, one goal is to estimate the joint effects of all SNP markers. The landmark paper by Meuwissen et al. (2001) introduced the Bayesian linear mixed model for whole-genome prediction, which has been widely used in breeding programs.
However, as the amount and diversity of omics data continue to grow, several challenges arise for the linear mixed model. First, there is a need to extend mixed models to incorporate multiple sequential layers of data as one connected network (e.g., regulatory cascades). Second, due to increasing concerns about data privacy, there is a need to adapt mixed models to encrypted data, enabling the sharing of confidential data in genome-to-phenome analyses.
This thesis proposes new methods to address these two challenges. For the first challenge, we provide a novel framework named mixed model neural network ("NNMM") that extends the mixed model ("MM") to a multilayer neural network ("NN"), thus incorporating sequential layers of data as a unified multilayer network. Nonlinear relationships between different layers of data are allowed via nonlinear activation functions in the neural network. Moreover, NNMM allows various patterns of missing data in the middle layer, and the network architecture of NNMM can be predefined to be partially connected.
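To make the layered structure described above concrete, the following is a minimal sketch of a forward pass through such a network, assuming one middle layer. The function name `nnmm_forward`, the `tanh` activation, the connectivity `mask`, and the rule of filling missing middle-layer entries with the network's own prediction are all illustrative assumptions; they are not taken from the thesis, which may treat missing values and estimation quite differently.

```python
import numpy as np

def nnmm_forward(X, W1, mask, w2, Z_obs=None, activation=np.tanh):
    """Hypothetical forward pass: genotypes -> middle layer -> phenotype.

    X     : (n, p) genotype matrix
    W1    : (p, m) marker effects on the m middle-layer features
    mask  : (p, m) 0/1 matrix; 0 removes a connection (partial connectivity)
    w2    : (m,)   weights from the activated middle layer to the phenotype
    Z_obs : (n, m) observed middle-layer data, NaN where missing (optional)
    """
    # Linear mixed-model-like step from genotypes to the middle layer
    Z_pred = X @ (W1 * mask)
    if Z_obs is None:
        Z = Z_pred
    else:
        # Assumption: keep observed values, fill missing ones with predictions
        Z = np.where(np.isnan(Z_obs), Z_pred, Z_obs)
    # Nonlinear activation links the middle layer to the phenotype
    y_hat = activation(Z) @ w2
    return Z, y_hat

rng = np.random.default_rng(1)
n, p, m = 5, 6, 2
X = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2

# Markers 0-2 feed middle node 0, markers 3-5 feed middle node 1
mask = np.zeros((p, m))
mask[:3, 0] = 1
mask[3:, 1] = 1

W1 = rng.standard_normal((p, m))
w2 = rng.standard_normal(m)

# Middle layer observed for some individuals only
Z_obs = np.full((n, m), np.nan)
Z_obs[0] = [0.5, -0.2]

Z, y_hat = nnmm_forward(X, W1, mask, w2, Z_obs=Z_obs)
```

The mask is applied elementwise to the first-layer weights, so a zero entry guarantees that the corresponding marker cannot influence that middle-layer node, which is one simple way to predefine a partially connected architecture.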
For the second challenge, a homomorphic encryption method based on high-dimensional random orthogonal transformations of the raw data was proposed by Mott et al. (2020). This method is specifically suited to single-marker regression in GWAS using linear mixed models with Gaussian errors. In this thesis, we further generalize this homomorphic encryption to genome-to-phenome analysis using mixed models.
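The key property behind orthogonal-transformation encryption can be illustrated with a small sketch. Assuming a plain least-squares regression rather than the full mixed model, multiplying both the phenotypes and the design matrix by the same random orthogonal matrix leaves the estimated effects unchanged, because the cross-products X'X and X'y are preserved. The helper names `random_orthogonal` and `ols` and the simulated data are hypothetical and serve only as illustration; this is not the procedure of Mott et al. (2020) in full.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs so the diagonal of R is positive (standard correction)
    return q * np.sign(np.diag(r))

def ols(X, y):
    """Least-squares estimate of the regression coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))                       # simulated covariates
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

P = random_orthogonal(n, rng)                         # secret key
X_enc, y_enc = P @ X, P @ y                           # "encrypted" data

beta_plain = ols(X, y)
beta_enc = ols(X_enc, y_enc)

# The two estimates agree up to floating-point error
print(np.max(np.abs(beta_plain - beta_enc)))
```

The same invariance holds for a Gaussian likelihood with independent, equal-variance errors, since an orthogonal transformation of such errors has the same distribution. This is consistent with the restriction to Gaussian errors noted above.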