Inferring Gene Regulatory Networks from a Population of Yeast Segregants

https://www.nature.com/articles/s41598-018-37667-4.pdf

www.nature.com/scientificreports

OPEN

Received: 31 July 2018
Accepted: 30 November 2018
Published: xx xx xxxx

Chen Chen1, Dabao Zhang1,3, Tony R. Hazbun2,3 & Min Zhang1,3

Constructing gene regulatory networks is crucial to unraveling the genetic architecture of complex traits and to understanding the mechanisms of diseases. On the basis of gene expression and single nucleotide polymorphism data in the yeast, Saccharomyces cerevisiae, we constructed gene regulatory networks using a two-stage penalized least squares method. A large system of structural equations via optimal prediction of a set of surrogate variables was established at the first stage, followed by consistent selection of regulatory effects at the second stage. Using this approach, we identified subnetworks that were enriched in gene ontology categories, revealing directional regulatory mechanisms controlling these biological pathways. Our mapping and analysis of expression-based quantitative trait loci uncovered a known alteration of gene expression within a biological pathway that results in regulatory effects on companion pathway genes in the phosphocholine network. In addition, we identify nodes in these gene ontology-enriched subnetworks that are coordinately controlled by transcription factors driven by trans-acting expression quantitative trait loci. Altogether, the integration of documented transcription factor regulatory associations with subnetworks defined by a system of structural equations using quantitative trait loci data is an effective means to delineate the transcriptional control of biological pathways.

Gene expression is a fundamental step in the flow of information from an organism’s genotype to phenotype.
The genetic information encoded in an organism’s DNA is transferred into a functional gene product (e.g., protein) via the process of gene expression, and gene expression leads to the formation of the organism’s phenotype. Gene expression has been found to be associated with a broad range of complex traits and diseases1, and thus plays an important role in determining an organism’s development. Numerous efforts have been made to map phenotypes to gene expression in order to dissect their genetic basis. Genes rarely act in isolation; instead, they interact with each other and make up gene regulatory networks to function as a whole2. The study of this mechanism is crucial for understanding the properties and functions of genes, which helps reveal the genetic architecture of complex traits and diseases. Although genetic experiments can be conducted to discover interactions among genes, this approach can be costly and time-consuming. Alternatively, measurements of gene expression levels reveal gene expression patterns in a specific condition and can be exploited to infer gene regulatory networks. Various approaches have been proposed to infer gene regulatory networks using gene expression data, such as relevance networks3–7, Bayesian networks8–11, Gaussian graphical models12–15, and many others. Recent advances in sequencing technologies make it feasible to obtain both whole-genome genotype and gene expression for each individual, i.e., genetical genomics data16. Combining genetics with gene expression reveals additional information on genetic structure and holds great promise for improving the accuracy of gene regulatory network inference. Numerous genetical genomics experiments, such as the Genotype-Tissue Expression (GTEx) project17, have been conducted to collect genetical genomics data.
1 Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA. 2 Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, 47907, USA. 3 Purdue University Center for Cancer Research, Purdue University, West Lafayette, IN, 47907, USA. Correspondence and requests for materials should be addressed to D.Z., T.R.H., or M.Z.

Scientific Reports | (2019) 9:1197 | https://doi.org/10.1038/s41598-018-37667-4

Much effort has been devoted to using genetical genomics data for genome-wide association (GWA) analysis of gene expression, i.e., expression quantitative trait loci (eQTL) mapping18. eQTL mapping aims to elucidate the variation in expression traits attributable to genomic variation, and to identify chromosomal loci (i.e., eQTL) of genetic polymorphisms associated with the expression of a gene under investigation. An eQTL located within the region of the gene under investigation is called a cis-eQTL; otherwise, it is called a trans-eQTL. While the cis effects on a gene represent direct regulation, the indirect regulation by trans-eQTL is likely caused by interactions among genes. These eQTL provide insight into the functional sequences of gene expression, and thus an indirect interrogation of the functional landscape of gene regulation19. Gene regulatory networks can be characterized using a system of structural equations20, with each equation describing the causal effects of cis-eQTL and the regulatory effects of other genes on a given gene. Such a framework makes it feasible to take a genome-wide survey and to directly reveal interactions among genes. The application of structural equations in genetical genomics studies has been previously demonstrated21–24. Two of these studies are applicable to constructing gene regulatory networks for a small number of genes21,22.
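The structural-equation view described above can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' implementation. It assumes a linear system Y = YB + XF + E in which each gene has exactly one cis-eQTL, and it fits each gene with a generic two-stage scheme: a ridge prediction of expression from genotypes, followed by a plain lasso on the predicted expression of the other genes. The function names (`simulate`, `ridge_predict`, `lasso_cd`, `two_stage_fit`), the penalty values, the plain (non-adaptive) lasso, and the simulated three-gene chain are all assumptions made for illustration.

```python
import numpy as np


def simulate(n=500, p=5, seed=0):
    """Simulate Y = Y B + X F + E for p genes, one cis-eQTL per gene (assumed)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, p)).astype(float)  # 0/1 marker genotypes
    B = np.zeros((p, p))          # B[j, k]: regulatory effect of gene j on gene k
    B[0, 1], B[1, 2] = 0.8, -0.7  # hypothetical chain: gene 0 -> gene 1 -> gene 2
    F = np.eye(p)                 # cis effect of marker k on gene k
    E = 0.3 * rng.standard_normal((n, p))
    Y = (X @ F + E) @ np.linalg.inv(np.eye(p) - B)  # reduced form of the system
    return X, Y, B


def ridge_predict(X, Y, lam):
    """Stage 1 (assumed form): ridge prediction of every gene from all markers."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Yc)
    return Xc @ coef  # centered predictions


def residualize(M, Z):
    """Remove from M (vector or matrix) the part explained by the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, M, rcond=None)
    return M - Z @ coef


def lasso_cd(A, b, lam, n_iter=200):
    """Coordinate descent for (1/2n)||b - A beta||^2 + lam * ||beta||_1."""
    n, m = A.shape
    beta = np.zeros(m)
    col_sq = (A ** 2).sum(axis=0) / n
    r = b.copy()  # current residual b - A @ beta
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            r += A[:, j] * beta[j]          # drop column j from the fit
            rho = A[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= A[:, j] * beta[j]          # add it back with the new value
    return beta


def two_stage_fit(X, Y, ridge_lam=1.0, lasso_lam=0.05):
    """Fit one penalized regression per gene at each stage; returns B_hat[j, k]."""
    n, p = Y.shape
    Y_hat = ridge_predict(X, Y, ridge_lam)
    B_hat = np.zeros((p, p))
    for k in range(p):
        # Project out the intercept and gene k's own cis-eQTL (column k, assumed).
        Z = np.column_stack([np.ones(n), X[:, k]])
        others = [j for j in range(p) if j != k]
        y_k = residualize(Y[:, k], Z)
        A = residualize(Y_hat[:, others], Z)
        # Stage 2: sparse selection of regulators among the other genes.
        B_hat[others, k] = lasso_cd(A, y_k, lasso_lam)
    return B_hat


X, Y, B_true = simulate()
B_hat = two_stage_fit(X, Y)
print(np.round(B_hat, 2))  # nonzero entry [j, k] suggests a directed edge j -> k
```

Because each gene is fit separately, the loop over `k` could be distributed across workers. In this toy setting, the cis-eQTL of each regulator is what allows the direction of an edge to be distinguished from its reverse.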
However, genetical genomics experiments usually collect whole-genome gene expression for a very limited number of samples, so the number of genes is much larger than the sample size. To address this, another study23 proposed applying the adaptive lasso25 to construct a sparse gene regulatory network. Another approach instead maximizes a penalized likelihood to construct a sparse gene regulatory network24. Here we construct gene regulatory networks in yeast by building a large system of structural equations with the two-stage penalized least squares (2SPLS) method26. We applied the 2SPLS method to an eQTL dataset derived from a cross between a wild yeast vineyard strain and a laboratory strain27. Fitting one linear model for each gene at each stage, the 2SPLS method develops optimal prediction of a set of conditional expectations at the first stage, and consistent selection of regulatory effects from a massive number of candidates at the second stage. It is computationally fast and allows for parallel implementation, outperforming the adaptive lasso-based algorithm23 and the sparsity-aware maximum likelihood algorithm24 in terms of both accuracy and speed for identifying regulatory effects in different network structures. This parallel implementation makes it feasible to evaluate the significance of regulatory effects via the bootstrap method. Using this approach we identified subnetworks that were enriched in gene ontology (...truncated)