Title: Integrating co-expression networks with GWAS to prioritize causal genes in maize

Author
SCHAEFER, ROBERT - University of Minnesota
MICHNO, JEAN-MICHEL - University of Minnesota
JEFFERS, JOSEPH - University of Minnesota
HOEKENGA, OWEN - Cayuga Genetics Consulting Group, LLC
DILKES, BRIAN - Purdue University
Baxter, Ivan
MEYERS, CHAD - University of Minnesota

Submitted to: The Plant Cell
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/31/2018
Publication Date: 11/8/2018
Citation: Schaefer, R.J., Michno, J., Jeffers, J., Hoekenga, O., Dilkes, B., Baxter, I.R., Meyers, C.L. 2018. Integrating co-expression networks with GWAS to prioritize causal genes in maize. The Plant Cell. 30(12):2922–2942. https://doi.org/10.1105/tpc.18.00299.
DOI: https://doi.org/10.1105/tpc.18.00299

Interpretive Summary: Genome-wide association studies (GWAS) are a powerful approach for leveraging diverse lines and modern sequencing techniques to identify loci linked to traits of interest. To date, researchers have used this approach to identify thousands of loci linked to hundreds of traits in many different species. However, for most loci the causal genes, and the cellular processes they contribute to, remain unknown. This problem is especially pronounced in species where our understanding of individual gene functions is sparse, including most crop species. Gene expression data from high-throughput sequencing, such as RNA-seq, are a powerful resource for identifying candidate genes linked to the single nucleotide polymorphisms (SNPs) identified by GWAS. We developed a framework that integrates these two data sources to identify candidate genes and applied it to mineral nutrient traits in maize. We tested the method in many different ways to identify its limitations and the situations most likely to lead to success. We demonstrated that the type of analysis, and the population and tissue from which the RNA-seq data were derived, have large effects on the signal observed. We also demonstrated that this is a valuable approach for identifying candidate genes, suggesting that it will have utility in many crop species, including soybean.

Technical Abstract: Genome-wide association studies (GWAS) have identified loci linked to hundreds of traits in many different species. Yet, because linkage disequilibrium implicates a broad region surrounding each identified locus, the causal genes often remain unknown. This problem is especially pronounced in nonhuman, nonmodel species, where functional annotations are sparse and there is frequently little information available for prioritizing candidate genes. We developed a computational approach, Camoco, that integrates loci identified by GWAS with functional information derived from gene coexpression networks. Using Camoco, we prioritized candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize (Zea mays) seeds. Strikingly, the performance of our approach depended strongly on the type of coexpression network used: a network built from expression variation across genetically diverse individuals in a relevant tissue context (in our case, roots, which are the primary elemental uptake and delivery system) outperformed the alternative networks. Two candidate genes identified by our approach were validated using mutants. Our study demonstrates that coexpression networks provide a powerful basis for prioritizing candidate causal genes from GWAS loci, but suggests that the success of such strategies can depend heavily on the context of the gene expression data. Both the software and the lessons learned on integrating GWAS data with coexpression networks generalize to species beyond maize.
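The general idea of combining GWAS loci with a coexpression network can be illustrated with a small sketch. This is not Camoco's implementation, whose network construction and statistics are described in the paper and not reproduced here. The sketch makes several assumptions: edges are plain Pearson correlations across individuals; candidate genes are those whose start lies within a fixed base-pair window of a SNP; a candidate's score is its summed coexpression with candidates at *other* loci; and the significance of the whole candidate set is judged by an empirical permutation p-value on mean pairwise coexpression ("density"). All function names and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant expression: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def build_network(expression):
    """Hypothetical coexpression network: correlation for every gene pair.

    expression maps gene_id -> expression values across the same individuals.
    """
    return {frozenset(pair): pearson(expression[pair[0]], expression[pair[1]])
            for pair in combinations(sorted(expression), 2)}


def candidates_near_snps(genes, snps, window):
    """Map each SNP to the genes whose start lies within +/- window bp on its chromosome."""
    return {snp: [g for g, (chrom, start) in genes.items()
                  if chrom == snp_chrom and abs(start - pos) <= window]
            for snp, (snp_chrom, pos) in snps.items()}


def density(gene_set, network):
    """Mean pairwise coexpression among a set of genes (0.0 if fewer than two)."""
    pairs = list(combinations(sorted(gene_set), 2))
    if not pairs:
        return 0.0
    return sum(network[frozenset(p)] for p in pairs) / len(pairs)


def density_pvalue(gene_set, all_genes, network, n_perm=1000, seed=0):
    """Empirical p-value: fraction of random same-size gene sets at least as dense."""
    rng = random.Random(seed)
    observed = density(gene_set, network)
    pool = sorted(all_genes)
    hits = sum(density(rng.sample(pool, len(gene_set)), network) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def gene_scores(loci, network):
    """Score each candidate by its summed coexpression with candidates at other loci."""
    scores = {}
    for snp, genes in loci.items():
        others = {g for s, gs in loci.items() if s != snp for g in gs}
        for g in genes:
            scores[g] = sum(network[frozenset((g, o))] for o in others if o != g)
    return scores


# Toy data (invented for illustration only).
expression = {
    "gA": [1, 2, 3, 4, 5],
    "gB": [2, 4, 6, 8, 10],
    "gC": [5, 4, 3, 2, 1],
    "gD": [1, 3, 2, 5, 4],
}
genes = {"gA": ("chr1", 10_000), "gC": ("chr1", 30_000),
         "gB": ("chr2", 5_000), "gD": ("chr2", 200_000)}
snps = {"snp1": ("chr1", 20_000), "snp2": ("chr2", 0)}

network = build_network(expression)
loci = candidates_near_snps(genes, snps, window=15_000)
scores = gene_scores(loci, network)
ranking = sorted(scores, key=scores.get, reverse=True)
candidate_set = {g for gs in loci.values() for g in gs}
p = density_pvalue(candidate_set, expression.keys(), network, n_perm=200)
print(loci)     # {'snp1': ['gA', 'gC'], 'snp2': ['gB']}
print(ranking)  # ['gA', 'gB', 'gC']
```

In the toy example, gA ranks first because it is perfectly coexpressed with gB, the only candidate at the other locus, while gC, its neighbor at the same locus, is anticorrelated with gB. Excluding same-locus pairs from the score is a deliberate choice in this sketch: genes that sit close together on a chromosome can be correlated for reasons unrelated to shared function, so only cross-locus coexpression is counted as evidence.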