
Research Project: Improving Crop Efficiency Using Genomic Diversity and Computational Modeling

Location: Plant, Soil and Nutrition Research

Title: Multiple maize reference genomes impact the identification of variants by genome-wide association study in a diverse inbred panel

Author
Gage, Joseph
Vaillancourt, Brieanne - Michigan State University
Hamilton, John - Michigan State University
Manrique-Carpintero, Norma - Michigan State University
Gustafson, Timothy - Monsanto Corporation
Barry, Kerrie - Department of Energy
Lipzen, Anna - Department of Energy
Tracy, William - University of Wisconsin
Mikel, Mark - University of Illinois
Kaeppler, Shawn - University of Illinois
Buell, Robin - Michigan State University
De Leon, Natalia - University of Wisconsin

Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/4/2019
Publication Date: 6/28/2019
Citation: Gage, J.L., Vaillancourt, B., Hamilton, J.P., Manrique-Carpintero, N.C., Gustafson, T.J., Barry, K., Lipzen, A., Tracy, W.F., Mikel, M.A., Kaeppler, S.M., Buell, R.C., De Leon, N. 2019. Multiple maize reference genomes impact the identification of variants by genome-wide association study in a diverse inbred panel. The Plant Genome. 12:2. https://doi.org/10.3835/plantgenome2018.09.0069.
DOI: https://doi.org/10.3835/plantgenome2018.09.0069

Interpretive Summary: Identifying genes that are associated with traits of interest is important for making plant breeding more efficient and for studying how traits are controlled. The corn genome varies considerably between varieties, with some genes present in certain varieties but absent in others. This makes it harder to identify genes that control certain traits, especially if the genes of interest are not present in the varieties of interest. Using genomes from three different maize varieties, we applied standard approaches for identifying genes associated with disease resistance. We show that using genomes from multiple varieties can help identify genes controlling traits, particularly when those genes are not present in all varieties. These results demonstrate how the use of multiple genome assemblies can improve identification of candidate genes for agronomically important traits. More genome assemblies are becoming publicly available, and by integrating information from all of them, researchers will increase their likelihood of identifying genes that can be used as markers for breeding or for further study of particular biological phenomena. This paper gives an example of a particular gene that took nearly 15 years to identify and characterize. By using the methods we describe, the gene was easily identified from a single analysis. In addition to describing these methods, this paper releases the genome assembly for corn variety ‘PHJ89’, making another genome assembly publicly available for use in other researchers’ studies.

Technical Abstract: Use of a single reference genome for genome-wide association studies (GWAS) limits the gene space represented to that of a single accession. This limitation can complicate identification and characterization of genes located within presence–absence variations (PAVs). In this study, we present the draft de novo genome assembly of ‘PHJ89’, an ‘Oh43’-type inbred line of maize (Zea mays L.). From three separate reference genome assemblies (‘B73’, ‘PH207’, and PHJ89) that represent the predominant germplasm groups of maize, we generated three separate whole-seedling gene expression profiles and single nucleotide polymorphism (SNP) matrices from a panel of 942 diverse inbred lines. We identified 34,447 (B73), 39,672 (PH207), and 37,436 (PHJ89) transcripts that are not present in the respective reference genome assemblies. Genome-wide association studies were conducted in the panel of 942 inbred lines with both the SNP and expression data to map Sugarcane mosaic virus (SCMV) resistance. Highlighting the impact of alternative reference genomes on gene discovery, the GWAS for SCMV resistance with expression values as a surrogate measure of PAV robustly detected the physical location of a known resistance gene when the B73 reference, which contains the gene, was used, but not when the PH207 reference was used. This study provides the valuable resource of the Oh43-type PHJ89 genome assembly as well as SNP and expression data for 942 individuals generated from three different reference genomes.
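
The idea of using expression values as a surrogate for PAV in an association scan can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' pipeline: the data are simulated, the gene names (`pav_gene`, `bg_gene_*`) and function names are hypothetical, and a squared Pearson correlation stands in for the association statistic, whereas a real GWAS would typically use a mixed model that accounts for population structure and kinship.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def scan_expression(expression, phenotype):
    """Rank genes by r^2 between per-line expression and the phenotype.

    expression: dict mapping gene name -> list of expression values, one per line.
    Returns a list of (gene, r2) sorted from strongest to weakest association.
    """
    scores = {g: pearson_r(v, phenotype) ** 2 for g, v in expression.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def simulate(n_lines=200, n_background=20, seed=0):
    """Simulate a panel in which one hypothetical gene is present in about half the lines."""
    rng = random.Random(seed)
    present = [rng.random() < 0.5 for _ in range(n_lines)]
    # Lines lacking the gene have zero expression, so expression tracks PAV.
    expression = {
        "pav_gene": [rng.uniform(5, 10) if p else 0.0 for p in present]
    }
    for i in range(n_background):
        expression[f"bg_gene_{i}"] = [rng.uniform(0, 10) for _ in range(n_lines)]
    # Assumed phenotype: resistance score raised by gene presence, plus noise.
    phenotype = [(1.0 if p else 0.0) + rng.gauss(0, 0.3) for p in present]
    return expression, phenotype


expression, phenotype = simulate()

# Reference that contains the gene: its expression can be quantified and scanned.
ranked_with_gene = scan_expression(expression, phenotype)

# Reference that lacks the gene: reads from it have no target, so it is never tested.
without_gene = {g: v for g, v in expression.items() if g != "pav_gene"}
ranked_without_gene = scan_expression(without_gene, phenotype)

print("top hit, gene in reference:", ranked_with_gene[0])
print("top hit, gene not in reference:", ranked_without_gene[0])
```

In this toy setting the causal gene can only be detected when the reference used to quantify expression contains it, which mirrors the contrast between the B73 and PH207 references described in the abstract.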