Location: Corn, Soybean and Wheat Quality Research
Title: COMPILE: A GWAS computational pipeline for gene discovery in complex genomes
Author
HILL, MATTHEW - Massachusetts Institute Of Technology
Penning, Bryan
MCCANN, MAUREEN - Purdue University
CARPITA, NICHOLAS - Purdue University
Submitted to: BMC Plant Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/16/2022
Publication Date: 7/2/2022
Citation: Hill, M.J., Penning, B., McCann, M.C., Carpita, N.C. 2022. COMPILE: A GWAS computational pipeline for gene discovery in complex genomes. BMC Plant Biology. 22:Article 315. https://doi.org/10.1186/s12870-022-03668-9.
DOI: https://doi.org/10.1186/s12870-022-03668-9
Interpretive Summary: Genome-wide association studies are increasingly used to find chromosome locations containing genes that influence important agronomic traits. However, traits involving many genes require enormous computational power to discover candidate genes with reasonable certainty. We established a streamlined computational pipeline (COMPILE) to accelerate identification of potential genes and their possible functions. COMPILE uses a more recent method to account for how closely related the individuals in a population are, which improves the accuracy of results. It analyzes all chromosomes simultaneously rather than one at a time to speed discovery of chromosome locations. It automatically finds information about the genes in each location based on the closest similar genes in two plants with better-known gene function. This saves days of manual work and is accomplished in only a few minutes. COMPILE was used to uncover the majority of locations found in two previous well-known corn systems, one in which a single location/gene is responsible for a large portion of a trait and one in which many genes and several locations contribute to the trait. Additionally, new locations with potential involvement in each trait, based on gene function and expression, were discovered. In a newly reported data set, a novel gene of unknown function was found for tolerance to corn borer damage in corn stems, along with several other potential locations. While designed around the model system of corn, COMPILE is customizable and readily adaptable for application to other plants that have the appropriate data available.
Thus, scientists can customize it for their plant or animal of choice. Since COMPILE is designed to be modular, data for corn, Arabidopsis, rice, or all three can be replaced by other available plant data. This paper and the associated program provide an improved tool for scientists using different populations to discover genes that are involved in agronomic traits of interest.
Technical Abstract: Genome-Wide Association Studies (GWAS) are used to identify genes and alleles that contribute to quantitative traits in large and genetically diverse populations. However, traits with complex genetic architectures create an enormous computational load for discovery of candidate genes with acceptable statistical certainty. We established a streamlined computational pipeline for GWAS (COMPILE) to accelerate identification and annotation of candidate genes. COMPILE executes GWAS using a Mixed Linear Model that incorporates, without compression, recent advancements in population structure control, then links significant Quantitative Trait Loci (QTL) to candidate maize genes and RNA regulatory elements contained in the maize genome. The algorithm then matches maize genes to their closest rice and Arabidopsis homologs by sequence similarity. A sub-program, FOCUS, allows comparison of results obtained with different marker sets and provides a data visualization tool to examine chromosomal regions at higher resolution. We validated COMPILE using published data to identify QTL associated with two traits, α-tocopherol biosynthesis and flowering time, and identified published candidate genes as well as additional genes and non-coding RNAs. We then applied COMPILE to 274 genotypes of the maize Goodman Association Panel to identify candidate loci contributing to resistance of maize stems to penetration by larvae of the European Corn Borer (Ostrinia nubilalis).
Candidate genes included a gene of unknown function and genes associated with WRKY and MYB80 transcription factors, receptor-kinase signaling, riboflavin synthesis, nucleotide-sugar interconversion, and prolyl hydroxylation. Expression of the gene of unknown function has been associated with pathogen stress in maize and in the rice homologs closest in sequence identity. The relative speed of data analysis using COMPILE allowed comparison of population size and compression. Limitations in population size and diversity are major constraints for the resistance trait and are not overcome by increasing marker density. COMPILE is customizable and is readily adaptable for application to other species with genomic and proteome databases.
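
The post-GWAS annotation steps described in the Technical Abstract (linking significant loci to nearby genes, then matching each gene to its closest homolog) can be illustrated with a minimal Python sketch. This is not the COMPILE code. The class and function names, the Bonferroni significance threshold, the 50 kb search window, and the use of percent identity as the similarity score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    chrom: str
    pos: int          # base-pair position on the chromosome
    p_value: float    # association p-value from the GWAS model


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int


def significant_markers(markers, alpha=0.05):
    """Keep markers passing a Bonferroni-corrected threshold (an assumed choice)."""
    if not markers:
        return []
    threshold = alpha / len(markers)
    return [m for m in markers if m.p_value <= threshold]


def genes_near(marker, genes, window=50_000):
    """Return genes on the marker's chromosome that overlap pos +/- window.

    The window size is a hypothetical default, not a value from the paper.
    """
    lo, hi = marker.pos - window, marker.pos + window
    return [
        g for g in genes
        if g.chrom == marker.chrom and g.start <= hi and g.end >= lo
    ]


def closest_homolog(gene_id, hits):
    """Pick the homolog with the highest similarity score for a gene.

    `hits` maps a gene ID to a list of (homolog_id, percent_identity) pairs,
    for example parsed from a sequence-similarity search. Returns None when
    the gene has no hits.
    """
    candidates = hits.get(gene_id, [])
    if not candidates:
        return None
    return max(candidates, key=lambda hit: hit[1])[0]


# Toy data, invented purely to exercise the functions.
markers = [
    Marker("1", 1_000, 1e-9),
    Marker("1", 500_000, 0.2),
    Marker("2", 1_000, 0.5),
]
genes = [
    Gene("g1", "1", 900, 2_000),
    Gene("g2", "1", 400_000, 410_000),
    Gene("g3", "2", 900, 1_100),
]
hits = {"g1": [("AT_example", 55.0), ("Os_example", 72.5)]}

top = significant_markers(markers)
candidates = [g.gene_id for m in top for g in genes_near(m, genes)]
homologs = {gid: closest_homolog(gid, hits) for gid in candidates}
```

Keeping the marker filter, the positional lookup, and the homolog lookup as separate functions mirrors the modular design the abstract describes: the gene list or the homolog table for one species could be swapped for another without touching the other steps.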