Research Project: Genetic and Physiological Mechanisms Underlying Complex Agronomic Traits in Grain Crops

Location: Plant Genetics Research

Title: Trait association and prediction through integrative K-mer analysis

Authors
HE, CHENG - Kansas State University
Washburn, Jacob
SCHLEIF, NATHANIEL - University of Wisconsin
HAO, YANGFAN - Kansas State University
KAEPPLER, HEIDI - University of Wisconsin
KAEPPLER, SHAWN - University of Wisconsin
ZHANG, ZHIWU - Washington State University
YANG, JINLIANG - University of Nebraska
LIU, SANZHEN - Kansas State University

Submitted to: The Plant Journal
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/22/2024
Publication Date: 9/11/2024
Citation: He, C., Washburn, J.D., Schleif, N., Hao, Y., Kaeppler, H., Kaeppler, S., Zhang, Z., Yang, J., Liu, S. 2024. Trait association and prediction through integrative K-mer analysis. The Plant Journal. 120(2): 833-850. https://doi.org/10.1111/tpj.17012.
DOI: https://doi.org/10.1111/tpj.17012

Interpretive Summary: Genome-wide association studies (GWAS) and genomic prediction (GP) are popular and effective methods for identifying genes that may contribute to a trait and for predicting how individuals will express that trait. Both methods traditionally require mapping DNA sequences to a reference genome. This mapping step is error-prone and depends on the existence and quality of a reference genome. An alternative approach was developed and tested that uses k-mers (short DNA fragments of length k taken from sequencing reads) directly, without a mapping step. This approach was shown to complement traditional methods and, in some cases, to be more accurate than them.
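The idea of using k-mers directly as genotypes can be made concrete with a small Python sketch that counts k-mers in each sample's reads and turns the counts into a sample-by-k-mer presence/absence matrix. This is an illustration under stated assumptions, not the authors' pipeline: reads are plain strings, each k-mer is collapsed to a canonical form (the lexicographically smaller of the k-mer and its reverse complement), and the helper names `canonical`, `count_kmers`, and `presence_matrix` and the `min_count` threshold are hypothetical. Realistic analyses typically use larger k values and dedicated k-mer counting tools.

```python
from collections import Counter

# Translation table for complementing DNA bases (used to build canonical k-mers).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count canonical k-mers over all reads, skipping windows with non-ACGT bases."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    return counts


def presence_matrix(samples: dict[str, list[str]], k: int, min_count: int = 1):
    """Build a sample-by-k-mer 0/1 matrix (hypothetical helper).

    A k-mer is scored as present in a sample if it occurs at least `min_count` times.
    Returns (sample names, k-mer list, matrix), with matrix rows in the order of names.
    """
    per_sample = {name: count_kmers(reads, k) for name, reads in samples.items()}
    kmers = sorted(set().union(*per_sample.values()))
    names = sorted(per_sample)
    # Counter returns 0 for missing keys, so absent k-mers score 0.
    matrix = [[int(per_sample[name][km] >= min_count) for km in kmers] for name in names]
    return names, kmers, matrix


if __name__ == "__main__":
    # Toy example: two lines, one short read each, k = 3.
    names, kmers, matrix = presence_matrix(
        {"line_A": ["ACGTAC"], "line_B": ["ACGTTT"]}, k=3
    )
    print(kmers)  # ['AAA', 'AAC', 'ACG', 'GTA']
    for name, row in zip(names, matrix):
        print(name, row)
```

Canonical k-mers are used here so that a fragment and its reverse complement, which come from the same double-stranded locus, are counted as one feature; whether to do this is a design choice rather than something stated in the summary.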

Technical Abstract: Genome-wide association studies (GWAS) with single nucleotide polymorphisms (SNPs) have been widely used to explore the genetic control of phenotypic traits. Alternatively, GWAS can use k-mers, substrings of length k counted from longer sequencing reads, as genotyping data. Using maize cob and kernel color traits, we demonstrated that k-mer GWAS can effectively identify associated k-mers. Co-expression analysis of kernel color k-mers and genes directly identified k-mers from known causal genes. Analyses of the complex traits kernel oil and leaf angle identified k-mers from both known and candidate genes. A gene encoding a MADS transcription factor was functionally validated by showing that its ectopic expression led to less upright leaves. Evolutionary analysis revealed that most k-mers positively correlated with kernel oil were strongly selected against in maize populations, while most k-mers associated with upright leaf angle were positively selected. In addition, genomic prediction of kernel oil, leaf angle, and flowering time using k-mer data achieved prediction accuracy similarly high to that of the standard SNP-based method. Collectively, we showed that k-mer GWAS is a powerful approach for identifying trait-associated genetic elements. Further, our results demonstrated the bridging role of k-mers for data integration and functional gene discovery.
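A bare-bones version of the k-mer association step can be sketched as a marginal scan: each k-mer's presence/absence column is correlated with the phenotype, and k-mers are ranked by the strength of that correlation. This is a simplified stand-in, not the model used in the study: it does not correct for population structure or relatedness (which GWAS models commonly handle, for example with kinship matrices), it computes no p-values, and the function names, the `min_freq` filter, and the matrix layout (rows are samples, columns are k-mers, rows aligned with the phenotype list) are assumptions for illustration.

```python
import math
from typing import Optional


def pearson(x: list[float], y: list[float]) -> Optional[float]:
    """Pearson correlation; returns None if either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def kmer_scan(kmers: list[str], matrix: list[list[int]], phenotype: list[float],
              min_freq: float = 0.05) -> list[tuple[str, float]]:
    """Marginal k-mer association scan (hypothetical; no population-structure correction).

    matrix: rows are samples, columns are k-mers (0/1 presence), rows aligned with phenotype.
    Returns (k-mer, r) pairs sorted by |r|, strongest first.
    """
    n = len(matrix)
    results = []
    for j, kmer in enumerate(kmers):
        column = [row[j] for row in matrix]
        freq = sum(column) / n
        # Skip k-mers that are rare or nearly fixed across samples.
        if min(freq, 1 - freq) < min_freq:
            continue
        r = pearson(column, phenotype)
        if r is not None:
            results.append((kmer, r))
    results.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return results


if __name__ == "__main__":
    # Toy data: four samples, three k-mers, one quantitative phenotype.
    kmers = ["AAA", "AAC", "ACG"]
    matrix = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
    phenotype = [2.0, 2.5, 1.0, 0.8]
    for kmer, r in kmer_scan(kmers, matrix, phenotype):
        print(f"{kmer}\tr={r:.3f}")
```

In the toy data, "ACG" is present in every sample and is filtered out, while "AAA" tracks the phenotype closely and ranks first. The same 0/1 matrix could also serve as the feature input for a genomic prediction model, which is the other use of k-mer data described in the abstract.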