Research Project: Increasing Accuracy of Genomic Prediction, Developing Algorithms, Selecting Markers, and Evaluating New Traits to Improve Dairy Cattle

Location: Animal Genomics and Improvement Laboratory

Title: SLEMM: million-scale genomic predictions with window-based SNP weighting

Authors:
CHENG, JIAN - North Carolina State University
MALTECCA, CHRISTIAN - North Carolina State University
Vanraden, Paul
O'CONNELL, JEFFREY - University Of Maryland School Of Medicine
MA, LI - University Of Maryland
JIANG, JICAI - University Of Maryland

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/7/2023
Publication Date: 3/10/2023
Citation: Cheng, J., Maltecca, C., Van Raden, P.M., O'Connell, J., Ma, L., Jiang, J. 2023. SLEMM: million-scale genomic predictions with window-based SNP weighting. Bioinformatics. 39(3):btad127. https://doi.org/10.1093/bioinformatics/btad127.
DOI: https://doi.org/10.1093/bioinformatics/btad127

Interpretive Summary: Genomic prediction is a method for estimating breeding values from whole-genome genotypes. The number of genotyped animals is increasing quickly. Using many genotyped individuals could improve genomic predictions, which is appealing but computationally challenging. We present SLEMM, a new software tool, to address this computational challenge. SLEMM builds on fast algorithms and parallel programming. Extensive data analysis demonstrated that SLEMM is accurate, fast, and memory-efficient compared to existing state-of-the-art methods, including Bayesian mixture models (BayesR and LDAK) and a machine-learning-optimized mixed model (KAML). In particular, testing on large data showed that SLEMM can effectively process millions of genotyped individuals.

Technical Abstract: Motivation: The amount of genomic data is increasing exponentially. Using many genotyped and phenotyped individuals for genomic prediction is appealing yet challenging. Results: We present SLEMM, a new software tool, to address the computational challenge. SLEMM builds on an efficient implementation of the stochastic Lanczos algorithm for REML in a mixed-model framework. We further implement SNP weighting in SLEMM to improve its predictions. Extensive analyses on seven public datasets, covering 19 polygenic traits in three plant and three livestock species, showed that SLEMM with SNP weighting had the best overall predictive ability among a variety of genomic prediction methods, including GCTA’s empirical BLUP, BayesR, KAML, and LDAK’s BOLT and BayesR models. We also compared the methods using nine dairy traits of ~300k genotyped cows. All methods had similar overall prediction accuracies, except that KAML failed to process the data. Additional simulation analyses on up to three million individuals and one million SNPs showed that SLEMM had a computational advantage over its counterparts. Overall, SLEMM can do million-scale genomic predictions with an accuracy comparable to BayesR. Availability: https://github.com/jiang18/slemm.