Publication : USDA ARS

Publication #345774

Title: Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis

Author

	GUO, PENG - Chinese Academy Of Agricultural Sciences
	GAO, HUIJIANG - Chinese Academy Of Agricultural Sciences
	ZHU, BO - Chinese Academy Of Agricultural Sciences
	GAO, XUE - Chinese Academy Of Agricultural Sciences
	XU, LINGYANG - Chinese Academy Of Agricultural Sciences
	Hay, El Hamidi
	NIU, HONG - Chinese Academy Of Agricultural Sciences
	WANG, ZHEZAO - Chinese Academy Of Agricultural Sciences
	LI, JUNYA - Chinese Academy Of Agricultural Sciences
	LINAG, YONGHU - Chinese Academy Of Agricultural Sciences
	CHEN, YEN - Chinese Academy Of Agricultural Sciences
	ZHANG, LUPEI - Chinese Academy Of Agricultural Sciences
	NI, HEMIN - Chinese Academy Of Agricultural Sciences
	GUO, YONG - Chinese Academy Of Agricultural Sciences

Submitted to: BMC Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/18/2017
Publication Date: 1/3/2018
Citation: Guo, P., Gao, H., Zhu, B., Gao, X., Xu, L., Hay, E.A., Niu, H., Wang, Z., Li, J., Linag, Y., Chen, Y., Zhang, L., Ni, H., Guo, Y. 2018. Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis. BMC Bioinformatics. 19:1-11. https://doi.org/10.1186/s12859-017-2003-3.
DOI: https://doi.org/10.1186/s12859-017-2003-3

Interpretive Summary: Predicting genomically enhanced breeding values with Bayesian models is computationally demanding. One way to reduce this cost is parallel computing, which lets several processing units work on the computational task simultaneously. In addition, the number of samples in the burn-in period of Markov chain Monte Carlo (MCMC) is often set to a large value in an effort to improve the accuracy of genomic predictions. However, once the chain has reached its equilibrium distribution, additional burn-in samples do not improve the performance of MCMC. In this study, we therefore proposed a strategy that automatically tunes the number of burn-in samples using multiple parallel chains when implementing Bayesian models for genomic prediction.

Technical Abstract: Genomic prediction based on Bayesian models via Markov chain Monte Carlo (MCMC) is computationally intensive. One approach to reducing this computational cost is parallel computing, which allows several processing units to execute the computational task simultaneously. In addition, the number of samples in the burn-in period of MCMC is often set to a large value in order to improve the accuracy of genomic prediction. However, increasing the number of burn-in samples does not improve the performance of MCMC after the chain reaches its equilibrium distribution. Determining the optimum number of samples in the burn-in period is therefore important for reducing the computational cost. In this study, we proposed an automatically tuned strategy for setting the number of samples in the burn-in period via a multiple chain parallel MCMC scheme when implementing Bayesian models for genomic prediction. A convergence diagnosis of the multiple chains was used to determine the optimum burn-in period. Using simulated data, we implemented several models to predict genomic values and compared their prediction accuracies. The models implemented were the tuned burn-in multiple chain parallel BayesA (TunBpBayesA), tuned burn-in multiple chain parallel BayesCp (TunBpBayesCp), fixed burn-in multiple chain parallel BayesA (FixBpBayesA), fixed burn-in multiple chain parallel BayesCp (FixBpBayesCp), and Genomic Best Linear Unbiased Prediction (GBLUP). In the simulation study, the prediction accuracies of TunBpBayesA (or TunBpBayesCp) were consistent with those of FixBpBayesA (or FixBpBayesCp), while the speedup ratios of TunBpBayesA (or TunBpBayesCp) were higher than those of FixBpBayesA (or FixBpBayesCp). Moreover, using real data from 1217 Chinese Simmental beef cattle genotyped with the Illumina Bovine 770K SNP BeadChip, we found that TunBpBayesCp performed better than TunBpBayesA and GBLUP for four different traits under five-fold cross-validation.
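
The idea of tuning the burn-in from a convergence diagnosis across several chains can be illustrated with a minimal sketch. The abstract does not name the diagnostic, so this example assumes the Gelman-Rubin potential scale reduction factor with the common 1.1 threshold, and it assumes the convention of discarding the first half of each chain as burn-in. A toy autoregressive sampler stands in for the BayesA or BayesCp samplers, and the chains are advanced in a simple loop rather than on separate processing units. The names `gelman_rubin`, `toy_step`, and `tune_burn_in` are hypothetical and do not come from the publication.

```python
import random
import statistics


def gelman_rubin(chains):
    """Gelman-Rubin potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the per-chain means.
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    if within == 0.0:
        return float("inf")
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


def toy_step(x, rng, rho=0.9):
    """One draw from a stand-in AR(1) sampler with stationary law N(0, 1).

    Assumption: this replaces the real Bayesian sampler for illustration only.
    """
    return rho * x + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)


def tune_burn_in(starts, check_every=100, threshold=1.1, max_iter=20000, seed=1):
    """Extend all chains until the diagnostic on their second halves passes.

    Returns (burn_in, r_hat), where burn_in is the length of the discarded
    first half. In a parallel setting each chain would run on its own
    processing unit; here they are advanced one after another.
    """
    rngs = [random.Random(seed + i) for i in range(len(starts))]
    chains = [[] for _ in starts]
    states = list(starts)
    r_hat = float("inf")
    while len(chains[0]) < max_iter:
        for i, rng in enumerate(rngs):
            for _ in range(check_every):
                states[i] = toy_step(states[i], rng)
                chains[i].append(states[i])
        half = len(chains[0]) // 2
        r_hat = gelman_rubin([c[half:] for c in chains])
        if r_hat < threshold:
            return half, r_hat
    return len(chains[0]) // 2, r_hat


if __name__ == "__main__":
    # Deliberately dispersed starting values, one per chain.
    burn_in, r_hat = tune_burn_in([-50.0, -20.0, 20.0, 50.0])
    print(f"burn-in: {burn_in} draws, R-hat: {r_hat:.3f}")
```

With a fixed burn-in, every chain would run for the full preset length regardless of when the diagnostic is first satisfied; stopping at the first passing check is what saves computation in this sketch.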

U.S. DEPARTMENT OF AGRICULTURE

Livestock and Range Research Laboratory: Miles City, MT