
Title: ESTIMATING GENOTYPIC CORRELATIONS AND THEIR STANDARD ERRORS USING MULTIVARIATE RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION WITH SAS PROC MIXED

Author
Holland, Jim

Submitted to: Crop Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/30/2005
Publication Date: 3/1/2006
Citation: Holland, J.B. 2006. Estimating genotypic correlations and their standard errors using multivariate restricted maximum likelihood estimation with SAS Proc MIXED. Crop Science.

Interpretive Summary: Plant breeders are often interested in the genetic correlation between different traits, because this correlation determines whether selection on one trait will affect another trait. This paper describes the theory and application of a statistical method, restricted maximum likelihood (REML), for estimating genotypic and phenotypic correlations. This method has been used by some animal breeders, but rarely by plant breeders, because of a lack of software. This paper provides software code that plant breeders can run in the widely used SAS statistical analysis package to estimate genetic correlations with REML. An extensive simulation study demonstrated that the REML correlation estimates are better than estimates based on traditional analyses when there are missing data. Missing data often occur in plant breeding experiments (because plots can be lost due to experimental errors or poor growing conditions), so the REML method outlined here is recommended over the traditional methods.

Technical Abstract: Plant breeders traditionally have estimated genotypic and phenotypic correlations between traits using the method of moments based on a multivariate analysis of variance (MANOVA). Drawbacks to using the method of moments to estimate variance and covariance components include the possibility of obtaining estimates outside of parameter bounds and, when data are missing, a loss of efficiency and unknown distributional properties of the estimators. An alternative approach that does not suffer these problems, but depends on the assumption of normally distributed random effects, is restricted maximum likelihood (REML). REML is often more computationally intensive than least squares methods, but advances in computer processing speed have made REML feasible on modern personal computers. Although Proc MIXED of the SAS system was designed as a univariate analysis procedure, this paper demonstrates its application to REML estimation of genotypic and phenotypic correlations. Additionally, a method to obtain approximate parametric estimates of the sampling variances of the correlation estimates is presented. MANOVA and REML methods were compared with a real data set and with simulated data. The simulation study examined the effects of different correlation parameter values, genotypic and environmental sample sizes, and proportion of missing data on Type I and Type II error rates and on accuracy of confidence intervals. The two methods provided similar results when data were balanced or only 5% of data were missing. However, when 15% or 25% of data were missing, the REML method generally performed better, resulting in higher power to detect correlations and more accurate 95% confidence intervals.
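
As a rough illustration of the quantities discussed in the abstract, the sketch below computes a genotypic correlation from estimated genotypic variance and covariance components and an approximate standard error for it. It is written in Python rather than SAS and is not the code published with the paper. It assumes a first-order Taylor series (delta method) approximation for the sampling variance, which the abstract does not spell out, and the function names, argument names, and the ordering of the asymptotic covariance matrix are all hypothetical.

```python
import math


def genotypic_correlation(cov_g12, var_g1, var_g2):
    """Correlation from a genotypic covariance and two genotypic variances."""
    if var_g1 <= 0 or var_g2 <= 0:
        raise ValueError("genotypic variance estimates must be positive")
    return cov_g12 / math.sqrt(var_g1 * var_g2)


def approx_se_correlation(cov_g12, var_g1, var_g2, acov):
    """Approximate standard error of the correlation (delta method).

    acov is the 3x3 asymptotic covariance matrix of the component
    estimates, ordered (var_g1, cov_g12, var_g2). That ordering is an
    assumption of this sketch, not a convention taken from the paper.
    """
    r = genotypic_correlation(cov_g12, var_g1, var_g2)
    # Partial derivatives of r with respect to (var_g1, cov_g12, var_g2).
    grad = [
        -r / (2.0 * var_g1),
        1.0 / math.sqrt(var_g1 * var_g2),
        -r / (2.0 * var_g2),
    ]
    # Quadratic form grad' * acov * grad gives the approximate variance.
    var_r = sum(
        grad[i] * acov[i][j] * grad[j] for i in range(3) for j in range(3)
    )
    return math.sqrt(var_r)
```

Given component estimates and their asymptotic covariance matrix from a REML fit, an approximate 95% confidence interval could then be formed as the estimate plus or minus 1.96 standard errors, under the usual large-sample normality assumption.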