Location: Animal Genomics and Improvement Laboratory
Title: Assessing different cross-validation schemes for predicting novel traits using sensor data: an application to dry matter intake and residual feed intake using milk spectral data
Author
ADKINSON, A - Michigan State University
ABOUHAWWASH, MOHAMED - Michigan State University
PARKER GADDIS, KRISTEN - Council On Dairy Cattle Breeding
BURCHARD, JAVIER - Council On Dairy Cattle Breeding
PENAGARICANO, FRANCISCO - University Of Wisconsin
WHITE, HEATHER - University Of Wisconsin
WEIGEL, KENT - University Of Wisconsin
Baldwin, Ransom - Randy
SANTOS, JOSE - University Of Florida
VANDEHAAR, MIKE - Michigan State University
KOLTES, JAMES - Iowa State University
TEMPELMAN, ROBERT - Michigan State University
Submitted to: Journal of Dairy Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/15/2024
Publication Date: 6/13/2024
Citation: Adkinson, A.Y., Abouhawwash, M., Parker Gaddis, K.L., Burchard, J., Penagaricano, F., White, H.M., Weigel, K.A., Baldwin, R.L., Santos, J.E., Vandehaar, M.J., Koltes, J.E., Tempelman, R.J. 2024. Assessing different cross-validation schemes for predicting novel traits using sensor data: an application to dry matter intake and residual feed intake using milk spectral data. Journal of Dairy Science. https://doi.org/10.3168/jds.2024-24701.

Interpretive Summary: Direct recording of individual dry matter intake (DMI) is prohibitively expensive on most commercial farms. We explored the potential of using milk mid-infrared (MIR) spectral data to predict DMI and residual feed intake using data from four research farms across the US. Three cross-validation schemes were compared. The first scheme (cow-independent cross-validation) kept all records from the same cow together in either the training or the test set. Likewise, the experiment-independent scheme kept data from the same experiment together, whereas the herd-independent scheme kept data from the same herd together. Whereas MIR data appeared to improve predictive performance under the cow-independent scheme, no such improvement was observed under the experiment-independent or herd-independent schemes. Since the herd-independent scheme is more pertinent for predicting DMI in herds that do not routinely record DMI, more research is needed to improve across-herd predictions using sensor data such as MIR data.

Technical Abstract: Feed efficiency is important for the economic profitability of dairy farms; however, recording dry matter intake (DMI) is prohibitively expensive. Our objective was to investigate the potential use of milk mid-infrared (MIR) spectral data to predict proxy phenotypes for DMI under different cross-validation schemes.
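The three schemes described in the summary differ only in which identifier (cow, experiment, or herd) is used to group records before folds are formed. The following is a minimal sketch of grouped k-fold assignment, assuming each record carries a single group ID. The function name `group_kfold`, the random round-robin dealing of groups into folds, and the example herd labels are illustrative assumptions, not the study's actual procedure; scikit-learn's `GroupKFold` provides comparable behavior.

```python
import random
from collections import defaultdict


def group_kfold(group_ids, k, seed=0):
    """Split record indices into k folds so that all records sharing a
    group ID (e.g., a cow, experiment, or herd) fall in the same fold.

    Returns a list of (train_idx, test_idx) pairs, one per fold.
    Group IDs must be mutually comparable (e.g., all strings) so they can be sorted.
    """
    unique_groups = sorted(set(group_ids))
    if not 2 <= k <= len(unique_groups):
        raise ValueError("k must be between 2 and the number of distinct groups")

    # Shuffle groups (not records), then deal them round-robin into folds.
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    fold_of_group = {g: i % k for i, g in enumerate(unique_groups)}

    # Every record inherits the fold of its group.
    test_sets = defaultdict(list)
    for idx, g in enumerate(group_ids):
        test_sets[fold_of_group[g]].append(idx)

    splits = []
    for f in range(k):
        test_idx = test_sets[f]
        train_idx = [i for i, g in enumerate(group_ids) if fold_of_group[g] != f]
        splits.append((train_idx, test_idx))
    return splits


# Hypothetical example: 8 records from 4 herds, herd-independent 4-fold CV.
herd_ids = ["A", "A", "B", "C", "C", "D", "D", "D"]
for train_idx, test_idx in group_kfold(herd_ids, k=4):
    test_herds = {herd_ids[i] for i in test_idx}
    train_herds = {herd_ids[i] for i in train_idx}
    print(sorted(test_herds), "held out; trained on", sorted(train_herds))
```

Passing cow IDs or experiment IDs instead of herd IDs gives the cow-independent or experiment-independent variants. Because groups rather than records are dealt into folds, fold sizes in records can be unequal when groups differ in size.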
We were specifically interested in comparing a model that included only MIR data (Model M1), a model that incorporated different energy sink predictors, such as body weight, body weight change, and milk energy (Model M2), and an extended model that incorporated both energy sinks and MIR data (Model M3). Models M2 and M3 also included various cow-level variables (stage of lactation, age at calving, parity), such that any improvement in model performance from M2 to M3, whether through a lower root mean squared error (RMSE) or a higher squared predictive correlation (R²), could indicate a potential benefit of MIR data for predicting residual feed intake. The data used in our study originated from a multi-institutional project on the genetics of feed efficiency in US Holsteins. Analyses were conducted on two trait definitions based on different period lengths: weekly vs. 28-d records. Specifically, there were 19,942 weekly records on 1,812 cows across 46 experiments or cohorts and 3,724 28-d records on 1,700 cows across 43 cohorts. The cross-validation analyses involved three k-fold schemes. First, a 10-fold cow-independent cross-validation kept all records from any one cow together in either the training or the test set. Similarly, a 10-fold experiment-independent cross-validation kept entire experiments together, whereas a 4-fold herd-independent cross-validation kept entire herds together. Based on cow-independent cross-validation for both weekly and 28-d DMI, adding MIR predictors to energy sinks (Model M3 vs. M2) significantly (P < 10⁻¹⁰) lowered RMSE, although there seemed to be no evidence of a difference in R² or RMSE under experiment-independent cross-validation. Under herd-independent cross-validation, adding MIR to energy sinks (M3) appeared to offer no merit over the energy sink model (M2) for either R² or RMSE.
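The M2 vs. M3 comparison rests on two metrics computed from test-set predictions: RMSE and the squared predictive correlation (R²), i.e., the squared Pearson correlation between observed and predicted values. The sketch below shows both computations; the DMI values and predictions are hypothetical numbers chosen for illustration, not results from the study, and the formal significance test behind P < 10⁻¹⁰ is not reproduced.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def squared_predictive_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values (R^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical out-of-fold DMI predictions (kg/d), NOT data from the study.
observed_dmi = [20.0, 22.0, 25.0, 27.0]
pred_m2 = [21.0, 21.5, 24.0, 26.0]  # energy sinks + cow-level variables
pred_m3 = [20.5, 22.5, 24.5, 27.5]  # energy sinks + cow-level variables + MIR

for name, pred in [("M2", pred_m2), ("M3", pred_m3)]:
    print(name,
          "RMSE =", round(rmse(observed_dmi, pred), 3),
          "R2 =", round(squared_predictive_correlation(observed_dmi, pred), 3))
```

In a k-fold setting these metrics can be computed per fold and averaged or computed once on the pooled out-of-fold predictions; which option the study used is not stated in this abstract.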
We further noted that with broader constructions of cross-validation, i.e., moving from cow-independent to experiment-independent to herd-independent schemes, there were substantially greater levels of mean and slope bias. Given that proxy DMI phenotypes would need to be generated almost entirely for cows in herds having no DMI or training data of their own, herd-independent cross-validation assessments of predictive performance should be emphasized. Hence, more research on predictive algorithms and a more earnest effort to calibrate spectrophotometers against each other should be considered.
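Mean and slope bias are not defined in the abstract. A common convention in the prediction literature, assumed here, takes mean bias as the average of observed minus predicted values and slope bias as the deviation from 1 of the slope in a regression of observed on predicted values. The function names and example values below are hypothetical and only illustrate that convention.

```python
def mean_bias(observed, predicted):
    """Average of (observed - predicted); 0 means no mean bias.

    The sign convention is an assumption; some authors use predicted - observed.
    """
    n = len(observed)
    return sum(o - p for o, p in zip(observed, predicted)) / n


def slope_bias(observed, predicted):
    """Slope b1 of the least-squares regression observed = b0 + b1 * predicted.

    b1 = 1 means no slope bias; b1 < 1 means predictions are over-dispersed
    (too spread out) and b1 > 1 means they are under-dispersed.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxy = sum((p - mean_p) * (o - mean_o) for o, p in zip(observed, predicted))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    return sxy / sxx


# Hypothetical test-fold DMI values (kg/d), NOT data from the study:
# predictions are shifted upward and compressed toward the mean.
observed_dmi = [20.0, 22.0, 24.0, 26.0]
predicted_dmi = [22.0, 23.0, 24.0, 25.0]
print("mean bias =", mean_bias(observed_dmi, predicted_dmi))  # -0.5
print("slope =", slope_bias(observed_dmi, predicted_dmi))     # 2.0
```

Here the predictions overshoot on average (negative mean bias) and are compressed toward the mean (slope above 1), the kind of miscalibration that would surface when a model trained in some herds is applied to a herd it has never seen.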