Publication : USDA ARS

Publication #264654

Title: Data structures and visualization

Author: Cole, John

Submitted to: Journal of Dairy Science
Publication Type: Abstract Only
Publication Acceptance Date: 2/24/2011
Publication Date: 6/30/2011
Citation: Cole, J.B. 2011. Data structures and visualization. Journal of Animal Science 89(E-Suppl. 1)/Journal of Dairy Science 94(E-Suppl. 1):226(abstr. 198).

Technical Abstract: Genomic tools for genetic improvement have been rapidly adopted in many livestock species over the past few years. This presents new challenges for data collection and management, as well as opportunities for analysis and presentation. The U.S. national dairy database currently includes genotypes for 83,117 bulls and cows and 2,620 imputed dams representing three different densities and four chip versions. Storage requirements for these genotypes are modest, even when high-density (>500,000 markers) genotypes are imputed from lower densities. However, storage requirements for intermediate and results files for genetic evaluations are much more substantial, particularly when multiple runs must be stored for research and validation studies. Full-sequence data will be available at reasonable cost in the near future, and will require much more storage. The greatest gains in accuracy from genomic selection have been realized for traits of low heritability, such as fertility and longevity, and there is increasing interest in new health and management traits. In addition to data on novel traits, potentially useful economic and demographic information is being collected by on-farm computer and analytical systems. There is increasing interest in traits such as feed efficiency and resistance to climate change, but the collection of sufficient phenotypes to produce accurate evaluations may take several years, and high-reliability proofs for older bulls are needed in order to precisely estimate marker effects. As traits proliferate and the number of genotyped animals continues to grow, increasingly sophisticated analytical approaches will be tractable. Machine learning algorithms may be useful in identifying previously unrecognized relationships among traits, and the analysis of genetic (co)variances among loci could help identify important gene networks.
Improved visualization tools, particularly those capable of processing very large volumes of data in a reasonable amount of time, are needed to help better understand the results of analyses. The challenges and opportunities presented by growing amounts of phenotypic and genomic data are generally similar regardless of the species in question.
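
As a rough illustration of why genotype storage can stay modest, the sketch below packs genotype calls at two bits each and estimates the total size for the animal counts quoted in the abstract. The abstract does not describe the database's actual storage format, so the 2-bit encoding, the code values (0, 1, 2 as allele counts, 3 as missing), and the function names are assumptions made for this example only; 500,000 markers is used as the lower bound of the high-density figure.

```python
# Hypothetical encoding (not taken from the abstract): each genotype call is
# stored as 0, 1 or 2 (allele count) or 3 (missing), using 2 bits per call.

def pack_genotypes(codes):
    """Pack a sequence of genotype codes (0-3) four to a byte."""
    packed = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        # Marker i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n_markers):
    """Recover the first n_markers genotype codes from packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 3 for i in range(n_markers)]


def packed_bytes(n_animals, n_markers):
    """Bytes needed to store every animal at 2 bits per marker."""
    return n_animals * ((n_markers + 3) // 4)


if __name__ == "__main__":
    calls = [0, 1, 2, 3, 2, 1]
    assert unpack_genotypes(pack_genotypes(calls), len(calls)) == calls

    # Animal counts from the abstract; marker count is an assumed lower bound.
    n_animals = 83_117 + 2_620
    total = packed_bytes(n_animals, 500_000)
    print(f"{n_animals} animals x 500,000 markers: {total / 1e9:.1f} GB")
```

Under these assumptions the full set of high-density genotypes comes to roughly 10.7 GB, which is small compared with the intermediate and results files that accumulate across repeated evaluation runs.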