
Title: Novel Applications of Multi-task Learning and Multiple Output Regression to Multiple Genetic Trait Prediction

Author
He, Dan - Computational Biology Center, IBM T.J. Watson Research
Kuhn, David
Parida, Laxmi - Computational Biology Center, IBM T.J. Watson Research

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/27/2016
Publication Date: 6/15/2016
Citation: He, D., Kuhn, D.N., Parida, L. 2016. Novel Applications of Multi-task Learning and Multiple Output Regression to Multiple Genetic Trait Prediction. Bioinformatics. 32(12):i37-i43.

Interpretive Summary: Tree breeding is a difficult and time-consuming endeavour. To improve its efficiency, breeders search for markers that help them identify potentially improved trees at the seedling stage. We have developed many thousands of DNA markers for cacao, avocado and mango and have begun associating traits with those markers to aid both breeders and producers of these crops. This paper explores whether the identification of markers associated with commercially important traits such as fruit color, size, shape, and weight can be improved by taking all traits into account at once, rather than one at a time (multiple genetic trait prediction). We found that the machine learning approach employed did improve the ability to predict the traits of the mature tree from the genotype of the seedling. This result will improve the efficiency of screening seedlings to find improved cultivars of fruit trees. Since fruit trees are vegetatively propagated, this method will also enhance the selection, evaluation and release of new improved varieties of cacao, avocado and mango. The information presented in this paper is of importance to research scientists, breeders and producers.

Technical Abstract: Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression model. In many cases, multiple traits are observed for the same set of samples and markers, and some of these traits may be correlated with each other. Modeling all of the traits together may therefore improve prediction accuracy. In this work, we view the multi-trait prediction problem from a machine learning angle: as either a multi-task learning problem or a multiple output regression problem, depending on whether the different traits share the same genotype matrix. We adapted multi-task learning algorithms and multiple output regression algorithms to solve the multi-trait prediction problem, and we proposed a few strategies to reduce the least squares error of the predictions from these algorithms. Our experiments show that modeling multiple traits together can improve prediction accuracy for correlated traits.
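
To illustrate the problem setup described in the abstract, the following is a minimal sketch of the multiple output regression framing, in which all traits share one genotype matrix. It is not the authors' method: it uses simulated data and a plain closed-form ridge regression as a baseline, and the function names (simulate_traits, fit_multi_output_ridge, mean_squared_error), the 0/1/2 genotype coding, and the penalty value are assumptions made for illustration only. Note that plain ridge regression fits every trait column with the same solver but does not share information between traits; the algorithms studied in the paper go further by exploiting correlation among traits.

```python
import numpy as np


def simulate_traits(n=200, p=50, k=3, noise=0.5, seed=0):
    """Simulate genotypes and correlated traits (hypothetical data, for illustration)."""
    rng = np.random.default_rng(seed)
    # Biallelic markers coded as 0, 1, or 2 copies of one allele (assumed coding).
    X = rng.integers(0, 3, size=(n, p)).astype(float)
    X -= X.mean(axis=0)  # center each marker column
    # Traits share a common component of marker effects, so they are correlated.
    shared = rng.normal(size=(p, 1))
    W = shared + 0.3 * rng.normal(size=(p, k))
    Y = X @ W + noise * rng.normal(size=(n, k))
    return X, Y, W


def fit_multi_output_ridge(X, Y, lam=1.0):
    """Closed-form ridge solution W = (X'X + lam*I)^-1 X'Y for all trait columns."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)


def mean_squared_error(Y_true, Y_pred):
    """Mean squared prediction error, computed separately for each trait."""
    return ((Y_true - Y_pred) ** 2).mean(axis=0)


if __name__ == "__main__":
    X, Y, _ = simulate_traits()
    # Hold out the last 50 samples to estimate prediction error.
    X_train, Y_train = X[:150], Y[:150]
    X_test, Y_test = X[150:], Y[150:]
    W_hat = fit_multi_output_ridge(X_train, Y_train, lam=1.0)
    print("Per-trait test MSE:", mean_squared_error(Y_test, X_test @ W_hat))
```

Here the genotype matrix X has one row per sample and one column per marker, and the trait matrix Y has one column per trait, so a single call estimates the marker effects for every trait. When traits are measured on different sample sets, and so have different genotype matrices, the abstract treats the problem as multi-task learning instead; that case is not covered by this sketch.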