Location: Corn Insects and Crop Genetics Research
Title: Maize GO annotation—methods, evaluation, and review (maize-GAMER)
Author
WIMALANATHAN, KOKULAPALAN - Iowa State University
FRIEDBERG, IDDO - Iowa State University
Andorf, Carson
LAWRENCE-DILL, CAROLYN - Iowa State University
Submitted to: Plant Direct
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/16/2018
Publication Date: 4/11/2018
Citation: Wimalanathan, K., Friedberg, I., Andorf, C.M., Lawrence-Dill, C. 2018. Maize GO annotation—methods, evaluation, and review (maize-GAMER). Plant Direct. 2(4):e00052. https://doi.org/10.1002/pld3.52.
DOI: https://doi.org/10.1002/pld3.52
Interpretive Summary: The maize genome is a valuable resource because maize is both a valuable crop and an important research model for plant genetics. Although a maize genome assembly and gene structure annotation have been available since 2009, few high-quality functional annotations are available. We created a high-coverage, robust, and reproducible functional annotation of maize protein-coding genes. This study provides annotations for 100% of the genes, which is 44 percentage points more than the next best annotation set. Evaluations based on gold standard data indicate that our new annotation set is measurably more accurate than previously available annotation sets. Both the data and methods for this study are open-source and publicly available.
Technical Abstract: We created a new high-coverage, robust, and reproducible functional annotation of maize protein-coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets cover only 41% and 56% of maize protein-coding genes, respectively, this study provides annotations for 100% of the genes. We also compared the quality of our newly derived annotations with the existing Gramene and Phytozome functional annotation sets by comparing all three to a manually annotated gold standard set of 1,619 genes where annotations were primarily inferred from direct assay or mutant phenotype. Evaluations based on the gold standard indicate that our new annotation set is measurably more accurate than those from Phytozome and Gramene.
To derive this new high-coverage, high-confidence annotation set, we used sequence similarity and protein domain presence methods as well as mixed-method pipelines that were developed for the Critical Assessment of Function Annotation (CAFA) challenge. Our project to improve maize annotations is called maize-GAMER (GO Annotation Method, Evaluation, and Review), and the newly derived annotations are accessible via MaizeGDB (http://download.maizegdb.org/maize-GAMER) and CyVerse (B73 RefGen_v3 5b+ at doi.org/10.7946/P2S62P and B73 RefGen_v4 Zm00001d.2 at doi.org/10.7946/P2M925).