Publication: USDA ARS

Publication #309657, Plant, Soil and Nutrition Research, Robert W. Holley Center for Agriculture & Health, Ithaca, New York (ARS Northeast Area)

Title: Ensembl Genomes 2013: scaling up access to genome-wide data

Authors

	KERSEY, PAUL - European Bioinformatics Institute
	ALLEN, JAMES - European Bioinformatics Institute
	CHRISTENSEN, MIKKEL - European Bioinformatics Institute
	DAVIS, PAUL - European Bioinformatics Institute
	FALIN, LEE - European Bioinformatics Institute
	GRABMUELLER, CHRISTOPHE - European Bioinformatics Institute
	HUGHES, DANIEL - European Bioinformatics Institute
	HUMPHREY, JAY - European Bioinformatics Institute
	KERHORNOU, ARNAUD - European Bioinformatics Institute
	KHOBOVA, JULIA - European Bioinformatics Institute
	LANGRIDGE, NICHOLAS - European Bioinformatics Institute
	MCDOWALL, MARK - European Bioinformatics Institute
	MAHESWARI, UMA - European Bioinformatics Institute
	MASLEN, GARETH - European Bioinformatics Institute
	NUHN, MICHAEL - European Bioinformatics Institute
	ONG, CHUANG - European Bioinformatics Institute
	PAULINI, MICHAEL - European Bioinformatics Institute
	PEDRO, HELDER - European Bioinformatics Institute
	TONEVA, ILIANA - European Bioinformatics Institute
	TULI, MARY ANN - Wellcome Trust Sanger Institute
	WALTS, BRANDON - European Bioinformatics Institute
	WILLIAMS, GARETH - European Bioinformatics Institute
	WILSON, DEREK - European Bioinformatics Institute
	YOUENS-CLARK, KEN - Cold Spring Harbor Laboratory
	MONACO, MARCELA - Cold Spring Harbor Laboratory
	STEIN, JOSHUA - Cold Spring Harbor Laboratory
	WEI, XUEHONG - Cold Spring Harbor Laboratory
	WARE, DOREEN
	BOLSER, DANIEL - European Bioinformatics Institute
	HOWE, KEVIN - European Bioinformatics Institute
	KULESHA, E. - European Bioinformatics Institute

Submitted to: Nucleic Acids Research
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/11/2013
Publication Date: 1/1/2014
Publication URL: https://doi.org/10.1093/nar/gkt979
Citation: Kersey, P.J., Allen, J.E., Christensen, M., Davis, P., Falin, L.J., Grabmueller, C., Hughes, D.S., Humphrey, J., Kerhornou, A., Khobova, J., Langridge, N., McDowall, M.D., Maheswari, U., Maslen, G., Nuhn, M., Ong, C.K., Paulini, M., Pedro, H., Toneva, I., Tuli, M., Walts, B., Williams, G., Wilson, D., Youens-Clark, K., Monaco, M.K., Stein, J., Wei, X., Ware, D., Bolser, D.M., Howe, K.L., Kulesha, E. 2014. Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Research. 42:D546-D552.

Interpretive Summary: Ensembl Genomes (http://www.ensemblgenomes.org) is an internet resource that integrates genomic data from the complete genomes of non-vertebrate organisms, including crop plants, vectors of human disease, and parasites that cause infectious disease. The resource is organized as five sites, each focused on one of the traditional kingdoms of life: bacteria, protists, fungi, plants, and invertebrate animals (metazoa). The project exploits and extends technologies for annotating, analyzing and disseminating genome data that were developed for the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These interfaces provide access to data including reference sequence, gene models, gene expression data, genetic variants and comparative analyses. This article updates previous publications about the resource and focuses on recent developments, which include the addition of important new genomes (e.g., bread wheat) and related data sets. The project is driven by several collaborations, each with a specific scientific community (e.g., Gramene, a comparative plant genomics database). For bacteria, the resource has scaled up substantially and now represents over 9000 genomes. Specific extensions to the web and programmatic interfaces have been developed to help users navigate such large data sets. As the number of available genomes increases within all domains of life, analytic tools that allow targeted selection of data for visualization and download are likely to become increasingly important, and some of the challenges faced in representing bacterial data are likely to become commonplace for more complex organisms.

Technical Abstract: Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets), including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools that allow targeted selection of data for visualization and download are likely to become increasingly important as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes.

U.S. DEPARTMENT OF AGRICULTURE

Plant, Soil and Nutrition Research: Ithaca, NY