Author
MATUKUMALLI, LAKSHMI - GEORGE MASON UNIVERSITY
GREFENSTETTE, JOHN - GEORGE MASON UNIVERSITY
Sonstegard, Tad
Van Tassell, Curtis - Curt
Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/15/2003
Publication Date: 1/1/2004
Citation: Matukumalli, L.K., Grefenstette, J.J., Sonstegard, T.S., Van Tassell, C.P. 2004. EST-PAGE - a simple web interface for managing and analyzing EST data. Bioinformatics, 20:286-288.
Interpretive Summary: Expressed sequence tags (ESTs) are partial sequences of expressed genes. Large-scale EST sequencing projects generate many different types of information. These data can include contact, publication and library information that is to be submitted, along with the EST sequence, to GenBank, the public data repository. EST-PAGE provides a system for EST data entry, database management, process control and data retrieval from a unified web interface that can be easily customized. Although several EST pipeline applications have been developed, they are not freely available. For this reason, we developed a simple web interface, named EST-PAGE, for managing and analyzing data generated from EST sequencing projects. PAGE is an acronym for the data management steps available in the interface: Processing, Analysis, GenBank submission and Exploration of EST data. This software will be made freely available.
Technical Abstract: Expressed sequence tags (ESTs) are partial sequences of expressed genes prepared by reverse transcribing mRNA and cloning the cDNA fragments into a plasmid. Large-scale EST projects for any species generate many different types of information and data. These can include contact, publication and library information that is to be submitted to GenBank along with the EST sequence and the analysis data related to annotation.
EST-PAGE provides a bioinformatics solution for EST data entry, database management, process control and data retrieval from a unified web interface that can be easily customized and adapted by groups working on diverse EST sequencing projects. Although several EST pipeline applications have been developed, they are not freely available. For this reason, we developed a simple web interface, named EST-PAGE, for managing and analyzing data generated from EST sequencing projects. PAGE is an acronym for the data management steps available in the interface: P (Processing of chromatograms for base calling), A (Analysis of the sequence data by vector screening, filtering of low-complexity sequences and checking for E. coli contamination), G (GenBank submission of good sequences to dbEST) and E (Exploration of EST data for redundancy and for the presence of novel sequences, by clustering and annotation). EST-PAGE is written in Perl and takes advantage of standard modules such as BioPerl and CGI-Perl to reduce coding costs and to encourage standardization and easy modification of the system to suit other needs. The data are stored in the free, open-source database PostgreSQL (a MySQL version is also available). EST-PAGE can be used by all groups within ARS-USDA. This software will be made freely available.
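The four PAGE steps described above can be pictured as an ordered pipeline in which a read that fails an Analysis check never reaches GenBank submission. The following is only a minimal illustrative sketch in Python, not EST-PAGE's actual code (which is written in Perl). The names `EstRecord`, `is_low_complexity` and `run_pipeline`, the length cutoff and the single-base-fraction filter are all hypothetical stand-ins chosen for illustration; base calling, vector screening and the E. coli check are not modeled.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

# Stage names follow the PAGE acronym in the abstract; all else is assumed.
STAGES = ("Processing", "Analysis", "GenBank submission", "Exploration")


@dataclass
class EstRecord:
    """Hypothetical record tracking one EST through the pipeline."""
    name: str
    sequence: str
    completed: List[str] = field(default_factory=list)
    rejected_reason: Optional[str] = None


def is_low_complexity(sequence: str, max_fraction: float = 0.8) -> bool:
    """Stand-in filter: flag reads dominated by a single base."""
    if not sequence:
        return True
    top_count = Counter(sequence.upper()).most_common(1)[0][1]
    return top_count / len(sequence) > max_fraction


def run_pipeline(record: EstRecord, min_length: int = 100) -> EstRecord:
    """Advance a record through the stages, stopping at the first failed check."""
    # P: base calling is assumed to have produced record.sequence already.
    record.completed.append(STAGES[0])

    # A: simplified quality checks (assumed thresholds).
    if len(record.sequence) < min_length:
        record.rejected_reason = "too short"
        return record
    if is_low_complexity(record.sequence):
        record.rejected_reason = "low complexity"
        return record
    record.completed.append(STAGES[1])

    # G and E: only reads that passed Analysis get this far.
    record.completed.append(STAGES[2])
    record.completed.append(STAGES[3])
    return record


if __name__ == "__main__":
    for rec in (EstRecord("est1", "ACGT" * 30), EstRecord("est2", "A" * 150)):
        done = run_pipeline(rec)
        print(done.name, done.completed, done.rejected_reason)
```

Recording the completed stages on each record is one simple way to support the kind of process control the abstract mentions, since the state of every read can be queried at any point.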