Location: Corn Insects and Crop Genetics Research
Title: GenomeQC: A quality assessment tool for genome assemblies and gene structure annotations
Author
MANCHANDA, NANCY - Iowa State University
Portwood, John
Woodhouse, Margaret
SEETHARAM, ARUN - Iowa State University
LAWRENCE-DILL, CAROLYN - Iowa State University
Andorf, Carson
HUFFORD, MATTHEW - Iowa State University
Submitted to: BMC Genomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/7/2020
Publication Date: 3/2/2020
Citation: Manchanda, N., Portwood II, J.L., Woodhouse, M.H., Seetharam, A., Lawrence-Dill, C.J., Andorf, C.M., Hufford, M. 2020. GenomeQC: A quality assessment tool for genome assemblies and gene structure annotations. BMC Genomics. 21. https://doi.org/10.1186/s12864-020-6568-2.
DOI: https://doi.org/10.1186/s12864-020-6568-2
Interpretive Summary: In recent years, the genomes of many species have been sequenced, but the quality of these genomes varies. Several tools measure different aspects of genome quality, but to date no single program brings them together to measure, all at once, the factors associated with genome quality. Here we present GenomeQC, an easy-to-use program that runs multiple genome-quality measurement tools in one step and creates figures and graphs of these measurements that can be shared with others. This simplifies the process of testing genome quality for users.
Technical Abstract: Genome assemblies are foundational for understanding species’ biology. They provide a physical framework for mapping additional sequence, thereby enabling characterization of nucleotide diversity and gene expression across individuals. However, without quality metrics for such assemblies, it is easy to make incorrect assumptions about the completeness and contiguity of an assembly, which can lead to incorrect conclusions. Currently, the quality of a newly sequenced genome is assessed using a set of commonly calculated metrics that are then compared to gold standard reference genomes. While several tools exist for calculating individual quality metrics, applications that provide comprehensive evaluations are lacking.
Here, we describe a new toolkit that integrates multiple metrics to characterize assembly and gene annotation quality. GenomeQC is an easy-to-use, interactive web framework based on the R/Shiny package that integrates various quantitative measures to characterize genome assemblies and annotations. The application provides the user with a comprehensive summary of these statistics and allows for benchmarking against gold standard reference assemblies.
Conclusions: The GenomeQC web application is implemented in R/Shiny version 1.5.9 and Python 3.6 and is freely available at https://genomeqc.maizegdb.org/ under the GPL license. All the source code and the container version of the GenomeQC pipeline are available in the GitHub repository https://github.com/HuffordLab/GenomeQC.
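The abstract refers to commonly calculated contiguity metrics without listing them here. As an illustration only, the sketch below computes N50 and L50, two widely used contiguity statistics, from a list of contig or scaffold lengths. The function name `n50_l50` and the example lengths are hypothetical and are not taken from the GenomeQC source code.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly.
    L50 is the number of sequences in that set.
    Illustrative helper only; not part of GenomeQC.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")

    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length, count

    # Unreachable for non-empty input with positive lengths.
    raise ValueError("sequence lengths must be positive")


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs.
    example = [80, 70, 50, 40, 30, 20]
    n50, l50 = n50_l50(example)
    print(f"N50 = {n50}, L50 = {l50}")
```

Comparing such values between a new assembly and a reference assembly of a related species is one simple way to benchmark contiguity; completeness requires separate, gene-content-based measures.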