Location: Sunflower and Plant Biology Research
Title: Gene space and transcriptome assemblies of leafy spurge (Euphorbia esula) identify promoter sequences, repetitive elements, high-quality markers, and a full-length chloroplast genome
Author
Horvath, David
PATEL, SAGAR - South Dakota State University
DOGRAMACI, MUNEVVER - Former ARS Employee
Chao, Wun
Anderson, James
Foley, Michael
Scheffler, Brian
Lazo, Gerard
DORN, KEVIN - Kansas State University
YAN, CHANGHUI - North Dakota State University
Childers, Anna
SCHATZ, MICHEL - Johns Hopkins University
MARCUS, SHOSHANA - Kingsborough Community College
Submitted to: Weed Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/23/2017
Publication Date: 2/28/2018
Citation: Horvath, D.P., Patel, S., Dogramaci, M., Chao, W.S., Anderson, J.V., Foley, M.E., Scheffler, B., Lazo, G., Dorn, K., Yan, C., Childers, A., Schatz, M., Marcus, S. 2018. Gene space and transcriptome assemblies of leafy spurge (Euphorbia esula) identify promoter sequences, repetitive elements, high-quality markers, and a full-length chloroplast genome. Weed Science. 66(3):355-367. https://doi.org/10.1017/wsc.2018.2.
DOI: https://doi.org/10.1017/wsc.2018.2
Interpretive Summary: Leafy spurge is an invasive perennial weed that is considered one of the ten worst weeds in the US. Here we report the assembly and characterization of the leafy spurge transcriptome (a collection of protein-coding gene sequences present in leafy spurge) and gene-space (the coding and non-coding DNA sequence of genes). Based on analysis of the data, we determined that our assembled datasets contain more than 90% of the predicted leafy spurge genes. We also obtained sequences for over 500 million DNA fragments from the leafy spurge genome (the complete set of genetic material in an organism). We tested various computer programs for assembling these smaller fragments into longer contiguous pieces (called contigs). We determined that two computer programs (Trinity and Velvet) did the best job of assembling the small fragments into larger ones. Overall, our results indicate that over 88% of the genes we identified in our assembled transcriptome were present in our genome assemblies. Based on these results, we now have promoter sequences (promoters are the part of a gene that controls how it is turned on and off) for over 20,000 of the leafy spurge genes.
This study provides an efficient blueprint for low-cost sequence analysis of other weed species that should help weed scientists gain a better understanding of genetic factors regulating weediness traits, weed evolution, and herbicide resistance.
Technical Abstract: Leafy spurge is an invasive perennial weed infesting range and recreational lands of North America. Previous research and omics projects with leafy spurge have helped develop it as a model for studying numerous aspects of perennial plant development and response to abiotic stress. However, the lack of an assembled genome for leafy spurge has limited the power of previous transcriptomic studies to identify functional promoter elements and transcription factor binding sites. An assembled genome for leafy spurge would enhance our understanding of signaling processes controlling plant development and responses to environmental stress and provide a better understanding of genetic factors impacting weediness traits, evolution, and herbicide resistance. A comprehensive transcriptome database would also assist in analyzing future RNAseq studies and is needed to annotate and assess genomic sequence assemblies. Here, we assembled and annotated 56,234 unigenes from an assembly of 589,235 RNAseq-derived contigs and a previously published Sanger-sequenced EST collection. The resulting data indicate we now have sequence for more than 90% of the expressed leafy spurge protein-coding genes. We also assembled the gene-space of leafy spurge utilizing a limited coverage (18X) genomic sequence database. In this study, the programs Velvet and Trinity produced the best gene-space assemblies based on representation of expressed and conserved eukaryotic genes. The results indicate that leafy spurge contains as much as 23% repetitive sequences, of which 11% are unique. Our sequence data were also sufficient for assembling a full chloroplast and partial mitochondrial genome.
Further, marker analysis identified over 150K high-quality variants in our leafy spurge Trinity-assembled genome. Based on these results, leafy spurge appears to have limited heterozygosity. This study provides a blueprint for low-cost genomic assemblies in weed species and new resources for identifying conserved and novel promoter regions among coordinately expressed genes of leafy spurge.