Location: Emerging Pests and Pathogens Research
Title: Assessing protein sequence database suitability using de novo sequencing
Author
JOHNSON, RICHARD - University Of Washington
SEARLE, BRIAN - Institute For Systems Biology
NUNN, BROOK - University Of Washington
GILMORE, JASON - University Of Washington
PHILLIPS, MOLLY - University Of Washington
AMEMIYA, CHRIS - University Of California
Heck, Michelle
MACCOSS, MICHAEL - University Of Washington
Submitted to: Molecular and Cellular Proteomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/5/2019
Publication Date: 1/5/2020
Citation: Johnson, R., Searle, B.C., Nunn, B.L., Gilmore, J.M., Phillips, M., Amemiya, C.T., Heck, M.L., MacCoss, M.J. 2020. Assessing protein sequence database suitability using de novo sequencing. Molecular and Cellular Proteomics. http://dx.doi.org/10.1074/mcp.TIR119.001752.
DOI: https://doi.org/10.1074/mcp.TIR119.001752
Interpretive Summary: Proteomics is the study of all the proteins produced by a cell, tissue, or organism. Proteomic analysis enables scientists to understand how an organism responds to its biotic and abiotic environment. Proteomics studies are difficult because proteins have extraordinarily complex biochemical and physicochemical properties, which complicates the development of software to interpret proteomics data. Further difficulties arise when analyzing an organism that lacks a sequenced genome, and therefore a suitable database for protein searching, or when analyzing a complex sample made up of more than one species. One example is the Asian citrus psyllid, the insect vector of citrus greening, which harbors beneficial bacterial partners as well as the citrus greening bacterial pathogen. In this work, a new approach to the analysis of proteomics data is proposed and tested. The new method provides information on the quality of the proteomics data and the usefulness of the database used in the data analysis. This approach enables proteomics analysis of closely related species, complex samples made up of more than one species, and proteomics of extant organisms.
Technical Abstract: The analysis of samples from unsequenced and/or understudied species as well as samples where the proteome is derived from multiple organisms poses two key questions. The first is whether the proteomic data obtained from an unusual sample type even contains peptide tandem mass spectra. The second question is whether an appropriate protein sequence database is available for proteomic searches. We describe the use of automated de novo sequencing for evaluating both the quality of a collection of tandem mass spectra and the suitability of a given protein sequence database for searching that data. Applications of this method include the proteome analysis of closely related species, metaproteomics, and proteomics of extant organisms.