Title: SciApps: A cloud-based platform for reproducible bioinformatics workflows

Author
WANG, LIYA - Cold Spring Harbor Laboratory
ZU, ZHENYUAN - Cold Spring Harbor Laboratory
VAN BUREN, PETER - Cold Spring Harbor Laboratory
Ware, Doreen

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/25/2018
Publication Date: 6/12/2018
Citation: Wang, L., Zu, Z., Van Buren, P., Ware, D. 2018. SciApps: A cloud-based platform for reproducible bioinformatics workflows. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty439.
DOI: https://doi.org/10.1093/bioinformatics/bty439

Interpretive Summary: The rapid growth of sequence and phenotype data generated by high-throughput methods has created a growing need to store and analyze data on remote storage and computing systems. A workflow management system is needed to manage data efficiently across heterogeneous systems, simplify analysis through automation, and make large-scale bioinformatics analysis accessible and reproducible. We have developed SciApps, a cloud-based platform for reproducible bioinformatics workflows. The platform is powered by a federated CyVerse system located at Cold Spring Harbor Laboratory (CSHL) and is fully integrated with the CyVerse Cyberinfrastructure (CI) through the Agave platform for job management and the iRODS-based Data Store for data management. To create a workflow, each analysis job is submitted, recorded, and accessed through the web portal. Part or all of a series of recorded jobs can be saved as a reproducible, shareable workflow for future execution, using the original or modified inputs and parameters. The platform is designed to automate the execution of modular Agave apps and make it easy to bring reproducible workflows to both local and cloud-based computing systems. Two workflows, association and annotation, are provided as exemplar scientific use cases.

Technical Abstract: Motivation: The rapid growth of sequence and phenotype data generated by high-throughput methods has created a growing need to store and analyze data on distributed storage and computing systems. A workflow management system is needed to manage data efficiently across heterogeneous systems, simplify analysis through automation, and make large-scale bioinformatics analysis accessible and reproducible. Results: We have developed SciApps, a cloud-based platform for reproducible bioinformatics workflows. The platform is powered by a federated CyVerse system located at Cold Spring Harbor Laboratory (CSHL). The system is fully integrated with the CyVerse Cyberinfrastructure (CI) through the Agave platform for job management and the iRODS-based CyVerse Data Store for data management. To create a workflow, each analysis job is submitted, recorded, and accessed through the web portal. Part or all of a series of recorded jobs can be saved as a reproducible, shareable workflow for future execution, using the original or modified inputs and parameters. The platform is designed to automate the execution of modular Agave apps and make it easy to bring reproducible workflows to both local and cloud-based computing systems. Two workflows, association and annotation, are provided as exemplar scientific use cases.
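
Illustrative sketch: The record-then-save idea described in the abstract can be modeled in a few lines of Python. This is only a conceptual sketch, not the SciApps or Agave API: the `Job` structure, the functions `save_workflow` and `load_workflow`, the app names, and the file names are all hypothetical and chosen for illustration. It shows a series of recorded jobs being saved as a shareable workflow and then reloaded with one parameter modified.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Job:
    """One recorded analysis job (hypothetical structure, not the SciApps schema)."""
    job_id: str
    app: str
    inputs: dict
    parameters: dict = field(default_factory=dict)


def save_workflow(name, jobs):
    """Serialize part or all of a series of recorded jobs as a shareable JSON workflow."""
    return json.dumps({"name": name, "steps": [asdict(j) for j in jobs]}, indent=2)


def load_workflow(text, overrides=None):
    """Rebuild jobs from a saved workflow, optionally replacing inputs or parameters.

    `overrides` maps a job_id to {"inputs": {...}, "parameters": {...}}.
    """
    overrides = overrides or {}
    jobs = []
    for step in json.loads(text)["steps"]:
        job = Job(**step)
        change = overrides.get(job.job_id, {})
        job.inputs.update(change.get("inputs", {}))
        job.parameters.update(change.get("parameters", {}))
        jobs.append(job)
    return jobs


# Hypothetical job history; app and file names are placeholders.
history = [
    Job("job-1", "filter_variants", {"vcf": "input.vcf"}, {"min_maf": 0.05}),
    Job("job-2", "association_test", {"genotypes": "job-1:output"}, {"model": "mlm"}),
]

saved = save_workflow("association", history)

# Re-create the workflow with a modified parameter on the first step only.
rerun = load_workflow(saved, {"job-1": {"parameters": {"min_maf": 0.01}}})
```

Because the workflow is stored as plain JSON, the saved text can be shared and reloaded elsewhere, and the original job history is left unchanged when a rerun overrides inputs or parameters.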