Author
WANG, LIYA - Cold Spring Harbor Laboratory
ZU, ZHENYUAN - Cold Spring Harbor Laboratory
VAN BUREN, PETER - Cold Spring Harbor Laboratory
Ware, Doreen
Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/25/2018
Publication Date: 6/12/2018
Citation: Wang, L., Zu, Z., Van Buren, P., Ware, D. 2018. SciApps: A cloud-based platform for reproducible bioinformatics workflows. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty439.
DOI: https://doi.org/10.1093/bioinformatics/bty439
Interpretive Summary: The rapid growth of both sequence and phenotype data generated by high-throughput methods has created an increasing need to store and analyze data on remotely located storage and computing systems. A workflow management system is needed to ensure efficient data management across heterogeneous systems, simplify the task of analysis through automation, and make large-scale bioinformatics analysis accessible and reproducible. We have developed SciApps, a cloud-based platform for reproducible bioinformatics workflows. The platform is powered by a federated CyVerse system located at Cold Spring Harbor Laboratory (CSHL), and is fully integrated with the CyVerse Cyberinfrastructure (CI) through the Agave platform for job management and the iRODS-based Data Store for data management. To create a workflow, each analysis job is submitted, recorded, and accessed through the web portal. Part or all of a series of recorded jobs can be saved as reproducible, sharable workflows for future execution, using the original or modified inputs and parameters. The platform is designed to automate the execution of modular Agave apps and make it easy to bring reproducible workflows to both local and cloud-based computing systems. Two workflows, association and annotation, are provided as exemplar scientific use cases.
Technical Abstract: Motivation: The rapid growth of both sequence and phenotype data generated by high-throughput methods has created an increasing need to store and analyze data on distributed storage and computing systems.
A workflow management system is needed to ensure efficient data management across heterogeneous systems, simplify the task of analysis through automation, and make large-scale bioinformatics analysis accessible and reproducible.
Results: We have developed SciApps, a cloud-based platform for reproducible bioinformatics workflows. The platform is powered by a federated CyVerse system located at Cold Spring Harbor Laboratory (CSHL). The system is fully integrated with the CyVerse Cyberinfrastructure (CI) through the Agave platform for job management and the iRODS-based CyVerse Data Store for data management. To create a workflow, each analysis job is submitted, recorded, and accessed through the web portal. Part or all of a series of recorded jobs can be saved as reproducible, sharable workflows for future execution using the original or modified inputs and parameters. The platform is designed to automate the execution of modular Agave apps and make it easy to bring reproducible workflows to both local and cloud-based computing systems. Two workflows, association and annotation, are provided as exemplar scientific use cases.
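The record-then-save model described in the abstract (jobs are recorded as they run, a subset is saved as a workflow, and the workflow is re-executed with original or modified inputs) can be illustrated with a minimal sketch. This is not the SciApps or Agave API: the class names, field names, app names, and file paths below are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Job:
    """One recorded analysis job (hypothetical record layout)."""
    job_id: str
    app: str
    inputs: Dict[str, str]
    parameters: Dict[str, str]


@dataclass
class Workflow:
    """A saved, ordered subset of recorded jobs."""
    name: str
    steps: List[Job]

    def plan(self, overrides: Optional[Dict[str, Dict[str, str]]] = None) -> List[Job]:
        # Build the jobs for a re-run. Inputs listed in `overrides`
        # (keyed by job_id) replace the recorded ones; everything else,
        # including parameters, is reused as recorded.
        overrides = overrides or {}
        return [
            replace(step, inputs={**step.inputs, **overrides.get(step.job_id, {})})
            for step in self.steps
        ]


@dataclass
class History:
    """Every job submitted so far, in submission order."""
    jobs: List[Job] = field(default_factory=list)

    def record(self, job: Job) -> None:
        self.jobs.append(job)

    def save_workflow(self, name: str, job_ids: Iterable[str]) -> Workflow:
        # Keep only the selected jobs, preserving submission order.
        wanted = set(job_ids)
        return Workflow(name, [j for j in self.jobs if j.job_id in wanted])


# Hypothetical usage: record three jobs, save the first two as a workflow,
# then plan a re-run of that workflow on a different input file.
history = History()
history.record(Job("job-1", "filter-app", {"variants": "/example/a.vcf"}, {"maf": "0.05"}))
history.record(Job("job-2", "model-app", {"variants": "job-1:output"}, {}))
history.record(Job("job-3", "plot-app", {"table": "job-2:output"}, {}))

workflow = history.save_workflow("example", ["job-1", "job-2"])
rerun = workflow.plan({"job-1": {"variants": "/example/b.vcf"}})
```

In this sketch the saved workflow is left unchanged by `plan`, so the same workflow can be re-planned repeatedly with different inputs, which is the sense of "reproducible" the abstract describes.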