Location: Genetics and Animal Breeding
Title: Pipelines, workflows and virtualization to build institutional informatics capacity
Submitted to: Meeting Proceedings
Publication Type: Abstract Only
Publication Acceptance Date: 10/20/2019
Publication Date: 12/10/2019
Citation: Dickey, A.M., Nonneman, D.J., Freetly, H.C., Workman, A.M., Kuehn, L.A. 2019. Pipelines, workflows and virtualization to build institutional informatics capacity [abstract]. In: Proceedings of Rocky 2019, December 5-7, 2019, Aspen/Snowmass, Colorado. Paper No. P22.
Interpretive Summary:
Technical Abstract: Volmers et al. 2017 defined the “bioinformatics middle class” as composed of competent and informed users rather than tool developers. Increasingly, these middle-class bioinformaticians are employed in supporting roles where they can collaborate to advance the research programs of multiple principal investigators across an institution to access, manipulate and analyze large datasets. The daily routine of the middle-class bioinformatician may vary with individual strengths as well as client and institutional needs, but it will often comprise a variety of activities, such as delivering training, scripting, developing pipelines and curating databases. Middle-class bioinformaticians occupy a role similar to that of departmental statisticians and may face some of the same professional challenges, among them maintaining an active publication record. A data analysis pipeline is an end-to-end, multi-step data management solution in which the data are not manually inspected between steps. In contrast, a user interacts with the data at each step of a data analysis workflow. Either methodological class can take advantage of computer platform virtualization. The purpose of this presentation is to summarize multiple research projects where the bioinformatic support took the form of a pipeline or workflow, with the goal of highlighting differences between these two approaches. Use of virtualization in different projects is also highlighted.
Workflows have greater real-time flexibility for integrating custom statistics and outputs, whereas pipelines offer greater speed.
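The distinction drawn in the abstract between a pipeline and a workflow can be illustrated with a minimal Python sketch. It is not taken from the presentation: the step functions (`drop_negative`, `scale_to_max`), the runner functions (`run_pipeline`, `run_workflow`) and the sample data are all hypothetical names chosen for illustration, and the user's interaction is modeled, as an assumption, by a callback invoked after each step.

```python
from typing import Callable, List, Tuple

# A step takes a list of measurements and returns a transformed list.
Step = Callable[[List[float]], List[float]]
# An inspector stands in for the user looking at the data after a step.
Inspector = Callable[[str, List[float]], None]


def drop_negative(values: List[float]) -> List[float]:
    """Hypothetical cleaning step: discard negative readings."""
    return [v for v in values if v >= 0]


def scale_to_max(values: List[float]) -> List[float]:
    """Hypothetical normalization step: divide by the largest value."""
    peak = max(values)
    return [v / peak for v in values]


def run_pipeline(data: List[float], steps: List[Step]) -> List[float]:
    """Pipeline: steps run back to back, with no inspection in between."""
    for step in steps:
        data = step(data)
    return data


def run_workflow(data: List[float], steps: List[Step],
                 inspect: Inspector) -> List[float]:
    """Workflow: the user (modeled by `inspect`) sees the data after each step."""
    for step in steps:
        data = step(data)
        inspect(step.__name__, data)
    return data


STEPS: List[Step] = [drop_negative, scale_to_max]
RAW = [4.0, -1.0, 2.0, 8.0]

# Pipeline: only the final result is available to the user.
pipeline_result = run_pipeline(RAW, STEPS)

# Workflow: record what the user would have seen at each step.
seen: List[Tuple[str, int]] = []
workflow_result = run_workflow(
    RAW, STEPS, lambda name, d: seen.append((name, len(d)))
)
```

Both runners produce the same final values here. The workflow version trades some speed for an inspection point after every step, which is where custom statistics or outputs could be added; the pipeline version runs unattended from start to finish.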