Daesung the Data Scientist
- Daesung needs fast computing with advanced capabilities to analyze large sets of sequenced ’omics data.
- He struggles to keep his cloud compute time low enough to save his lab money, and to make sure his code follows best practices for reproducibility.
- We can help him by enabling PROOF to run on a cloud back-end and making it user-friendly enough that he can use it to collaborate with his less computationally skilled labmates.
Daesung needs a fast track to the cloud
Daesung has been using HPC in his work for a while now, and he knows his way around a terminal window. But that doesn’t mean he doesn’t appreciate a GUI! He loved the Data Science Lab’s new PROOF tool because it lets him validate code before running it and easily monitor job status. What he really wants now is a way to connect it to the cloud, so that he can keep using PROOF’s handy job tracking features for his WDL workflows while accessing advanced functionality on AWS, such as GPU computing to support machine learning.
Collaborators: Larry the learner, Bisei the bioinformatics researcher
Downstream users: Preeti the PI
Key Challenges
- PROOF, the in-house tool that aids in workflow validation and management, currently works only on the cluster
Needs and Wants
- A way to use PROOF to run analyses on the cloud
- A way to help labmates less comfortable with the command line use cloud computing
Types of data used
- ’Omics data
- Phenotypic data
Image attribution: Photo by Valeriy Khan on StockSnap
Last updated July 2024