Daesung the Data Scientist

  • Daesung needs fast computing with advanced capabilities to analyze large sets of sequenced ’omics data.
  • He struggles to minimize his cloud compute time, which would save his lab money, and to be sure his code follows best practices for reproducibility.
  • We can help him by enabling PROOF to run on a cloud back-end and making it user-friendly enough that he can use it to collaborate with his less computationally skilled labmates.

Daesung needs a fast track to the cloud

Daesung has been using HPC in his work for a while now, and he knows his way around a terminal window. But he still appreciates a good GUI! He loves the Data Science Lab’s new PROOF tool because it lets him validate code before running it and easily monitor job status. What he really wants now is a way to connect it to the cloud, so that he can keep using PROOF’s handy job-tracking features for his WDL workflows while accessing advanced functionality on AWS, like GPU computing to support machine learning.

Key Challenges

  • PROOF, the in-house tool for workflow validation and management, currently works only on the cluster

Needs and Wants

  • A way to use PROOF to run analyses on the cloud
  • A way to help labmates less comfortable with the command line use cloud computing

Types of data used

  • ’Omics data
  • Phenotypic data

Image attribution: Photo by Valeriy Khan on StockSnap

last updated July 2024