Research Informatics
Research Informatics supports the use of biomedical datasets in research. These often include large-scale public or private genomic data, licensed or regulated datasets (such as those under DUAs or legal protections like GDPR), or laboratory-generated data that require significant computational processing as part of their analysis. We focus on providing open-source tooling, best practices for computational workloads and data management, and other resources, such as computational workflows and templates, that staff can use as a jumping-off point and customize for their own work. We work closely with Fred Hutch IT’s Scientific Computing group, which provides support for our on-prem computing cluster, scientific data storage, and research applications.
Our Work
This group was recently formed and is beginning to develop its support offerings for the community.
Workflow Support in WILDS
We’re developing resources in collaboration with the Fred Hutch IT Scientific Computing group to provide simplified interfaces to computing resources through our project PROOF, which will make it easier for users to run WDL workflows on our cluster. You can find our emerging resources, such as our WDL support materials, in our WILDS GitHub organization.
Research Data Applications
We support the soon-to-be-released cBioPortal instance in collaboration with Fred Hutch IT’s Scientific Computing group.
Connect with Us
The DaSL Research Informatics team provides support to the community through documentation on the Biomedical Data Science Wiki, through our new Community Studios program, and through our Data House Calls program.
Research Data Management House Calls
Use this Data House Call to discuss questions about research data management, infrastructure, or tools:
- Developing lab or group data management schemes that support research, leverage Fred Hutch infrastructure, and are cost-effective
- Transferring, storing, or collaborating on large datasets
- Accessing data in cloud storage
- Accessing public datasets
- Working with DUAs and with licensed or private datasets
- Learning about and accessing Fred Hutch hosted datasets (e.g., data from our clinics, our large research groups, and reference datasets)
- Managing and accessing reference datasets such as genome data and reference data for common bioinformatics tools
Or other questions you have that you feel might relate to this topic!
Research Computing House Calls
Use this Data House Call to discuss questions about how to run your code or analysis more effectively and at scale:
- Using our computing cluster for your analyses
- Writing and running scientific workflows
- Managing and executing WDL workflows via PROOF
- Bioinformatics software and environment management (including Docker!)
- Leveraging Git/GitHub for code management and sharing
- Leveraging cloud-based data storage and/or computing
- Collaboration and sharing of workflows, environments, and large datasets
Or other questions you have that you feel might relate to this topic!
Note: We often pull in folks from the Scientific Computing team to provide expert advice as well, so please describe your needs in detail to help us have the right staff ready.
Our Staff
Taylor Firman - Research Informatics Lead
Taylor’s background is in computational biophysics, clinical bioinformatics, and high-performance computing. His work focuses on building open-source bioinformatics software tools for the Fred Hutch research community while fostering a spirit of collaboration through data consultations and community building.
Sitapriya Moorthi - Affiliate Staff Scientist
Sita’s research background is in cancer biology, genetics, and genomics. Her experience spans projects across multiple cancer types, including leukemia, lung, breast, prostate, and gastric cancers. Her hands-on experience ranges from conducting intricate bench experiments in molecular and cellular biology and genetics to analyzing complex data from whole-genome, exome, and single-cell sequencing. She is enthusiastic about creating efficient workflows and systems that bridge the data chasm, enabling fellow scientists to focus more on the research at hand and less on the data gap. Sita works with our Research Informatics group.
Ted Laderas - Data Scientist
Ted Laderas is a Data Scientist helping to establish Community Studios at DaSL. He has worked with many different data types and knows his way around a workflow. He champions building learning communities of practice in science and research that are psychologically safe and inclusive.