This repository contains all bioinformatics workflows used on the St. Jude Cloud project. Officially, the repository is in beta — the project is adding workflows as they are developed and put into production.
Resource requirements have been optimized to minimize failures in our computing environment, but they may not reflect the best settings for your use case. Please ensure that you tailor these parameters to fit your needs.
🏠 Homepage
Please excuse the state of our documentation. We are working on some big changes around here, and with those changes will come much improved documentation.
Repository Structure
The repository is laid out as follows:
- workflows/ - Directory containing all end-to-end bioinformatics workflows.
- tools/ - All tools we have wrapped as individual WDL tasks.
- data_structures/ - WDL struct definitions and tasks or workflows related to their construction, parsing, or validation.
- docker/ - Dockerfiles used in our workflows. All Docker images are published to the GitHub Container Registry as a part of our CI and are versioned.
- tests/ - Home to all of our testing infrastructure. We use pytest-workflow for validating our code.
- bin/ - (no longer in use) Scripts used by Cromwell configuration settings. Add this to $PATH prior to using configurations in conf with Cromwell.
- conf/ - (no longer in use) Cromwell configuration files created for various environments that we use across our team. Feel free to use/fork/suggest improvements.
Bootstrap guide
This repository implements workflows using the Workflow Description Language (WDL). If you are unfamiliar with WDL, a short overview is available in the WDL spec.
The workflows and tasks in this repository should require minimal set-up and configuration before you're ready to run. You don't even need to clone the repo! The bare minimum requirements are a locally installed WDL runner and an internet connection.
The exact steps for installation, configuration, and execution will depend on your environment and preferred engine. There are a variety of WDL engines you could use, though our team prefers miniwdl. We also make use of the miniwdl-lsf plugin for running on our LSF cluster.
Most WDL runners are capable of running a WDL file from a URL. This is how we most commonly execute our workflows and tasks. The following command could be used to submit a run of our rnaseq-standard workflow using miniwdl:
miniwdl run --verbose --input inputs.json https://raw.githubusercontent.com/stjudecloud/workflows/rnaseq-standard/v3.0.1/workflows/rnaseq/rnaseq-standard.wdl
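The URL in the command above appears to follow the layout that raw.githubusercontent.com uses for a file at a given Git tag: repository, then tag, then path within the repository. As a minimal sketch, assuming that layout holds for other releases, the snippet below assembles the URL from its parts and prints the resulting miniwdl command rather than running it. The variable names are illustrative only, and the tag and path are the ones from the example above; substitute the release and workflow you need.

```shell
#!/usr/bin/env bash
# Illustrative variable names; they are not defined anywhere in the repository.
repo="stjudecloud/workflows"
tag="rnaseq-standard/v3.0.1"                      # release tag from the example above
wdl_path="workflows/rnaseq/rnaseq-standard.wdl"   # path of the WDL file within the repo

# Assumption: raw file URLs follow <host>/<repo>/<tag>/<path>.
url="https://raw.githubusercontent.com/${repo}/${tag}/${wdl_path}"

# Print the command instead of running it, so it can be inspected first.
printf 'miniwdl run --verbose --input inputs.json %s\n' "$url"
```

Pinning the URL to a tag rather than a branch means repeated runs use the same version of the workflow.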
For an introduction to WDL, there are many guides, one of which is from Terra.
Author
👤 St. Jude Cloud Team
- Website: https://stjude.cloud
- GitHub: @stjudecloud
- Twitter: @StJudeResearch
Tests
Every task in this repository is covered by at least one test (see all of our tests in tests/tools/). These are run using pytest-workflow.
The command for running our tests should be executed at the root of the repo:
python -m pytest --kwdof --git-aware
🤝 Contributing
Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.
Links worth checking out
Our preferred WDL runner: miniwdl
Most of our tasks are run inside a BioContainers image
Our tasks are validated using pytest-workflow
📝 License
Copyright © 2020-Present St. Jude Cloud Team.
This project is MIT licensed.