Utilities
download
- description
- Uses wget to download a file from a remote URL to the local filesystem
- outputs
- {'downloaded_file': 'File downloaded from provided URL'}
Inputs
Required
_runtime
(Any, required)
disk_size_gb
(Int, required): Disk space to allocate for task, specified in GB
outfile_name
(String, required): Name of the output file
url
(String, required): URL of the file to download
Optional
md5sum
(String?): Optional md5sum to check against downloaded file. Recommended to use in order to catch corruption or an unintentional file swap.
Outputs
downloaded_file
(File)
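The optional md5sum check can be pictured with the short Python sketch below. It is only an illustration of comparing a file's MD5 digest against an expected value, not the task's actual implementation; the function name `md5_matches` and the chunk size are hypothetical.

```python
import hashlib


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Return True if the MD5 hex digest of the file equals expected_md5.

    Hypothetical helper for illustration; the task itself may verify the
    checksum differently.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large download does not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.strip().lower()
```

A mismatch indicates either corruption in transit or that the remote file is not the one that was expected.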
get_read_groups
- description
- Gets read group information from a BAM file and writes it out as a string
- outputs
- {'read_groups': 'An array of strings containing read group information. If format_for_star = true, all found read groups are contained in one string (read_groups[0]). If format_for_star = false, each found @RG line will be its own entry in output array read_groups.'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required); description: Input BAM format file to get read groups from; stream: true
Defaults
format_for_star
(Boolean, default=true); description: Format read group information for the STAR aligner (true) or output @RG lines of the header without further processing (false)? STAR formatted results will be an array of length 1, where all found read groups are contained in one string (read_groups[0]). If no processing is selected, each found @RG line will be its own entry in output array read_groups.; common: true
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
Outputs
read_groups
(Array[String])
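To make the two output shapes concrete, here is a minimal Python sketch that works on SAM header text. It assumes the STAR form drops the leading `@RG` tag, separates fields with spaces, and joins read groups with " , " (the separator STAR expects for `--outSAMattrRGline`); the task's exact formatting may differ, and `read_groups_from_header` is a hypothetical name.

```python
def read_groups_from_header(header_text, format_for_star=True):
    """Collect @RG lines from SAM header text.

    Assumption (not confirmed by the task source): the STAR form removes the
    "@RG" tag, replaces tabs with spaces, and joins groups with " , ".
    """
    rg_lines = [
        line for line in header_text.splitlines() if line.startswith("@RG\t")
    ]
    if not format_for_star:
        # One array entry per @RG header line, unmodified.
        return rg_lines
    star_groups = [" ".join(line.split("\t")[1:]) for line in rg_lines]
    # A single string inside an array of length 1.
    return [" , ".join(star_groups)]
```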
split_string
- description
- Split a string into an array of strings based on a delimiter
- outputs
- {'split_strings': 'Split string as an array'}
Inputs
Required
_runtime
(Any, required)
string
(String, required): String to split on occurrences of delimiter
Defaults
delimiter
(String, default=" , "); description: Delimiter on which to split input_string; common: true
Outputs
split_strings
(Array[String])
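The behavior is comparable to a literal string split, sketched below in Python. Whether the task treats the delimiter as a literal or as a pattern is not stated here, so the sketch assumes a literal delimiter; the function name simply mirrors the task name.

```python
def split_string(string, delimiter=" , "):
    """Split on every literal occurrence of delimiter (assumed, not regex)."""
    return string.split(delimiter)
```

With the default delimiter, a STAR-style read group string such as `ID:rg1 SM:s1 , ID:rg2 SM:s1` splits into one entry per read group.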
calc_gene_lengths
- description
- Calculate gene lengths from a GTF feature file using the non-overlapping exonic length algorithm
- help
- The non-overlapping exonic length algorithm can be implemented as the count of bases covered by at least one exon, where each base is given a value of 1 regardless of how many exons overlap it.
- outputs
- {'gene_lengths': 'A two column headered TSV file with gene names in the first column and feature lengths (as integers) in the second column'}
Inputs
Required
_runtime
(Any, required)
gtf
(File, required): GTF feature file
Defaults
idattr
(String, default="gene_name"); description: GTF attribute to be used as feature ID. The value of this attribute will be used as the first column in the output file.; common: true
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name
(String, default=basename(gtf,".gtf.gz") + ".genelengths.txt"): Name of the gene lengths file
Outputs
gene_lengths
(File)
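The non-overlapping exonic length described in the help text amounts to merging each gene's exon intervals and summing the merged lengths. The Python sketch below shows one way to do that; it is not the task's code, the function name is hypothetical, and it assumes all exons of a gene lie on the same sequence and use GTF's 1-based inclusive coordinates.

```python
from collections import defaultdict


def non_overlapping_exonic_lengths(exons):
    """Map gene ID to the number of bases covered by at least one exon.

    exons: iterable of (gene_id, start, end), 1-based inclusive coordinates.
    Assumes every exon of a gene is on the same sequence.
    """
    by_gene = defaultdict(list)
    for gene_id, start, end in exons:
        by_gene[gene_id].append((start, end))

    lengths = {}
    for gene_id, intervals in by_gene.items():
        total = 0
        current_start = current_end = None
        for start, end in sorted(intervals):
            if current_end is None or start > current_end + 1:
                # Disjoint from the running interval: bank it, start a new one.
                if current_end is not None:
                    total += current_end - current_start + 1
                current_start, current_end = start, end
            else:
                # Overlapping or adjacent: extend the running interval.
                current_end = max(current_end, end)
        if current_end is not None:
            total += current_end - current_start + 1
        lengths[gene_id] = total
    return lengths
```

For example, exons 1-10 and 5-20 of the same gene contribute 20 bases, not 26, because the overlapping bases 5-10 are counted once.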
compression_integrity
- description
- Checks the compression integrity of a bgzipped file
- outputs
- {'check': 'Dummy output to indicate success and to enable call-caching'}
Inputs
Required
_runtime
(Any, required)
bgzipped_file
(File, required): Input bgzipped file to check integrity of
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
Outputs
check
(String)
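As a rough illustration of what an integrity check involves, the Python sketch below decompresses a file end to end and reports whether that succeeds. It relies on bgzipped (BGZF) files being a series of concatenated gzip members, so a standard gzip reader can stream them. It does not verify BGZF-specific details such as the trailing EOF block, and it is not the command the task runs; the function name is hypothetical.

```python
import gzip
import zlib


def gzip_stream_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole file decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Discard the data; only successful decompression matters.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Truncated stream, bad header, or CRC/length mismatch.
        return False
    return True
```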
add_to_bam_header
- description
- Adds another line of text to the bottom of a BAM header
- outputs
- {'reheadered_bam': 'The BAM after its header has been modified'}
Inputs
Required
_runtime
(Any, required)
additional_header
(String, required): A string to add as a new line in the BAM header. No format checking is done, so please ensure you do not invalidate your BAM with this task. Add only spec compliant entries to the header.
bam
(File, required): Input BAM format file which will have its header added to
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix
(String, default=basename(bam,".bam") + ".reheader"): Prefix for the reheadered BAM. The extension .bam will be added.
Outputs
reheadered_bam
(File)
unpack_tarball
- description
- Accepts a .tar.gz archive and converts it into a flat array of files. Any directory structure of the archive is ignored.
- outputs
- {'tarball_contents': 'An array of files found in the input tarball'}
Inputs
Required
_runtime
(Any, required)
tarball
(File, required): A .tar.gz archive to unpack into individual files
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
Outputs
tarball_contents
(Array[File])
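Flattening an archive can be sketched in Python with the standard `tarfile` module, as below. This is an illustration under assumptions rather than the task's implementation: `unpack_flat` is a hypothetical name, and the sketch lets a later file overwrite an earlier one when two members share a basename, which may not match how the task handles collisions.

```python
import os
import tarfile


def unpack_flat(tarball_path, out_dir):
    """Extract regular files from a .tar.gz into out_dir, ignoring directories.

    Returns the list of written paths. Members sharing a basename overwrite
    each other in this sketch.
    """
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    with tarfile.open(tarball_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                # Skip directories, links, and other special entries.
                continue
            target = os.path.join(out_dir, os.path.basename(member.name))
            source = archive.extractfile(member)
            with source, open(target, "wb") as dest:
                dest.write(source.read())
            extracted.append(target)
    return extracted
```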
make_coverage_regions_bed
- description
- Takes in a GTF file, converts it to BED, then filters it down to a 3 column BED file containing only the lines which match a given feature type
- outputs
- {'bed': '3 column BED file corresponding to all records in the input GTF with a feature type matching feature_type'}
Inputs
Required
_runtime
(Any, required)
feature_type
(String, required): choices below are the possible values from a GENCODE GTF file. If you are using a different GTF source, you may need to adjust this parameter.; Choices: ['gene', 'transcript', 'exon', 'CDS', 'UTR', 'start_codon', 'stop_codon', 'Selenocysteine']; description: Feature type to filter on. Only lines with this feature type will be included in the output BED file.
gtf
(File, required): Gzipped GTF feature file from which to derive a coverage regions BED file
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name
(String, default=basename(gtf,"gtf.gz") + feature_type + ".bed"): Name of the output BED file
Outputs
bed
(File)
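The filter-and-convert step can be sketched in Python as follows. The sketch works on already-decompressed GTF lines and applies the standard coordinate change from GTF (1-based, inclusive) to BED (0-based, half-open), which means subtracting 1 from the start. The tool the task actually uses for the conversion is not stated here, and `gtf_to_bed3` is a hypothetical name.

```python
def gtf_to_bed3(gtf_lines, feature_type):
    """Return 3-column BED lines for GTF records matching feature_type."""
    bed = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            # Skip comment/header lines and blank lines.
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        seqname, _source, feature, start, end = fields[:5]
        if feature != feature_type:
            continue
        # GTF start is 1-based inclusive; BED start is 0-based.
        bed.append(f"{seqname}\t{int(start) - 1}\t{end}")
    return bed
```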
global_phred_scores
- description
- Calculates statistics about PHRED scores of the input BAM
- outputs
- {'phred_scores': 'Headered TSV file containing PHRED score statistics'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to calculate PHRED score statistics for
Defaults
fast_mode
(Boolean, default=true): Enable fast mode (true) or calculate statistics for every base in the BAM (false)?
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix
(String, default=basename(bam,".bam")): Prefix for the output TSV file. The extension .global_PHRED_scores.tsv will be added.
Outputs
phred_scores
(File)
qc_summary
- description
- [OUT OF DATE] This WDL task pulls out key metrics that can provide a high level overview of the sample, without needing to examine the entire MultiQC report. Currently, these key metrics come from Qualimap and ngsderive.
- outputs
- {'summary': 'QC summary file in JSON format'}
Inputs
Required
_runtime
(Any, required)
multiqc_tar_gz
(File, required): MultiQC report tarball from which to extract key metrics
Defaults
outfile_name
(String, default=basename(multiqc_tar_gz,".multiqc.tar.gz") + ".qc_summary.json"): Name for the JSON file
Outputs
summary
(File)
split_fastq
- description
- Splits a FASTQ into multiple files based on the number of reads per file
- outputs
- {'fastqs': 'Array of FASTQ files, each containing a subset of the input FASTQ'}
Inputs
Required
_runtime
(Any, required)
fastq
(File, required); description: Gzipped FASTQ file to split; stream: true
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2): Number of cores to allocate for task
prefix
(String, default=sub(basename(fastq),"(fastq|fq)\.gz$","")): Prefix for the FASTQ file. The extension .fq.gz will be added.
reads_per_file
(Int, default=10000000): Number of reads to include in each output FASTQ file
Outputs
fastqs
(Array[File])
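Splitting by read count can be sketched in Python as grouping lines into chunks of `4 * reads_per_file`, since an unwrapped FASTQ record spans four lines. The sketch below assumes four-line records and operates on decompressed lines; it does not reproduce the task's output naming or compression, and `chunk_fastq_records` is a hypothetical name.

```python
from itertools import islice


def chunk_fastq_records(lines, reads_per_file=10_000_000):
    """Yield lists of FASTQ lines holding up to reads_per_file records each.

    Assumes four-line FASTQ records (no wrapped sequence or quality lines).
    """
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, 4 * reads_per_file))
        if not chunk:
            return
        yield chunk
```

Every chunk except possibly the last holds exactly `reads_per_file` reads; the last holds the remainder.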