Utilities

download

description: Uses wget to download a file from a remote URL to the local filesystem
outputs: {'downloaded_file': 'File downloaded from provided URL'}

Inputs

Required

_runtime (Any, required)
disk_size_gb (Int, required): Disk space to allocate for task, specified in GB
outfile_name (String, required): Name of the output file
url (String, required): URL of the file to download

Optional

md5sum (String?): Optional md5sum to check against downloaded file. Recommended to use in order to catch corruption or an unintentional file swap.

Outputs

downloaded_file (File)
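
The optional md5sum check can be illustrated outside of WDL. The following is a minimal Python sketch, not the task's actual implementation; the function names file_md5 and verify_download are hypothetical. It hashes the downloaded file in chunks and compares the digest against the expected value only when one was supplied.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_md5: Optional[str] = None) -> bool:
    """Return True if no checksum was supplied or if the file matches it.

    Hypothetical helper: mirrors the optional nature of the md5sum input.
    """
    if expected_md5 is None:
        return True
    return file_md5(path) == expected_md5.strip().lower()
```

A caller would typically treat a False result as a failed download and stop before any downstream step consumes the file.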

get_read_groups

description: Gets read group information from a BAM file and writes it out as a string
outputs: {'read_groups': 'An array of strings containing read group information. If format_for_star = true, all found read groups are contained in one string (read_groups[0]). If format_for_star = false, each found @RG line will be its own entry in output array read_groups.'}

Inputs

Required

_runtime (Any, required)
bam (File, required); description: Input BAM format file to get read groups from; stream: true

Defaults

format_for_star (Boolean, default=true); description: Format read group information for the STAR aligner (true) or output @RG lines of the header without further processing (false)? STAR formatted results will be an array of length 1, where all found read groups are contained in one string (read_groups[0]). If no processing is selected, each found @RG line will be its own entry in output array read_groups.; common: true
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

read_groups (Array[String])
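
The two output shapes controlled by format_for_star can be sketched in Python. This is an illustration under stated assumptions, not the task's code: the function read_groups_from_header is hypothetical, it operates on header lines already extracted as text, and the STAR formatting shown (drop the @RG tag, join fields with spaces, join read groups with " , ") is an assumption based on the space-surrounded comma that STAR expects between read groups. Field values containing spaces would need quoting, which this sketch does not attempt.

```python
from typing import Iterable, List


def read_groups_from_header(header_lines: Iterable[str], format_for_star: bool = True) -> List[str]:
    """Collect @RG lines from SAM header text.

    With format_for_star=False, each @RG line is returned unchanged as its own entry.
    With format_for_star=True, a single-element list is returned in which all read
    groups are joined by " , " (assumed formatting, see the lead-in).
    """
    rg_lines = [line.rstrip("\n") for line in header_lines if line.startswith("@RG\t")]
    if not format_for_star:
        return rg_lines
    # Drop the leading "@RG" tag and separate the remaining fields with spaces.
    formatted = [" ".join(line.split("\t")[1:]) for line in rg_lines]
    return [" , ".join(formatted)]
```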

split_string

description: Split a string into an array of strings based on a delimiter
outputs: {'split_strings': 'Split string as an array'}

Inputs

Required

_runtime (Any, required)
string (String, required): String to split on occurrences of delimiter

Defaults

delimiter (String, default=" , "); description: Delimiter on which to split the input string; common: true

Outputs

split_strings (Array[String])
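
Note that the default delimiter includes the spaces on both sides of the comma. A minimal Python sketch of the equivalent behavior follows; the Python function is only an illustration and is not how the task itself is implemented.

```python
from typing import List


def split_string(string: str, delimiter: str = " , ") -> List[str]:
    """Split a string on every occurrence of the delimiter.

    The default delimiter is a comma surrounded by single spaces,
    so a bare comma does not trigger a split.
    """
    return string.split(delimiter)
```

With the default delimiter, "ID:rg1 SM:s1 , ID:rg2 SM:s1" splits into two entries, while "a,b" is returned as a single entry.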

calc_gene_lengths

description: Calculate gene lengths from a GTF feature file using the non-overlapping exonic length algorithm
help: The non-overlapping exonic length algorithm can be implemented as the sum of each base covered by at least one exon, where each base is given a value of 1 regardless of how many exons overlap it.
outputs: {'gene_lengths': 'A two column headered TSV file with gene names in the first column and feature lengths (as integers) in the second column'}

Inputs

Required

_runtime (Any, required)
gtf (File, required): GTF feature file

Defaults

idattr (String, default="gene_name"); description: GTF attribute to be used as feature ID. The value of this attribute will be used as the first column in the output file.; common: true
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name (String, default=basename(gtf,".gtf.gz") + ".genelengths.txt"): Name of the gene lengths file

Outputs

gene_lengths (File)
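
The non-overlapping exonic length described in the help text can be sketched in Python by merging overlapping exon intervals and summing the merged lengths. This is a minimal sketch and not the task's implementation; the function names are hypothetical, the attribute parsing is a simple regular expression, and grouping exons per sequence name before summing is an assumption made so that intervals on different sequences are never merged. GTF coordinates are 1-based and inclusive, so an interval's length is end - start + 1.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merged_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Count bases covered by at least one 1-based inclusive interval."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # No overlap with the running interval: bank it and start a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlap: extend the running interval so shared bases count once.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def gene_lengths(gtf_lines: Iterable[str], idattr: str = "gene_name") -> Dict[str, int]:
    """Sum non-overlapping exonic bases per feature ID taken from `idattr`."""
    attr_pattern = re.compile(r'(?:^|\s|;)' + re.escape(idattr) + r' "([^"]*)"')
    exons: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = attr_pattern.search(fields[8])
        if match is None:
            continue
        exons[(match.group(1), fields[0])].append((int(fields[3]), int(fields[4])))
    lengths: Dict[str, int] = defaultdict(int)
    for (feature_id, _seqname), intervals in exons.items():
        lengths[feature_id] += merged_length(intervals)
    return dict(lengths)
```

For example, exons at 1-10 and 5-15 of the same gene contribute 15 bases rather than 21, because bases 5 through 10 are counted once.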

compression_integrity

description: Checks the compression integrity of a bgzipped file
outputs: {'check': 'Dummy output to indicate success and to enable call-caching'}

Inputs

Required

_runtime (Any, required)
bgzipped_file (File, required): Input bgzipped file to check integrity of

Defaults

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

check (String)

add_to_bam_header

description: Adds another line of text to the bottom of a BAM header
outputs: {'reheadered_bam': 'The BAM after its header has been modified'}

Inputs

Required

_runtime (Any, required)
additional_header (String, required): A string to add as a new line in the BAM header. No format checking is done, so please ensure you do not invalidate your BAM with this task. Add only spec compliant entries to the header.
bam (File, required): Input BAM format file whose header will be extended

Defaults

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".reheader"): Prefix for the reheadered BAM. The extension .bam will be added.

Outputs

reheadered_bam (File)

unpack_tarball

description: Accepts a .tar.gz archive and converts it into a flat array of files. Any directory structure of the archive is ignored.
outputs: {'tarball_contents': 'An array of files found in the input tarball'}

Inputs

Required

_runtime (Any, required)
tarball (File, required): A .tar.gz archive to unpack into individual files

Defaults

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

tarball_contents (Array[File])
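
Flattening an archive means that only the final path component of each member is kept. The following Python sketch shows the idea using the standard tarfile module; it is not the task's implementation, the function name unpack_flat is hypothetical, and raising an error when two members share a file name is an assumption, since the source does not say how collisions are handled.

```python
import shutil
import tarfile
from pathlib import Path
from typing import List


def unpack_flat(tarball: Path, dest: Path) -> List[Path]:
    """Extract every regular file in a .tar.gz into `dest`, ignoring directories."""
    dest.mkdir(parents=True, exist_ok=True)
    extracted: List[Path] = []
    with tarfile.open(tarball, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            # Keep only the base name so the directory structure is discarded.
            target = dest / Path(member.name).name
            if target.exists():
                # Assumption: treat name collisions as an error rather than overwrite.
                raise FileExistsError(f"duplicate file name in archive: {target.name}")
            source = archive.extractfile(member)
            with source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            extracted.append(target)
    return sorted(extracted)
```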

make_coverage_regions_bed

description: Converts a GTF file to BED, then filters it down to a 3 column BED file containing only the lines which match a given feature type
outputs: {'bed': '3 column BED file corresponding to all records in the input GTF with a feature type matching feature_type'}

Inputs

Required

_runtime (Any, required)
feature_type (String, required): The choices listed are the possible values from a GENCODE GTF file. If you are using a different GTF source, you may need to adjust this parameter.; Choices: ['gene', 'transcript', 'exon', 'CDS', 'UTR', 'start_codon', 'stop_codon', 'Selenocysteine']; description: Feature type to filter on. Only lines with this feature type will be included in the output BED file.
gtf (File, required): Gzipped GTF feature file from which to derive a coverage regions BED file

Defaults

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name (String, default=basename(gtf,"gtf.gz") + feature_type + ".bed"): Name of the output BED file

Outputs

bed (File)
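
The filtering and coordinate conversion can be sketched in Python. This is an illustration only; the function gtf_to_bed3 is hypothetical and the task may use a dedicated conversion tool instead. The one detail worth showing is the coordinate shift: GTF start positions are 1-based and inclusive, while BED start positions are 0-based with an exclusive end, so the start is decremented by one and the end is left unchanged.

```python
from typing import Iterable, Iterator


def gtf_to_bed3(gtf_lines: Iterable[str], feature_type: str) -> Iterator[str]:
    """Yield 3 column BED lines for GTF records whose feature column matches."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[2] != feature_type:
            continue
        start = int(fields[3]) - 1  # 1-based inclusive -> 0-based
        end = int(fields[4])        # inclusive end equals the exclusive BED end
        yield f"{fields[0]}\t{start}\t{end}"
```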

global_phred_scores

description: Calculates statistics about PHRED scores of the input BAM
outputs: {'phred_scores': 'Headered TSV file containing PHRED score statistics'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to calculate PHRED score statistics for

Defaults

fast_mode (Boolean, default=true): Enable fast mode (true) or calculate statistics for every base in the BAM (false)?
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam")): Prefix for the output TSV file. The extension .global_PHRED_scores.tsv will be added.

Outputs

phred_scores (File)

qc_summary

description: [OUT OF DATE] This WDL task pulls out key metrics that can provide a high-level overview of the sample, without needing to examine the entire MultiQC report. Currently, these key metrics come from Qualimap and ngsderive.
outputs: {'summary': 'QC summary file in JSON format'}

Inputs

Required

_runtime (Any, required)
multiqc_tar_gz (File, required): MultiQC report tarball from which to extract key metrics

Defaults

outfile_name (String, default=basename(multiqc_tar_gz,".multiqc.tar.gz") + ".qc_summary.json"): Name for the JSON file

Outputs

summary (File)

split_fastq

description: Splits a FASTQ into multiple files based on the number of reads per file
outputs: {'fastqs': 'Array of FASTQ files, each containing a subset of the input FASTQ'}

Inputs

Required

_runtime (Any, required)
fastq (File, required); description: Gzipped FASTQ file to split; stream: true

Defaults

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu (Int, default=2): Number of cores to allocate for task
prefix (String, default=sub(basename(fastq),"(fastq|fq)\.gz$","")): Prefix for the FASTQ files. The extension .fastq.gz (preceded by a split index) will be added.
reads_per_file (Int, default=10000000): Number of reads to include in each output FASTQ file

Outputs

fastqs (Array[File])
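
Splitting by read count amounts to starting a new output file every 4 * reads_per_file lines. The Python sketch below illustrates this under stated assumptions and is not the task's implementation: the function is a hypothetical stand-in, every record is assumed to span exactly four lines (no wrapped sequences), and the zero-padded six digit split index is an assumed naming scheme, since the source only states that a split index precedes the .fastq.gz extension.

```python
import gzip
from pathlib import Path
from typing import List


def split_fastq(fastq: Path, prefix: str, reads_per_file: int = 10_000_000) -> List[Path]:
    """Split a gzipped FASTQ into gzipped chunks of at most `reads_per_file` reads."""
    if reads_per_file < 1:
        raise ValueError("reads_per_file must be a positive integer")
    lines_per_file = 4 * reads_per_file  # assumes 4 lines per FASTQ record
    outputs: List[Path] = []
    out = None
    try:
        with gzip.open(fastq, "rt") as reads:
            for line_number, line in enumerate(reads):
                if line_number % lines_per_file == 0:
                    # Chunk boundary: close the previous file and open the next.
                    if out is not None:
                        out.close()
                    # Assumed naming: prefix + zero-padded index + ".fastq.gz"
                    path = Path(f"{prefix}{len(outputs):06d}.fastq.gz")
                    outputs.append(path)
                    out = gzip.open(path, "wt")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return outputs
```

The input is streamed line by line, so memory use stays constant regardless of the size of the FASTQ or the value of reads_per_file.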