Utilities

download

description
Uses wget to download a file from a remote URL to the local filesystem
outputs
{'downloaded_file': 'File downloaded from provided URL'}

Inputs

Required

  • _runtime (Any, required)
  • disk_size_gb (Int, required): Disk space to allocate for task, specified in GB
  • outfile_name (String, required): Name of the output file
  • url (String, required): URL of the file to download

Optional

  • md5sum (String?): Optional md5sum to check against downloaded file. Recommended to use in order to catch corruption or an unintentional file swap.

Outputs

  • downloaded_file (File)
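
The optional md5sum check can be illustrated outside of WDL. The following is a minimal Python sketch, not the task's actual command; the function names md5_of_file and verify_download are hypothetical and chosen only for illustration. It assumes the expected checksum is supplied as a hexadecimal string.

```python
import hashlib
from typing import Optional


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: Optional[str]) -> bool:
    """Return True if no checksum was supplied or if the digests match."""
    if expected_md5 is None:
        # Mirrors the optional md5sum input: nothing to compare against
        return True
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch here corresponds to the corruption or unintentional file swap that the md5sum input is meant to catch.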

get_read_groups

description
Gets read group information from a BAM file and writes it out as a string
outputs
{'read_groups': 'An array of strings containing read group information. If format_for_star = true, all found read groups are contained in one string (read_groups[0]). If format_for_star = false, each found @RG line will be its own entry in output array read_groups.'}

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required); description: Input BAM format file to get read groups from; stream: true

Defaults

  • format_for_star (Boolean, default=true); description: Format read group information for the STAR aligner (true) or output @RG lines of the header without further processing (false)? STAR formatted results will be an array of length 1, where all found read groups are contained in one string (read_groups[0]). If no processing is selected, each found @RG line will be its own entry in output array read_groups.; common: true
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

  • read_groups (Array[String])
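
The two output shapes controlled by format_for_star can be sketched in Python. This is an illustration under stated assumptions, not the task's implementation: the function name extract_read_groups is hypothetical, the header is assumed to be available as a list of text lines, and the STAR formatting is assumed to mean dropping the @RG tag, replacing tabs with spaces, and joining read groups with " , ".

```python
from typing import List


def extract_read_groups(header_lines: List[str], format_for_star: bool = True) -> List[str]:
    """Collect @RG header lines, optionally collapsed into one STAR-style string."""
    rg_lines = [line.rstrip("\n") for line in header_lines if line.startswith("@RG")]
    if not format_for_star:
        # One array entry per @RG line, without further processing
        return rg_lines
    # Assumption: drop the "@RG" tag and separate the remaining fields with spaces
    entries = [" ".join(line.split("\t")[1:]) for line in rg_lines]
    # Assumption: read groups are joined with " , " into a single string
    return [" , ".join(entries)]
```

With format_for_star = true the result always has length 1, matching the description of read_groups[0] above.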

split_string

description
Split a string into an array of strings based on a delimiter
outputs
{'split_strings': 'Split string as an array'}

Inputs

Required

  • _runtime (Any, required)
  • string (String, required): String to split on occurrences of delimiter

Defaults

  • delimiter (String, default=" , "); description: Delimiter on which to split the input string; common: true

Outputs

  • split_strings (Array[String])
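
The default delimiter is a comma surrounded by single spaces, so a bare comma does not trigger a split. A minimal Python sketch of the equivalent behavior follows; the function below is a stand-in for illustration and is not the task's command.

```python
from typing import List


def split_string(string: str, delimiter: str = " , ") -> List[str]:
    """Split on exact occurrences of the delimiter, spaces included."""
    return string.split(delimiter)


# A string with two read groups separated by " , " yields two entries
parts = split_string("ID:rg1 SM:s1 , ID:rg2 SM:s1")
```

If the input uses a different separator, such as a bare comma, the delimiter input needs to be set explicitly.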

calc_gene_lengths

description
Calculate gene lengths from a GTF feature file using the non-overlapping exonic length algorithm
help
The non-overlapping exonic length algorithm can be implemented as the sum of all bases covered by at least one exon, where each base is given a value of 1 regardless of how many exons overlap it.
outputs
{'gene_lengths': 'A two column headered TSV file with gene names in the first column and feature lengths (as integers) in the second column'}

Inputs

Required

  • _runtime (Any, required)
  • gtf (File, required): GTF feature file

Defaults

  • idattr (String, default="gene_name"); description: GTF attribute to be used as feature ID. The value of this attribute will be used as the first column in the output file.; common: true
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • outfile_name (String, default=basename(gtf,".gtf.gz") + ".genelengths.txt"): Name of the gene lengths file

Outputs

  • gene_lengths (File)
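
The non-overlapping exonic length calculation described in the help text can be sketched in Python. This is an illustration, not the task's code: the function names are hypothetical, the GTF is assumed to be available as uncompressed text lines, only features of type exon are counted, and each feature ID is assumed to lie on a single contig.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def non_overlapping_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Count bases covered by at least one 1-based, inclusive interval."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # No overlap with the running block: close it and open a new one
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlap: extend the running block so shared bases count once
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def gene_lengths_from_gtf(lines: Iterable[str], idattr: str = "gene_name") -> Dict[str, int]:
    """Map each feature ID to the non-overlapping length of its exons."""
    pattern = re.compile(r'(?:^|;\s*)' + re.escape(idattr) + r' "([^"]*)"')
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = pattern.search(fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return {gene: non_overlapping_length(ivs) for gene, ivs in exons.items()}
```

For example, exons spanning 1-10, 5-15, and 20-25 cover bases 1-15 and 20-25, giving a length of 21 rather than the 31 obtained by summing the exon lengths.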

compression_integrity

description
Checks the compression integrity of a bgzipped file
outputs
{'check': 'Dummy output to indicate success and to enable call-caching'}

Inputs

Required

  • _runtime (Any, required)
  • bgzipped_file (File, required): Input bgzipped file to check integrity of

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

  • check (String)

add_to_bam_header

description
Adds another line of text to the bottom of a BAM header
outputs
{'reheadered_bam': 'The BAM after its header has been modified'}

Inputs

Required

  • _runtime (Any, required)
  • additional_header (String, required): A string to add as a new line in the BAM header. No format checking is done, so please ensure you do not invalidate your BAM with this task. Add only spec compliant entries to the header.
  • bam (File, required): Input BAM format file whose header will be extended

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • prefix (String, default=basename(bam,".bam") + ".reheader"): Prefix for the reheadered BAM. The extension .bam will be added.

Outputs

  • reheadered_bam (File)

unpack_tarball

description
Accepts a .tar.gz archive and converts it into a flat array of files. Any directory structure of the archive is ignored.
outputs
{'tarball_contents': 'An array of files found in the input tarball'}

Inputs

Required

  • _runtime (Any, required)
  • tarball (File, required): A .tar.gz archive to unpack into individual files

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

  • tarball_contents (Array[File])

make_coverage_regions_bed

description
Takes in a GTF file, converts it to BED, then filters it down to a 3 column BED file containing only the lines which match a given feature type
outputs
{'bed': '3 column BED file corresponding to all records in the input GTF with a feature type matching feature_type'}

Inputs

Required

  • _runtime (Any, required)
  • feature_type (String, required): The choices below are the possible values from a GENCODE GTF file. If you are using a different GTF source, you may need to adjust this parameter.; Choices: ['gene', 'transcript', 'exon', 'CDS', 'UTR', 'start_codon', 'stop_codon', 'Selenocysteine']; description: Feature type to filter on. Only lines with this feature type will be included in the output BED file.
  • gtf (File, required): Gzipped GTF feature file from which to derive a coverage regions BED file

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • outfile_name (String, default=basename(gtf,"gtf.gz") + feature_type + ".bed"): Name of the output BED file

Outputs

  • bed (File)
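
The GTF to 3 column BED conversion can be sketched in Python. This is a simplified illustration and not the task's command: the function name gtf_to_bed3 is hypothetical, the GTF is assumed to be available as uncompressed text lines, and any sorting or merging the task may perform is not reproduced. The one detail worth showing is the coordinate shift: GTF coordinates are 1-based and inclusive, while BED coordinates are 0-based and half-open.

```python
from typing import Iterable, List


def gtf_to_bed3(lines: Iterable[str], feature_type: str) -> List[str]:
    """Return 3 column BED records for GTF lines matching feature_type."""
    records = []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 of a GTF record holds the feature type
        if len(fields) < 5 or fields[2] != feature_type:
            continue
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        # 1-based inclusive start becomes 0-based; the end is unchanged
        records.append(f"{chrom}\t{start - 1}\t{end}")
    return records
```

A GTF exon at chr1:11-20 therefore becomes the BED record chr1, 10, 20, which still spans 10 bases.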

global_phred_scores

description
Calculates statistics about PHRED scores of the input BAM
outputs
{'phred_scores': 'Headered TSV file containing PHRED score statistics'}

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to calculate PHRED score statistics for

Defaults

  • fast_mode (Boolean, default=true): Enable fast mode (true) or calculate statistics for every base in the BAM (false)?
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • prefix (String, default=basename(bam,".bam")): Prefix for the output TSV file. The extension .global_PHRED_scores.tsv will be added.

Outputs

  • phred_scores (File)

qc_summary

description
[OUT OF DATE] This WDL task pulls out key metrics that can provide a high-level overview of the sample, without needing to examine the entire MultiQC report. Currently, these key metrics come from Qualimap and ngsderive.
outputs
{'summary': 'QC summary file in JSON format'}

Inputs

Required

  • _runtime (Any, required)
  • multiqc_tar_gz (File, required): MultiQC report tarball from which to extract key metrics

Defaults

  • outfile_name (String, default=basename(multiqc_tar_gz,".multiqc.tar.gz") + ".qc_summary.json"): Name for the JSON file

Outputs

  • summary (File)

split_fastq

description
Splits a FASTQ into multiple files based on the number of reads per file
outputs
{'fastqs': 'Array of FASTQ files, each containing a subset of the input FASTQ'}

Inputs

Required

  • _runtime (Any, required)
  • fastq (File, required); description: Gzipped FASTQ file to split; stream: true

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • ncpu (Int, default=2): Number of cores to allocate for task
  • prefix (String, default=sub(basename(fastq),"(fastq|fq)\.gz$","")): Prefix for the FASTQ files. The extension .fastq.gz (preceded by a split index) will be added.
  • reads_per_file (Int, default=10000000): Number of reads to include in each output FASTQ file

Outputs

  • fastqs (Array[File])
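
Splitting by reads_per_file amounts to cutting the input every 4 × reads_per_file lines. The Python sketch below illustrates that arithmetic only; it is not the task's command, the function name split_fastq_records is hypothetical, and it assumes uncompressed input in which every read occupies exactly four lines (no wrapped sequence or quality strings).

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_fastq_records(lines: Iterable[str], reads_per_file: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most reads_per_file reads."""
    if reads_per_file < 1:
        raise ValueError("reads_per_file must be a positive integer")
    iterator = iter(lines)
    # Assumption: one read is exactly four lines (header, sequence, '+', qualities)
    lines_per_chunk = 4 * reads_per_file
    while True:
        chunk = list(islice(iterator, lines_per_chunk))
        if not chunk:
            return
        yield chunk
```

Each yielded chunk corresponds to one output file; only the last chunk can contain fewer than reads_per_file reads.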