count

description
Performs read counting for a set of features in the input BAM file
outputs
  • feature_counts: A two-column TSV file. The first column is feature names and the second column is counts. Presence of a header is determined by the include_custom_header parameter.

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to generate feature counts for
  • gtf (File, required): Input genomic features in gzipped GTF format to count reads for
  • strandedness (String, required); description: Strandedness protocol of the RNA-Seq experiment; external_help: https://htseq.readthedocs.io/en/latest/htseqcount.html#cmdoption-htseq-count-s; choices: ['yes', 'reverse', 'no']

Defaults

  • feature_type (String, default="exon"); description: Feature type (3rd column in GTF file) to be used, all features of other type are ignored; common: true
  • idattr (String, default="gene_name"); description: GFF attribute to be used as feature ID; common: true
  • include_custom_header (Boolean, default=true); description: Include a custom header for the output file? This is not an official feature of HTSeq. If true, the first line of the output file will be ~{idattr} ~{prefix}. This may break downstream tools that expect the typical headerless HTSeq output format.; common: true
  • minaqual (Int, default=10); description: Skip all reads with alignment quality lower than the given minimum value; common: true
  • mode (String, default="union"); description: Mode to handle reads overlapping more than one feature. union is recommended for most use-cases.; external_help: https://htseq.readthedocs.io/en/latest/htseqcount.html#htseq-count-counting-reads-within-features; choices: ['union', 'intersection-strict', 'intersection-nonempty']
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
  • nonunique (Boolean, default=false); description: Score reads that align to or are assigned to more than one feature?; common: true
  • pos_sorted (Boolean, default=false); description: Is the BAM position sorted (true) or name sorted (false)? It is highly recommended to use a name sorted BAM file. This is because HTSeq will re-sort position-sorted BAMs with an inefficient algorithm, causing very large memory and disk space allocations (especially for large BAMs).; common: true
  • prefix (String, default=basename(bam,".bam")): Prefix for the feature counts file. The extension .feature-counts.txt will be added.
  • secondary_alignments (Boolean, default=false); description: Score secondary alignments (SAM flag 0x100)?; common: true
  • supplementary_alignments (Boolean, default=false); description: Score supplementary/chimeric alignments (SAM flag 0x800)?; common: true

Outputs

  • feature_counts (File)
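
The inputs above correspond closely to htseq-count command-line options. The following is a minimal sketch, not the task's actual command, of how they might be translated into an argument list. The function names build_htseq_count_args and feature_counts_name are hypothetical, and the GTF path is passed through unchanged (whether the real task decompresses it first is not stated here).

```python
import os

STRANDEDNESS_CHOICES = ("yes", "reverse", "no")
MODE_CHOICES = ("union", "intersection-strict", "intersection-nonempty")


def feature_counts_name(bam, prefix=None):
    """Derive the output file name from the documented prefix default."""
    if prefix is None:
        # Mirrors basename(bam, ".bam"): drop the directory and the suffix.
        prefix = os.path.basename(bam)
        if prefix.endswith(".bam"):
            prefix = prefix[: -len(".bam")]
    return prefix + ".feature-counts.txt"


def build_htseq_count_args(
    bam,
    gtf,
    strandedness,
    feature_type="exon",
    idattr="gene_name",
    minaqual=10,
    mode="union",
    nonunique=False,
    pos_sorted=False,
    secondary_alignments=False,
    supplementary_alignments=False,
):
    """Build an htseq-count argument list (hypothetical helper, assumed mapping)."""
    if strandedness not in STRANDEDNESS_CHOICES:
        raise ValueError(f"strandedness must be one of {STRANDEDNESS_CHOICES}")
    if mode not in MODE_CHOICES:
        raise ValueError(f"mode must be one of {MODE_CHOICES}")
    return [
        "htseq-count",
        "-f", "bam",
        "-r", "pos" if pos_sorted else "name",
        "-s", strandedness,
        "-a", str(minaqual),
        "-t", feature_type,
        "-i", idattr,
        "-m", mode,
        "--nonunique", "all" if nonunique else "none",
        "--secondary-alignments", "score" if secondary_alignments else "ignore",
        "--supplementary-alignments", "score" if supplementary_alignments else "ignore",
        bam,
        gtf,
    ]


args = build_htseq_count_args("sample.bam", "genes.gtf.gz", "reverse")
print(" ".join(args), ">", feature_counts_name("sample.bam"))
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems if the command is later run with subprocess.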

calc_tpm

description
Given a gene counts file and a gene lengths file, calculate Transcripts Per Million (TPM)
outputs
  • tpm_file: Transcripts Per Million (TPM) file. A two-column headered TSV file.

Inputs

Required

  • _runtime (Any, required)
  • counts (File, required): A two column TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with the count task.
  • gene_lengths (File, required): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with the calc_gene_lengths task in util.wdl.

Defaults

  • has_header (Boolean, default=true): Does the counts file have a header line? If true, the first line will be ignored.
  • prefix (String, default=basename(counts,".feature-counts.txt")): Prefix for the TPM file. The extension .TPM.txt will be added.

Outputs

  • tpm_file (File)
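
For reference, the standard TPM calculation divides each gene's count by its length, then rescales the resulting rates so they sum to one million. The sketch below illustrates that formula on the two input files described above. It is an assumption about the general approach, not the task's actual implementation, and the helper names read_two_column_tsv and calc_tpm are hypothetical.

```python
import csv
import io


def read_two_column_tsv(handle, has_header=True):
    """Read a two-column TSV of name and integer value into a dict."""
    rows = csv.reader(handle, delimiter="\t")
    if has_header:
        next(rows, None)  # skip the header line
    return {row[0]: int(row[1]) for row in rows if row}


def calc_tpm(counts, lengths):
    """Return TPM per gene; entries starting with '__' are discarded."""
    # Reads per base for each gene. A gene missing from `lengths`
    # raises KeyError, since the two files are expected to match.
    rates = {
        gene: count / lengths[gene]
        for gene, count in counts.items()
        if not gene.startswith("__")
    }
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Small in-memory example with made-up gene names and values.
counts_text = "gene_name\tsample\nGENE_A\t10\nGENE_B\t20\n__no_feature\t5\n"
lengths_text = "feature\tlength\nGENE_A\t1000\nGENE_B\t1000\n"

counts = read_two_column_tsv(io.StringIO(counts_text), has_header=True)
lengths = read_two_column_tsv(io.StringIO(lengths_text), has_header=True)
tpm = calc_tpm(counts, lengths)

for gene, value in tpm.items():
    print(f"{gene}\t{value:.2f}")
```

Because both example genes have the same length, their TPM values keep the 1:2 ratio of their counts, and the special '__no_feature' row does not contribute to the total.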