HTSeq

count

description: Counts reads for a set of features in the input BAM file
outputs: feature_counts (File): A two-column TSV file. The first column contains feature names and the second contains counts. The include_custom_header parameter determines whether a header line is present.

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to generate feature counts for
gtf (File, required): Input genomic features in gzipped GTF format to count reads for
strandedness (String, required); description: Strandedness protocol of the RNA-Seq experiment; external_help: https://htseq.readthedocs.io/en/latest/htseqcount.html#cmdoption-htseq-count-s; choices: ['yes', 'reverse', 'no']

Defaults

feature_type (String, default="exon"); description: Feature type (3rd column in GTF file) to be used; features of any other type are ignored; common: true
idattr (String, default="gene_name"); description: GFF attribute to be used as feature ID; common: true
include_custom_header (Boolean, default=true); description: Include a custom header for the output file? This is not an official feature of HTSeq. If true, the first line of the output file will be ~{idattr} ~{prefix}. This may break downstream tools that expect the typical headerless HTSeq output format.; common: true
minaqual (Int, default=10); description: Skip all reads with alignment quality lower than the given minimum value; common: true
mode (String, default="union"); description: Mode to handle reads overlapping more than one feature. union is recommended for most use-cases.; external_help: https://htseq.readthedocs.io/en/latest/htseqcount.html#htseq-count-counting-reads-within-features; choices: ['union', 'intersection-strict', 'intersection-nonempty']
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
nonunique (Boolean, default=false); description: Score reads that align to or are assigned to more than one feature?; common: true
pos_sorted (Boolean, default=false); description: Is the BAM position sorted (true) or name sorted (false)? A name-sorted BAM file is highly recommended, because HTSeq re-sorts position-sorted BAMs with an inefficient algorithm that causes very large memory and disk space allocations (especially for large BAMs).; common: true
prefix (String, default=basename(bam,".bam")): Prefix for the feature counts file. The extension .feature-counts.txt will be added.
secondary_alignments (Boolean, default=false); description: Score secondary alignments (SAM flag 0x100)?; common: true
supplementary_alignments (Boolean, default=false); description: Score supplementary/chimeric alignments (SAM flag 0x800)?; common: true

Outputs

feature_counts (File)
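
The parameters above correspond closely to htseq-count command-line options. As a rough illustration only, the following Python sketch shows one plausible way to assemble an htseq-count invocation from them. The function name build_htseq_count_args and the Boolean-to-flag mappings are assumptions made for this example, not the task's actual implementation, and the sketch does not cover how the task handles the gzipped GTF or the custom header.

```python
def build_htseq_count_args(
    bam,
    gtf,
    strandedness,
    feature_type="exon",
    idattr="gene_name",
    minaqual=10,
    mode="union",
    nonunique=False,
    pos_sorted=False,
    secondary_alignments=False,
    supplementary_alignments=False,
):
    """Hypothetical helper: map the documented inputs onto htseq-count options."""
    if strandedness not in ("yes", "reverse", "no"):
        raise ValueError(f"unsupported strandedness: {strandedness!r}")
    if mode not in ("union", "intersection-strict", "intersection-nonempty"):
        raise ValueError(f"unsupported mode: {mode!r}")

    return [
        "htseq-count",
        "-f", "bam",                               # input is BAM
        "-r", "pos" if pos_sorted else "name",     # sort order of the BAM
        "-s", strandedness,
        "-a", str(minaqual),
        "-t", feature_type,
        "-i", idattr,
        "-m", mode,
        # Assumed mapping of the Boolean inputs onto htseq-count's keyword values.
        "--nonunique", "all" if nonunique else "none",
        "--secondary-alignments", "score" if secondary_alignments else "ignore",
        "--supplementary-alignments", "score" if supplementary_alignments else "ignore",
        bam,
        gtf,
    ]
```

For example, build_htseq_count_args("sample.bam", "genes.gtf", "reverse") yields an argument list that passes name order and reverse strandedness with every other option at the default documented above.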

calc_tpm

description: Given a gene counts file and a gene lengths file, calculate Transcripts Per Million (TPM)
outputs: tpm_file (File): Transcripts Per Million (TPM) file. A two-column headered TSV file.

Inputs

Required

_runtime (Any, required)
counts (File, required): A two column TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with the count task.
gene_lengths (File, required): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with the calc_gene_lengths task in util.wdl.

Defaults

has_header (Boolean, default=true): Does the counts file have a header line? If true, the first line will be ignored.
prefix (String, default=basename(counts,".feature-counts.txt")): Prefix for the TPM file. The extension .TPM.txt will be added.

Outputs

tpm_file (File)
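
To make the calculation concrete, here is a minimal Python sketch of the standard TPM definition applied to the two inputs described above: each gene's count is divided by its length, and the resulting rates are scaled to sum to one million. The function names read_int_table and calc_tpm are hypothetical, and the task's actual implementation may differ in details such as output formatting or how it handles genes missing from the lengths file.

```python
import csv


def read_int_table(handle, has_header):
    """Read a two-column TSV into {name: int}, skipping the first line if has_header."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)
    return {row[0]: int(row[1]) for row in reader if len(row) >= 2}


def calc_tpm(counts, lengths):
    """Return {gene: TPM} using the standard definition.

    Entries whose name starts with '__' (HTSeq summary rows) are discarded.
    A gene absent from `lengths` raises KeyError in this sketch.
    """
    rates = {
        gene: count / lengths[gene]
        for gene, count in counts.items()
        if not gene.startswith("__")
    }
    total = sum(rates.values())
    if total == 0:
        # No reads assigned to any gene: report zero everywhere.
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}
```

Because every rate is divided by the same total, the unit used for gene length (bases or kilobases) cancels out, and the resulting TPM values sum to one million whenever at least one gene has a nonzero count.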