ngsderive

strandedness

description: Derives the experimental strandedness protocol used to generate the input RNA-Seq BAM file. Reports evidence supporting final results.
outputs: {'strandedness_file': 'TSV file containing the ngsderive strandedness report', 'strandedness_string': 'The derived strandedness, in string format'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to derive strandedness for
bam_index (File, required): BAM index file corresponding to the input BAM
gene_model (File, required): Gene model as a GFF/GTF file

Defaults

min_mapq (Int, default=30); description: Minimum MAPQ to consider for supporting reads; common: true
min_reads_per_gene (Int, default=10); description: Filter any genes that don't have at least min_reads_per_gene reads mapping to them; common: true
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
num_genes (Int, default=1000); description: How many genes to sample; common: true
outfile_name (String, default=basename(bam,".bam") + ".strandedness.tsv"): Name for the strandedness TSV file
split_by_rg (Boolean, default=false); description: Output one entry per read group in the TSV, in addition to an overall entry; common: true
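A minimal sketch of how min_mapq, min_reads_per_gene, and num_genes could interact when choosing which genes contribute strandedness evidence. It assumes reads have already been assigned to genes and are represented only by their MAPQ values. The function name filter_genes is hypothetical, and taking the first num_genes genes is a deterministic stand-in for the tool's sampling, not ngsderive's actual code.

```python
from typing import Dict, List


def filter_genes(
    mapqs_by_gene: Dict[str, List[int]],
    min_mapq: int = 30,
    min_reads_per_gene: int = 10,
    num_genes: int = 1000,
) -> Dict[str, int]:
    """Return {gene: supporting read count} for genes that pass both filters."""
    # Assumption: take the first num_genes genes as a deterministic stand-in
    # for the random sampling the real tool performs.
    sampled = list(mapqs_by_gene)[:num_genes]
    kept = {}
    for gene in sampled:
        # Only reads at or above min_mapq count as supporting evidence.
        supporting = sum(1 for q in mapqs_by_gene[gene] if q >= min_mapq)
        if supporting >= min_reads_per_gene:
            kept[gene] = supporting
    return kept


genes = {"g1": [60] * 12, "g2": [60] * 5 + [10] * 10, "g3": [30] * 10}
print(filter_genes(genes))  # {'g1': 12, 'g3': 10}
```

Here g2 is dropped because only 5 of its reads reach MAPQ 30, even though it has 15 reads in total.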

Outputs

strandedness_file (File)
strandedness_string (String)
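The default outfile_name relies on WDL's basename(path, suffix), which drops the directory part of a path and then removes a literal trailing suffix if present. The Python sketch below emulates that behavior to show which name the strandedness_file output gets by default; wdl_basename and default_strandedness_name are illustrative helpers, not part of the task. The same pattern yields the default names for the instrument, read length, and endedness reports.

```python
import os


def wdl_basename(path: str, suffix: str = "") -> str:
    """Emulate WDL basename(): strip directories, then a literal trailing suffix."""
    name = os.path.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def default_strandedness_name(bam: str) -> str:
    # Mirrors: basename(bam, ".bam") + ".strandedness.tsv"
    return wdl_basename(bam, ".bam") + ".strandedness.tsv"


print(default_strandedness_name("/data/sample1.bam"))  # sample1.strandedness.tsv
```

If the input does not end in ".bam", the suffix is left in place, so a file named "b.cram" would produce "b.cram.strandedness.tsv".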

instrument

description: Derives the instrument used to sequence the input BAM file. Reports evidence supporting final results.
outputs: {'instrument_file': 'TSV file containing the ngsderive instrument report for the input BAM file', 'instrument_string': 'The derived instrument, in string format'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to derive instrument for

Defaults

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
num_reads (Int, default=10000); description: How many reads to analyze from the start of the file. Set to any n < 1 to parse the whole file.; common: true
outfile_name (String, default=basename(bam,".bam") + ".instrument.tsv"): Name for the instrument TSV file
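The num_reads convention (analyze the first n reads, or the whole file when n < 1) recurs in several tasks in this document. A minimal Python sketch of that rule, assuming reads arrive as any iterable in file order; take_reads is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def take_reads(reads: Iterable[T], num_reads: int) -> Iterator[T]:
    """Yield the first num_reads items, or every item when num_reads < 1."""
    if num_reads < 1:
        return iter(reads)
    return islice(reads, num_reads)


print(list(take_reads(range(5), 3)))   # [0, 1, 2]
print(list(take_reads(range(5), -1)))  # [0, 1, 2, 3, 4]
```

Because islice stops early, a small positive num_reads avoids reading the rest of a large file.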

Outputs

instrument_file (File)
instrument_string (String)

read_length

description: Derives the original experimental read length of the input BAM. Reports evidence supporting final results.
outputs: {'read_length_file': 'TSV file containing the ngsderive readlen report for the input BAM file'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to derive read length for
bam_index (File, required): BAM index file corresponding to the input BAM

Defaults

majority_vote_cutoff (Float, default=0.7); description: To call a majority read length, the maximum read length must be supported by at least a majority_vote_cutoff fraction of reads (e.g. 0.7 = 70%); common: true
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
num_reads (Int, default=-1); description: How many reads to analyze from the start of the file. Set to any n < 1 to parse the whole file.; common: true
outfile_name (String, default=basename(bam,".bam") + ".readlength.tsv"): Name for the read length TSV file
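A sketch of the majority-vote rule described for majority_vote_cutoff, assuming the input is a list of observed read lengths. call_read_length is a hypothetical helper, and returning None when the cutoff is not met is an assumption about how an undetermined result might be represented, not the format of the read_length_file report.

```python
from collections import Counter
from typing import Iterable, Optional


def call_read_length(lengths: Iterable[int], majority_vote_cutoff: float = 0.7) -> Optional[int]:
    """Return the maximum read length if enough reads support it, else None."""
    counts = Counter(lengths)
    if not counts:
        return None
    max_len = max(counts)
    # Fraction of all reads whose length equals the maximum length.
    support = counts[max_len] / sum(counts.values())
    return max_len if support >= majority_vote_cutoff else None


print(call_read_length([100] * 8 + [75] * 2))  # 100 (80% support)
print(call_read_length([100] * 6 + [50] * 4))  # None (60% < 70%)
```

Only the maximum length is tested, so trimmed reads shorter than the original length lower its support rather than competing as alternative calls.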

Outputs

read_length_file (File)

encoding

description: Derives the encoding of the input NGS file(s). Reports evidence supporting final results.
outputs: {'inferred_encoding': 'The most permissive encoding found among the input files, in string format', 'encoding_file': 'TSV file containing the ngsderive encoding report for all input files'}

Inputs

Required

_runtime (Any, required)
ngs_files (Array[File], required): An array of FASTQs and/or BAMs for which to derive encoding
outfile_name (String, required): Name for the encoding TSV file

Defaults

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
num_reads (Int, default=1000000); description: How many reads to analyze from the start of the file(s). Set to any n < 1 to parse the whole file(s).; common: true
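One way to read "most permissive encoding" is sketched below: infer each file's encoding from the lowest quality character it contains, then report the encoding that accepts the widest character range across all files. The label strings and the lower bounds used (Phred+33 starts at '!', Solexa+64 at ';', Phred+64 at '@') are assumptions for illustration and may not match ngsderive's output strings or internal logic.

```python
from typing import Iterable

# Hypothetical labels mapped to the lowest quality character each encoding can emit.
LOWEST_CHAR = {
    "Illumina 1.3": ord("@"),         # Phred+64, Q0 = '@'
    "Solexa/Illumina 1.0": ord(";"),  # Solexa+64, Q-5 = ';'
    "Sanger/Illumina 1.8": ord("!"),  # Phred+33, Q0 = '!'
}


def infer_file_encoding(quality_strings: Iterable[str]) -> str:
    """Pick the narrowest encoding whose range covers every observed character."""
    lowest = min(ord(c) for q in quality_strings for c in q)
    # Check the narrowest range (highest lower bound) first.
    for name, bound in sorted(LOWEST_CHAR.items(), key=lambda kv: -kv[1]):
        if lowest >= bound:
            return name
    return "Unknown"


def most_permissive(encodings: Iterable[str]) -> str:
    """The most permissive encoding is the one with the lowest lower bound."""
    known = [e for e in encodings if e in LOWEST_CHAR]
    return min(known, key=LOWEST_CHAR.get) if known else "Unknown"


per_file = [infer_file_encoding(["IIII", "@@AB"]), infer_file_encoding(["IIII", "#AB"])]
print(per_file)                   # ['Illumina 1.3', 'Sanger/Illumina 1.8']
print(most_permissive(per_file))  # Sanger/Illumina 1.8
```

The sketch inspects only the lowest character seen, which is why a larger num_reads gives a more reliable call for files with few low-quality bases.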

Outputs

inferred_encoding (String)
encoding_file (File)

junction_annotation

description: Annotates junctions found in an RNA-Seq BAM as known, novel, or partially novel.
external_help: https://stjudecloud.github.io/ngsderive/subcommands/junction_annotation/
outputs: {'junction_summary': 'TSV file containing the ngsderive junction-annotation summary', 'junctions': 'TSV file containing a detailed list of annotated junctions'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to annotate junctions for
bam_index (File, required): BAM index file corresponding to the input BAM
gene_model (File, required): Gene model as a GFF/GTF file

Defaults

fuzzy_junction_match_range (Int, default=0); description: Treat a found splice site as matching an annotated splice event if it lies within ±k bases of it, where k is this value; common: true
min_intron (Int, default=50); description: Minimum size of intron to be considered a splice; common: true
min_mapq (Int, default=30); description: Minimum MAPQ to consider for supporting reads; common: true
min_reads (Int, default=2); description: Filter any junctions that don't have at least min_reads reads supporting them; common: true
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam")): Prefix for the summary TSV and junction files. The extensions .junction_summary.tsv and .junctions.tsv will be added.
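A sketch of the known / partially novel / novel classification, including the effect of fuzzy_junction_match_range, min_intron, and min_reads. It assumes junctions are (start, end) coordinate pairs, that a junction counts as known when both of its splice sites match annotated sites within ±k bases, and that intron length is end - start. The function names and label strings are hypothetical, not ngsderive's implementation.

```python
from typing import Dict, Iterable, Tuple


def _near(pos: int, sites: Iterable[int], k: int) -> bool:
    # True if pos lies within +-k bases of any annotated site.
    return any(abs(pos - s) <= k for s in sites)


def annotate_junction(start: int, end: int, known_starts: Iterable[int],
                      known_ends: Iterable[int], fuzzy_junction_match_range: int = 0) -> str:
    k = fuzzy_junction_match_range
    start_known = _near(start, known_starts, k)
    end_known = _near(end, known_ends, k)
    if start_known and end_known:
        return "known"
    if start_known or end_known:
        return "partially_novel"
    return "novel"


def filter_junctions(counts: Dict[Tuple[int, int], int],
                     min_intron: int = 50, min_reads: int = 2) -> Dict[Tuple[int, int], int]:
    # Drop junctions whose intron is too short or that have too few supporting reads.
    return {(s, e): n for (s, e), n in counts.items()
            if e - s >= min_intron and n >= min_reads}


starts, ends = {1000, 3000}, {2000, 4000}
print(annotate_junction(1002, 1999, starts, ends))     # novel
print(annotate_junction(1002, 1999, starts, ends, 2))  # known
print(annotate_junction(1000, 2500, starts, ends))     # partially_novel
```

Raising fuzzy_junction_match_range turns near-misses into matches, which reclassifies some novel or partially novel junctions as known.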

Outputs

junction_summary (File)
junctions (File)

endedness

description: Derives the endedness of the input BAM file. Reports evidence supporting final results.
outputs: {'endedness_file': 'TSV file containing the ngsderive endedness report'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to derive endedness from

Defaults

calc_rpt (Boolean, default=false); description: Calculate and output Reads-Per-Template. This will produce a more sophisticated estimate for endedness, but uses substantially more memory (can reach up to 200% of BAM size in memory consumption for some inputs).; common: true
lenient (Boolean, default=false); description: Return a zero exit code on unknown results; common: true
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the value of calc_rpt and the size of the input. Specified in GB.
num_reads (Int, default=-1); description: How many reads to analyze from the start of the file. Set to any n < 1 to parse the whole file.; common: true
outfile_name (String, default=basename(bam,".bam") + ".endedness.tsv"): Name for the endedness TSV file
paired_deviance (Float, default=0.0); description: Maximum allowed distance from a 0.5 split between the number of f+l- reads and f-l+ reads for the file to be called 'Paired-End'. The default of 0.0 is only appropriate if the whole file is being processed.; common: true
round_rpt (Boolean, default=false); description: Round RPT to the nearest integer before comparing it to expected values. Appropriate if num_reads > 0.; common: true
split_by_rg (Boolean, default=false); description: Output one entry per read group; common: true
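The f+l- / f-l+ notation in paired_deviance refers to the SAM "first segment" (0x40) and "last segment" (0x80) flag bits. The sketch below shows a simplified endedness call and a Reads-Per-Template (RPT) calculation under those definitions. The decision rules, function names, and return labels are assumptions for illustration, not ngsderive's exact algorithm.

```python
from typing import Dict, List, Union


def classify_flag(flag: int) -> str:
    first = bool(flag & 0x40)  # SAM: first segment in the template
    last = bool(flag & 0x80)   # SAM: last segment in the template
    return ("f+" if first else "f-") + ("l+" if last else "l-")


def call_endedness(counts: Dict[str, int], paired_deviance: float = 0.0) -> str:
    fl, lf = counts.get("f+l-", 0), counts.get("f-l+", 0)
    neither, both = counts.get("f-l-", 0), counts.get("f+l+", 0)
    # Only unsegmented reads: single-end.
    if neither and not (fl or lf or both):
        return "Single-End"
    # Only first/last segments: paired-end if the split is close enough to 0.5.
    if (fl or lf) and not (neither or both):
        if abs(fl / (fl + lf) - 0.5) <= paired_deviance:
            return "Paired-End"
    return "Unknown"


def reads_per_template(read_names: List[str], round_rpt: bool = False) -> Union[int, float]:
    # RPT = total reads / distinct templates (read names).
    rpt = len(read_names) / len(set(read_names))
    return round(rpt) if round_rpt else rpt


print(call_endedness({"f+l-": 100, "f-l+": 90}))        # Unknown
print(call_endedness({"f+l-": 100, "f-l+": 90}, 0.05))  # Paired-End
```

With 100 versus 90 reads the split is about 0.526, which fails the default deviance of 0.0; this is why a nonzero paired_deviance is needed when num_reads truncates the input.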

Outputs

endedness_file (File)
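To illustrate how modify_memory_gb shifts the dynamic memory request, here is a sketch with an entirely hypothetical base formula: a fixed base_gb plus roughly twice the BAM size when calc_rpt is true, echoing the "up to 200% of BAM size" note in the calc_rpt description. The real task's formula is not documented here, so treat base_gb and the factor of 2 as placeholders.

```python
import math


def endedness_memory_gb(bam_size_gb: float, calc_rpt: bool = False,
                        modify_memory_gb: int = 0, base_gb: int = 4) -> int:
    # Hypothetical: base_gb and the 2x factor are assumptions, not the task's values.
    rpt_gb = math.ceil(bam_size_gb * 2) if calc_rpt else 0
    # modify_memory_gb is added last, so negative values shrink the request.
    return base_gb + rpt_gb + modify_memory_gb


print(endedness_memory_gb(10))                       # 4
print(endedness_memory_gb(10, calc_rpt=True))        # 24
print(endedness_memory_gb(10, True, -4))             # 20
```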