
strandedness

description
Derives the experimental strandedness protocol used to generate the input RNA-Seq BAM file. Reports evidence supporting final results.
outputs
  • strandedness_file: TSV file containing the ngsderive strandedness report
  • strandedness_string: The derived strandedness, in string format

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to derive strandedness for
  • bam_index (File, required): BAM index file corresponding to the input BAM
  • gene_model (File, required): Gene model as a GFF/GTF file

Defaults

  • min_mapq (Int, default=30); description: Minimum MAPQ to consider for supporting reads; common: true
  • min_reads_per_gene (Int, default=10); description: Filter out any genes that have fewer than min_reads_per_gene reads mapping to them; common: true
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • num_genes (Int, default=1000); description: How many genes to sample; common: true
  • outfile_name (String, default=basename(bam,".bam") + ".strandedness.tsv"): Name for the strandedness TSV file
  • split_by_rg (Boolean, default=false); description: Include one entry per read group in the output TSV, in addition to an overall entry; common: true

Outputs

  • strandedness_file (File)
  • strandedness_string (String)
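To make the role of min_mapq and min_reads_per_gene concrete, here is a minimal sketch of how read-versus-gene strand evidence could be tallied into a strandedness call. It assumes a common convention (read 1 on the gene's strand, or read 2 on the opposite strand, counts as "forward" evidence), and the cutoffs (0.8 for stranded, 0.4 to 0.6 for unstranded), the result strings, and the function names are illustrative assumptions, not ngsderive's documented internals.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoffs chosen for illustration; ngsderive's own thresholds may differ.
STRANDED_CUTOFF = 0.8
UNSTRANDED_LOW, UNSTRANDED_HIGH = 0.4, 0.6

# One read: (is_read1, read_strand, mapq), with strands given as "+" or "-".
Read = Tuple[bool, str, int]


def is_forward_evidence(is_read1: bool, read_strand: str, gene_strand: str) -> bool:
    """Read 1 on the gene's strand, or read 2 on the opposite strand, counts as forward."""
    return (read_strand == gene_strand) == is_read1


def call_strandedness(
    genes: Dict[str, Tuple[str, List[Read]]],
    min_mapq: int = 30,
    min_reads_per_gene: int = 10,
) -> str:
    """genes maps a gene ID to (gene_strand, reads overlapping that gene)."""
    forward = reverse = 0
    for gene_strand, reads in genes.values():
        # Mirror min_mapq: drop poorly mapped reads before counting support.
        usable = [r for r in reads if r[2] >= min_mapq]
        # Mirror min_reads_per_gene: skip genes with too little support.
        if len(usable) < min_reads_per_gene:
            continue
        for is_read1, read_strand, _ in usable:
            if is_forward_evidence(is_read1, read_strand, gene_strand):
                forward += 1
            else:
                reverse += 1
    total = forward + reverse
    if total == 0:
        return "Inconclusive"
    if forward / total >= STRANDED_CUTOFF:
        return "Stranded-Forward"
    if reverse / total >= STRANDED_CUTOFF:
        return "Stranded-Reverse"
    if UNSTRANDED_LOW <= forward / total <= UNSTRANDED_HIGH:
        return "Unstranded"
    return "Inconclusive"
```

For example, ten high-MAPQ read-1 alignments on the same strand as a "+" gene yield "Stranded-Forward", while the same gene with only nine usable reads is skipped entirely and the call falls back to "Inconclusive".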

instrument

description
Derives the instrument used to sequence the input BAM file. Reports evidence supporting final results.
outputs
  • instrument_file: TSV file containing the ngsderive instrument report for the input BAM file
  • instrument_string: The derived instrument, in string format

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to derive instrument for

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • num_reads (Int, default=10000); description: How many reads to analyze from the start of the file. Set to any value < 1 to parse the whole file.; common: true
  • outfile_name (String, default=basename(bam,".bam") + ".instrument.tsv"): Name for the instrument TSV file

Outputs

  • instrument_file (File)
  • instrument_string (String)
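The kind of evidence an instrument call can rest on, and the num_reads convention (any value < 1 scans the whole file), can be sketched as below. This assumes Illumina-style read names of the form instrument:run:flowcell:lane:tile:x:y and simply collects the instrument and flowcell IDs seen; the function name collect_ids is hypothetical, and mapping these IDs to a specific machine model, which is what the task actually reports, is deliberately left out.

```python
from itertools import islice
from typing import Iterable, Set, Tuple


def collect_ids(read_names: Iterable[str], num_reads: int = 10000) -> Tuple[Set[str], Set[str]]:
    """Collect instrument and flowcell IDs from Illumina-style read names."""
    # Any num_reads < 1 means "scan everything"; otherwise only the first num_reads names.
    names = read_names if num_reads < 1 else islice(read_names, num_reads)
    instruments: Set[str] = set()
    flowcells: Set[str] = set()
    for name in names:
        fields = name.split(":")
        if len(fields) != 7:
            continue  # not in the assumed 7-field format; contributes no evidence
        instruments.add(fields[0])
        flowcells.add(fields[2])
    return instruments, flowcells
```

Reading only the head of the file keeps this cheap, which matches the modest default of 10000 reads.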

read_length

description
Derives the original experimental read length of the input BAM. Reports evidence supporting final results.
outputs
  • read_length_file: TSV file containing the ngsderive readlen report for the input BAM file

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to derive read length for
  • bam_index (File, required): BAM index file corresponding to the input BAM

Defaults

  • majority_vote_cutoff (Float, default=0.7); description: To call a majority read length, the maximum observed read length must be supported by at least this fraction of reads (the default 0.7 means 70%); common: true
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • num_reads (Int, default=-1); description: How many reads to analyze from the start of the file. Set to any value < 1 to parse the whole file.; common: true
  • outfile_name (String, default=basename(bam,".bam") + ".readlength.tsv"): Name for the readlen TSV file

Outputs

  • read_length_file (File)
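The majority_vote_cutoff rule can be read as: take the longest read length observed, and accept it only if at least that fraction of analyzed reads has exactly that length. A minimal sketch of that reading follows; the function name is hypothetical, and returning None for "no majority" is an assumption about how an undetermined result might be represented.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_read_length(
    lengths: Iterable[int], majority_vote_cutoff: float = 0.7, num_reads: int = -1
) -> Optional[int]:
    """Return the maximum read length if enough reads support it, else None."""
    observed = list(lengths)
    if num_reads >= 1:
        observed = observed[:num_reads]  # only the first num_reads reads
    if not observed:
        return None
    counts = Counter(observed)
    max_len = max(counts)
    support = counts[max_len] / len(observed)
    return max_len if support >= majority_vote_cutoff else None
```

Trimmed reads pull support away from the maximum length, so heavily trimmed data may fail the vote at the default cutoff even when the original length is obvious to a human.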

encoding

description
Derives the encoding of the input NGS file(s). Reports evidence supporting final results.
outputs
  • inferred_encoding: The most permissive encoding found among the input files, in string format
  • encoding_file: TSV file containing the ngsderive encoding report for all input files

Inputs

Required

  • _runtime (Any, required)
  • ngs_files (Array[File], required): An array of FASTQs and/or BAMs for which to derive encoding
  • outfile_name (String, required): Name for the encoding TSV file

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • num_reads (Int, default=1000000); description: How many reads to analyze from the start of each file. Set to any value < 1 to parse the whole file(s).; common: true

Outputs

  • inferred_encoding (String)
  • encoding_file (File)
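One way to picture "the most permissive encoding found among the input files" is sketched below, assuming the encoding in question is the FASTQ quality-score offset. Each file is classified by the lowest quality character it contains, using the standard lower bounds of ASCII 33 (Phred+33, Sanger/Illumina 1.8), 59 (Solexa+64, Illumina 1.0) and 64 (Phred+64, Illumina 1.3). Ranking "permissive" by that lower bound, the label strings, and the function names are assumptions made for illustration.

```python
from typing import Iterable, List

# Assumed lowest ASCII code each scheme admits; a lower bound means a more permissive scheme.
LOWEST_ASCII = {"Sanger/Illumina 1.8": 33, "Illumina 1.0": 59, "Illumina 1.3": 64}


def infer_file_encoding(quality_strings: Iterable[str]) -> str:
    """Guess one file's encoding from the lowest quality character it contains."""
    lowest = min((ord(c) for q in quality_strings for c in q), default=None)
    if lowest is None:
        return "Unknown"
    if lowest < LOWEST_ASCII["Illumina 1.0"]:
        return "Sanger/Illumina 1.8"  # Phred+33
    if lowest < LOWEST_ASCII["Illumina 1.3"]:
        return "Illumina 1.0"  # Solexa+64, scores can reach -5 (ASCII 59)
    return "Illumina 1.3"  # Phred+64


def most_permissive(encodings: List[str]) -> str:
    """Pick the encoding admitting the widest ASCII range among recognized results."""
    known = [e for e in encodings if e in LOWEST_ASCII]
    return min(known, key=LOWEST_ASCII.__getitem__) if known else "Unknown"
```

Choosing the most permissive result means one Phred+33 file is enough to make the combined call "Sanger/Illumina 1.8", since that scheme can represent every character the others can.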

junction_annotation

description
Annotates junctions found in an RNA-Seq BAM as known, novel, or partially novel.
external_help
https://stjudecloud.github.io/ngsderive/subcommands/junction_annotation/
outputs
  • junction_summary: TSV file containing the ngsderive junction-annotation summary
  • junctions: TSV file containing a detailed list of annotated junctions

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to annotate junctions for
  • bam_index (File, required): BAM index file corresponding to the input BAM
  • gene_model (File, required): Gene model as a GFF/GTF file

Defaults

  • fuzzy_junction_match_range (Int, default=0); description: Treat a splice found within ±k bases of a known splice event as annotated, where k is this value; common: true
  • min_intron (Int, default=50); description: Minimum size of intron to be considered a splice; common: true
  • min_mapq (Int, default=30); description: Minimum MAPQ to consider for supporting reads; common: true
  • min_reads (Int, default=2); description: Filter out any junctions supported by fewer than min_reads reads; common: true
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • prefix (String, default=basename(bam,".bam")): Prefix for the summary TSV and junction files. The extensions .junction_summary.tsv and .junctions.tsv will be added.

Outputs

  • junction_summary (File)
  • junctions (File)
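The interplay of min_reads, min_intron and fuzzy_junction_match_range can be illustrated with a simplified classifier. This sketch assumes junctions are (intron start, intron end) pairs whose read counts already exclude low-MAPQ reads, treats a junction as known only if both ends match the same annotated intron within ±k bases, and as partially novel if just one end matches any annotated splice site. The category labels and function names are assumptions, only loosely modeled on the report's terminology.

```python
from typing import Dict, Optional, Set, Tuple

Junction = Tuple[int, int]  # (intron start, intron end)


def _near(pos: int, sites: Set[int], k: int) -> bool:
    """True if pos lies within +/- k bases of any site."""
    return any(abs(pos - s) <= k for s in sites)


def annotate_junction(
    junction: Junction, known: Set[Junction], k: int = 0, min_intron: int = 50
) -> Optional[str]:
    start, end = junction
    if end - start < min_intron:
        return None  # too short to be treated as a splice
    if any(abs(start - ks) <= k and abs(end - ke) <= k for ks, ke in known):
        return "known"
    if _near(start, {ks for ks, _ in known}, k) or _near(end, {ke for _, ke in known}, k):
        return "partial_novel"
    return "complete_novel"


def annotate_all(
    junction_counts: Dict[Junction, int],
    known: Set[Junction],
    k: int = 0,
    min_intron: int = 50,
    min_reads: int = 2,
) -> Dict[Junction, str]:
    """Drop junctions with fewer than min_reads supporting reads, then annotate the rest."""
    result: Dict[Junction, str] = {}
    for junction, count in junction_counts.items():
        if count < min_reads:
            continue
        label = annotate_junction(junction, known, k, min_intron)
        if label is not None:
            result[junction] = label
    return result
```

With the default k of 0, a junction that is off by a single base at one end is reported as partially novel; raising fuzzy_junction_match_range to 1 turns it into a known junction.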

endedness

description
Derives the endedness of the input BAM file. Reports evidence supporting the final result.
outputs
  • endedness_file: TSV file containing the ngsderive endedness report

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to derive endedness from

Defaults

  • calc_rpt (Boolean, default=false); description: Calculate and output Reads-Per-Template. This will produce a more sophisticated estimate for endedness, but uses substantially more memory (can reach up to 200% of BAM size in memory consumption for some inputs).; common: true
  • lenient (Boolean, default=false); description: Return a zero exit code on unknown results; common: true
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by value of calc_rpt and the size of the input. Specified in GB.
  • num_reads (Int, default=-1); description: How many reads to analyze from the start of the file. Set to any value < 1 to parse the whole file.; common: true
  • outfile_name (String, default=basename(bam,".bam") + ".endedness.tsv"): Name for the endedness TSV file
  • paired_deviance (Float, default=0.0); description: Maximum allowed distance from an even 0.5 split between the number of f+l- reads and f-l+ reads for the file to be called 'Paired-End'. The default of 0.0 is only appropriate if the whole file is being processed.; common: true
  • round_rpt (Boolean, default=false); description: Round RPT to the nearest integer before comparing it to expected values. Appropriate if num_reads > 0.; common: true
  • split_by_rg (Boolean, default=false); description: Include one entry per read group in the output TSV; common: true

Outputs

  • endedness_file (File)
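The f+l- / f-l+ notation used by paired_deviance refers to the SAM "first segment" (0x40) and "last segment" (0x80) flag bits. The sketch below shows how flags map onto those categories and one plausible way a paired_deviance check could gate a 'Paired-End' call. The decision rules for 'Single-End' and 'Unknown' and the function names are assumptions, and the RPT-based refinement enabled by calc_rpt is not modeled.

```python
from typing import Dict


def flag_category(flag: int) -> str:
    """Map a SAM FLAG to f+/f- (0x40, first segment) and l+/l- (0x80, last segment)."""
    first = "f+" if flag & 0x40 else "f-"
    last = "l+" if flag & 0x80 else "l-"
    return first + last


def call_endedness(counts: Dict[str, int], paired_deviance: float = 0.0) -> str:
    """counts maps 'f+l-', 'f-l+', 'f+l+' and 'f-l-' to read totals."""
    first = counts.get("f+l-", 0)
    last = counts.get("f-l+", 0)
    both = counts.get("f+l+", 0)
    neither = counts.get("f-l-", 0)
    if first == last == both == 0 and neither > 0:
        return "Single-End"  # no segment flags set on any read
    if both == 0 and neither == 0 and first + last > 0:
        split = first / (first + last)
        # Paired-End only if first/last reads are balanced within the allowed deviance.
        if abs(split - 0.5) <= paired_deviance:
            return "Paired-End"
    return "Unknown"
```

This also shows why the 0.0 default only suits whole-file runs: a truncated scan can easily cut a pair in half and leave a 51/49 split, which fails at 0.0 but passes with a small deviance such as 0.02.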