download_taxonomy

description
Downloads the NCBI taxonomy, which Kraken2 uses to create a tree and taxon map during the database build.
outputs
  • taxonomy: The NCBI taxonomy, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.

Inputs

Required

  • _runtime (Any, required)

Defaults

  • protein (Boolean, default=false): Construct a protein database?

Outputs

  • taxonomy (File)

download_library

description
Downloads a predefined library of reference genomes from NCBI. Detailed organism list for libraries (except nt) available here
warning
This task is particularly prone to failure from network issues, especially when downloading the larger libraries (bacteria, nr, and nt).
outputs
  • library: A library of reference genomes, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.

Inputs

Required

  • _runtime (Any, required)
  • library_name (String, required); description: Library to download. Note that protein must equal true if downloading the nr library, and protein must equal false if downloading the UniVec or UniVec_Core library.; choices: ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'nr', 'UniVec', 'UniVec_Core']

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation, specified in GB. Default disk size is determined dynamically based on library_name. Note that the default sizes are adequate as of April 2023, but new genomes are constantly being added to the NCBI database, so later runs may require more disk space.
  • protein (Boolean, default=false): Construct a protein database?

Outputs

  • library (File)
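
The constraints between library_name and protein described above can be checked before submitting a run. The following is a minimal sketch in Python; check_library_request is a hypothetical helper, not part of the task, and it is not known whether the task performs an equivalent check itself.

```python
# Library names copied from the `choices` list of the library_name input.
LIBRARY_CHOICES = {
    "archaea", "bacteria", "plasmid", "viral", "human", "fungi",
    "plant", "protozoa", "nt", "nr", "UniVec", "UniVec_Core",
}


def check_library_request(library_name: str, protein: bool) -> None:
    """Raise ValueError if the inputs break a rule stated in the input docs.

    Hypothetical pre-flight helper; the task may or may not validate this way.
    """
    if library_name not in LIBRARY_CHOICES:
        raise ValueError(f"unknown library_name: {library_name!r}")
    # nr is a protein library, so it needs protein=true.
    if library_name == "nr" and not protein:
        raise ValueError("the nr library requires protein=true")
    # UniVec and UniVec_Core are only valid for nucleotide databases.
    if library_name in {"UniVec", "UniVec_Core"} and protein:
        raise ValueError(f"{library_name} requires protein=false")
```

For example, check_library_request("nr", True) returns silently, while check_library_request("nr", False) raises ValueError.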

create_library_from_fastas

description
Adds custom entries from FASTA files to a Kraken2 DB
outputs
  • custom_library: Kraken2 compatible library, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.

Inputs

Required

  • _runtime (Any, required)
  • fastas_gz (Array[File], required): Array of gzipped FASTA files. Each FASTA sequence ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • protein (Boolean, default=false): Construct a protein database?

Outputs

  • custom_library (File)
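
The fastas_gz requirement on sequence IDs can be screened ahead of time. The sketch below, in Python, lists the sequence IDs that carry no explicit kraken:taxid assignment (of the form kraken:taxid|<number>). headers_missing_taxid is a hypothetical helper. It does not try to recognize NCBI accession numbers, so an ID it reports may still be valid if it contains an accession.

```python
import re
from typing import List

# Matches an explicit taxonomy assignment such as "kraken:taxid|32630".
TAXID_TAG = re.compile(r"kraken:taxid\|\d+")


def headers_missing_taxid(fasta_text: str) -> List[str]:
    """Return the sequence IDs whose header lacks a kraken:taxid tag.

    Accession-only IDs are reported too; they need a manual look.
    """
    missing = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token.
            parts = line[1:].split(maxsplit=1)
            seq_id = parts[0] if parts else ""
            if not TAXID_TAG.search(seq_id):
                missing.append(seq_id)
    return missing
```

Gzipped inputs would need to be decompressed first, for instance with gzip.open(path, "rt").read().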

build_db

description
Builds a custom Kraken2 database
outputs
  • built_db: A complete Kraken2 database

Inputs

Required

  • _runtime (Any, required)
  • tarballs (Array[File], required): Tarballs containing the NCBI taxonomy (generated by the download_taxonomy task) and at least one library (generated by the download_library or create_library_from_fastas task). Tarballs must not have a root directory.

Defaults

  • db_name (String, default="kraken2_db"); description: Name for output in compressed, archived format. The suffix .tar.gz will be added.; common: true
  • kmer_len (Int, default=if protein then 15 else 35): K-mer length (in bp for nucleotide databases, aa for protein databases) that will be used to build the database
  • max_db_size_gb (Int, default=-1): Maximum size in GB of the Kraken2 hash table; if the Kraken2 estimator determines that more would normally be needed, the reference library will be downsampled to fit.
  • minimizer_len (Int, default=if protein then 12 else 31): Minimizer length (in bp for nucleotide databases, aa for protein databases) that will be used to build the database
  • minimizer_spaces (Int, default=if protein then 0 else 7): Number of characters in minimizer that are ignored in comparisons
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
  • ncpu (Int, default=4); description: Number of cores to allocate for task; common: true
  • protein (Boolean, default=false): Construct a protein database?
  • use_all_cores (Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true

Outputs

  • built_db (File)
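
Three of the defaults above depend on the protein flag. As a quick reference, the conditional defaults can be written out as a small Python sketch; default_build_params is a hypothetical name used only for illustration, and the values are copied from the input list above.

```python
def default_build_params(protein: bool) -> dict:
    """Mirror the protein-dependent defaults listed for build_db."""
    return {
        "kmer_len": 15 if protein else 35,
        "minimizer_len": 12 if protein else 31,
        "minimizer_spaces": 0 if protein else 7,
    }
```

Calling default_build_params(False) gives the nucleotide defaults (35, 31, 7), and default_build_params(True) gives the protein defaults (15, 12, 0).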

kraken

description
Runs Kraken2 on a pair of FASTQ files
outputs
  • report: A Kraken2 summary report. External help: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format
  • sequences: Detailed Kraken2 output that has been gzipped. External help: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format

Inputs

Required

  • _runtime (Any, required)
  • db (File, required): Kraken2 database. Can be generated with make-qc-reference.wdl. Must be a tarball without a root directory.
  • read_one_fastq_gz (File, required): Gzipped FASTQ file with 1st reads in pair
  • read_two_fastq_gz (File, required): Gzipped FASTQ file with 2nd reads in pair

Defaults

  • min_base_quality (Int, default=0): Minimum base quality used in classification
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
  • ncpu (Int, default=4); description: Number of cores to allocate for task; common: true
  • prefix (String, default=sub(basename(read_one_fastq_gz),"([_\.][rR][12])?(\.subsampled)?\.(fastq|fq)(\.gz)?$","")): Prefix for the Kraken2 output files. The extensions .kraken2.txt and .kraken2.sequences.txt.gz will be added.
  • store_sequences (Boolean, default=false); description: Store and output main Kraken2 output in addition to the summary report?; common: true
  • use_all_cores (Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
  • use_names (Boolean, default=true): Print scientific names instead of just taxids?

Outputs

  • report (File)
  • sequences (File?)
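
The default prefix is derived from read_one_fastq_gz by stripping a read-number tag, an optional .subsampled marker, and the FASTQ extension. The Python sketch below reproduces that substitution with the same regular expression to show which output names result. default_prefix and output_names are hypothetical helpers. The pairing of each extension with an output, and the assumption that sequences is only produced when store_sequences is true, are inferred from the descriptions above rather than confirmed by them.

```python
import os
import re

# Same pattern as the WDL default for `prefix`.
SUFFIX_PATTERN = re.compile(
    r"([_\.][rR][12])?(\.subsampled)?\.(fastq|fq)(\.gz)?$"
)


def default_prefix(read_one_fastq_gz: str) -> str:
    """Strip the directory, read tag, and FASTQ extension from a path."""
    return SUFFIX_PATTERN.sub("", os.path.basename(read_one_fastq_gz))


def output_names(prefix: str, store_sequences: bool = False) -> dict:
    """Assumed mapping of the documented extensions to the two outputs."""
    names = {"report": f"{prefix}.kraken2.txt"}
    if store_sequences:
        names["sequences"] = f"{prefix}.kraken2.sequences.txt.gz"
    return names
```

With this sketch, default_prefix("/data/sample_R1.fastq.gz") yields "sample", so the report would be named sample.kraken2.txt. A file name that does not end in .fastq or .fq (optionally followed by .gz) is left unchanged by the substitution.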