Kraken2

download_taxonomy

description: Downloads the NCBI taxonomy, which Kraken2 uses to create a tree and taxon map during the database build.
outputs: {'taxonomy': 'The NCBI taxonomy, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}

download_library

description: Downloads a predefined library of reference genomes from NCBI. Detailed organism list for libraries (except nt) available here
warning: This task is particularly prone to failure due to network issues, especially when downloading the larger libraries (bacteria, nr, and nt).
outputs: {'library': 'A library of reference genomes, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}

_runtime (Any, required)
library_name (String, required); description: Library to download. Note that protein must equal true if downloading the nr library, and protein must equal false if downloading the UniVec or UniVec_Core library.; choices: ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'nr', 'UniVec', 'UniVec_Core']
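
The constraint between library_name and protein described above can be checked before submitting a job. The following is a minimal sketch only; check_library_inputs is a hypothetical helper and is not part of the task itself. The list of choices is copied from the library_name description.

```python
# Hypothetical pre-flight check; the WDL task does not ship this function.
LIBRARY_CHOICES = [
    "archaea", "bacteria", "plasmid", "viral", "human", "fungi",
    "plant", "protozoa", "nt", "nr", "UniVec", "UniVec_Core",
]


def check_library_inputs(library_name: str, protein: bool) -> None:
    """Raise ValueError if the inputs violate the documented rules."""
    if library_name not in LIBRARY_CHOICES:
        raise ValueError(f"unknown library_name: {library_name!r}")
    # nr is a protein library, so protein must be true.
    if library_name == "nr" and not protein:
        raise ValueError("protein must be true when downloading nr")
    # UniVec and UniVec_Core are nucleotide only, so protein must be false.
    if library_name in ("UniVec", "UniVec_Core") and protein:
        raise ValueError(f"protein must be false when downloading {library_name}")
```

Calling check_library_inputs("nr", False) raises a ValueError, while check_library_inputs("bacteria", False) returns without error.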

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation, specified in GB. Default disk size is determined dynamically based on library_name. Note that the default sizes are adequate as of April 2023, but new genomes are constantly being added to the NCBI database. More disk space may be required depending on when in the future this task is run.
protein (Boolean, default=false): Construct a protein database?

create_library_from_fastas

description: Adds custom entries from FASTA files to a Kraken2 DB
outputs: {'custom_library': 'Kraken2 compatible library, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}

_runtime (Any, required)
fastas_gz (Array[File], required): Array of gzipped FASTA files. Each FASTA sequence ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid
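
As a rough illustration of the sequence ID requirement, the sketch below lists the sequence IDs in a gzipped FASTA file that carry no explicit kraken:taxid assignment (for example ">sequence16|kraken:taxid|32630"). Those sequences would then have to rely on an NCBI accession number. The helper name headers_without_taxid is hypothetical, and the sketch does not attempt to validate accession numbers.

```python
import gzip
import re

# Matches an explicit taxonomy assignment such as "kraken:taxid|32630".
TAXID_TAG = re.compile(r"kraken:taxid\|\d+")


def headers_without_taxid(fasta_gz_path):
    """Return sequence IDs that lack an explicit kraken:taxid tag."""
    missing = []
    with gzip.open(fasta_gz_path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue
            # The sequence ID is the first whitespace-delimited token.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if not TAXID_TAG.search(seq_id):
                missing.append(seq_id)
    return missing
```

An empty result means every sequence has an explicit taxonomy ID; a non-empty result is not necessarily an error, since those IDs may still contain valid accession numbers.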

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
protein (Boolean, default=false): Construct a protein database?

build_db

_runtime (Any, required)
tarballs (Array[File], required): Tarballs containing the NCBI taxonomy (generated by the download_taxonomy task) and at least one library (generated by the download_library or create_library_from_fastas task). Tarballs must not have a root directory.
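
The "no root directory" requirement is assumed here to mean that the archive's contents sit directly at the top level of the tarball rather than inside a single wrapping folder. The sketch below shows one way to build and inspect such a tarball with the Python standard library; both function names are hypothetical, and the directory names used in the usage note are only illustrative.

```python
import os
import tarfile


def tar_without_root(source_dir, tarball_path):
    """Archive the contents of source_dir without the directory itself.

    tarball_path should be located outside source_dir.
    """
    with tarfile.open(tarball_path, "w:gz") as tar:
        for entry in sorted(os.listdir(source_dir)):
            # arcname drops the source_dir prefix from stored paths.
            tar.add(os.path.join(source_dir, entry), arcname=entry)


def top_level_names(tarball_path):
    """Return the sorted set of top-level entries stored in a tarball."""
    with tarfile.open(tarball_path, "r:*") as tar:
        return sorted({member.name.split("/")[0] for member in tar.getmembers()})
```

If source_dir holds a taxonomy folder, top_level_names on the resulting tarball should list that folder directly rather than the name of source_dir.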

db_name (String, default="kraken2_db"); description: Name for output in compressed, archived format. The suffix .tar.gz will be added.; common: true
kmer_len (Int, default=if protein then 15 else 35): K-mer length that will be used to build the database, in bp (or in amino acids for a protein database)
max_db_size_gb (Int, default=-1): Maximum size in GB of the Kraken2 hash table; if the Kraken2 estimator determines that more would normally be needed, the reference library will be downsampled to fit.
minimizer_len (Int, default=if protein then 12 else 31): Minimizer length that will be used to build the database, in bp (or in amino acids for a protein database)
minimizer_spaces (Int, default=if protein then 0 else 7): Number of characters in minimizer that are ignored in comparisons
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
ncpu (Int, default=4); description: Number of cores to allocate for task; common: true
protein (Boolean, default=false): Construct a protein database?
use_all_cores (Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
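
The conditional defaults listed above for kmer_len, minimizer_len, and minimizer_spaces depend only on protein. A minimal sketch of that selection follows; default_build_params is a hypothetical name used for illustration, and the values are copied from the parameter descriptions.

```python
def default_build_params(protein: bool) -> dict:
    """Mirror the documented defaults, which switch on the protein flag."""
    return {
        "kmer_len": 15 if protein else 35,
        "minimizer_len": 12 if protein else 31,
        "minimizer_spaces": 0 if protein else 7,
    }
```

Any of these values can still be overridden individually in the task inputs.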

description: Runs Kraken2 on a pair of FASTQ files
outputs: {'report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'sequences': {'description': 'Detailed Kraken2 output that has been gzipped', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}}

_runtime (Any, required)
db (File, required): Kraken2 database. Can be generated with make-qc-reference.wdl. Must be a tarball without a root directory.
read_one_fastq_gz (File, required): Gzipped FASTQ file with the first reads in each pair
read_two_fastq_gz (File, required): Gzipped FASTQ file with the second reads in each pair

min_base_quality (Int, default=0): Minimum base quality used in classification
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
ncpu (Int, default=4); description: Number of cores to allocate for task; common: true
prefix (String, default=sub(basename(read_one_fastq_gz),"([_\.][rR][12])?(\.subsampled)?\.(fastq|fq)(\.gz)?$","")): Prefix for the Kraken2 output files. The extensions .kraken2.txt and .kraken2.sequences.txt.gz will be added.
store_sequences (Boolean, default=false); description: Store and output main Kraken2 output in addition to the summary report?; common: true
use_all_cores (Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
use_names (Boolean, default=true): Print scientific names instead of just taxids?
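
To see what the default prefix expression above produces, the sketch below applies the same regular expression with Python's re module. This is an approximation: WDL's sub() uses its own regex engine, although this particular pattern is expected to behave the same way. The function names default_prefix and output_names are hypothetical.

```python
import os
import re

# Same pattern as the default value of prefix.
PREFIX_PATTERN = r"([_\.][rR][12])?(\.subsampled)?\.(fastq|fq)(\.gz)?$"


def default_prefix(read_one_fastq_gz: str) -> str:
    """Strip read, subsampling, and FASTQ suffixes from the basename."""
    return re.sub(PREFIX_PATTERN, "", os.path.basename(read_one_fastq_gz))


def output_names(prefix: str) -> tuple:
    """Append the two documented extensions to a prefix."""
    return (prefix + ".kraken2.txt", prefix + ".kraken2.sequences.txt.gz")
```

For example, default_prefix("reads/sample_R1.fastq.gz") gives "sample", while default_prefix("sample_1.fastq.gz") gives "sample_1", because the pattern only strips read suffixes that contain an "r" or "R".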