download_taxonomy
- description
- Downloads the NCBI taxonomy which Kraken2 uses to create a tree and taxon map during the database build
- outputs
- {'taxonomy': 'The NCBI taxonomy, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}
Inputs
Required
_runtime
(Any, required)
Defaults
protein
(Boolean, default=false): Construct a protein database?
Outputs
taxonomy
(File)
download_library
- description
- Downloads a predefined library of reference genomes from NCBI. Detailed organism list for libraries (except nt) available here
- warning
- This task is particularly prone to failure due to network issues, especially when downloading the larger libraries (bacteria, nr, and nt).
- outputs
- {'library': 'A library of reference genomes, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}
Inputs
Required
_runtime
(Any, required)
library_name
(String, required); description: Library to download. Note that protein must equal true if downloading the nr library, and protein must equal false if downloading the UniVec or UniVec_Core library.; choices: ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'nr', 'UniVec', 'UniVec_Core']
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation, specified in GB. Default disk size is determined dynamically based on library_name. Note that the default sizes are adequate as of April 2023, but new genomes are constantly being added to the NCBI database. More disk space may be required depending on when in the future this task is run.
protein
(Boolean, default=false): Construct a protein database?
Outputs
library
(File)
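The compatibility rules between library_name and protein described above can be checked before submitting a job. The following is a minimal Python sketch, not part of the workflow itself; the function name check_library_inputs and the constant names are hypothetical. It encodes only the constraints stated in this section and makes no assumption about other combinations (for example nt with protein=true).

```python
# Hypothetical pre-flight check; names are illustrative, not part of the workflow.
LIBRARY_CHOICES = {
    "archaea", "bacteria", "plasmid", "viral", "human", "fungi",
    "plant", "protozoa", "nt", "nr", "UniVec", "UniVec_Core",
}
PROTEIN_ONLY = {"nr"}                         # protein must equal true
NUCLEOTIDE_ONLY = {"UniVec", "UniVec_Core"}   # protein must equal false


def check_library_inputs(library_name: str, protein: bool = False) -> None:
    """Raise ValueError if the inputs violate the documented constraints."""
    if library_name not in LIBRARY_CHOICES:
        raise ValueError(f"unknown library_name: {library_name!r}")
    if library_name in PROTEIN_ONLY and not protein:
        raise ValueError(f"{library_name} requires protein=true")
    if library_name in NUCLEOTIDE_ONLY and protein:
        raise ValueError(f"{library_name} requires protein=false")


if __name__ == "__main__":
    check_library_inputs("viral")             # passes silently
    check_library_inputs("nr", protein=True)  # passes silently
```

Raising an exception rather than returning a flag keeps the check easy to drop in front of whatever submission script is in use.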
create_library_from_fastas
- description
- Adds custom entries from FASTA files to a Kraken2 DB
- outputs
- {'custom_library': 'Kraken2 compatible library, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}
Inputs
Required
_runtime
(Any, required)
fastas_gz
(Array[File], required): Array of gzipped FASTA files. Each FASTA sequence ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid.
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
protein
(Boolean, default=false): Construct a protein database?
Outputs
custom_library
(File)
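The sequence ID requirement on fastas_gz can be inspected before running the task. The sketch below is an illustration under stated assumptions: the function name headers_without_taxid is hypothetical, and the tag format (kraken:taxid|<digits> inside the sequence ID, as in >sequence16|kraken:taxid|32630) follows the example in the Kraken 2 manual. It only lists IDs that lack an explicit taxid; such IDs may still be valid if they contain an NCBI accession number, which this sketch does not try to recognize.

```python
import gzip
import re
from typing import List

# Explicit taxonomy assignment, e.g. ">sequence16|kraken:taxid|32630  Adapter sequence"
TAXID_TAG = re.compile(r"kraken:taxid\|\d+")


def headers_without_taxid(fasta_gz_path: str) -> List[str]:
    """Return sequence IDs in a gzipped FASTA that carry no kraken:taxid tag.

    Hypothetical helper: IDs returned here must rely on an NCBI accession
    number instead, which is not validated by this function.
    """
    missing = []
    with gzip.open(fasta_gz_path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""  # ID ends at first whitespace
            if not TAXID_TAG.search(seq_id):
                missing.append(seq_id)
    return missing
```

An empty return value means every sequence has an explicit taxonomy assignment; a non-empty one is a prompt to confirm that the listed IDs contain accession numbers.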
build_db
- description
- Builds a custom Kraken2 database
- outputs
- {'built_db': 'A complete Kraken2 database'}
Inputs
Required
_runtime
(Any, required)
tarballs
(Array[File], required): Tarballs containing the NCBI taxonomy (generated by the download_taxonomy task) and at least one library (generated by the download_library or create_library_from_fastas task). Tarballs must not have a root directory.
Defaults
db_name
(String, default="kraken2_db"); description: Name for output in compressed, archived format. The suffix .tar.gz will be added.; common: true
kmer_len
(Int, default=if protein then 15 else 35): K-mer length in bp that will be used to build the database
max_db_size_gb
(Int, default=-1): Maximum number of GBs for Kraken 2 hash table; if the Kraken 2 estimator determines more would normally be needed, the reference library will be downsampled to fit.
minimizer_len
(Int, default=if protein then 12 else 31): Minimizer length in bp that will be used to build the database
minimizer_spaces
(Int, default=if protein then 0 else 7): Number of characters in minimizer that are ignored in comparisons
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb
(Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=4); description: Number of cores to allocate for task; common: true
protein
(Boolean, default=false): Construct a protein database?
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
built_db
(File)
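Three of the build_db defaults (kmer_len, minimizer_len, minimizer_spaces) depend on the protein flag. The sketch below restates those conditional defaults in Python so the two parameter sets can be compared side by side; the function name default_build_params is hypothetical and is not part of the workflow.

```python
from typing import Dict


def default_build_params(protein: bool = False) -> Dict[str, int]:
    """Mirror the documented build_db defaults that depend on `protein`."""
    return {
        "kmer_len": 15 if protein else 35,
        "minimizer_len": 12 if protein else 31,
        "minimizer_spaces": 0 if protein else 7,
    }


if __name__ == "__main__":
    print(default_build_params(protein=False))
    # {'kmer_len': 35, 'minimizer_len': 31, 'minimizer_spaces': 7}
    print(default_build_params(protein=True))
    # {'kmer_len': 15, 'minimizer_len': 12, 'minimizer_spaces': 0}
```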
kraken
- description
- Runs Kraken2 on a pair of FASTQ files
- outputs
- {'report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'sequences': {'description': 'Detailed Kraken2 output that has been gzipped', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}}
Inputs
Required
_runtime
(Any, required)
db
(File, required): Kraken2 database. Can be generated with make-qc-reference.wdl. Must be a tarball without a root directory.
read_one_fastq_gz
(File, required): Gzipped FASTQ file with 1st reads in pair
read_two_fastq_gz
(File, required): Gzipped FASTQ file with 2nd reads in pair
Defaults
min_base_quality
(Int, default=0): Minimum base quality used in classification
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb
(Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=4); description: Number of cores to allocate for task; common: true
prefix
(String, default=sub(basename(read_one_fastq_gz),"([_\.][rR][12])?(\.subsampled)?\.(fastq|fq)(\.gz)?$","")): Prefix for the Kraken2 output files. The extensions .kraken2.txt and .kraken2.sequences.txt.gz will be added.
store_sequences
(Boolean, default=false); description: Store and output main Kraken2 output in addition to the summary report?; common: true
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
use_names
(Boolean, default=true): Print scientific names instead of just taxids?
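The default for prefix is a WDL sub() expression applied to the basename of read_one_fastq_gz. To see what it yields for a given file name, the same pattern can be tried in Python. This is a sketch: the function name default_prefix is hypothetical, and Python's re module is assumed to treat this particular pattern the same way as the WDL regex engine does.

```python
import os
import re

# Same pattern as the documented default, with the WDL string escapes removed.
SUFFIX = re.compile(r"([_.][rR][12])?(\.subsampled)?\.(fastq|fq)(\.gz)?$")


def default_prefix(read_one_fastq_gz: str) -> str:
    """Strip read-number, '.subsampled', and FASTQ extensions from the basename."""
    return SUFFIX.sub("", os.path.basename(read_one_fastq_gz))


if __name__ == "__main__":
    print(default_prefix("/data/sample_R1.fastq.gz"))      # sample
    print(default_prefix("sample.R2.subsampled.fq"))       # sample
    print(default_prefix("reads_r1.fq.gz"))                # reads
```

With the first input above, the report would therefore be named sample.kraken2.txt.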
Outputs
report
(File)
sequences
(File?)
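The db input must be a tarball without a root directory. One way to sanity-check a tarball before running the task is sketched below. It assumes that the database files hash.k2d, opts.k2d, and taxo.k2d (the three files a Kraken 2 database consists of, per the Kraken 2 manual) are expected at the top level of the archive; that layout is an assumption about this workflow, and the function name missing_top_level_db_files is hypothetical.

```python
import posixpath
import tarfile
from typing import Set

# Files that make up a Kraken 2 database (assumed to sit at the archive top level).
EXPECTED_DB_FILES = {"hash.k2d", "opts.k2d", "taxo.k2d"}


def missing_top_level_db_files(tarball_path: str) -> Set[str]:
    """Return the expected database files NOT found at the top level."""
    with tarfile.open(tarball_path, "r:*") as archive:
        # normpath turns "./hash.k2d" into "hash.k2d" but keeps "root/hash.k2d".
        names = {posixpath.normpath(m.name) for m in archive.getmembers()}
    return EXPECTED_DB_FILES - names
```

An empty set suggests the tarball has the expected flat layout; a non-empty set usually means the files are wrapped in a root directory or the archive is incomplete.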