quickcheck
- description
- Runs samtools quickcheck on the input BAM file. This checks that the BAM file appears to be intact, e.g. that the header exists and that the end-of-file marker is present.
- outputs
- {'check': 'Dummy output to enable caching'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to quickcheck
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
Outputs
check
(String)
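As a rough illustration of the two properties quickcheck looks at, the sketch below checks that a file decompresses to the BAM magic bytes (BAM\1) and that it ends with the 28-byte empty BGZF block the SAM/BAM specification defines as the end-of-file marker. This approximation was written for this page and is not samtools' implementation. The function name looks_intact is hypothetical.

```python
import gzip

# BGZF end-of-file marker: an empty BGZF block (28 bytes) from the SAM/BAM spec.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200" "1b00" "0300"
    "00000000" "00000000"
)


def looks_intact(path: str) -> bool:
    """Hypothetical helper: rough approximation of what quickcheck verifies."""
    try:
        # BGZF files are valid multi-member gzip, so gzip can decode the first block.
        with gzip.open(path, "rb") as handle:
            if handle.read(4) != b"BAM\x01":  # BAM magic at the start of the header
                return False
    except (OSError, EOFError):
        # Not gzip-compressed, or cut off before the magic could be read.
        return False
    with open(path, "rb") as raw:
        raw.seek(0, 2)  # move to the end to learn the file size
        if raw.tell() < len(BGZF_EOF):
            return False
        raw.seek(-len(BGZF_EOF), 2)
        return raw.read() == BGZF_EOF
```

A BAM that was truncated mid-transfer typically fails the second check, which is the failure mode the task is meant to catch early.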
split
- description
- Runs Samtools split on the input BAM file. This splits the BAM by read group into one or more output files. It optionally errors if there are reads present that do not belong to a read group.
- outputs
- {'split_bams': 'The split BAM files. The extensions will contain read group IDs, and will end in .bam.'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required); description: Input BAM format file to split; stream: true
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
prefix
(String, default=basename(bam,".bam")): Prefix for the split BAM files. The extensions will contain read group IDs, and will end in .bam.
reject_empty_output
(Boolean, default=true); description: If true, error if any output BAMs are empty.; common: true
reject_unaccounted_reads
(Boolean, default=true); description: If true, error if there are reads present that do not have read group information matching the header.; common: true
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
split_bams
(Array[File])
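The bookkeeping behind reject_unaccounted_reads and reject_empty_output can be pictured with the following sketch. It models reads as (qname, read_group_id) pairs instead of real BAM records and assumes one output per @RG header line. The function name split_by_read_group is hypothetical, and the task itself delegates this work to samtools split. When reject_unaccounted is false, the sketch simply returns the unaccounted reads separately, which may not match what samtools split does with them.

```python
from collections import defaultdict


def split_by_read_group(reads, header_rg_ids, reject_unaccounted=True, reject_empty=True):
    """Hypothetical helper: group (qname, rg_id) pairs by read group.

    Returns ({rg_id: [qnames]}, [unaccounted qnames]) and raises ValueError
    when one of the two checks described for the split task fails.
    """
    header_rg_ids = list(header_rg_ids)
    groups = defaultdict(list)
    unaccounted = []
    for qname, rg_id in reads:
        if rg_id in header_rg_ids:
            groups[rg_id].append(qname)
        else:
            # No RG tag (None) or an ID without a matching @RG header line.
            unaccounted.append(qname)
    if reject_unaccounted and unaccounted:
        raise ValueError(f"{len(unaccounted)} read(s) without a header read group")
    if reject_empty:
        empty = [rg for rg in header_rg_ids if not groups[rg]]
        if empty:
            raise ValueError(f"read group(s) with no reads: {empty}")
    return {rg: groups[rg] for rg in header_rg_ids}, unaccounted
```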
flagstat
- description
- Produces a samtools flagstat report containing statistics about the alignments based on the bit flags set in the BAM.
- outputs
- {'flagstat_report': 'samtools flagstat STDOUT redirected to a file'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to generate flagstat for
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
outfile_name
(String, default=basename(bam,".bam") + ".flagstat.txt"): Name for the flagstat report file
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
flagstat_report
(File)
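To make "statistics based on the bit flags" concrete, here is a small sketch that tallies a few flagstat-style categories from raw SAM FLAG integers. It uses the bit values from the SAM specification (0x4 unmapped, 0x100 secondary, 0x400 duplicate, 0x800 supplementary). It covers only a subset of the real report and ignores the QC-passed/QC-failed split that samtools flagstat prints. The function name tally_flags is hypothetical.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def tally_flags(flags):
    """Hypothetical helper: count a few flagstat-like categories from SAM FLAG values."""
    counts = {"total": 0, "primary": 0, "secondary": 0,
              "supplementary": 0, "duplicates": 0, "mapped": 0}
    for flag in flags:
        counts["total"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        if flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        if not flag & (SECONDARY | SUPPLEMENTARY):
            counts["primary"] += 1  # neither secondary nor supplementary
        if flag & DUPLICATE:
            counts["duplicates"] += 1
        if not flag & UNMAPPED:
            counts["mapped"] += 1
    return counts
```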
index
- description
- Creates a .bai BAM index for the input BAM.
- outputs
- {'bam_index': "A .bai BAM index associated with the input BAM. Filename will be basename(bam) + '.bai'."}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to index
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
bam_index
(File)
subsample
- description
- Randomly subsamples the input BAM, in order to produce an output BAM with approximately the desired number of reads.
- help
- A desired_reads greater than zero must be supplied. A desired_reads <= 0 will result in task failure. Sampling is probabilistic, so the output read count will only approximate desired_reads; it will not be exact. A sampled_bam will not be produced if the input BAM read count is less than or equal to desired_reads.
- outputs
- {'orig_read_count': 'A TSV report containing the original read count before subsampling. If subsampling was requested but the input BAM had desired_reads or fewer reads, no read count will be filled in (instead there will be a dash).', 'sampled_bam': 'The subsampled input BAM. Only present if subsampling was performed.'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to subsample
desired_reads
(Int, required): How many reads should be in the output BAM? Output BAM read count will be approximate to this value. Must be greater than zero. A desired_reads <= 0 will result in task failure.
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
prefix
(String, default=basename(bam,".bam")): Prefix for the BAM file. The extension .sampled.bam will be added.
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
orig_read_count
(File)
sampled_bam
(File?)
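The decision described in the help text can be sketched as follows. If the input already has desired_reads or fewer reads, nothing is sampled and the count is reported as a dash. Otherwise, each read is kept with probability desired_reads / read_count. This is a plain-Python illustration with a hypothetical function name, and it does not call samtools. If the task uses a fraction-based sampler such as samtools view -s, note that samtools samples per template so mates stay together, while this sketch samples individual items.

```python
import random


def subsample(reads, desired_reads, seed=0):
    """Hypothetical helper: return (orig_read_count, sampled or None)."""
    if desired_reads <= 0:
        raise ValueError("desired_reads must be greater than zero")
    read_count = len(reads)
    if read_count <= desired_reads:
        # Mirrors the help text: no sampled output, and a dash instead of a count.
        return "-", None
    fraction = desired_reads / read_count
    rng = random.Random(seed)
    # Each read is kept independently, so the result only approximates desired_reads.
    sampled = [read for read in reads if rng.random() < fraction]
    return read_count, sampled
```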
filter
- description
- Filters a BAM based on its bitwise flag value.
- help
- This task is a wrapper around samtools view. This task will fail if there are no reads in the output BAM. This can happen either because the input BAM was empty or because the supplied bitwise_filter was too strict. If you want to down-sample a BAM, use the subsample task instead.
- outputs
- {'filtered_bam': 'BAM file that has been filtered based on the input flags'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to filter
bitwise_filter
(FlagFilter, required): A set of 4 possible read filters to apply. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information).
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
prefix
(String, default=basename(bam,".bam") + ".filtered"): Prefix for the filtered BAM file. The extension .bam will be added.
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
filtered_bam
(File)
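The four fields of a FlagFilter appear to correspond to the four flag filters of samtools view. This sketch assumes include_if_all maps to -f, exclude_if_any to -F, include_if_any to --rf, and exclude_if_all to -G; check ../data_structures/flag_filter.wdl for the authoritative mapping. The sketch evaluates the four filters for a single FLAG value, parsing the values as hex strings like the defaults shown in this document and treating 0 as "filter disabled". The function name passes_flag_filter is hypothetical.

```python
def passes_flag_filter(flag, include_if_all="0x0", exclude_if_any="0x0",
                       include_if_any="0x0", exclude_if_all="0x0"):
    """Hypothetical helper: True if a read with this SAM FLAG survives all four filters."""
    inc_all = int(include_if_all, 16)  # assumes hex strings such as "0x900"
    exc_any = int(exclude_if_any, 16)
    inc_any = int(include_if_any, 16)
    exc_all = int(exclude_if_all, 16)
    if (flag & inc_all) != inc_all:  # a required bit is missing (-f)
        return False
    if flag & exc_any:  # at least one forbidden bit is set (-F)
        return False
    if inc_any and not (flag & inc_any):  # none of the "any" bits are set (--rf)
        return False
    if exc_all and (flag & exc_all) == exc_all:  # the whole forbidden combination is set (-G)
        return False
    return True
```

For example, the bam_to_fastq default exclude_if_any of 0x900 is 0x100 | 0x800, so it drops secondary and supplementary alignments.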
merge
- description
- Merges multiple sorted BAMs into a single BAM
- outputs
- {'merged_bam': 'The BAM resulting from merging all the input BAMs'}
Inputs
Required
_runtime
(Any, required)
bams
(Array[File], required): An array of BAMs to merge into one combined BAM
prefix
(String, required): Prefix for the BAM file. The extension .bam will be added.
Optional
new_header
(File?): Use the lines of this file as @ headers to be copied to the merged BAM, replacing any header lines that would otherwise be copied from the first BAM file in the list. (The file may actually be in SAM format, though any alignment records it contains are ignored.)
Defaults
attach_rg
(Boolean, default=true); description: Attach an RG tag to each alignment. The tag value is inferred from file names.; common: true
combine_pg
(Boolean, default=true); description: Similarly to combine_rg: for each @PG ID in the set of files to merge, use the @PG line of the first file we find that ID in rather than adding a suffix to differentiate similar IDs.; common: true
combine_rg
(Boolean, default=true); description: When several input files contain @RG headers with the same ID, emit only one of them (namely, the header line from the first file we find that ID in) to the merged output file. Combining these similar headers is usually the right thing to do when the files being merged originated from the same file. If false (samtools merge without -c), all @RG headers appear in the output file, with random suffixes added to their IDs where necessary to differentiate them.; common: true
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
name_sorted
(Boolean, default=false); description: Are all input BAMs queryname sorted (true)? Or are all input BAMs coordinate sorted (false)?; common: true
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
region
(String, default=""): Merge files in the specified region (Format: chr:start-end)
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
merged_bam
(File)
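The header rule described for combine_rg (and, by analogy, combine_pg) amounts to "the first occurrence of each ID wins". Below is a minimal sketch over plain @RG header strings, with a hypothetical function name. It does not reproduce the random suffixing that happens when combining is turned off.

```python
def combine_rg_headers(headers_per_file):
    """Hypothetical helper: keep only the first @RG line seen for each ID, in input order."""
    seen = set()
    combined = []
    for header_lines in headers_per_file:
        for line in header_lines:
            if not line.startswith("@RG\t"):
                continue  # only @RG lines are considered here
            fields = dict(field.split(":", 1) for field in line.split("\t")[1:])
            rg_id = fields["ID"]
            if rg_id not in seen:
                seen.add(rg_id)
                combined.append(line)
    return combined
```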
addreplacerg
- description
- Adds or replaces read group tags
- outputs
- {'tagged_bam': 'The transformed input BAM after read group modifications have been applied'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to add read group information
Optional
read_group_id
(String?): Allows you to specify the read group ID of an existing @RG line and applies it to the reads specified by the orphan_only option
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
orphan_only
(Boolean, default=true); description: Only add RG tags to orphans, i.e. reads without an existing RG tag (true)? Or also overwrite all existing RG tags (including any in the header) (false)?; common: true
overwrite_header_record
(Boolean, default=false); description: Overwrite an existing @RG line, if a new one with the same ID value is provided?; common: true
prefix
(String, default=basename(bam,".bam") + ".addreplacerg"): Prefix for the BAM file. The extension .bam will be added.
read_group_line
(Array[String], default=[]); description: Allows you to specify a read group line to append to (or replace in) the header and applies it to the reads specified by the orphan_only option. Each String in the Array should correspond to one field of the read group line. Tab literals will be inserted between each entry in the final BAM. Only one read group line can be supplied per invocation of this task.; common: true
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
tagged_bam
(File)
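The interplay of read_group_line and orphan_only can be sketched as below. Reads are modelled as dicts of tags, the field values are made-up examples, and both helper names are hypothetical. The real task passes these options to samtools addreplacerg. Whether the array should start with an @RG entry is not stated above, so check the task before relying on either form.

```python
def build_rg_line(fields):
    """Hypothetical helper: join read_group_line entries with literal tab characters."""
    return "\t".join(fields)


def apply_read_group(reads, rg_id, orphan_only=True):
    """Hypothetical helper: tag orphans only (True) or every read (False) with rg_id."""
    tagged = []
    for read in reads:
        read = dict(read)  # shallow copy so the caller's records are untouched
        # An orphan is a read that has no RG tag yet.
        if not orphan_only or "RG" not in read:
            read["RG"] = rg_id
        tagged.append(read)
    return tagged
```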
collate
- description
- Runs samtools collate on the input BAM file. Shuffles and groups reads together by their names.
- outputs
- {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order)'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to collate
Defaults
fast_mode
(Boolean, default=true); description: Use fast mode (output primary alignments only)?; common: true
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb
(Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
prefix
(String, default=basename(bam,".bam") + ".collated"): Prefix for the collated BAM file. The extension .bam will be added.
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
collated_bam
(File)
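Conceptually, collation only has to bring reads that share a QNAME next to each other; it does not sort them. The in-memory sketch below shows that contract with (qname, payload) pairs. samtools collate works differently, using temporary files on disk so it can handle BAMs larger than memory, and it does not promise the first-seen ordering this sketch happens to produce. The function name collate is hypothetical.

```python
def collate(records):
    """Hypothetical helper: reorder (qname, payload) pairs so equal qnames are adjacent."""
    buckets = {}
    for qname, payload in records:
        # dicts keep insertion order, so groups come out in order of first sighting
        buckets.setdefault(qname, []).append((qname, payload))
    return [record for group in buckets.values() for record in group]
```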
bam_to_fastq
- description
- Converts an input BAM file into FASTQ(s) using samtools fastq.
- help
- If paired_end == false, then all reads in the BAM will be output to a single FASTQ file. Use the bitwise_filter argument to remove any unwanted reads. An exit code of 42 indicates that no reads were present in the output FASTQs. An exit code of 43 indicates that unexpected reads were discovered in the input BAM.
- outputs
- {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order). Only generated if retain_collated_bam and paired_end are both true. Has the name ~{prefix}.collated.bam.', 'read_one_fastq_gz': 'Gzipped FASTQ file with 1st reads in pair. Only generated if paired_end is true and interleaved is false. Has the name ~{prefix}.R1.fastq.gz.', 'read_two_fastq_gz': 'Gzipped FASTQ file with 2nd reads in pair. Only generated if paired_end is true and interleaved is false. Has the name ~{prefix}.R2.fastq.gz.', 'singleton_reads_fastq_gz': 'Gzipped FASTQ containing singleton reads. Only generated if paired_end and output_singletons are both true. Has the name ~{prefix}.singleton.fastq.gz.', 'interleaved_reads_fastq_gz': 'Interleaved gzipped Paired-End FASTQ. Only generated if paired_end and interleaved are both true. Has the name ~{prefix}.fastq.gz. The conditions under which this output and single_end_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither is created).', 'single_end_reads_fastq_gz': 'A gzipped FASTQ containing all reads. Only generated if paired_end is false. Has the name ~{prefix}.fastq.gz. The conditions under which this output and interleaved_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither is created).'}
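The conditions in the outputs list can be summarised as a small function that predicts which files a given combination of options should produce. This hypothetical helper was written from the descriptions above rather than taken from the task. It follows the "only generated if" rules as stated, so it lists interleaved_reads_fastq_gz and single_end_reads_fastq_gz separately even though both resolve to the same ~{prefix}.fastq.gz path.

```python
def expected_outputs(prefix, paired_end=True, interleaved=False,
                     output_singletons=False, retain_collated_bam=False):
    """Hypothetical helper: {output_name: filename} per the documented conditions."""
    outputs = {}
    if paired_end and retain_collated_bam:
        outputs["collated_bam"] = f"{prefix}.collated.bam"
    if paired_end and not interleaved:
        outputs["read_one_fastq_gz"] = f"{prefix}.R1.fastq.gz"
        outputs["read_two_fastq_gz"] = f"{prefix}.R2.fastq.gz"
    if paired_end and output_singletons:
        outputs["singleton_reads_fastq_gz"] = f"{prefix}.singleton.fastq.gz"
    if paired_end and interleaved:
        outputs["interleaved_reads_fastq_gz"] = f"{prefix}.fastq.gz"
    if not paired_end:
        outputs["single_end_reads_fastq_gz"] = f"{prefix}.fastq.gz"
    return outputs
```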
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to convert to FASTQ(s)
Defaults
append_read_number
(Boolean, default=true); description: Append /1 and /2 suffixes to read names?; common: true
bitwise_filter
(FlagFilter, default={"include_if_all": "0x0", "exclude_if_any": "0x900", "include_if_any": "0x0", "exclude_if_all": "0x0"}): A set of 4 possible read filters to apply during conversion to FASTQ. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information). By default, it will remove secondary and supplementary reads from the output FASTQs.
collated
(Boolean, default=false); description: Is the BAM collated (or name-sorted)? If collated == true, then the input BAM will be run through samtools fastq without preprocessing. If collated == false, then samtools collate will be run on the input BAM before conversion to FASTQ. Ignored if paired_end == false.; common: true
fail_on_unexpected_reads
(Boolean, default=false): The definition of 'unexpected' depends on whether the values of paired_end and output_singletons are true or false. If paired_end is false, no reads are considered unexpected, and every read (not caught by bitwise_filter) will be present in the resulting FASTQ regardless of first/last bit settings; this setting is ignored in that case. If paired_end is true, then reads that don't satisfy first XOR last are considered unexpected (i.e. reads that have neither first nor last set, or reads that have both first and last set). If output_singletons is false, singleton reads are also considered unexpected. A singleton read is a read with either the first or the last bit set (but not both) that possesses a unique QNAME; i.e. it is a read without a pair when all reads are expected to be paired. But if output_singletons is true, these singleton reads will be output as their own FASTQ instead of causing the task to fail. If fail_on_unexpected_reads is false, then none of the above cases cause a failure; any 'unexpected' reads are silently discarded.; description: Should the task fail if reads with an unexpected first/last bit setting are discovered?; common: true
fast_mode
(Boolean, default=!retain_collated_bam); description: Fast mode for samtools collate? If true, this removes secondary and supplementary reads during the collate step. If false, secondary and supplementary reads will be retained in the collated_bam output (if created). Defaults to the opposite of retain_collated_bam. Ignored if collated == true or paired_end == false.; common: true
interleaved
(Boolean, default=false); description: Create an interleaved FASTQ file from Paired-End data? Ignored if paired_end == false.; common: true
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb
(Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
output_singletons
(Boolean, default=false): Output singleton reads as their own FASTQ? Ignored if paired_end == false.
paired_end
(Boolean, default=true); description: Is the data Paired-End? If paired_end == false, then all reads in the BAM will be output to a single FASTQ file. Use the bitwise_filter argument to remove any unwanted reads.; common: true
prefix
(String, default=basename(bam,".bam")): Prefix for the collated BAM and FASTQ files. The extensions .collated.bam and [,.R1,.R2,.singleton].fastq.gz will be added.
retain_collated_bam
(Boolean, default=false); description: Save the collated BAM to disk and output it (true)? This slows performance and substantially increases storage requirements. Be aware that collated BAMs occupy much more space than either position-sorted or name-sorted BAMs (due to the compression algorithm). Ignored if collated == true or paired_end == false.; common: true
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
collated_bam
(File?)
read_one_fastq_gz
(File?)
read_two_fastq_gz
(File?)
singleton_reads_fastq_gz
(File?)
interleaved_reads_fastq_gz
(File?)
single_end_reads_fastq_gz
(File?)
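The rules in fail_on_unexpected_reads can be restated as code. The sketch below classifies (qname, flag) pairs for paired-end input using the SAM flag bits 0x40 (first segment) and 0x80 (last segment). It is a hypothetical helper that only reports what the description above treats as unexpected, and it ignores bitwise_filter.

```python
from collections import Counter

FIRST = 0x40  # first segment in the template (read 1)
LAST = 0x80   # last segment in the template (read 2)


def find_unexpected(reads, output_singletons=False):
    """Hypothetical helper: qnames of reads considered 'unexpected' for paired-end input."""
    # Reads with exactly one of first/last set are candidates for pairing.
    valid = [(q, f) for q, f in reads if bool(f & FIRST) != bool(f & LAST)]
    # Neither bit set, or both bits set, violates first XOR last.
    unexpected = [q for q, f in reads if bool(f & FIRST) == bool(f & LAST)]
    if not output_singletons:
        name_counts = Counter(q for q, _ in valid)
        # A singleton is a valid read whose QNAME occurs only once.
        unexpected += [q for q, _ in valid if name_counts[q] == 1]
    return unexpected
```

Per the help text, when fail_on_unexpected_reads is true and this list is non-empty, the task fails with exit code 43. When it is false, these reads are silently discarded.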
fixmate
- description
- Runs samtools fixmate on the name-collated input BAM file. This fills in mate coordinates and insert size fields among other tags and fields.
- help
- This task assumes a name-sorted or name-collated input BAM. If you have a position-sorted BAM, please use the position_sorted_fixmate task. This task runs fixmate and outputs a BAM in the same order as the input.
- outputs
- {'fixmate_bam': 'The BAM resulting from running samtools fixmate on the input BAM'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required); description: Input BAM format file to add mate information. Must be name-sorted or name-collated.; stream: true
Defaults
add_cigar
(Boolean, default=true); description: Add template CIGAR ct tag; tool_default: false; common: true
add_mate_score
(Boolean, default=true); description: Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default: false; common: true
disable_flag_sanitization
(Boolean, default=false): Disable all flag sanitization?
disable_proper_pair_check
(Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair]?
extension
(String, default=".bam"); description: File format extension to use for output file.; choices: ['.bam', '.cram']; common: true
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
prefix
(String, default=basename(bam,".bam") + ".fixmate"): Prefix for the output file. The extension specified with the extension parameter will be added.
remove_unaligned_and_secondary
(Boolean, default=false): Remove unmapped and secondary reads
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
fixmate_bam
(File)
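One of the fields fixmate fills in is the insert size (TLEN). The SAM specification describes TLEN as the signed observed template length. For mates mapped to the same reference, its absolute value spans from the leftmost mapped base to the rightmost mapped base of the template, with a positive sign on the leftmost segment and a negative sign on the rightmost. The sketch below applies that reading to two mapped mates given as hypothetical (pos, end) pairs in 1-based inclusive coordinates. samtools' own calculation handles further cases (unmapped mates, different references) that are not modelled here, and the tie-breaking here is this sketch's choice.

```python
def template_lengths(read1, read2):
    """Hypothetical helper: (tlen1, tlen2) for two mates given as (pos, end), 1-based inclusive."""
    start = min(read1[0], read2[0])  # leftmost mapped base of the template
    end = max(read1[1], read2[1])    # rightmost mapped base of the template
    length = end - start + 1
    # The leftmost mate gets the positive value; this sketch gives ties to read1.
    if read1[0] <= read2[0]:
        return length, -length
    return -length, length
```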
position_sorted_fixmate
- description
- Runs samtools fixmate on the position-sorted input BAM file and outputs a position-sorted BAM. fixmate fills in mate coordinates and insert size fields among other tags and fields. samtools fixmate assumes a name-sorted or name-collated input BAM. If you already have a collated BAM, please use the fixmate task. This task collates the input BAM, runs fixmate, and then re-sorts the output into a position-sorted BAM.
- outputs
- {'fixmate_bam': 'BAM file with mate information added'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to add mate information. Must be position-sorted.
Defaults
add_cigar
(Boolean, default=true); description: Add template CIGAR ct tag; tool_default: false; common: true
add_mate_score
(Boolean, default=true); description: Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default: false; common: true
disable_flag_sanitization
(Boolean, default=false): Disable all flag sanitization?
disable_proper_pair_check
(Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair]?
fast_mode
(Boolean, default=false); description: Use fast mode (output primary alignments only)?; common: true
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb
(Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
prefix
(String, default=basename(bam,".bam") + ".fixmate"): Prefix for the output file. The extension .bam will be added.
remove_unaligned_and_secondary
(Boolean, default=false): Remove unmapped and secondary reads
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
Outputs
fixmate_bam
(File)
markdup
- description
- [DEPRECATED] Runs samtools markdup on the position-sorted input BAM file. This creates a report and optionally a new BAM with duplicate reads marked.
- help
- This task assumes samtools fixmate has already been run on the input BAM. If it has not, then the output may be incorrect. A name-sorted or collated BAM can be run through the fixmate task (and then position-sorted prior to this task), or a position-sorted BAM can be run through the position_sorted_fixmate task. Deprecated due to extremely high memory usage for certain RNA-Seq samples when searching for optical duplicates. Use mark_duplicates in ./picard.wdl instead.
- deprecated
- true
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file to mark duplicates in
Defaults
coordinates_order
(String, default="txy"); description: The order of the elements captured in the read_coords_regex regular expression. Default is txy, where t is a part of the read name selected for string comparison and x/y are the coordinates used for optical duplicate detection. Ignored if optical_distance == 0.; choices: ['txy', 'tyx', 'xyt', 'yxt', 'xty', 'ytx', 'xy', 'yx']
create_bam
(Boolean, default=true): Create a new BAM with duplicate reads marked? If false, then only a markdup report will be generated.
duplicate_count
(Boolean, default=false): Record the original primary read duplication count (including itself) in a dc tag? Ignored if create_bam == false.
duplicates_of_duplicates_check
(Boolean, default=false): Check duplicates of duplicates for correctness? Performs further checks to make sure all optical duplicates are found. Also operates on mark_duplicates_with_do_tag tagging, where reads may be tagged with the best quality read. Disabling this option can speed up duplicate marking when there are a great many duplicates for each original read. Ignored if create_bam == false or optical_distance == 0.
include_qc_fails
(Boolean, default=false): Include reads that have the QC-failed flag set in duplicate marking? This can increase the number of duplicates found. Ignored if create_bam == false.
json
(Boolean, default=false): Output a JSON report instead of a text report? Either is parseable by MultiQC.
mark_duplicates_with_do_tag
(Boolean, default=false): Mark duplicates with the do (duplicate original) tag? The do tag contains the name of the "original" read that was duplicated. Ignored if create_bam == false.
mark_supp_or_sec_or_unmapped_as_duplicates
(Boolean, default=false): Mark supplementary, secondary, or unmapped alignments of duplicates as duplicates? As this takes a quick second pass over the data, it will increase running time. Ignored if create_bam == false.
max_readlen
(Int, default=300): Expected maximum read length.
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb
(Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=2); description: Number of cores to allocate for task; common: true
optical_distance
(Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0, then optical duplicate marking is disabled. Suggested settings are 100 for HiSeq-style platforms or about 2500 for NovaSeq ones. When set above 0, duplicate reads are tagged with dt:Z:SQ for optical duplicates and dt:Z:LB otherwise. Calculation of distance depends on coordinate data embedded in the read names, typically produced by Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_coords_regex. If changing read_coords_regex, make sure that coordinates_order matches.
prefix
(String, default=basename(bam,".bam") + ".markdup"): Prefix for the output file.
read_coords_regex
(String, default="[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)"); description: Regular expression to extract read coordinates from the QNAME field. This takes a POSIX regular expression for at least x and y to be used in optical duplicate marking. It can also include another part of the read name to test for equality, e.g. lane:tile elements. Elements wanted are captured with parentheses. The default is meant to capture information from Illumina-style read names. Ignored if optical_distance == 0. If changing read_coords_regex, make sure that coordinates_order matches.; tool_default: ([!-9;-?A-~]+:[0-9]+:[0-9]+:[0-9]+:[0-9]+):([0-9]+):([0-9]+)
remove_duplicates
(Boolean, default=false): Remove duplicates from the output BAM? Ignored if create_bam == false.
use_all_cores
(Boolean, default=false); description: Use all cores? Recommended for cloud environments.; common: true
use_read_groups
(Boolean, default=false): Only mark duplicates within the same Read Group? Ignored if create_bam == false.
Outputs
markdup_report
(File)
markdup_bam
(File?)
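To see what the default read_coords_regex extracts, the sketch below applies it with Python's re module to an Illumina-style read name (the name itself is made up). With the default coordinates_order of txy, the three capture groups are the comparison string t, then x, then y. The is_optical helper is a hypothetical simplification. It treats two duplicates as optical when t matches and both coordinate differences are within optical_distance. Consult the samtools markdup documentation for the exact rule it applies.

```python
import re

# Default value of read_coords_regex documented above.
READ_COORDS_REGEX = re.compile(r"[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)")


def parse_coords(qname):
    """Hypothetical helper: (t, x, y) from a read name, assuming coordinates_order == 'txy'."""
    match = READ_COORDS_REGEX.search(qname)
    if match is None:
        raise ValueError(f"read name does not match: {qname}")
    t, x, y = match.groups()
    return t, int(x), int(y)


def is_optical(qname_a, qname_b, optical_distance):
    """Hypothetical, simplified check: same t, and x and y each within optical_distance."""
    if optical_distance == 0:
        return False  # optical duplicate marking disabled
    ta, xa, ya = parse_coords(qname_a)
    tb, xb, yb = parse_coords(qname_b)
    return ta == tb and abs(xa - xb) <= optical_distance and abs(ya - yb) <= optical_distance
```

For a name such as A00123:45:HABCDEFXX:1:1101:12345:6789, the default pattern captures the tile (1101) as t and the last two fields as x and y.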
faidx
- description
- Creates a .fai FASTA index for the input FASTA.
- outputs
- {'fasta_index': "A .fai FASTA index associated with the input FASTA. Filename will be basename(fasta) + '.fai'."}
Inputs
Required
_runtime
(Any, required)
fasta
(File, required): Input FASTA format file to index. Optionally gzip compressed.
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
Outputs
fasta_index
(File)
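A .fai index has one tab-separated line per sequence with five columns: NAME, LENGTH, OFFSET (byte offset of the first base), LINEBASES, and LINEWIDTH (bytes per line, including the newline). The sketch below builds those rows for uncompressed, ASCII FASTA text with Unix newlines and uniform line lengths. It does not handle compressed input or the validation samtools faidx performs. The function name build_fai is hypothetical.

```python
def build_fai(fasta_text):
    """Hypothetical helper: .fai rows (name, length, offset, linebases, linewidth)."""
    rows = []
    offset = 0  # byte offset of the current line start (ASCII text, '\n' newlines assumed)
    current = None
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            name = line[1:].split()[0]  # name is the header up to the first whitespace
            current = [name, 0, offset + len(line), None, None]
            rows.append(current)
        elif current is not None:
            bases = len(line.rstrip("\n"))
            if current[3] is None:
                # The first sequence line defines LINEBASES and LINEWIDTH for this record.
                current[3] = bases
                current[4] = len(line)
            current[1] += bases
        offset += len(line)
    return [tuple(row) for row in rows]
```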