mark_duplicates
- description
- Marks duplicate reads in the input BAM file using Picard
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-
- help
- For non-primary reads and unmapped mates of duplicate reads to be included in duplicate analysis, input BAM must be collated. See external_help for more information.
- outputs
  - duplicate_marked_bam: The input BAM with computationally determined duplicates marked.
  - duplicate_marked_bam_index: The .bai BAM index file associated with duplicate_marked_bam
  - duplicate_marked_bam_md5: The md5sum of duplicate_marked_bam
  - mark_duplicates_metrics: The METRICS_FILE result of picard MarkDuplicates (external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics)
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file in which to mark duplicates
Defaults
clear_dt (Boolean, default=true): Clear the DT tag from the input BAM? For increased performance, if the input BAM does not have the DT tag, set to false.
create_bam (Boolean, default=true); description: Enable BAM creation (true)? Or only output MarkDuplicates metrics (false)?; common: true
duplicate_scoring_strategy (String, default="SUM_OF_BASE_QUALITIES"); description: Strategy for scoring duplicates.; choices: ['SUM_OF_BASE_QUALITIES', 'TOTAL_MAPPED_REFERENCE_LENGTH', 'RANDOM']
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb (Int, default=0): Add to or subtract from the default memory allocation. Default memory allocation is determined by the size of the input BAM. Specified in GB.
optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0, then optical duplicate marking is disabled. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_name_regex.
prefix (String, default=basename(bam,".bam") + ".MarkDuplicates"): Prefix for the MarkDuplicates result files. The extensions .bam, .bam.bai, .bam.md5, and .metrics.txt will be added.
read_name_regex (String, default="^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$"): Regular expression for extracting tile names, x coordinates, and y coordinates from read names. The default works for typical Illumina read names.
remove_duplicates (Boolean, default=false): Remove duplicate reads from the output BAM? If true, the output BAM will not contain any duplicate reads.
remove_sequencing_duplicates (Boolean, default=false): Remove sequencing duplicates (i.e. optical duplicates) from the output BAM? If true, the output BAM will not contain any sequencing duplicates (optical duplicates).
tagging_policy (String, default="All"); description: Tagging policy for the output BAM.; choices: ['DontTag', 'OpticalOnly', 'All']
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
duplicate_marked_bam (File?)
duplicate_marked_bam_index (File?)
duplicate_marked_bam_md5 (File?)
mark_duplicates_metrics (File)
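The interaction between read_name_regex and optical_distance in mark_duplicates can be illustrated with a minimal Python sketch. It applies the documented default regex to hypothetical read names and then compares tile, x, and y. The helper names (parse_read_name, within_optical_distance) and the per-axis distance comparison are assumptions made for illustration; this is not Picard's actual implementation.

```python
import re

# Default read_name_regex as documented for mark_duplicates.
READ_NAME_REGEX = re.compile(r"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$")


def parse_read_name(name):
    """Return (tile, x, y) from a read name, or None if the name does not match."""
    match = READ_NAME_REGEX.match(name)
    if match is None:
        return None
    tile, x, y = match.groups()
    return tile, int(x), int(y)


def within_optical_distance(a, b, optical_distance):
    """Simplified check: same tile and both axes within optical_distance.

    Assumption: the distance is compared per axis. A value of 0 disables
    the check, mirroring the documented meaning of optical_distance=0.
    """
    if optical_distance == 0 or a is None or b is None:
        return False
    return (
        a[0] == b[0]
        and abs(a[1] - b[1]) <= optical_distance
        and abs(a[2] - b[2]) <= optical_distance
    )


# Hypothetical read names in a colon-delimited, Illumina-like layout.
read_a = parse_read_name("INSTR:42:FLOWCELL:1:1101:1000:2000")
read_b = parse_read_name("INSTR:42:FLOWCELL:1:1101:1050:2080")

print(read_a)                                        # ('1101', 1000, 2000)
print(within_optical_distance(read_a, read_b, 100))  # True
print(within_optical_distance(read_a, read_b, 0))    # False
```

A read name without the trailing tile:x:y fields makes parse_read_name return None, which corresponds to the documented caveat that optical duplicate detection needs a custom read_name_regex for non-standard names.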
validate_bam
- description
- Validates the input BAM file for correct formatting using Picard
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360057440611-ValidateSamFile-Picard-
- outputs
  - validate_report: Validation report produced by picard ValidateSamFile. Validation warnings and errors are logged.
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file to validate
Optional
reference_fasta (File?): Reference genome in FASTA format. Presence of the reference FASTA allows for NM tag validation.
Defaults
ignore_list (Array[String], default=[]); description: List of Picard errors and warnings to ignore. Possible values can be found on the GATK website (see external_help).; external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile; common: true
index_validation_stringency_less_exhaustive (Boolean, default=false): Set INDEX_VALIDATION_STRINGENCY=LESS_EXHAUSTIVE?
max_errors (Int, default=2147483647): Set the value of MAX_OUTPUT for picard ValidateSamFile. The Picard default is 100; a lower number can enable fast-fail behavior.
memory_gb (Int, default=16): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name (String, default=basename(bam,".bam") + ".ValidateSamFile.txt"): Name for the ValidateSamFile report file
succeed_on_errors (Boolean, default=false); description: Succeed the task even if errors and/or warnings are detected; common: true
succeed_on_warnings (Boolean, default=true); description: Succeed the task if warnings are detected and there are no errors. Overridden by succeed_on_errors; common: true
summary_mode (Boolean, default=false); description: Enable SUMMARY mode?; common: true
validation_stringency (String, default="LENIENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
validate_report (File)
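The way succeed_on_errors and succeed_on_warnings combine in validate_bam can be read as a small decision function. The Python sketch below is one interpretation of the parameter descriptions; the function name task_succeeds and the error and warning counts are hypothetical, and the task's actual exit logic may differ.

```python
def task_succeeds(n_errors, n_warnings, succeed_on_errors=False, succeed_on_warnings=True):
    """Decide task success from validation counts, following the documented defaults."""
    # succeed_on_errors overrides everything, including succeed_on_warnings.
    if succeed_on_errors:
        return True
    # Any error fails the task when succeed_on_errors is false.
    if n_errors > 0:
        return False
    # Warnings only: the outcome is controlled by succeed_on_warnings.
    if n_warnings > 0:
        return succeed_on_warnings
    # A clean report always succeeds.
    return True


print(task_succeeds(n_errors=0, n_warnings=3))                             # True
print(task_succeeds(n_errors=0, n_warnings=3, succeed_on_warnings=False))  # False
print(task_succeeds(n_errors=1, n_warnings=0))                             # False
print(task_succeeds(n_errors=1, n_warnings=0, succeed_on_errors=True))     # True
```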
sort
- description
- Sorts the input BAM file
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360036510732-SortSam-Picard-
- outputs
  - sorted_bam: The input BAM after it has been sorted according to sort_order
  - sorted_bam_index: The .bai BAM index file associated with sorted_bam
  - sorted_bam_md5: The md5sum of sorted_bam
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file to sort
Defaults
memory_gb (Int, default=25): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".sorted"): Prefix for the sorted BAM file and accessory files. The extensions .bam, .bam.bai, and .bam.md5 will be added.
sort_order (String, default="coordinate"); description: Order by which to sort the input BAM; choices: ['queryname', 'coordinate', 'duplicate']; common: true
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
sorted_bam (File)
sorted_bam_index (File?)
sorted_bam_md5 (File)
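The default prefix expression, basename(bam,".bam") + ".sorted", together with the documented extensions determines the names of the sort outputs. The following Python sketch mimics that naming. The helpers wdl_basename and sorted_output_names are hypothetical stand-ins written for illustration; they are not part of the WDL task.

```python
import posixpath


def wdl_basename(path, suffix=""):
    """Approximate WDL's basename(): drop the directory, then the suffix if present."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def sorted_output_names(bam, prefix=None):
    """Return the expected output file names for the sort task."""
    if prefix is None:
        # Documented default: basename(bam, ".bam") + ".sorted"
        prefix = wdl_basename(bam, ".bam") + ".sorted"
    return {
        "sorted_bam": prefix + ".bam",
        "sorted_bam_index": prefix + ".bam.bai",
        "sorted_bam_md5": prefix + ".bam.md5",
    }


print(sorted_output_names("/data/sample.bam"))
# {'sorted_bam': 'sample.sorted.bam', 'sorted_bam_index': 'sample.sorted.bam.bai', 'sorted_bam_md5': 'sample.sorted.bam.md5'}
```

The same pattern applies to the other tasks that take a prefix, with their own default suffix and extension list.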
merge_sam_files
- description
- Merges the input BAM files into a single BAM file. All input BAMs are assumed to be sorted according to sort_order.
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360057440751-MergeSamFiles-Picard-
- outputs
  - merged_bam: The BAM resulting from merging all the input BAMs
  - merged_bam_index: The .bai BAM index file associated with merged_bam
  - merged_bam_md5: The md5sum of merged_bam
Inputs
Required
_runtime (Any, required)
bams (Array[File], required): Input BAMs to merge. All BAMs are assumed to be sorted according to sort_order.
prefix (String, required): Prefix for the merged BAM file and accessory files. The extensions .bam, .bam.bai, and .bam.md5 will be added.
Defaults
memory_gb (Int, default=40): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
sort_order (String, default="coordinate"); description: Sort order for the output merged BAM. It is assumed all input BAMs share this order.; choices: ['unsorted', 'queryname', 'coordinate', 'duplicate']; common: true
threading (Boolean, default=true): Option to create a background thread to encode, compress, and write the output file to disk. The threaded version uses about 20% more CPU and decreases runtime by ~20% when writing out a compressed BAM file. Sets runtime.cpu = 2 if true and runtime.cpu = 1 if false.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
merged_bam (File)
merged_bam_index (File)
merged_bam_md5 (File)
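Because merge_sam_files assumes that every input BAM already shares sort_order, a caller may want to verify that assumption beforehand. The Python sketch below reads the SO field of the @HD header line from SAM header text. It is a hypothetical pre-check, not something the task is documented to perform, and it assumes the header text has already been extracted (for example with a tool such as samtools). The function names are illustrative.

```python
def header_sort_order(header_text):
    """Return the SO value from the @HD line, or "unknown" if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


def mismatched_inputs(headers, sort_order="coordinate"):
    """List the inputs whose declared sort order differs from sort_order."""
    return [
        name
        for name, text in headers.items()
        if header_sort_order(text) != sort_order
    ]


# Hypothetical header text for two input BAMs.
headers = {
    "a.bam": "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000",
    "b.bam": "@HD\tVN:1.6\tSO:queryname\n@SQ\tSN:chr1\tLN:1000",
}
print(mismatched_inputs(headers))  # ['b.bam']
```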
clean_sam
- description
- Cleans the input BAM file. Cleans soft-clipping beyond end-of-reference, sets MAPQ=0 for unmapped reads.
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360036885571-CleanSam-Picard-
- outputs
  - cleaned_bam: A cleaned version of the input BAM
  - cleaned_bam_index: The .bai BAM index file associated with cleaned_bam
  - cleaned_bam_md5: The md5sum of cleaned_bam
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file to clean
Defaults
memory_gb (Int, default=25): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".cleaned"): Prefix for the cleaned BAM file and accessory files. The extensions .bam, .bam.bai, and .bam.md5 will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
cleaned_bam (File)
cleaned_bam_index (File)
cleaned_bam_md5 (File)
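The modify_disk_size_gb input is described as an offset applied to a disk size computed from the inputs. The Python sketch below shows the general shape of such a calculation. The multiplier and base_gb values are hypothetical placeholders, since the documentation does not state the formula each task uses; only the additive role of modify_disk_size_gb comes from the parameter description.

```python
import math


def disk_size_gb(input_size_gb, modify_disk_size_gb=0, multiplier=2.0, base_gb=10):
    """Estimate a disk allocation in GB.

    multiplier and base_gb are illustrative assumptions, not the task's real values.
    modify_disk_size_gb is added to (or, if negative, subtracted from) the estimate.
    """
    dynamic = math.ceil(input_size_gb * multiplier) + base_gb
    return dynamic + modify_disk_size_gb


print(disk_size_gb(30))                           # 70
print(disk_size_gb(30, modify_disk_size_gb=-20))  # 50
print(disk_size_gb(12.3, modify_disk_size_gb=5))  # 40
```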
collect_wgs_metrics
- description
- Runs picard CollectWgsMetrics to collect metrics about the fractions of reads that pass base- and mapping-quality filters as well as coverage (read-depth) levels
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360037226132-CollectWgsMetrics-Picard-
- outputs
  - wgs_metrics: Output report of picard CollectWgsMetrics (external_help: https://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectWgsMetrics.WgsMetrics)
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate WGS metrics
reference_fasta (File, required): Gzipped reference genome in FASTA format
Defaults
memory_gb (Int, default=12): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name (String, default=basename(bam,".bam") + ".CollectWgsMetrics.txt"): Name for the metrics result file
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
wgs_metrics (File)
collect_alignment_summary_metrics
- description
- Runs picard CollectAlignmentSummaryMetrics to calculate metrics detailing the quality of the read alignments as well as the proportion of the reads that passed machine signal-to-noise threshold quality filters
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360040507751-CollectAlignmentSummaryMetrics-Picard-
- outputs
  - alignment_metrics: The text file output of CollectAlignmentSummaryMetrics (external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics)
  - alignment_metrics_pdf: The PDF file output of CollectAlignmentSummaryMetrics
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate alignment metrics
Defaults
memory_gb (Int, default=8): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".CollectAlignmentSummaryMetrics"): Prefix for the output report files. The extensions .txt and .pdf will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
alignment_metrics (File)
alignment_metrics_pdf (File)
collect_gc_bias_metrics
- description
- Runs picard CollectGcBiasMetrics to collect information about the relative proportions of guanine (G) and cytosine (C) nucleotides
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360037593931-CollectGcBiasMetrics-Picard-
- outputs
  - gc_bias_metrics: The full text file output of CollectGcBiasMetrics (external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasDetailMetrics)
  - gc_bias_metrics_summary: The summary text file output of CollectGcBiasMetrics (external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasSummaryMetrics)
  - gc_bias_metrics_pdf: The PDF file output of CollectGcBiasMetrics
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate GC bias metrics
reference_fasta (File, required): Reference sequences in FASTA format
Defaults
memory_gb (Int, default=8): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".CollectGcBiasMetrics"): Prefix for the output report files. The extensions .txt, .summary.txt, and .pdf will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
gc_bias_metrics (File)
gc_bias_metrics_summary (File)
gc_bias_metrics_pdf (File)
collect_insert_size_metrics
- description
- Runs picard CollectInsertSizeMetrics to collect metrics for validating library construction including the insert size distribution and read orientation of Paired-End libraries
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360037055772-CollectInsertSizeMetrics-Picard-
- outputs
  - insert_size_metrics: The text file output of CollectInsertSizeMetrics (external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics)
  - insert_size_metrics_pdf: The PDF file output of CollectInsertSizeMetrics
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate insert size metrics
Defaults
memory_gb (Int, default=8): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".CollectInsertSizeMetrics"): Prefix for the output report files. The extensions .txt and .pdf will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
insert_size_metrics (File)
insert_size_metrics_pdf (File)
quality_score_distribution
- description
- Runs picard QualityScoreDistribution to calculate the range of quality scores and create an accompanying chart
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360037057312-QualityScoreDistribution-Picard-
- outputs
  - quality_score_distribution_txt: The text file output of QualityScoreDistribution
  - quality_score_distribution_pdf: The PDF file output of QualityScoreDistribution
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate quality score distribution
Defaults
memory_gb (Int, default=8): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".QualityScoreDistribution"): Prefix for the output report files. The extensions .txt and .pdf will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
quality_score_distribution_txt (File)
quality_score_distribution_pdf (File)
bam_to_fastq
- description
- [Deprecated] This WDL task converts the input BAM file into FASTQ format files. This task has been deprecated in favor of samtools.bam_to_fastq, which is more performant and doesn't error on 'illegal mate states'.
- deprecated
- true
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file to convert to FASTQ
Defaults
memory_gb (Int, default=56): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
paired (Boolean, default=true); description: Is the data Paired-End (true) or Single-End (false)?; common: true
prefix (String, default=basename(bam,".bam")): Prefix for the file. The extension <extension> will be added.
Outputs
read_one_fastq_gz (File)
read_two_fastq_gz (File?)
merge_vcfs
- description
- Merges the input VCF files into a single VCF file
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360036713331-MergeVcfs-Picard
- outputs
  - merged_vcf: The merged VCF file
  - merged_vcf_index: The index file associated with the merged VCF file
Inputs
Required
_runtime
(Any, required)output_vcf_name
(String, required): Name for the merged VCF filevcfs
(Array[File], required): Input VCF format files to merge. May be gzipped or binary compressed.vcfs_indexes
(Array[File], required): Index files associated with the input VCF files
Defaults
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
Outputs
merged_vcf (File)
merged_vcf_index (File)
scatter_interval_list
- description
- Splits an interval list into smaller interval lists for parallel processing
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360036897212-IntervalListTools-Picard
- outputs
  - interval_lists_scatter: The split interval lists
  - interval_count: The number of split interval lists
Inputs
Required
_runtime
(Any, required)interval_list
(File, required): Input interval list to splitscatter_count
(Int, required): Number of interval lists to create
Defaults
sort (Boolean, default=true): Should the output interval lists be sorted? Sorts by coordinate.
subdivision_mode (String, default="BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW"); description: How to subdivide the intervals; choices: ['BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW', 'INTERVAL_SUBDIVISION', 'BALANCING_WITHOUT_INTERVAL_SUBDIVISION']
unique (Boolean, default=true): Should the output interval lists contain unique intervals? Implies sort=true. Merges overlapping or adjacent intervals.
Outputs
interval_lists_scatter (Array[File])
interval_count (Int)
create_sequence_dictionary
- description
- Creates a sequence dictionary for the input FASTA file using Picard
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/13832748622491-CreateSequenceDictionary-Picard-
- outputs
  - dictionary: Sequence dictionary produced by picard CreateSequenceDictionary.
Inputs
Required
_runtime (Any, required)
fasta (File, required): Input FASTA format file from which to create dictionary
Optional
assembly_name (String?): Value to put in AS field of sequence dictionary
fasta_url (String?): Value to put in UR field of sequence dictionary
species (String?): Value to put in SP field of sequence dictionary
Defaults
memory_gb (Int, default=16): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name (String, default=basename(fasta,".fa") + ".dict"): Name for the CreateSequenceDictionary dictionary file
Outputs
dictionary (File)
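The default outfile_name for create_sequence_dictionary, basename(fasta,".fa") + ".dict", only strips a trailing ".fa". The Python sketch below approximates that expression to show what name results for other common FASTA extensions. The helper names are hypothetical, and wdl_basename is an approximation of WDL's basename function rather than the engine's implementation.

```python
import posixpath


def wdl_basename(path, suffix=""):
    """Approximate WDL's basename(): drop the directory, then the suffix if present."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def default_dict_name(fasta):
    """Mirror the documented default: basename(fasta, ".fa") + ".dict"."""
    return wdl_basename(fasta, ".fa") + ".dict"


print(default_dict_name("/ref/GRCh38.fa"))     # GRCh38.dict
print(default_dict_name("/ref/GRCh38.fasta"))  # GRCh38.fasta.dict
print(default_dict_name("/ref/GRCh38.fa.gz"))  # GRCh38.fa.gz.dict
```

When the input does not end in ".fa", setting outfile_name explicitly avoids a dictionary name that still carries the FASTA extension.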