Picard

mark_duplicates

description: Marks duplicate reads in the input BAM file using Picard
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-
help: For non-primary reads and unmapped mates of duplicate reads to be included in duplicate analysis, input BAM must be collated. See external_help for more information.
outputs: {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'duplicate_marked_bam_index': 'The .bai BAM index file associated with duplicate_marked_bam', 'duplicate_marked_bam_md5': 'The md5sum of duplicate_marked_bam', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file in which to mark duplicates

Defaults

clear_dt (Boolean, default=true): Clear the DT tag from the input BAM? If the input BAM does not have the DT tag, set to false for increased performance.
create_bam (Boolean, default=true); description: Enable BAM creation (true)? Or only output MarkDuplicates metrics (false)?; common: true
duplicate_scoring_strategy (String, default="SUM_OF_BASE_QUALITIES"); description: Strategy for scoring duplicates.; choices: ['SUM_OF_BASE_QUALITIES', 'TOTAL_MAPPED_REFERENCE_LENGTH', 'RANDOM']
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb (Int, default=0): Add to or subtract from the default memory allocation. Default memory allocation is determined by the size of the input BAM. Specified in GB.
optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0, then optical duplicate marking is disabled. Suggested settings are 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). The distance calculation depends on coordinate data embedded in the read names, typically produced by Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_name_regex.
prefix (String, default=basename(bam,".bam") + ".MarkDuplicates"): Prefix for the MarkDuplicates result files. The extensions .bam, .bam.bai, .bam.md5, and .metrics.txt will be added.
read_name_regex (String, default="^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$"): Regular expression for extracting tile names, x coordinates, and y coordinates from read names. The default works for typical Illumina read names.
remove_duplicates (Boolean, default=false): Remove duplicate reads from the output BAM? If true, the output BAM will not contain any duplicate reads.
remove_sequencing_duplicates (Boolean, default=false): Remove sequencing duplicates (i.e. optical duplicates) from the output BAM? If true, the output BAM will not contain any sequencing duplicates (optical duplicates).
tagging_policy (String, default="All"); description: Tagging policy for the output BAM.; choices: ['DontTag', 'OpticalOnly', 'All']
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
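
As an illustration of what the default read_name_regex captures, the sketch below applies it with Python's re module. Picard itself evaluates the pattern in Java, so this only approximates the behavior; the helper parse_read_name and both read names are hypothetical.

```python
import re

# Default read_name_regex from the task inputs, copied verbatim.
READ_NAME_REGEX = r"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$"


def parse_read_name(name, pattern=READ_NAME_REGEX):
    """Return (tile, x, y) from a read name, or None if the name does not match."""
    match = re.match(pattern, name)
    if match is None:
        return None
    tile, x, y = match.groups()
    return tile, int(x), int(y)


# Hypothetical Illumina-style name: instrument:run:flowcell:lane:tile:x:y
print(parse_read_name("A00123:45:HXXXXXXXX:1:1101:1234:5678"))
# prints ('1101', 1234, 5678)

# A name without colon-separated coordinates does not match, so optical
# duplicate detection would have no coordinates to work with.
print(parse_read_name("SRR000001.1"))
# prints None
```

Read names that return None here are the "non-standard names" case: optical_distance has no effect on them unless read_name_regex is changed to suit the naming scheme.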

Outputs

duplicate_marked_bam (File?)
duplicate_marked_bam_index (File?)
duplicate_marked_bam_md5 (File?)
mark_duplicates_metrics (File)
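
The default prefix and the extensions listed under prefix determine the result file names. The following is a minimal sketch of that naming, assuming each output maps to the extension its name suggests and that the BAM outputs are absent when create_bam is false. The helpers wdl_basename and mark_duplicates_outputs are hypothetical and are not part of the task.

```python
import posixpath


def wdl_basename(path, suffix=""):
    """Mimic WDL basename(): drop the directory, then the suffix if present."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def mark_duplicates_outputs(bam, prefix=None, create_bam=True):
    """Expected result file names for an input BAM (illustrative only)."""
    if prefix is None:
        prefix = wdl_basename(bam, ".bam") + ".MarkDuplicates"
    return {
        # The three BAM-related outputs are optional (File?).
        "duplicate_marked_bam": prefix + ".bam" if create_bam else None,
        "duplicate_marked_bam_index": prefix + ".bam.bai" if create_bam else None,
        "duplicate_marked_bam_md5": prefix + ".bam.md5" if create_bam else None,
        # The metrics file is always produced.
        "mark_duplicates_metrics": prefix + ".metrics.txt",
    }


for name, path in mark_duplicates_outputs("/data/sample1.bam").items():
    print(name, "->", path)
```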

validate_bam

description: Validates the input BAM file for correct formatting using Picard
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360057440611-ValidateSamFile-Picard-
outputs: {'validate_report': 'Validation report produced by picard ValidateSamFile. Validation warnings and errors are logged.'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to validate

Optional

reference_fasta (File?): Reference genome in FASTA format. Presence of the reference FASTA allows for NM tag validation.

Defaults

ignore_list (Array[String], default=[]); description: List of Picard errors and warnings to ignore. Possible values can be found on the GATK website (see external_help).; external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile; common: true
index_validation_stringency_less_exhaustive (Boolean, default=false): Set INDEX_VALIDATION_STRINGENCY=LESS_EXHAUSTIVE?
max_errors (Int, default=2147483647): Set the value of MAX_OUTPUT for picard ValidateSamFile. The Picard default is 100; a lower number can enable fast-fail behavior.
memory_gb (Int, default=16): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name (String, default=basename(bam,".bam") + ".ValidateSamFile.txt"): Name for the ValidateSamFile report file
succeed_on_errors (Boolean, default=false); description: Succeed the task even if errors and/or warnings are detected; common: true
succeed_on_warnings (Boolean, default=true); description: Succeed the task if warnings are detected and there are no errors. Overridden by succeed_on_errors; common: true
summary_mode (Boolean, default=false); description: Enable SUMMARY mode?; common: true
validation_stringency (String, default="LENIENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
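
The interaction between succeed_on_errors and succeed_on_warnings can be read as a small decision function. The sketch below is one interpretation of the descriptions above, not the task's actual implementation; the function task_succeeds and its count arguments are hypothetical.

```python
def task_succeeds(n_errors, n_warnings,
                  succeed_on_errors=False, succeed_on_warnings=True):
    """Decide whether validation should pass, per the documented flags."""
    # succeed_on_errors overrides everything else.
    if succeed_on_errors:
        return True
    # Any error fails the task when succeed_on_errors is false.
    if n_errors > 0:
        return False
    # Warnings alone pass only if succeed_on_warnings is true.
    if n_warnings > 0:
        return succeed_on_warnings
    # A clean report always passes.
    return True


print(task_succeeds(0, 3))                             # True (defaults)
print(task_succeeds(0, 3, succeed_on_warnings=False))  # False
print(task_succeeds(1, 0))                             # False
print(task_succeeds(1, 0, succeed_on_errors=True))     # True
```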

Outputs

validate_report (File)

sort

description: Sorts the input BAM file
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360036510732-SortSam-Picard-
outputs: {'sorted_bam': 'The input BAM after it has been sorted according to sort_order', 'sorted_bam_index': 'The .bai BAM index file associated with sorted_bam', 'sorted_bam_md5': 'The md5sum of sorted_bam'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to sort

Defaults

memory_gb (Int, default=25): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".sorted"): Prefix for the sorted BAM file and accessory files. The extensions .bam, .bam.bai, and .bam.md5 will be added.
sort_order (String, default="coordinate"); description: Order by which to sort the input BAM; choices: ['queryname', 'coordinate', 'duplicate']; common: true
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

sorted_bam (File)
sorted_bam_index (File?)
sorted_bam_md5 (File)

merge_sam_files

description: Merges the input BAM files into a single BAM file. All input BAMs are assumed to be sorted according to sort_order.
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360057440751-MergeSamFiles-Picard-
outputs: {'merged_bam': 'The BAM resulting from merging all the input BAMs', 'merged_bam_index': 'The .bai BAM index file associated with merged_bam', 'merged_bam_md5': 'The md5sum of merged_bam'}

Inputs

Required

_runtime (Any, required)
bams (Array[File], required): Input BAMs to merge. All BAMs are assumed to be sorted according to sort_order.
prefix (String, required): Prefix for the merged BAM file and accessory files. The extensions .bam, .bam.bai, and .bam.md5 will be added.

Defaults

memory_gb (Int, default=40): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
sort_order (String, default="coordinate"); description: Sort order for the output merged BAM. It is assumed all input BAMs share this order.; choices: ['unsorted', 'queryname', 'coordinate', 'duplicate']; common: true
threading (Boolean, default=true): Option to create a background thread to encode, compress, and write the output file to disk. The threaded version uses about 20% more CPU and decreases runtime by ~20% when writing out a compressed BAM file. Sets runtime.cpu = 2 if true, or runtime.cpu = 1 if false.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

merged_bam (File)
merged_bam_index (File)
merged_bam_md5 (File)

clean_sam

description: Cleans the input BAM file. Soft-clips alignments that extend beyond the end of the reference and sets MAPQ=0 for unmapped reads.
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360036885571-CleanSam-Picard-
outputs: {'cleaned_bam': 'A cleaned version of the input BAM', 'cleaned_bam_index': 'The .bai BAM index file associated with cleaned_bam', 'cleaned_bam_md5': 'The md5sum of cleaned_bam'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to clean

Defaults

memory_gb (Int, default=25): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".cleaned"): Prefix for the cleaned BAM file and accessory files. The extensions .bam, .bam.bai, and .bam.md5 will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

cleaned_bam (File)
cleaned_bam_index (File)
cleaned_bam_md5 (File)

collect_wgs_metrics

description: Runs picard CollectWgsMetrics to collect metrics about the fractions of reads that pass base- and mapping-quality filters as well as coverage (read-depth) levels
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360037226132-CollectWgsMetrics-Picard-
outputs: {'wgs_metrics': {'description': 'Output report of picard CollectWgsMetrics', 'external_help': 'https://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectWgsMetrics.WgsMetrics'}}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate WGS metrics
reference_fasta (File, required): Gzipped reference genome in FASTA format

Defaults

memory_gb (Int, default=12): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name (String, default=basename(bam,".bam") + ".CollectWgsMetrics.txt"): Name for the metrics result file
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

wgs_metrics (File)

collect_alignment_summary_metrics

description: Runs picard CollectAlignmentSummaryMetrics to calculate metrics detailing the quality of the read alignments as well as the proportion of the reads that passed machine signal-to-noise threshold quality filters
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360040507751-CollectAlignmentSummaryMetrics-Picard-
outputs: {'alignment_metrics': {'description': 'The text file output of CollectAlignmentSummaryMetrics', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics'}, 'alignment_metrics_pdf': 'The PDF file output of CollectAlignmentSummaryMetrics'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate alignment metrics

Defaults

memory_gb (Int, default=8): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".CollectAlignmentSummaryMetrics"): Prefix for the output report files. The extensions .txt and .pdf will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

alignment_metrics (File)
alignment_metrics_pdf (File)

collect_gc_bias_metrics

description: Runs picard CollectGcBiasMetrics to collect information about the relative proportions of guanine (G) and cytosine (C) nucleotides
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360037593931-CollectGcBiasMetrics-Picard-
outputs: {'gc_bias_metrics': {'description': 'The full text file output of CollectGcBiasMetrics', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasDetailMetrics'}, 'gc_bias_metrics_summary': {'description': 'The summary text file output of CollectGcBiasMetrics', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasSummaryMetrics'}, 'gc_bias_metrics_pdf': 'The PDF file output of CollectGcBiasMetrics'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate GC bias metrics
reference_fasta (File, required): Reference sequences in FASTA format

Defaults

memory_gb (Int, default=8): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".CollectGcBiasMetrics"): Prefix for the output report files. The extensions .txt, .summary.txt, and .pdf will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

gc_bias_metrics (File)
gc_bias_metrics_summary (File)
gc_bias_metrics_pdf (File)

collect_insert_size_metrics

description: Runs picard CollectInsertSizeMetrics to collect metrics for validating library construction including the insert size distribution and read orientation of Paired-End libraries
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360037055772-CollectInsertSizeMetrics-Picard-
outputs: {'insert_size_metrics': {'description': 'The text file output of CollectInsertSizeMetrics', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics'}, 'insert_size_metrics_pdf': 'The PDF file output of CollectInsertSizeMetrics'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate insert size metrics

Defaults

memory_gb (Int, default=8): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".CollectInsertSizeMetrics"): Prefix for the output report files. The extensions .txt and .pdf will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

insert_size_metrics (File)
insert_size_metrics_pdf (File)

quality_score_distribution

description: Runs picard QualityScoreDistribution to calculate the range of quality scores and create an accompanying chart
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360037057312-QualityScoreDistribution-Picard-
outputs: {'quality_score_distribution_txt': 'The text file output of QualityScoreDistribution', 'quality_score_distribution_pdf': 'The PDF file output of QualityScoreDistribution'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file for which to calculate quality score distribution

Defaults

memory_gb (Int, default=8): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix (String, default=basename(bam,".bam") + ".QualityScoreDistribution"): Prefix for the output report files. The extensions .txt and .pdf will be added.
validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

quality_score_distribution_txt (File)
quality_score_distribution_pdf (File)

bam_to_fastq

description: [Deprecated] This WDL task converts the input BAM file into FASTQ format files. This task has been deprecated in favor of samtools.bam_to_fastq which is more performant and doesn't error on 'illegal mate states'.
deprecated: true

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to convert to FASTQ

Defaults

memory_gb (Int, default=56): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
paired (Boolean, default=true); description: Is the data Paired-End (true) or Single-End (false)?; common: true
prefix (String, default=basename(bam,".bam")): Prefix for the file. The extension <extension> will be added.

Outputs

read_one_fastq_gz (File)
read_two_fastq_gz (File?)

merge_vcfs

description: Merges the input VCF files into a single VCF file
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360036713331-MergeVcfs-Picard
outputs: {'merged_vcf': 'The merged VCF file', 'merged_vcf_index': 'The index file associated with the merged VCF file'}

Inputs

Required

_runtime (Any, required)
output_vcf_name (String, required): Name for the merged VCF file
vcfs (Array[File], required): Input VCF format files to merge. May be gzipped or binary compressed.
vcfs_indexes (Array[File], required): Index files associated with the input VCF files

Defaults

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

merged_vcf (File)
merged_vcf_index (File)

scatter_interval_list

description: Splits an interval list into smaller interval lists for parallel processing
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360036897212-IntervalListTools-Picard
outputs: {'interval_lists_scatter': 'The split interval lists', 'interval_count': 'The number of split interval lists'}

Inputs

Required

_runtime (Any, required)
interval_list (File, required): Input interval list to split
scatter_count (Int, required): Number of interval lists to create

Defaults

sort (Boolean, default=true): Should the output interval lists be sorted? Sorts by coordinate.
subdivision_mode (String, default="BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW"); description: How to subdivide the intervals; choices: ['BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW', 'INTERVAL_SUBDIVISION', 'BALANCING_WITHOUT_INTERVAL_SUBDIVISION']
unique (Boolean, default=true): Should the output interval lists contain unique intervals? Implies sort=true. Merges overlapping or adjacent intervals.

Outputs

interval_lists_scatter (Array[File])
interval_count (Int)

create_sequence_dictionary

description: Creates a sequence dictionary for the input FASTA file using Picard
external_help: https://gatk.broadinstitute.org/hc/en-us/articles/13832748622491-CreateSequenceDictionary-Picard-
outputs: {'dictionary': 'Sequence dictionary produced by picard CreateSequenceDictionary.'}

Inputs

Required

_runtime (Any, required)
fasta (File, required): Input FASTA format file from which to create dictionary

Optional

assembly_name (String?): Value to put in AS field of sequence dictionary
fasta_url (String?): Value to put in UR field of sequence dictionary
species (String?): Value to put in SP field of sequence dictionary

Defaults

memory_gb (Int, default=16): RAM to allocate for task, specified in GB
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
outfile_name (String, default=basename(fasta,".fa") + ".dict"): Name for the CreateSequenceDictionary dictionary file
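
Because the default outfile_name strips only a ".fa" suffix, other FASTA extensions are kept in the dictionary name. The sketch below mimics WDL's basename() in Python to show the effect; the function default_dict_name and the example paths are hypothetical.

```python
import posixpath


def default_dict_name(fasta):
    """Mimic basename(fasta, ".fa") + ".dict" from the task defaults."""
    name = posixpath.basename(fasta)
    # The suffix is removed only when the file name ends with exactly ".fa".
    if name.endswith(".fa"):
        name = name[: -len(".fa")]
    return name + ".dict"


print(default_dict_name("/ref/GRCh38.fa"))     # GRCh38.dict
print(default_dict_name("/ref/GRCh38.fasta"))  # GRCh38.fasta.dict
print(default_dict_name("/ref/GRCh38.fa.gz"))  # GRCh38.fa.gz.dict
```

Setting outfile_name explicitly avoids the longer names when the input does not end in ".fa".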

Outputs

dictionary (File)