
mark_duplicates

description
Marks duplicate reads in the input BAM file using Picard
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-
help
For non-primary reads and unmapped mates of duplicate reads to be included in duplicate analysis, input BAM must be collated. See external_help for more information.
outputs
  • duplicate_marked_bam: The input BAM with computationally determined duplicates marked.
  • duplicate_marked_bam_index: The .bai BAM index file associated with duplicate_marked_bam
  • duplicate_marked_bam_md5: The md5sum of duplicate_marked_bam
  • mark_duplicates_metrics: The METRICS_FILE result of picard MarkDuplicates; external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file in which to mark duplicates

Defaults

  • clear_dt (Boolean, default=true): Clear the DT tag from the input BAM? For increased performance, if the input BAM does not have the DT tag, set to false.
  • create_bam (Boolean, default=true); description: Enable BAM creation (true)? Or only output MarkDuplicates metrics (false)?; common: true
  • duplicate_scoring_strategy (String, default="SUM_OF_BASE_QUALITIES"); description: Strategy for scoring duplicates.; choices: ['SUM_OF_BASE_QUALITIES', 'TOTAL_MAPPED_REFERENCE_LENGTH', 'RANDOM']
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • modify_memory_gb (Int, default=0): Add to or subtract from the default memory allocation. Default memory allocation is determined by the size of the input BAM. Specified in GB.
  • optical_distance (Int, default=0): Maximum distance between read coordinates for two duplicates to be considered optical duplicates. If 0, optical duplicate marking is disabled. Suggested values are 100 for unpatterned Illumina flowcells (e.g. HiSeq) and 2500 for patterned flowcells (e.g. NovaSeq). Distances are calculated from the coordinate data embedded in read names, as typically produced by Illumina sequencers. Optical duplicate detection will not work on non-standard read names unless read_name_regex is modified accordingly.
  • prefix (String, default=basename(bam,".bam") + ".MarkDuplicates"): Prefix for the MarkDuplicates result files. The extensions .bam, .bam.bai, .bam.md5, and .metrics.txt will be added.
  • read_name_regex (String, default="^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$"): Regular expression for extracting tile names, x coordinates, and y coordinates from read names. The default works for typical Illumina read names.
  • remove_duplicates (Boolean, default=false): Remove duplicate reads from the output BAM? If true, the output BAM will not contain any duplicate reads.
  • remove_sequencing_duplicates (Boolean, default=false): Remove sequencing duplicates (i.e. optical duplicates) from the output BAM? If true, the output BAM will not contain any sequencing duplicates (optical duplicates).
  • tagging_policy (String, default="All"); description: Determines which duplicate types are recorded in the DT tag of the output BAM.; choices: ['DontTag', 'OpticalOnly', 'All']
  • validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
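
Optical duplicate detection depends on read_name_regex to extract the tile and the x/y coordinates from each read name. Duplicates that share a tile and whose coordinates fall within optical_distance are then treated as optical duplicates. The sketch below uses the default regex and simplifies the comparison. The function names are hypothetical. As we understand it, Picard also restricts these comparisons to reads in the same duplicate set and read group, and this sketch leaves that out.

```python
import re

# Default read_name_regex documented above; Python's re accepts it unchanged.
READ_NAME_REGEX = re.compile(r"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$")


def parse_location(read_name):
    """Return (tile, x, y) parsed from an Illumina-style read name, or None."""
    match = READ_NAME_REGEX.match(read_name)
    if match is None:
        return None  # non-standard name: no coordinates, so no optical check
    tile, x, y = match.groups()
    return tile, int(x), int(y)


def is_optical_pair(name_a, name_b, optical_distance):
    """Simplified check: same tile, and both x and y within optical_distance."""
    if optical_distance <= 0:
        return False  # 0 disables optical duplicate marking
    loc_a = parse_location(name_a)
    loc_b = parse_location(name_b)
    if loc_a is None or loc_b is None:
        return False
    (tile_a, xa, ya), (tile_b, xb, yb) = loc_a, loc_b
    return (
        tile_a == tile_b
        and abs(xa - xb) <= optical_distance
        and abs(ya - yb) <= optical_distance
    )
```

Note that the greedy first group of the regex absorbs every leading colon-separated field. As a result, the last three fields are always read as tile, x, and y.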

Outputs

  • duplicate_marked_bam (File?)
  • duplicate_marked_bam_index (File?)
  • duplicate_marked_bam_md5 (File?)
  • mark_duplicates_metrics (File)

validate_bam

description
Validates the input BAM file for correct formatting using Picard
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360057440611-ValidateSamFile-Picard-
outputs
  • validate_report: Validation report produced by picard ValidateSamFile. Validation warnings and errors are logged.

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to validate

Optional

  • reference_fasta (File?): Reference genome in FASTA format. Presence of the reference FASTA allows for NM tag validation.

Defaults

  • ignore_list (Array[String], default=[]); description: List of Picard errors and warnings to ignore. Possible values can be found on the GATK website (see external_help).; external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile; common: true
  • index_validation_stringency_less_exhaustive (Boolean, default=false): Set INDEX_VALIDATION_STRINGENCY=LESS_EXHAUSTIVE?
  • max_errors (Int, default=2147483647): Set the value of MAX_OUTPUT for picard ValidateSamFile. The Picard default is 100; a lower number can enable fail-fast behavior.
  • memory_gb (Int, default=16): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • outfile_name (String, default=basename(bam,".bam") + ".ValidateSamFile.txt"): Name for the ValidateSamFile report file
  • succeed_on_errors (Boolean, default=false); description: Succeed the task even if errors and/or warnings are detected; common: true
  • succeed_on_warnings (Boolean, default=true); description: Succeed the task if warnings are detected and there are no errors. Overridden by succeed_on_errors; common: true
  • summary_mode (Boolean, default=false); description: Enable SUMMARY mode?; common: true
  • validation_stringency (String, default="LENIENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
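
The way succeed_on_errors and succeed_on_warnings interact can be summarized as a small decision function. This is a hypothetical sketch of the documented semantics, not the task's actual command logic. num_errors and num_warnings stand in for whatever counts the task derives from the ValidateSamFile report.

```python
def task_succeeds(num_errors, num_warnings, succeed_on_errors=False, succeed_on_warnings=True):
    """Hypothetical pass/fail rule matching the documented defaults."""
    if succeed_on_errors:
        # succeed_on_errors overrides everything, including succeed_on_warnings
        return True
    if num_errors > 0:
        return False
    if num_warnings > 0:
        # warnings only: pass or fail depending on succeed_on_warnings
        return succeed_on_warnings
    return True
```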

Outputs

  • validate_report (File)

sort

description
Sorts the input BAM file
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360036510732-SortSam-Picard-
outputs
  • sorted_bam: The input BAM after it has been sorted according to sort_order
  • sorted_bam_index: The .bai BAM index file associated with sorted_bam
  • sorted_bam_md5: The md5sum of sorted_bam

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to sort

Defaults

  • memory_gb (Int, default=25): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • prefix (String, default=basename(bam,".bam") + ".sorted"): Prefix for the sorted BAM file and accessory files. The extensions .bam, .bam.bai, and .bam.md5 will be added.
  • sort_order (String, default="coordinate"); description: Order by which to sort the input BAM; choices: ['queryname', 'coordinate', 'duplicate']; common: true
  • validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
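
Coordinate order sorts first by the position of each contig in the BAM header's @SQ list and then by alignment start. Reads with no reference position go last. The sketch below approximates this with a sort key over a hypothetical Read tuple. Picard breaks ties using additional fields, which this sketch ignores (Python's stable sort keeps ties in input order).

```python
from typing import NamedTuple, Optional


class Read(NamedTuple):
    name: str
    ref_index: Optional[int]  # index of the contig in the @SQ header order; None if unplaced
    pos: int  # 1-based leftmost alignment position (0 when unplaced)


def coordinate_key(read: Read):
    """Approximate coordinate order: contig header order, then position.

    Reads with no reference (e.g. fully unmapped pairs) sort last. An unmapped
    read whose mate is mapped normally carries the mate's contig and position,
    so it sorts next to its mate.
    """
    if read.ref_index is None:
        return (1, 0, 0)
    return (0, read.ref_index, read.pos)
```

Contig order follows the header, not alphabetical order. A BAI index requires coordinate order, which is presumably why sorted_bam_index is optional (File?).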

Outputs

  • sorted_bam (File)
  • sorted_bam_index (File?)
  • sorted_bam_md5 (File)

merge_sam_files

description
Merges the input BAM files into a single BAM file. All input BAMs are assumed to be sorted according to sort_order.
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360057440751-MergeSamFiles-Picard-
outputs
  • merged_bam: The BAM resulting from merging all the input BAMs
  • merged_bam_index: The .bai BAM index file associated with merged_bam
  • merged_bam_md5: The md5sum of merged_bam

Inputs

Required

  • _runtime (Any, required)
  • bams (Array[File], required): Input BAMs to merge. All BAMs are assumed to be sorted according to sort_order.
  • prefix (String, required): Prefix for the merged BAM file and accessory files. The extensions .bam, .bam.bai, and .bam.md5 will be added.

Defaults

  • memory_gb (Int, default=40): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • sort_order (String, default="coordinate"); description: Sort order for the output merged BAM. It is assumed all input BAMs share this order.; choices: ['unsorted', 'queryname', 'coordinate', 'duplicate']; common: true
  • threading (Boolean, default=true): Create a background thread to encode, compress, and write the output file to disk. The threaded version uses about 20% more CPU and reduces runtime by about 20% when writing a compressed BAM file. If true, sets runtime.cpu = 2; if false, runtime.cpu = 1.
  • validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
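
Because every input is assumed to already follow sort_order, merging can be done conceptually as a streaming k-way merge rather than a full re-sort. The sketch below uses Python's heapq.merge and assumes records are hypothetical (ref_index, pos, name) tuples in coordinate order. It also shows why the sortedness assumption matters: an unsorted input produces unsorted output.

```python
import heapq


def merge_sorted_bams(bams):
    """K-way merge of per-file record lists that are each coordinate sorted."""
    return list(heapq.merge(*bams, key=lambda rec: (rec[0], rec[1])))


def is_sorted(records):
    """True if records are in non-decreasing (ref_index, pos) order."""
    keys = [(rec[0], rec[1]) for rec in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```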

Outputs

  • merged_bam (File)
  • merged_bam_index (File)
  • merged_bam_md5 (File)

clean_sam

description
Cleans the input BAM file: soft-clips alignments that extend beyond the end of the reference and sets MAPQ=0 for unmapped reads.
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360036885571-CleanSam-Picard-
outputs
  • cleaned_bam: A cleaned version of the input BAM
  • cleaned_bam_index: The .bai BAM index file associated with cleaned_bam
  • cleaned_bam_md5: The md5sum of cleaned_bam

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to clean

Defaults

  • memory_gb (Int, default=25): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • prefix (String, default=basename(bam,".bam") + ".cleaned"): Prefix for the cleaned BAM file and accessory files. The extensions .bam, .bam.bai, and .bam.md5 will be added.
  • validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

  • cleaned_bam (File)
  • cleaned_bam_index (File)
  • cleaned_bam_md5 (File)
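
The first CleanSam fix applies to alignments that run past the end of a reference sequence. The overhanging bases become a soft clip, so the alignment ends at the last reference base. The sketch below works on a CIGAR given as (length, op) tuples and includes the MAPQ rule for unmapped reads (FLAG 0x4). The function names are hypothetical, and the sketch assumes the alignment starts within the reference. Picard's handling of edge cases may differ.

```python
# CIGAR operations that consume reference / read bases (SAM specification).
REF_CONSUMING = set("MDN=X")
READ_CONSUMING = set("MIS=X")
UNMAPPED = 0x4  # FLAG bit for an unmapped read


def soft_clip_overhang(start, cigar, ref_length):
    """Soft-clip the part of an alignment that extends past the reference end.

    start: 1-based alignment start, assumed to lie within the reference
    cigar: list of (length, op) tuples, e.g. [(10, "M")]
    ref_length: length of the reference sequence the read is aligned to
    """
    new_cigar = []
    ref_pos = start  # next reference position the alignment would consume
    clipped = 0  # read bases that will become a trailing soft clip
    trailing_hard_clip = None
    past_end = False
    for length, op in cigar:
        if past_end:
            if op in READ_CONSUMING:
                clipped += length  # read bases past the end join the soft clip
            elif op == "H":
                trailing_hard_clip = (length, op)
            continue  # reference-only operations past the end are dropped
        if op in REF_CONSUMING and ref_pos + length - 1 > ref_length:
            keep = ref_length - ref_pos + 1  # portion still on the reference
            if keep > 0:
                new_cigar.append((keep, op))
            if op in READ_CONSUMING:
                clipped += length - keep
            past_end = True
            continue
        new_cigar.append((length, op))
        if op in REF_CONSUMING:
            ref_pos += length
    if past_end:
        # A deletion or skip cannot end an alignment, so drop any left dangling.
        while new_cigar and new_cigar[-1][1] in ("D", "N"):
            new_cigar.pop()
    if clipped:
        new_cigar.append((clipped, "S"))
    if trailing_hard_clip:
        new_cigar.append(trailing_hard_clip)
    return new_cigar


def clean_mapq(flag, mapq):
    """Unmapped reads get MAPQ 0; mapped reads keep their MAPQ."""
    return 0 if flag & UNMAPPED else mapq
```

For example, a 10M read starting at position 95 of a 100 bp contig keeps 6 aligned bases, and the remaining 4 become a soft clip.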

collect_wgs_metrics

description
Runs picard CollectWgsMetrics to collect metrics about the fractions of reads that pass base- and mapping-quality filters as well as coverage (read-depth) levels
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360037226132-CollectWgsMetrics-Picard-
outputs
  • wgs_metrics: Output report of picard CollectWgsMetrics; external_help: https://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectWgsMetrics.WgsMetrics

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file for which to calculate WGS metrics
  • reference_fasta (File, required): Gzipped reference genome in FASTA format

Defaults

  • memory_gb (Int, default=12): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • outfile_name (String, default=basename(bam,".bam") + ".CollectWgsMetrics.txt"): Name for the metrics result file
  • validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

  • wgs_metrics (File)
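
Coverage metrics such as mean depth and the fraction of positions at or above a depth threshold come down to simple arithmetic over per-position depths. Picard reports these in columns like MEAN_COVERAGE and PCT_10X. The sketch below assumes depths have already been computed after base- and mapping-quality filtering (the depths input is hypothetical). Picard applies further rules, such as a coverage cap, that this sketch omits.

```python
def coverage_summary(depths, thresholds=(1, 10, 20, 30)):
    """Return mean depth and the fraction of positions at or above each threshold.

    depths: hypothetical per-position depths, already filtered for base and
    mapping quality (Picard derives these from the BAM itself).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    mean_depth = sum(depths) / n
    fractions = {t: sum(1 for d in depths if d >= t) / n for t in thresholds}
    return mean_depth, fractions
```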

collect_alignment_summary_metrics

description
Runs picard CollectAlignmentSummaryMetrics to calculate metrics detailing the quality of the read alignments as well as the proportion of the reads that passed machine signal-to-noise threshold quality filters
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360040507751-CollectAlignmentSummaryMetrics-Picard-
outputs
  • alignment_metrics: The text file output of CollectAlignmentSummaryMetrics; external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics
  • alignment_metrics_pdf: The PDF file output of CollectAlignmentSummaryMetrics

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file for which to calculate alignment metrics

Defaults

  • memory_gb (Int, default=8): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • prefix (String, default=basename(bam,".bam") + ".CollectAlignmentSummaryMetrics"): Prefix for the output report files. The extensions .txt and .pdf will be added.
  • validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

  • alignment_metrics (File)
  • alignment_metrics_pdf (File)
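
The pass-filter (PF) and alignment fractions can be approximated from SAM FLAG bits. 0x200 marks reads that failed platform/vendor quality checks, and 0x4 marks unmapped reads. Secondary (0x100) and supplementary (0x800) records are excluded so that each read counts once. This sketch only roughly mirrors Picard's PCT_PF_READS and PCT_PF_READS_ALIGNED; the metric definitions linked above give the exact rules.

```python
# SAM FLAG bits (SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
SUPPLEMENTARY = 0x800


def pf_and_aligned_fractions(flags):
    """Approximate (PCT_PF_READS, PCT_PF_READS_ALIGNED) from FLAG values."""
    primary = [f for f in flags if not f & (SECONDARY | SUPPLEMENTARY)]
    pf = [f for f in primary if not f & QC_FAIL]
    pf_aligned = [f for f in pf if not f & UNMAPPED]
    pct_pf = len(pf) / len(primary) if primary else 0.0
    pct_pf_aligned = len(pf_aligned) / len(pf) if pf else 0.0
    return pct_pf, pct_pf_aligned
```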

collect_gc_bias_metrics

description
Runs picard CollectGcBiasMetrics to collect information about the relative proportions of guanine (G) and cytosine (C) nucleotides in the sample and how read coverage varies with GC content (GC bias)
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360037593931-CollectGcBiasMetrics-Picard-
outputs
  • gc_bias_metrics: The full text file output of CollectGcBiasMetrics; external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasDetailMetrics
  • gc_bias_metrics_summary: The summary text file output of CollectGcBiasMetrics; external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasSummaryMetrics
  • gc_bias_metrics_pdf: The PDF file output of CollectGcBiasMetrics

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file for which to calculate GC bias metrics
  • reference_fasta (File, required): Reference sequences in FASTA format

Defaults

  • memory_gb (Int, default=8): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • prefix (String, default=basename(bam,".bam") + ".CollectGcBiasMetrics"): Prefix for the output report files. The extensions .txt, .summary.txt, and .pdf will be added.
  • validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

  • gc_bias_metrics (File)
  • gc_bias_metrics_summary (File)
  • gc_bias_metrics_pdf (File)
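
GC bias is typically assessed by binning reference windows by GC content and comparing coverage in each bin to the overall mean. A value near 1.0 means no bias for that bin. The sketch below is simplified. It assumes equal-size windows whose read counts are already known (the read_counts input is hypothetical), and it ignores N bases. Picard's own window size and normalization details may differ.

```python
from collections import defaultdict


def gc_percent(window):
    """Integer GC percentage of a window, ignoring N and other non-ACGT bases."""
    bases = [b for b in window.upper() if b in "ACGT"]
    if not bases:
        return None
    gc = sum(1 for b in bases if b in "GC")
    return 100 * gc // len(bases)


def normalized_coverage(windows, read_counts):
    """Mean reads per window in each GC bin, divided by the overall mean.

    windows: reference sequence windows of equal size
    read_counts: hypothetical number of reads assigned to each window
    """
    bin_windows = defaultdict(int)
    bin_reads = defaultdict(int)
    for window, reads in zip(windows, read_counts):
        gc = gc_percent(window)
        if gc is None:
            continue  # window contains no A/C/G/T bases
        bin_windows[gc] += 1
        bin_reads[gc] += reads
    overall_mean = sum(bin_reads.values()) / sum(bin_windows.values())
    return {gc: (bin_reads[gc] / bin_windows[gc]) / overall_mean for gc in bin_windows}
```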

collect_insert_size_metrics

description
Runs picard CollectInsertSizeMetrics to collect metrics for validating library construction including the insert size distribution and read orientation of Paired-End libraries
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360037055772-CollectInsertSizeMetrics-Picard-
outputs
  • insert_size_metrics: The text file output of CollectInsertSizeMetrics; external_help: http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics
  • insert_size_metrics_pdf: The PDF file output of CollectInsertSizeMetrics

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file for which to calculate insert size metrics

Defaults

  • memory_gb (Int, default=8): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • prefix (String, default=basename(bam,".bam") + ".CollectInsertSizeMetrics"): Prefix for the output report files. The extensions .txt and .pdf will be added.
  • validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

  • insert_size_metrics (File)
  • insert_size_metrics_pdf (File)
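
The headline insert-size statistics are the median fragment length and its median absolute deviation (MAD). The sketch below computes both from SAM TLEN values and assumes all pairs share one orientation. Only positive TLEN values are counted, so each pair contributes once. Picard additionally separates pairs by orientation (FR, RF, TANDEM) and may trim outliers, which this sketch omits.

```python
import statistics


def insert_size_summary(template_lengths):
    """Median insert size and median absolute deviation from TLEN values."""
    # Each pair appears twice (+TLEN and -TLEN); keep the positive copy only.
    sizes = [t for t in template_lengths if t > 0]
    median = statistics.median(sizes)
    mad = statistics.median(abs(s - median) for s in sizes)
    return median, mad
```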

quality_score_distribution

description
Runs picard QualityScoreDistribution to calculate the distribution of quality scores and create an accompanying chart
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360037057312-QualityScoreDistribution-Picard-
outputs
  • quality_score_distribution_txt: The text file output of QualityScoreDistribution
  • quality_score_distribution_pdf: The PDF file output of QualityScoreDistribution

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file for which to calculate quality score distribution

Defaults

  • memory_gb (Int, default=8): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • prefix (String, default=basename(bam,".bam") + ".QualityScoreDistribution"): Prefix for the output report files. The extensions .txt and .pdf will be added.
  • validation_stringency (String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT

Outputs

  • quality_score_distribution_txt (File)
  • quality_score_distribution_pdf (File)
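
A quality score distribution is a histogram of Phred scores over all bases. In SAM text and FASTQ, qualities are encoded as ASCII characters offset by 33, so each character decodes as ord(c) - 33. BAM stores the raw Phred values instead. The sketch below works on hypothetical Phred+33 quality strings.

```python
from collections import Counter


def quality_distribution(quality_strings, offset=33):
    """Count occurrences of each Phred score across Phred+33 quality strings."""
    counts = Counter()
    for qual in quality_strings:
        counts.update(ord(ch) - offset for ch in qual)
    return dict(sorted(counts.items()))
```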

bam_to_fastq

description
[Deprecated] This WDL task converts the input BAM file into FASTQ format files. This task has been deprecated in favor of samtools.bam_to_fastq which is more performant and doesn't error on 'illegal mate states'.
deprecated
true

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to convert to FASTQ

Defaults

  • memory_gb (Int, default=56): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • paired (Boolean, default=true); description: Is the data Paired-End (true) or Single-End (false)?; common: true
  • prefix (String, default=basename(bam,".bam")): Prefix for the output FASTQ file(s). The appropriate FASTQ extension will be added.

Outputs

  • read_one_fastq_gz (File)
  • read_two_fastq_gz (File?)
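
Converting aligned records back to FASTQ involves one non-obvious step. Reads flagged as reverse strand (0x10) store their sequence reverse-complemented, so they must be reverse-complemented again, with qualities reversed, to recover the read as sequenced. The sketch below illustrates that step and routes paired reads by the second-of-pair flag (0x80). The function names are hypothetical. Because this task is deprecated, the sketch only illustrates the conversion; samtools.bam_to_fastq remains the recommended task.

```python
# SAM FLAG bits (SAM specification).
READ_PAIRED = 0x1
REVERSE = 0x10
SECOND_OF_PAIR = 0x80

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def to_fastq_record(name, flag, seq, qual):
    """Render one SAM record as FASTQ in its original sequencing orientation."""
    if flag & REVERSE:
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


def mate_file(flag):
    """Return 2 for the second read of a pair, otherwise 1 (incl. single-end)."""
    if flag & READ_PAIRED and flag & SECOND_OF_PAIR:
        return 2
    return 1
```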

merge_vcfs

description
Merges the input VCF files into a single VCF file
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360036713331-MergeVcfs-Picard
outputs
  • merged_vcf: The merged VCF file
  • merged_vcf_index: The index file associated with the merged VCF file

Inputs

Required

  • _runtime (Any, required)
  • output_vcf_name (String, required): Name for the merged VCF file
  • vcfs (Array[File], required): Input VCF format files to merge. May be gzipped or binary compressed.
  • vcfs_indexes (Array[File], required): Index files associated with the input VCF files

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

  • merged_vcf (File)
  • merged_vcf_index (File)

scatter_interval_list

description
Splits an interval list into smaller interval lists for parallel processing
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/360036897212-IntervalListTools-Picard
outputs
  • interval_lists_scatter: The split interval lists
  • interval_count: The number of split interval lists

Inputs

Required

  • _runtime (Any, required)
  • interval_list (File, required): Input interval list to split
  • scatter_count (Int, required): Number of interval lists to create

Defaults

  • sort (Boolean, default=true): Should the output interval lists be sorted? Sorts by coordinate.
  • subdivision_mode (String, default="BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW"); description: How to subdivide the intervals; choices: ['BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW', 'INTERVAL_SUBDIVISION', 'BALANCING_WITHOUT_INTERVAL_SUBDIVISION']
  • unique (Boolean, default=true): Should the output interval lists contain unique intervals? Implies sort=true. Merges overlapping or adjacent intervals.
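
With unique=true, intervals are sorted by coordinate and overlapping or adjacent intervals are merged. The merge step below works on 1-based, closed intervals ordered by a given contig list. It is followed by a deliberately simplified greedy split into scatter_count groups of whole intervals. Both function names are hypothetical. The split only illustrates balancing without subdivision; it is not the algorithm IntervalListTools uses for any subdivision_mode choice.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, 1-based start, inclusive end)


def unique_intervals(intervals, contig_order):
    """Sort by coordinate and merge overlapping or adjacent intervals."""
    rank = {name: i for i, name in enumerate(contig_order)}
    merged: List[Interval] = []
    for contig, start, end in sorted(intervals, key=lambda iv: (rank[iv[0]], iv[1], iv[2])):
        # Adjacent means the next interval starts right after the previous end.
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (contig, prev_start, max(prev_end, end))
        else:
            merged.append((contig, start, end))
    return merged


def split_without_subdivision(intervals, scatter_count):
    """Greedy split into at most scatter_count groups of whole intervals."""
    total = sum(end - start + 1 for _, start, end in intervals)
    target = total / scatter_count
    groups, current, current_size = [], [], 0
    for iv in intervals:
        current.append(iv)
        current_size += iv[2] - iv[1] + 1
        # Close a group once it reaches the target, leaving room for the last one.
        if current_size >= target and len(groups) < scatter_count - 1:
            groups.append(current)
            current, current_size = [], 0
    if current:
        groups.append(current)
    return groups
```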

Outputs

  • interval_lists_scatter (Array[File])
  • interval_count (Int)

create_sequence_dictionary

description
Creates a sequence dictionary for the input FASTA file using Picard
external_help
https://gatk.broadinstitute.org/hc/en-us/articles/13832748622491-CreateSequenceDictionary-Picard-
outputs
  • dictionary: Sequence dictionary produced by picard CreateSequenceDictionary.

Inputs

Required

  • _runtime (Any, required)
  • fasta (File, required): Input FASTA format file from which to create dictionary

Optional

  • assembly_name (String?): Value to put in AS field of sequence dictionary
  • fasta_url (String?): Value to put in UR field of sequence dictionary
  • species (String?): Value to put in SP field of sequence dictionary
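
Each entry of a sequence dictionary is an @SQ header line. The optional assembly_name, fasta_url, and species inputs map to the AS, UR, and SP fields. The sketch below shows how one such line could be assembled. M5 follows the SAM specification's checksum definition; for ordinary FASTA input, that means removing line breaks and uppercasing before hashing. The function name sq_line and the field order are illustrative assumptions, not a byte-for-byte reproduction of Picard's output.

```python
import hashlib


def sq_line(name, sequence, assembly=None, url=None, species=None):
    """Build one tab-separated @SQ line with SN, LN, M5 and optional AS/UR/SP."""
    normalized = "".join(sequence.split()).upper()  # drop whitespace, uppercase
    fields = [
        "@SQ",
        f"SN:{name}",
        f"LN:{len(normalized)}",
        f"M5:{hashlib.md5(normalized.encode('ascii')).hexdigest()}",
    ]
    for tag, value in (("AS", assembly), ("UR", url), ("SP", species)):
        if value is not None:
            fields.append(f"{tag}:{value}")
    return "\t".join(fields)
```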

Defaults

  • memory_gb (Int, default=16): RAM to allocate for task, specified in GB
  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
  • outfile_name (String, default=basename(fasta,".fa") + ".dict"): Name for the CreateSequenceDictionary dictionary file

Outputs

  • dictionary (File)