split_n_cigar_reads
- description
- Splits reads that contain Ns in their CIGAR strings into multiple reads.
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360036858811-SplitNCigarReads
- outputs
- {'split_n_reads_bam': 'BAM file with reads split at N CIGAR elements and updated CIGAR strings.', 'split_n_reads_bam_index': 'Index file for the split BAM', 'split_n_reads_bam_md5': 'MD5 checksum for the split BAM'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file with unsplit reads containing Ns in their CIGAR strings.
bam_index
(File, required): BAM index file corresponding to the input BAM
dict
(File, required): Dictionary file for FASTA format genome
fasta
(File, required): Reference genome in FASTA format. Must be uncompressed.
fasta_index
(File, required): Index for FASTA format genome
Defaults
memory_gb
(Int, default=25): RAM to allocate for task, specified in GB
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=8): Number of cores to allocate for task
prefix
(String, default=basename(bam,".bam") + ".split"): Prefix for the BAM file. The extension .bam will be added.
Outputs
split_n_reads_bam
(File)
split_n_reads_bam_index
(File)
split_n_reads_bam_md5
(File)
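The default prefix expression for split_n_cigar_reads, basename(bam,".bam") + ".split", can be traced by hand. The Python sketch below is illustrative only: wdl_basename and default_split_outputs are hypothetical helpers that imitate the WDL basename() function, and the input path is made up. It covers only the BAM name, since the names of the index and MD5 outputs are not stated here.

```python
import posixpath


def wdl_basename(path: str, suffix: str = "") -> str:
    """Imitate WDL basename(): drop the directory, then strip `suffix` if present."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def default_split_outputs(bam: str) -> dict:
    """Derive the default prefix and the resulting BAM file name."""
    prefix = wdl_basename(bam, ".bam") + ".split"
    # The task documentation says the extension .bam is appended to the prefix.
    return {"prefix": prefix, "split_n_reads_bam": prefix + ".bam"}


print(default_split_outputs("/data/sample1.bam"))
```

With the hypothetical input above, the default prefix is sample1.split and the split BAM is named sample1.split.bam.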
base_recalibrator
- description
- Generates recalibration report for base quality score recalibration.
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360036897372-BaseRecalibratorSpark-BETA
- outputs
- {'recalibration_report': 'Recalibration report file'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file on which to recalibrate base quality scores
bam_index
(File, required): BAM index file corresponding to the input BAM
dbSNP_vcf
(File, required): dbSNP VCF file
dbSNP_vcf_index
(File, required): dbSNP VCF index file
dict
(File, required): Dictionary file for FASTA format genome
fasta
(File, required): Reference genome in FASTA format
fasta_index
(File, required): Index for FASTA format genome
known_indels_sites_indices
(Array[File], required): List of VCF index files corresponding to the VCF files in known_indels_sites_vcfs
known_indels_sites_vcfs
(Array[File], required): List of VCF files containing known indels
Defaults
memory_gb
(Int, default=25): RAM to allocate for task, specified in GB
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=4): Number of cores to allocate for task
outfile_name
(String, default=basename(bam,".bam") + ".recal.txt"): Name for the output recalibration report.
use_original_quality_scores
(Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores.
Outputs
recalibration_report
(File)
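Because known_indels_sites_indices must correspond to known_indels_sites_vcfs, a quick pre-flight check on the two arrays can catch mismatches before submitting a run. The sketch below is an assumption-laden illustration, not part of the task: check_known_indels is a hypothetical helper, the file names are invented, and it assumes each index is named by appending an extension (such as .tbi or .idx) to its VCF's file name.

```python
from typing import Sequence


def check_known_indels(vcfs: Sequence[str], indices: Sequence[str]) -> None:
    """Raise ValueError if the VCF and index arrays do not line up."""
    if len(vcfs) != len(indices):
        raise ValueError(
            f"{len(vcfs)} VCF files but {len(indices)} index files were given"
        )
    for vcf, index in zip(vcfs, indices):
        # Assumption: the index file name is the VCF file name plus an extension.
        if not (index.startswith(vcf) and len(index) > len(vcf)):
            raise ValueError(f"{index!r} does not look like an index for {vcf!r}")


check_known_indels(
    ["mills.vcf.gz", "known_indels.vcf.gz"],
    ["mills.vcf.gz.tbi", "known_indels.vcf.gz.tbi"],
)
```

The check is order-sensitive on purpose, since the arrays are described as corresponding to each other.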
apply_bqsr
- description
- Applies base quality score recalibration to a BAM file.
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360040097972-ApplyBQSRSpark-BETA
- outputs
- {'recalibrated_bam': 'Recalibrated BAM file', 'recalibrated_bam_index': 'Index file for the recalibrated BAM'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file on which to apply base quality score recalibration
bam_index
(File, required): BAM index file corresponding to the input BAM
recalibration_report
(File, required): Recalibration report file
Defaults
memory_gb
(Int, default=25): RAM to allocate for task, specified in GB
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=4): Number of cores to allocate for task
prefix
(String, default=basename(bam,".bam")): Prefix for the output recalibrated BAM. The extension .bqsr.bam will be added.
use_original_quality_scores
(Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores.
Outputs
recalibrated_bam
(File)
recalibrated_bam_index
(File)
haplotype_caller
- description
- Calls germline SNPs and indels via local re-assembly of haplotypes.
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller
- outputs
- {'vcf': 'VCF file containing called variants', 'vcf_index': 'Index file for the VCF'}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file on which to call variants
bam_index
(File, required): BAM index file corresponding to the input BAM
dbSNP_vcf
(File, required): dbSNP VCF file
dbSNP_vcf_index
(File, required): dbSNP VCF index file
dict
(File, required): Dictionary file for FASTA format genome
fasta
(File, required): Reference genome in FASTA format
fasta_index
(File, required): Index for FASTA format genome
interval_list
(File, required); description: Interval list indicating regions in which to call variants; external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
Defaults
memory_gb
(Int, default=25): RAM to allocate for task, specified in GB
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=4): Number of cores to allocate for task
prefix
(String, default=basename(bam,".bam")): Prefix for the output VCF. The extension .vcf.gz will be added.
stand_call_conf
(Int, default=20); description: Minimum confidence threshold for calling variants; external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller#--standard-min-confidence-threshold-for-calling
use_soft_clipped_bases
(Boolean, default=false): Use soft clipped bases in variant calling. Default is to ignore soft clipped bases.
Outputs
vcf
(File)
vcf_index
(File)
variant_filtration
- description
- Filters variants based on specified criteria.
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration
- outputs
- {'vcf_filtered': 'Filtered VCF file', 'vcf_filtered_index': 'Index file for the filtered VCF'}
Inputs
Required
_runtime
(Any, required)
dict
(File, required): Dictionary file for FASTA format genome
fasta
(File, required): Reference genome in FASTA format
fasta_index
(File, required): Index for FASTA format genome
vcf
(File, required): Input VCF format file to filter
vcf_index
(File, required): VCF index file corresponding to the input VCF
Defaults
cluster
(Int, default=3): Number of SNPs that must be present in a window to filter
filter_expressions
(Array[String], default=["FS > 30.0", "QD < 2.0"]); description: Expressions for the filters; external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-expression
filter_names
(Array[String], default=["FS", "QD"]); description: Names of the filters to apply; external_help: https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-name
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
ncpu
(Int, default=1): Number of cores to allocate for task
prefix
(String, default=basename(vcf,".vcf.gz")): Prefix for the output filtered VCF. The extension .filtered.vcf.gz will be added.
window
(Int, default=35): Size of the window (in bases) for filtering
Outputs
vcf_filtered
(File)
vcf_filtered_index
(File)
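The filter_names and filter_expressions arrays of variant_filtration are read as pairs: the first name labels the first expression, and so on. The Python sketch below shows one plausible way the pairs could map onto the GATK --filter-name and --filter-expression arguments linked above. The helper filter_args is hypothetical, and how the task actually assembles its command line is an assumption.

```python
from typing import List, Sequence


def filter_args(names: Sequence[str], expressions: Sequence[str]) -> List[str]:
    """Pair each filter name with its expression as GATK-style arguments."""
    if len(names) != len(expressions):
        raise ValueError("filter_names and filter_expressions must be the same length")
    args: List[str] = []
    for name, expression in zip(names, expressions):
        args += ["--filter-name", name, "--filter-expression", expression]
    return args


# Task defaults from the table above.
print(filter_args(["FS", "QD"], ["FS > 30.0", "QD < 2.0"]))
```

When overriding one array, override the other as well so that the two stay the same length and in the same order.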
mark_duplicates_spark
- description
- Marks duplicate reads in the input BAM file using GATK's Spark implementation of Picard's MarkDuplicates.
- external_help
- https://gatk.broadinstitute.org/hc/en-us/articles/13832682540699-MarkDuplicatesSpark
- outputs
- {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'duplicate_marked_bam_index': 'The .bai BAM index file associated with duplicate_marked_bam', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file in which to mark duplicates
Defaults
create_bam
(Boolean, default=true); description: Enable BAM creation (true)? Or only output MarkDuplicates metrics (false)?; common: true
duplicate_scoring_strategy
(String, default="SUM_OF_BASE_QUALITIES"); description: Strategy for scoring duplicates.; choices: ['SUM_OF_BASE_QUALITIES', 'TOTAL_MAPPED_REFERENCE_LENGTH', 'RANDOM']
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb
(Int, default=0): Add to or subtract from the default memory allocation. Default memory allocation is determined by the size of the input BAM. Specified in GB.
ncpu
(Int, default=4): Number of cores to allocate for task
optical_distance
(Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0, then optical duplicate marking is disabled. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_name_regex.
prefix
(String, default=basename(bam,".bam") + ".MarkDuplicates"): Prefix for the MarkDuplicates result files. The extensions .bam, .bam.bai, and .metrics.txt will be added.
read_name_regex
(String, default="^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$"): Regular expression for extracting tile names, x coordinates, and y coordinates from read names. The default works for typical Illumina read names.
tagging_policy
(String, default="All"); description: Tagging policy for the output BAM.; choices: ['DontTag', 'OpticalOnly', 'All']
validation_stringency
(String, default="SILENT"); description: Validation stringency for parsing the input BAM.; choices: ['STRICT', 'LENIENT', 'SILENT']; tool_default: STRICT
Outputs
duplicate_marked_bam
(File?)
duplicate_marked_bam_index
(File?)
mark_duplicates_metrics
(File)
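To see what the default read_name_regex of mark_duplicates_spark captures, it can be tried on a read name outside the pipeline. The Python sketch below applies the default pattern with the standard re module. The read names are invented examples, parse_read_name is a hypothetical helper, and GATK evaluates the pattern with its own regex engine, so treat this as an approximation for checking whether custom read names would match.

```python
import re

# Default read_name_regex from the task documentation.
READ_NAME_REGEX = r"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$"


def parse_read_name(name, pattern=READ_NAME_REGEX):
    """Return (tile, x, y) if the read name matches, otherwise None."""
    match = re.match(pattern, name)
    if match is None:
        return None
    tile, x, y = match.groups()
    return tile, int(x), int(y)


# Hypothetical Illumina-style read name: the last three fields are tile, x, y.
print(parse_read_name("A00123:45:HXXXXXXXX:1:1101:1000:2000"))
# A name without colon-separated coordinates does not match.
print(parse_read_name("SRR1234567.1"))
```

The first call yields the tile 1101 with coordinates 1000 and 2000; the second returns None, which is the situation where optical duplicate detection would need a custom read_name_regex.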