arriba
- description
- Run the Arriba structural variant caller on an RNA-Seq BAM file.
- help
- Typical input is a STAR-aligned BAM. Arriba also supports DRAGEN-aligned BAMs and any spec-compliant BAM that meets these requirements: discordant mates must have BAM_FPROPER_PAIR (0x2), split reads must have BAM_FSUPPLEMENTARY (0x800), and the anchor read must have an SA tag. Arriba also uses the HI tag to group supplementary alignments.
- outputs
- {'fusions': 'Output file of fusions in TSV format', 'discarded_fusions': 'Output file of discarded fusions in TSV format'}
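The requirements above are expressed as FLAG bits and optional tags on each alignment record. The minimal sketch below, in plain Python with no BAM library, shows how those bits and tags are tested. It represents a record only by its integer FLAG and a dict of tags, which is a hypothetical stand-in for a parsed BAM record, and it does not decide how Arriba classifies the read.

```python
# Flag bits from the SAM specification, named as in htslib.
BAM_FPAIRED = 0x1
BAM_FPROPER_PAIR = 0x2
BAM_FREAD2 = 0x80
BAM_FSUPPLEMENTARY = 0x800


def describe_alignment(flag: int, tags: dict) -> dict:
    """Report which of the flags and tags named in the help text are present.

    `flag` is the integer FLAG field of a SAM/BAM record and `tags` maps
    optional-field names (for example "SA" or "HI") to their values.
    """
    return {
        "proper_pair": bool(flag & BAM_FPROPER_PAIR),
        "supplementary": bool(flag & BAM_FSUPPLEMENTARY),
        "has_SA": "SA" in tags,
        "HI": tags.get("HI"),  # None when the tag is absent
    }


# Made-up supplementary record: paired, second in pair, not a proper pair.
record = describe_alignment(
    BAM_FPAIRED | BAM_FREAD2 | BAM_FSUPPLEMENTARY,
    {"SA": "1,1000,+,50M50S,60,0;", "HI": 1},
)
print(record)
# {'proper_pair': False, 'supplementary': True, 'has_SA': True, 'HI': 1}
```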
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file from which to call fusions
gtf
(File, required): GTF features file. Gzipped or uncompressed.
reference_fasta_gz
(File, required): Gzipped reference genome in FASTA format
Optional
annotate_fusions
(File?); description: Optional input file in tab-delimited format of fusions to annotate with tags; external_help: https://arriba.readthedocs.io/en/v2.4.0/input-files/#tags
chimeric_sam
(File?): Optional input file of chimeric reads in SAM format, from older versions of STAR
exclude_list
(File?); description: Optional input file of regions to exclude from analysis in tab-delimited format; external_help: https://arriba.readthedocs.io/en/v2.4.0/input-files/#blacklist
known_fusions
(File?); description: Optional input file of known fusions in tab-delimited format; external_help: https://arriba.readthedocs.io/en/v2.4.0/input-files/#known-fusions
protein_domains
(File?); description: Optional input file of protein domain coordinates in GFF3 format; external_help: https://arriba.readthedocs.io/en/v2.4.0/input-files/#protein-domains
wgs_svs
(File?); description: Optional input file of structural variants found by WGS in tab-delimited or VCF format; external_help: https://arriba.readthedocs.io/en/v2.4.0/input-files/#structural-variant-calls-from-wgs
Defaults
coverage_fraction
(Float, default=0.05): Minimum fraction of viral contig transcription.
disable_filters
(Array[String], default=[]); description: Array of filters to disable.; choices: ['top_expressed_viral_contigs', 'viral_contigs', 'low_coverage_viral_contigs', 'uninteresting_contigs', 'no_genomic_support', 'short_anchor', 'select_best', 'many_spliced', 'long_gap', 'merge_adjacent', 'hairpin', 'small_insert_size', 'same_gene', 'genomic_support', 'read_through', 'no_coverage', 'mismatches', 'homopolymer', 'low_entropy', 'multimappers', 'inconsistently_clipped', 'duplicates', 'homologs', 'blacklist', 'mismappers', 'spliced', 'relative_support', 'min_support', 'known_fusions', 'end_to_end', 'non_coding_neighbors', 'isoforms', 'intronic', 'in_vitro', 'intragenic_exonic', 'internal_tandem_duplication']
exonic_fraction
(Float, default=0.33): Minimum fraction of exonic sequence between breakpoints.
feature_name
(String, default="gene_name=gene_name|gene_id,gene_id=gene_id,transcript_id=transcript_id,feature_exon=exon,feature_CDS=CDS"): The Arriba default is designed to handle RefSeq, GENCODE, or ENSEMBL format annotations. feature_name expects a string of space- or comma-separated options. The required fields are gene_name, gene_id, transcript_id, feature_exon, and feature_CDS. Each field is given as field=value. Multiple values can be provided, separated by a pipe (|), e.g. field=value1|value2. A complete example is gene_name=gene_name|gene_id gene_id=gene_id transcript_id=transcript_id feature_exon=exon feature_CDS=CDS.; description: List of feature names to use in GTF.; external_help: https://arriba.readthedocs.io/en/v2.4.0/command-line-options/; common: false
fill_gaps
(Boolean, default=false): Fill gaps in assembled transcripts with reference bases. This expands the fusion sequence to the complete sequence of the fusion gene.
fragment_length
(Int, default=200): For single-end data, this is the fragment length. With paired-end reads, this value is ignored and the fragment length is determined automatically.
homopolymer_length
(Int, default=6): Maximum homopolymer length adjacent to breakpoints.
interesting_contigs
(Array[String], default=["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "X", "Y", "AC_", "NC_"]): Array of contigs to consider for analysis. Contigs can be specified with or without the prefix chr.
many_spliced_events
(Int, default=4): Recover fusions with at least this many spliced breakpoints.
mark_duplicates
(Boolean, default=true): By default, Arriba marks duplicates internally based on identical mapping coordinates. When this option is set to false, internal marking of duplicates is disabled and Arriba assumes that duplicates have been marked by a preceding program. In this case, Arriba only discards alignments flagged with the BAM_FDUP flag. This makes sense when duplicates cannot be reliably identified solely based on their mapping coordinates, e.g. when unique molecular identifiers (UMIs) are used or when independently generated libraries are merged in a single BAM file and the read group must be interrogated to distinguish duplicates from reads that map to the same coordinates by chance. In addition, when internal marking is disabled, duplicate reads are not considered for the calculation of the coverage at fusion breakpoints (columns coverage1 and coverage2 in the output file).; description: Mark duplicates in the input BAM file with Arriba.
max_e_value
(Float, default=0.3): Maximum E-value for read support.
max_genomic_breakpoint_distance
(Int, default=1000000): With 'wgs_svs', threshold for relating genomic and transcriptomic events.
max_homolog_identity
(Float, default=0.3): Maximum fraction of homologous sequence for genes.
max_itd_length
(Int, default=100): Maximum length of internal tandem duplications.
max_kmer_content
(Float, default=0.6): Maximum fraction of repetitive 3-mer content in the fusion region.
max_mismappers
(Float, default=0.8): Maximum fraction of mismapped reads in the fusion region.
max_mismatch_pvalue
(Float, default=0.01): Maximum p-value for mismatches in the fusion region.
max_reads
(Int, default=300): Subsample fusions with more than this number of reads.
min_anchor_length
(Int, default=23): Minimum anchor length for split reads.
min_itd_allele_fraction
(Float, default=0.07): Minimum supporting read fraction for internal tandem duplications.
min_itd_supporting_reads
(Int, default=10): Minimum number of supporting reads for internal tandem duplications.
min_supporting_reads
(Int, default=2): Minimum number of supporting reads for a fusion.
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
modify_memory_gb
(Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB.
prefix
(String, default=basename(bam,".bam") + ".fusions"): Prefix for the fusion result files. The extensions .tsv and .discarded.tsv will be added.
quantile
(Float, default=0.998): Genes with expression above the given quantile are eligible for filtering.
read_through_distance
(Int, default=10000): Minimum distance between breakpoints for read-through events.
report_additional_columns
(Boolean, default=false): Report additional columns ['fusion_transcript', 'peptide_sequence', 'read_identifiers'] in the discarded fusions file.
strandedness
(String, default="auto"); description: Strandedness of the input data.; external_help: https://arriba.readthedocs.io/en/v2.4.0/command-line-options/; choices: ['auto', 'yes', 'no', 'reverse']
top_n
(Int, default=5): Only report the top N most highly expressed viral integration sites.
viral_contigs
(Array[String], default=["AC_", "NC_"]): Array of contigs to consider for viral integration site analysis.
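The disable_filters input accepts only the names listed in its choices. The sketch below validates a requested list against those choices and joins it into one value. It assumes the array ends up as a single comma-separated argument to Arriba's -f option; how this task actually renders the array on the command line is an assumption, not taken from the task definition.

```python
# Filter names listed in the `choices` of the disable_filters input.
VALID_FILTERS = frozenset({
    "top_expressed_viral_contigs", "viral_contigs", "low_coverage_viral_contigs",
    "uninteresting_contigs", "no_genomic_support", "short_anchor", "select_best",
    "many_spliced", "long_gap", "merge_adjacent", "hairpin", "small_insert_size",
    "same_gene", "genomic_support", "read_through", "no_coverage", "mismatches",
    "homopolymer", "low_entropy", "multimappers", "inconsistently_clipped",
    "duplicates", "homologs", "blacklist", "mismappers", "spliced",
    "relative_support", "min_support", "known_fusions", "end_to_end",
    "non_coding_neighbors", "isoforms", "intronic", "in_vitro",
    "intragenic_exonic", "internal_tandem_duplication",
})


def filters_argument(disable_filters: list) -> str:
    """Validate filter names and join them into one comma-separated value."""
    unknown = [name for name in disable_filters if name not in VALID_FILTERS]
    if unknown:
        raise ValueError(f"unknown filter(s): {unknown}")
    # Assumption: the array is passed to Arriba's -f option as a single
    # comma-separated value; an empty string means no filter is disabled.
    return ",".join(disable_filters)


print(filters_argument(["read_through", "same_gene"]))  # read_through,same_gene
```

Validating early catches a misspelled filter name before the task is scheduled, rather than at Arriba runtime.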
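The feature_name format described above (options separated by spaces or commas, each written as field=value, alternatives separated by |) can be parsed as in the sketch below. It is an illustration of the documented syntax only, not Arriba's own parser, and the function name parse_feature_names is hypothetical.

```python
import re

# Fields the help text lists as required.
REQUIRED_FIELDS = {"gene_name", "gene_id", "transcript_id", "feature_exon", "feature_CDS"}


def parse_feature_names(spec: str) -> dict:
    """Parse a feature_name string into {field: [accepted GTF names]}.

    Options are separated by spaces or commas, each option is field=value,
    and alternative values for one field are separated by a pipe (|).
    """
    parsed = {}
    for option in re.split(r"[ ,]+", spec.strip()):
        if not option:
            continue  # an empty spec yields one empty token
        field, sep, values = option.partition("=")
        if not sep or not values:
            raise ValueError(f"expected field=value, got {option!r}")
        parsed[field] = values.split("|")
    missing = REQUIRED_FIELDS - set(parsed)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return parsed


default = ("gene_name=gene_name|gene_id,gene_id=gene_id,transcript_id=transcript_id,"
           "feature_exon=exon,feature_CDS=CDS")
print(parse_feature_names(default)["gene_name"])  # ['gene_name', 'gene_id']
```

The comma-separated default and the space-separated complete example from the help text parse to the same mapping.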
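interesting_contigs says contigs may be given with or without the chr prefix. The sketch below illustrates only that rule, by stripping a leading chr on both sides before an exact comparison. The default list also contains the entries AC_ and NC_; how those are matched against contig names is not modelled here, and the helper names are hypothetical.

```python
def normalize_contig(name: str) -> str:
    """Drop a leading 'chr' so that 'chr1' and '1' compare equal."""
    return name[3:] if name.startswith("chr") else name


def is_interesting(contig: str, interesting_contigs: list) -> bool:
    """Exact-name match after chr-prefix normalization on both sides."""
    wanted = {normalize_contig(c) for c in interesting_contigs}
    return normalize_contig(contig) in wanted


autosomes_and_sex = [str(i) for i in range(1, 23)] + ["X", "Y"]
print(is_interesting("chr7", autosomes_and_sex))              # True
print(is_interesting("chrUn_KI270742v1", autosomes_and_sex))  # False
```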
Outputs
fusions
(File)
discarded_fusions
(File)
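The default prefix, basename(bam,".bam") + ".fusions", determines both output names. The sketch below mimics the WDL basename() function as defined in the WDL specification (drop the directories, then remove the suffix only if the name ends with it) and derives the fusions and discarded_fusions file names. The helper names are hypothetical.

```python
import posixpath
from typing import Optional, Tuple


def wdl_basename(path: str, suffix: str = "") -> str:
    """Mimic WDL basename(): drop the directories, then a trailing suffix if present."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def arriba_output_names(bam: str, prefix: Optional[str] = None) -> Tuple[str, str]:
    """Return the (fusions, discarded_fusions) file names of the arriba task."""
    if prefix is None:
        prefix = wdl_basename(bam, ".bam") + ".fusions"
    return prefix + ".tsv", prefix + ".discarded.tsv"


print(arriba_output_names("/data/sample1.bam"))
# ('sample1.fusions.tsv', 'sample1.fusions.discarded.tsv')

# The same rule gives arriba_tsv_to_vcf its default name:
# basename(fusions, ".tsv") followed by the appended ".vcf".
print(wdl_basename("/data/sample1.fusions.tsv", ".tsv") + ".vcf")  # sample1.fusions.vcf
```

A file that does not end with the suffix keeps its full name, so a non-.bam input yields names such as reads.cram.fusions.tsv.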
arriba_tsv_to_vcf
- description
- Convert Arriba TSV format fusions to VCF format.
- outputs
- {'fusions_vcf': 'Output file of fusions in VCF format'}
Inputs
Required
_runtime
(Any, required)
fusions
(File, required): Input fusions in TSV format to convert to VCF
reference_fasta
(File, required): Reference genome in FASTA format. Either gzipped or uncompressed.
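The fusions input is Arriba's TSV output. As an illustration of what that file carries, the sketch below reads the breakpoint coordinates from it. It assumes Arriba's documented layout (a header line starting with #gene1, and breakpoint1/breakpoint2 written as contig:position); it is not how this task performs the VCF conversion, and the sample row is made up.

```python
import csv
from typing import List, Tuple

Breakpoint = Tuple[str, str, str, int, str, int]


def read_breakpoints(tsv_text: str) -> List[Breakpoint]:
    """Return (gene1, gene2, contig1, pos1, contig2, pos2) per fusion row.

    Assumes Arriba's layout: a header line starting with '#gene1' and
    breakpoint1/breakpoint2 values written as 'contig:position'.
    """
    lines = tsv_text.splitlines()
    if not lines:
        return []
    lines[0] = lines[0].lstrip("#")  # so the first column is named 'gene1'
    rows = []
    for row in csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # rsplit keeps contig names that themselves contain ':' intact.
        contig1, pos1 = row["breakpoint1"].rsplit(":", 1)
        contig2, pos2 = row["breakpoint2"].rsplit(":", 1)
        rows.append((row["gene1"], row["gene2"], contig1, int(pos1), contig2, int(pos2)))
    return rows


# Toy input with only the columns this sketch reads; values are made up.
sample = "#gene1\tgene2\tbreakpoint1\tbreakpoint2\nGENE_A\tGENE_B\tchr1:1000\tchr2:2000\n"
print(read_breakpoints(sample))  # [('GENE_A', 'GENE_B', 'chr1', 1000, 'chr2', 2000)]
```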
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix
(String, default=basename(fusions,".tsv")): Output file name for fusions in VCF format. The extension .vcf will be appended.
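Every task here offers modify_disk_size_gb, which is added to a disk request computed from the input sizes. The sketch below shows only how such a signed modifier combines with a dynamic estimate. The multiplier and padding values are hypothetical placeholders; the actual sizing formula used by these tasks is not documented here.

```python
import math
from typing import Iterable


def disk_size_gb(input_sizes_gb: Iterable[float], modify_disk_size_gb: int = 0,
                 multiplier: float = 2.0, padding_gb: int = 10) -> int:
    """Hypothetical dynamic disk request adjusted by modify_disk_size_gb.

    `multiplier` and `padding_gb` are illustrative placeholders, not the
    values these tasks actually use.
    """
    dynamic = math.ceil(sum(input_sizes_gb) * multiplier) + padding_gb
    # The modifier is simply added, so a negative value shrinks the request.
    return dynamic + modify_disk_size_gb


print(disk_size_gb([4.0, 1.0]))                          # 20
print(disk_size_gb([4.0, 1.0], modify_disk_size_gb=-5))  # 15
```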
Outputs
fusions_vcf
(File)
arriba_extract_fusion_supporting_alignments
- description
- Extract alignments that support fusions.
- outputs
- {'fusion_bams': 'Array of BAM files corresponding with fusions in the input file', 'fusion_bam_indexes': "Array of BAM indexes corresponding with the BAMs in the 'fusion_bams'"}
Inputs
Required
_runtime
(Any, required)
bam
(File, required): Input BAM format file from which fusions were called
bam_index
(File, required): BAM index file corresponding to the input BAM
fusions
(File, required): Input fusions in TSV format for which to extract supporting alignments
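Arriba reports the names of the supporting reads of each fusion in the comma-separated read_identifiers column. The sketch below shows the idea of selecting supporting alignments by those names. Alignments are represented as hypothetical (query_name, contig, position) tuples instead of BAM records, treating "." as an empty cell is an assumption, and the mechanism this task actually uses to write the per-fusion BAMs is not described here.

```python
import csv
from typing import List, Sequence, Set, Tuple

Alignment = Tuple[str, str, int]  # hypothetical (query_name, contig, position)


def supporting_read_names(tsv_lines: Sequence[str]) -> List[Set[str]]:
    """Return one set of supporting read names per fusion row.

    Reads the comma-separated 'read_identifiers' column; '.' or an empty
    cell is treated as "no names reported" (an assumption).
    """
    if not tsv_lines:
        return []
    lines = [tsv_lines[0].lstrip("#"), *tsv_lines[1:]]
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        {name for name in row["read_identifiers"].split(",") if name not in ("", ".")}
        for row in reader
    ]


def select_alignments(alignments: Sequence[Alignment], read_names: Set[str]) -> List[Alignment]:
    """Keep alignments whose query name is among the supporting reads."""
    return [aln for aln in alignments if aln[0] in read_names]


fusions = ["#gene1\tgene2\tread_identifiers", "GENE_A\tGENE_B\tread1,read7"]
alignments = [("read1", "chr1", 1000), ("read2", "chr1", 1200), ("read7", "chr2", 2000)]
names = supporting_read_names(fusions)[0]
print(select_alignments(alignments, names))
# [('read1', 'chr1', 1000), ('read7', 'chr2', 2000)]
```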
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix
(String, default=basename(fusions,".tsv")): Output file name prefix for the extracted BAM files. The extension .bam will be appended.
Outputs
fusion_bams
(Array[File])
fusion_bam_indexes
(Array[File])
arriba_annotate_exon_numbers
- description
- Annotate fusions with exon numbers.
- outputs
- {'fusion_tsv': 'TSV file with fusions annotated with exon numbers'}
Inputs
Required
_runtime
(Any, required)
fusions
(File, required): Input fusions in TSV format for which to annotate gene exon numbers
gtf
(File, required): GTF features file. Gzipped or uncompressed.
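Annotating a breakpoint with an exon number amounts to finding the exon of the relevant transcript that contains the breakpoint and numbering exons from 5' to 3'. The sketch below derives that number from exon coordinates of a single transcript. It is a simplified illustration, not this task's method: GTFs such as GENCODE and Ensembl usually carry an exon_number attribute, which the annotation may read directly. The exon coordinates are made up.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def exon_number_at(position: int, exons: List[Tuple[int, int]], strand: str) -> Optional[int]:
    """Return the exon number containing `position`, or None if it is outside every exon.

    `exons` holds (start, end) pairs with inclusive, non-overlapping coordinates
    for a single transcript. Exons are numbered 5' to 3', i.e. ascending by
    coordinate on the '+' strand and descending on the '-' strand.
    """
    ordered = sorted(exons)
    starts = [start for start, _ in ordered]
    i = bisect_right(starts, position) - 1  # last exon starting at or before position
    if i < 0 or position > ordered[i][1]:
        return None
    return i + 1 if strand == "+" else len(ordered) - i


transcript = [(100, 200), (300, 400), (500, 600)]  # made-up exon coordinates
print(exon_number_at(550, transcript, "+"))  # 3
print(exon_number_at(550, transcript, "-"))  # 1
print(exon_number_at(250, transcript, "+"))  # None
```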
Defaults
modify_disk_size_gb
(Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
prefix
(String, default=basename(fusions,".tsv")): Output file name for annotated fusions in TSV format. The extension .annotated.tsv will be appended.
Outputs
fusion_tsv
(File)