Read groups are defined in the SAM spec:

  • ID: Read group identifier. Each Read Group must have a unique ID. The value of ID is used in the RG tags of alignment records.
  • BC: Barcode sequence identifying the sample or library. This value is the expected barcode bases as read by the sequencing machine in the absence of errors. If there are several barcodes for the sample/library (e.g., one on each end of the template), the recommended implementation concatenates all the barcodes separating them with hyphens (-).
  • CN: Name of sequencing center producing the read.
  • DS: Description.
  • DT: Date the run was produced (ISO8601 date or date/time).
  • FO: Flow order. The array of nucleotide bases that correspond to the nucleotides used for each flow of each read. Multi-base flows are encoded in IUPAC format, and non-nucleotide flows by various other characters. Format: /\*|[ACMGRSVTWYHKDBN]+/
  • KS: The array of nucleotide bases that correspond to the key sequence of each read.
  • LB: Library.
  • PG: Programs used for processing the read group.
  • PI: Predicted median insert size, rounded to the nearest integer.
  • PL: Platform/technology used to produce the reads. Valid values: CAPILLARY, DNBSEQ (MGI/BGI), ELEMENT, HELICOS, ILLUMINA, IONTORRENT, LS454, ONT (Oxford Nanopore), PACBIO (Pacific Biosciences), SINGULAR, SOLID, and ULTIMA. This field should be omitted when the technology is not in this list (though the PM field may still be present in this case) or is unknown.
  • PM: Platform model. Free-form text providing further details of the platform/technology used.
  • PU: Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.
  • SM: Sample. Use pool name where a pool is being sequenced.

An example input JSON entry for read_group might look like this:

{
    "read_group": {
        "ID": "rg1",
        "PI": 150,
        "PL": "ILLUMINA",
        "SM": "Sample",
        "LB": "Sample"
    }
}

read_group_to_string

description
Stringifies a ReadGroup struct
outputs
{'stringified_read_group': 'Input ReadGroup as a string'}

Inputs

Required

  • _runtime (Any, required)
  • read_group (ReadGroup, required): ReadGroup struct to stringify

Outputs

  • stringified_read_group (String)
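
The stringification step can be illustrated with a minimal Python sketch. This is not the task's actual implementation, which is not shown here: the function name mirrors the task, but the KEY:VALUE layout, the field order, and the `delimiter` parameter are assumptions made for illustration.

```python
# Field names taken from the read group field list above.
RG_FIELDS = ["ID", "BC", "CN", "DS", "DT", "FO", "KS",
             "LB", "PG", "PI", "PL", "PM", "PU", "SM"]


def read_group_to_string(read_group, delimiter=" "):
    """Join the defined fields of a read group as KEY:VALUE pairs.

    The output layout and the delimiter are assumptions, not the
    documented behavior of the WDL task.
    """
    if not read_group.get("ID"):
        raise ValueError("read group must define ID")
    parts = [
        f"{key}:{read_group[key]}"
        for key in RG_FIELDS
        if read_group.get(key) is not None  # skip undefined optional fields
    ]
    return delimiter.join(parts)


example = {"ID": "rg1", "PI": 150, "PL": "ILLUMINA", "SM": "Sample", "LB": "Sample"}
print(read_group_to_string(example, delimiter="\t"))
```

Undefined fields are skipped rather than emitted empty, so the resulting string contains only the fields that were set in the struct.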

get_read_groups

description
Gets read group information from a BAM file and writes it out as JSON, which is then converted to a WDL struct.
outputs
{'read_groups': 'An array of ReadGroup structs containing read group information.'}

Inputs

Required

  • _runtime (Any, required)
  • bam (File, required): Input BAM format file to get read groups from

Defaults

  • modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

  • read_groups (Array[ReadGroup])
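
As a rough sketch of the BAM-to-JSON step, the Python below parses `@RG` lines from SAM header text into JSON objects shaped like the example entry above. It assumes the header text has already been extracted (for instance with `samtools view -H`); `parse_read_groups` is a hypothetical helper, not the task's actual implementation.

```python
import json


def parse_read_groups(header_text):
    """Return one dict per @RG line found in SAM header text."""
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue  # ignore @HD, @SQ, @PG, and @CO lines
        record = {}
        for field in line.split("\t")[1:]:
            # Split on the first colon only: values such as DT may contain colons.
            key, _, value = field.partition(":")
            record[key] = value
        if "PI" in record:
            # PI is an integer in the example JSON entry.
            record["PI"] = int(record["PI"])
        read_groups.append(record)
    return read_groups


header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tPL:ILLUMINA\tSM:Sample\tLB:Sample\tPI:150\n"
)
print(json.dumps(parse_read_groups(header), indent=4))
```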

validate_read_group

description
Validates that a ReadGroup struct's fields are defined
outputs
{'check': 'Dummy output to indicate success and enable call-caching'}

Inputs

Required

  • _runtime (Any, required)
  • read_group (ReadGroup, required): ReadGroup struct to validate

Defaults

  • required_fields (Array[String], default=[]): Array of read group fields that must be defined. The ID field is always required and does not need to be specified.
  • restrictive (Boolean, default=true): If true, apply a stricter (less permissive) validation of field values. Otherwise, check against SAM spec-defined values.

Outputs

  • check (String)
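
The checks described for this task can be sketched in Python as follows. The required-field check, the PL value list, and the FO format come from the text above; the `RESTRICTIVE_PATTERN` and the choice of fields it applies to are purely hypothetical, since the exact rules of restrictive mode are not documented here.

```python
import re

# Valid PL values as listed in the read group field descriptions.
VALID_PLATFORMS = {
    "CAPILLARY", "DNBSEQ", "ELEMENT", "HELICOS", "ILLUMINA", "IONTORRENT",
    "LS454", "ONT", "PACBIO", "SINGULAR", "SOLID", "ULTIMA",
}
# FO format as given in the field descriptions.
FO_PATTERN = re.compile(r"\*|[ACMGRSVTWYHKDBN]+")
# Hypothetical stricter rule, an assumption for illustration only.
RESTRICTIVE_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")


def validate_read_group(read_group, required_fields=(), restrictive=True):
    """Return a list of error messages; an empty list means success."""
    errors = []
    # ID is always required; dict.fromkeys removes duplicates, keeps order.
    for field in dict.fromkeys(["ID", *required_fields]):
        if read_group.get(field) in (None, ""):
            errors.append(f"{field} is required but not defined")
    platform = read_group.get("PL")
    if platform is not None and platform not in VALID_PLATFORMS:
        errors.append(f"PL has an invalid value: {platform}")
    flow_order = read_group.get("FO")
    if flow_order is not None and not FO_PATTERN.fullmatch(flow_order):
        errors.append(f"FO does not match the expected format: {flow_order}")
    if restrictive:
        for field in ("ID", "SM", "LB"):  # assumed set of fields
            value = read_group.get(field)
            if value is not None and not RESTRICTIVE_PATTERN.fullmatch(str(value)):
                errors.append(f"{field} contains disallowed characters: {value}")
    return errors


example = {"ID": "rg1", "PI": 150, "PL": "ILLUMINA", "SM": "Sample", "LB": "Sample"}
print(validate_read_group(example, required_fields=["SM", "LB"]))
```

Returning a list of messages instead of failing on the first problem is a design choice of this sketch; it lets a caller report every issue with a read group at once.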