Read group

Read groups are defined in the SAM spec:

- ID: Read group identifier. Each Read Group must have a unique ID. The value of ID is used in the RG tags of alignment records.
- BC: Barcode sequence identifying the sample or library. This value is the expected barcode bases as read by the sequencing machine in the absence of errors. If there are several barcodes for the sample/library (e.g., one on each end of the template), the recommended implementation concatenates all the barcodes, separating them with hyphens (-).
- CN: Name of sequencing center producing the read.
- DS: Description.
- DT: Date the run was produced (ISO8601 date or date/time).
- FO: Flow order. The array of nucleotide bases that correspond to the nucleotides used for each flow of each read. Multi-base flows are encoded in IUPAC format, and non-nucleotide flows by various other characters. Format: /\*|[ACMGRSVTWYHKDBN]+/
- KS: The array of nucleotide bases that correspond to the key sequence of each read.
- LB: Library.
- PG: Programs used for processing the read group.
- PI: Predicted median insert size, rounded to the nearest integer.
- PL: Platform/technology used to produce the reads. Valid values: CAPILLARY, DNBSEQ (MGI/BGI), ELEMENT, HELICOS, ILLUMINA, IONTORRENT, LS454, ONT (Oxford Nanopore), PACBIO (Pacific Biosciences), SINGULAR, SOLID, and ULTIMA. This field should be omitted when the technology is not in this list (though the PM field may still be present in this case) or is unknown.
- PM: Platform model. Free-form text providing further details of the platform/technology used.
- PU: Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.
- SM: Sample. Use pool name where a pool is being sequenced.

An example input JSON entry for read_group might look like this:

{
    "read_group": {
        "ID": "rg1",
        "PI": 150,
        "PL": "ILLUMINA",
        "SM": "Sample",
        "LB": "Sample"
    }
}

read_group_to_string

description: Stringifies a ReadGroup struct
outputs: {'stringified_read_group': 'Input ReadGroup as a string'}

Inputs

Required

_runtime (Any, required)
read_group (ReadGroup, required): ReadGroup struct to stringify

Outputs

stringified_read_group (String)
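
The exact string format produced by this task is not documented here. As a rough illustration only, the sketch below joins every defined field as a TAG:value pair, in the order the fields are listed in the SAM spec. The function name mirrors the task, but the `delimiter` parameter and the choice of a space as the default separator are assumptions made for this example.

```python
# Tags in the order they are listed in the SAM spec description above.
RG_FIELDS = ["ID", "BC", "CN", "DS", "DT", "FO", "KS",
             "LB", "PG", "PI", "PL", "PM", "PU", "SM"]


def read_group_to_string(read_group, delimiter=" "):
    """Join the defined fields of a read group as TAG:value pairs.

    `delimiter` is a hypothetical parameter; the real task may use a
    different separator.
    """
    if not read_group.get("ID"):
        raise ValueError("a read group must define ID")
    parts = []
    for tag in RG_FIELDS:
        value = read_group.get(tag)
        if value is not None:  # skip optional fields that were left undefined
            parts.append(f"{tag}:{value}")
    return delimiter.join(parts)


example = {"ID": "rg1", "PI": 150, "PL": "ILLUMINA", "SM": "Sample", "LB": "Sample"}
print(read_group_to_string(example))
# ID:rg1 LB:Sample PI:150 PL:ILLUMINA SM:Sample
```

Passing `delimiter="\t"` and prefixing the result with `@RG` and a tab would give a line in the shape of a SAM header record.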

get_read_groups

description: Gets read group information from a BAM file and writes it out as JSON, which is then converted to a WDL struct.
outputs: {'read_groups': 'An array of ReadGroup structs containing read group information.'}

Inputs

Required

_runtime (Any, required)
bam (File, required): Input BAM format file to get read groups from

Defaults

modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.

Outputs

read_groups (Array[ReadGroup])
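
How the task extracts the header is not shown on this page. The sketch below assumes the header text is already available as a string (for instance, from `samtools view -H`) and shows one way the @RG lines could be turned into JSON-ready dictionaries. The name `parse_read_groups` is hypothetical, and converting PI to an integer is an assumption based on the example JSON entry above.

```python
import json


def parse_read_groups(header_text):
    """Return one dict per @RG line found in SAM header text."""
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue  # ignore @HD, @SQ, @PG, and @CO lines
        group = {}
        for field in line.split("\t")[1:]:
            # Split on the first ':' only; values such as DT contain ':' themselves.
            tag, _, value = field.partition(":")
            # Assumption: PI is stored as an integer, as in the example JSON entry.
            group[tag] = int(value) if tag == "PI" else value
        read_groups.append(group)
    return read_groups


header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tPI:150\tPL:ILLUMINA\tSM:Sample\tLB:Sample\n"
    "@RG\tID:rg2\tSM:Sample\tDT:2020-01-01T10:00:00\n"
)
print(json.dumps(parse_read_groups(header), indent=2))
```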

validate_read_group

description: Validates that the required fields of a ReadGroup struct are defined
outputs: {'check': 'Dummy output to indicate success and enable call-caching'}

Inputs

Required

_runtime (Any, required)
read_group (ReadGroup, required): ReadGroup struct to validate

Defaults

required_fields (Array[String], default=[]): Array of read group fields that must be defined. The ID field is always required and does not need to be specified.
restrictive (Boolean, default=true): If true, run a stricter (less permissive) validation of field values. Otherwise, check only against SAM spec-defined values.

Outputs

check (String)
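
To make the inputs above concrete, here is a minimal sketch of this kind of validation. It always requires ID, checks any extra `required_fields`, and compares PL and FO against the values given in the SAM spec description at the top of this page. The rule applied when `restrictive` is true is not documented here, so the `RESTRICTIVE_VALUE` pattern and the set of fields it is applied to are purely hypothetical stand-ins.

```python
import re

# Valid PL values as listed in the SAM spec description above.
VALID_PLATFORMS = {
    "CAPILLARY", "DNBSEQ", "ELEMENT", "HELICOS", "ILLUMINA", "IONTORRENT",
    "LS454", "ONT", "PACBIO", "SINGULAR", "SOLID", "ULTIMA",
}
FLOW_ORDER = re.compile(r"\*|[ACMGRSVTWYHKDBN]+")
# Hypothetical stricter rule; the task's real restrictive check may differ.
RESTRICTIVE_VALUE = re.compile(r"[A-Za-z0-9_.-]+")


def validate_read_group(read_group, required_fields=(), restrictive=True):
    """Return a list of problems found; an empty list means the group passed."""
    errors = []
    # ID is always required, whether or not the caller lists it.
    for tag in sorted({"ID", *required_fields}):
        if read_group.get(tag) is None:
            errors.append(f"missing required field: {tag}")
    platform = read_group.get("PL")
    if platform is not None and platform not in VALID_PLATFORMS:
        errors.append(f"invalid PL value: {platform}")
    flow_order = read_group.get("FO")
    if flow_order is not None and not FLOW_ORDER.fullmatch(flow_order):
        errors.append(f"invalid FO value: {flow_order}")
    if restrictive:
        # Assumption: only these identifier-like fields get the stricter check.
        for tag in ("ID", "SM", "LB", "PU"):
            value = read_group.get(tag)
            if value is not None and not RESTRICTIVE_VALUE.fullmatch(str(value)):
                errors.append(f"{tag} contains disallowed characters: {value}")
    return errors


example = {"ID": "rg1", "PI": 150, "PL": "ILLUMINA", "SM": "Sample", "LB": "Sample"}
print(validate_read_group(example, required_fields=["SM", "PU"]))
# ['missing required field: PU']
```

Returning a list of messages rather than raising on the first failure is a design choice for this sketch; it lets a caller report every problem in a read group at once.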