read_group

WDL Version: 1.1

Read groups are defined in the SAM specification, which lists the following fields:

  • ID: Read group identifier. Each Read Group must have a unique ID. The value of ID is used in the RG tags of alignment records.
  • BC: Barcode sequence identifying the sample or library. This value is the expected barcode bases as read by the sequencing machine in the absence of errors. If there are several barcodes for the sample/library (e.g., one on each end of the template), the recommended implementation concatenates all the barcodes separating them with hyphens (-).
  • CN: Name of sequencing center producing the read.
  • DS: Description.
  • DT: Date the run was produced (ISO8601 date or date/time).
  • FO: Flow order. The array of nucleotide bases that correspond to the nucleotides used for each flow of each read. Multi-base flows are encoded in IUPAC format, and non-nucleotide flows by various other characters. Format: /\*|[ACMGRSVTWYHKDBN]+/
  • KS: The array of nucleotide bases that correspond to the key sequence of each read.
  • LB: Library.
  • PG: Programs used for processing the read group.
  • PI: Predicted median insert size, rounded to the nearest integer.
  • PL: Platform/technology used to produce the reads. Valid values: CAPILLARY, DNBSEQ (MGI/BGI), ELEMENT, HELICOS, ILLUMINA, IONTORRENT, LS454, ONT (Oxford Nanopore), PACBIO (Pacific Biosciences), SINGULAR, SOLID, and ULTIMA. This field should be omitted when the technology is not in this list (though the PM field may still be present in this case) or is unknown.
  • PM: Platform model. Free-form text providing further details of the platform/technology used.
  • PU: Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.
  • SM: Sample. Use pool name where a pool is being sequenced.
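
The field list above can be mirrored in code to make the constraints concrete. The following is a minimal Python sketch, not the WDL struct itself: the names `ReadGroupSketch` and `check_read_group` are hypothetical, and treating every field except ID as optional is an assumption. It checks only what the list states: a non-empty ID, a PL value from the listed platforms, and an FO value matching the stated format.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Platform values listed above for the PL field.
VALID_PLATFORMS = {
    "CAPILLARY", "DNBSEQ", "ELEMENT", "HELICOS", "ILLUMINA", "IONTORRENT",
    "LS454", "ONT", "PACBIO", "SINGULAR", "SOLID", "ULTIMA",
}

# FO format from the list above: a literal "*" or a run of IUPAC base codes.
FLOW_ORDER_RE = re.compile(r"\*|[ACMGRSVTWYHKDBN]+")


@dataclass
class ReadGroupSketch:
    """Hypothetical mirror of the fields above (assumption: only ID is required)."""
    ID: str
    BC: Optional[str] = None
    CN: Optional[str] = None
    DS: Optional[str] = None
    DT: Optional[str] = None
    FO: Optional[str] = None
    KS: Optional[str] = None
    LB: Optional[str] = None
    PG: Optional[str] = None
    PI: Optional[int] = None
    PL: Optional[str] = None
    PM: Optional[str] = None
    PU: Optional[str] = None
    SM: Optional[str] = None


def check_read_group(rg: ReadGroupSketch) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if not rg.ID:
        problems.append("ID must be a non-empty string")
    if rg.PL is not None and rg.PL not in VALID_PLATFORMS:
        problems.append(f"PL {rg.PL!r} is not one of the listed platforms")
    if rg.FO is not None and not FLOW_ORDER_RE.fullmatch(rg.FO):
        problems.append(f"FO {rg.FO!r} does not match the flow order format")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every malformed field at once.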

An example input JSON entry for read_group might look like this:

{
    "read_group": {
        "ID": "rg1",
        "PI": 150,
        "PL": "ILLUMINA",
        "SM": "Sample",
        "LB": "Sample"
    }
}
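
To illustrate how such an entry relates to a SAM header, the sketch below turns the `read_group` object into a tab-delimited `@RG` line of `TAG:VALUE` fields. This is an assumption-laden illustration: the function name `to_rg_header_line` is hypothetical, and the exact String format produced by the conversion task in this module is not stated here and may differ.

```python
import json


def to_rg_header_line(read_group: dict) -> str:
    """Format a read group mapping as a SAM-style @RG header line (sketch)."""
    if not read_group.get("ID"):
        raise ValueError("a read group needs a non-empty ID")
    # Place ID first by convention, then keep the remaining tags in input order.
    tags = ["ID"] + [tag for tag in read_group if tag != "ID"]
    # Skip tags whose value is null/None, since those fields are simply absent.
    fields = [f"{tag}:{read_group[tag]}" for tag in tags if read_group[tag] is not None]
    return "\t".join(["@RG"] + fields)


entry = json.loads(
    '{"read_group": {"ID": "rg1", "PI": 150, "PL": "ILLUMINA",'
    ' "SM": "Sample", "LB": "Sample"}}'
)
line = to_rg_header_line(entry["read_group"])
print(line)
```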

Table of Contents

| Page | Type | Description |
|------|------|-------------|
|  | struct | N/A |
| ERROR | workflow | Validates that a ReadGroup struct is well-formed and then converts it to a String. |
|  | task | Gets read group information from a BAM file and writes it out as JSON, which is converted to a WDL struct. |
|  | task | Validates that a ReadGroup struct's fields are defined and well-formed. |
|  | task | Converts a ReadGroup struct to a String without any validation. |
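
As a rough illustration of the BAM-to-JSON step described above, the Python sketch below parses `@RG` lines from SAM header text into JSON-ready dictionaries. It assumes the header text has already been extracted (for example with `samtools view -H`). The function name `parse_read_groups` and the sample header are hypothetical, and the actual task may work differently.

```python
import json
from typing import List


def parse_read_groups(header_text: str) -> List[dict]:
    """Collect one dict per @RG line found in SAM header text (sketch)."""
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue  # ignore @HD, @SQ, @PG, and other header records
        rg = {}
        for field in line.split("\t")[1:]:
            # Split on the first colon only, since values such as DT may contain colons.
            tag, _, value = field.partition(":")
            # PI is an integer insert size; every other tag is kept as a string.
            rg[tag] = int(value) if tag == "PI" else value
        read_groups.append(rg)
    return read_groups


# Hypothetical header text used only for this example.
sample_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tPI:150\tPL:ILLUMINA\tSM:Sample\tLB:Sample\n"
    "@RG\tID:rg2\tDT:2020-01-01T10:00:00\tSM:Sample\n"
)
parsed = parse_read_groups(sample_header)
print(json.dumps(parsed, indent=4))
```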