Read groups are defined in the SAM spec and support the following fields:
- ID: Read group identifier. Each Read Group must have a unique ID.
The value of ID is used in the RG tags of alignment records.
- BC: Barcode sequence identifying the sample or library. This value is the
expected barcode bases as read by the sequencing machine in the absence
of errors. If there are several barcodes for the sample/library
(e.g., one on each end of the template), the recommended implementation
concatenates all the barcodes separating them with hyphens (-).
- CN: Name of sequencing center producing the read.
- DS: Description.
- DT: Date the run was produced (ISO8601 date or date/time).
- FO: Flow order. The array of nucleotide bases that correspond to the nucleotides
used for each flow of each read. Multi-base flows are encoded in IUPAC format,
and non-nucleotide flows by various other characters.
Format: /\*|[ACMGRSVTWYHKDBN]+/
- KS: The array of nucleotide bases that correspond to the key sequence of each read.
- LB: Library.
- PG: Programs used for processing the read group.
- PI: Predicted median insert size, rounded to the nearest integer.
- PL: Platform/technology used to produce the reads.
Valid values: CAPILLARY, DNBSEQ (MGI/BGI), ELEMENT, HELICOS, ILLUMINA, IONTORRENT,
LS454, ONT (Oxford Nanopore), PACBIO (Pacific Biosciences), SINGULAR, SOLID,
and ULTIMA. This field should be omitted when the technology is not in this list
(though the PM field may still be present in this case) or is unknown.
- PM: Platform model. Free-form text providing further details of the
platform/technology used.
- PU: Platform unit (e.g., flowcell-barcode.lane for Illumina or slide
for SOLiD). Unique identifier.
- SM: Sample. Use pool name where a pool is being sequenced.
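The FO format given above can be checked with a regular expression. The following is a minimal Python sketch, not part of the SAM spec or of these tasks; the function name `is_valid_flow_order` is hypothetical.

```python
import re

# Pattern copied from the FO field description: either a literal "*"
# or one or more IUPAC nucleotide codes.
FLOW_ORDER_PATTERN = re.compile(r"\*|[ACMGRSVTWYHKDBN]+")


def is_valid_flow_order(value: str) -> bool:
    """Return True if the entire string matches the FO format."""
    return FLOW_ORDER_PATTERN.fullmatch(value) is not None


print(is_valid_flow_order("TACG"))  # True
print(is_valid_flow_order("*"))     # True
print(is_valid_flow_order("TAXG"))  # False, X is not an IUPAC nucleotide code
```

`fullmatch` is used rather than `match` so that a valid prefix followed by invalid characters is rejected.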
An example input JSON entry for read_group might look like this:
{
  "read_group": {
    "ID": "rg1",
    "PI": 150,
    "PL": "ILLUMINA",
    "SM": "Sample",
    "LB": "Sample"
  }
}
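For readers who want to inspect such an entry outside of WDL, the sketch below loads it into a Python dataclass. The `ReadGroup` class and `load_read_group` helper here are hypothetical mirrors of the struct, written only for illustration; they assume every field except ID is optional, as in the field list above.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ReadGroup:
    """Hypothetical Python mirror of the ReadGroup struct."""
    ID: str
    BC: Optional[str] = None
    CN: Optional[str] = None
    DS: Optional[str] = None
    DT: Optional[str] = None
    FO: Optional[str] = None
    KS: Optional[str] = None
    LB: Optional[str] = None
    PG: Optional[str] = None
    PI: Optional[int] = None
    PL: Optional[str] = None
    PM: Optional[str] = None
    PU: Optional[str] = None
    SM: Optional[str] = None


def load_read_group(text: str) -> ReadGroup:
    """Parse a JSON document with a top-level "read_group" object."""
    entry = json.loads(text)["read_group"]
    known = {f.name for f in fields(ReadGroup)}
    unknown = set(entry) - known
    if unknown:
        raise ValueError(f"unknown read group fields: {sorted(unknown)}")
    return ReadGroup(**entry)


example = """
{"read_group": {"ID": "rg1", "PI": 150, "PL": "ILLUMINA",
                "SM": "Sample", "LB": "Sample"}}
"""
rg = load_read_group(example)
print(rg.ID, rg.PI, rg.PL)  # rg1 150 ILLUMINA
```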
read_group_to_string
- description
- Stringifies a ReadGroup struct
- outputs
- {'stringified_read_group': 'Input ReadGroup as a string'}
Inputs
Required
_runtime (Any, required)
read_group (ReadGroup, required): ReadGroup struct to stringify
Outputs
stringified_read_group (String)
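The exact string format this task produces is not described here. As an illustration only, the Python sketch below assumes the SAM header convention: an `@RG` line with tab-separated `TAG:VALUE` pairs and ID first. The function name `read_group_to_sam_line` is hypothetical.

```python
def read_group_to_sam_line(read_group: dict) -> str:
    """Render a read group dict as a SAM-style @RG header line.

    Assumption: tab-separated TAG:VALUE pairs with ID first.
    Fields whose value is None are skipped.
    """
    if "ID" not in read_group:
        raise ValueError("ID is required")
    parts = ["@RG", f"ID:{read_group['ID']}"]
    for tag, value in read_group.items():
        if tag == "ID" or value is None:
            continue
        parts.append(f"{tag}:{value}")
    return "\t".join(parts)


line = read_group_to_sam_line(
    {"ID": "rg1", "PI": 150, "PL": "ILLUMINA", "SM": "Sample", "LB": "Sample"}
)
print(line)
```

For the example entry this yields one `@RG` line carrying ID, PI, PL, SM, and LB in that order.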
get_read_groups
- description
- Gets read group information from a BAM file and writes it out as JSON which is converted to a WDL struct.
- outputs
- {'read_groups': 'An array of ReadGroup structs containing read group information.'}
Inputs
Required
_runtime (Any, required)
bam (File, required): Input BAM format file to get read groups from
Defaults
modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.
Outputs
read_groups (Array[ReadGroup])
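The task's implementation is not shown here. As a rough sketch of the idea, the Python below converts `@RG` lines from SAM header text into JSON-ready dictionaries. It assumes the header text has already been extracted from the BAM (for example with `samtools view -H`); the function name `parse_read_groups` is hypothetical.

```python
import json
from typing import Dict, List


def parse_read_groups(header_text: str) -> List[Dict[str, object]]:
    """Collect every @RG header line as a dict of tag -> value."""
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue  # skip @HD, @SQ, @PG, and other header lines
        record: Dict[str, object] = {}
        for field in line.split("\t")[1:]:
            # Split on the first colon only, since values may contain colons.
            tag, _, value = field.partition(":")
            # PI is an integer in the example JSON entry; other tags stay strings.
            record[tag] = int(value) if tag == "PI" else value
        read_groups.append(record)
    return read_groups


header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tPI:150\tPL:ILLUMINA\tSM:Sample\n"
    "@RG\tID:rg2\tPL:ILLUMINA\tSM:Sample\tDS:run:2\n"
)
parsed = parse_read_groups(header)
print(json.dumps(parsed, indent=2))
```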
validate_read_group
- description
- Validate that a ReadGroup struct's fields are defined
- outputs
- {'check': 'Dummy output to indicate success and enable call-caching'}
Inputs
Required
_runtime (Any, required)
read_group (ReadGroup, required): ReadGroup struct to validate
Defaults
required_fields (Array[String], default=[]): Array of read group fields that must be defined. The ID field is always required and does not need to be specified.
restrictive (Boolean, default=true): If true, run a less permissive validation of field values. Otherwise, check against SAM spec-defined values.
Outputs
check (String)
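A minimal Python sketch of this kind of check follows. It is an illustration under stated assumptions, not the task's actual logic: it covers the required-field check and the PL values listed in the field reference, and it does not model the `restrictive` option, whose exact rules are not given here. The names `validate_read_group` and `VALID_PLATFORMS` are hypothetical.

```python
from typing import Iterable, List

# PL values taken from the field list in this document.
VALID_PLATFORMS = {
    "CAPILLARY", "DNBSEQ", "ELEMENT", "HELICOS", "ILLUMINA", "IONTORRENT",
    "LS454", "ONT", "PACBIO", "SINGULAR", "SOLID", "ULTIMA",
}


def validate_read_group(read_group: dict,
                        required_fields: Iterable[str] = ()) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    errors = []
    # ID is always required; dict.fromkeys removes duplicates, keeps order.
    for field in dict.fromkeys(["ID", *required_fields]):
        if read_group.get(field) in (None, ""):
            errors.append(f"missing required field: {field}")
    platform = read_group.get("PL")
    if platform is not None and platform not in VALID_PLATFORMS:
        errors.append(f"invalid PL value: {platform}")
    return errors


rg = {"ID": "rg1", "PI": 150, "PL": "ILLUMINA", "SM": "Sample", "LB": "Sample"}
print(validate_read_group(rg, required_fields=["SM", "LB"]))  # []
print(validate_read_group({"PL": "NANOPORE"}, required_fields=["SM"]))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.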