Variant identifiers for Oncomine™ panels

Hotspots BED files

The BED file specification (http://genome.ucsc.edu/FAQ/FAQformat.html#format1.7) indicates that the fourth column is the name of the BED line, which is used to label the variant region in the UCSC genome browser or IGV. This label is also used to populate the ID field in the output VCF files as well as the Variant ID columns in Ion Reporter™ Software.
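As an illustration of where the identifier sits in a BED line, the following is a minimal Python sketch that reads the fourth tab-separated column as the name. The class name `BedRecord`, the function `parse_bed_line`, and the example line (including the placeholder identifier `EXAMPLE_ID_1`) are assumptions made for illustration only; they are not taken from an actual hotspots file, and real hotspots files may carry additional columns that this sketch ignores.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    """First four BED columns (hypothetical helper type for this sketch)."""
    chrom: str
    start: int  # BED start coordinate (0-based)
    end: int    # BED end coordinate (exclusive)
    name: str   # fourth column: the label / variant identifier


def parse_bed_line(line: str) -> BedRecord:
    """Split one tab-separated BED line and return its first four columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least four tab-separated columns")
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), fields[3])


# Made-up example line; the identifier is a placeholder, not a real variant ID.
example_line = "chr1\t100\t101\tEXAMPLE_ID_1"
record = parse_bed_line(example_line)
print(record.name)
```

Track or header lines, if present in a file, would need to be skipped before calling a function like this.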

Oncomine™ panel hotspots files contain genomic representations that correspond to somatic variants that have been frequently observed in cancer samples, and thus are likely to be relevant to the cancer phenotype. These files also contain less frequently observed variants that literature reports implicate as functionally relevant, for example, activating or inactivating variants. When possible, variants within the hotspots files are assigned an identifier consistent with a publicly accessible data source, preferentially COSMIC (https://cancer.sanger.ac.uk/cosmic) but also including dbSNP (https://www.ncbi.nlm.nih.gov/snp/) and ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/). If a variant cannot be found in COSMIC, it may receive an arbitrarily assigned identifier to aid in variant calling interpretation and troubleshooting; these identifiers (for example, BT144, OM3324, OMINDEL700, MAN103) should be consistent across Oncomine™ panels. When hotspots files are updated, Thermo Fisher Scientific reviews the COSMIC database to determine whether variants with such identifiers have since been assigned COSMIC IDs, and replaces them with the more meaningful COSMIC IDs.

COSMIC ID changes due to database updates

The COSMIC database is updated approximately four times a year. These updates involve the addition of newly curated variants, the removal of a smaller number of variants, and changes to the genomic representations of a very small number of variants. There is, therefore, a chance that variant identifiers in the hotspots files might be out of sync with the COSMIC website, or might no longer be found there. Additionally, prior to COSMIC version 90, the same normalized genomic variant (see https://pubmed.ncbi.nlm.nih.gov/25701572/) might have had multiple redundant COSMIC variant identifiers (COSM); since COSMIC version 90, these identifiers have been replaced with one consistent COSV identifier, and multiple COSMIC variants may map to a single COSV identifier.
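When reviewing a hotspots file against a newer COSMIC release, it can help to separate legacy COSM identifiers from COSV identifiers. The following Python sketch does this by prefix. It assumes that each identifier is the prefix followed only by digits, which may not hold for every file; the function name `classify_cosmic_id`, the category labels, and the numeric IDs in the example are illustrative and do not refer to real variants.

```python
import re

# Assumed identifier shapes: prefix followed by digits only.
_PATTERNS = [
    ("legacy_cosm", re.compile(r"^COSM\d+$")),  # pre-v90 style identifier
    ("cosv", re.compile(r"^COSV\d+$")),         # v90 and later identifier
]


def classify_cosmic_id(identifier: str) -> str:
    """Return 'legacy_cosm', 'cosv', or 'other' based on the prefix."""
    for label, pattern in _PATTERNS:
        if pattern.match(identifier):
            return label
    return "other"


# Placeholder identifiers for illustration; OM3324 is one of the
# arbitrarily assigned example identifiers mentioned in this document.
for example_id in ["COSM12345", "COSV67890", "OM3324"]:
    print(example_id, classify_cosmic_id(example_id))
```

Identifiers reported as `legacy_cosm` are the ones most likely to need checking against the current COSMIC release.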

Fusions files

A comprehensive, universally recognized database of oncogenic gene fusion breakpoints does not exist. Therefore, gene fusion isoform identifiers are generated by concatenating the two gene symbols with a hyphen, and then combining the first letter of the 5’ partner with the last retained 5’ exon along with the first letter of the 3’ partner with the first retained 3’ exon. For example, EML4-ALK.E6A17 denotes a fusion joining the sixth exon of EML4 to the 17th exon of ALK. Fusion isoforms involving junctions between incomplete exons, or involving intronic insertions, may contain “ins” or “del” modifiers followed by the number of nucleotides added or removed. Assay names may also contain additional identifiers such as COSMIC COSF IDs or GenBank accession numbers.
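The naming rule described above can be sketched in a few lines of Python. This is a minimal illustration for the simple case of complete exons only: the function name `fusion_isoform_id` and its parameter names are assumptions, and the sketch does not handle “ins” or “del” modifiers or appended COSF or GenBank identifiers.

```python
def fusion_isoform_id(gene5: str, last_exon5: int, gene3: str, first_exon3: int) -> str:
    """Build an identifier of the form GENE5-GENE3.<X><n><Y><m>.

    X and Y are the first letters of the 5' and 3' partner gene symbols,
    n is the last retained 5' exon, and m is the first retained 3' exon.
    """
    if not gene5 or not gene3:
        raise ValueError("gene symbols must be non-empty")
    return f"{gene5}-{gene3}.{gene5[0]}{last_exon5}{gene3[0]}{first_exon3}"


# Reproduces the example given in the text.
print(fusion_isoform_id("EML4", 6, "ALK", 17))  # EML4-ALK.E6A17
```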