Variant identifiers for Oncomine™ panels
Hotspots BED files
The BED file specification (http://genome.ucsc.edu/FAQ/FAQformat.html#format1.7) indicates that the fourth column is the name of the BED line. The values in this column are used to label the variant region in the UCSC genome browser or the Integrative Genomics Viewer (IGV). The labels are used in Ion Reporter™ Software to populate the ID field in the output VCF files and the Variant ID columns.
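As a rough illustration of where the name lives in a BED line, the following Python sketch reads the fourth column from BED-formatted text. It assumes tab-delimited lines and skips `track`, `browser`, and comment lines; the function name `read_bed_names` and the sample coordinates and identifiers are made up for illustration and are not taken from any Oncomine™ hotspots file.

```python
import io


def read_bed_names(handle):
    """Yield (chrom, start, end, name) for each feature line of a BED stream.

    Assumes tab-delimited fields; lines with fewer than four columns
    have no name and are skipped.
    """
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and non-feature header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue
        chrom, start, end, name = fields[:4]
        yield chrom, int(start), int(end), name


# Hypothetical content, for illustration only.
example = io.StringIO(
    "track name=example_hotspots\n"
    "chr1\t100\t101\tEXAMPLE_ID_1\n"
    "chr2\t200\t202\tEXAMPLE_ID_2\n"
)
records = list(read_bed_names(example))
names = [record[3] for record in records]
print(names)
```

The values collected in `names` correspond to the labels that a genome browser would display for each region.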
Oncomine™ panel hotspots files include genomic representations that correspond to somatic variants that have often been observed in cancer samples and therefore are likely to be relevant to the cancer phenotype. These files also include variants that are observed less often, or are implicated in literature reports as functionally relevant, such as activating or inactivating variants.
When possible, variants in the hotspots files are assigned an identifier consistent with a publicly available data source, preferentially COSMIC (https://cancer.sanger.ac.uk/cosmic) but also including dbSNP (https://www.ncbi.nlm.nih.gov/snp/) and ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/). If a variant cannot be found in any of these databases, the variant can receive an arbitrarily assigned identifier to help with variant calling interpretation and troubleshooting. Such identifiers (for example, BT144, OM3324, OMINDEL700, and MAN103) are designed to be consistent across Oncomine™ panels. When hotspots files are updated, Thermo Fisher Scientific reviews the COSMIC database to determine whether variants with such identifiers have since been assigned COSMIC IDs, and sometimes replaces the arbitrary identifiers with the more meaningful COSMIC IDs.
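When reviewing a list of variant IDs, it can help to sort them by likely origin. The sketch below is one possible approach, not a documented rule: the prefixes for arbitrarily assigned identifiers are inferred only from the four examples above and may not be exhaustive, `rs` is the standard dbSNP prefix, and ClinVar is omitted because the form its identifiers take in the hotspots files is not described here. The function name `classify_identifier` is hypothetical.

```python
import re

# Patterns are assumptions for illustration; only the COSM/COSV and rs
# prefixes are established conventions. The "assigned" prefixes are
# inferred from the examples BT144, OM3324, OMINDEL700, and MAN103.
PATTERNS = [
    ("COSMIC", re.compile(r"^COS[MV]\d+$")),
    ("dbSNP", re.compile(r"^rs\d+$")),
    ("assigned", re.compile(r"^(BT|OMINDEL|OM|MAN)\d+$")),
]


def classify_identifier(identifier):
    """Return a coarse source label for a variant identifier."""
    for label, pattern in PATTERNS:
        if pattern.match(identifier):
            return label
    return "unknown"


for variant_id in ["BT144", "OM3324", "OMINDEL700", "MAN103"]:
    print(variant_id, classify_identifier(variant_id))
```

Identifiers that come back as "unknown" would need manual review rather than being treated as errors.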
COSMIC ID changes due to database updates
The COSMIC database is updated approximately twice a year. These updates add newly curated variants, remove a smaller number of variants, and change the genomic representation of a very small number of variants. Therefore, variant identifiers in the hotspots files might not be synchronized with the COSMIC website, or might no longer be found there. Additionally, before COSMIC version 90, the same normalized genomic variant (see https://pubmed.ncbi.nlm.nih.gov/25701572/) could have multiple redundant legacy COSMIC variant identifiers (COSM). Since COSMIC version 90, these identifiers have been replaced with one consistent COSV identifier, to which multiple legacy COSMIC IDs can map.
Fusions files
A comprehensive, universally recognized database of oncogenic gene fusion breakpoints does not exist. Therefore, gene fusion isoform identifiers are generated by concatenating the two gene symbols with a hyphen, and then appending the first letter of the 5’ partner with the number of its last retained exon, followed by the first letter of the 3’ partner with the number of its first retained exon. For example, EML4-ALK.E6A17 denotes a fusion between the sixth exon of EML4 and the 17th exon of ALK. Identifiers for fusion isoforms that involve junctions between incomplete exons, or that involve intronic insertions, can include “ins” or “del” modifiers followed by the number of nucleotides added or removed. Assay names can also include further identifiers such as COSMIC COSF IDs or GenBank accession numbers.
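The naming rule for a simple exon-to-exon fusion can be sketched in Python as follows. This is a minimal illustration, not the tool used to generate the fusions files: the function name `fusion_isoform_id` and its parameters are hypothetical, the period separating the gene pair from the exon codes is taken from the EML4-ALK.E6A17 example, and the “ins” and “del” modifiers are not handled because their exact placement is not specified here.

```python
def fusion_isoform_id(five_prime_gene, last_five_prime_exon,
                      three_prime_gene, first_three_prime_exon):
    """Build an identifier for a simple exon-to-exon fusion isoform.

    The gene symbols are joined with a hyphen, then each partner's
    first letter is paired with its retained exon number.
    """
    five_code = f"{five_prime_gene[0]}{last_five_prime_exon}"
    three_code = f"{three_prime_gene[0]}{first_three_prime_exon}"
    return f"{five_prime_gene}-{three_prime_gene}.{five_code}{three_code}"


print(fusion_isoform_id("EML4", 6, "ALK", 17))  # prints "EML4-ALK.E6A17"
```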
