VariantDB files
You can provide custom annotation information for specific variants of interest when you import a VariantDB file.
Note: Custom VariantDBs must have unique names. For example, you can combine a name and a version number to create a unique name.
A tab-delimited file with a header line is required.
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 124535436 COSM00001 TG AA . . AMPID=AMPL495041;TEMP_ID=0
chr1 124535494 COSM00002 G T . . AMPID=AMPL495041;TEMP_ID=1
chr1 128808434 COSM00003 T A . . AMPID=AMPL30014;TEMP_ID=2
chr1 124597624 . T G . . AMPID=AMPL30014;TEMP_ID=3
chr1 136671158 . TT CA . . AMPID=AMPL30014;TEMP_ID=5
chr1 141128903 COSM00006 TTG CTT . . AMPID=AMPL30014;TEMP_ID=6
We recommend that the custom input file provided to VariantDB be left-aligned. Left alignment is used to normalize the positions of ambiguous INDELs that can be placed at multiple positions.
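To illustrate what left alignment does to an ambiguous INDEL, the following is a minimal sketch that shifts a variant as far left as the reference sequence allows. The function name `left_align`, the toy reference string, and the 1-based `pos` argument are assumptions made for this illustration; it is not the normalization code used by Ion Reporter™ Software.

```python
def left_align(ref_seq, pos, ref, alt):
    """Left-align one variant against a reference sequence.

    ref_seq is the chromosome sequence (a toy string in this sketch),
    pos is the 1-based VCF position, ref and alt are the REF and ALT alleles.
    """
    changed = True
    while changed:
        changed = False
        # A shared trailing base means the same change can be described
        # one base further to the left, so drop it from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # VCF alleles cannot be empty: prepend the preceding reference base.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Remove redundant leading bases, keeping at least one base per allele.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference: G C A C A C A T (positions 1 to 8).
# Deleting "CA" can be written at position 5 (ACA > A) or, left-aligned,
# at position 1 (GCA > G).
print(left_align("GCACACAT", 5, "ACA", "A"))  # (1, 'GCA', 'G')
```

Substitutions such as the single-base changes in the example file above are not ambiguous, so the sketch returns them unchanged.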
The information in the input VariantDB file is used in the following ways in the analysis results:
-
In a downloaded variants TSV file, the content of the ID, REF, ALT, and INFO fields is added to the variant.
-
In the Analysis Results screen, the content of the ID and INFO fields is added to the variant.
-
In the Analysis Results screen, you can create a filter that is based on the content in the ID field. If the content of the ID field does not include a value (contains only a period), then the first key-value pair of the INFO field is used.
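The filter rule in the last item can be sketched as a small function. This is only one reading of the rule as described above; the function name `filter_source` and its return shape are hypothetical and are not part of Ion Reporter™ Software.

```python
def filter_source(id_field, info_field):
    """Return (field_name, value) for the filter, or None if nothing is usable.

    Assumed reading of the rule: use ID when it has a value; when ID is only
    a period, fall back to the first key-value pair of INFO.
    """
    if id_field and id_field != ".":
        return ("ID", id_field)
    if info_field and info_field != ".":
        first_pair = info_field.split(";")[0]
        key, _, value = first_pair.partition("=")
        return (key, value)
    return None


print(filter_source("COSM00001", "AMPID=AMPL495041;TEMP_ID=0"))  # ('ID', 'COSM00001')
print(filter_source(".", "AMPID=AMPL30014;TEMP_ID=3"))           # ('AMPID', 'AMPL30014')
```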
More information about VCF format:
-
Official specification of VCF (Variant Call Format) version 4.1:
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
In VCF format files, missing values are represented by dots. The content must be tab-separated. Ensure that no extra or hidden characters are added to the VCF files, which can occur when they are opened in programs like Excel or Word, or when emailed as an unzipped attachment.
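A quick check for the formatting problems mentioned above might look like the following sketch. The function name `find_format_problems` and the particular characters it looks for (byte order mark, carriage return, non-breaking space) are assumptions about what editors and mail clients commonly introduce, not a list taken from Ion Reporter™ Software. The minimum of eight columns corresponds to the fixed VCF columns CHROM through INFO.

```python
def find_format_problems(text, min_columns=8):
    """Return a list of human-readable problems found in VCF text."""
    problems = []
    if text.startswith("\ufeff"):
        problems.append("line 1: byte order mark at start of file")
        text = text[1:]
    for number, line in enumerate(text.split("\n"), start=1):
        if line.endswith("\r"):
            problems.append(f"line {number}: carriage return at end of line")
            line = line[:-1]
        if "\u00a0" in line:
            problems.append(f"line {number}: non-breaking space")
        # Skip blank lines and ## meta lines; check the #CHROM line and records.
        if not line or line.startswith("##"):
            continue
        if len(line.split("\t")) < min_columns:
            problems.append(
                f"line {number}: fewer than {min_columns} tab-separated columns"
            )
    return problems


good = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t124535494\tCOSM00002\tG\tT\t.\t.\tAMPID=AMPL495041;TEMP_ID=1\n"
)
print(find_format_problems(good))                     # []
print(find_format_problems(good.replace("\t", " ")))  # reports lines 2 and 3
```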
-
Mandatory headers required when creating a VariantDB file: The following three headers must be present in the first three lines of the VCF file (FORMAT and Sample columns are optional in VCF files):
##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
-
Hit-level information in a VariantDB file: You can adjust the hit level of each VariantDB file individually by including this information in the header. The following hit-level parameters can be included in the VCF header.
-
##HITLEVEL=overlap matches all annotations whose loci overlap with variant.
-
##HITLEVEL=locus matches all annotations whose loci start at the variant locus.
-
##HITLEVEL=allele matches all annotations that are 'locus' matches plus have at least one allele in common with variant.
-
##HITLEVEL=genotype matches all annotations that are 'allele' matches where the genotypes also match.
-
##HITLEVEL=auto matches the most specific hit level possible, which can be any of the hit levels listed above.
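The relationship between the hit levels can be sketched as follows. This is an illustrative model only: the `Record` class, the use of ALT alleles for the "allele in common" test, and the plain string comparison of GT values are assumptions, and the actual matching logic in Ion Reporter™ Software may differ.

```python
from dataclasses import dataclass
from typing import Optional

ORDER = ["overlap", "locus", "allele", "genotype"]  # least to most specific


@dataclass
class Record:
    chrom: str
    pos: int                  # 1-based start position
    ref: str
    alts: tuple               # alternate alleles
    gt: Optional[str] = None  # genotype string such as "0/1", if known

    @property
    def end(self):
        return self.pos + len(self.ref) - 1


def hit_levels(annotation, variant):
    """Return the set of hit levels at which the annotation matches the variant."""
    levels = set()
    if annotation.chrom != variant.chrom:
        return levels
    if annotation.pos <= variant.end and variant.pos <= annotation.end:
        levels.add("overlap")
    if annotation.pos == variant.pos:
        levels.add("locus")
        # "allele" requires a locus match plus a shared allele (ALT assumed).
        if set(annotation.alts) & set(variant.alts):
            levels.add("allele")
            # "genotype" requires an allele match plus equal genotypes.
            if annotation.gt is not None and annotation.gt == variant.gt:
                levels.add("genotype")
    return levels


def auto_level(annotation, variant):
    """Return the most specific hit level that applies, or None."""
    levels = hit_levels(annotation, variant)
    matched = [level for level in ORDER if level in levels]
    return matched[-1] if matched else None


annotation = Record("chr1", 141128903, "TTG", ("CTT",), "0/1")
print(auto_level(annotation, Record("chr1", 141128903, "TTG", ("CTT",), "0/1")))  # genotype
print(auto_level(annotation, Record("chr1", 141128903, "TTG", ("CTT",), "1/1")))  # allele
print(auto_level(annotation, Record("chr1", 141128904, "T", ("G",))))             # overlap
```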
-
Mandatory columns required in the VCF file when creating a VariantDB: Providing FORMAT and SAMPLE fields is not required according to the official VCF specification. However, to perform a "genotype" hit level match in Ion Reporter™ Software, you need to specify a GT (genotype) for the variant in the FORMAT column.
An example of a variant with a GT field of 0/1 in the FORMAT field of a VCF file is given below:
chr1 141128903 COSM00006 TTG CTT . . AMPID=AMPL30014;TEMP_ID=6 GT 0/1
If only an "overlap" or "locus" or "allele" match is needed, you do not need to specify a GT field. However, the missing values must be represented by dots in the appropriate columns. For example:
chr1 141128903 COSM00006 TTG CTT . . AMPID=AMPL30014;TEMP_ID=6...
If the "auto" hit level match is chosen, Ion Reporter™ Software attempts to find the most specific hit level match possible. However, if no GT value is supplied, the most specific hit level possible is an allele match, because there is no GT value with which to perform a genotype-level match.
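As a rough illustration of how the optional FORMAT and Sample columns determine the best hit level available, the sketch below parses one tab-separated record and reports whether a genotype match is possible. The helper names `parse_record` and `most_specific_possible` are hypothetical, and treating "." and "./." as a missing genotype is an assumption of this sketch.

```python
def parse_record(line):
    """Split one tab-separated VCF data line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF record needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, _qual, _filter, info = fields[:8]
    gt = None
    # FORMAT (column 9) lists the keys; Sample (column 10) holds the values.
    if len(fields) >= 10 and fields[8] != ".":
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt = sample.get("GT")
        if gt in (".", "./."):
            gt = None
    return {"chrom": chrom, "pos": int(pos), "id": vid,
            "ref": ref, "alt": alt, "info": info, "gt": gt}


def most_specific_possible(record):
    """Without a GT value, matching can be no more specific than 'allele'."""
    return "genotype" if record["gt"] else "allele"


with_gt = "chr1\t141128903\tCOSM00006\tTTG\tCTT\t.\t.\tAMPID=AMPL30014;TEMP_ID=6\tGT\t0/1"
without_gt = "chr1\t141128903\tCOSM00006\tTTG\tCTT\t.\t.\tAMPID=AMPL30014;TEMP_ID=6\t.\t."
print(most_specific_possible(parse_record(with_gt)))     # genotype
print(most_specific_possible(parse_record(without_gt)))  # allele
```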
-
How to filter on VariantDB:
-
Option 1:
Ion Reporter™ Software automatically exposes a filter on the first INFO key of the VariantDB VCF file if such a key is specified and if the ID field of the VCF file is absent.
-
Consider the example below:
##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr1 124535436 . TG AA . . AMPID=AMPL495041;TEMP_ID=0...
chr1 128808434 . T A . . AMPID=AMPL30014;TEMP_ID=2...
If the above VCF file with two variants is used to make a VariantDB in Ion Reporter™ Software, you can filter on the AMPID field, because the AMPID key is the first INFO key present in the INFO field of the VCF file and the ID fields are absent (represented by dots).
-
Option 2:
If the INFO field is not populated, filtering is automatically enabled on the ID column.
Consider the example below, in which the INFO field is absent and represented with a dot:
##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr1 124535436 COSM00001 TG AA . . ....
chr1 141128903 COSM00006 TTG CTT . . ....
If the VCF file above with two variants is used to make a VariantDB in Ion Reporter™ Software, you can filter on the ID field, because the INFO field of the VCF file is not populated.
For more information about marking and tracking of variant annotation in addition to the VariantDB annotation presets in analysis workflows, see MyVariants.
