VariantDB files
You can provide custom annotation information for specific variants of interest when you import a VariantDB file.
Note: Custom VariantDBs must have unique names. For example, combine the name and version number to ensure that each name is unique.
A tab-delimited file with a header line is required.
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 124535436 COSM00001 TG AA . . AMPID=AMPL495041;TEMP_ID=0
chr1 124535494 COSM00002 G T . . AMPID=AMPL495041;TEMP_ID=1
chr1 128808434 COSM00003 T A . . AMPID=AMPL30014;TEMP_ID=2
chr1 124597624 . T G . . AMPID=AMPL30014;TEMP_ID=3
chr1 136671158 . TT CA . . AMPID=AMPL30014;TEMP_ID=5
chr1 141128903 COSM00006 TTG CTT . . AMPID=AMPL30014;TEMP_ID=6
We recommend that the custom input file provided to VariantDB be left-aligned. Left alignment is used to normalize the positions of ambiguous INDELs that can be placed at multiple positions.
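As a rough illustration of what left alignment does, the following Python sketch shifts an indel as far left as the reference sequence allows. It follows the widely described trim-and-extend normalization procedure and is not necessarily how Ion Reporter™ Software or any particular tool implements it. The function name `left_align` and the toy reference string are hypothetical.

```python
def left_align(reference, pos, ref, alt):
    """Left-align one REF/ALT pair against a reference sequence.

    `reference` is the chromosome sequence as a string and `pos` is the
    1-based position of the first REF base, as in the VCF POS column.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both by one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first reference base")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A "CA" deletion inside a CA repeat can be written at several positions.
# Here it is first written at its rightmost position, then left-aligned.
print(left_align("GGGCACACACAGGG", 9, "ACA", "A"))  # (3, 'GCA', 'G')
```

Every equivalent placement of the same deletion collapses to the same POS, REF, and ALT, which is what makes left-aligned entries comparable.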
The information in your input VariantDB file is used in the following ways in your analysis results:
-
In a downloaded variants TSV file, the content of your ID, REF, ALT, and INFO fields is added to the variant.
-
In the Analysis Results screen, the content of your ID and INFO fields is added to the variant.
-
In the Analysis Results screen, you can create a filter that is based on the content of your ID field. If the ID field does not contain a value (contains only a period), then the first key-value pair of your INFO field is used.
Further information on VCF format:
-
Official specification of VCF (Variant Call Format) version 4.1:
http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
In VCF format files, missing values are represented by dots. The content must be tab-separated. Ensure that no extra or hidden characters are added to the VCF files, which may occur when they are opened in programs like Excel or Word, or when emailed as an unzipped attachment.
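A quick pre-import check can catch the kind of hidden characters described above. The following Python sketch is only an illustration: the function name `find_format_problems` is hypothetical, and the set of checks (byte-order mark, carriage returns, missing tabs, non-ASCII or control characters) is an assumption about what commonly goes wrong, not a list taken from Ion Reporter™ Software.

```python
def find_format_problems(text):
    """Return a list of human-readable warnings for a VCF file's text."""
    problems = []
    # Some editors prepend an invisible byte-order mark.
    if text.startswith("\ufeff"):
        problems.append("byte-order mark at start of file")
        text = text[1:]
    for number, line in enumerate(text.split("\n"), start=1):
        if line.endswith("\r"):
            problems.append(f"line {number}: carriage return at end of line")
            line = line[:-1]
        # Blank lines and ## meta lines have no columns to check.
        if not line or line.startswith("##"):
            continue
        if "\t" not in line:
            problems.append(f"line {number}: columns are not tab-separated")
        # Smart quotes, non-breaking spaces, and control characters land here.
        if any(ord(ch) > 126 or (ord(ch) < 32 and ch != "\t") for ch in line):
            problems.append(f"line {number}: non-ASCII or control character")
    return problems


sample = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1 124535494 COSM00002 G T . . AMPID=AMPL495041;TEMP_ID=1\n"
)
for problem in find_format_problems(sample):
    print(problem)
```

In the sample, the data line uses spaces instead of tabs, so the check reports it as not tab-separated.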
-
Mandatory headers required when creating a VariantDB file: The following three headers must be present in the first three lines of the VCF file (FORMAT and Sample columns are optional in VCF files):
##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
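The header requirement above can be checked before import with a few lines of Python. This is a minimal sketch that takes the requirement literally (the three headers in the first three lines, with FORMAT and Sample columns optional). The function name `has_mandatory_headers` is hypothetical, and the software's own validation may be stricter or more lenient.

```python
EXPECTED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def has_mandatory_headers(lines):
    """Check that the first three lines are the three mandatory headers."""
    first_three = [line.rstrip("\r\n") for line in lines[:3]]
    if len(first_three) < 3:
        return False
    fileformat, format_line, column_line = first_three
    return (
        fileformat == "##fileformat=VCFv4.1"
        and format_line.startswith("##FORMAT=<ID=GT,")
        # Only the first eight columns are compared; FORMAT and Sample may follow.
        and column_line.split("\t")[:8] == EXPECTED_COLUMNS
    )


header = [
    "##fileformat=VCFv4.1\n",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample\n",
]
print(has_mandatory_headers(header))
```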
-
Hit-level information in a VariantDB file: You can adjust the hit level of each VariantDB file individually by including this information in the header. The following hit-level parameters can be included in the VCF header.
-
##HITLEVEL=overlap matches all annotations whose loci overlap the variant.
-
##HITLEVEL=locus matches all annotations whose loci start at the variant locus.
-
##HITLEVEL=allele matches all annotations that are 'locus' matches and also have at least one allele in common with the variant.
-
##HITLEVEL=genotype matches all annotations that are 'allele' matches where the genotypes also match.
-
##HITLEVEL=auto matches the most specific hit level possible, which could be any of the hit levels listed above.
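The hit levels above nest inside one another, which the following Python sketch tries to make concrete. It is one possible reading of the descriptions, not the software's implementation: the `Record` class and function names are hypothetical, "locus" is taken to mean the same start position, and "allele in common" is taken to mean a shared ALT allele.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LEVELS = ["overlap", "locus", "allele", "genotype"]  # least to most specific


@dataclass
class Record:
    chrom: str
    pos: int                  # 1-based start, as in the VCF POS column
    ref: str
    alts: Tuple[str, ...]
    gt: Optional[str] = None  # for example "0/1"; None when no GT is given

    @property
    def end(self):
        return self.pos + len(self.ref) - 1


def most_specific_hit(annotation, variant):
    """Return the most specific level reached, or None if there is no overlap."""
    if annotation.chrom != variant.chrom:
        return None
    if annotation.pos > variant.end or variant.pos > annotation.end:
        return None
    level = "overlap"
    if annotation.pos == variant.pos:
        level = "locus"
        if set(annotation.alts) & set(variant.alts):
            level = "allele"
            if annotation.gt is not None and annotation.gt == variant.gt:
                level = "genotype"
    return level


def matches(annotation, variant, hit_level):
    """True if the pair matches at `hit_level` or at a more specific level."""
    best = most_specific_hit(annotation, variant)
    return best is not None and LEVELS.index(best) >= LEVELS.index(hit_level)


annotation = Record("chr1", 141128903, "TTG", ("CTT",), "0/1")
variant = Record("chr1", 141128903, "TTG", ("CTT",), "1/1")
print(most_specific_hit(annotation, variant))  # allele
```

Under this reading, the "auto" setting corresponds to reporting the value of `most_specific_hit` rather than testing against one fixed level.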
-
Mandatory columns required in the VCF file when creating a VariantDB: Providing FORMAT and SAMPLE fields is not mandatory according to the official VCF specification. However, in order to perform a "genotype" hit level match in Ion Reporter™ Software, you must specify a GT (genotype) for the variant in the FORMAT column.
An example of a variant with a GT field of 0/1 in the FORMAT field of a VCF file is given below:
chr1 141128903 COSM00006 TTG CTT . . AMPID=AMPL30014;TEMP_ID=6 GT 0/1
If only an "overlap" or "locus" or "allele" match is needed, you do not need to specify a GT field. However, the missing values must be represented by dots in the appropriate columns. For example:
chr1 141128903 COSM00006 TTG CTT . . AMPID=AMPL30014;TEMP_ID=6...
If the "auto" hit level is chosen, Ion Reporter™ Software tries to find the most specific hit-level match possible. However, if no GT value is supplied, the most specific hit level possible is an allele match, because a genotype-level match requires a GT value.
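When VariantDB files are generated by a script, the rules above (GT in the FORMAT column for genotype matching, dots for anything missing) can be applied in one place. The following Python sketch assumes the ten-column layout with FORMAT and Sample columns. The function name `variantdb_record` and its parameters are hypothetical.

```python
def variantdb_record(chrom, pos, ref, alt, variant_id=None, info=None, gt=None):
    """Build one tab-separated data line, using '.' for missing values."""
    columns = [
        chrom,
        str(pos),
        variant_id or ".",
        ref,
        alt,
        ".",          # QUAL is left empty in this sketch
        ".",          # FILTER is left empty in this sketch
        info or ".",
    ]
    if gt is None:
        # No genotype: the FORMAT and Sample columns are both missing.
        columns += [".", "."]
    else:
        # Genotype supplied: FORMAT names the GT key, Sample holds its value.
        columns += ["GT", gt]
    return "\t".join(columns)


print(variantdb_record("chr1", 141128903, "TTG", "CTT",
                       variant_id="COSM00006",
                       info="AMPID=AMPL30014;TEMP_ID=6",
                       gt="0/1"))
```

Omitting `gt` yields a line that can still be used for "overlap", "locus", or "allele" matching.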
-
Ion Reporter™ Software automatically exposes a filter on the first INFO key of the VariantDB VCF file if such a key is specified and if the ID field of the VCF file is missing.
-
Consider the example below:
##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr1 124535436 . TG AA . . AMPID=AMPL495041;TEMP_ID=0...
chr1 128808434 . T A . . AMPID=AMPL30014;TEMP_ID=2...
If the VCF file above, which contains two variants, is used to create a VariantDB in Ion Reporter™ Software, you can filter on the AMPID field, because AMPID is the first key in the INFO field and the ID fields are missing (represented by dots).
-
If the INFO field is not populated, filtering will be automatically enabled on the ID column.
Consider the example below, in which the INFO field is missing and represented with a dot:
##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr1 124535436 COSM00001 TG AA . . ....
chr1 141128903 COSM00006 TTG CTT . . ....
If the VCF file above, which contains two variants, is used to create a VariantDB in Ion Reporter™ Software, you can filter on the ID field, because the INFO field of the VCF file is not populated.
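The two filtering cases above can be summarized as a small decision rule. The Python sketch below is an interpretation of the text, not the software's logic: the function name `filter_field` is hypothetical, and returning `None` when both fields are missing is an assumption, since that case is not described.

```python
def filter_field(id_value, info_value):
    """Return the name of the field a filter would be based on."""
    # A populated ID field takes precedence.
    if id_value not in ("", "."):
        return "ID"
    # Otherwise fall back to the first key in the INFO field.
    if info_value not in ("", "."):
        first_pair = info_value.split(";")[0]
        return first_pair.split("=")[0]
    # Neither field is populated (behavior here is an assumption).
    return None


print(filter_field(".", "AMPID=AMPL495041;TEMP_ID=0"))  # AMPID
print(filter_field("COSM00001", "."))                   # ID
```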
See also the sections on "MyVariants" for marking and tracking of variant annotation beyond the VariantDB annotation presets in analysis workflows.