VariantDB files

You can provide custom annotation information for specific variants of interest when you import a VariantDB file.

Note: Custom VariantDBs must have unique names. For example, combine the database name and version number so that each name is unique.

A tab-delimited VCF file with header lines is required, for example:

				##fileformat=VCFv4.1
				#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
				chr1    124535436   COSM00001   TG  AA  .   .   AMPID=AMPL495041;TEMP_ID=0
				chr1    124535494   COSM00002   G   T   .   .   AMPID=AMPL495041;TEMP_ID=1
				chr1    128808434   COSM00003   T   A   .   .   AMPID=AMPL30014;TEMP_ID=2
				chr1    124597624   .           T   G   .   .   AMPID=AMPL30014;TEMP_ID=3
				chr1    136671158   .           TT  CA  .   .   AMPID=AMPL30014;TEMP_ID=5
				chr1    141128903   COSM00006   TTG CTT .   .   AMPID=AMPL30014;TEMP_ID=6
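
The following Python sketch shows one way to read records like those above. Lines starting with # are treated as headers and skipped, each data line is split on tabs, and the INFO column is broken into key-value pairs. It is an illustration only and not part of Ion Reporter™ Software. The function names `parse_info` and `parse_vcf_records` are hypothetical. Because the sample rows above are space-aligned for readability, the sketch uses tab-separated strings instead.

```python
from typing import Dict, Iterable, Iterator, Optional

# The eight fixed VCF columns, in order.
MANDATORY_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split an INFO value such as 'AMPID=AMPL495041;TEMP_ID=0' into a dict.

    A lone '.' means the INFO field is empty; a flag key without '=' maps to None.
    """
    if info == ".":
        return {}
    pairs: Dict[str, Optional[str]] = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        pairs[key] = value if sep else None
    return pairs


def parse_vcf_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per data line; lines starting with '#' are headers and are skipped."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < len(MANDATORY_COLUMNS):
            # Typically a sign that the columns were separated by spaces instead of tabs.
            raise ValueError(f"expected at least 8 tab-separated columns: {line!r}")
        record: dict = dict(zip(MANDATORY_COLUMNS, fields))
        record["POS"] = int(record["POS"])
        record["INFO"] = parse_info(record["INFO"])
        record["EXTRA"] = fields[len(MANDATORY_COLUMNS):]  # FORMAT, Sample, ... if present
        yield record


example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t124535436\tCOSM00001\tTG\tAA\t.\t.\tAMPID=AMPL495041;TEMP_ID=0",
    "chr1\t124597624\t.\tT\tG\t.\t.\tAMPID=AMPL30014;TEMP_ID=3",
]
for rec in parse_vcf_records(example):
    print(rec["CHROM"], rec["POS"], rec["ID"], rec["INFO"]["AMPID"])
```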

We recommend that the variants in your custom VariantDB input file be left-aligned. Left alignment normalizes an ambiguous INDEL, one that can be placed at more than one position (for example, within a repeat), by moving it to its leftmost possible position.
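
As an illustration, left alignment can be sketched in Python. This is a minimal version of the commonly used trim-and-extend approach, the same kind of normalization performed by tools such as `bcftools norm`. It is not Ion Reporter™ Software's code, and the function `left_align` and the toy reference string are hypothetical. In practice, run an established normalization tool against the reference genome used for your analysis.

```python
from typing import Tuple


def left_align(ref_seq: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Left-align and trim a biallelic variant against a reference sequence.

    ref_seq is the chromosome sequence as a plain string; pos is the 1-based VCF POS.
    """
    if ref_seq[pos - 1:pos - 1 + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base, which shifts the event one base to the left.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An empty allele is re-anchored on the preceding reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend past the start of the sequence")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Remove shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference: deleting one "CA" unit from GCACACAT, written at position 5.
print(left_align("GCACACAT", 5, "ACA", "A"))
```

In the toy reference GCACACAT, deleting one CA unit can be written at position 5 (REF ACA, ALT A) or at position 1 (REF GCA, ALT G); the sketch maps the first representation to the second, which is the left-aligned form.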

The information in your input VariantDB file is used in the following ways in your analysis results:

  • In a downloaded variants TSV file, the content of your ID, REF, ALT, and INFO fields is added to the variant.

  • In the Analysis Results screen, the content of your ID and INFO fields is added to the variant.

  • In the Analysis Results screen, you can create a filter based on the content of your ID field. If the ID field contains no value (only a period), the first key-value pair of your INFO field is used instead.

Further information on VCF format:

  • Official specification of VCF (Variant Call Format) version 4.1:

    http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

    In VCF files, missing values are represented by dots (.), and the content must be tab-separated. Make sure that no extra or hidden characters are introduced into the file. This can happen when the file is opened in programs such as Excel or Word, or sent by email as an uncompressed attachment.
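
    Before importing, you might scan the file for the problems described above. The Python sketch below is a heuristic check, not an Ion Reporter™ Software validator. The function name `check_vcf_text` is hypothetical, and flagging every non-ASCII character (such as the curly quotes or non-breaking spaces that word processors insert) is a deliberately strict assumption.

```python
from typing import List


def check_vcf_text(text: str) -> List[str]:
    """Return warnings about characters that commonly slip in via Excel, Word, or email."""
    warnings: List[str] = []
    if text.startswith("\ufeff"):
        warnings.append("file starts with a byte-order mark")
        text = text[1:]
    for number, line in enumerate(text.split("\n"), start=1):
        if line.endswith("\r"):
            warnings.append(f"line {number}: Windows (CRLF) line ending")
            line = line[:-1]
        if not line:
            continue
        # ## meta lines may contain spaces; the #CHROM line and data lines must use tabs.
        if not line.startswith("##") and "\t" not in line:
            warnings.append(f"line {number}: no tabs found; columns may be space-separated")
        # Anything other than a tab or printable ASCII is treated as suspicious.
        for ch in line:
            if ch != "\t" and not (" " <= ch <= "~"):
                warnings.append(f"line {number}: unexpected character U+{ord(ch):04X}")
                break
    return warnings


sample = "\ufeff##fileformat=VCFv4.1\r\nchr1    124535436   COSM00001\r\n"
for warning in check_vcf_text(sample):
    print(warning)
```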

  • Mandatory headers required when creating a VariantDB file: The following three header lines must be the first three lines of the VCF file. (The FORMAT and Sample columns are optional in VCF files.)

    						##fileformat=VCFv4.1
    						##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    						#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample
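
    A minimal Python sketch of checking the first three lines against the required headers above. It assumes that the #CHROM line must begin with the eight fixed columns and may optionally continue with FORMAT and a sample column. The function `header_problems` is hypothetical and not part of Ion Reporter™ Software.

```python
from typing import List

FILEFORMAT_LINE = "##fileformat=VCFv4.1"
FORMAT_GT_LINE = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def header_problems(lines: List[str]) -> List[str]:
    """Compare the first three lines of a VCF file with the three required header lines."""
    head = [line.rstrip("\r\n") for line in lines[:3]]
    if len(head) < 3:
        return ["file has fewer than three lines"]
    problems: List[str] = []
    if head[0] != FILEFORMAT_LINE:
        problems.append(f"line 1 should be {FILEFORMAT_LINE!r}")
    if head[1] != FORMAT_GT_LINE:
        problems.append(f"line 2 should be {FORMAT_GT_LINE!r}")
    columns = head[2].split("\t")
    if columns[:8] != FIXED_COLUMNS:
        problems.append("line 3 should start with the eight tab-separated columns #CHROM ... INFO")
    elif len(columns) > 8 and columns[8] != "FORMAT":
        # FORMAT and Sample are optional, but if extra columns exist, FORMAT comes first.
        problems.append("column 9 of line 3 should be FORMAT")
    return problems


header = [
    "##fileformat=VCFv4.1\n",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample\n",
]
print(header_problems(header))
```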
    					
  • Hit-level information in a VariantDB file: You can set the hit level of each VariantDB file individually by including a ##HITLEVEL line in the VCF header. The following hit-level values can be used:

    • ##HITLEVEL=overlap matches all annotations whose loci overlap the variant.

    • ##HITLEVEL=locus matches all annotations whose loci start at the variant locus.

    • ##HITLEVEL=allele matches all annotations that are 'locus' matches and also share at least one allele with the variant.

    • ##HITLEVEL=genotype matches all annotations that are 'allele' matches and whose genotypes also match.

    • ##HITLEVEL=auto matches at the most specific hit level possible, which can be any of the hit levels listed above.
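
    The hit levels above can be read as increasingly strict tests. The Python sketch below is one interpretation of those descriptions, not Ion Reporter™ Software's implementation. The `Site` type and all function names are hypothetical. "At least one allele in common" is assumed to mean a shared ALT allele, and genotypes are compared by the allele sequences their GT indices refer to, because the raw string 0/1 means different things in records with different ALT alleles.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Site:
    """A simplified VCF record (hypothetical helper type)."""
    chrom: str
    pos: int                    # 1-based VCF POS
    ref: str
    alts: Tuple[str, ...]
    gt: Optional[str] = None    # e.g. "0/1"; None when no GT is supplied

    @property
    def end(self) -> int:
        """Last reference position covered by REF."""
        return self.pos + len(self.ref) - 1


def genotype_alleles(site: Site) -> Optional[Tuple[str, ...]]:
    """Translate GT indices (0 = REF, 1 = first ALT, ...) into sorted allele sequences."""
    if site.gt is None:
        return None
    indices = site.gt.replace("|", "/").split("/")
    if "." in indices:
        return None
    alleles = (site.ref,) + site.alts
    return tuple(sorted(alleles[int(i)] for i in indices))


def overlap_match(ann: Site, var: Site) -> bool:
    return ann.chrom == var.chrom and ann.pos <= var.end and var.pos <= ann.end


def locus_match(ann: Site, var: Site) -> bool:
    return ann.chrom == var.chrom and ann.pos == var.pos


def allele_match(ann: Site, var: Site) -> bool:
    # Assumption: "an allele in common" means a shared ALT allele.
    return locus_match(ann, var) and bool(set(ann.alts) & set(var.alts))


def genotype_match(ann: Site, var: Site) -> bool:
    ann_gt, var_gt = genotype_alleles(ann), genotype_alleles(var)
    return allele_match(ann, var) and ann_gt is not None and ann_gt == var_gt


# Most specific level first, so "auto" returns the first level that matches.
LEVELS = [
    ("genotype", genotype_match),
    ("allele", allele_match),
    ("locus", locus_match),
    ("overlap", overlap_match),
]


def auto_level(ann: Site, var: Site) -> Optional[str]:
    for name, matches in LEVELS:
        if matches(ann, var):
            return name
    return None


variant = Site("chr1", 141128903, "TTG", ("CTT",), "0/1")
print(auto_level(Site("chr1", 141128903, "TTG", ("CTT",)), variant))
```

    With an annotation that has no GT, `auto_level` stops at "allele", which mirrors the "auto" behavior described below for the case where no GT value is supplied.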

  • Mandatory columns required in the VCF file when creating a VariantDB: According to the official VCF specification, the FORMAT and Sample columns are optional. However, to perform a "genotype" hit-level match in Ion Reporter™ Software, you must specify the genotype of each variant: add the GT (genotype) key to the FORMAT column and the genotype value to the Sample column.

    The following example shows a variant with GT in the FORMAT column and a genotype of 0/1 in the Sample column:

    chr1    141128903   COSM00006   TTG CTT .   .   AMPID=AMPL30014;TEMP_ID=6    GT    0/1

    If you need only an "overlap", "locus", or "allele" match, you do not need to specify a GT field. However, missing values must still be represented by dots in the appropriate columns. For example:

    chr1    141128903   COSM00006   TTG CTT .   .   AMPID=AMPL30014;TEMP_ID=6...

    If the "auto" hit level is chosen, Ion Reporter™ Software tries to find the most specific hit-level match possible. However, if no GT value is supplied, the most specific match possible is an allele match, because without a GT value a genotype-level match cannot be made.
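
    In a standard VCF record, the FORMAT column lists colon-separated keys (for example GT or GT:DP) and the Sample column holds the matching colon-separated values. The Python sketch below shows how a GT value can be located under that convention. The function `genotype_from_columns` is hypothetical, and the sketch does not describe how Ion Reporter™ Software itself reads the file. A result of None corresponds to the case where no GT value is supplied.

```python
from typing import List, Optional


def genotype_from_columns(fields: List[str]) -> Optional[str]:
    """Return the first sample's GT value from a tab-split VCF data line, or None.

    Index 8 is the FORMAT column and index 9 is the first sample column.
    """
    if len(fields) < 10 or fields[8] in ("", "."):
        return None
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "GT" not in keys:
        return None
    position = keys.index("GT")
    # Trailing sample values may be omitted in VCF, so guard the index.
    if position >= len(values) or values[position] in ("", "."):
        return None
    return values[position]


row = "chr1\t141128903\tCOSM00006\tTTG\tCTT\t.\t.\tAMPID=AMPL30014;TEMP_ID=6\tGT\t0/1"
print(genotype_from_columns(row.split("\t")))
```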

  • How to filter on a VariantDB:

    • Option 1:

      Ion Reporter™ Software automatically provides a filter on the first INFO key of the VariantDB VCF file if an INFO key is present and the ID field of the VCF file is missing (represented by a dot).

      Consider the example below:

      								##fileformat=VCFv4.1
      								##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
      								#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample
      								chr1    124535436   .    TG  AA  .   .   AMPID=AMPL495041;TEMP_ID=0...
      								chr1    128808434   .    T   A   .   .   AMPID=AMPL30014;TEMP_ID=2...

      If you use the VCF file above, which contains two variants, to create a VariantDB in Ion Reporter™ Software, you can filter on the AMPID field, because AMPID is the first key in the INFO field of the VCF file and the ID fields are missing (represented by dots).

    • Option 2:

      If the INFO field is not populated, filtering is automatically enabled on the ID column.

      Consider the example below, in which the INFO field is missing and represented with a dot:

      								##fileformat=VCFv4.1
      								##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
      								#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample
      								chr1    124535436   COSM00001   TG  AA  .   .   .... 
      								chr1    141128903   COSM00006   TTG CTT .   .   ....

      If you use the VCF file above, which contains two variants, to create a VariantDB in Ion Reporter™ Software, you can filter on the ID field, because the INFO field of the VCF file is not populated.
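
      The filter-selection rule in this section, together with the rule given earlier for the Analysis Results screen, can be sketched in Python as follows. This is an illustrative assumption, not Ion Reporter™ Software's code. The function `filter_field` is hypothetical, it evaluates a single record, and when both fields are populated it follows the stated rule that the ID field is used.

```python
from typing import Optional


def filter_field(id_value: str, info_value: str) -> Optional[str]:
    """Return the field a VariantDB filter would be based on for one record."""
    if id_value not in ("", "."):
        return "ID"  # a populated ID column takes precedence
    if info_value not in ("", "."):
        first_entry = info_value.split(";")[0]
        return first_entry.partition("=")[0]  # first INFO key, e.g. AMPID
    return None  # neither field is populated


print(filter_field(".", "AMPID=AMPL495041;TEMP_ID=0"))  # AMPID (Option 1 example)
print(filter_field("COSM00001", "."))                   # ID (Option 2 example)
```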

See also the "MyVariants" sections for information on marking and tracking variant annotations beyond the VariantDB annotation presets in analysis workflows.