Glossary

aligned reads

The number of bases covered by reads aligned to the reference sequence.

aligned read length

The aligned length metric of a read at a given accuracy threshold is defined as the greatest position in the read at which the accuracy in the bases up to and including the position meets the accuracy threshold.
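
The definition above can be sketched in a few lines of Python. This is a simplified illustration, not the software's implementation: it assumes each base of the read has already been marked as agreeing or disagreeing with the reference, and it ignores how insertions and deletions are counted. The function name and the sample read are hypothetical.

```python
def aligned_length_at_accuracy(matches, threshold):
    """Return the greatest 1-based position in the read at which the
    accuracy of all bases up to and including that position meets
    the threshold. Returns 0 if no position qualifies."""
    best = 0
    correct = 0
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            correct += 1
        # Accuracy over the prefix ending at this position.
        if correct / position >= threshold:
            best = position
    return best


# Hypothetical read: True = base agrees with the reference, False = error.
read = [True] * 9 + [False] + [True] * 5 + [False, False]

length_at_90 = aligned_length_at_accuracy(read, 0.90)
length_at_100 = aligned_length_at_accuracy(read, 1.00)
```

With this sample read, the length at 100% accuracy stops just before the first error, while the length at 90% accuracy extends past it because later correct bases bring the cumulative accuracy back above the threshold.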

allele view

Variants per allele view in Ion Reporter Software. See also locus view.

alpha-beta diversity

Alpha diversity results describe the diversity in a single sample at the Species, Genus, and Family levels. Beta diversity results describe the diversity between multiple samples at the Species, Genus, and Family levels. Used with Metagenomics 16S analysis workflows in Ion Reporter Software. See also metagenomics.

amplicon coverage

An amplicon is a piece of DNA or RNA that is the source or product of a natural or artificial amplification or replication event. Coverage refers to the number of times the amplicon is amplified or replicated.

Analyze role

Person in the Ion Reporter Software organization who can create analysis workflows and launch analyses. See also Import role and Report role.

annotation-only analysis workflow

This predefined analysis workflow adds annotations when a VCF file is uploaded to Ion Reporter Software; there is no further analysis of the data, and no variants are called in Ion Reporter Software with this analysis workflow.

annotation set preset

Set of annotation sources to apply to variants for selection in the Annotation step of creating an analysis workflow.

annotation source

Ion Reporter Software provides several annotation sources that are derived from public and private annotation databases for hg19. See also annotation set preset.

API token

Unique identifier of an API (application programming interface) requesting access to your service, similar to a username-password authentication. See also Ion Reporter Software web services API.

average base coverage depth

The average number of reads of all targeted reference bases.

average base read depth

The average number of reads of all targeted reference bases that were read at least once.
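
The difference between average base coverage depth and average base read depth is the denominator. A minimal sketch, assuming per-base read counts for the targeted reference bases are available as a list (the function names and sample depths are hypothetical):

```python
def average_base_coverage_depth(depths):
    """Average number of reads over ALL targeted reference bases."""
    if not depths:
        return 0.0
    return sum(depths) / len(depths)


def average_base_read_depth(depths):
    """Average number of reads over targeted bases read at least once."""
    covered = [d for d in depths if d > 0]
    if not covered:
        return 0.0
    return sum(covered) / len(covered)


# Hypothetical per-base read counts for five targeted bases.
depths = [10, 0, 20, 30, 0]

coverage_depth = average_base_coverage_depth(depths)  # 60 reads / 5 bases
read_depth = average_base_read_depth(depths)          # 60 reads / 3 bases
```

The two values are equal only when every targeted base is read at least once.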

BAM file

A BAM (binary alignment map) file (.bam) is the binary version of a SAM (sequence alignment map) file (.sam). A SAM file is a tab-delimited text file that contains sequence alignment data. A BAM file contains aligned reads sorted by reference location.

Bamstats

A software tool built on the Picard Java API that can calculate and graphically display various metrics derived from SAM or BAM files.

barcode

A barcode is a machine-readable code in the form of numbers and a pattern of parallel lines of varying widths, printed on and identifying a product.

There are several applications for barcodes. Libraries can be molecularly barcoded with unique nucleic acid sequence identifiers. Library barcodes are used during data analysis to sort the sequencing results from sequencing reactions that contain combined libraries. Chips and sample tubes also contain unique numeric barcodes that aid in the setup of the experimental analysis workflow.

barcode crosstalk

Reads from a particular barcode that show up in a neighboring barcode. This can be a source of contamination in fusions results.

basecalling input file

Signal processing input files are converted to a single condensed basecalling input file that represents the processed signal. Basecalling input files are required files for basecalling.

base substitution classes

Somatic mutations can be divided into six base substitution classes: C>A, C>G, C>T, T>A, T>C, and T>G.
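
There are twelve possible single-base substitutions. They reduce to the six classes listed above when each substitution is written relative to the pyrimidine (C or T) of the base pair. The sketch below assumes that common convention, which the glossary does not state explicitly. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Map a single-base substitution to one of the six classes
    C>A, C>G, C>T, T>A, T>C, T>G."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # A purine reference base (A or G) is reported on the opposite
    # strand, so that the reference base is always C or T.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


# G>T on one strand is the same event as C>A on the other strand.
example = substitution_class("G", "T")
```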

BED file

A Browser Extensible Data (BED) file defines chromosome positions or regions.

Boolean

A binary value, having two possible values called "true" and "false".

bp

Abbreviation for "base pair(s)".

cellularity (%)

The percentage of tumor cells in a given sample.

CDR

Complementarity-determining regions are components of the variable chains in antibodies and T-cell receptors that are generated by B-cells and T-cells.

cluster

A gene cluster is a group of two or more genes found within a sample's DNA that are similar in makeup.

CNV

Copy number variation (CNV) is the variation in copy number of any given gene between two samples. CNV is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals in the human population.

CNV baseline preset

Set of control samples that are used to create a baseline for detecting CNVs. The baselines are accessible in the Copy Number step when you create an analysis workflow.

codon

A sequence of three nucleotides that forms a unit of the genetic code in a DNA or RNA molecule.

control sequence

Control nucleic acid sequences can be added to DNA or RNA samples to facilitate post-sequencing data analysis. Two types of control sequences can be used during sample preparation. ERCC RNA Spike‑In Mix is used with RNA samples to achieve a standard measure for data comparison across gene expression experiments. Ion AmpliSeq Sample ID Panel, comprised of nine specially designed primers, can be added prior to template amplification to generate a unique ID for each sample during post-sequencing analysis.

copy number gain

Greater than expected copy number for a gene or chromosome in a karyotype. See also copy number loss.

copy number loss

Less than expected copy number for a gene or chromosome in a karyotype. See also copy number gain.

coverage

The average number of reads representing a given nucleotide in the reconstructed sequence. Enables you to estimate the percentage of the genome covered by reads. High coverage overcomes errors in base-calling and assembly. The typical desired coverage of a genome is 30x.

coverage histogram

A graphical representation of coverage in Ion Reporter Genomic Viewer (IRGV).

CSV file

A comma-separated values (CSV) file is a delimited text file in which each line represents a data record with information fields separated by a comma. A CSV file stores tabular data (numbers and text) in plain text.

CSV files are easily opened using spreadsheet software, such as Microsoft Excel or Apache® OpenOffice Calc, where each comma-separated field is listed in a separate column.

de novo assembly

Nucleic acid sequence data that is assembled from sequencing reads without the aid of a reference genome library sequence.

exon

DNA bases that are transcribed into mRNA and retained after splicing.

FASTA file

A FASTA file is a text-based format for representing either nucleotide sequences or peptide sequences, in which base pairs or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.

FASTQ file

A FASTQ file is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. The sequence letter and quality score are each encoded with a single ASCII character for brevity.
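
As an illustration of the single-character encoding, the sketch below parses one four-line FASTQ record and decodes its quality string. It assumes the widely used Phred+33 encoding (quality score = ASCII code minus 33), which this glossary does not specify. The record and the function name are hypothetical.

```python
def parse_fastq_record(lines, offset=33):
    """Parse one 4-line FASTQ record into (name, sequence, qualities).

    `offset` is the ASCII offset of the quality encoding; 33 is an
    assumption (Phred+33), not something stated in the glossary.
    """
    header, sequence, _separator, quality = [line.strip() for line in lines]
    if not header.startswith("@"):
        raise ValueError("FASTQ header line must start with '@'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    # One ASCII character per base: subtract the offset to get the score.
    scores = [ord(ch) - offset for ch in quality]
    return header[1:], sequence, scores


# Hypothetical record with four bases.
record = ["@read1", "ACGT", "+", "!5I?"]
name, sequence, scores = parse_fastq_record(record)
```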

FD (flow disruptiveness)

A data filtering parameter that is used instead of INDEL, SNP, and MNP.

filter chain preset

Set of filters to apply to variants in the Filter step of creating an analysis workflow.

final report template preset

Final report templates that are accessible for selection in the Final Report step of creating an analysis workflow.

flow order

The order in which a chip is exposed to each particular dNTP. The default Samba flow order consists of a 32-base sequence, repeated. This flow order resists phase errors by providing opportunities for out-of-phase molecules to catch up and is designed to sample all dimer (nucleotide pair) sequences efficiently. Samba is the default flow order because it improves sequencing accuracy for longer reads by resisting phase errors.

frameshift insertion or deletion

Insertion or deletion of a number of nucleotide bases that is not divisible by 3. This changes the reading frame (the grouping of codons) and results in a completely different protein translation from the original.
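
The divisible-by-3 rule can be expressed directly in code. This is a minimal sketch that looks only at the net length change between the reference and alternate alleles within a coding region. It does not consider transcript structure or annotation, and the function name is hypothetical.

```python
def classify_indel(ref_allele, alt_allele):
    """Classify an insertion or deletion by its net length change."""
    length_change = abs(len(alt_allele) - len(ref_allele))
    if length_change == 0:
        raise ValueError("alleles have equal length; not an insertion or deletion")
    # A change that is a multiple of 3 adds or removes whole codons
    # and leaves the reading frame intact.
    if length_change % 3 == 0:
        return "non-frameshift"
    return "frameshift"


two_base_insertion = classify_indel("A", "ATG")      # 2 bases inserted
three_base_deletion = classify_indel("ATGC", "A")    # 3 bases deleted
```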

functional score

A filter in Ion Reporter Software that provides functional scores based on SIFT, PolyPhen, and Grantham scores. See also SIFT score, PolyPhen score, and Grantham score.

fusions

A targeted sequencing technique used for detection and annotation of gene fusions (or translocation of genetic material) in samples.

genomic coordinates

Where variants are located on chromosomes or genes.

germline

Germ-cell lineage.

Grantham score

A measure of evolutionary distance. Used in Metagenomics 16S analyses in Ion Reporter Software.

GRCh38 human reference

Genome based on the latest Genome Reference Consortium (GRC) human reference assembly. See also reference library.

hotspots file

A BED or a VCF file that defines variants. Specifying a hotspots file to use in a run enables the Torrent Variant Caller (TVC) module to identify variants that may be present in sample DNA. A hotspots file instructs the Torrent Variant Caller (TVC) module to include these positions in its output files, including evidence for a variant and the filtering thresholds that disqualified a variant candidate. A hotspots file does not affect other parts of the analysis pipeline.

If you do not specify a hotspots file, the software reports only the differences between your sequence and the reference genome.

IGV

Acronym for the Integrative Genomics Viewer developed by the Broad Institute for visualizing analysis results. See also IRGV.

Import role

Person in the Ion Reporter Software organization who can import and define samples and launch analyses. See also Analyze role and Report role.

INDEL

INDEL is an abbreviation used to designate an insertion or deletion of bases in the genome of an organism.

intron

DNA bases found in between exons.

IRGV

Acronym for Ion Reporter Genomic Viewer, which is used to visualize analysis results.

ISPs

Ion Sphere Particles (ISPs) are particles that contain bound copies of a single (ideally) DNA fragment amplified during template preparation.

IUPAC

Acronym for International Union of Pure and Applied Chemistry. Ion Reporter Software uses IUPAC codes for amino acids.

JSON file

JavaScript Object Notation file. Used in Ion Reporter Software to import parameters from Torrent Suite Software.

Ion Mesh

A network of Ion Torrent Servers that allows users to perform the following actions:

  • View all runs of interest across multiple servers on the same data page.

  • Transfer Planned Runs between different connected servers.

  • Perform flexible workflows for Ion 550 Chips across different Ion Chef instruments connected to different servers. Torrent Suite Software can track reagent/cartridge usage across multiple servers that are a part of the same Ion Mesh.

key signal

Average 1-mer signal in the library key.

library ISPs

Live ISPs that have a key signal identical to the library key signal.

library key

A short known sequence of bases used to distinguish a library fragment from a test fragment (for example, "TCAG").

locus view

Locus-centric view of variants in Ion Reporter Software. See also allele view.

LOD

Acronym for limit of detection. LOD is the lowest quantity of a substance that can be determined.

LONGDEL

Long deletion.

MAF

Minor allele frequency (MAF) is an annotation source of population frequency information from the 1000 Genomes Project.

MAPD

The Median of the Absolute values of all Pairwise Differences (MAPD) score is reported on Aneuploidy run results and other runs that detect CNVs.

MAPD is one metric that is used to determine whether the panel data are useful for copy number variation (CNV) run results.

MAPD is defined as the Median of the Absolute values of all Pairwise Differences between log2 ratios of each tile for a run. Tiles roughly correspond to amplicons in an Ion AmpliSeq assay. Tiles corresponding to copy number amplicons and other amplicons are treated equally because no differences in variability are seen between these types. A pair is any two log2 ratios that are adjacent on the genome. Except at the beginning and the end of a chromosome, every log2 ratio belongs to two pairs.

Formally, if x_i is the log2 ratio at position i, with i ordered by genomic position:

MAPD = median( | x_{i-1} - x_i | )

MAPD is an estimate of copy number variability in each sequencing run that is similar to standard deviation (SD). If one assumes the log2 ratios are distributed normally with mean 0 and a constant SD, then MAPD/0.95 is approximately equal to SD. However, unlike SD, using MAPD is robust against high biological variability in log2 ratios induced by known conditions such as cancer.

Regardless of the source of the variability, increased variability decreases the quality of CNV calls.
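
The MAPD calculation described above can be sketched as follows. The sketch assumes the log2 ratios are already grouped by chromosome and ordered by genomic position, and that pairs do not span chromosome boundaries, which is how the description of chromosome ends reads. The function name and input values are hypothetical.

```python
import statistics


def mapd(log2_by_chromosome):
    """Median of the absolute differences between adjacent log2 ratios.

    `log2_by_chromosome` maps a chromosome name to its log2 ratios,
    ordered by genomic position.
    """
    differences = []
    for ratios in log2_by_chromosome.values():
        # Pair each log2 ratio with its neighbor on the same chromosome.
        differences.extend(abs(a - b) for a, b in zip(ratios, ratios[1:]))
    if not differences:
        raise ValueError("at least one adjacent pair of tiles is required")
    return statistics.median(differences)


# Hypothetical log2 ratios for a small run.
run = {"chr1": [0.0, 0.5, 0.25], "chr2": [1.0, 0.25]}

score = mapd(run)
# Under the normality assumption stated above, MAPD / 0.95
# approximates the standard deviation of the log2 ratios.
approximate_sd = score / 0.95
```

Because the median is taken over adjacent differences, a few large copy number changes shift only a small number of pairs and leave the score largely unaffected.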

Mbp

Million base pairs.

metagenomics

Population diversity in polymicrobial research samples.

missense SNV

A point mutation that changes the amino acid of the respective protein. SNV is an acronym for single nucleotide variation, which means at one base there is a difference.

MNP

Multiple nucleotide polymorphism (MNP) is a genetic mutation in an allele that differs from the reference allele of the same length by >1 nucleotide.

mosaicism

Decimal-level copy number gain or loss calls.

non-frameshift insertion or deletion

Insertion or deletion of a number of nucleotide bases that is divisible by 3. This adds or removes whole amino acids in the protein translation without changing the reading frame.

non-PAR

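Regions of the human X and Y chromosomes that lie outside the pseudoautosomal regions (PAR1 and PAR2). The pseudoautosomal regions pair and recombine during meiosis, so genes in those regions are not inherited in a strictly sex-linked fashion. Genes in non-PAR regions do not recombine between X and Y in this way and are inherited in a sex-linked fashion.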

no template control

Sample that has no cDNA or gDNA content.

nonsense SNV

A point mutation that changes one of the 20 amino acids into a stop codon, hence a shorter or unfinished protein product. SNV is an acronym for single nucleotide variation, which means at one base there is a difference.

OTU

Operational taxonomic unit (OTU) tables used by QIIME to generate alpha-beta diversity results in metagenomics analyses.

paired sample

Control or normal sample paired with a tumor sample.

partner gene

Used in fusions to describe the second gene involved in a translocation of genetic material. Donor gene is the first.

phyloP score

Measure of conservation of protein across a wide range of organisms in metagenomics analyses.

polyclonal ISP

An ISP that carries clones from two or more library sequences.

PolyPhen score

Prediction of the functional effect of a variant on a protein.

primer dimer ISP

An ISP that carries an insert length of less than 8 base pairs.

proband

A person or a sample that is serving as a starting point for the genetic study. Denoting the proband aids in establishing relationships within a group. In medical genetics, the proband is the first affected family member who seeks medical attention for a genetic disorder.

p-value

Probability value. A statistical measure used in the detection of variant calls from next-generation sequencers.

In Ion Reporter Software, p-values in Analysis Results in the column named p-value are rounded to 5 decimal places (between 0.00001-0.99999) when shown in the software screen. Very small p-values that are less than 0.00001 are rounded to 0.00001 by default when shown in the software screen. Very large p-values that are greater than 0.99999 are rounded to 0.99999 by default when shown in the software screen.
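
The display behavior described above amounts to limiting the value to the range 0.00001 to 0.99999 and then showing 5 decimal places. The sketch below illustrates that rule only. It is not the software's code, and the function name is hypothetical.

```python
def displayed_p_value(p):
    """Format a p-value the way the screen display rule describes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Very small and very large values are pinned to the display limits.
    limited = min(max(p, 0.00001), 0.99999)
    return f"{limited:.5f}"


tiny = displayed_p_value(1e-9)        # shown as the lower limit
large = displayed_p_value(0.999999)   # shown as the upper limit
typical = displayed_p_value(0.25)
```

A displayed value of 0.00001 therefore means "0.00001 or smaller", not necessarily exactly 0.00001.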

Q score

Phred quality score (Q score) is used to measure the accuracy of the nucleotide sequence generated by the sequencing instrument. The Q score is logarithmically related to the probability that a given base is called incorrectly by the sequencer: a higher Q score means a lower error probability.
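
The standard Phred relationship is Q = -10 × log10(P), where P is the probability that the base call is wrong. The sketch below converts in both directions. The function names are hypothetical.

```python
import math


def error_probability(q_score):
    """Probability of an incorrect base call for a given Phred Q score."""
    return 10 ** (-q_score / 10)


def q_score(probability):
    """Phred Q score for a given probability of an incorrect base call."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be greater than 0 and at most 1")
    return -10 * math.log10(probability)


p_q20 = error_probability(20)   # about 1 error in 100 bases
p_q30 = error_probability(30)   # about 1 error in 1,000 bases
```

Each increase of 10 in the Q score corresponds to a tenfold decrease in the error probability.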

read mapping

Alignment of sequencing reads to a reference genome.

reference library

A consensus nucleotide sequence that represents the genome of a particular species. The results from a sequencing run are compared to the reference library to identify sequence variants.

relationship group

Defines related samples within a sample set. Related samples are designated by the same relationship group number.

Report role

Person in the Ion Reporter Software organization who can generate reports. See also Analyze role and Import role.

sample

Genetic material from one source (for example, DNA from one individual).

sample pair

Can be a sample from normal tissue and tumor tissue, control sample and test sample.

SIFT score

SIFT stands for Sorting Intolerant from Tolerant and is an algorithm for predicting whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids.

smoothing

Ion Reporter Software includes a smoothing algorithm to smooth discrete data points in aneuploidy detection visualization.

SNP

Single nucleotide polymorphism (SNP) is a genetic mutation in an allele that differs from the reference allele of the same length by one nucleotide.

somatic

Cells from the body of an organism.

splice site

A genetic mutation that inserts, deletes, or changes a number of nucleotides at the boundary between an exon and an intron, where splicing takes place.

structural variants

Genetic mutations that cause a change in the organism's chromosome structure, such as insertions, deletions, copy number variations, duplications, inversions, and translocations.

target regions file

A BED file that specifies regions that a panel represents such as the amplified regions that are used with target sequencing. The complete software analysis pipeline, including plugins, is restricted to include only these specified regions instead of the entire reference library.

test fragment ISPs

Live ISPs with a key signal that is identical to the test fragment key signal.

transcripts

Gene transcripts as determined by public annotation sources.

trio

Father, mother and child (proband) samples.

TSV file

A tab-separated values (TSV) file is a tab-delimited file that is used with spreadsheet software. TSV files are essentially text files, and the raw data can be viewed by text editors, though they are often used when moving raw data between spreadsheets. See also VCF file.

tumor mutational burden

A calculation of nonsynonymous variants (missense and nonsense single nucleotide variants (SNVs)) plus insertion and deletion variants (INDELs) detected per megabase (Mb) of exonic sequence.
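
At its core the calculation is a ratio of counted variants to megabases of exonic sequence. The sketch below shows only that ratio. It assumes the qualifying variants have already been counted and does not reproduce any filtering that the software applies. The function name and numbers are hypothetical.

```python
def tumor_mutational_burden(variant_count, exonic_bases):
    """Variants per megabase (Mb) of exonic sequence.

    `variant_count` is the number of nonsynonymous SNVs plus INDELs;
    `exonic_bases` is the number of exonic bases assessed.
    """
    if exonic_bases <= 0:
        raise ValueError("exonic_bases must be positive")
    return variant_count * 1_000_000 / exonic_bases


# Hypothetical example: 12 qualifying variants over 1.2 Mb of exonic sequence.
burden = tumor_mutational_burden(12, 1_200_000)
```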

tumor-normal pair

Samples from tumor and normal healthy tissue.

unaligned reads

Nucleotide bases covered by reads that are not aligned to the reference.

VCF file

A variant call format (VCF) file specifies a variant of interest and its location. This file stores the differences between the BAM file and the reference file.

VCIB

Variability Correction Informatics Baseline is a CNV baseline available in Ion Reporter Software. Users can start with this CNV baseline and add their samples to it when building CNV baseline analysis workflows.

Ion Reporter Software web services API

The Ion Reporter Software web services API (application programming interface) can be used to automate returns and retrieve key information from the system. Ion Reporter Software APIs are compliant with REST (Representational State Transfer) architectural constraints.