Glossary
aligned reads
The number of bases covered by reads aligned to the reference sequence.
aligned read length
allele view
alpha-beta diversity
amplicon coverage
Analyze role
annotation-only analysis workflow
annotation set preset
annotation source
API token
average base coverage depth
The average number of reads of all targeted reference bases.
average base read depth
The average number of reads of all targeted reference bases that were read at least once.
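The difference between average base coverage depth and average base read depth can be illustrated with a minimal sketch. The function names and the per-base read counts are hypothetical and are not part of the software.

```python
def average_base_coverage_depth(depths):
    """Mean depth over all targeted bases, including bases with zero reads."""
    return sum(depths) / len(depths)


def average_base_read_depth(depths):
    """Mean depth over only the targeted bases that were read at least once."""
    covered = [d for d in depths if d > 0]
    return sum(covered) / len(covered) if covered else 0.0


# Hypothetical read counts for five targeted reference bases.
depths = [0, 10, 20, 0, 30]
print(average_base_coverage_depth(depths))  # 12.0
print(average_base_read_depth(depths))      # 20.0
```

The two values are equal only when every targeted base is read at least once.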
BAM file
A BAM (binary alignment map) file (.bam) is the binary version of a SAM (sequence alignment map) file (.sam). A SAM file is a tab-delimited text file that contains sequence alignment data. A BAM file contains aligned reads sorted by reference location.
Bamstats
barcode
A barcode is a machine-readable code in the form of numbers and a pattern of parallel lines of varying widths, printed on and identifying a product.
There are several applications for barcodes. Libraries can be molecularly barcoded with unique nucleic acid sequence identifiers. Library barcodes are used during data analysis to sort the sequencing results from sequencing reactions that contain combined libraries. Chips and sample tubes also contain unique numeric barcodes that aid in the setup of the experimental analysis workflow.
barcode crosstalk
basecalling input file
Signal processing input files are converted to a single condensed basecalling input file that represents the processed signal. Basecalling input files are required files for basecalling.
base substitution classes
Somatic mutations can be divided into six base substitution classes: C>A, C>G, C>T, T>A, T>C, and T>G.
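As an illustration, any single-base substitution can be assigned to one of the six classes by reporting it from the strand on which the reference base is C or T. This is a minimal sketch that assumes this common convention; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return one of C>A, C>G, C>T, T>A, T>C, T>G for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError("ref and alt must be different single bases from ACGT")
    # Assumed convention: if the reference base is a purine (A or G),
    # describe the change on the opposite strand instead.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


print(substitution_class("G", "T"))  # C>A
print(substitution_class("A", "G"))  # T>C
print(substitution_class("C", "T"))  # C>T
```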
BED file
A Browser Extensible Data (BED) file defines chromosome positions or regions.
Boolean
bp
Abbreviation for "base pair(s)".
cellularity (%)
The percentage of tumor cells in a given sample.
CDR
cluster
CNV
Copy number variation (CNV) is the variation in copy number of any given gene between two samples. CNV is a phenomenon in which sections of the genome are repeated and the number of repeats in the genome varies between individuals in the human population.
CNV baseline preset
Set of control samples that are used to create a baseline for detecting CNVs. The baselines are accessible in the Copy Number step when you create an analysis workflow.
codon
control sequence
Control nucleic acid sequences can be added to DNA or RNA samples to facilitate post-sequencing data analysis. Two types of control sequences can be used during sample preparation. ERCC RNA Spike‑In Mix is used with RNA samples to achieve a standard measure for data comparison across gene expression experiments. Ion AmpliSeq™ Sample ID Panel, which is composed of nine specially designed primers, can be added before template amplification to generate a unique ID for each sample during post-sequencing analysis.
copy number gain
copy number loss
coverage
coverage histogram
CSV file
A comma-separated values (CSV) file is a delimited text file that stores tabular data (numbers and text) in plain text. Each line of the file is a data record, with information fields separated by commas.
CSV files are easily opened using spreadsheet software, such as Microsoft™ Excel™ or Apache® OpenOffice™ Calc, where each comma-separated field is listed in a separate column.
de novo assembly
Nucleic acid sequence data that is assembled from sequencing reads without the aid of a reference genome library sequence.
exon
FASTA file
A FASTA file is a text-based format for representing either nucleotide sequences or peptide sequences, in which base pairs or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
FASTQ file
A FASTQ file is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity.
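The single-character encoding can be illustrated with a minimal sketch that decodes one record. It assumes the common four-line record layout and the Phred+33 ASCII offset, neither of which is stated in this glossary; the function name and the example record are hypothetical.

```python
def parse_fastq_record(lines, offset=33):
    """Split one four-line FASTQ record into name, sequence, and integer scores."""
    header, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a four-line FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    # Each quality character encodes one score as (ASCII code - offset).
    scores = [ord(ch) - offset for ch in quality]
    return header[1:], sequence, scores


record = ["@read1\n", "TCAG\n", "+\n", "II5!\n"]
print(parse_fastq_record(record))  # ('read1', 'TCAG', [40, 40, 20, 0])
```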
FD (flow disruptiveness)
A data filtering parameter that is used instead of INDEL, SNP, and MNP.
filter chain preset
Set of filters to apply to variants in the Filter step of creating an analysis workflow.
final report template preset
flow order
The order in which a chip is exposed to each particular dNTP. The default flow order, Samba, consists of a 32-base sequence, repeated. Samba resists phase errors by providing opportunities for out-of-phase molecules to catch up, which improves sequencing accuracy for longer reads, and is designed to sample all dimer (nucleotide pair) sequences efficiently.
frameshift insertion or deletion
functional score
fusions
A targeted sequencing technique used for detection and annotation of gene fusions (or translocation of genetic material) in samples.
genomic coordinates
germline
Grantham score
GRCh38 human reference
hotspots file
A BED or a VCF file that defines variants. Specifying a hotspots file to use in a run enables the Torrent Variant Caller (TVC) module to identify variants that may be present in sample DNA. A hotspots file instructs the Torrent Variant Caller (TVC) module to include these positions in its output files, including evidence for a variant and the filtering thresholds that disqualified a variant candidate. A hotspots file does not affect other parts of the analysis pipeline.
If you do not specify a hotspots file, the software reports only the differences between your sequence and the reference genome.
IGV
Import role
INDEL
INDEL is an abbreviation used to designate an insertion or deletion of bases in the genome of an organism.
intron
IRGV
ISPs
Ion Sphere™ Particles (ISPs) are particles that contain bound copies of a single (ideally) DNA fragment amplified during template preparation.
IUPAC
JSON file
Ion Mesh
A network of Ion Torrent™ Servers that allows users to perform the following actions:
- View all runs of interest across multiple servers on the same data page.
- Transfer Planned Runs between different connected servers.
- Perform flexible workflows for Ion 550™ Chips across different Ion Chef™ instruments connected to different servers. Torrent Suite™ Software can track reagent/cartridge usage across multiple servers that are a part of the same Ion Mesh.
key signal
Average 1-mer signal in the library key.
library ISPs
Live ISPs that have a key signal identical to the library key signal.
library key
A short known sequence of bases used to distinguish a library fragment from a test fragment (for example, "TCAG").
Locus view
LOD
LONGDEL
MAF
MAPD
The Median of the Absolute values of all Pairwise Differences (MAPD) score is reported on Aneuploidy run results and other runs that detect CNVs.
MAPD is one metric that is used to determine whether the panel data are useful for copy number variation (CNV) run results.
MAPD is defined as the median of the absolute values of all pairwise differences between the log2 ratios of the tiles in a run. Tiles roughly correspond to amplicons in an Ion AmpliSeq™ assay. Tiles that correspond to copy number amplicons and tiles that correspond to other amplicons are treated equally because no differences in variability are seen between these types. A pair consists of any two log2 ratios that are adjacent on the genome. Except at the beginning and the end of a chromosome, every log2 ratio belongs to two pairs.
Formally, if x_i is the log2 ratio at position i, with i ordered by genomic position:
MAPD = median( |x_{i-1} - x_i| )
MAPD is an estimate of copy number variability in each sequencing run that is similar to standard deviation (SD). If one assumes that the log2 ratios are distributed normally with mean 0 and a constant SD, then MAPD/0.95 is approximately equal to SD. However, unlike SD, MAPD is robust against high biological variability in log2 ratios induced by known conditions such as cancer.
Regardless of the source of the variability, increased variability decreases the quality of CNV calls.
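The MAPD calculation described above can be sketched as follows. This is a minimal illustration, not the software's implementation: the function name and the input layout (a dictionary that maps each chromosome to its log2 ratios, already ordered by genomic position) are assumptions.

```python
from statistics import median


def mapd(ratios_by_chromosome):
    """Median of absolute differences between adjacent log2 ratios."""
    diffs = []
    for ratios in ratios_by_chromosome.values():
        # Pair each log2 ratio with its neighbor on the same chromosome,
        # so that no pair spans two chromosomes.
        diffs.extend(abs(a - b) for a, b in zip(ratios, ratios[1:]))
    if not diffs:
        raise ValueError("at least one adjacent pair of log2 ratios is required")
    return median(diffs)


# Hypothetical log2 ratios for tiles on two chromosomes.
example = {"chr1": [0.0, 0.5, 0.25], "chr2": [1.0, 1.0]}
value = mapd(example)
print(value)         # 0.25
print(value / 0.95)  # approximate SD under the normality assumption
```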
Mbp
metagenomics
missense SNV
MNP
Multiple nucleotide polymorphism (MNP) is a genetic mutation in an allele that differs from the reference allele of the same length by more than one nucleotide.
mosaicism
non-frameshift insertion or deletion
non-PAR
no template control
nonsense SNV
OTU
paired sample
partner gene
phyloP score
polyclonal ISP
An ISP that carries clones from two or more library sequences.
PolyPhen score
primer dimer ISP
An ISP that carries an insert length of less than 8 base pairs.
proband
A person or a sample that is serving as a starting point for the genetic study. Denoting the proband aids in establishing relationships within a group. In medical genetics, the proband is the first affected family member who seeks medical attention for a genetic disorder.
p-value
Probability value. A statistical method for the detection of variant calls from next-generation sequencers.
In Ion Reporter™ Software, p-values in the p-value column of Analysis Results are rounded to 5 decimal places and shown in the range 0.00001 to 0.99999. By default, p-values that are less than 0.00001 are shown as 0.00001, and p-values that are greater than 0.99999 are shown as 0.99999.
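The display behavior described above can be sketched as a clamp followed by rounding. This is a minimal illustration of the described rule, not the software's code; the function name is hypothetical.

```python
def display_p_value(p):
    """Format a p-value with 5 decimal places, limited to 0.00001-0.99999."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must be between 0 and 1")
    # Clamp first so that very small or very large values never
    # round to 0.00000 or 1.00000.
    clamped = min(max(p, 0.00001), 0.99999)
    return f"{clamped:.5f}"


print(display_p_value(1e-9))      # 0.00001
print(display_p_value(0.123456))  # 0.12346
print(display_p_value(1.0))       # 0.99999
```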
Q score
Phred quality score (Q score) is used to measure the accuracy of the nucleotide sequence generated by the sequencing instrument. The Q score represents the probability that a given base is called incorrectly by the sequencer.
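The relationship between a Q score and the error probability follows the standard Phred definition, Q = -10 × log10(P), which is not spelled out in this glossary. A minimal sketch, with hypothetical function names:

```python
import math


def error_probability(q):
    """Probability that a base with Phred score q was called incorrectly."""
    return 10 ** (-q / 10)


def q_score(p):
    """Phred score for an error probability p, where 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be greater than 0 and at most 1")
    return -10 * math.log10(p)


print(error_probability(20))  # about 0.01, or 1 incorrect call in 100
print(q_score(0.001))         # about 30
```

Each increase of 10 in the Q score corresponds to a tenfold decrease in the error probability.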
read mapping
reference library
A consensus nucleotide sequence that represents the genome of a particular species. The results from a sequencing run are compared to the reference library to identify sequence variants.
relationship group
Defines related samples within a sample set. Related samples are designated by the same relationship group number.
Report role
sample
Genetic material from one source (for example, DNA from one individual).
sample pair
SIFT score
smoothing
SNP
Single nucleotide polymorphism (SNP) is a genetic mutation in an allele that differs from the reference allele of the same length by one nucleotide.
somatic
splice site
structural variants
Genetic mutations that cause a change in the organism's chromosome structure, such as insertions, deletions, copy number variations, duplications, inversions, and translocations.
target regions file
A BED file that specifies regions that a panel represents such as the amplified regions that are used with target sequencing. The complete software analysis pipeline, including plugins, is restricted to include only these specified regions instead of the entire reference library.
test fragment ISPs
Live ISPs with a key signal that is identical to the test fragment key signal.
transcripts
trio
TSV file
A tab-separated values (TSV) file is a tab-delimited file that is used with spreadsheet software. TSV files are essentially text files, and the raw data can be viewed by text editors, though they are often used when moving raw data between spreadsheets. See also VCF file.
tumor mutational burden
tumor-normal pair
unaligned reads
Nucleotide bases covered by reads that are not aligned to the reference.
VCF file
A variant call format (VCF) file specifies a variant of interest and its location. This file stores the differences between the BAM file and the reference file.
