Barcode CrossTalk QC

This module analyzes the reads from all barcodes on a chip from a single run and determines whether any reads assigned to a particular barcode could actually belong to one of the other barcodes.

Suppose, for example, that the barcode manufacturer provides a QC threshold of 0.5%. This means that up to 0.5% of the reads assigned to any one barcode (say, barcode 7) could be mislabeled and in reality come from a different barcode (say, barcode 8). For many applications this is not important, but it matters a great deal for applications such as fusions, which require high sensitivity.

For example, suppose barcode 8 is positive for a fusion involving the ROS1 gene, with read_count = 100000.

Suppose barcode 7 also has ROS1 reads, with read_count = 30 (30 is less than 0.5% of 100000, which is 500).

The current fusions algorithm will call both barcode 7 and barcode 8 as positive, because the read count in both samples is >20 (our default threshold).

In this case barcode 7 is a false positive and barcode 8 is a true positive.
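
The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the comparison described above, not the module's actual code; the function names crosstalk_ceiling and is_possible_crosstalk are hypothetical.

```python
# Illustrative sketch only; function names are hypothetical, not from the module.

def crosstalk_ceiling(source_read_count, max_crosstalk_fraction=0.005):
    """Largest read count that crosstalk from a source barcode could explain."""
    return source_read_count * max_crosstalk_fraction


def is_possible_crosstalk(read_count, source_read_count, max_crosstalk_fraction=0.005):
    """True if read_count is below the crosstalk ceiling of the source barcode."""
    return read_count < crosstalk_ceiling(source_read_count, max_crosstalk_fraction)


# Numbers from the ROS1 example: barcode 8 has 100000 reads, barcode 7 has 30.
ceiling = crosstalk_ceiling(100000)             # 0.5% of 100000 is 500
suspect = is_possible_crosstalk(30, 100000)     # 30 < 500, so True
```

With these numbers, the 30 ROS1 reads in barcode 7 fall well under the ceiling of 500, so they are consistent with crosstalk from barcode 8 even though they exceed the default calling threshold of 20.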

Barcode crosstalk is not the only source of contamination; reads of this type can also result from sample-level contamination.

To identify these reads, this module needs to look at the reads from all barcodes. The maximum estimated crosstalk percentage is 0.5% by default, but a parameter exposed in the user interface lets users change it.
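
A chip-wide scan of this kind might look like the following sketch. It assumes the per-barcode read counts are already available as a nested dictionary keyed by barcode and then by target; that layout, the function name flag_crosstalk, and the parameter names are assumptions made for illustration and do not describe the module's real interface.

```python
# Illustrative sketch only; the data layout and all names are assumptions.

def flag_crosstalk(read_counts, max_crosstalk_fraction=0.005, call_threshold=20):
    """Flag targets whose reads could be explained by crosstalk.

    read_counts maps barcode name -> {target name -> read count}.
    A target is flagged in a barcode when its count would pass the calling
    threshold but is below max_crosstalk_fraction of the highest count for
    the same target in any other barcode on the chip.
    """
    flags = {}
    for barcode, targets in read_counts.items():
        flags[barcode] = {}
        for target, count in targets.items():
            # Highest count for this target among all the other barcodes.
            highest_other = max(
                (other.get(target, 0)
                 for name, other in read_counts.items()
                 if name != barcode),
                default=0,
            )
            would_be_called = count > call_threshold
            below_ceiling = count < highest_other * max_crosstalk_fraction
            flags[barcode][target] = would_be_called and below_ceiling
    return flags


chip = {
    "barcode7": {"ROS1": 30},
    "barcode8": {"ROS1": 100000},
}
flags = flag_crosstalk(chip)
# flags["barcode7"]["ROS1"] is True, flags["barcode8"]["ROS1"] is False
```

Passing a different max_crosstalk_fraction mirrors the user-adjustable parameter: with a value of 0.0001, for instance, the ceiling for barcode 7 drops to 10 reads and its 30 ROS1 reads are no longer flagged.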

This module generates a qcInfo file per barcode and a summary file for the entire chip. These files are generated before the fusions calling module is launched on any of the barcodes, and they are passed to the fusions module.