Next-generation sequencing technologies have enabled the rapid and cost-effective translation of biological DNA or RNA sequences into short sequencing reads that can be used to analyze and understand the genome. In most library preparation protocols, genomic material must be amplified using PCR to ensure successful sequencing. Exponential amplification is sequence dependent, and differences in amplification rate may arise from variation in sequence composition (Aird et al., 2011). Duplicate sequencing reads resulting from PCR amplification may bias results or lead to incorrect conclusions about the actual frequency of the corresponding sequence.

Many approaches have been developed for the deduplication of next-generation reads from across the genome. Some tools, such as FastUniq (Xu et al., 2012) or Fulcrum (Burriesci et al., 2012), filter raw sequencing output in FASTQ format and remove any reads with the same sequence. Commonly used tools such as Picard MarkDuplicates (Li et al., 2009), samtools rmdup (Li, 2011) and SEAL (Pireddu et al., 2011) work on reads aligned to the genome and identify reads aligning to the same genomic position as duplicates. However, two DNA fragments from different cells may produce reads aligning to the same location. These reads will be incorrectly called as PCR duplicates, even though they originated from two different molecules. As sequencing depth increases, so does the chance of finding apparently duplicate reads that align to the same genomic location but come from different DNA fragments. In addition, non-uniform genomic fragment formation (e.g. short exons, restriction enzyme digestion) can increase the apparent rate of PCR duplicates if sequence identity and alignment coordinates are the only criteria for identifying PCR duplicates.

PCR deduplication may not affect experimental outcome when calling reference variants from genome-wide sequencing (Ebbert et al., 2016). However, library preparation strategies that rely on many PCR amplification cycles (such as single-cell experiments) are susceptible to errors introduced by amplification bias, and these errors should be corrected (Islam et al., 2014). In libraries with potentially low sequence complexity, PCR amplification bias may go undetected and lead to inaccurate interpretation of sequencing results. For example, accurate allele quantification may be distorted when identifying rare cancer mutations (Kinde et al., 2011; Kukita et al., 2015; Mansukhani et al., 2017). It may also be distorted when measuring the genomic outcomes of CRISPR genome editing experiments (Pinello et al., 2016) through amplicon sequencing of a specific target region.

To address the problem of duplicate reads from a biological perspective, library generation protocols have been developed that tag DNA fragments with a sequence of randomly selected or partially degenerate nucleotides, called a unique molecular identifier (UMI). After sequencing, reads arising from PCR duplicates will all have the same UMI, and all but one of these reads are marked as duplicates, as shown in Figure 1a. In downstream analysis the marked duplicate reads are normally ignored, but they can also be used in error correction or in estimating sequencing error rates.

Figure 1. (a) Schematic showing the utility of UMIs in identifying PCR duplicates. In libraries using UMIs, a short sequence of random nucleotides is added to each DNA fragment before PCR amplification. All PCR products of that fragment will contain the same UMI.
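The three duplicate criteria described above can be contrasted in a short Python sketch on toy amplicon data. These criteria are sequence identity, alignment coordinates, and alignment coordinates plus UMI. The `Read` record, the key functions and the reads are hypothetical illustrations. They are not the interface or exact logic of Picard, samtools, FastUniq or any UMI tool. Molecules A and B come from the same target, so they share both sequence and start position. Only the UMI tells them apart.

```python
from collections import namedtuple
from typing import Callable, Hashable, Iterable, List

# Hypothetical minimal read record for illustration only. A real pipeline
# would take these fields from FASTQ/BAM records instead.
Read = namedtuple("Read", ["name", "seq", "chrom", "pos", "strand", "umi"])


def mark_duplicates(reads: Iterable[Read],
                    key: Callable[[Read], Hashable]) -> List[str]:
    """Return the names of reads marked as duplicates.

    Reads with the same key form one group. The first read seen in each
    group is kept; every later read in that group is marked as a duplicate.
    """
    seen = set()
    duplicates = []
    for read in reads:
        k = key(read)
        if k in seen:
            duplicates.append(read.name)
        else:
            seen.add(k)
    return duplicates


def by_sequence(r: Read) -> Hashable:
    # Sequence identity only, as in FASTQ-level filtering.
    return r.seq


def by_position(r: Read) -> Hashable:
    # Alignment coordinates only: chromosome, start position, strand.
    return (r.chrom, r.pos, r.strand)


def by_position_and_umi(r: Read) -> Hashable:
    # Alignment coordinates plus the UMI attached before PCR.
    return (r.chrom, r.pos, r.strand, r.umi)


# Toy amplicon data from three original molecules (A, B and C).
reads = [
    Read("A_1", "TTGACCAGT", "chr1", 100, "+", "ACGT"),  # molecule A
    Read("A_2", "TTGACCAGT", "chr1", 100, "+", "ACGT"),  # PCR copy of A
    Read("B_1", "TTGACCAGT", "chr1", 100, "+", "GGCA"),  # molecule B, same locus
    Read("C_1", "CATTGGACA", "chr1", 250, "-", "ACGT"),  # molecule C, other locus
]

for label, key in [("sequence", by_sequence),
                   ("position", by_position),
                   ("position+UMI", by_position_and_umi)]:
    dups = mark_duplicates(reads, key)
    print(f"{label}: {len(reads) - len(dups)} unique, duplicates = {dups}")

# Expected output:
# sequence: 2 unique, duplicates = ['A_2', 'B_1']
# position: 2 unique, duplicates = ['A_2', 'B_1']
# position+UMI: 3 unique, duplicates = ['A_2']
```

Both sequence and position criteria wrongly mark molecule B as a copy of A. Adding the UMI recovers all three molecules. Molecule C reuses the UMI `ACGT` at another locus, which is why the sketch combines the UMI with the coordinates rather than using it alone. The sketch makes two simplifications. It keeps the first read in each group, whereas real tools typically choose a representative by a quality score. It also ignores sequencing errors within the UMI itself.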