The Karolinska Institutet (KI) overlapper is designed to handle the problems created by sequencing errors. Instead of requiring exact k-mer matches, the approach used by most existing assemblers, the KI overlapper uses a q-gram-based method to identify "near hits": k-mers that differ at a small number of positions. This allows it to find overlaps that overlappers relying on exact matches would miss.
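To make the idea of a "near hit" concrete, here is a minimal sketch of a generic q-gram filter followed by a mismatch check. It is not the KI overlapper's actual algorithm, which the text above does not specify. The function names, the choice of q, and the mismatch limit are all assumptions made for illustration. The filter relies on the standard q-gram lemma: two strings of length k that differ in at most e positions share at least k - q + 1 - q*e q-grams.

```python
from collections import Counter


def qgrams(s, q):
    """Return the multiset of all overlapping substrings of length q in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_threshold(k, q, e):
    """Minimum number of shared q-grams for two length-k strings with <= e mismatches."""
    return max(0, k - q + 1 - q * e)


def shared_qgram_count(a, b, q):
    """Count q-grams common to a and b, respecting multiplicity."""
    return sum((qgrams(a, q) & qgrams(b, q)).values())


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_near_hit(a, b, q=3, max_mismatches=1):
    """Hypothetical near-hit test: cheap q-gram filter, then exact verification."""
    if len(a) != len(b):
        return False
    # Filter step: pairs sharing too few q-grams cannot be within the limit.
    if shared_qgram_count(a, b, q) < qgram_threshold(len(a), q, max_mismatches):
        return False
    # Verification step: the filter can let false positives through.
    return hamming(a, b) <= max_mismatches


if __name__ == "__main__":
    kmer_a = "ACGTACGTAC"
    kmer_b = "ACGTTCGTAC"  # differs from kmer_a at one position
    print(kmer_a == kmer_b)             # an exact-match test rejects the pair
    print(is_near_hit(kmer_a, kmer_b))  # the near-hit test accepts it
```

In this sketch the q-gram count only serves as a fast rejection filter; the Hamming-distance check makes the final decision, so a pair with a single base-call error is still reported even though an exact k-mer comparison would discard it.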
Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly because assembly programs fail to distinguish single-base differences between repeat copies from erroneous base calls. The Karolinska Institutet error corrector implements a new strategy for correcting errors in shotgun sequence data using defined nucleotide positions (DNPs). The method distinguishes single-base differences from sequencing errors by analyzing multiple alignments, each consisting of a read and all of its overlaps with other reads. The multiple alignments are constructed using a novel pattern-matching algorithm.
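The distinction between a real single-base difference and a sequencing error can be illustrated with a much simplified sketch. It is not the published DNP method, whose statistical criteria are described in the papers listed below. It assumes a pre-built gapped alignment whose first row is the read being corrected, and it uses a hypothetical `min_support` threshold: a non-consensus base seen in at least that many reads is treated as a candidate real difference, while one seen in fewer reads is treated as an error.

```python
from collections import Counter


def classify_columns(alignment, min_support=2):
    """Correct row 0 of an alignment and report candidate difference columns.

    alignment   -- list of equal-length strings; '-' marks a gap (assumed format)
    min_support -- hypothetical threshold: reads needed to trust a minority base
    """
    read = alignment[0]
    corrected = list(read)
    candidate_columns = []

    for col in range(len(read)):
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        if not counts:
            continue  # column holds only gaps
        consensus = counts.most_common(1)[0][0]

        # A minority base backed by several reads looks like a repeat difference.
        minority = [n for base, n in counts.items() if base != consensus]
        if any(n >= min_support for n in minority):
            candidate_columns.append(col)

        # A poorly supported deviation in the target read looks like an error.
        base = read[col]
        if base != "-" and base != consensus and counts[base] < min_support:
            corrected[col] = consensus

    return "".join(corrected), candidate_columns


if __name__ == "__main__":
    alignment = [
        "ACTTATGA",  # read being corrected
        "ACGTATGA",
        "ACGTACGA",
        "ACGTACGA",
        "ACGTACGA",
        "ACGTACGA",
    ]
    corrected, candidates = classify_columns(alignment)
    print(corrected, candidates)
```

In the example alignment, the T in column 2 of the first read is unsupported by any other read and is replaced by the consensus G, whereas the T in column 5 is shared with a second read, so it is kept and the column is reported as a candidate difference between repeat copies.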
- "Correcting errors in shotgun sequences." Tammi MT, Arner E, Kindlund E, Andersson B, Nucleic Acids Research, 2003. 31(15):4663-72.
- "TRAP: Tandem Repeat Assembly Program produces improved shotgun assemblies of repetitive sequences." Tammi MT, Arner E, Andersson B, Computer Methods and Programs in Biomedicine, 2003. 70(1):47-59.
- "Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs." Tammi MT, Arner E, Britton T, Andersson B, Bioinformatics, 2002. 18(3):379-88.