UMD Overlapper
Latest revision as of 22:33, 7 July 2009
Overview
The UMD overlapper is designed to reduce the number of overlaps produced by the assembler by reducing the number of repeat-induced overlaps. Most assemblers use exact k-mer matches to identify reads that potentially overlap. The UMD overlapper makes this initial phase far cheaper by using minimizers, a technique that reduces the number of k-mers considered by an order of magnitude.
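The idea behind minimizers can be illustrated with a short sketch. It assumes the simplest formulation: from every window of `w` consecutive k-mers, keep only the lexicographically smallest one. The function name `minimizers`, the parameters `k` and `w`, and the plain lexicographic ordering are illustrative assumptions, not the UMD overlapper's actual code or ordering.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Illustrative sketch only: plain lexicographic ordering is assumed,
    and ties are broken by taking the leftmost k-mer in the window.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # Slide a window over w consecutive k-mers and keep the smallest one.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return sorted(selected)


# Two hypothetical reads that share the 8-base substring "ACGGATCA".
read_a = "TTTTACGGATCA"
read_b = "ACGGATCAGGGG"
shared = ({m for _, m in minimizers(read_a, 4, 5)}
          & {m for _, m in minimizers(read_b, 4, 5)})
```

Under this formulation, any two reads that share an exact substring of length `w + k - 1` select the same minimizer from it, so comparing only the minimizers still finds the candidate pair while storing far fewer k-mers per read.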
In conjunction with the UMD overlapper, the UMD error corrector identifies and corrects potential sequencing errors by detecting bases in a multiple alignment of reads that are supported by only one of the reads. The algorithm uses a heuristic rule, called the 4-3 rule, that examines overlapping sets of 4 reads at 3 positions in order to identify differences corresponding to distinct copies of a repeat.
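The first part of that description, flagging bases supported by only one read, can be sketched as follows. This is a simplified illustration and does not implement the 4-3 rule, whose details are not given here. It assumes the reads are already aligned as equal-length strings with `-` marking uncovered positions, and it replaces a singleton base only when at least two reads agree on another base. The function name `correct_singletons` and this input format are hypothetical.

```python
from collections import Counter


def correct_singletons(aligned_reads, gap="-"):
    """Replace bases seen in only one read with the column consensus.

    Illustrative sketch only: a difference supported by two or more
    reads is left untouched, since it may mark a distinct repeat copy.
    """
    corrected = [list(read) for read in aligned_reads]
    for col in range(len(aligned_reads[0])):
        counts = Counter(r[col] for r in aligned_reads if r[col] != gap)
        if len(counts) < 2:
            continue  # every covering read agrees; nothing to correct
        consensus, support = counts.most_common(1)[0]
        if support < 2:
            continue  # no base has enough support to act as consensus
        for row, read in enumerate(aligned_reads):
            base = read[col]
            if base != gap and base != consensus and counts[base] == 1:
                corrected[row][col] = consensus
    return ["".join(read) for read in corrected]


# The lone C in the third read is treated as a likely sequencing error.
fixed = correct_singletons(["ACGTAC", "ACGTAC", "ACCTAC", "--GTAC"])
# A difference backed by two reads each is kept as a possible repeat variant.
kept = correct_singletons(["ACGT", "ACGT", "ACTT", "ACTT"])
```

Leaving multiply supported differences alone reflects the distinction drawn in the text between sequencing errors and true differences between repeat copies.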
Related publications
- "A preprocessor for shotgun assembly of large genomes." Roberts M, Hunt BR, Yorke JA, Bolanos R, Delcher A. Journal of Computational Biology, 2004, 11(4):734-752.
- "Reducing storage requirements for biological sequence comparison." Roberts M, Hayes W, Hunt BR, Mount SM, Yorke JA. Bioinformatics, 2004, 20(18):3363-3369.
More information
See: http://www.genome.umd.edu/