AMOS Utilities

From AMOS WIKI
Revision as of 17:51, 8 July 2009 by Mcschatz (Talk | contribs)

Below are short descriptions of several useful tools and modules released with AMOS. Full documentation for some of these tools is still pending; in the meantime, you can obtain basic usage information by running a program with the option "-h" (help). If you wish to contribute documentation for any of the tools, let us know.


Bank operations

  • bank-transact - tool for loading an AMOS message file into a bank
  • bank-report - tool for extracting AMOS messages from a bank
  • select-reads - tool for selecting subsets of reads from a bank. Allows both inclusive and exclusive queries (i.e. fetch all reads not in a specified list)
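
The difference between an inclusive and an exclusive query can be illustrated with a small Python sketch. This is not the select-reads implementation and does not use the AMOS bank API; the function name `select_reads` and the dictionary that stands in for a bank are assumptions made purely for illustration.

```python
def select_reads(reads, ids, exclude=False):
    """Return reads whose ID is in `ids` (inclusive) or not in `ids` (exclusive).

    `reads` is a hypothetical mapping of read ID -> sequence; it stands in
    for a bank and is not the AMOS data model.
    """
    wanted = set(ids)  # set lookup keeps each membership test cheap
    # Keep a read when "is in the list" differs from "exclude was requested".
    return {rid: seq for rid, seq in reads.items()
            if (rid in wanted) != exclude}


reads = {"r1": "ACGT", "r2": "GGCA", "r3": "TTAG"}
inclusive = select_reads(reads, ["r1", "r3"])                # r1 and r3
exclusive = select_reads(reads, ["r1", "r3"], exclude=True)  # everything else
```

The same ID list serves both queries; only the `exclude` flag changes which side of the partition is returned.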


Validation

  • asmQC - mate-pair based validation tool. Reports clusters of problematic mate-pairs, e.g. pairs that are too short, too long, or mis-oriented. Can also be used to recompute library sizes. The problem areas are reported as features that can be viewed in the Hawkeye viewer.
  • amosvalidate - AMOS pipeline containing several quality control operations (including asmQC). Running amosvalidate on a bank will populate the bank with features indicating possible problem areas, such as incorrect mate-pairs, high SNP density, etc. These features can be viewed in Hawkeye.
  • cavalidate - like amosvalidate, but it works from the output of Celera Assembler.
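
To make the mate-pair categories concrete, here is a minimal Python sketch of how a single pair might be labeled. It is not asmQC's algorithm: the `MatePair` record, the `classify` function, the forward/reverse orientation convention, and the cutoff of `k` standard deviations are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    # Hypothetical record: leftmost coordinate, length, and strand ('+' or '-')
    # of two mated reads placed on the same contig.
    start1: int
    len1: int
    strand1: str
    start2: int
    len2: int
    strand2: str


def classify(pair, mean, stdev, k=3.0):
    """Label a mate-pair as 'ok', 'too_short', 'too_long', or 'mis_oriented'."""
    # Order the two reads by their position on the contig.
    (ls, ll, lstrand), (rs, rl, rstrand) = sorted(
        [(pair.start1, pair.len1, pair.strand1),
         (pair.start2, pair.len2, pair.strand2)]
    )
    # Assumed convention: the left read is forward and the right read is reverse.
    if lstrand != "+" or rstrand != "-":
        return "mis_oriented"
    insert = (rs + rl) - ls  # outer distance spanned by the pair
    if insert < mean - k * stdev:
        return "too_short"
    if insert > mean + k * stdev:
        return "too_long"
    return "ok"


# Assumed library: mean insert size 2000, standard deviation 200.
good = classify(MatePair(100, 50, "+", 2050, 50, "-"), 2000, 200)
stretched = classify(MatePair(100, 50, "+", 5000, 50, "-"), 2000, 200)
compressed = classify(MatePair(100, 50, "+", 300, 50, "-"), 2000, 200)
flipped = classify(MatePair(100, 50, "-", 2050, 50, "+"), 2000, 200)
```

A validation tool would look for clusters of such labels in one region rather than acting on any single pair, since isolated outliers are expected in any library.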


Multialignment operations

  • make-consensus - code to build a multiple alignment of a set of reads. As input it takes a layout - i.e. a contig that specifies the approximate placement of the reads with respect to each other. Make-consensus fills in the details (gaps, exact position of the reads in the consensus, etc.) and computes the consensus sequence for the contig.
  • recallConsensus - tool that updates the consensus of a contig. Note that the contig must already have been created by make-consensus. recallConsensus can apply a slightly different algorithm for computing consensus calls (e.g. by allowing ambiguity codes), or recompute the consensus after some of the reads have been edited. RecallConsensus does not recompute the placement of the reads. Use make-consensus if read placement may need to change.
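
The idea of recomputing consensus calls over fixed read placements, optionally with ambiguity codes, can be sketched in Python as follows. This is a simple per-column majority vote under stated assumptions, not the algorithm used by make-consensus or recallConsensus: the function names, the use of a space for uncovered positions, and the tie-handling rules are all illustrative choices.

```python
from collections import Counter

# IUPAC codes for sets of two or more nucleotides.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def call_column(bases, ambiguity=False):
    """Call one consensus character from the bases in an alignment column."""
    # Ignore anything that is not a base or a gap (e.g. uncovered positions).
    counts = Counter(b for b in bases if b in "ACGT-")
    if not counts:
        return "N"
    top = max(counts.values())
    tied = sorted(b for b, c in counts.items() if c == top)
    if len(tied) == 1 or not ambiguity:
        return tied[0]  # ties are broken by sort order, an arbitrary choice
    nucs = frozenset(b for b in tied if b != "-")
    if len(nucs) == 1:
        return next(iter(nucs))  # a base tied only with a gap
    return IUPAC[nucs]


def consensus(rows, ambiguity=False):
    """Call every column of equal-length, already-aligned read rows."""
    return "".join(call_column(col, ambiguity) for col in zip(*rows))


# Four reads already placed in a gapped alignment; ' ' marks no coverage.
rows = [
    "ACGTA ",
    "ACGAA-",
    " CCTAT",
    "  CAAT",
]
plain = consensus(rows)                     # "ACCAAT"
with_codes = consensus(rows, ambiguity=True)  # "ACSWAT"
```

Note that the rows are never shifted: only the calls change between the two runs, which mirrors the distinction drawn above between recalling a consensus and recomputing read placement.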


Layout tools

  • tigger - a unitigger based on Gene Myers' original chunk graph assembly code. It takes as input a set of overlaps between reads (stored in a bank) and outputs a set of layouts. To compute the consensus you will need to use the make-consensus program (see above).
  • casm-layout - a comparative layout program. Takes as input a MUMmer .delta file and outputs a set of layouts (which can be processed by make-consensus to generate contigs). Casm-layout attempts to avoid mis-assemblies by identifying differences between the genome being assembled and the reference genome used to construct the layout. For more information see AMOScmp.


Overlappers

  • hash-overlap - a basic shotgun read overlapper that uses minimizers (see reference below) to reduce memory usage and increase performance.

Roberts M, Hunt BR, Yorke JA, Bolanos RA, Delcher AL. (2004) A preprocessor for shotgun assembly of large genomes. J Comput Biol. 11(4):734-52.
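
As a rough illustration of why minimizers save memory, the Python sketch below selects, from every window of `w` consecutive k-mers, only the lexicographically smallest one. Indexing just these selected k-mers instead of all k-mers shrinks the index, while any two sequences that share a substring of at least `w + k - 1` bases are still guaranteed to share a minimizer. The function name, the parameters `k = 3` and `w = 3`, and the lexicographic ordering are assumptions for the example; the parameters and ordering actually used by hash-overlap are not described here.

```python
def minimizers(seq, k, w):
    """Return the sorted list of (position, k-mer) minimizers of `seq`."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Smallest k-mer in the window; the leftmost one wins on ties.
        pos = min(window, key=lambda i: (kmers[i], i))
        picked.add((pos, kmers[pos]))
    return sorted(picked)


# Two toy reads that overlap in "GACCATGA".
read_a = "TTGACCATGA"
read_b = "GACCATGACG"
mins_a = minimizers(read_a, k=3, w=3)
mins_b = minimizers(read_b, k=3, w=3)

# Candidate overlaps are found by matching minimizers, not all k-mers.
shared = {kmer for _, kmer in mins_a} & {kmer for _, kmer in mins_b}
```

Here each read contains eight 3-mers but contributes only three minimizers, and the two reads are still linked through the shared minimizers.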


Viewers

  • Hawkeye - one-stop shop for your data visualization needs