Infrastructure

From AMOS WIKI
Revision as of 22:28, 7 July 2009 by Mcschatz


The principal benefit of the AMOS project is its modular design, but supporting many isolated components requires a robust infrastructure. In response to this need, TIGR has developed numerous C++ classes for the efficient storage of assembly data types. These assembly objects can be written to and read from a central data repository, allowing separate modules to build on and improve existing assemblies in discrete steps. As a result, an assembly pipeline can run its steps in any order, and data snapshots can be preserved at any point. To convey assembly data outside of the C++ classes, we have implemented an ASCII message format modeled on the one used by Celera Assembler*. This message format will be the unifying standard for all external module communication, and it allows data snapshots to be output in a concise text format. The API (application programming interface) for the AMOS foundation classes and the specification for the AMOS message format are described in the sections below.

  • * "A Whole-Genome Assembly of Drosophila." Myers E, Sutton G, et al., Science, 2000. 287(5461):2196-204.


libAMOS

The AMOS API describes the programming interface for all of the AMOS foundation classes. Currently these classes are implemented in C++, but they could be ported to other languages as long as the API is preserved. The implementation can be found in the latest distribution under the src/AMOS project directory, and these classes make up the libAMOS.a library. This library contains the tools needed to handle and manipulate AMOS messages, data banks, and internal assembly data structures such as sequencing reads, contigs, and scaffolds.
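To make the idea of a shared repository concrete, the sketch below models a tiny in-memory bank in which each object receives an internal ID (IID) and can also be looked up by an external ID (EID). The `Read` struct, the `ReadBank` class and its methods are hypothetical illustrations written for this page, not the libAMOS classes, which persist their data on disk and offer a much richer interface; see the libAMOS documentation for the real bank API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical read record (NOT AMOS::Read_t): one object held in the bank.
struct Read {
  std::uint32_t iid;  // internal ID, assigned by the bank
  std::string eid;    // external ID, e.g. the read name from the sequencer
  std::string seq;    // nucleotide sequence
};

// Hypothetical in-memory bank: objects are stored by IID and indexed by EID.
// A real AMOS bank persists to disk; this sketch only shows the access pattern.
class ReadBank {
 public:
  // Stores a new read, assigns the next IID (starting at 1) and returns it.
  std::uint32_t append(const std::string& eid, const std::string& seq) {
    if (eidToIid_.count(eid) != 0) {
      throw std::invalid_argument("duplicate EID: " + eid);
    }
    const auto iid = static_cast<std::uint32_t>(reads_.size() + 1);
    reads_.push_back(Read{iid, eid, seq});
    eidToIid_[eid] = iid;
    assert(reads_.size() == eidToIid_.size());  // both indexes stay in sync
    return iid;
  }

  // Returns the read with the given IID; throws for unknown IIDs.
  const Read& fetch(std::uint32_t iid) const { return reads_[slot(iid)]; }

  // Returns the read with the given EID; throws for unknown EIDs.
  const Read& fetchByEid(const std::string& eid) const {
    auto it = eidToIid_.find(eid);
    if (it == eidToIid_.end()) throw std::out_of_range("unknown EID: " + eid);
    return fetch(it->second);
  }

  // Overwrites a stored sequence so later modules see the updated object.
  void replaceSeq(std::uint32_t iid, const std::string& seq) {
    reads_[slot(iid)].seq = seq;
  }

  std::size_t size() const { return reads_.size(); }

 private:
  // Maps an IID to its vector index; throws for IID 0 or IIDs never assigned.
  std::size_t slot(std::uint32_t iid) const {
    if (iid == 0 || iid > reads_.size()) throw std::out_of_range("unknown IID");
    return iid - 1;
  }

  std::vector<Read> reads_;
  std::map<std::string, std::uint32_t> eidToIid_;
};
```

In this sketch, one module could `append` raw reads, and a later module could `fetch` them by IID, trim them, and write the result back with `replaceSeq`. Keeping both indexes lets tools refer to objects compactly by IID while still accepting user-facing names as EIDs.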


More Information: libAMOS


libSlice

libSlice is a C++ library that provides the user with a parametric implementation of the Churchill-Waterman algorithm for computing the consensus base from a column in a multiple alignment of reads. This task is an essential part of any consensus module. The implementation can be found in the latest distribution under the src/Slice project directory. These C structs comprise the libSlice.a library.
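The column-calling step can be illustrated with a simplified Bayesian model in the spirit of Churchill-Waterman. This sketch is not the published model and not the libSlice API: it assumes a uniform prior over A, C, G and T, independent read errors whose probabilities come from Phred quality values, and that an erroneous base is equally likely to be any of the other three. `ColumnCall` and `callColumn` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for one consensus column (not a libSlice struct).
struct ColumnCall {
  char base;         // most probable base
  double posterior;  // posterior probability of that base
  int quality;       // Phred-scaled confidence: -10 * log10(1 - posterior)
};

// Simplified Bayesian column call in the spirit of Churchill-Waterman.
// Assumptions: uniform prior over A/C/G/T, independent errors, and an error
// turns the true base into each of the other three with equal probability.
ColumnCall callColumn(const std::string& bases, const std::vector<int>& quals) {
  assert(bases.size() == quals.size());
  static const char kBases[4] = {'A', 'C', 'G', 'T'};
  std::array<double, 4> logLik = {0.0, 0.0, 0.0, 0.0};

  for (std::size_t i = 0; i < bases.size(); ++i) {
    // Phred quality -> error probability, capped at 0.75 (a random base).
    const double err = std::min(0.75, std::pow(10.0, -quals[i] / 10.0));
    for (int b = 0; b < 4; ++b) {
      // Likelihood of observing bases[i] if the true base is kBases[b].
      const double p = (bases[i] == kBases[b]) ? 1.0 - err : err / 3.0;
      logLik[b] += std::log(p);
    }
  }

  // Convert log-likelihoods to posteriors, subtracting the max to avoid underflow.
  const double maxLog = *std::max_element(logLik.begin(), logLik.end());
  std::array<double, 4> post{};
  double total = 0.0;
  for (int b = 0; b < 4; ++b) {
    post[b] = std::exp(logLik[b] - maxLog);
    total += post[b];
  }

  int best = 0;
  for (int b = 1; b < 4; ++b) {
    if (post[b] > post[best]) best = b;
  }
  const double pBest = post[best] / total;
  // Floor the error at 1e-10 so the quality value stays finite (at most 100).
  const double pErr = std::max(1.0 - pBest, 1e-10);
  const int quality = static_cast<int>(std::lround(-10.0 * std::log10(pErr)));
  return ColumnCall{kBases[best], pBest, quality};
}
```

For example, a single read `"G"` at quality 30 yields `'G'` with quality 30, while the column `"AAAC"` with qualities `{30, 30, 30, 20}` yields `'A'` with a much higher quality value because three concordant reads outweigh one discordant read. In this simplified model, characters other than A, C, G and T (such as gap symbols) scale all four likelihoods equally and therefore carry no information; a fuller model could treat gaps as a fifth symbol.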


More Information: libSlice


libAlign

libAlign is a robust multiple-alignment library for consensus generation. It can efficiently handle large inputs and can identify and correctly align slightly misplaced and/or low-similarity reads. The implementation can be found in the latest distribution under the src/Align project directory. These classes comprise the libAlign.a library and depend on the libSlice library.



File format specs

The AMOS file types and message formats are defined in several specification documents, which can be found by following the links below. These documents also explain how to use messages for module communication and give general recommendations for development procedures.

message_types.rtf

Describes expected fields for AMOS message types. This document is a good starting point if you want to write your own parser for AMOS files.


message_grammar.rtf

Describes the grammar requirements for AMOS messages. Parsers for this file format are provided both in Perl (see AMOS::AmosLib) and in C++ (the Message_t type in the API). Note that this format is inspired by the format developed at Celera Genomics for Celera Assembler. The AMOS parsers can be used to parse Celera files as well; however, the specific data types differ between AMOS and Celera Assembler.
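The overall shape of the grammar can be shown with a small recursive parser. It assumes the layout commonly seen in AMOS files: a message opens with `{` followed by a three-letter type code (e.g. `{RED`), each field is written as `code:value` on one line, a multi-line field starts with `code:` on its own line and ends with a line holding a single `.`, nested messages may appear inside a message, and `}` closes it. message_grammar.rtf remains the authoritative definition; `Message`, `parseBody`, `parseMessages` and `parseMessageText` are hypothetical names, not the API of Message_t or AMOS::AmosLib.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed message (not the C++ Message_t class).
struct Message {
  std::string type;                           // three-letter type code, e.g. "RED"
  std::map<std::string, std::string> fields;  // field code -> value
  std::vector<Message> children;              // nested sub-messages
};

// Parses the body of a message whose opening "{XYZ" line was already read.
Message parseBody(std::istream& in, const std::string& type) {
  Message msg;
  msg.type = type;
  std::string line;
  while (std::getline(in, line)) {
    if (line == "}") return msg;  // closes this message
    if (!line.empty() && line[0] == '{') {
      msg.children.push_back(parseBody(in, line.substr(1)));  // nested message
      continue;
    }
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) {
      throw std::runtime_error("expected 'code:value', got: " + line);
    }
    const std::string code = line.substr(0, colon);
    std::string value = line.substr(colon + 1);
    if (value.empty()) {
      // Multi-line field: collect lines until a line holding a single '.'.
      std::string part;
      bool closed = false;
      while (std::getline(in, part)) {
        if (part == ".") { closed = true; break; }
        if (!value.empty()) value += '\n';
        value += part;
      }
      if (!closed) throw std::runtime_error("unterminated field: " + code);
    }
    msg.fields[code] = value;
  }
  throw std::runtime_error("unterminated message: " + type);
}

// Parses every top-level message in a stream, skipping blank lines between them.
std::vector<Message> parseMessages(std::istream& in) {
  std::vector<Message> out;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    if (line[0] != '{') throw std::runtime_error("expected '{', got: " + line);
    out.push_back(parseBody(in, line.substr(1)));
  }
  return out;
}

// Convenience wrapper for message text already held in memory.
std::vector<Message> parseMessageText(const std::string& text) {
  std::istringstream in(text);
  return parseMessages(in);
}
```

For example, `parseMessageText("{RED\niid:1\neid:read1\nseq:\nACGT\n.\n}\n")` yields one message of type `RED` whose `seq` field is `ACGT`. In this sketch an empty single-line value cannot be distinguished from the start of a multi-line field, and multi-line values keep their internal line breaks, so a caller would strip them when reassembling sequence data.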


libAMOS_quickref.rtf

Quick reference guide to the libAMOS C++ library. This document contains definitions of the main concepts needed in writing AMOS code and provides you with code examples for performing basic bank access operations.