Bambus 2.0/quick start guide

From AMOS WIKI
Revision as of 16:17, 16 December 2010 by Dmb000006 (Talk | contribs)

This is a copy of the Bambus 2 user guide taken (and improved) from here: http://www.cbcb.umd.edu/software/bambus/doc/HowToBambus2.pdf

See also: http://www.cbcb.umd.edu/software/bambus


How to run Bambus 2.0

Caveat: Bambus is still being actively developed and the code is currently in the "user beware" and "for experts only" stage.

Step 1. Install the AMOS package - Bambus 2.0 is part of it.

See AMOS Getting Started.

Note: since Bambus is still under active development you should pull the latest unofficial release of AMOS directly from the CVS repository - see instructions at: Programmer's guide.
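
Before going further it can help to confirm that the AMOS programs used in this guide are actually on your PATH. The following is a minimal sketch, not part of AMOS: the function name missing_tools is made up for illustration, and it only checks that each name resolves to an executable, not that the version is recent enough.

```shell
# Hypothetical helper (not part of AMOS): report which of the named
# programs cannot be found on PATH.
# Prints one missing name per line; returns non-zero if any are missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}
```

For the commands used below you might call it as: missing_tools bank-transact clk Bundler MarkRepeats OrientContigs untangle bank2fasta printScaff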


Step 2. What information you need

Bambus needs to know about the contigs produced by the assembler and how these contigs are linked to each other. In AMOS terms, the basic information required is a list of contigs (http://amos.sourceforge.net/docs/api/classAMOS_1_1Contig__t.html) and a list of contig links (http://amos.sourceforge.net/docs/api/classAMOS_1_1ContigLink__t.html) or contig edges (http://amos.sourceforge.net/docs/api/classAMOS_1_1ContigEdge__t.html - bundles of consistent contig links) indicating the relative placement of pairs of contigs.

These data can either be provided to Bambus directly in the form of an AMOS message file (see Message Types) or inferred from mate-pair information as described below.

Step 3. Running Bambus 2.0

  • Load the AMOS message file into a bank.
bank-transact -cf -b myproj.bnk -m myfile.afg
  • Use the mate-pair information to construct a collection of contig links.
clk -b myproj.bnk

Note: you can also construct these links with your own custom software and upload them into the bank, in which case you would skip the "clk" command.

  • Bundle the contig links into a collection of contig edges.
Bundler -b myproj.bnk

Note: as with the clk command you might want to build the contig edges separately and upload them into the bank using your own software.

Note: the Bundler command also accepts the command line parameter "-t" followed by a list of edge types as defined in src/AMOS/Link_AMOS.hh. Currently the following types are defined: M - mate-pair, O - overlap, P - physical, A - alignment, S - synteny, and X - other.

  • Identify genomic repeats and write them to standard output
MarkRepeats -b myproj.bnk [-redundancy X -aggressive] > myRepeats

Optional parameters:

"-redundancy X" - only uses contig edges comprising X or more contig links
"-aggressive" - aggressive repeat identification based on global depth of coverage statistics (default procedure relies on graph analysis rather than coverage statistics)

Note: this program requires the Boost library.

  • Order and orient contigs according to repeat and link information

IMPORTANT: several of the operations performed by this program destructively modify the bank (changes cannot be undone). You should make a copy of the bank prior to running OrientContigs.
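
One way to make that copy is sketched below. It assumes the bank is an ordinary directory on disk that can be copied with cp; the function name backup_bank and the timestamped ".bak" naming scheme are illustrative choices, not AMOS conventions.

```shell
# Hypothetical helper (not part of AMOS): copy a bank directory to a
# timestamped backup next to it and print the path of the backup.
backup_bank() {
    bank=${1%/}                                  # drop a trailing slash, if any
    backup="$bank.$(date +%Y%m%d-%H%M%S).bak"    # e.g. myproj.bnk.20101216-161700.bak
    [ -d "$bank" ] || { echo "no such bank: $bank" >&2; return 1; }
    cp -R "$bank" "$backup" && echo "$backup"
}
```

For example, run backup_bank myproj.bnk before the OrientContigs command. To restore, remove the modified bank and copy the backup back under the original name.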

OrientContigs -b myproj.bnk -prefix myscaff
"-prefix" specifies the prefix for all output files

Optional parameters:

"-all" - output unlinked contigs as scaffolds
"-noreduce" - turns off graph simplification routines (see below)
"-redundancy X" - same as above - ignore edges with fewer than X links
"-repeats filename" - ignores repeats listed in "filename" (one contig ID per line), as generated, e.g., by the MarkRepeats program described above.
"-aggressive" - aggressive scaffolding - by default, links that are stretched by more than 3 standard deviations are ignored. The aggressive option turns this feature off and tries to reconcile the scaffold as well as possible.
  • Linearize the scaffolds (if desired). By default Bambus 2 produces non-linear graph-based scaffolds. If fasta output is desired, it is necessary to linearize the scaffolds.
untangle -e myscaff.evidence.xml -s myscaff.out.xml -o myscaff.untangle.xml
  • Output fasta result (if desired). This involves two steps: the first generates the fasta file representing the contigs, and the second combines them, separated by Ns, into a scaffold fasta file.
bank2fasta -d -b myproj.bnk > contigs.fasta
printScaff -e myscaff.evidence.xml -s myscaff.untangle.xml -l myscaff.library -f contigs.fasta -merge -o myscaff
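
The steps above, from clk onward, can be collected into a single command list. The sketch below only prints the commands for a given bank and prefix; it does not run them. It assumes the bank has already been loaded, that you want the linearized fasta output, and that you want to pass the MarkRepeats output to OrientContigs through "-repeats". The function name bambus_commands and the file name "<prefix>.repeats" are made up for illustration, and paths containing spaces are not handled.

```shell
# Hypothetical helper (not part of AMOS): print the Bambus 2 command
# sequence from this guide for a given bank and output prefix.
#   $1 - path to an existing AMOS bank, e.g. myproj.bnk
#   $2 - prefix for the output files, e.g. myscaff
bambus_commands() {
    bank=$1
    prefix=$2
    cat <<EOF
clk -b $bank
Bundler -b $bank
MarkRepeats -b $bank > $prefix.repeats
OrientContigs -b $bank -prefix $prefix -repeats $prefix.repeats
untangle -e $prefix.evidence.xml -s $prefix.out.xml -o $prefix.untangle.xml
bank2fasta -d -b $bank > contigs.fasta
printScaff -e $prefix.evidence.xml -s $prefix.untangle.xml -l $prefix.library -f contigs.fasta -merge -o $prefix
EOF
}
```

Review the printed list first; to execute it, pipe it to a shell that stops at the first failure, for example: bambus_commands myproj.bnk myscaff | sh -e. Remember that OrientContigs modifies the bank, so copy the bank beforehand.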

Outputs

The output of the OrientContigs program is a collection of scaffolds stored in the bank. The program also generates several files whose names start with the specified prefix (here "myscaff"):

  • myscaff.agp
    • The scaffolds generated by the OrientContigs program in NCBI AGP format.
  • myscaff.dot
    • The scaffolds generated by the OrientContigs program in Graphviz dot format. It can be converted to a PostScript or PDF file using the dot program in the Graphviz package.
  • myscaff.evidence.xml
  • myscaff.library
  • myscaff.out.xml
    • The scaffolds generated by the OrientContigs program in a format compatible with Bambus 1.
  • myscaff.fasta
    • The fasta file of the scaffolds, joined by Ns.
  • myscaff.stats
    • Statistics on the scaffolds generated, including N50 and total span.
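
As an illustration of the dot conversion mentioned in the list above, the sketch below renders a .dot file to PDF with the Graphviz dot program ("-Tpdf" selects PDF output; "-Tps" would give PostScript). The function name render_scaffold_graph is made up for illustration, and the sketch assumes Graphviz is installed separately from AMOS.

```shell
# Hypothetical helper (not part of AMOS): render a Graphviz .dot file to a
# PDF next to it and print the PDF path.
# Returns 1 if the input file is missing, 2 if dot is not installed.
render_scaffold_graph() {
    dot_file=$1
    pdf_file=${dot_file%.dot}.pdf
    [ -f "$dot_file" ] || { echo "missing file: $dot_file" >&2; return 1; }
    command -v dot >/dev/null 2>&1 || { echo "Graphviz dot not found" >&2; return 2; }
    dot -Tpdf "$dot_file" -o "$pdf_file" && echo "$pdf_file"
}
```

With the prefix used in this guide, the call would be render_scaffold_graph myscaff.dot. Large scaffold graphs can take a long time to lay out.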

Scaffold simplifications

By default (unless option "-noreduce" is provided) the OrientContigs program simplifies certain graph patterns:

  • simple paths
  • bubbles

These patterns are iteratively merged into single contigs until no additional simplifications can be made.