Bambus 2.0/quick start guide
Revision as of 07:54, 24 July 2011
This is a copy of the Bambus 2 user guide taken (and improved) from here: http://www.cbcb.umd.edu/software/bambus/doc/HowToBambus2.pdf
See also: http://www.cbcb.umd.edu/software/bambus
How to run Bambus 2.0
Caveat: Bambus is still being actively developed and the code is currently in the "user beware" and "for experts only" stage.
Step 1. Install the AMOS package - Bambus 2.0 is part of it.
See AMOS Getting Started.
Note: since Bambus is still under active development you should pull the latest unofficial release of AMOS directly from the Git repository - see instructions at: Programmer's guide.
Step 2. What information you need
Bambus needs to know about the contigs produced by the assembler and about how these contigs are linked to each other. In AMOS terms, the basic information required is a list of contigs (http://amos.sourceforge.net/docs/api/classAMOS_1_1Contig__t.html) and a list of contig links (http://amos.sourceforge.net/docs/api/classAMOS_1_1ContigLink__t.html) or contig edges (http://amos.sourceforge.net/docs/api/classAMOS_1_1ContigEdge__t.html - bundles of consistent contig links) indicating the relative placement of pairs of contigs.
These data can either be provided to Bambus directly in the form of an AMOS message file (see Message Types) or inferred from mate-pair information as described below.
Running Bambus 2.0
- First, add the .afg file built as described above (for other conversion utilities see: http://sourceforge.net/apps/mediawiki/amos/index.php?title=File_conversion_utilities) to an AMOS bank (flat-file database):
bank-transact -cf myproj.bnk -m myfile.afg
- Use the mate-pair information to construct a collection of contig links.
clk -b myproj.bnk
Note: you can also construct these links with your own custom software and upload them into the bank, in which case you would skip the "clk" command.
- Bundle the contig links into a collection of contig edges.
Bundler -b myproj.bnk
Note: as with the clk command you might want to build the contig edges separately and upload them into the bank using your own software.
Note: the Bundler command also accepts the command line parameter "-t" followed by a list of edge types as defined in src/AMOS/Link_AMOS.hh. Currently the following types are defined: M - mate-pair, O - overlap, P - physical, A - alignment, S - synteny, and X - other.
- Identify genomic repeats and write them to standard output.
MarkRepeats -b myproj.bnk [-redundancy X] [-aggressive] > myRepeats
Optional parameters:
- "-redundancy X" - only use contig edges comprising X or more contig links
- "-aggressive" - aggressive repeat identification based on global depth of coverage statistics (default procedure relies on graph analysis rather than coverage statistics)
Note: this program requires the boost library
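Since OrientContigs expects the repeat list as one contig ID per line, it can be worth checking the MarkRepeats output before passing it on. The following is only a minimal sketch: "check_repeats" is a hypothetical helper (not part of AMOS), and it assumes that a valid line holds exactly one whitespace-free token.

```shell
# check_repeats FILE
# Hypothetical helper, not an AMOS tool: succeeds only if every line of FILE
# contains exactly one field, i.e. one contig ID per line (assumed format).
check_repeats() {
    awk 'NF != 1 {
             printf "line %d: expected one contig ID, found %d fields\n", NR, NF
             bad = 1
         }
         END { exit bad }' "$1"
}

# Example use (file name taken from the MarkRepeats command above):
#   check_repeats myRepeats && echo "repeat list looks well formed"
```

An empty file passes the check, which corresponds to the case where no repeats were reported.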
- Order and orient contigs according to repeat and link information
IMPORTANT: several of the operations performed by this program destructively modify the bank (changes cannot be undone). You should make a copy of the bank prior to running OrientContigs.
OrientContigs -b myproj.bnk -prefix myscaff
- "-prefix" specifies the prefix for all output files
Optional parameters:
- "-all" - output unlinked contigs as scaffolds
- "-noreduce" - turns off graph simplification routines (see below)
- "-redundancy X" - same as above - ignore edges with fewer than X links
- "-repeats filename" - ignores repeats listed in "filename" (one contig ID per line), as generated, e.g., by the MarkRepeats program described above.
- "-aggressive" - aggressive scaffolding - by default, links that are stretched by more than 3 standard deviations are ignored. The aggressive option turns this filter off and tries to reconcile the scaffold as well as possible.
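The backup recommended before running OrientContigs can be sketched as below. This assumes the bank is an ordinary directory on disk (as created by bank-transact); "backup_bank" and the ".backup" suffix are hypothetical names chosen for illustration.

```shell
# backup_bank BANK
# Hypothetical helper: copy the bank directory to BANK.backup before a
# destructive step. Refuses to overwrite an existing backup.
backup_bank() {
    bank=${1%/}                 # strip a trailing slash, if any
    backup="$bank.backup"
    if [ ! -d "$bank" ]; then
        echo "bank not found: $bank" >&2
        return 1
    fi
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi
    cp -R "$bank" "$backup"     # recursive copy of the whole bank directory
}

# Example use:
#   backup_bank myproj.bnk && OrientContigs -b myproj.bnk -prefix myscaff
```

If OrientContigs produces an unwanted result, the original bank can then be restored by copying the backup directory back under the original name.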
- Linearize the scaffolds (if desired). By default Bambus 2 produces non-linear graph-based scaffolds. If fasta output is desired, it is necessary to linearize the scaffolds.
untangle -e myscaff.evidence.xml -s myscaff.out.xml -o myscaff.untangle.xml
- Output fasta result (if desired). This involves two steps: the first generates the fasta file representing the contigs, and the second combines the contigs, separated by Ns, into a scaffold fasta file.
bank2fasta -d -b myproj.bnk > contigs.fasta
printScaff -e myscaff.evidence.xml -s myscaff.untangle.xml -l myscaff.library -f contigs.fasta -merge -o myscaff
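The steps above can be chained in a small wrapper script. The following is a sketch only: the function names ("run", "run_to", "bambus2_pipeline"), the DRY_RUN variable and the "PREFIX.repeats" file name are assumptions made for illustration, while the commands and options are the ones listed in this guide. With DRY_RUN=1 the commands are printed instead of executed, which is a convenient way to review them first.

```shell
# run CMD...: execute CMD, or just print it when DRY_RUN=1 (hypothetical helper)
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# run_to FILE CMD...: like run, but redirect standard output of CMD to FILE
run_to() {
    out=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $* > $out"
    else
        "$@" > "$out"
    fi
}

# bambus2_pipeline AFG BANK PREFIX (name and argument order are assumptions)
bambus2_pipeline() {
    afg=$1
    bank=$2
    prefix=$3

    run bank-transact -cf "$bank" -m "$afg" || return 1       # load .afg into a bank
    run clk -b "$bank" || return 1                            # mate pairs -> contig links
    run Bundler -b "$bank" || return 1                        # links -> contig edges
    run_to "$prefix.repeats" MarkRepeats -b "$bank" || return 1
    # NOTE: OrientContigs modifies the bank destructively; copy the bank first.
    run OrientContigs -b "$bank" -prefix "$prefix" -repeats "$prefix.repeats" || return 1
    run untangle -e "$prefix.evidence.xml" -s "$prefix.out.xml" \
        -o "$prefix.untangle.xml" || return 1
    run_to contigs.fasta bank2fasta -d -b "$bank" || return 1
    run printScaff -e "$prefix.evidence.xml" -s "$prefix.untangle.xml" \
        -l "$prefix.library" -f contigs.fasta -merge -o "$prefix"
}

# Example use: print the commands without running them
#   DRY_RUN=1; bambus2_pipeline myfile.afg myproj.bnk myscaff
```

Each step stops the pipeline on failure, so a later command never runs on incomplete output from an earlier one.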
Outputs
The output of the OrientContigs program is a collection of scaffolds stored in the bank. The program also generates several files starting with the specified prefix ("myscaff" in the commands above):
- myscaff.agp
- The scaffolds generated by the OrientContigs program in NCBI AGP format.
- myscaff.dot
- The scaffolds generated by the OrientContigs program in Graphviz dot format. It can be converted to a PostScript or PDF file using the dot program in the Graphviz package.
- myscaff.evidence.xml
- myscaff.library
- myscaff.out.xml
- The scaffolds generated by the OrientContigs program in a format compatible with Bambus 1.
- myscaff.fasta
- The fasta file of the scaffolds, joined by Ns.
- myscaff.stats
- Statistics on the scaffolds generated, including N50 and total span.
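The Graphviz conversion mentioned for the .dot output can be sketched as follows. "render_scaffold_graph" is a hypothetical wrapper; it relies only on the standard "-T" (output format) and "-o" (output file) options of the dot program and assumes Graphviz is installed.

```shell
# render_scaffold_graph DOTFILE [FORMAT]
# Hypothetical wrapper around Graphviz dot. FORMAT defaults to pdf; ps gives
# PostScript. Prints the name of the file it wrote.
render_scaffold_graph() {
    dotfile=$1
    fmt=${2:-pdf}
    outfile="${dotfile%.dot}.$fmt"      # e.g. myscaff.dot -> myscaff.pdf
    if ! command -v dot >/dev/null 2>&1; then
        echo "Graphviz 'dot' not found in PATH" >&2
        return 127
    fi
    dot -T"$fmt" "$dotfile" -o "$outfile" && echo "$outfile"
}

# Example use:
#   render_scaffold_graph myscaff.dot        # writes myscaff.pdf
#   render_scaffold_graph myscaff.dot ps     # writes myscaff.ps
```

For large assemblies the scaffold graph can be big, so rendering may take a while and the resulting file may be hard to read in one piece.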
Scaffold simplifications
By default (unless option "-noreduce" is provided) the OrientContigs program simplifies certain graph patterns:
- simple paths
- bubbles
These patterns are iteratively merged into single contigs until no additional simplifications can be made.