Difference between revisions of "Minimus/README"

From AMOS WIKI

Revision as of 22:29, 14 January 2010

minimus - The AMOS Lightweight Assembler


Brief Summary

minimus is an assembly pipeline designed specifically for small data sets, such as the set of reads covering a specific gene. The code will also work for larger assemblies (we have used it to assemble bacterial genomes); however, due to its stringency, the resulting assembly will be highly fragmented. For large and/or complex assemblies, the execution of minimus should be followed by additional processing steps, such as scaffolding.

Minimus follows the Overlap-Layout-Consensus paradigm and consists of three main modules:

  • overlapper - computes the overlaps between the reads using a modified version of the Smith-Waterman local alignment algorithm
  • tigger - uses the read overlaps to generate the layouts of reads representing individual contigs
  • make-consensus - refines the layouts produced by the tigger to generate accurate multiple alignments of the reads

Dependencies

None.


Running

Either execute the minimus configuration script directly from $bindir OR copy it to your local directory, edit it, and run it with the `runAmos' command interpreter. The following variables must be set on the command line or added to the script for the pipeline to operate properly:

TGT - The target genome sequences in AMOS message format (.afg)

minimus -D TGT=<target> <prefix>

OR

runAmos -C minimus -D TGT=<target> <prefix>

Where <prefix> will be the output file prefix, and <target> is the input AMOS message file. Check the `runAmos' documentation or type `runAmos --help' for details on operating an AMOS pipeline. The minimus pipeline config file can be easily modified by the user to add additional processing steps.

To run minimus you need to provide an AMOS-formatted file of the reads. Such a file (commonly with extension .afg) can be generated from a combination of sequence (.seq), quality (.qual), and Trace Archive XML (.xml) files using the toAmos or tarchive2amos programs, which are placed in the $bindir directory upon installation.

The default TGT file is <prefix>.afg, so if our input file is named <prefix>.afg we can run minimus simply by typing:

minimus <prefix>
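
The choice between the short and the long invocation can be sketched as a small shell helper. This is only an illustration: `minimus_cmd' is a hypothetical name, not part of AMOS, and the function only prints the command line it would use, based on the two invocation forms shown in this section.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMOS): print the minimus command line
# for a given output prefix and an optional target (.afg) file.
minimus_cmd() {
    prefix=${1:-}
    target=${2:-}
    if [ -z "$prefix" ]; then
        echo "usage: minimus_cmd <prefix> [<target>]" >&2
        return 1
    fi
    # With no explicit target, or when the target is the default
    # <prefix>.afg, the short form is sufficient.
    if [ -z "$target" ] || [ "$target" = "$prefix.afg" ]; then
        echo "minimus $prefix"
    else
        echo "minimus -D TGT=$target $prefix"
    fi
}

minimus_cmd target            # prints: minimus target
minimus_cmd asm reads.afg     # prints: minimus -D TGT=reads.afg asm
```

Printing the command instead of running it keeps the sketch usable on a machine where AMOS is not installed.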

Output

Output will be a TIGR .contig file and a FastA .fasta file. The TIGR contig file contains the gapped consensus and multi-alignment information for the assembly. Each contig is introduced by a header line that starts with '##', followed by the gapped consensus sequence, in which gaps are represented by the '-' character. After the consensus come the gapped read sequences, each preceded by a header line beginning with '#'. The .fasta file contains all the contigs produced by minimus in multi-FastA format. These sequences match the consensus sequences in the .contig file, but without the gaps.
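
The header conventions described above are enough to take a quick inventory of a .contig file with standard tools. The following is a minimal sketch, relying only on the '##' and '#' prefixes and the '-' gap character mentioned in this section; the function names are hypothetical and not part of AMOS, and contigs are reported by their position in the file because the exact header fields are not assumed here.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AMOS) for inspecting a TIGR .contig file.

# Count contigs: header lines that start with '##'.
count_contigs() {
    grep -c '^##' "$1"
}

# Count reads: header lines that start with a single '#'.
count_reads() {
    grep -c '^#[^#]' "$1"
}

# Print "<contig number> <ungapped consensus length>" for each contig.
# Consensus lines are those between a '##' header and the next '#' header.
consensus_lengths() {
    awk '
        /^##/   { if (n > 0) print n, len; n++; len = 0; in_cons = 1; next }
        /^#/    { in_cons = 0; next }
        in_cons { gsub(/-/, ""); len += length($0) }
        END     { if (n > 0) print n, len }
    ' "$1"
}
```

For example, `count_contigs target.contig' reports how many contigs were produced, and the lengths printed by `consensus_lengths target.contig' should agree with the sequence lengths in the corresponding .fasta file.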

To obtain an ACE format representation of the assembly, we can run the following commands, which produce a <prefix>.ace file:

bank-report -b <prefix>.bnk CTG > <prefix>.ctg
amos2ace <prefix>.afg <prefix>.ctg

Where <prefix> is the same as was used in the above section and <prefix>.afg is the original input to the assembly pipeline. We can simply add these commands to the runAmos config file to produce an ACE file every time we run minimus.

Example

Assume we have a set of Trace Archive data with the names `target.seq', `target.qual' and `target.xml' which contain the sequence information for a small assembly task. To run the minimus pipeline and generate the default output, we would type the following:

tarchive2amos -o target target.seq
minimus -D TGT=target.afg target

This will generate the default output named `target.contig' and `target.fasta'. We could then generate an ACE assembly format file by following the instructions in the above section, substituting "target" for "<prefix>".
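
The assembly and ACE conversion steps can be chained in a small wrapper. The sketch below is an illustration under stated assumptions: `run' and `assemble' are hypothetical names, not part of AMOS, the input <prefix>.afg is assumed to exist already, and the only commands used are the ones shown in this README. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of AMOS): assemble <prefix>.afg with
# minimus, then produce an ACE file. With DRY_RUN=1 the commands are
# printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

assemble() {
    prefix=${1:-}
    if [ -z "$prefix" ]; then
        echo "usage: assemble <prefix>" >&2
        return 1
    fi

    # Step 1: run the minimus pipeline on <prefix>.afg.
    run minimus -D TGT="$prefix.afg" "$prefix" || return 1

    # Step 2: dump the contigs from the bank. The redirect is handled
    # separately so that dry-run mode can show it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bank-report -b $prefix.bnk CTG > $prefix.ctg"
    else
        bank-report -b "$prefix.bnk" CTG > "$prefix.ctg" || return 1
    fi

    # Step 3: convert to ACE format.
    run amos2ace "$prefix.afg" "$prefix.ctg" || return 1
}
```

Calling `assemble target' with DRY_RUN=1 set lists the three commands for the example above without requiring AMOS to be installed. Each step stops the wrapper on failure so that a failed assembly is not passed on to the conversion tools.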

Minimus is now packaged with two example assemblies, an Influenza A assembly and a Zebra Fish gene assembly, both under the 'test' directory. The 'test' directory is located in the main AMOS directory after you untar the AMOS tarball.