Minimus
Latest revision as of 02:23, 12 November 2011
Overview
Minimus is one of several assembly pipelines included in the AMOS software package. It is designed specifically for small data sets, such as the set of reads covering a specific gene. The code will also work for larger assemblies (we have used it to assemble bacterial genomes), but because of its stringency the resulting assembly will be highly fragmented. For large or complex assemblies, Minimus should be followed by additional processing steps, such as scaffolding.
Minimus follows the Overlap-Layout-Consensus paradigm and consists of three main modules which share information through a central file bank:
- hash-overlap - Computes the overlaps between the reads using a modified version of the Smith-Waterman local alignment algorithm
- tigger - Uses the read overlaps to generate the layouts of reads representing individual contigs
- make-consensus - Refines the layouts produced by the tigger to generate accurate multiple alignments within the reads
Minimus uses AMOS message files as both the inputs and the outputs. Please see the File conversion utilities documentation for more information.
Minimus2 is a modified version of the minimus pipeline designed for merging two sequence sets. Instead of hash-overlap, it uses a nucmer-based overlap detector, which is much faster.
Documentation
Documentation on running minimus is included with the distribution in the /docs subdirectory.
See Minimus/README.
Examples
Examples of a flu assembly and a Zebrafish gene can be found in the test/minimus directory created when the AMOS distribution is untarred. Documentation on the examples is included with the distribution in /docs/minimus.README.
Basic usage
To run minimus you will need a set of sequence files. Assuming you have a set of reads in FASTA format called my_reads.seq, you can run minimus with the following two commands:
toAmos -s my_reads.seq -o my_reads.afg
minimus my_reads
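The two commands above can be wrapped in a small shell function so that the same prefix is used for both steps. This is only a sketch: the names run_minimus and DRY_RUN are hypothetical and not part of AMOS, and the function assumes the reads file is named after the prefix with a .seq extension, as in the example.

```shell
#!/bin/sh
# Sketch: convert <prefix>.seq to an AMOS message file, then assemble it.
# run_minimus and DRY_RUN are illustrative names, not part of AMOS.

run_minimus() {
    prefix=$1
    reads="${prefix}.seq"

    # Fail early if the FASTA reads file is missing.
    if [ ! -f "$reads" ]; then
        echo "error: $reads not found" >&2
        return 1
    fi

    # With DRY_RUN=1, print the commands instead of running them.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run=echo
    else
        run=
    fi

    # Same two commands as in the text, parameterized by the prefix.
    $run toAmos -s "$reads" -o "${prefix}.afg" || return 1
    $run minimus "$prefix"
}
```

Calling run_minimus my_reads in a directory that contains my_reads.seq runs both steps; setting DRY_RUN=1 first only prints them, which is useful for checking the file names before a real run.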
The output will be a FASTA-formatted file called my_reads.fasta, a contig file with details about the assembly of each contig called my_reads.contig, and an AMOS bank folder with various files used internally by minimus.
The toAmos file conversion utility is the most general and probably the most useful of the file conversion utilities included with minimus. More information about toAmos and the other file conversion utilities can be found in the AMOS documentation wiki. For example, you can include quality data from a Phred-style quality score file by running toAmos with the -q option as follows:
toAmos -s my_reads.fasta -q my_reads.qual -o asm_reads.afg
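Before combining a sequence file with a quality file, it can help to confirm that both contain the same number of records. The sketch below assumes that both files use FASTA-style '>' header lines, as Phred-style .qual files normally do; count_records and check_seq_qual are hypothetical helper names, not AMOS tools.

```shell
#!/bin/sh
# Sketch: compare record counts in a FASTA file and its quality file.
# count_records and check_seq_qual are illustrative names, not part of AMOS.

count_records() {
    # grep -c prints the number of lines that start with '>'.
    grep -c '^>' "$1"
}

check_seq_qual() {
    seq_n=$(count_records "$1")
    qual_n=$(count_records "$2")
    if [ "$seq_n" -ne "$qual_n" ]; then
        echo "mismatch: $seq_n sequences, $qual_n quality records" >&2
        return 1
    fi
    echo "$seq_n records in both files"
}
```

For the example above, check_seq_qual my_reads.fasta my_reads.qual would be run first, and toAmos only if it succeeds. The check compares counts only; it does not verify that the read names match or appear in the same order.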
Minimus can also be called with the following equivalent command:
runAmos -C $AMOSBASE/src/Pipeline/minimus.acf asm_reads
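The runAmos form depends on the AMOSBASE environment variable pointing at an AMOS source tree. A minimal sketch of a guard for that assumption follows; minimus_acf_path is a hypothetical helper name, and the path it checks is simply the one used in the command above.

```shell
#!/bin/sh
# Sketch: resolve the minimus pipeline file and fail clearly if it is absent.
# minimus_acf_path is an illustrative name, not part of AMOS.

minimus_acf_path() {
    # AMOSBASE must be set for the runAmos command in the text to work.
    if [ -z "${AMOSBASE:-}" ]; then
        echo "error: AMOSBASE is not set" >&2
        return 1
    fi

    acf="$AMOSBASE/src/Pipeline/minimus.acf"
    if [ ! -f "$acf" ]; then
        echo "error: $acf not found" >&2
        return 1
    fi

    # Print the resolved path so the caller can pass it to runAmos -C.
    echo "$acf"
}
```

The printed path can then be passed to runAmos with the -C option, exactly as in the command above, so a missing or misconfigured installation is reported before the pipeline starts.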
The AMOS package also includes other helpful tools such as Hawkeye, which is useful for evaluating your assembly with respect to paired-end reads. It can be run on the minimus bank with the following command:
hawkeye asm_reads.bnk/
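Before opening a bank in Hawkeye, the outputs described under Basic usage can be checked with a few lines of shell. The sketch below assumes the outputs are named after the prefix (.fasta, .contig, and a .bnk directory, as in the commands above); check_outputs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that a minimus run left the expected outputs for a prefix.
# check_outputs is an illustrative name, not part of AMOS.

check_outputs() {
    prefix=$1
    status=0

    # The FASTA and contig files should exist and be non-empty.
    for f in "${prefix}.fasta" "${prefix}.contig"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done

    # The AMOS bank is a directory.
    if [ ! -d "${prefix}.bnk" ]; then
        echo "missing bank directory: ${prefix}.bnk" >&2
        status=1
    fi

    [ "$status" -eq 0 ] || return 1

    # Each contig in the FASTA output starts with a '>' header line.
    echo "contigs: $(grep -c '^>' "${prefix}.fasta")"
}
```

Running check_outputs asm_reads reports the number of contigs when all three outputs are present, and lists whatever is missing otherwise.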
Publication
Minimus: a fast, lightweight genome assembler
Sommer, DD, Delcher, AL, Salzberg, SL, and Pop, M. (2007) BMC Bioinformatics, 8:64, doi:10.1186/1471-2105-8-64.
Acknowledgements
The development of minimus was supported by the National Institutes of Health under grants R01-LM06845 and R01-LM007938 to SLS and by Department of Homeland Security cooperative agreement W81XWH-05-2-0051.