Difference between revisions of "Minimo"
Revision as of 03:23, 19 October 2011
Overview
Minimo is largely based on Minimus, and as such favours assembly quality over speed. Use on moderately-sized data! Minimo follows the Overlap-Layout-Consensus paradigm just like Minimus.
The main advantage of Minimo over Minimus is that it takes simple FASTA files as input and can export contigs in ACE and FASTA formats. In addition, two parameters can be used to tune the assembly stringency (minimum overlap length and minimum overlap identity).
Generally, decreasing the minimum overlap identity results in a less fragmented but likely less faithful assembly, as sequencing errors or small variations between closely related species (in the case of metagenomic data) might cause chimeric contigs. Similarly, decreasing the minimum overlap length might produce less fragmented, less faithful assemblies. However, increasing the minimum overlap length may sometimes also produce better assemblies by resolving the assembly of small repeated regions.
Documentation
Documentation on how to run Minimo is obtained by typing:
Minimo -h
The usage message is:
Minimo is a de novo assembler based on the AMOS infrastructure. Minimo uses a
conservative overlap-layout-consensus algorithm to avoid mis-assemblies and
can be applied to short read or strand-specific assemblies. The input is a
FASTA file and there are options to control the stringency of the assembly
and the processing of the quality scores. By default, the results are in the
AMOS format and written to the directory where the input FASTA file is located.
Usage:
    Minimo FASTA_IN [options]
Options:
    -D QUAL_IN=<file>   Input quality score file (in Phred format)
    -D GOOD_QUAL=<n>    Quality score to set for bases within the clear
                          range if no quality file was given (default: 30)
    -D BAD_QUAL=<n>     Quality score to set for bases outside clear range
                          if no quality file was given (default: 10). If your
                          sequences are trimmed, try the same value as GOOD_QUAL.
    -D MIN_LEN=<n>      Minimum contig overlap length (at least 20 bp, 
                          default: 35)
    -D MIN_IDENT=<d>    Minimum contig overlap identity percentage (between 0
                          and 100 %, default: 98)
    -D STRAND_SPEC=<n>  Do a strand-specific assembly (e.g. for transcripts)
                          (0:no 1:yes, default: 0)
    -D ALN_WIGGLE=<d>   Alignment wiggle value (from 2 for short reads to 15 for
                          long reads, default: 2)
    -D FASTA_EXP=<n>    Export results in FASTA format (0:no 1:yes, default: 0)
    -D ACE_EXP=<n>      Export results in ACE format (0:no 1:yes, default: 0)
    -D OUT_PREFIX=<s>   Prefix to use for the output file path and name
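The ranges stated in the usage message (MIN_LEN of at least 20 bp, MIN_IDENT between 0 and 100 %) can be checked before launching a run. The following is a minimal sketch, not part of Minimo: the function name minimo_cmd is hypothetical, and it only prints the command line instead of running the assembler.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Minimo): validates the stringency
# values against the ranges given in the usage message, then prints the
# corresponding command line without executing it.
minimo_cmd() {
    reads=$1
    min_len=$2
    min_ident=$3

    # MIN_LEN must be a whole number of at least 20 bp
    case $min_len in
        ''|*[!0-9]*) echo "MIN_LEN must be an integer" >&2; return 1 ;;
    esac
    if [ "$min_len" -lt 20 ]; then
        echo "MIN_LEN must be at least 20" >&2
        return 1
    fi

    # MIN_IDENT must be a percentage between 0 and 100 (decimals allowed)
    if ! awk -v v="$min_ident" \
        'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 <= 100) }'; then
        echo "MIN_IDENT must be between 0 and 100" >&2
        return 1
    fi

    echo "Minimo $reads -D MIN_LEN=$min_len -D MIN_IDENT=$min_ident"
}

# Example: prints the command for an 80 bp, 90 % identity assembly
minimo_cmd my_reads.fa 80 90
```

Once the printed command looks right, it can be run directly, or the echo can be replaced by the actual call.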
Basic usage
To run Minimo, you will need a set of sequences. Assuming you have a set of reads in FASTA format called my_reads.fa, you can run Minimo with the following command:
Minimo my_reads.fa
To export the contigs in a FASTA file or in ACE format (e.g. for downstream processing), use the FASTA_EXP and ACE_EXP options:
Minimo my_reads.fa -D FASTA_EXP=1 -D ACE_EXP=1
If you need to use a specific overlap length or identity between reads of a contig, try:
Minimo my_reads.fa -D MIN_LEN=80 -D MIN_IDENT=90
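Because the best stringency depends on the dataset, it can be useful to compare assemblies made at several identity thresholds. The sketch below only prints one command per threshold (a dry run). The function name sweep_cmds and the sweep_id prefix are illustrative assumptions; the options themselves are those listed in the usage message.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints one Minimo command per identity
# threshold, giving each run its own OUT_PREFIX so that the results of
# one run do not overwrite those of another.
sweep_cmds() {
    reads=$1
    shift
    for ident in "$@"; do
        echo "Minimo $reads -D MIN_IDENT=$ident -D FASTA_EXP=1 -D OUT_PREFIX=sweep_id$ident"
    done
}

# Example: three thresholds, from permissive to the default of 98
sweep_cmds my_reads.fa 90 95 98
```

Piping the printed lines to sh would launch the runs one after another; the exported FASTA files can then be compared for contig count and length.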
For the assembly of transcripts or other directional sequence datasets, try:
Minimo my_reads.fa -D STRAND_SPEC=1

