Minimo

Overview

Minimo is largely based on Minimus and, like it, favours assembly quality over speed. Both assemblers follow the Overlap-Layout-Consensus paradigm.

The main advantage of Minimo over Minimus is that it takes simple FASTA files as input and can export contigs in ACE and FASTA formats. Additional parameters can be used to tune the assembly stringency (minimum overlap length and minimum identity) or to do a strand-specific assembly. Minimo can be used on short reads, but the number of sequences should be kept reasonable.

Generally, decreasing the minimum overlap identity results in a less fragmented but likely less faithful assembly, because sequencing errors or small variations between closely related species (in the case of metagenomic data) might cause chimeric contigs. Similarly, decreasing the minimum overlap length might produce less fragmented but less faithful assemblies. Conversely, increasing the minimum overlap length may sometimes produce better assemblies by resolving small repeated regions.
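
To make the two stringency thresholds concrete, the sketch below checks whether a single overlap would satisfy a minimum length and a minimum identity. It is a conceptual illustration only, not Minimo's actual implementation: the function name overlap_passes and its arguments are hypothetical, and identity is assumed here to be the number of matching bases divided by the overlap length, with an integer percentage threshold.

```shell
# Hypothetical helper, not part of Minimo or AMOS.
# Usage: overlap_passes OVERLAP_LEN N_MATCHES MIN_LEN MIN_IDENT
# Assumes identity = matching bases / overlap length (integer percent).
overlap_passes() {
    overlap_len=$1; n_matches=$2; min_len=$3; min_ident=$4
    # Reject overlaps shorter than the minimum length.
    [ "$overlap_len" -ge "$min_len" ] || return 1
    # matches/len >= min_ident/100, rewritten to avoid integer division.
    [ $((n_matches * 100)) -ge $((min_ident * overlap_len)) ]
}

# A 40 bp overlap with 39 matching bases has 97.5 % identity.
if overlap_passes 40 39 35 98; then echo accepted; else echo rejected; fi
if overlap_passes 40 39 35 95; then echo accepted; else echo rejected; fi
```

The first call prints "rejected" and the second "accepted": relaxing the identity threshold from 98 to 95 lets the same overlap through, which is how a lower threshold can join more reads at the risk of joining the wrong ones.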

Documentation

Documentation on how to run Minimo is obtained by typing:

  Minimo -h

The usage message is:

Minimo is a de novo assembler based on the AMOS infrastructure. Minimo uses a
conservative overlap-layout-consensus algorithm to avoid mis-assemblies and
can be applied to short read or strand-specific assemblies. The input is a
FASTA file and there are options to control the stringency of the assembly
and the processing of the quality scores. By default, the results are in the
AMOS format and written to the directory where the input FASTA file is located.
Usage:
    Minimo FASTA_IN [options]
Options:
    -D QUAL_IN=<file>   Input quality score file (in Phred format)
    -D GOOD_QUAL=<n>    Quality score to set for bases within the clear
                          range if no quality file was given (default: 30)
    -D BAD_QUAL=<n>     Quality score to set for bases outside clear range
                          if no quality file was given (default: 10). If your
                          sequences are trimmed, try the same value as GOOD_QUAL.
    -D MIN_LEN=<n>      Minimum contig overlap length (at least 20 bp, 
                          default: 35)
    -D MIN_IDENT=<d>    Minimum contig overlap identity percentage (between 0
                          and 100 %, default: 98)
    -D STRAND_SPEC=<n>  Do a strand-specific assembly (e.g. for transcripts)
                          (0:no 1:yes, default: 0)
    -D ALN_WIGGLE=<d>   Alignment wiggle value (from 2 for short reads to 15 for
                          long reads, default: 2)
    -D FASTA_EXP=<n>    Export results in FASTA format (0:no 1:yes, default: 0)
    -D ACE_EXP=<n>      Export results in ACE format (0:no 1:yes, default: 0)
    -D OUT_PREFIX=<s>   Prefix to use for the output file path and name
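
The usage message states that MIN_LEN must be at least 20 and that MIN_IDENT must lie between 0 and 100. A small wrapper can check these ranges before an assembly is launched. The sketch below is hypothetical (minimo_cmd is not shipped with AMOS), handles integer values only, and prints the command line rather than running Minimo, so it can be tried without an AMOS installation.

```shell
# Hypothetical wrapper, not shipped with AMOS.
# Usage: minimo_cmd FASTA_IN [MIN_LEN] [MIN_IDENT]
# Validates the ranges given in the usage message, then prints the
# Minimo command line instead of executing it.
minimo_cmd() {
    fasta=$1
    min_len=${2:-35}     # documented default
    min_ident=${3:-98}   # documented default (integers only in this sketch)
    if [ "$min_len" -lt 20 ]; then
        echo "MIN_LEN must be at least 20" >&2
        return 1
    fi
    if [ "$min_ident" -lt 0 ] || [ "$min_ident" -gt 100 ]; then
        echo "MIN_IDENT must be between 0 and 100" >&2
        return 1
    fi
    echo "Minimo $fasta -D MIN_LEN=$min_len -D MIN_IDENT=$min_ident"
}

minimo_cmd my_reads.fa 80 90
```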

Basic usage

To run Minimo, you will need a file of sequences in FASTA format. Assuming your reads are in a file called my_reads.fa, you can run Minimo with the following command:

 Minimo my_reads.fa
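
Because the overview advises keeping the number of input sequences reasonable, it can help to count the reads before starting. The sketch below counts FASTA header lines with grep; the function name count_fasta_records and the two-read test file are made up for illustration.

```shell
# Count FASTA records: each record starts with a ">" header line.
# count_fasta_records is a hypothetical helper name.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Small made-up input so the example runs on its own.
tmp_fasta=$(mktemp)
printf '>read1\nACGTACGT\n>read2\nTTGACCA\n' > "$tmp_fasta"
count_fasta_records "$tmp_fasta"
rm -f "$tmp_fasta"
```

On a real dataset, the same function would be called as count_fasta_records my_reads.fa.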

To export the contigs in FASTA or ACE format (e.g. for downstream processing), use the FASTA_EXP and ACE_EXP options:

 Minimo my_reads.fa -D FASTA_EXP=1 -D ACE_EXP=1

If you need to use a specific overlap length or identity between reads of a contig, try:

 Minimo my_reads.fa -D MIN_LEN=80 -D MIN_IDENT=90
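
Since the best stringency depends on the dataset, one option is to run several assemblies with different identity thresholds and compare them. The sketch below is a dry run that only prints the commands. The function name sweep_commands, the thresholds 94, 96 and 98, and the ident_NN prefixes are arbitrary choices for illustration, and exactly how Minimo turns OUT_PREFIX into output file names is not covered here.

```shell
# Dry-run sketch: print one Minimo command per identity threshold.
# sweep_commands is a hypothetical helper; the thresholds and prefixes
# are arbitrary. Remove "echo" to actually launch the assemblies.
sweep_commands() {
    reads=$1
    for ident in 94 96 98; do
        # A distinct OUT_PREFIX keeps each run's results apart.
        echo Minimo "$reads" -D MIN_IDENT="$ident" -D FASTA_EXP=1 \
            -D OUT_PREFIX="ident_${ident}"
    done
}

sweep_commands my_reads.fa
```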

For the assembly of transcripts or other directional sequence datasets, try a strand-specific assembly:

 Minimo my_reads.fa -D STRAND_SPEC=1

Publication

Next generation sequence assembly with AMOS

Treangen TJ, Sommer DD, Angly FE, Koren S, Pop M. (2011) Curr Protoc Bioinformatics, Chapter 11:Unit 11.8, doi:10.1002/0471250953.bi1108s33