Difference between revisions of "Minimo"

From AMOS WIKI

Revision as of 03:22, 19 October 2011

Overview

Minimo is largely based on Minimus and, as such, favours assembly quality over speed. Use it on moderately sized data sets. Like Minimus, Minimo follows the Overlap-Layout-Consensus paradigm.

The main advantage of Minimo over Minimus is that it takes simple FASTA files as input and can export the contigs in ACE and FASTA formats. In addition, two parameters can be used to tune the assembly stringency: the minimum overlap length and the minimum overlap identity.

Generally, decreasing the minimum overlap identity results in a less fragmented but likely less faithful assembly, because sequencing errors or small variations between closely related species (in the case of metagenomic data) might cause chimeric contigs. Similarly, decreasing the minimum overlap length might produce less fragmented, less faithful assemblies. However, increasing the minimum overlap length may sometimes also produce better assemblies by resolving the assembly of small repeated regions.
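One way to explore this trade-off is to assemble the same reads at several identity thresholds and compare the results. The sketch below is only a dry run: it prints one Minimo command per threshold, using options listed in the usage message. The function name minimo_sweep, the read file name and the output prefix pattern are hypothetical.

```shell
#!/bin/sh
# Dry run: print one Minimo command per identity threshold.
# $1 is the FASTA file of reads; remaining arguments are MIN_IDENT values.
# The output prefix pattern (asm_ident<N>) is an illustrative assumption.
minimo_sweep() {
    reads=$1
    shift
    for ident in "$@"; do
        echo "Minimo $reads -D MIN_IDENT=$ident -D FASTA_EXP=1 -D OUT_PREFIX=asm_ident$ident"
    done
}

minimo_sweep my_reads.fa 98 95 90
```

If Minimo is installed, the printed commands can be reviewed and then run, for example by piping them to sh. Each run gets its own prefix so that the assemblies do not overwrite each other.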

Documentation

Documentation on how to run Minimo is obtained by typing:

  Minimo -h

The usage message is:

Minimo is a de novo assembler based on the AMOS infrastructure. Minimo uses a
conservative overlap-layout-consensus algorithm to avoid mis-assemblies and
can be applied to short read or strand-specific assemblies. The input is a
FASTA file and there are options to control the stringency of the assembly
and the processing of the quality scores. By default, the results are in the
AMOS format and written to the directory where the input FASTA file is located.
Usage:
    Minimo FASTA_IN [options]
Options:
    -D QUAL_IN=<file>   Input quality score file (in Phred format)
    -D GOOD_QUAL=<n>    Quality score to set for bases within the clear
                          range if no quality file was given (default: 30)
    -D BAD_QUAL=<n>     Quality score to set for bases outside clear range
                          if no quality file was given (default: 10). If your
                          sequences are trimmed, try the same value as GOOD_QUAL.
    -D MIN_LEN=<n>      Minimum contig overlap length (at least 20 bp, 
                          default: 35)
    -D MIN_IDENT=<d>    Minimum contig overlap identity percentage (between 0
                          and 100 %, default: 98)
    -D STRAND_SPEC=<n>  Do a strand-specific assembly (e.g. for transcripts)
                          (0:no 1:yes, default: 0)
    -D ALN_WIGGLE=<d>   Alignment wiggle value (from 2 for short reads to 15 for
                          long reads, default: 2)
    -D FASTA_EXP=<n>    Export results in FASTA format (0:no 1:yes, default: 0)
    -D ACE_EXP=<n>      Export results in ACE format (0:no 1:yes, default: 0)
    -D OUT_PREFIX=<s>   Prefix to use for the output file path and name
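The quality options interact as follows: either a quality file is given with QUAL_IN, or GOOD_QUAL and BAD_QUAL supply default scores. The following sketch builds the quality-related part of a command line for both cases, following the usage message's advice to set BAD_QUAL to the same value as GOOD_QUAL for trimmed reads. The helper name minimo_quality_opts and the file names are hypothetical, and nothing is executed.

```shell
#!/bin/sh
# Print the quality-related options for a Minimo run (dry run only).
# $1: quality file, or empty if there is none
# $2: score to use for trimmed reads without a quality file (default: 30)
minimo_quality_opts() {
    qual_file=${1:-}
    score=${2:-30}
    if [ -n "$qual_file" ]; then
        echo "-D QUAL_IN=$qual_file"
    else
        # No quality file: use the same score inside and outside the clear range
        echo "-D GOOD_QUAL=$score -D BAD_QUAL=$score"
    fi
}

echo "Minimo my_reads.fa $(minimo_quality_opts my_reads.qual)"
echo "Minimo my_trimmed_reads.fa $(minimo_quality_opts "")"
```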

Basic usage

To run Minimo, you need a set of sequences in a FASTA file. Assuming you have a set of reads in FASTA format called my_reads.fa, you can run Minimo with the following command:

 Minimo my_reads.fa

To export the contigs in FASTA or ACE format (e.g. for downstream processing), use the FASTA_EXP and ACE_EXP options:

 Minimo my_reads.fa -D FASTA_EXP=1 -D ACE_EXP=1
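A quick check on an exported FASTA file is to count its records, since every FASTA record starts with a ">" header line. The sketch below is a minimal helper for that. The function name count_fasta_records is hypothetical, and the demonstration uses a small temporary file because the name of the file Minimo writes depends on OUT_PREFIX and is not stated here.

```shell
#!/bin/sh
# Count the sequences in a FASTA file by counting ">" header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Demonstration on a temporary two-record file (not real Minimo output)
tmp=$(mktemp)
printf '>contig1\nACGT\n>contig2\nGGCC\n' > "$tmp"
count_fasta_records "$tmp"   # prints 2
rm -f "$tmp"
```

Comparing this count between runs gives a rough measure of how fragmented each assembly is.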

To set a specific minimum overlap length (in bp) or minimum overlap identity (in %), use the MIN_LEN and MIN_IDENT options:

 Minimo my_reads.fa -D MIN_LEN=80 -D MIN_IDENT=90
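The usage message states limits for both options: MIN_LEN must be at least 20 bp and MIN_IDENT must lie between 0 and 100. The sketch below checks values against those limits before a command is built. The function name check_minimo_params is hypothetical, and integer values are assumed for simplicity, although Minimo itself may accept decimal identities.

```shell
#!/bin/sh
# Validate MIN_LEN ($1) and MIN_IDENT ($2) against the documented limits.
# Assumption: both values are non-negative integers.
check_minimo_params() {
    min_len=${1:-}
    min_ident=${2:-}
    for value in "$min_len" "$min_ident"; do
        case $value in
            ''|*[!0-9]*)
                echo "error: '$value' is not a non-negative integer" >&2
                return 1 ;;
        esac
    done
    if [ "$min_len" -lt 20 ]; then
        echo "error: MIN_LEN must be at least 20" >&2
        return 1
    fi
    if [ "$min_ident" -gt 100 ]; then
        echo "error: MIN_IDENT must be between 0 and 100" >&2
        return 1
    fi
    return 0
}

# Dry run: print the command only if the values pass the check
check_minimo_params 80 90 && echo "Minimo my_reads.fa -D MIN_LEN=80 -D MIN_IDENT=90"
```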