From AMOS WIKI

Revision as of 02:21, 15 February 2011

FRCurve: Feature-Response Curve

Overview

Inspired by the standard receiver operating characteristic (ROC) curve, the Feature-Response curve characterizes the sensitivity (coverage) of the sequence assembler as a function of its discrimination threshold (number of features).

The AMOS package provides an automated assembly validation pipeline called amosvalidate that analyzes the output of an assembler using a variety of assembly quality metrics (or features). Examples of features include: (M) mate-pair orientations and separations, (K) repeat content by k-mer analysis, (C) depth-of-coverage, (P) correlated polymorphism in the read alignments, and (B) read alignment breakpoints to identify structurally suspicious regions of the assembly. After running amosvalidate on the output of the assembler, each contig is assigned a number of features that correspond to doubtful regions of the sequence.

Given any such set of features, the response (quality) of the assembler output is analyzed as a function of the maximum number of possible errors (features) allowed in the contigs. More specifically, for a fixed feature threshold <math>\phi</math>, the contigs are sorted by size and, starting from the longest, contigs are tallied only if their sum of features is <math>\leq \phi</math>. The genome coverage of this set of contigs is then computed, which yields a single point of the Feature-Response curve. Repeating the computation over a range of thresholds traces out the full curve.
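The procedure above can be illustrated with a minimal sketch. It is not the code of the FRC module; the function names, the (length, feature count) input format, and the genome_size parameter are hypothetical and chosen only for illustration. The sketch also assumes that "sum of features" means the cumulative feature count over the contigs tallied so far, and that tallying stops at the first contig that would push this sum above the threshold.

```python
def frc_point(contigs, phi, genome_size):
    """Return the coverage reached for one feature threshold phi.

    contigs     -- list of (length, feature_count) pairs (hypothetical format)
    phi         -- maximum cumulative number of features allowed
    genome_size -- estimated genome length used to normalise coverage
    """
    tallied_length = 0
    tallied_features = 0
    # Visit contigs from longest to shortest.
    for length, features in sorted(contigs, key=lambda c: c[0], reverse=True):
        # Assumption: stop as soon as the cumulative feature count would exceed phi.
        if tallied_features + features > phi:
            break
        tallied_features += features
        tallied_length += length
    return tallied_length / genome_size


def frc_curve(contigs, genome_size, thresholds):
    """Return one (phi, coverage) point per threshold."""
    return [(phi, frc_point(contigs, phi, genome_size)) for phi in thresholds]


if __name__ == "__main__":
    # Toy data: three contigs on a 1000 bp genome.
    toy_contigs = [(500, 2), (300, 0), (200, 5)]
    for phi, coverage in frc_curve(toy_contigs, 1000, [0, 2, 7]):
        print(phi, coverage)
```

Under these assumptions, a steeper curve indicates an assembly that reaches high coverage while accumulating few features.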

Documentation

Following the AMOS philosophy, the FRCurve is implemented as a pipeline that consists of two steps:

  • 1. invocation of the amosvalidate tool to compute the features for the set of contigs;
  • 2. invocation of the FRC module.

The name of the pipeline in the AMOS distribution is "FRCurve_pipeline".

Documentation on how to run FRCurve_pipeline is obtained by typing:

  FRCurve_pipeline -h

The usage message is:

 Usage:
    Minimo FASTA_IN [options]
 Options:
    -D QUAL_IN=<file>   Input quality score file
    -D GOOD_QUAL=<n>    Quality score to set for bases within the clear
                        range if no quality file was given (default: 30)
    -D BAD_QUAL=<n>     Quality score to set for bases outside clear range
                        if no quality file was given (default: 10). If your
                        sequences are trimmed, try the same value as GOOD_QUAL.
    -D MIN_LEN=<n>      Minimum contig overlap length (at least 20 bp,
                        default: 35)
    -D MIN_IDENT=<d>    Minimum contig overlap identity percentage (between 0
                        and 100 %, default: 98)
    -D ALN_WIGGLE=<d>   Alignment wiggle value (from 2 for short reads to 15 for
                        long reads, default: 2)
    -D FASTA_EXP=<n>    Export results in FASTA format (0:no 1:yes, default: 0)
    -D ACE_EXP=<n>      Export results in ACE format (0:no 1:yes, default: 0)
    -D OUT_PREFIX=<s>   Prefix to use for the output file path and name

Acknowledgements

Research reported here was supported by grants from the NSF CDI program and Abraxis BioScience, LLC.