Figaro User Manual

From AMOS WIKI
Revision as of 20:51, 12 July 2009 by Mcschatz

The following is a quick how-to for using Figaro. If you have any questions, comments, or suggestions, please email amos-help [at] lists.sourceforge.net.


Installation

1. Untar/unzip the Figaro-X.X.tar.gz file using:

   tar -xvzf Figaro-X.X.tar.gz

2. cd into Figaro-X.X/

3. If you plan to use Lucy with Figaro, edit the Makefile to include the correct path to Lucy, e.g. LUCY = /fs/sz-user-supported/bin/lucy

4. Install by executing:

   make install

5. Now you may run figaro.


Usage

figaro -F <fasta file of reads> -P <prefix for output> [options]

Basic parameters:

  • -F – fasta file of reads (required). Do not quality-trim the fasta file of reads beforehand.
  • -P – prefix for output files (required). All output files will contain this prefix and will be created in the current directory.
  • -T – threshold value (optional). Limits the number of kmers treated as putative vector sequence: Figaro allows up to T × 100 kmers. If T is not provided, Figaro uses a predetermined threshold that depends on the size of the read set. The recommended range of T is 1 to 60.
  • -M – maximum cut length (default 100). The maximum vector clip length allowed for a read. Setting M to 100 or 200 is a safe choice.
  • -E – end of safe zone (default 500). The end of the read sequence that Figaro considers. Make E as large as possible without extending far into the poor-quality sequence at the 3’ ends of reads. The safe zone is E-M bases long.
  • -V – verbose output, t or f (default f). Produces a number of additional output files for testing purposes.
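
The parameters above can be sketched in Python as a small helper that assembles a figaro argument list and works out the two derived quantities (the T × 100 kmer cap and the E-M safe zone length). The function names, the file names reads.fasta and myrun, and the check that E exceeds M are assumptions made for illustration; only the flags themselves come from this manual.

```python
import shlex


def build_figaro_command(fasta, prefix, threshold=None, max_cut=100,
                         safe_zone_end=500, verbose=False):
    """Assemble a figaro argument list from the documented flags."""
    # Assumption: an empty or negative safe zone (E <= M) is treated as a
    # user error here; the manual only says the safe zone is E - M long.
    if safe_zone_end <= max_cut:
        raise ValueError("E must be larger than M for a non-empty safe zone")
    cmd = ["figaro", "-F", fasta, "-P", prefix,
           "-M", str(max_cut), "-E", str(safe_zone_end)]
    if threshold is not None:
        # Leaving -T out lets Figaro choose its predetermined threshold.
        cmd += ["-T", str(threshold)]
    if verbose:
        # The manual lists the verbose values as t or f (default f).
        cmd += ["-V", "t"]
    return cmd


def safe_zone_length(safe_zone_end=500, max_cut=100):
    """The safe zone is E - M bases long."""
    return safe_zone_end - max_cut


def max_vector_kmers(threshold):
    """Up to T x 100 kmers may be considered putative vector sequence."""
    return threshold * 100


if __name__ == "__main__":
    cmd = build_figaro_command("reads.fasta", "myrun", threshold=20)
    print(shlex.join(cmd))          # command line to run by hand
    print(safe_zone_length())       # defaults: 500 - 100
    print(max_vector_kmers(20))     # kmer cap for T = 20
```

The returned list could be passed to subprocess.run once figaro is installed and on the PATH.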


[Figure: FigaroClearRange.png]


Output files

Two output files matter most: the .vectorcuts file and the .summary file. The .vectorcuts file contains the final 5’ vector trim points. The .summary file is self-explanatory.

If you want to improve the vector trim points, enable verbose output and inspect the .vectorclrs file, which is a kmer frequency chart of each read up to the safe zone. Comparing the frequency counts with the final cut point chosen for each read gives an idea of whether reads are being overtrimmed or undertrimmed.

If the frequency tables contain many non-zero low-frequency values, it is likely that many reads are being overtrimmed. A “-1” value indicates an endmer. The algorithm searches for a stretch of high frequency followed directly by an endmer. This pattern is usually visible by eye, but it is difficult for the algorithm to find when too many vectormers are declared. If a significant number of reads appear to be overtrimmed, decrease the T parameter and try again.
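
As a rough illustration of the pattern described above, the following Python sketch scans one read's frequency counts for an endmer (-1) that directly follows a high count, and counts the low non-zero values that hint at overtrimming. It works on a plain in-memory list of integers; the on-disk layout of the .vectorclrs file is not described here, and the cutoff named high is a hypothetical value chosen for the example, not a Figaro parameter.

```python
def find_cut_candidates(freqs, high=10):
    """Return positions of endmers (-1) directly preceded by a count >= high.

    `high` is a hypothetical cutoff for "high frequency"; Figaro's own
    criterion is not documented in this manual.
    """
    return [i for i in range(1, len(freqs))
            if freqs[i] == -1 and freqs[i - 1] >= high]


def count_low_nonzero(freqs, high=10):
    """Count non-zero values below the cutoff (endmers at -1 are excluded)."""
    return sum(1 for f in freqs if 0 < f < high)


if __name__ == "__main__":
    # Made-up counts for one read: a high-frequency run, then an endmer.
    freqs = [40, 38, 41, -1, 0, 0, 2, 0]
    print(find_cut_candidates(freqs))
    print(count_low_nonzero(freqs))
```

Many reads with a large count_low_nonzero value would correspond to the situation in which lowering T is suggested.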


Usage with Lucy

Users are encouraged to use Figaro with a quality-trimming program such as Lucy. The following scripts are helpful. Note that the Makefile must include the correct path to Lucy (step 3 of installation) before you run make install.


figaro_lucy

figaro_lucy -o <prefix> fasta1 ... fastan

DESCRIPTION

This script runs Lucy and Figaro on a set of reads and quality values and outputs a set of clear ranges for the reads which includes vector trimming and quality trimming.

The program assumes that for each file called <file>.seq there is a <file>.qual and a <file>.xml (alternatively, the files may be called fasta.<file>, qual.<file> and xml.<file>). The xml file is not required, but is helpful. The output is a clear range file, <prefix>.clr.
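
The naming convention above can be checked before running the script. The Python sketch below derives the expected companion file names for either naming scheme; the function name companion_files and the example paths are assumptions for illustration and are not part of Figaro.

```python
from pathlib import Path


def companion_files(seq_path):
    """Return the expected quality and xml file paths for a sequence file.

    Supports both conventions from the manual:
      <file>.seq    -> <file>.qual, <file>.xml
      fasta.<file>  -> qual.<file>, xml.<file>
    """
    p = Path(seq_path)
    name = p.name
    if name.endswith(".seq"):
        stem = name[:-len(".seq")]
        return {"qual": p.with_name(stem + ".qual"),
                "xml": p.with_name(stem + ".xml")}
    if name.startswith("fasta."):
        rest = name[len("fasta."):]
        return {"qual": p.with_name("qual." + rest),
                "xml": p.with_name("xml." + rest)}
    raise ValueError(f"{seq_path} follows neither naming convention")


if __name__ == "__main__":
    for seq in ["data/lib1.seq", "fasta.lib2"]:
        for kind, path in companion_files(seq).items():
            # Report which companions exist on disk.
            status = "found" if path.exists() else "missing"
            print(f"{seq}: {kind} file {path} is {status}")
```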

Be sure Lucy is installed.

OPTIONS

  <prefix> - prefix for the output files
  fasta1 ... fastan - list of files to be converted.



figaro_trim_seqs

figaro_trim_seqs -o <prefix> -c <trim pts> -f <read sequences> -q <quality values>

DESCRIPTION

This script physically trims a set of reads and quality values, outputting a set of clean reads and quality values. Trim points are provided as input in the following format:

<readname>  <5' trim pt>  <3' trim pt>

Trim points begin at 1, not 0 (i.e., 2 means the first good base is the 2nd bp).

3' trim points are not required, but if used, must be present for all reads. Quality values are not required, but if used, must be present for all reads. Reads without trim points are ignored.
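
To make the 1-based convention concrete, here is a minimal Python sketch of parsing trim-point lines and applying them to read sequences. It is not the figaro_trim_seqs implementation. It assumes that the 3' trim point is the 1-based position of the last good base (the manual does not state this), and all function and read names are hypothetical.

```python
def parse_trim_points(lines):
    """Parse '<readname> <5' trim pt> [<3' trim pt>]' lines into a dict."""
    trims = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        five = int(fields[0 + 1])
        three = int(fields[2]) if len(fields) > 2 else None
        trims[fields[0]] = (five, three)
    return trims


def trim_read(seq, five, three=None):
    """Keep bases from the 5' trim point through the 3' trim point.

    Trim points are 1-based, so a 5' trim point of 2 drops one base.
    Assumption: the 3' trim point is the last good base (inclusive).
    """
    end = three if three is not None else len(seq)
    return seq[five - 1:end]


def trim_all(reads, trims):
    """Trim every read that has trim points; reads without them are ignored."""
    return {name: trim_read(seq, *trims[name])
            for name, seq in reads.items() if name in trims}


if __name__ == "__main__":
    trims = parse_trim_points(["read1 2 6", "read2 1 8"])
    reads = {"read1": "ACGTACGT", "read2": "TTGGCCAA", "read3": "AAAA"}
    print(trim_all(reads, trims))  # read3 has no trim points and is dropped
```

Quality values could be trimmed the same way by applying the same slice to each read's list of scores.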

OPTIONS

  • -o <output prefix>
  • -c <trim pts>
  • -f <fasta sequences>
  • -q <quality values in fasta format>


tarchive2_figaro_lucy_2amos

tarchive2_figaro_lucy_2amos -o <prefix> fasta1 ... fastan

DESCRIPTION

This program takes sequence data from the NCBI Trace Archive and produces a set of clear ranges for the reads, combining quality trims from Lucy and vector trims from Figaro. An afg file that includes the new clear ranges is output as well.

OPTIONS

  <prefix> - prefix for the output files
  fasta1 ... fastan - list of files to be converted.
          The program assumes that for each file called <file>.seq there
          is a <file>.qual and a <file>.xml. (alternatively the files may
          be called fasta.<file>, qual.<file> and xml.<file>).  The output
          includes a clear range file, and an afg file with the new clear
          ranges. This script will die without the xml file.