Figaro User Manual
The following is a quick how-to for using Figaro. If you have any questions, comments, or suggestions, please email amos-help [at] lists.sourceforge.net.
1. Untar/unzip the Figaro-X.X.tar.gz file using:
tar -xvzf Figaro-X.X.tar.gz
2. cd into Figaro-X.X/
3. If you plan to use Lucy with Figaro, edit the Makefile to include the correct path to Lucy, e.g., LUCY = /fs/sz-user-supported/bin/lucy
4. Install by executing:
5. Now you may run figaro.
figaro -F <fasta file of reads> -P <prefix for output> [options]
- -F – fasta file of reads (required). Do not quality-trim the fasta file of reads beforehand.
- -P – prefix for output files (required). All output files will contain this prefix and will be created in the current directory.
- -T – threshold value (optional). This value limits the number of kmers associated with putative vector sequence. For a given T, the program allows up to T × 100 kmers to be considered putative vector sequence. If T is not provided, Figaro uses a predetermined threshold that depends on the size of the read set. The recommended range for T is 1 to 60.
- -M – maximum cut length (default 100). This is the maximum vector clip length allowed for a read. A safe bet is to set M = 100 or 200.
- -E – end of safe zone (default 500). This represents the end of the considered read sequence. Try to make E as large as possible without going into too much poor-quality sequence on the 3’ ends of reads. The safe zone is E - M bases long.
- -V – verbose output, t or f (default f). Provides a number of additional output files for testing purposes.
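The arithmetic behind -T, -M, and -E can be sketched in a few lines of Python. This is only an illustration of the two relationships stated above (T × 100 kmers, and a safe zone of E - M bases); the function names are hypothetical and are not part of Figaro, and the check that M is smaller than E is an assumption rather than documented behavior.

```python
def kmer_limit(t: int) -> int:
    """Upper bound on kmers treated as putative vector sequence: T x 100."""
    if t < 1:
        raise ValueError("T must be a positive integer")
    return t * 100


def safe_zone_length(e: int = 500, m: int = 100) -> int:
    """Length of the safe zone in bases: E - M (defaults follow the manual)."""
    # Assumption: a usable safe zone needs M < E; Figaro itself may behave differently.
    if m >= e:
        raise ValueError("M must be smaller than E for a non-empty safe zone")
    return e - m


print(kmer_limit(20))              # 2000
print(safe_zone_length())          # 400
print(safe_zone_length(600, 200))  # 400
```

For example, raising E from 500 to 600 while raising M from 100 to 200 leaves the safe zone length unchanged at 400 bases.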
Of the output files, most users only need two: the .vectorcuts file and the .summary file. The .vectorcuts file contains the final 5’ vector trim points. The .summary file is self-explanatory.
If you’re interested in trying to improve the vector trim points, use verbose output and inspect the .vectorclrs file, which is a kmer frequency chart of each read up to the safe zone. Comparing the frequency counts with the final cut point chosen for each read gives an idea of whether reads are being overtrimmed or undertrimmed.
If the frequency tables contain many nonzero, low-frequency values, it’s likely that many reads are being overtrimmed. A “-1” value indicates an endmer. The algorithm searches for a run of high frequency followed directly by an endmer. This is usually visible by eye, but difficult for the algorithm when too many vectormers are declared. If it looks like you’re overtrimming a significant number of reads, decrease the T parameter and try again.
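When inspecting frequency counts by eye, the pattern to look for (a high count followed directly by a -1 endmer) can be illustrated with a toy Python sketch. This is not Figaro's algorithm and does not parse the .vectorclrs file, whose exact layout is not described here; it assumes one read's counts are already available as a list of integers, and both the function name and the min_freq cutoff are hypothetical.

```python
from typing import List

ENDMER = -1  # the manual states that a -1 value indicates an endmer


def candidate_cut_points(freqs: List[int], min_freq: int) -> List[int]:
    """Return 0-based indices of endmers directly preceded by a count >= min_freq."""
    hits = []
    for i in range(1, len(freqs)):
        if freqs[i] == ENDMER and freqs[i - 1] >= min_freq:
            hits.append(i)
    return hits


# Made-up counts for illustration only, not real Figaro output.
clean_row = [40, 38, 41, -1, 0, 0, 2, 0]
noisy_row = [40, 3, 2, -1, 1, -1, 0, 0]

print(candidate_cut_points(clean_row, min_freq=30))  # [3]
print(candidate_cut_points(noisy_row, min_freq=30))  # []
```

In the second row the endmers follow low counts, which is the kind of ambiguous picture that suggests trying a different T.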
Usage with Lucy
Users are encouraged to use Figaro with a quality-trimming program such as Lucy. The following scripts are helpful. Note that the Makefile must include the correct path to Lucy before you install (see step 3 of installation).
figaro_lucy -o <prefix> fasta1 ... fastan
This script runs Lucy and Figaro on a set of reads and quality values and outputs a set of clear ranges for the reads that include both vector trimming and quality trimming.
The program assumes that for each file called <file>.seq there is a <file>.qual and a <file>.xml (alternatively, the files may be called fasta.<file>, qual.<file>, and xml.<file>). The xml file is not required, but is helpful. The output is a clear range file, <prefix>.clr.
Be sure Lucy is installed.
- <prefix> – prefix for the output files
- fasta1 ... fastan – list of files to be converted
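Before running the script, it can help to check which companion files it will look for. The Python sketch below derives the expected quality and xml file names from a sequence file name under the two naming conventions described above. The function name is hypothetical, and the sketch only builds names; it does not check that the files exist or reproduce how figaro_lucy itself resolves them.

```python
import os
from typing import Tuple


def companion_files(seq_path: str) -> Tuple[str, str]:
    """Return (qual_path, xml_path) expected alongside a sequence file."""
    directory, name = os.path.split(seq_path)
    if name.startswith("fasta."):
        # Convention 2: fasta.<file>, qual.<file>, xml.<file>
        stem = name[len("fasta."):]
        qual, xml = "qual." + stem, "xml." + stem
    elif name.endswith(".seq"):
        # Convention 1: <file>.seq, <file>.qual, <file>.xml
        stem = name[: -len(".seq")]
        qual, xml = stem + ".qual", stem + ".xml"
    else:
        raise ValueError("unrecognized sequence file name: " + name)
    return os.path.join(directory, qual), os.path.join(directory, xml)


print(companion_files("reads.seq"))    # ('reads.qual', 'reads.xml')
print(companion_files("fasta.reads"))  # ('qual.reads', 'xml.reads')
```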
figaro_trim_seqs -o <prefix> -c <trim pts> -f <read sequences> -q <quality values>
This script physically trims a set of reads and quality values, outputting a set of clean reads and quality values. Trim points are provided as input in the following format:
<readname> <5' trim pt> <3' trim pt>
Trim points are numbered starting at 1, not 0 (i.e., 2 means the first good base is the 2nd bp).
3' trim points are not required, but if used, must be present for all reads. Quality values are not required, but if used, must be present for all reads. Reads without trim points are ignored.
- -o <output prefix>
- -c <trim pts>
- -f <fasta sequences>
- -q <quality values in fasta format>
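The 1-based trim point convention is easy to get wrong by one base, so here is a minimal Python sketch of how a trim line maps onto a read. It is not the figaro_trim_seqs implementation. It assumes whitespace-separated fields and, since the manual does not say so explicitly, treats the 3' trim point as the 1-based position of the last good base; the function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_trim_line(line: str) -> Tuple[str, int, Optional[int]]:
    """Parse '<readname> <5' trim pt> [<3' trim pt>]' into its fields."""
    fields = line.split()
    if len(fields) not in (2, 3):
        raise ValueError("expected 2 or 3 fields, got: " + line.strip())
    name = fields[0]
    five = int(fields[1])
    three = int(fields[2]) if len(fields) == 3 else None
    return name, five, three


def trim(seq: str, five: int, three: Optional[int] = None) -> str:
    """Keep bases from the 1-based 5' trim point through the 3' trim point."""
    start = five - 1                            # 1-based first good base -> 0-based index
    end = len(seq) if three is None else three  # assumption: 3' point is the last good base
    return seq[start:end]


name, five, three = parse_trim_line("read1 2 6")
print(name, trim("ACGTACGT", five, three))  # read1 CGTAC
print(trim("ACGTACGT", 2))                  # CGTACGT
```

A 5' trim point of 1 therefore leaves the read untouched at its 5' end.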
tarchive2_figaro_lucy_2amos -o <prefix> fasta1 ... fastan
This program takes sequence data from the NCBI Trace Archive and produces a set of clear ranges for the reads that include quality and vector trims from Lucy and Figaro, respectively. An afg file that includes the new clear ranges is output as well.
- <prefix> – prefix for the output files
- fasta1 ... fastan – list of files to be converted
The program assumes that for each file called <file>.seq there is a <file>.qual and a <file>.xml (alternatively, the files may be called fasta.<file>, qual.<file>, and xml.<file>). The output includes a clear range file and an afg file with the new clear ranges. This script will die without the xml file.