Difference between revisions of "ToAmos"

From AMOS WIKI

Revision as of 16:10, 28 February 2011

toAmos: converter from various types of inputs to AMOS messages


Overview

toAmos is primarily designed for converting the output of an assembly program into the AMOS format so that it can be stored in an AMOS bank. toAmos can be used as a replacement for tarchive2amos; however, the latter is more flexible when converting from Trace Archive or simple .seq and .qual inputs.


Synopsis

toAmos -o out_file 
       (-s fasta_reads (-q qual_file) (-gq good_qual) (-bq bad_qual))
       (-c tigr_contig | -a celera_asm [-S][-utg] | -ta tigr_asm | -ace phrap_ace [-phd])
       (-m bambus_mates | -x trace_xml | -f celera_frg [-acc])
       (-arachne arachne_links | -scaff bambus_scaff)
       (-i insert_file | -map dst_map)
       (-pos pos_file)
       (-id min_id)

toAmos reads the inputs specified on the command line and converts the information into AMOS message format. The following types of information can be provided to toAmos:

  • Sequence and quality data (options -f, -s, -q, -gq, or -bq)
  • Library and mate-pair data (options -m, -x, -f, -i, or -map)
  • Contig data (options -c, -a, -ta, or -ace)
  • Scaffold data (options -a, -arachne, or -scaff)

Options

-o <out_file> output filename ('-' for standard output)
-s <fasta_reads> sequence data file in FASTA format (read names ending in .1 or /1 are taken as mate pairs)
-q <qual_file> sequence quality score file in QUAL format
-gq <good_qual> minimum quality score for high-quality bases (default: 30) - if no quality file is provided, bases within the clear range are assigned this quality value
-bq <bad_qual> maximum quality score for low-quality bases (default: 10) - if no quality file is provided, bases outside the clear range are assigned this quality value
-c <tigr_contig> provide TIGR .contig file in GDE-like format
-a <celera_asm> use Celera Assembler .asm contig file (contig and scaffold information)
-S include the surrogate unitigs in the .asm file as AMOS contigs
-utg include all UTG unitig messages in the .asm file as AMOS contigs
-ta <tigr_asm> contig file in TIGR Assembler format (.tasm)
-ace <phrap_ace> contig file in Phrap ACE format (can be accompanied by -q)
-phd read the content of PHD file referenced in ACE files
-m <bambus_mates> library and mate-pair information file in Bambus format
-x <trace_xml> ancillary data file (library, mate-pair, clear range) in Trace Archive XML format
-f <celera_frg> library, mate-pair, sequence, quality, and clear range data file in Celera Assembler format
-acc use accession numbers in FRG files
-arachne <arachne_links> scaffold file in Arachne .links format
-scaff <bambus_scaff> scaffold file in Bambus .scaff format
-map <dst_map> read map information - a mapping from internal library ID to external library ID, useful in conjunction with the -f option. The file consists of space-separated records that map the "acc:" field of each "DST" record in the .frg file to an externally recognizable name for the library.
-pos <pos_file> TIGR-style .pos position file
-id <min_id> start numbering contigs at this number
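
The fallback behaviour described for -gq and -bq (when no quality file is given) can be illustrated with a short Python sketch. This is not toAmos code: the function name default_qualities is hypothetical, and the clear range is assumed here to be 0-based and half-open, which the text above does not specify.

```python
def default_qualities(seq_len, clear_start, clear_end, good_qual=30, bad_qual=10):
    """Return one quality value per base when no QUAL file is available.

    Assumption (not from the toAmos docs): the clear range is 0-based and
    half-open, i.e. it covers positions clear_start <= i < clear_end.
    """
    if not 0 <= clear_start <= clear_end <= seq_len:
        raise ValueError("clear range must lie within the sequence")
    # Bases inside the clear range get the -gq value, all others the -bq value.
    return [
        good_qual if clear_start <= i < clear_end else bad_qual
        for i in range(seq_len)
    ]


print(default_qualities(8, 2, 6))
# [10, 10, 30, 30, 30, 30, 10, 10]
```

The defaults of 30 and 10 mirror the documented defaults of -gq and -bq.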

TIGR-specific options (not too useful outside TIGR)

  • -i <insert_file> - use the mapping from internal library ID to external library ID provided in a .insert file produced by pullfrag.
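
Both -i and -map supply a mapping from internal to external library IDs. For the -map file, which the Options section describes as space-separated records, a reader might look like the Python sketch below. It assumes one record per line with exactly two fields (the "acc:" value, then the external name); that layout, the function name parse_dst_map, and the sample IDs are illustrative assumptions, not taken from the toAmos source.

```python
def parse_dst_map(text):
    """Parse 'internal_id external_name' records into a dict.

    Assumption: one record per line, two whitespace-separated fields.
    """
    mapping = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(fields)}")
        internal_id, external_name = fields
        mapping[internal_id] = external_name
    return mapping


# Hypothetical file content: DST accession followed by a library name.
example = "1001 libA\n1002 libB\n"
print(parse_dst_map(example))
# {'1001': 'libA', '1002': 'libB'}
```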


Known issues

The -ta (TIGR Assembler input) and -ace (ACE formatted input) options have not been thoroughly tested and likely do not work properly. Contact us if either of these options is important to you.


Errors

toAmos -c my.test.contig -m my.test.mates -o my.test.afg
Cannot find ID for sequence lid05.f

This error occurs when no FASTA file of read sequences is passed (with -s) alongside the contig file. This is somewhat surprising, since the reads are already present in the .contig file.
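
One way to avoid this error when scripting toAmos is to refuse to build a -c command line without -s. The Python sketch below only assembles the argument list from options documented on this page; the helper name build_toamos_command and the file name my.test.seq are hypothetical.

```python
def build_toamos_command(out_file, contig_file, mates_file=None, fasta_reads=None):
    """Build a toAmos argument list for TIGR .contig input."""
    if fasta_reads is None:
        # Without -s, toAmos reports "Cannot find ID for sequence ...".
        raise ValueError("-c also needs the read sequences: pass a FASTA file with -s")
    cmd = ["toAmos", "-s", fasta_reads, "-c", contig_file]
    if mates_file is not None:
        cmd += ["-m", mates_file]
    cmd += ["-o", out_file]
    return cmd


cmd = build_toamos_command(
    "my.test.afg", "my.test.contig",
    mates_file="my.test.mates", fasta_reads="my.test.seq",  # my.test.seq is hypothetical
)
print(" ".join(cmd))
# toAmos -s my.test.seq -c my.test.contig -m my.test.mates -o my.test.afg
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where toAmos is installed.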