ToAmos

From AMOS WIKI

toAmos: converter from various types of inputs to AMOS messages


Overview

toAmos is primarily designed to convert the output of an assembly program into the AMOS message format so that it can be stored in an AMOS bank. toAmos can be used as a replacement for tarchive2amos; however, the latter is more flexible when converting from Trace Archive or simple .seq and .qual inputs.


Synopsis

toAmos -o out_file 
       (-s fasta_reads (-q qual_file) (-gq good_qual) (-bq bad_qual))
       (-c tigr_contig | -a celera_asm [-S][-utg] | -ta tigr_asm | -ace phrap_ace [-phd])
       (-m bambus_mates | -x trace_xml | -f celera_frg [-acc])
       (-arachne arachne_links | -scaff bambus_scaff)
       (-i insert_file | -map dst_map)
       (-pos pos_file)
       (-id min_id)

toAmos reads the inputs specified on the command line and converts the information into AMOS message format. The following types of information can be provided to toAmos:

  • Sequence and quality data (options -f, -s, -q, -gq, or -bq)
  • Library and mate-pair data (options -m, -x, -f, -i, or -map)
  • Contig data (options -c, -a, -ta, or -ace)
  • Scaffold data (options -a, -arachne, or -scaff)
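
As an illustration of how the input categories above combine on one command line, the following is a minimal Python sketch that assembles a toAmos argument list. The helper name build_toamos_command and all file names (reads.fasta, reads.qual, reads.mates, asm.contig, asm.afg) are hypothetical; only the option flags come from the synopsis. The sketch builds and prints the command and does not run it.

```python
import shlex


def build_toamos_command(out_file, **inputs):
    """Build a toAmos argument list from option-name/value pairs.

    Hypothetical helper: keyword names are the option flags from the
    synopsis without the leading dash; a value of True means a bare
    switch such as -S or -utg.
    """
    argv = ["toAmos"]
    for flag, value in inputs.items():
        if value is True:
            argv.append(f"-{flag}")          # bare switch, no argument
        elif value:
            argv.extend([f"-{flag}", str(value)])
    argv.extend(["-o", out_file])            # '-' would mean standard output
    return argv


# Hypothetical input files: reads, qualities, Bambus mates, TIGR contigs.
cmd = build_toamos_command(
    "asm.afg",
    s="reads.fasta",
    q="reads.qual",
    m="reads.mates",
    c="asm.contig",
)
print(shlex.join(cmd))
```

Assuming toAmos is installed and on the PATH, the resulting list could be passed to subprocess.run(cmd, check=True).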

Options

-o <out_file> output filename ('-' for standard output)
-s <fasta_reads> sequence data file in FASTA format (read names ending in .1 or /1 are taken as mate pairs)
-q <qual_file> sequence quality score file in QUAL format
-gq <good_qual> minimum quality score for high-quality bases (default: 30); if no quality file is provided, bases within the clear range are assigned this quality value
-bq <bad_qual> maximum quality score for low-quality bases (default: 10); if no quality file is provided, bases outside the clear range are assigned this quality value
-c <tigr_contig> provide TIGR .contig file in GDE-like format
-a <celera_asm> use Celera Assembler .asm contig file (contig and scaffold information)
-S include the surrogate unitigs in the .asm file as AMOS contigs
-utg include all UTG unitig messages in the .asm file as AMOS contigs
-ta <tigr_asm> contig file in TIGR Assembler format (.tasm)
-ace <phrap_ace> contig file in phrap ACE format (can be accompanied by -q)
-phd read the contents of the PHD files referenced in the ACE file
-m <bambus_mates> library and mate-pair information file in Bambus format
-x <trace_xml> ancillary data file (library, mate-pair, clear range) in Trace Archive XML format
-f <celera_frg> library, mate-pair, sequence, quality, and clear range data file in Celera Assembler format
-acc use accession numbers in FRG files
-arachne <arachne_links> scaffold file in Arachne .links format
-scaff <bambus_scaff> scaffold file in Bambus .scaff format
-map <dst_map> library map file: a mapping from internal library IDs to external library IDs, useful in conjunction with the -f option. The file consists of space-separated records, each mapping the "acc:" field of a "DST" record in the .frg file to an externally recognizable name for that library.
-pos <pos_file> TIGR-style .pos position file
-id <min_id> start numbering contigs at this number
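
The behavior described for -gq and -bq when no quality file is given can be sketched as follows. This is an illustration, not the toAmos implementation: the function name default_qualities is hypothetical, and the clear range is assumed here to be 0-based and half-open, which may differ from the convention toAmos uses internally.

```python
def default_qualities(seq_len, clr_begin, clr_end, good_qual=30, bad_qual=10):
    """Return one quality value per base when no quality file is available.

    Assumption: the clear range is 0-based and half-open,
    i.e. it covers positions clr_begin <= i < clr_end.
    """
    return [
        good_qual if clr_begin <= i < clr_end else bad_qual
        for i in range(seq_len)
    ]


# An 8-base read whose clear range covers positions 2..5.
quals = default_qualities(seq_len=8, clr_begin=2, clr_end=6)
print(quals)  # [10, 10, 30, 30, 30, 30, 10, 10]
```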

TIGR-specific options (not too useful outside TIGR)

  • -i <insert_file> - use the mapping from internal library ID to external library ID provided in a .insert file produced by pullfrag.
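
For readers outside TIGR, the -map option under Options serves the same purpose of translating internal library IDs to external names. A minimal sketch of reading such a map is shown below. It assumes one record per line, with the internal ID (the "acc:" value of a DST record) first and the external name second; that field order, the function name parse_library_map, and the example IDs are assumptions for illustration only.

```python
def parse_library_map(lines):
    """Map internal library IDs (DST "acc:" values) to external names.

    Assumption: each record is one line of space-separated fields,
    internal ID first and external library name second.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete records
        internal_id, external_name = fields[0], fields[1]
        mapping[internal_id] = external_name
    return mapping


# Hypothetical records; a real map would be read from the -map file.
example = ["1001 libA", "1002 libB", ""]
library_names = parse_library_map(example)
print(library_names)
```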


Known issues

The -ta (TIGR Assembler input) option has not been thoroughly tested and likely does not work properly. Contact us if this option is important to you.

Errors

n/a