toAmos: converter from various types of inputs to AMOS messages

Overview

toAmos is primarily designed to convert the output of an assembly program into the AMOS message format so that it can be stored in an AMOS bank.  toAmos can be used as a replacement for tarchive2amos; however, the latter is more flexible when converting from Trace Archive or simple .seq and .qual inputs.

System requirements

toAmos is written in Perl and requires Perl 5.6.0 or newer.  It was tested on several Unix systems, including Linux RedHat 7.3, OSF5.1, Sun Solaris, and Linux SuSE 9.1, and should run on most Unix systems.

Obtaining toAmos

toAmos can be downloaded as a part of the AMOS package.

This software is OSI Certified Open Source Software.

 

Documentation

Synopsis

toAmos (-m mates|-x traceinfo.xml|-f frg)
       (-c contig|-a asm|-ta tasm|-ace ace|-s fasta|-q qual)
        -o outfile
       [-i insertfile | -map dstmap]
       [-gq goodqual] [-bq badqual]

toAmos reads the inputs specified on the command line and converts the information into AMOS message format.  The following types of information can be provided to toAmos:

  • Sequence and quality data (options -f, -s, -q, -gq, or -bq)
  • Library and mate-pair data (options -m, -x, -f, -i, or -map)
  • Contig data (options -c, -a, -ta, or -ace)
  • Scaffold data (option -a)
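
As an illustration of how the flags in the synopsis combine, the following Python sketch assembles a toAmos command line.  The helper name build_toamos_command, its keyword arguments, and the file names (reads.frg, assembly.asm, assembly.afg) are hypothetical; only the flags themselves come from the synopsis above.  The sketch does not enforce any grouping rules between options beyond requiring at least one input and an output file.

```python
import shlex

def build_toamos_command(outfile, frg=None, mates=None, xml=None,
                         asm=None, contig=None, fasta=None, qual=None):
    """Assemble an argv list for toAmos (hypothetical helper, not part of AMOS)."""
    argv = ["toAmos"]
    # Pair each documented flag with the keyword argument that supplies its file.
    for flag, value in (("-m", mates), ("-x", xml), ("-f", frg),
                        ("-c", contig), ("-a", asm),
                        ("-s", fasta), ("-q", qual)):
        if value is not None:
            argv += [flag, value]
    if len(argv) == 1:
        raise ValueError("at least one input file is needed")
    # -o names the output file in the synopsis.
    argv += ["-o", outfile]
    return argv

# Example: Celera Assembler reads (-f) plus contigs and scaffolds (-a).
cmd = build_toamos_command("assembly.afg", frg="reads.frg", asm="assembly.asm")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run on a system where toAmos is installed and on the PATH.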

Options

-o <outfile> - place output in <outfile>
-m <matefile> - library and mate-pair information in Bambus format
-x <trace.xml> - ancillary data (library, mate-pair, clear range) in Trace Archive format
-f <frg file> - library, mate-pair, sequence, quality, and clear range data in Celera Assembler message format
-s <fasta> - sequence information in multi-FASTA format
-q <qual> - quality information in multi-FASTA format
-gq <goodqual> - if no quality file is provided, bases within the clear range are assigned this quality value (default 30)
-bq <badqual> - if no quality file is provided, bases outside the clear range are assigned this quality value (default 10)
-a <asm file> - contig and scaffold information in Celera Assembler message format
-c <contig file> - contig information in TIGR Assembler GDE-like output
-ta <TA asm file> - contig information in TIGR Assembler .asm output
-ace <ace file> - contig information in ACE format
-map <dstmap> - mapping from internal library ID to external library ID, useful in conjunction with the -f option.  This file consists of space-separated records that map the "acc:" field of each "DST" record in the .frg file to an externally recognizable name for the library.
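
Two of the options above can be illustrated with a short Python sketch: reading a -map file and applying the -gq/-bq defaults.  This is not the toAmos implementation (which is in Perl), and it rests on assumptions not stated above: that a dstmap file holds one record per line with the internal "acc:" value first and the external name second, and that the clear range is expressed as a 0-based, half-open interval.  The function names and sample values are hypothetical.

```python
def parse_dstmap(text):
    """Map each internal library accession to its external name.

    Assumes one record per line: <acc> <external name>.
    """
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 2 fields, got: {line!r}")
        acc, name = fields
        mapping[acc] = name
    return mapping

def default_qualities(length, clear_start, clear_end, good=30, bad=10):
    """Quality values for a read when no quality file is given.

    Assumes a 0-based, half-open clear range [clear_start, clear_end).
    The defaults mirror the documented -gq (30) and -bq (10) values.
    """
    return [good if clear_start <= i < clear_end else bad
            for i in range(length)]

# Made-up sample data for illustration only.
libraries = parse_dstmap("1001 LIB_A\n1002 LIB_B\n")
quals = default_qualities(6, 1, 4)
print(libraries)
print(quals)
```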

TIGR specific options (not too useful outside TIGR)

-i <insert file>  - use mapping from internal library ID to external library ID provided in a .insert file produced by pullfrag.

Known issues

The -ta (TIGR Assembler input) and -ace (ACE formatted input) options have not been thoroughly tested and likely do not work properly.  Contact us if either of these options is important to you.

Contact Information

Please direct your questions and suggestions to:

Acknowledgements

The development of toAmos was supported by the National Science Foundation under grant KDI-9980088 and by the National Institutes of Health under grant R01-LM06845.