Overview
The AMOS package uses a compact representation for exchanging information with the assembler. This representation, the AMOS message format, is described in detail here, and was inspired by the interchange format developed at Celera Genomics for use in Celera Assembler. tarchive2amos is a utility that converts files from the NCBI Trace Archive format into the AMOS message format.

System requirements
tarchive2amos is written in Perl. It requires Perl 5.6.0 or newer and was tested on several Unix systems, including Linux RedHat 7.3, OSF5.1, Sun Solaris, and Linux SuSE 9.1. It should run on most UNIX systems.

Obtaining tarchive2amos
tarchive2amos can be downloaded as part of the AMOS package. This software is OSI Certified Open Source Software.

Documentation

Required inputs
tarchive2amos can use data specified in one of the following three formats:
In addition to these files, the user can provide a list of clear ranges (clipping coordinates) in a separate file. This information overrides any clear ranges set in the XML files. Furthermore, reads not present in the clear range file are excluded from the conversion. If no clear range file is specified, reads with no clear range set in the XML or the sequence file (see below) are assigned a clear range that spans the entire read.

Sequence file formats
tarchive2amos accepts four different formats for the header lines in the sequence file:
Note that the sequence and quality files are linked through the first identifier on the multi-fasta header line. The XML and the sequence files are linked through the TRACE_NAME field in the XML, which must match the trace name portion of the header in the Trace Archive format, or the trace identifier in the other two formats.

Synopsis
tarchive2amos -o <prefix> [-c <clear_ranges>] [-l <libs>]

tarchive2amos will read one or more sequence files (as described above) and place the output in a file called <prefix>.afg. Note that the -o option is required.

A set of clear ranges may be specified in an additional file (with option -c) in the format:
<read id> <clip_left> <clip_right>
These values override any value specified in the XML or sequence files.

In addition to Trace Archive XMLs, tarchive2amos also accepts library and read mate information in a Bambus-style .mates file. Furthermore, library information can also be provided with the -l option in a file formatted as follows:
<lib_id> <mean_size> <size_stdev>

Additional options
-i <id> - specifies the starting identifier for the messages generated. This option is useful when appending to an already existing AMOS bank.

Notes
The program produces rather verbose output when inconsistencies are found in the data.

Contact Information
Please direct your questions and suggestions to:

Acknowledgements
The development of tarchive2amos was supported by the National Science Foundation under grant KDI-9980088 and by the National Institutes of Health under grant R01-LM06845.
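
Example: clear range precedence
The clear range rules described under Required inputs and the Synopsis can be illustrated with a short Python sketch. This is not code from tarchive2amos (which is written in Perl). The function names, the whitespace-separated parsing, the skipping of malformed lines, and the use of (0, read length) as the full-extent default are all assumptions made for illustration, since the coordinate convention is not stated here.

```python
def parse_clear_ranges(lines):
    """Parse '<read id> <clip_left> <clip_right>' lines into a dict.

    Assumption: fields are whitespace-separated; lines that do not have
    exactly three fields are skipped.
    """
    ranges = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        read_id, left, right = fields
        ranges[read_id] = (int(left), int(right))
    return ranges


def resolve_clear_range(read_id, read_length, embedded_range=None, clear_ranges=None):
    """Return the (left, right) clear range for a read, or None if excluded.

    embedded_range: range found in the XML or sequence file, if any.
    clear_ranges:   dict built from the -c file, or None if no file was given.
    """
    if clear_ranges is not None:
        # The -c file overrides embedded values; reads absent from it
        # are excluded from the conversion.
        return clear_ranges.get(read_id)
    if embedded_range is not None:
        return embedded_range
    # No range anywhere: span the entire read (coordinate convention assumed).
    return (0, read_length)
```

For instance, with a clear range file containing only read r1, a call for r2 returns None (excluded), whereas the same call without a clear range file falls back to the embedded range or to the full read.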
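
Example: reading a library file
The library file format accepted by the -l option (<lib_id> <mean_size> <size_stdev>) is simple enough to check before running a conversion. The Python sketch below is a hypothetical helper, not part of AMOS. It assumes whitespace-separated fields and numeric sizes, and it chooses to reject malformed lines, which may differ from what tarchive2amos itself does.

```python
from typing import NamedTuple


class Library(NamedTuple):
    lib_id: str
    mean_size: float
    size_stdev: float


def parse_library_file(lines):
    """Parse '<lib_id> <mean_size> <size_stdev>' lines into a dict of Library.

    Blank lines are ignored; any other line without exactly three
    fields raises ValueError (a choice made for this sketch).
    """
    libs = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        lib_id, mean, stdev = fields
        libs[lib_id] = Library(lib_id, float(mean), float(stdev))
    return libs
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines.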