AMOSLib
Overview
Data is exchanged with the AMOS package through AMOS message files. This format is documented in detail separately and was inspired by the interchange format developed at Celera Genomics for use in the Celera Assembler.
To help those who want to read these files directly, the AMOS::AmosLib Perl module provides a set of routines that abstract the message structure.
AMOS::AmosLib is fully documented through perldoc.
System requirements
AMOS::AmosLib is written in Perl. It requires Perl 5.6.0 or newer. It was tested on several UNIX systems, including Red Hat Linux 7.3, OSF5.1, Sun Solaris, and SuSE Linux 9.1, and should run on most UNIX systems.
Obtaining AMOS::AmosLib
AMOS::AmosLib can be downloaded as a part of the AMOS package.
SYNOPSIS
use AMOS::AmosLib;
AMOS/Celera Assembler message processing
my $rec = getRecord(\*STDIN);
Reads from the given file handle (STDIN in this example) the text between the outermost { and its matching }. For example, if the input is:
{A {B } }
getRecord returns the whole record, {A{B}}, rather than stopping at the first }.
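The brace-matching behavior described above can be sketched as follows. This is a minimal, hypothetical re-implementation (the name get_record is made up for illustration), not the AmosLib source; it assumes braces never appear inside field values, and it counts every { and } on each line so that the one-line example above also works.

```perl
use strict;
use warnings;

# Hypothetical sketch of getRecord-style reading: accumulate lines from a
# file handle, tracking brace depth, and return the text of one outermost
# {...} record. Text before the first '{' is skipped. Returns undef when no
# record text is read; returns the partial text if input ends before the
# record closes.
sub get_record {
    my ($fh) = @_;
    my $rec     = '';
    my $depth   = 0;
    my $started = 0;
    while (defined(my $line = <$fh>)) {
        # Count the opening and closing braces on this line.
        my $opens  = () = $line =~ /\{/g;
        my $closes = () = $line =~ /\}/g;
        $started = 1 if $opens > 0;
        next unless $started;
        $rec   .= $line;
        $depth += $opens - $closes;
        return $rec if $depth <= 0;    # outermost '}' reached
    }
    return length($rec) ? $rec : undef;
}

# Usage: read two multi-line records from an in-memory file handle.
my $input = "{A\n{B\n}\n}\n{C\n}\n";
open(my $fh, '<', \$input) or die "cannot open string: $!";
while (defined(my $rec = get_record($fh))) {
    print "record:\n$rec";
}
```

Reading from an in-memory handle keeps the example self-contained; in practice you would pass \*STDIN or a handle opened on a message file.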
my($id, $fields, $recs) = parseRecord($rec);
Parses a record and returns a triplet consisting of:
- the record type
- a reference to a hash of fields and their values
- a reference to an array of sub-records
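To illustrate the triplet, here is a hypothetical parse_record sketch. It is not the AmosLib implementation and rests on an assumed message layout: a record opens with "{TYPE" on its own line and closes with "}" on its own line; single-line fields are written "tag:value"; a field written "tag:" with nothing after the colon is multi-line and runs until a line containing only "."; nested records start with "{". Sub-records are returned as raw text so that each one can be passed back to the parser. Joining multi-line values with newlines is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of parseRecord-style parsing under the assumed layout
# described in the lead-in. Returns (type, \%fields, \@subrecord_texts).
sub parse_record {
    my ($rec) = @_;
    my @lines = split /\n/, $rec;
    my $first = shift @lines;
    die "not a record\n" unless defined $first && $first =~ /^\{(\w+)/;
    my $type = $1;
    pop @lines if @lines && $lines[-1] =~ /^\}\s*$/;   # drop outer '}'

    my (%fields, @recs);
    while (@lines) {
        my $line = shift @lines;
        if ($line =~ /^\{/) {
            # Nested sub-record: collect lines until its braces balance.
            my $sub   = "$line\n";
            my $depth = 1;
            while ($depth > 0 && @lines) {
                my $l = shift @lines;
                $depth++ if $l =~ /^\{/;
                $depth-- if $l =~ /^\}/;
                $sub .= "$l\n";
            }
            push @recs, $sub;
        }
        elsif ($line =~ /^(\w+):(.*)$/) {
            my ($tag, $value) = ($1, $2);
            if ($value eq '') {
                # Multi-line value: lines up to a lone '.' terminator.
                my @body;
                while (@lines) {
                    my $l = shift @lines;
                    last if $l eq '.';
                    push @body, $l;
                }
                $value = join("\n", @body);
            }
            $fields{$tag} = $value;
        }
    }
    return ($type, \%fields, \@recs);
}

# Usage: a record with one single-line field, one multi-line field,
# and one nested sub-record.
my $msg = "{RED\niid:1\nseq:\nACGT\nTTGA\n.\n{SUB\nx:2\n}\n}\n";
my ($type, $fields, $recs) = parse_record($msg);
print "$type iid=$fields->{iid} subrecords=", scalar(@$recs), "\n";
```

Because the second and third values are references, fields are read as $fields->{tag} and sub-records as @$recs, which matches the scalar assignment shown in the synopsis.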
my($id) = getCAId($CAid);
Obtains the ID from a "paired" ID; that is, it converts (10, 1000) into 10. If the ID is not a pair in parentheses, the input is returned unchanged. Thus, getCAId('(10, 1000)') returns 10, while getCAId("abba") returns "abba".
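The rule above amounts to a single pattern match. Below is a hypothetical sketch (get_ca_id is an illustrative name, not the library's code). It assumes the first component contains no commas or whitespace.

```perl
use strict;
use warnings;

# Hypothetical sketch of getCAId: if the ID looks like "(first, second)",
# return the first component; otherwise return the input unchanged.
sub get_ca_id {
    my ($id) = @_;
    return $1 if $id =~ /^\s*\(\s*([^,\s]+)\s*,\s*[^)]*\)\s*$/;
    return $id;
}

print get_ca_id('(10, 1000)'), "\n";   # prints 10
print get_ca_id('abba'), "\n";         # prints abba
```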
Fasta file creation
printFastaSequence($file, $header, $seq);
Prints a sequence in Fasta format. Inputs are:
- $file: output file, opened for writing
- $header: Fasta header (without the leading >)
- $seq: sequence to be written
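A minimal sketch of what printing in Fasta format involves: a ">" header line followed by the sequence split into fixed-width lines. The name print_fasta_sequence and the 60-character line width are assumptions for illustration; AmosLib may wrap differently.

```perl
use strict;
use warnings;

# Assumed line width; AmosLib's actual width may differ.
use constant FASTA_LINE_WIDTH => 60;

# Hypothetical sketch of printFastaSequence: write a '>' header line, then
# the sequence wrapped at FASTA_LINE_WIDTH characters per line.
sub print_fasta_sequence {
    my ($file, $header, $seq) = @_;
    print $file ">$header\n";
    for (my $i = 0; $i < length($seq); $i += FASTA_LINE_WIDTH) {
        print $file substr($seq, $i, FASTA_LINE_WIDTH), "\n";
    }
}

print_fasta_sequence(\*STDOUT, 'read1 example read', 'ACGT' x 20);
```

Any handle opened for writing works as $file, for example \*STDOUT or a lexical handle returned by open.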
printFastaQual($file, $header, $qual);
Prints quality values in Fasta format. Inputs are:
- $file: output file
- $header: Fasta header (without the leading >)
- $qual: string of quality values
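The quality-file variant can be sketched in the same way, under two assumptions not stated above: $qual holds whitespace-separated integers, and values are written 20 per line. The name print_fasta_qual is hypothetical.

```perl
use strict;
use warnings;

# Assumed number of quality values per output line.
use constant QUALS_PER_LINE => 20;

# Hypothetical sketch of printFastaQual: split the quality string on
# whitespace and print the values in fixed-size groups after the header.
sub print_fasta_qual {
    my ($file, $header, $qual) = @_;
    my @q = split ' ', $qual;    # splits on runs of whitespace
    print $file ">$header\n";
    while (my @chunk = splice(@q, 0, QUALS_PER_LINE)) {
        print $file join(' ', @chunk), "\n";
    }
}

print_fasta_qual(\*STDOUT, 'read1', join(' ', (30) x 25));
```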
Sequence processing
my($rev) = reverseComplement($seq);
Reverse-complements a sequence.
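Reverse-complementing means reversing the string and then replacing each base with its complement. Below is a minimal sketch using Perl's tr/// operator. The name reverse_complement is hypothetical, and the handling of characters other than A, C, G, T (left unchanged here, so N stays N and IUPAC codes such as R or Y are not complemented) is an assumption; AmosLib may treat them differently.

```perl
use strict;
use warnings;

# Hypothetical sketch: reverse the sequence, then complement A<->T and
# C<->G in both cases. Other characters are left unchanged.
sub reverse_complement {
    my ($seq) = @_;
    my $rev = reverse $seq;    # scalar reverse reverses the string
    $rev =~ tr/ACGTacgt/TGCAtgca/;
    return $rev;
}

print reverse_complement('AACGTTTG'), "\n";   # prints CAAACGTT
```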
TIGR .contig format generation
printContigRecord($file, $id, $len, $nseq, $sequence, $how);
Prints a contig in the specified format. Inputs are:
- $file: output file, opened for writing
- $id: contig ID
- $len: contig length
- $nseq: number of sequences in the contig (the same as the number of sequence records that will follow the contig)
- $sequence: consensus sequence for the contig
- $how: type of output required:
  - contig: TIGR .contig format
  - asm: TIGR .asm format
  - fasta: multi-Fasta format
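The following is a rough sketch of how the $how switch might dispatch. It covers only the contig and fasta modes. The contig header layout ("##id nseq len bases, 00000000 checksum.") is an assumption modeled on the TIGR .contig convention, as are the bare ">id" Fasta header and the 60-character wrapping. The name print_contig_record is hypothetical, and asm mode is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical sketch of printContigRecord for the 'contig' and 'fasta'
# modes only; header layouts and line width are assumptions.
sub print_contig_record {
    my ($file, $id, $len, $nseq, $sequence, $how) = @_;
    if ($how eq 'contig') {
        print $file "##$id $nseq $len bases, 00000000 checksum.\n";
    }
    elsif ($how eq 'fasta') {
        print $file ">$id\n";
    }
    else {
        die "print_contig_record: mode '$how' not handled in this sketch\n";
    }
    # Wrap the consensus at 60 characters per line (assumed width).
    for (my $i = 0; $i < length($sequence); $i += 60) {
        print $file substr($sequence, $i, 60), "\n";
    }
}

print_contig_record(\*STDOUT, 1, 8, 2, 'ACGTACGT', 'contig');
```

In the .contig layout, the contig record is followed by $nseq sequence records, which is why the count is passed in explicitly.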
printSequenceRecord($file, $name, $seq, $offset, $rc, $seqleft, $seqright, $asml, $asmr, $type);
Prints the record for a sequence aligned to a contig. Inputs are:
- $file: output file, opened for writing
- $name: sequence name
- $seq: actual sequence
- $offset: offset in the consensus
- $rc: "RC" if the sequence is reverse complemented, "" otherwise
- $seqleft, $seqright: alignment range within the sequence
- $asml, $asmr: alignment range within the consensus
- $type: type of output:
  - contig: TIGR .contig format
  - asm: TIGR .asm format
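This sketch shows how these inputs map onto a single record line in contig mode. The layout "#name(offset) [RC] N bases, 00000000 checksum. {seqleft seqright} <asml asmr>" is an assumption based on the TIGR .contig convention, with N taken as the length of $seq. The name print_sequence_record is hypothetical, and asm mode is not covered.

```perl
use strict;
use warnings;

# Hypothetical sketch of printSequenceRecord in 'contig' mode only. The
# record layout is assumed, not taken from AmosLib.
sub print_sequence_record {
    my ($file, $name, $seq, $offset, $rc, $seqleft, $seqright,
        $asml, $asmr, $type) = @_;
    die "print_sequence_record: only 'contig' mode is sketched\n"
        unless $type eq 'contig';
    my $nbases = length $seq;
    print $file "#$name($offset) [$rc] $nbases bases, 00000000 checksum.",
                " {$seqleft $seqright} <$asml $asmr>\n";
    # Wrap the aligned sequence at 60 characters per line (assumed width).
    for (my $i = 0; $i < $nbases; $i += 60) {
        print $file substr($seq, $i, 60), "\n";
    }
}

print_sequence_record(\*STDOUT, 'read1', 'ACGTAC', 0, '', 1, 6, 1, 6, 'contig');
```

Passing "" for $rc produces an empty "[]" marker, and "RC" marks a reverse-complemented read.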