AMOSLib

Overview

Programs interact with the AMOS package through AMOS message files. This representation, described in detail in the AMOS message format documentation, was inspired by the interchange format developed at Celera Genomics for use in Celera Assembler.


To help those who want to read these files directly, the AMOS::AmosLib Perl module provides a set of routines that abstract the message structure.


AMOS::AmosLib is fully documented through perldoc.


System requirements

AMOS::AmosLib is written in Perl. It requires Perl 5.6.0 or newer and has been tested on several Unix systems, including Linux RedHat 7.3, OSF5.1, Sun Solaris, and Linux SuSE 9.1. It should run on most Unix systems.


Obtaining AMOS::AmosLib

AMOS::AmosLib can be downloaded as a part of the AMOS package.


SYNOPSIS

use AMOS::AmosLib;


AMOS/Celera Assembler message processing

my $rec = getRecord(\*STDIN);

Reads the text between the outermost { and } from the given file handle (STDIN in the call above). For example, if the input is:

{A
{B
}
}

getRecord returns the whole record: {A{B}}
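
The brace matching described above can be sketched in a few lines of standalone Perl. This is an illustration only, not the AMOS::AmosLib implementation: the name read_outer_record is hypothetical, and the sketch assumes that braces appear only as record delimiters.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AMOS::AmosLib): read one complete
# top-level record from a file handle by tracking brace depth.
# Assumes braces appear only as record delimiters.
sub read_outer_record {
    my ($fh) = @_;
    my $depth  = 0;
    my $record = '';
    while (defined(my $line = <$fh>)) {
        my $opens  = ($line =~ tr/{//);   # count opening braces on this line
        my $closes = ($line =~ tr/}//);   # count closing braces on this line
        next if $depth == 0 && $opens == 0;   # skip text outside any record
        $record .= $line;
        $depth  += $opens - $closes;
        return $record if $depth == 0;        # outermost } reached
    }
    return;                                   # no complete record found
}

# Usage: read the nested example from an in-memory file handle.
my $input = "{A\n{B\n}\n}\n";
open(my $in, '<', \$input) or die "cannot open in-memory handle: $!";
my $rec = read_outer_record($in);
close($in);
print $rec;
```

Calling the helper repeatedly on the same handle yields one top-level record per call until the input is exhausted, which mirrors how getRecord is typically used in a loop.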


my($id, $fields, $recs) = parseRecord($rec);

Parses a record and returns a triplet consisting of:

  • record type
  • hash of fields and values
  • array of sub-records
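
To make the shape of that triplet concrete, here is a simplified standalone sketch. It is not the AMOS::AmosLib code: split_record is a hypothetical name, and the sketch assumes a record begins with a {TYPE line, that fields are single-line name:value pairs, and that sub-records can be kept as raw text. Multi-line fields are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AMOS::AmosLib): split a record into
# its type, a hash of single-line "name:value" fields, and a list of
# nested sub-records kept as raw text. Multi-line fields are not handled.
sub split_record {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    my $first = shift @lines;
    my ($type) = $first =~ /^\{(\w+)/
        or die "record does not start with {TYPE\n";
    my (%fields, @subrecords);
    my $depth   = 0;     # nesting depth below the top-level record
    my $current = '';    # text of the sub-record being collected
    for my $line (@lines) {
        last if $depth == 0 && $line =~ /^\}/;   # end of top-level record
        $depth++ if $line =~ /^\{/;              # a sub-record opens
        if ($depth > 0) {
            $current .= "$line\n";
            if ($line =~ /^\}/ && --$depth == 0) {
                push @subrecords, $current;      # sub-record is complete
                $current = '';
            }
            next;
        }
        $fields{$1} = $2 if $line =~ /^(\w+):(.*)$/;
    }
    return ($type, \%fields, \@subrecords);
}

my ($type, $fields, $subs) = split_record("{A\nx:1\n{B\ny:2\n}\n}\n");
print "type=$type x=$fields->{x} subrecords=", scalar(@$subs), "\n";
# prints: type=A x=1 subrecords=1
```

Returning the hash and the array by reference keeps the three return values distinct in list context; each raw sub-record can then be passed back through the same routine.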


my($id) = getCAId($CAid);

Obtains the ID from a "paired" ID, that is, converts (10, 1000) into 10. If the ID is not a pair in parentheses, the input is returned unchanged. Thus, getCAId("(10, 1000)") returns 10, while getCAId("abba") returns "abba".
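
The pair-unwrapping rule can be expressed with a single regular expression. The sketch below is standalone and illustrative: first_of_pair is a hypothetical name, and the exact pattern the library uses may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AMOS::AmosLib): return the first
# element of a "(first, second)" pair, or the input unchanged otherwise.
sub first_of_pair {
    my ($id) = @_;
    if ($id =~ /^\s*\(\s*([^,\s]+)\s*,[^)]*\)\s*$/) {
        return $1;
    }
    return $id;
}

print first_of_pair('(10, 1000)'), "\n";   # prints 10
print first_of_pair('abba'), "\n";         # prints abba
```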


Fasta file creation

printFastaSequence($file, $header, $seq);

Prints a sequence in Fasta format. Inputs are:

  • $file - output file opened for writing
  • $header - Fasta header (without >)
  • $seq - sequence to be written
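
As a rough picture of what such a routine produces, the standalone sketch below writes a header line followed by the sequence wrapped to a fixed width. The name write_fasta, the optional $width parameter, and the 60-column default are assumptions made for illustration and are not taken from AMOS::AmosLib.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AMOS::AmosLib): write one Fasta record.
# The 60-column default wrap is an assumption chosen for illustration.
sub write_fasta {
    my ($fh, $header, $seq, $width) = @_;
    $width = 60 unless defined $width;
    print {$fh} ">$header\n";                    # header line gets the leading >
    for (my $i = 0; $i < length($seq); $i += $width) {
        print {$fh} substr($seq, $i, $width), "\n";
    }
    return;
}

# Usage: write an 80-base sequence to an in-memory file handle.
my $out = '';
open(my $fa, '>', \$out) or die "cannot open in-memory handle: $!";
write_fasta($fa, 'contig1 example', 'ACGT' x 20);
close($fa);
print $out;
```

Passing the file handle as the first argument, as the library routines do, lets the same code write to STDOUT, a file, or an in-memory buffer.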


printFastaQual($file, $header, $qual);

Prints quality values in Fasta format. Inputs are:

  • $file - output file
  • $header - Fasta header (without >)
  • $qual - string of quality values


Sequence processing

my($rev) = reverseComplement($seq);

Reverse complements a sequence.
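
For readers unfamiliar with the operation, a minimal standalone sketch follows. The name revcomp is hypothetical, and the handling of characters other than A, C, G, and T is an assumption: here they pass through unchanged, which may differ from the library's behavior.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AMOS::AmosLib): reverse the sequence
# and swap each base with its complement. Only A, C, G, T (either case)
# are translated; other characters such as N pass through unchanged.
sub revcomp {
    my ($seq) = @_;
    my $rev = reverse $seq;            # scalar context reverses the string
    $rev =~ tr/ACGTacgt/TGCAtgca/;     # complement each base in place
    return $rev;
}

print revcomp('AACGTN'), "\n";   # prints NACGTT
```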


TIGR .contig format generation

printContigRecord($file, $id, $len, $nseq, $sequence, $how);

Prints a contig in the specified format. Inputs are:

  • $file - output file (opened for writing)
  • $id - contig ID
  • $len - contig length
  • $nseq - number of sequences in contig (same as the number of sequence records that will follow the contig)
  • $sequence - consensus sequence for the contig
  • $how - what type of output is required:
    • contig - TIGR .contig format
    • asm - TIGR .asm format
    • fasta - multi-fasta format


printSequenceRecord($file, $name, $seq, $offset, $rc, $seqleft, $seqright, $asml, $asmr, $type);

Prints the record for a sequence aligned to a contig. Inputs are:

  • $file - output file opened for writing
  • $name - sequence name
  • $seq - actual sequence
  • $offset - offset in consensus
  • $rc - "RC" if sequence is reverse complemented, "" otherwise
  • $seqleft, $seqright - alignment range within sequence
  • $asml, $asmr - alignment range within consensus
  • $type - type of output:
    • contig - output is in TIGR .contig format
    • asm - output is in TIGR .asm format