AMOScmp-shortReads-alignmentTrimmed

From AMOS WIKI

Overview

AMOScmp-shortReads-alignmentTrimmed is a modified version of AMOScmp designed for alignment-based trimming and assembly of short reads. Compared to AMOScmp, it:

  • runs a reference-based alignment trimming of the reads prior to the assembly
  • uses a smaller nucmer alignment cluster size
  • uses a smaller make-consensus alignment wiggle value


Algorithm

Trimming is performed in several processing steps after the reads have been aligned to the reference. Steps:

  1. identify the zero coverage regions in the reference sequence (delta2cvg)
  2. extract the read clear ranges from the alignment file (delta2clr)
  3. extend the read clear ranges for the ones adjacent to zero coverage regions (delta2clr)
  4. update the bank with the new clear ranges (updateClrRanges)
  5. update the alignment file with the new read lengths and clear ranges (updateDeltaClr)
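
The first and third steps can be illustrated with a small Python sketch. It is not the code of delta2cvg or delta2clr: the function names are hypothetical, and alignments are assumed to be given as (read_id, ref_start, ref_end) tuples with 1-based inclusive reference coordinates. The sketch only finds the uncovered regions and the reads whose alignments touch them; how far a clear range is then extended is not specified here and is left out.

```python
def zero_coverage_regions(ref_len, alignments):
    """Return 1-based inclusive (start, end) reference intervals covered by no read."""
    gaps = []
    next_uncovered = 1  # first reference position not yet known to be covered
    for _, start, end in sorted(alignments, key=lambda a: a[1]):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        gaps.append((next_uncovered, ref_len))
    return gaps


def gap_adjacent_reads(alignments, gaps):
    """Map read_id -> set of sides ('left', 'right') on which its alignment touches a gap."""
    gap_starts = {start for start, _ in gaps}
    gap_ends = {end for _, end in gaps}
    adjacent = {}
    for read_id, start, end in alignments:
        sides = set()
        if start - 1 in gap_ends:
            sides.add("left")
        if end + 1 in gap_starts:
            sides.add("right")
        if sides:
            adjacent[read_id] = sides
    return adjacent


# Made-up alignments on a reference of length 20
alignments = [("r1", 1, 5), ("r2", 4, 8), ("r3", 12, 15)]
gaps = zero_coverage_regions(20, alignments)
print(gaps)  # [(9, 11), (16, 20)]
```

In this toy input, r2 ends right before the first gap and r3 is flanked by gaps on both sides, so these are the reads whose clear ranges would be candidates for extension.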


Three Perl scripts recently added to the AMOS package (release 2.0.8) are called by the AMOScmp-shortReads-alignmentTrimmed pipeline. Their main purpose is to parse and update the nucmer alignment (delta) file. Scripts:

  • delta2cvg: computes the alignment coverage of the reference sequence(s)
  • delta2clr: computes the minimum 5' and maximum 3' alignment coordinates of the aligned reads
  • updateDeltaClr: shifts each read's alignment coordinates to the left by the read's minimum 5' coordinate (for reads whose minimum 5' coordinate is greater than 0)
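
The coordinate bookkeeping described for delta2clr and updateDeltaClr can be sketched in Python as follows. This is an illustration under stated assumptions, not the Perl implementation: the function names are hypothetical, each alignment is reduced to a (read_id, q_start, q_end) tuple of read coordinates, the 5' coordinate is treated as a 0-based offset, and the shift is assumed to equal the read's minimum 5' coordinate.

```python
def clear_ranges(hits):
    """Return {read_id: (min_5prime, max_3prime)} over all alignments of each read."""
    ranges = {}
    for read_id, q_start, q_end in hits:
        # Reverse-strand alignments list the larger coordinate first
        lo, hi = min(q_start, q_end), max(q_start, q_end)
        if read_id in ranges:
            old_lo, old_hi = ranges[read_id]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        ranges[read_id] = (lo, hi)
    return ranges


def shift_hits(hits, ranges):
    """Shift every alignment left by its read's minimum 5' coordinate."""
    shifted = []
    for read_id, q_start, q_end in hits:
        offset = ranges[read_id][0]  # 0 means the read needs no shift
        shifted.append((read_id, q_start - offset, q_end - offset))
    return shifted


# Made-up read coordinates: r1 has two alignments, r2 has one
hits = [("r1", 3, 30), ("r1", 10, 34), ("r2", 0, 36)]
ranges = clear_ranges(hits)
print(ranges)                    # {'r1': (3, 34), 'r2': (0, 36)}
print(shift_hits(hits, ranges))  # [('r1', 0, 27), ('r1', 7, 31), ('r2', 0, 36)]
```

After trimming, a read starts at its former minimum 5' position, which is why the alignment coordinates of r1 move left by 3 while those of r2 stay unchanged.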


Parameters

AMOScmp-shortReads-alignmentTrimmed accepts additional parameters beyond those of AMOScmp. Defaults:

  MINCLUSTER  = 16
  MINMATCH    = 16
  MINLEN      = 24  # delta-filter -l 24
  MINOVL      = 5
  MAXTRIM     = 10
  MAJORITY    = 50
  CONSERR     = 0.06
  ALIGNWIGGLE = 2
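
As a hedged illustration of how these defaults might be overridden, the Python sketch below builds a command line for the pipeline. The helper build_command is hypothetical, and the `-D NAME=value` option is an assumption based on the convention used by runAmos-driven scripts; check the script's usage message before relying on it.

```python
# Default values copied from the list above
DEFAULTS = {
    "MINCLUSTER": 16,
    "MINMATCH": 16,
    "MINLEN": 24,
    "MINOVL": 5,
    "MAXTRIM": 10,
    "MAJORITY": 50,
    "CONSERR": 0.06,
    "ALIGNWIGGLE": 2,
}


def build_command(prefix, overrides=None):
    """Return an argument list; only values that differ from the defaults are passed."""
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(unknown))
    cmd = ["AMOScmp-shortReads-alignmentTrimmed"]
    for name, value in overrides.items():
        if value != DEFAULTS[name]:
            # Assumption: variables are set with -D NAME=value
            cmd += ["-D", f"{name}={value}"]
    cmd.append(prefix)
    return cmd


print(" ".join(build_command("prefix", {"MINOVL": 10})))
# AMOScmp-shortReads-alignmentTrimmed -D MINOVL=10 prefix
```

The resulting list can be passed to subprocess.run; rejecting unknown names up front avoids silently passing a misspelled parameter to the pipeline.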


How to Run

Input files: Assuming that prefix is the name of the organism to assemble, two files are required:

  1. prefix.1con : reference sequence of a related organism in FASTA format (complete or well assembled, usually downloaded from GenBank)
  2. prefix.afg : AMOS message file that contains read/fragment messages corresponding to each short read; it can be generated using the toAmos script

Example:

  $ toAmos -s prefix.seq -o prefix.afg                           # create an AMOS message file from short read FASTA sequences
  $ AMOScmp-shortReads-alignmentTrimmed prefix                   # assemble reads (alignment-based trimming, default parameters)
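
Before starting a run, it can be useful to confirm that both input files are in place. The following Python sketch does only that; the helper name missing_inputs is hypothetical, and the sketch assumes the files sit in a single directory and are named prefix.1con and prefix.afg as described above.

```python
from pathlib import Path


def missing_inputs(prefix, directory="."):
    """Return the required input file names that are absent from the directory."""
    required = [f"{prefix}.1con", f"{prefix}.afg"]
    return [name for name in required if not (Path(directory) / name).is_file()]


missing = missing_inputs("prefix")
if missing:
    print("missing input file(s):", ", ".join(missing))
else:
    print("all input files present")
```

An empty list means the pipeline can be started with the given prefix; the check does not validate the contents of either file.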