sebhtml edited this page Sep 26, 2011 · 1 revision

NAME

   Ray - assemble genomes in parallel using the message-passing interface

SYNOPSIS

   mpiexec -np NUMBER_OF_RANKS Ray -k KMERLENGTH -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test

DESCRIPTION

   -help
          Displays this help page.

   -version
          Displays Ray version and compilation options.

K-mer length

   -k kmerLength
          Selects the length of k-mers. The default value is 21.
          It must be odd because reverse-complement vertices are stored together.
          The maximum length is defined at compilation by MAXKMERLENGTH.
          Larger k-mers utilise more memory.
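
The odd-length rule can be illustrated with a short sketch. It assumes the usual motivation for this rule in de Bruijn graph assemblers: an odd-length k-mer can never be identical to its own reverse complement, so a k-mer and its reverse complement always form a pair of two distinct sequences. The function names below are illustrative and are not part of Ray.

```python
from itertools import product

# Watson-Crick base pairing used to complement a DNA sequence.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def self_complementary_kmers(k):
    """List every k-mer of length k that equals its own reverse complement."""
    kmers = ("".join(bases) for bases in product("ACGT", repeat=k))
    return [kmer for kmer in kmers if kmer == reverse_complement(kmer)]


# Even k: some k-mers are their own reverse complement (ACGT is one of them).
print(len(self_complementary_kmers(4)))

# Odd k: the middle base would have to be its own complement, so none exist.
print(len(self_complementary_kmers(5)))
```

With an even length, a self-complementary k-mer would collapse a vertex and its reverse complement into a single sequence; an odd length rules that case out.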

Inputs

   -p leftSequenceFile rightSequenceFile [averageOuterDistance standardDeviation]
          Provides two files containing paired-end reads.
          averageOuterDistance and standardDeviation are automatically computed if not provided.

   -i interleavedSequenceFile [averageOuterDistance standardDeviation]
          Provides one file containing interleaved paired-end reads.
          averageOuterDistance and standardDeviation are automatically computed if not provided.

   -s sequenceFile
          Provides a file containing single-end reads.
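
As an illustration of how the input options combine, the following sketch builds an argument list for mpiexec using only options listed on this page. The helper name build_ray_command, its parameters, and the rank count of 8 are hypothetical; the optional averageOuterDistance and standardDeviation values are left out so that they are computed automatically.

```python
def build_ray_command(ranks, kmer_length, paired=(), interleaved=(),
                      single=(), output_dir=None):
    """Build an mpiexec argument list for Ray (illustrative helper only)."""
    if kmer_length % 2 == 0:
        # The help page states that the k-mer length must be odd.
        raise ValueError("the k-mer length must be odd")
    command = ["mpiexec", "-np", str(ranks), "Ray", "-k", str(kmer_length)]
    for left, right in paired:
        command += ["-p", left, right]      # two files of paired-end reads
    for path in interleaved:
        command += ["-i", path]             # one interleaved paired-end file
    for path in single:
        command += ["-s", path]             # one single-end file
    if output_dir is not None:
        command += ["-o", output_dir]
    return command


command = build_ray_command(
    ranks=8,
    kmer_length=21,
    paired=[("l1_1.fastq", "l1_2.fastq"), ("l2_1.fastq", "l2_2.fastq")],
    output_dir="test",
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run on a machine where mpiexec and Ray are installed.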

Outputs

   -o outputDirectory
          Specifies the directory for output files. The default is RayOutput.

   -amos
          Writes the AMOS file RayOutput/AMOS.afg.
          An AMOS file contains read positions on contigs.
          It can be opened with software that has a graphical user interface.

   -write-kmers
          Writes k-mer graph to RayOutput/kmers.txt
          The resulting file is not utilised by Ray.
          The resulting file is very large.

   -write-seeds
          Writes seed DNA sequences to RayOutput/Rank<rank>.RaySeeds.fasta

   -write-extensions
          Writes extension DNA sequences to RayOutput/Rank<rank>.RayExtensions.fasta

   -write-contig-paths
          Writes contig paths with coverage values
          to RayOutput/Rank<rank>.RayContigPaths.txt

Memory usage

   -show-memory-usage
          Shows memory usage. Data is read from /proc on GNU/Linux.
          Needs __linux__ to be defined at compilation.

   -show-memory-allocations
          Shows memory allocation events

Algorithm verbosity

   -show-extension-choice
          Shows the choice made (with other choices) during the extension.

   -show-ending-context
          Shows the ending context of each extension.
          Shows the children of the vertex where extension was too difficult.

   -show-distance-summary
          Shows summary of outer distances used for an extension path.

Assembly options (defaults work well)

   -color-space
          Runs in color-space
          Needs csfasta files. Activated automatically if csfasta files are provided.

   -minimumCoverage minimumCoverage
          Manually sets the minimum coverage.
          If not provided, it is computed by Ray automatically.

   -peakCoverage peakCoverage
          Manually sets the peak coverage.
          If not provided, it is computed by Ray automatically.

   -repeatCoverage repeatCoverage
          Manually sets the repeat coverage.
          If not provided, it is computed by Ray automatically.

Checkpointing

   -write-checkpoints
          Writes checkpoint files.

   -read-checkpoints
          Reads checkpoint files.

   -read-write-checkpoints
          Reads and writes checkpoint files.

Hardware testing

   -test-network-only
          Tests the network and returns. This option enables -write-network-test-raw-data.

   -write-network-test-raw-data
          Writes one additional file per rank detailing the network test.

Debugging

   -run-profiler
          Runs the profiler as the code runs.
          Running the profiler increases running times.

   -show-communication-events
          Shows all messages sent and received.

   -show-read-placement
          Shows read placement in the graph during the extension.

   -debug-bubbles
          Debugs bubble code.
          Bubbles can be due to heterozygous sites, sequencing errors, or other (unknown) events.

   -debug-seeds
          Debugs seed code.
          Seeds are paths in the graph that are likely unique.

   -debug-fusions
          Debugs fusion code.

   -debug-scaffolder
          Debugs the scaffolder.

FILES

Input files

 Note: the file format is determined by the file extension.

 .fasta
 .fasta.gz (needs HAVE_LIBZ=y at compilation)
 .fasta.bz2 (needs HAVE_LIBBZ2=y at compilation)
 .fastq
 .fastq.gz (needs HAVE_LIBZ=y at compilation)
 .fastq.bz2 (needs HAVE_LIBBZ2=y at compilation)
 .sff (paired reads must be extracted manually)
 .csfasta (color-space reads)
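
A minimal sketch of extension-based format detection follows. It only mirrors the list above; the table KNOWN_EXTENSIONS, the function detect_format, and the case-sensitive matching are assumptions made for illustration and do not describe Ray's internal code.

```python
# Extensions listed on this help page, with a short description of each.
KNOWN_EXTENSIONS = {
    ".fasta": "FASTA",
    ".fasta.gz": "FASTA, gzip-compressed (needs HAVE_LIBZ=y)",
    ".fasta.bz2": "FASTA, bzip2-compressed (needs HAVE_LIBBZ2=y)",
    ".fastq": "FASTQ",
    ".fastq.gz": "FASTQ, gzip-compressed (needs HAVE_LIBZ=y)",
    ".fastq.bz2": "FASTQ, bzip2-compressed (needs HAVE_LIBBZ2=y)",
    ".sff": "SFF (paired reads must be extracted manually)",
    ".csfasta": "color-space FASTA",
}


def detect_format(filename):
    """Return the description matching the file extension, or None."""
    # Check longer extensions first so that compound suffixes win.
    for extension in sorted(KNOWN_EXTENSIONS, key=len, reverse=True):
        if filename.endswith(extension):
            return KNOWN_EXTENSIONS[extension]
    return None


for name in ["l1_1.fastq", "reads.fasta.gz", "run.csfasta", "notes.txt"]:
    print(name, "->", detect_format(name))
```

A check like this can be run on input file names before launching a long job, to catch a file whose extension is not in the list.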

Output files

Scaffolds

 RayOutput/Scaffolds.fasta
  The scaffold sequences in FASTA format
 RayOutput/ScaffoldComponents.txt
  The components of each scaffold
 RayOutput/ScaffoldLengths.txt
  The length of each scaffold
 RayOutput/ScaffoldLinks.txt
  Scaffold links

Contigs

 RayOutput/Contigs.fasta
  Contiguous sequences in FASTA format
 RayOutput/ContigLengths.txt
  The lengths of contiguous sequences

Summary

 RayOutput/OutputNumbers.txt
  Overall numbers for the assembly

de Bruijn graph

 RayOutput/CoverageDistribution.txt
  The distribution of coverage values
 RayOutput/CoverageDistributionAnalysis.txt
  Analysis of the coverage distribution
 RayOutput/degreeDistribution.txt
  Distribution of ingoing and outgoing degrees
 RayOutput/kmers.txt
  k-mer graph, required option: -write-kmers
     The resulting file is not utilised by Ray.
     The resulting file is very large.

Assembly steps

 RayOutput/SeedLengthDistribution.txt
  Distribution of seed length
 RayOutput/Rank<rank>.RaySeeds.fasta
  Seed DNA sequences, required option: -write-seeds
 RayOutput/Rank<rank>.RayExtensions.fasta
  Extension DNA sequences, required option: -write-extensions
 RayOutput/Rank<rank>.RayContigPaths.txt
  Contig paths with coverage values, required option: -write-contig-paths
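
Because the seed, extension, and contig path files are written once per rank, it can be convenient to gather them into a single file. The sketch below is one way to do that; the function merge_rank_files and the merged file name are hypothetical, and the sketch assumes each per-rank file ends with a newline.

```python
import glob
import os


def merge_rank_files(output_dir, suffix, merged_path):
    """Concatenate every Rank<rank><suffix> file found in output_dir.

    Returns the list of per-rank files that were merged. Files are taken
    in lexicographic order, so Rank10 sorts before Rank2.
    """
    pattern = os.path.join(output_dir, "Rank*" + suffix)
    paths = sorted(glob.glob(pattern))
    with open(merged_path, "w") as merged:
        for path in paths:
            with open(path) as handle:
                merged.write(handle.read())
    return paths
```

For example, merge_rank_files("RayOutput", ".RaySeeds.fasta", "AllSeeds.fasta") would collect the seed sequences written with -write-seeds.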

Paired reads

 RayOutput/LibraryStatistics.txt
  Estimation of outer distances for paired reads
 RayOutput/Library<LibraryNumber>.txt
  Frequencies for observed outer distances (insert size + read lengths)
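
To show how an average outer distance and a standard deviation relate to a table of observed frequencies, here is a small sketch. It works on an in-memory mapping with made-up values, because the layout of the library files is not described on this page. It uses a plain weighted mean and population standard deviation, which may differ from the estimator Ray applies.

```python
import math


def summarize_outer_distances(frequencies):
    """Return (mean, standard deviation) for a {distance: count} mapping."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("no observations")
    mean = sum(d * count for d, count in frequencies.items()) / total
    variance = sum(
        count * (d - mean) ** 2 for d, count in frequencies.items()
    ) / total
    return mean, math.sqrt(variance)


# Hypothetical observations: outer distance -> number of read pairs.
observed = {190: 10, 200: 80, 210: 10}
mean, deviation = summarize_outer_distances(observed)
print(mean, deviation)
```

Values obtained this way play the same role as the optional averageOuterDistance and standardDeviation arguments of -p and -i.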

Partition

 RayOutput/NumberOfSequences.txt
  Number of reads in each file
 RayOutput/SequencePartition.txt
  Sequence partition

Ray software

 RayOutput/RayVersion.txt
  The version of Ray
 RayOutput/RayCommand.txt
  The exact command that was provided

AMOS

 RayOutput/AMOS.afg
  Assembly representation in AMOS format, required option: -amos

Communication

 RayOutput/MessagePassingInterface.txt
  Number of messages sent
 RayOutput/NetworkTest.txt
  Latencies in microseconds
 RayOutput/RankNetworkTestData.txt
  Network test raw data

DOCUMENTATION

   This help page (always up-to-date)
   Manual (Portable Document Format): InstructionManual.pdf
   Mailing list archives: http://sourceforge.net/mailarchive/forum.php?forum_name=denovoassembler-users

AUTHOR

   Written by Sébastien Boisvert.

REPORTING BUGS

   Report bugs to denovoassembler-users@lists.sourceforge.net
   Home page: http://denovoassembler.sourceforge.net/

COPYRIGHT

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, version 3 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You have received a copy of the GNU General Public License
   along with this program (see LICENSE).

Ray 1.7-devel
