Version 0.95 beta
V-Z committed Nov 27, 2015
1 parent f5a63ec commit a0fb54a
Showing 7 changed files with 201 additions and 98 deletions.
4 changes: 2 additions & 2 deletions .info
@@ -1,2 +1,2 @@
CURRENTVERSION=0.9
NEWVERSION=https://github.com/V-Z/sondovac/releases/download/v0.9-beta/sondovac-0.9-beta.zip
CURRENTVERSION=0.95
NEWVERSION=https://github.com/V-Z/sondovac/releases/download/v0.95-beta/sondovac-0.95-beta.zip
6 changes: 4 additions & 2 deletions CHANGELOG
@@ -4,13 +4,15 @@ Sondovač is a script to create orthologous low-copy nuclear probes from
transcriptome and genome skim data for target enrichment.


Version 0.95 beta released 2015-11-
Version 0.95 beta released 2015-11-27
================================================================================

* Offer the possibility to choose between transcripts or genome skim sequences
for further processing.
* Colorization of command-line user interface.
* Colorization of command-line user interface (incomplete).
* Added possibility to change -minIdentity parameter of BLAT in step 11, part B.
* Fixed problems with some transcriptome input files.
* Added possibility to set custom bait length.
* Added information about article in MER introducing Sondovač.


103 changes: 57 additions & 46 deletions README
@@ -215,33 +215,33 @@ General parameters:

-h, -v Print help message and exit.
-u Check for updates. If there is newer version of Sondovač available on
https://github.com/V-Z/sondovac/releases/ download of newer version
will be offered to the user.
https://github.com/V-Z/sondovac/releases/ download of newer version
will be offered to the user.
-l Display LICENSE for license information (this script is licensed under
GNU GPL v.3, other software under variable licenses). Exit viewing by
pressing the "Q" key.
GNU GPL v.3, other software under variable licenses). Exit viewing
by pressing the "Q" key.
-r Display README (this file) for detailed usage instructions. Exit
viewing by pressing the "Q" key. More information is available in
PDF manual.
viewing by pressing the "Q" key. More information is available in
PDF manual.
-p Display INSTALL for detailed installation instructions. Exit viewing
by pressing the "Q" key. More information is available in PDF manual.
by pressing the "Q" key. More information is available in PDF manual.
-e Display detailed citation information and exit. See PDF manual for
more information.
more information.
-o Set name of output files. Output files will start with that name. Do
not use spaces or special characters - some software can not handle
them correctly. Default value (if user does not provide) another name
is "output". See below for list of produced output files.
not use spaces or special characters - some software can not handle
them correctly. Default value (if user does not provide) another
name is "output". See below for list of produced output files.
-i Running in interactive mode - script will on-demand ask for required
input files, installation of missing software etc. This is recommended
default value (the script runs interactively without explicit using
option "-n").
input files, installation of missing software etc. This is
recommended default value (the script runs interactively without
explicit using option "-n").
-n Running in non-interactive mode. User must provide at least required
input files (see below). You can use only one of parameters "-i" or
"-n" (not both of them). If script fails to find some of required
software packages, it will exit. This is recommended for batch or
repeated analysis, on remote servers and for more advanced users. User
must be sure that all required software is installed (see INSTALL and
PDF manual for details).
input files (see below). You can use only one of parameters "-i" or
"-n" (not both of them). If script fails to find some of required
software packages, it will exit. This is recommended for batch or
repeated analysis, on remote servers and for more advanced users.
User must be sure that all required software is installed (see
INSTALL and PDF manual for details).

Input files:
Those parameters are required when running in non-interactive mode.
@@ -276,41 +276,42 @@ Optional parameters:
possible to change them any time later (not even in interactive mode).

-a ### Read length of paired-end genome skim reads (parameter -M of FLASH,
see its manual for details).
see its manual for details).
Step 4 of Sondovač, sondovac_part_a.sh.
Ensure to use a certain insert size of the genome skim genomic library
in combination with an appropriate read length for sequencing in order
to enable merging of the paired-end genome skim reads.
in combination with an appropriate read length for sequencing in order
to enable merging of the paired-end genome skim reads.
DEFAULT: 250
OPTIONS: 125, 150, 250, 300
-y ## Sequence similarity between unique transcripts and the filtered,
combined genome skim reads (parameter -minIdentity of BLAT, see its
manual for details).
Step 5 of Sondovač, sondovac_part_a.sh.
combined genome skim reads (parameter -minIdentity of BLAT, see its
manual for details).
Step 5 of Sondovač, sondovac_part_a.sh. Step 11 of Sondovač,
sondovac_part_b.sh.
Consider the trade-off between probe specificity and number of
remaining matching sequences for probe design. Sequence similarity is
in percent.
remaining matching sequences for probe design. Sequence similarity
is in percent.
DEFAULT: 85
OPTIONS: Integer ranging from 70 to 100
-s #### Number of BLAT hits per transcript when matching unique transcripts
and the filtered, combined genome skim reads.
and the filtered, combined genome skim reads.
Step 6.2 of Sondovač, sondovac_part_a.sh.
Transcripts with a high number of BLAT hits, indicating repetitive
elements, need to be removed from the putative probe sequences.
elements, need to be removed from the putative probe sequences.
DEFAULT: 1000
OPTIONS: Integer ranging from 100 to 10000
-b ### Minimum exon (bait) length.
Steps 8 and 10 of Sondovač, sondovac_part_b.sh.
The minimum exon length should not fall below the bait length in order
to facilitate specific binding between genomic libraries and baits
during hybridization.
to facilitate specific binding between genomic libraries and baits
during hybridization.
DEFAULT: 120 (optimal length for phylogeny).
OPTIONS: Integer ranging from 120 to 200
-d 0.## Sequence similarity between probe sequences (parameter -c of
cd-hit-est, see its manual for details).
cd-hit-est, see its manual for details).
Step 9 of Sondovač, sondovac_part_b.sh.
Too similar probe sequences will interact with each other during
hybridization and thereby reduce enrichment efficiency.
hybridization and thereby reduce enrichment efficiency.
DEFAULT: 0.9 (highly recommended).
OPTIONS: Decimal ranging from 0.85 to 0.95
-g Use genome skim sequences instead of transcripts for making the probes.
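
The defaults and allowed values documented for the optional parameters in this hunk can be summarized as a small validation routine. The Python sketch below is illustrative only: Sondovač itself consists of shell scripts, and the function name `validate_options`, the dictionary layout, and the error messages are hypothetical. Only the defaults and ranges are taken from the README text above.

```python
# Defaults and allowed values as documented in the README hunk above.
ALLOWED_READ_LENGTHS = (125, 150, 250, 300)          # option -a
DEFAULTS = {"a": 250, "y": 85, "s": 1000, "b": 120, "d": 0.9}


def validate_options(opts):
    """Merge user options with the documented defaults and check ranges.

    `opts` maps option letters (without the dash) to values.
    Raises ValueError listing every value that is out of range.
    """
    merged = {**DEFAULTS, **opts}
    errors = []
    if merged["a"] not in ALLOWED_READ_LENGTHS:
        errors.append("-a must be one of 125, 150, 250, 300")
    if not (isinstance(merged["y"], int) and 70 <= merged["y"] <= 100):
        errors.append("-y must be an integer from 70 to 100")
    if not (isinstance(merged["s"], int) and 100 <= merged["s"] <= 10000):
        errors.append("-s must be an integer from 100 to 10000")
    if not (isinstance(merged["b"], int) and 120 <= merged["b"] <= 200):
        errors.append("-b must be an integer from 120 to 200")
    if not 0.85 <= merged["d"] <= 0.95:
        errors.append("-d must be a decimal from 0.85 to 0.95")
    if errors:
        raise ValueError("; ".join(errors))
    return merged
```

For example, `validate_options({"y": 90})` returns the defaults with `-y` raised to 90, while `validate_options({"d": 0.5})` raises `ValueError`.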
@@ -349,32 +350,42 @@ All names of input files and paths to them must be without spaces and without
special characters (some software has difficulties to handle them in such case).

Script sondovac_part_a.sh requires as input files:
1) Transcriptome input file in FASTA format.
1) Transcriptome input file in FASTA format. Note: For technical reasons, names
of FASTA sequences in must be only unique number (no any other characters).
Sondovač will check the names and if they are not in appropriate form, copy
of this input files with correct names will be created.
2) Plastome reference sequence input file in FASTA format.
3) Paired-end genome skim input file in FASTQ format (two files).
3) Paired-end genome skim input file in FASTQ format (two files - forward and
reverse reads).
4) OPTIONAL: Mitochondriome reference sequence input file in FASTA format.
This file is not required.

Script sondovac_part_a.sh creates the following files:
0) *_renamed.fasta - If needed, copy of transcriptome input file with changed
names of FASTA sequences (unique numbers respective to line numbers in
original file) will be created. File *_old_and_new_names.tsv then
contains two columns: 1) original sequence names as in user provided
transcriptome input file and 2) new sequence names. This might be useful
to trace back some sequences/probes.
1) *_blat_unique_transcripts.psl - Output of BLAT (removal of transcripts
sharing ≥90% sequence similarity)
2) *_unique_transcripts.fasta - Unique transcripts in FASTA format
sharing ≥90% sequence similarity).
2) *_unique_transcripts.fasta - Unique transcripts in FASTA format.
3) *_genome_skim_data_no_cp_reads.bam - SAM converted to BAM (removal of reads
of plastid origin)
4) *_genome_skim_data_no_cp_reads* - Genome skim data without cpDNA reads
of plastid origin).
4) *_genome_skim_data_no_cp_reads* - Genome skim data without cpDNA reads.
5) *_genome_skim_data_no_cp_no_mt_reads.bam - SAM converted to BAM (removal of
reads of mitochondrial origin) - only if mitochondriome reference
sequence was used
sequence was used.
6) *_genome_skim_data_no_cp_no_mt_reads* - Genome skim data without mtDNA
reads - only if mitochondriome reference sequence was used
7) *_combined_reads_co_cp_no_mt_reads* - Combined paired-end genome skim reads
reads - only if mitochondriome reference sequence was used.
7) *_combined_reads_co_cp_no_mt_reads* - Combined paired-end genome skim reads.
8) *_blat_unique_transcripts_versus_genome_skim_data.pslx - Output of BLAT
(matching of the unique transcripts and the filtered, combined genome
skim reads sharing ≥85% sequence similarity)
skim reads sharing ≥85% sequence similarity).
9) *_blat_unique_transcripts_versus_genome_skim_data.fasta - Matching
sequences in FASTA
sequences in FASTA.
10) *_blat_unique_transcripts_versus_genome_skim_data-no_missing_fin.fsa -
Final FASTA sequences for usage in Geneious
Final FASTA sequences for usage in Geneious.
Files 1-9 are not necessary for further processing by this pipeline, but may be
useful for the user. The last file (10) is used as input file for Geneious in
the next step. Asterisk (*) denotes beginning of the output files names
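
The renaming of transcriptome FASTA headers described in this hunk (output files `*_renamed.fasta` and `*_old_and_new_names.tsv`) can be sketched roughly as follows. This is not the script's actual implementation, which is written in shell. The function names are hypothetical, and the sketch assumes that "unique numbers respective to line numbers in original file" means the 1-based line number of each header line.

```python
def needs_renaming(lines):
    """True if any FASTA header is not a plain number or headers repeat."""
    names = [l[1:].strip() for l in lines if l.startswith(">")]
    all_numeric = all(n.isascii() and n.isdigit() for n in names)
    return not all_numeric or len(set(names)) != len(names)


def rename_fasta(lines):
    """Replace each header with the line number on which it occurs.

    Returns (renamed_lines, mapping), where mapping is a list of
    (old_name, new_name) pairs suitable for a two-column TSV file.
    """
    renamed, mapping = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith(">"):
            old, new = line[1:], str(lineno)
            mapping.append((old, new))
            renamed.append(">" + new)
        else:
            renamed.append(line)  # sequence lines are kept unchanged
    return renamed, mapping


def mapping_to_tsv(mapping):
    """Format the old/new name pairs as tab-separated text."""
    return "\n".join(f"{old}\t{new}" for old, new in mapping)
```

Keeping the old/new mapping alongside the renamed file is what makes it possible to trace a probe back to its original transcript name.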
Binary file modified manual/sondovac_manual.pdf
Binary file not shown.
23 changes: 14 additions & 9 deletions manual/sondovac_manual.tex
@@ -30,7 +30,7 @@
\setdefaultlanguage{english}

% Opening
\title{Manual for Sondovač 0.9 beta}
\title{Manual for Sondovač 0.95 beta}
\author{Roswitha Schmickl, Aaron Liston, Vojtěch Zeisek and others}

% Allow line breaks within URLs
@@ -42,7 +42,7 @@
unicode=true,
colorlinks=true,
pagebackref=true,
pdftitle={Sondovac manual 0.9 beta},
pdftitle={Sondovac manual 0.95 beta},
plainpages=false,
pdfauthor={Roswitha Schmickl, Aaron Liston, Vojtech Zeisek and others},
pdfsubject={Software manual},
@@ -640,7 +640,7 @@ \subsubsection{Optional parameters}
\end{itemize}
\item[\texttt{-y \#\#}] Sequence similiarity between unique transcripts and the filtered, combined genome skim reads (parameter -minIdentity of BLAT, see its manual for details).
\begin{itemize}
\item Step 5 of Sondovač, \texttt{sondovac$\_$part$\_$a.sh}.
\item Step 5 of Sondovač, \texttt{sondovac$\_$part$\_$a.sh}. Step 11 of Sondovač, \texttt{sondovac$\_$part$\_$b.sh}.
\item Consider the trade-off between probe specificity and number of remaining matching sequences for probe design. Sequence similarity is in percent.
\item DEFAULT: 85
\item OPTIONS: Integer ranging from 70 to 100
@@ -679,15 +679,16 @@ \subsection{Input and output files}
\textbf{Script \texttt{sondovac$\_$part$\_$a.sh} requires as input files:}

\begin{enumerate}
\item Transcriptome input file in FASTA format.
\item Transcriptome input file in FASTA format. \textbf{Note:} For technical reasons, names of FASTA sequences in \textit{must} be only unique number (no any other characters). Sondovač will check the names and if they are not in appropriate form, copy of this input files with correct names will be created.
\item Plastome reference sequence input file in FASTA format.
\item Paired-end genome skim input file in FASTQ format (two files).
\item Paired-end genome skim input file in FASTQ format (two files -- forward and reverse reads).
\item OPTIONAL: Mitochondriome reference sequence input file in FASTA format. This file is not required.
\end{enumerate}

\textbf{Script \texttt{sondovac$\_$part$\_$a.sh} creates the following files:}

\begin{enumerate}
\begin{enumerate}[start=0]
\item \texttt{*$\_$renamed.fasta} -- If needed, copy of transcriptome input file with changed names of FASTA sequences (unique numbers respective to line numbers in original file) will be created. File \texttt{*$\_$old$\_$and$\_$new$\_$names.tsv} then contains two columns: \textbf{1)} original sequence names as in user provided transcriptome input file and \textbf{2)} new sequence names. This might be useful to trace back some sequences/probes.
\item \texttt{*$\_$blat$\_$unique$\_$transcripts.psl} -- Output of BLAT (removal of transcripts sharing $\geq$90\% sequence similarity).
\item \texttt{*$\_$unique$\_$transcripts.fasta} -- Unique transcripts in FASTA format.
\item \texttt{*$\_$genome$\_$skim$\_$data$\_$no$\_$cp$\_$reads.bam} -- SAM converted to BAM (removal of reads of plastid origin).
@@ -830,12 +831,14 @@ \section{Changelog}

List of changes in released versions of Sondovač.

\subsection{Version 0.95 beta released 2015-11-}
\subsection{Version 0.95 beta released 2015-11-27}

\begin{itemize}
\item Offer the possibility to choose between transcripts or genome skim sequences for further processing.
\item Colorization of command-line user interface.
\item Colorization of command-line user interface (incomplete).
\item Added possibility to change -minIdentity parameter of BLAT in step 11, part B.
\item Fixed problems with some transcriptome input files.
\item Added possibility to set custom bait length.
\item Added information about article in MER introducing Sondovač.
\end{itemize}

@@ -1508,6 +1511,8 @@ \subsection{MIT License}
\vfill
\hrule
\vfill
Created in typographical system \XeLaTeX, \href{http://www.xelatex.org/}{http://www.xelatex.org/}, references with \BibTeX, \href{http://www.bibtex.org/}{http://www.bibtex.org/} on openSUSE GNU/Linux, \href{http://www.opensuse.org/}{http://www.opensuse.org/}, \today.
\begin{tiny}
Created in typographical system \XeLaTeX, \href{http://www.xelatex.org/}{http://www.xelatex.org/}, references with \BibTeX, \href{http://www.bibtex.org/}{http://www.bibtex.org/} on openSUSE GNU/Linux, \href{http://www.opensuse.org/}{http://www.opensuse.org/}, \today.
\end{tiny}

\end{document}