Merge pull request #15 from Rappsilber-Laboratory/rolling_dev_1

Rolling dev 1
Rappsilber-Laboratory · Aug 3, 2020 · 4e6dec3
2 parents da71cf8 + da39d3b
commit 4e6dec3
Showing 23 changed files with 687 additions and 21 deletions.
diff --git a/Pipfile b/Pipfile
@@ -16,6 +16,7 @@ biopython = "*"
 pydot = "*"
 graphviz = "*"
 xlwt = "*"
+xlrd = "*"
 
 [dev-packages]
 twine = "*"
@@ -29,4 +30,7 @@ pytest-cov = "*"
 pytest-pydocstyle = "*"
 autopep8 = "*"
 docformatter = "*"
-coverage-badge = "*"
+coverage-badge = "*"
+sphinx = "*"
+sphinx_rtd_theme = "*"
+recommonmark = "*"
diff --git a/README.md b/README.md
@@ -1,11 +1,11 @@
-![logo](./documentation/xiRT_logo.png) 
+![logo](documentation/imgs/xiRT_logo.png) 
 
 ![release](https://flat.badgen.net/github/tag/Rappsilber-Laboratory/xirt)
 [![GitHub](https://flat.badgen.net/github/license/Rappsilber-Laboratory/xirt)](https://www.apache.org/licenses/LICENSE-2.0)
 [![Twitter](https://flat.badgen.net/twitter/follow/rappsilberlab?icon=twitter)](https://twitter.com/compomics)
 [![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/)
 ![PyPI version](https://flat.badgen.net/pypi/v/xiRT)
-![coverage](./documentation/coverage.svg)
+![coverage](documentation/imgs/coverage.svg)
 
 A python package for multi-dimensional retention time prediction for linear and crosslinked 
 peptides using a (siamese) deep neural network architecture.
@@ -27,13 +27,13 @@ xiRT requires the columns shown in the table below. Importantly, the xiRT framew
 CSM are sorted such that in the Peptide1 - Peptide2, Peptide1 is the longer or lexicographically 
 larger one for crosslinked RT predictions.
 
-![xiRT Architecture](documentation/xiRT.PNG)
+![xiRT Architecture](documentation/imgs/xiRT.PNG)
 
 ## Description
 xiRT is meant to be used to generate additional information about CSMs for machine learning-based
 rescoring frameworks (similar to percolator). However, xiRT also delivers RT prediction for various 
 scenarios. Therefore xiRT offers several training / prediction  modes that need to be configured 
-depending on the use case. At the moment training, prediction, crossvalidation are the supporte
+depending on the use case. At the moment training, prediction, crossvalidation are the supported
 modes.
 - *training*: trains xiRT on the input CSMs (using 10% for validation) and stores a trained model
 - *prediction*: use a pretrained model and predict RTs for the input CSMs
@@ -65,6 +65,11 @@ the CUDA libraries and other dependencies.
 >
 > conda install tensorflow-gpu
 
+Hint:
+pydot and graphviz sometimes cause problems when they are installed via pip. On Linux,
+simply use *sudo apt-get install graphviz*. On Windows, download the latest graphviz package from
+[here](https://www2.graphviz.org/Packages/stable/windows/), unzip the archive, and add the path of
+its *bin* directory to the Windows PATH variable.
 
 #### Usage
 The command line interface (CLI) requires three inputs:
@@ -80,7 +85,7 @@ is used to determine network parameters (number of neurons, layers, regularizati
 definition of the prediction task (classification, regression, ordered regression). Depending
 on the decoding of the target variable the output layers need to be adapted. For standard RP 
 prediction, regression is essentially the only viable option. For SCX/hSAX (general classification
-from fractionation sexperiments) the prediction task can be formulated as classification, 
+from fractionation experiments) the prediction task can be formulated as classification, 
 regression or ordered regression. For the usage of regression for fractionation it is recommended 
 that the estimated salt concentrations are used as target variable for the prediction  (raw 
 fraction numbers are possible too).
@@ -114,10 +119,14 @@ This file determines the input data to be used and gives some training procedure
 - Sven Giese
 
 ## Citation
-If you consider xiRT helpful for your work please cite our manuscript. Currently, in preparation
-on soon on bioRxiv.org "xiRT: Retention Time Prediction using Neural Networks increases 
-Identifications in Crosslinking Mass Spectrometry".
+If you find xiRT helpful for your work, please cite our manuscript (*currently in preparation*).
 
 ## RappsilberLab
 The Rappsilber lab applies and develops crosslinking chemistry methods, workflows and software.
-Visit the lab page to learn more about the developed [software](https://www.rappsilberlab.org/software/).
+Visit the lab page to learn more about the developed [software](https://www.rappsilberlab.org/software/).
+
+## xiSUITE
+1) xiVIEW: Graham, M. J.; Combe, C.; Kolbowski, L.; Rappsilber, J. bioRxiv 2019.
+2) xiNET: Combe, C. W.; Fischer, L.; Rappsilber, J. Mol. Cell. Proteomics 2015.
+3) xiSPEC: Kolbowski, L.; Combe, C.; Rappsilber, J. Nucleic Acids Res. 2018, 46 (W1), W473–W478.
+4) xiSEARCH: Mendes, M. L.; Fischer, L.; Chen, Z. A.; Barbon, M.; O’Reilly, F. J.; Giese, S. H.; Bohlke‐Schneider, M.; Belsom, A.; Dau, T.; Combe, C. W.; Graham, M.; Eisele, M. R.; Baumeister, W.; Speck, C.; Rappsilber, J. Mol. Syst. Biol. 2019, 15 (9), e8994.
diff --git a/documentation/Makefile b/documentation/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/documentation/coverage.svg → documentation/imgs/coverage.svg b/documentation/coverage.svg → documentation/imgs/coverage.svg
diff --git a/documentation/xiRT.PNG → documentation/imgs/xiRT.PNG b/documentation/xiRT.PNG → documentation/imgs/xiRT.PNG
diff --git a/documentation/xiRT_logo.png → documentation/imgs/xiRT_logo.png b/documentation/xiRT_logo.png → documentation/imgs/xiRT_logo.png
diff --git a/documentation/make.bat b/documentation/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/documentation/source/conf.py b/documentation/source/conf.py
@@ -0,0 +1,66 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+import os
+import sys
+sys.path.insert(0, os.path.abspath('.'))
+sys.path.insert(0, os.path.abspath('../'))
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'xiRT'
+copyright = '2020, Sven Giese'
+author = 'Sven Giese'
+
+# The full version, including alpha/beta/rc tags
+release = '1.0.32'
+
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.coverage', 'sphinx.ext.napoleon',
+              'recommonmark']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = []
+
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'sphinx_rtd_theme'
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+napoleon_google_docstring = True
+napoleon_use_param = False
+napoleon_use_ivar = True
+
+source_suffix = {
+    '.rst': 'restructuredtext',
+    '.txt': 'markdown',
+    '.md': 'markdown',
+}
diff --git a/documentation/source/index.rst b/documentation/source/index.rst
@@ -0,0 +1,31 @@
+.. xiRT documentation master file, created by
+   sphinx-quickstart on Mon Aug  3 17:01:09 2020.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Welcome to xiRT's documentation!
+================================
+
+.. image:: xiRT_logo.png
+
+xiRT is a versatile python package for multi-dimensional retention time prediction
+for linear and crosslinked peptides.
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   readme
+   installation
+   usage
+   parameters
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
+
+
diff --git a/documentation/source/installation.rst b/documentation/source/installation.rst
@@ -0,0 +1,30 @@
+Installation
+==============
+To install xiRT, run the commands below. We recommend using an isolated Python environment,
+for example one created with pipenv or conda.
+
+
+Pipenv
+******
+To use pipenv as package manager, first make sure that pipenv is installed and run::
+
+>pipenv shell
+>pip install xirt
+
+conda
+******
+
+To enable CUDA support, the easiest thing is to create a conda environment. Conda will take care of
+the CUDA libraries and other dependencies::
+
+>conda create --name xirt_env python=3.7
+>conda activate xirt_env
+>pip install xirt
+>conda install tensorflow-gpu
+
+Hint
+*****
+pydot and graphviz sometimes cause problems when they are installed via pip. On Linux,
+simply use *sudo apt-get install graphviz*. On Windows, download the latest graphviz package from
+`here <https://www2.graphviz.org/Packages/stable/windows/>`_, unzip the archive, and add the path of
+its *bin* directory to the Windows PATH variable.
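
Note on the hint in installation.rst: whether Graphviz is reachable can be checked before xiRT is run. The snippet below is a minimal sketch, not part of this commit. It assumes that finding the Graphviz `dot` executable on PATH is a sufficient check; the helper name `graphviz_available` is hypothetical.

```python
import shutil


def graphviz_available(executable: str = "dot") -> bool:
    """Return True if the given Graphviz executable can be found on PATH.

    Hypothetical helper for illustration; 'dot' is the standard Graphviz
    layout program that pydot calls to render graphs.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if graphviz_available():
        print("Graphviz 'dot' found on PATH.")
    else:
        print("Graphviz 'dot' not found; install Graphviz and add its bin directory to PATH.")
```

If the check fails on Windows after unzipping the package, the *bin* directory is most likely still missing from PATH.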
diff --git a/documentation/source/modules.rst b/documentation/source/modules.rst
@@ -0,0 +1,7 @@
+xirt
+====
+
+.. toctree::
+   :maxdepth: 4
+
+   xirt
diff --git a/documentation/source/parameters.rst b/documentation/source/parameters.rst
@@ -0,0 +1,15 @@
+Parameters
+===============
+
+xiRT needs two sets of parameters that are supplied via YAML files.
+
+
+xiRT-Parameters
+***************
+
+Parameters that configure the xiRT neural network architecture.
+
+Learning-Parameters
+*******************
+
+Parameters that govern the separation of training and testing data for the learning.
diff --git a/documentation/source/readme.rst b/documentation/source/readme.rst
@@ -0,0 +1,26 @@
+xiRT - Introduction
+===================
+
+xiRT is a deep learning tool to predict the retention time(s) of linear and crosslinked peptides
+from multiple fractionation dimensions including RP (typically coupled to the mass spectrometer).
+xiRT was developed with a combination of SCX / hSAX / RP chromatography. However, xiRT supports
+all available chromatography methods.
+
+xiRT requires the columns shown in the table below. Importantly, the xiRT framework requires that
+CSMs are sorted such that, in each Peptide1 - Peptide2 pair, Peptide1 is the longer or
+lexicographically larger one for crosslinked RT predictions.
+
+Description
+***********
+
+xiRT is meant to be used to generate additional information about CSMs for machine learning-based
+rescoring frameworks (similar to percolator). However, xiRT also delivers RT prediction for various
+scenarios. Therefore, xiRT offers several training / prediction modes that need to be configured
+depending on the use case. Currently, the supported modes are training, prediction and crossvalidation:
+
+- *training*: trains xiRT on the input CSMs (using 10% for validation) and stores a trained model
+- *prediction*: uses a pretrained model to predict RTs for the input CSMs
+- *crossvalidation*: loads/trains a model and predicts RTs for all data points without using them
+  in the training process. Requires the training of several models during CV
+
+Note: all modes can be supplemented by using a pretrained model ("transfer learning").
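
Note on the peptide ordering described in readme.rst: the rule that Peptide1 is "the longer or lexicographically larger one" can be sketched as below. This is an illustration, not xiRT's own code. It assumes that length is compared first and lexicographic order only breaks ties between peptides of equal length; the function name `order_peptides` is hypothetical.

```python
from typing import Tuple


def order_peptides(pep_a: str, pep_b: str) -> Tuple[str, str]:
    """Return (Peptide1, Peptide2) for a crosslinked pair.

    Assumed interpretation: the longer sequence comes first; if both have
    the same length, the lexicographically larger sequence comes first.
    """
    # Sort by (length, sequence) in descending order so the longer or
    # lexicographically larger peptide ends up in the first position.
    first, second = sorted((pep_a, pep_b), key=lambda p: (len(p), p), reverse=True)
    return first, second


if __name__ == "__main__":
    print(order_peptides("PEPK", "ELVISK"))  # longer peptide first
    print(order_peptides("AAK", "ABK"))      # same length: larger string first
```

The result does not depend on the order of the two arguments, so the same CSM always gets the same Peptide1 / Peptide2 assignment.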
diff --git a/documentation/source/usage.rst b/documentation/source/usage.rst
@@ -0,0 +1,21 @@
+Usage
+=====
+The command line interface (CLI) requires three inputs:
+
+1) input PSM/CSM file
+2) a `YAML <https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html>`_ file to configure the neural network architecture
+3) another YAML file to configure the general training / prediction behaviour, called setup-config
+
+To use xiRT these options are put together as shown below::
+
+> xirt(.exe) -i peptides.csv -o out_dir -x xirt_params.yaml -l learning_params.yaml
+
+To adapt the xiRT parameters a yaml config file needs to be prepared. The configuration file
+is used to determine network parameters (number of neurons, layers, regularization) but also for the
+definition of the prediction task (classification, regression, ordered regression). Depending
+on the decoding of the target variable the output layers need to be adapted. For standard RP
+prediction, regression is essentially the only viable option. For SCX/hSAX (general classification
+from fractionation experiments) the prediction task can be formulated as classification,
+regression or ordered regression. For the usage of regression for fractionation it is recommended
+that the estimated salt concentrations are used as target variable for the prediction  (raw
+fraction numbers are possible too).
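
Note on the CLI call in usage.rst: the three inputs plus the output directory can be put together programmatically, for example when xiRT is started from a script. The sketch below only uses the flags shown in the usage text (`-i`, `-o`, `-x`, `-l`). The helper name `build_xirt_command` and the file names are placeholders, and the snippet assumes the `xirt` executable is on PATH when the command is actually run.

```python
from typing import List


def build_xirt_command(input_csv: str, out_dir: str, xirt_params: str,
                       learning_params: str, executable: str = "xirt") -> List[str]:
    """Assemble the xiRT command line as an argument list (hypothetical helper)."""
    return [
        executable,
        "-i", input_csv,        # input PSM/CSM file
        "-o", out_dir,          # output directory
        "-x", xirt_params,      # YAML file with the network configuration
        "-l", learning_params,  # YAML file with the training / prediction setup
    ]


if __name__ == "__main__":
    cmd = build_xirt_command("peptides.csv", "out_dir",
                             "xirt_params.yaml", "learning_params.yaml")
    print(" ".join(cmd))
    # To execute the command (requires an installed xiRT):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.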
diff --git a/documentation/source/xiRT_logo.png b/documentation/source/xiRT_logo.png