Typically, you would install Pull Into Place (PIP) both on your workstation and
on your supercomputing cluster. On your cluster, you would run the steps in
the pipeline that involve long Rosetta simulations. On your workstation, you
would run the analysis and filtering steps between those simulations. The
reason for splitting up the work like this is that the analysis scripts have
more dependencies than the simulation scripts, and those dependencies can be
hard to install on clusters with limited internet access and/or out-of-date
software. Some of the analysis scripts also require a GUI environment, which
most clusters don't have. Also note that the simulation scripts require
python>=2.6, while the analysis scripts require python>=2.7.
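Since the two halves of the pipeline have different interpreter requirements, it's worth checking which python each machine provides before installing anything. A quick check (assuming python is on your $PATH):

```shell
# Print the interpreter version; the simulation scripts need >=2.6 and
# the analysis scripts need >=2.7.
$ python --version
```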
Note
Right now, the pipeline will only work as written on the QB3 cluster at UCSF. There are two issues preventing more general use. The first issue is that all of the scripts that submit jobs to the cluster use Sun Grid Engine (SGE) commands to do so. This would be easy to generalize, but so far there hasn't been any need to do so. If you have a need, send an email to the maintainer describing your system and we'll do what we can to support it.
The second issue is that the fragment generation scripts contain a number of hard-coded paths to executables and databases that are specific to the QB3 cluster. For a variety of reasons, fixing this would be a fairly serious undertaking. Nonetheless, please let the maintainer know if you need this done, and we'll do what we can. In the meantime, you can try using simulations that don't require fragments (although these don't perform as well) or generating fragments yourself.
PIP is available on PyPI, so you can use pip to install it. (Sorry if the
distinction between PIP and pip is confusing. PIP is the Pull Into Place
pipeline; pip is the package manager distributed with modern versions of
python):

# For reasons I don't fully understand, some dependencies can't be installed
# unless numpy is manually installed in its own step before everything else.
$ pip install numpy
$ pip install 'pull_into_place [analysis]'
The [analysis] part of the command instructs pip to install all of the
dependencies for the analysis scripts. These dependencies aren't installed by
default because they aren't needed for the Rosetta simulation steps and they
can be challenging to install on some clusters.
If the installation worked, this command should print out a nice help message:
$ pull_into_place --help
Note
If you don't have administrator access on your workstation, or if you just
don't want to install PIP system-wide, you can use the --user flag to
install PIP in your home directory:

$ pip install 'pull_into_place [analysis]' --user

This will install the PIP executable in ~/.local/bin, which may not be
on your $PATH by default. If the installation seemed to work but you
get a "command not found" error when trying to run pull_into_place, you
probably need to add ~/.local/bin to $PATH:
$ echo 'export PATH=~/.local/bin:$PATH' >> ~/.bashrc
$ source ~/.bashrc
The plot_funnels command creates an interactive GUI that can show score vs
RMSD funnels, open structures corresponding to individual points in pymol
or chimera, and keep track of your notes on different designs.

In order to use this command, you have to install pygtk yourself. This
dependency is not included with the other [analysis] dependencies because
it can't be installed with pip (except maybe on Windows). On Linux
systems, your package manager should be able to install it pretty easily:
$ apt-get install python-matplotlib pygtk   # Ubuntu
$ yum install pygtk2                        # Fedora<=21
$ dnf install pygtk2                        # Fedora>=22
On Mac systems, the easiest way to do this is to use homebrew to install
matplotlib with the --with-pygtk option:
$ brew install matplotlib --with-pygtk
If you want to install PIP in a virtual environment, you will probably have to
create the environment with the --system-site-packages flag. This will
allow the virtual environment to use the pygtk package installed by your
package manager.
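As a sketch, assuming the virtualenv tool is installed (the environment path ~/venvs/pip is just an example name):

```shell
# Create an environment that can also see system-wide packages like pygtk,
# then install PIP into it.
$ virtualenv --system-site-packages ~/venvs/pip
$ source ~/venvs/pip/bin/activate
$ pip install 'pull_into_place [analysis]'
```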
If pip is available on your cluster, use it:
$ pip install pull_into_place
Otherwise, you will need to install PIP manually. The first step is to
download and install source distributions of setuptools and klab. PIP
needs setuptools to install itself and klab to access a number of
general-purpose tools developed by the Kortemme lab. Once those dependencies
are installed, you can download and install a source distribution of
pull_into_place. The next section has example command lines for all of
these steps in the specific context of the QB3 cluster at UCSF.
Because the UCSF cluster is not directly connected to the internet, it cannot automatically download and install dependencies. Instead, we have to do these steps manually.
Download the most recent source distributions for setuptools, klab, and
pull_into_place from PyPI (those are links, in case it's hard to tell) onto
your workstation. When I did this, the most recent distributions were:

setuptools-27.2.0.tar.gz
klab-0.3.0.tar.gz
pull_into_place-1.2.2.tar.gz
Note
Three new dependencies were added to setuptools in version 34.0.0: six,
packaging, and appdirs. You can either install these dependencies in the same
way as the others, or you can just use an earlier version of setuptools.

Copy the source distributions onto the cluster:
$ scp setuptools-27.2.0.tar.gz chef.compbio.ucsf.edu:
$ scp klab-0.3.0.tar.gz chef.compbio.ucsf.edu:
$ scp pull_into_place-1.2.2.tar.gz chef.compbio.ucsf.edu:
Log onto the cluster and unpack the source distributions:
$ ssh chef.compbio.ucsf.edu
$ tar -xzf setuptools-27.2.0.tar.gz
$ tar -xzf klab-0.3.0.tar.gz
$ tar -xzf pull_into_place-1.2.2.tar.gz
Install setuptools:

$ cd ~/setuptools-27.2.0
$ python setup.py install --user
Install klab:

$ cd ~/klab-0.3.0
$ python setup.py install --user
Install pull_into_place:

$ cd ~/pull_into_place-1.2.2
$ python setup.py install --user
Make sure ~/.local/bin is on your $PATH:

The above commands install PIP into ~/.local/bin. This directory is good
because you can install programs there without needing administrator
privileges, but it's not on your $PATH by default (which means that any
programs installed there won't be found). This command modifies your shell
configuration file to add ~/.local/bin to your $PATH:

$ echo 'export PATH=~/.local/bin:$PATH' >> ~/.bashrc
This command reloads your shell configuration so the change takes effect immediately (otherwise you'd have to log out and back in):
$ source ~/.bashrc
Make sure it works:
$ pull_into_place --help
PIP also requires Rosetta to be installed, both on your workstation and on your cluster. You can consult this page for more information on how to do this, but in general there are two steps. First, you need to check out a copy of the source code from GitHub:
$ git clone git@github.com:RosettaCommons/main.git ~/rosetta
Note
You have to be registered as a "Rosetta developer" in order to be able to clone this repository (even if you never plan on developing any C++ code). This page has step-by-step instructions on how to register as a developer. You'll be asked for a password; anyone who's already a developer can give it to you.
Second, you need to compile everything:
$ cd ~/rosetta/source
$ ./scons.py bin mode=release -j8
Be aware that compiling Rosetta requires a C++11 compiler. This is much more likely to cause problems on your cluster than on your workstation. If you have problems, ask your administrator for help.
Installing Rosetta on the QB3 cluster is especially annoying because the
cluster has limited access to the internet and outdated versions of both the
C++ compiler and python. As above, the first step is to check out a copy of
the Rosetta source code from GitHub. This has to be done from one of the
interactive nodes (e.g. iqint, optint1, optint2, or xeonint) because chef
and sous are not allowed to communicate with GitHub:

$ ssh chef.compbio.ucsf.edu
$ ssh iqint
$ git clone git@github.com:RosettaCommons/main.git ~/rosetta
The second step is to install the QB3-specific build settings, which specify the path to the cluster's C++11 compiler (among other things):
$ ln -s site.settings.qb3 ~/rosetta/source/tools/build/site.settings
The final step is to compile Rosetta. This command has several parts: scl
enable python27 causes python2.7 to be used for the rest of the command,
which scons requires. nice reduces the compiler's CPU priority, which
helps the shell stay responsive. ./scons.py bin is the standard command to
build Rosetta. mode=release tells the compiler to leave out debugging code,
which actually makes Rosetta ≈10x faster. -j16 tells the compiler that
iqint has 16 cores for it to use:

$ cd ~/rosetta/source
$ scl enable python27 'nice ./scons.py bin mode=release -j16'