
About Latte

Latte is a GPU server, donated in part by NVIDIA Corp. for use by the CS community. It features 8 datacenter-class NVIDIA Tesla P100 GPUs, which offer a large speedup for machine learning and related GPU computing tasks. The TensorFlow and PyTorch libraries are available for use as well.

User Guide

Getting Started

To begin using latte, you first need a CSUA account and membership in the ml2018 group. You can check whether you are already a member by logging into soda.csua.berkeley.edu and running the id command.
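For example, a quick check might look like the following (yourusername is a placeholder for your CSUA username):

    ssh yourusername@soda.csua.berkeley.edu
    id                   # lists your uid, gid, and group memberships
    id | grep ml2018     # prints a matching line only if you are in the group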

To get a CSUA account, please visit our office in 311 Soda and an officer will create an account for you.

To get into the ml2018 group, send an email to latte@csua.berkeley.edu with the following:

  • Name
  • CSUA Username
  • Intended use

Once we receive your email, we will give you access to the group.

Once you have an account, you can log into latte.csua.berkeley.edu over SSH. This will bring you into the slurmctld machine. From here, you can begin setting up your jobs.
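Logging in works like any other SSH connection (again, yourusername is a placeholder for your CSUA username):

    ssh yourusername@latte.csua.berkeley.edu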

Testing Your Jobs

slurmctld is meant for testing only. There are limits on the amount of compute you can use while on this machine.

The /datasets/ directory has some publicly-available datasets to use in /datasets/share/. If you are using your own dataset, please place it in /datasets/ inside a subdirectory of your choosing. /datasets/ has the restricted deletion bit set, so anything you put in your subdirectory cannot be deleted by anyone but you (and root). Before adding a dataset, check that it does not already exist.
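For example, adding a dataset of your own might look like this (the directory and dataset names are only illustrative; pick your own):

    ls /datasets/ /datasets/share/              # check whether the dataset already exists
    mkdir /datasets/yourusername-cifar10        # hypothetical subdirectory name
    cp -r ~/cifar10 /datasets/yourusername-cifar10/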

When you first log in, you will have an empty home directory. The contents of your home directory on soda are in /sodahome/, which is mounted over a network filesystem and will be slower than /home. While it may be annoying to copy files over, I assure you nothing is worse than doing file operations over a network-mounted file system.
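For example, copying a project from your soda home to your local latte home might look like this (the paths are placeholders):

    cp -r /sodahome/yourusername/my-project ~/my-project
    # rsync works too and can resume an interrupted copy:
    rsync -a /sodahome/yourusername/my-project/ ~/my-project/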

Once your program runs correctly, you can submit it as a job.

Running Your Jobs

Slurm is used to manage the job scheduling on latte.

To run a job, you need to submit it using the sbatch command. You can read about how to use Slurm here.

This sends the job to one of the GPU nodes, where it runs.
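As a rough sketch, a minimal batch script might look like the one below. The GPU request, time limit, and script path are assumptions, not latte-specific values; check the Slurm documentation and the cluster's configuration for the actual limits.

    #!/bin/bash
    #SBATCH --job-name=train-model      # a name for the job
    #SBATCH --gres=gpu:1                # request one GPU (adjust as needed)
    #SBATCH --time=02:00:00             # wall-clock time limit
    #SBATCH --output=train-%j.log       # stdout/stderr log (%j expands to the job id)

    # The script path is a placeholder for whatever you tested earlier.
    python ~/my-project/train.py

You would then submit it with sbatch train.sbatch and check its status with squeue -u $USER.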

Contact

If you have any questions, please email latte@csua.berkeley.edu.
