License : Creative Commons Attribution 4.0 International (CC BY-NC-SA 4.0)
Copyright : Hervé Frezza-Buet, Jérémy Fix, CentraleSupelec
Last modified : May 2, 2024 02:04
Link to the source : allocation.md

Allocation of a machine

Setting up the cscluster script

You can run raw Slurm commands on the cluster, but we provide a convenient script, cscluster, to communicate with the scheduler and manage your allocations. It is a bash script that you need to download.

For convenience, we suggest placing that script in the folder ~/.local/bin, which you may need to create if it does not exist:

mylogin@mymachine:~$ mkdir -p ~/.local/bin
mylogin@mymachine:~$ mv cscluster ~/.local/bin
mylogin@mymachine:~$ chmod u+x ~/.local/bin/cscluster

The ~/.local/ directory is a fairly standard location for user-specific binaries, libraries, etc. However, you still need to tell your system about that path so that it finds the binaries you put in the ~/.local/bin/ subdirectory. To do so, extend the PATH environment variable with that path. To have it set in every new terminal session, put the setting in your shell-specific rc-file, such as ~/.bashrc if you use bash (usually the default). Adding the following line

export PATH=$PATH:~/.local/bin/

should be enough. If you now open a new terminal, which will reload the rc-file, you should be able to type in the command

mylogin@mymachine:~$ cscluster

which should output the help message of cscluster.
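
Because the rc-file is read again by every new shell, the same directory can end up listed in PATH several times. A minimal sketch of a guarded variant is given below; the helper name add_to_path is hypothetical and is not part of cscluster.

```shell
# Append a directory to PATH only if it is not already listed.
# add_to_path is a hypothetical helper name, chosen for this illustration.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;        # otherwise append it
    esac
}

add_to_path "$HOME/.local/bin"
export PATH
```

Placed in ~/.bashrc instead of the plain export line, this keeps PATH free of duplicates when shells are nested.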

What information do I need?

Before allocating a machine, be sure that you know the following:

For the latter, type on a terminal of your personal computer:

mylogin@mymachine:~$ cscluster

You should see a help message.

Working (fasten your seatbelt)

Allocation : the “book” command of cscluster.

To get help on the allocation command, ask cscluster like this

mylogin@mymachine:~$ cscluster book --help

Booking a machine is then done with a command like the following, where you replace *** with the information that you should know. The two commands below show, respectively, how to use a reservation ticket and how to do last-minute booking.

mylogin@mymachine:~$ cscluster book --user *** --cluster *** --reservation ***
mylogin@mymachine:~$ cscluster book --user *** --cluster *** --partition *** --walltime **:**

You have to wait a bit, and then you will see a job id printed on the screen (it is an integer). The allocation is done.

Note that when you do not have a reservation name and have to specify a partition and a walltime, the walltime must be smaller than the time limit allowed on the partition. The time limits can be listed with cscluster by omitting the partition:

mylogin@mymachine:~$ cscluster book --user *** --cluster ***
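
The walltime constraint can be checked before booking. The sketch below assumes that both the walltime and the partition time limit are written as HH:MM, and the helper names to_minutes and walltime_fits are hypothetical; the format actually printed by cscluster may differ.

```shell
# Convert a duration written as HH:MM into a number of minutes.
# (Assumed format; awk reads the fields as decimal, so "08" is fine.)
to_minutes() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Succeed if the requested walltime ($1) is strictly smaller than
# the partition time limit ($2), following the wording above.
walltime_fits() {
    [ "$(to_minutes "$1")" -lt "$(to_minutes "$2")" ]
}

if walltime_fits "01:30" "02:00"; then
    echo "walltime accepted"
else
    echo "walltime exceeds the partition time limit"
fi
```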

Release : the “kill” command of cscluster.

To get help on the release command, ask cscluster like this

mylogin@mymachine:~$ cscluster kill --help

Releasing an allocation for which you know the job id is then done with

mylogin@mymachine:~$ cscluster kill --user *** --cluster *** --jobid ***

If you omit the --jobid *** option, you will get a list of the possible job ids you can provide, according to what you have allocated so far.

Logging on an allocated machine : the “log” command of cscluster.

To get help on the log command, ask cscluster like this

mylogin@mymachine:~$ cscluster log --help

Logging onto a machine allocated to you, for which you know the job id, is then done with

mylogin@mymachine:~$ cscluster log --user *** --cluster *** --jobid ***

If you omit the --jobid *** option, you will get a list of the possible job ids you can provide, according to what you have allocated so far.

You are then logged on the machine, as if you had opened an ssh connection to it. A direct ssh connection is only possible to the frontends, which is why the log command is useful for allocated machines.

SSH tunneling : the “port_forward” command of cscluster

Some of the programs you may want to run on allocated machines start servers (tensorboard, jupyter notebooks/labs, vnc servers, …). A server is a program that waits for clients to connect, via the network, in order to send requests and get answers.

Waiting servers listen at a door, to see if some client knocks. In the TCP/IP protocol, a server listens on a “port”, which is an integer. So when a client anywhere in the world needs to talk to the server, it has to provide a machine name (where the server runs) and a port number (the door where the server listens for visitors knocking). Both are often written as a compact string like name:port (e.g. www.centralesupelec.fr:80).
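
As a small illustration of the name:port notation, the two parts can be separated with plain shell parameter expansion. The variable names endpoint, host and port are chosen for this example only.

```shell
endpoint="www.centralesupelec.fr:80"

host=${endpoint%:*}    # everything before the last colon: the machine name
port=${endpoint##*:}   # everything after the last colon: the port number

echo "machine: $host, port: $port"
```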

The machine allocated on the cluster can host servers, but, for safety reasons, nobody in the world is allowed to come and knock at its door directly. SSH offers a trick to overcome this limitation, which is called “port forwarding through encrypted ssh tunnels…”

The idea is to have a server on your personal computer listening at a door (not necessarily the same number as the remote door) and relaying every knock on your personal computer’s door to the remote machine’s door.

Clients will have the feeling that they talk to your machine, on a given port (door), but behind the scenes, they talk to the remote server.

So if you have allocated the machine foo-026.cluster.centralesupelec.fr, and if you got job id 42 for it, you can log on the machine using job id 42 and start a server listening on port 45678.

Every client would like to talk to foo-026.cluster.centralesupelec.fr:45678 to use the service, but this is forbidden by the cluster safety policy.

Back on your personal computer (in another terminal), open an ssh tunnel with the cscluster script.

mylogin@mymachine:~$ cscluster port_forward --user *** --cluster *** --jobid 42 --port 45678

This blocks forever… until CTRL-C is pressed. Keep it running in a separate terminal. Note that you can also bind the remote port to a different port number on your local computer. For example, if you want the remote foo-026.cluster.centralesupelec.fr:45678 to be bound to localhost:9385 on your local computer, you would invoke cscluster with:

mylogin@mymachine:~$ cscluster port_forward --user *** --cluster *** --jobid 42 --port 45678:9385

The command above is useful if a server is already running on your computer and listening on port 45678, which makes that local port unavailable for the tunnel.
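
Whatever local port you choose has to be a valid TCP port number: ports are 16-bit integers, so the value must lie between 1 and 65535 (and ports below 1024 usually require administrator privileges). A minimal sketch of such a check follows; the helper name is_valid_port is hypothetical and is not part of cscluster.

```shell
# Succeed if the argument is a decimal integer between 1 and 65535.
# is_valid_port is a hypothetical helper, for illustration only.
is_valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;     # empty, or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

for candidate in 45678 70000; do
    if is_valid_port "$candidate"; then
        echo "$candidate: valid port number"
    else
        echo "$candidate: not a valid port number"
    fi
done
```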

Back to our example, your personal computer may have an internet address, like spiderman.worldcompany.org, so that clients can connect to spiderman.worldcompany.org:45678 in order to talk to foo-026.cluster.centralesupelec.fr:45678.

The classical use case is that your personal computer does not have a name on the internet, and the clients are only programs running on your personal computer (they do not come from the wide world). So talking to foo-026.cluster.centralesupelec.fr:45678 can be done by talking to “the current machine I am using”, whose address is 127.0.0.1, also called localhost.

After setting up the ssh tunnel, you can tell clients to connect to 127.0.0.1:45678 and they will talk to foo-026.cluster.centralesupelec.fr:45678.
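
For readers curious about the underlying mechanism, this kind of tunnel corresponds to the local port forwarding option (-L) of OpenSSH. The sketch below only builds and prints such a command. It is not what cscluster necessarily runs, and the login and frontend name are placeholders assumed for the example.

```shell
# Hypothetical values, for illustration only.
login="mylogin"
frontend="frontend.example.org"            # placeholder, not a real cluster frontend
node="foo-026.cluster.centralesupelec.fr"  # the allocated machine of the example
port=45678

# -N: do not run a remote command, only forward ports.
# -L local_port:target_host:target_port: listen on local_port on this computer
#    and relay each connection, through the frontend, to target_host:target_port.
tunnel_cmd="ssh -N -L ${port}:${node}:${port} ${login}@${frontend}"

echo "$tunnel_cmd"
```

With such a tunnel open, a web-based server such as tensorboard or jupyter would be reached by pointing a browser on your personal computer at http://127.0.0.1:45678.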

Try the help to get more options for port forwarding

mylogin@mymachine:~$ cscluster port_forward --help