License : Creative Commons Attribution 4.0 International (CC BY-NC-SA 4.0)
Copyright : Hervé Frezza-Buet, Jérémy Fix, CentraleSupelec
Last modified : May 2, 2024 02:04
Link to the source : allocation.md
You can run raw Slurm commands on the cluster, but we provide a convenient script to communicate with the scheduler and manage your allocations. It is a bash script called cscluster, which you need to download.
For convenience, we suggest placing that script in the folder ~/.local/bin, which you may need to create if it does not exist:
mylogin@mymachine:~$ mkdir -p ~/.local/bin
mylogin@mymachine:~$ mv cscluster ~/.local/bin
mylogin@mymachine:~$ chmod u+x ~/.local/bin/cscluster
The ~/.local/ directory is a fairly standard location for user-specific binaries, libraries, etc. However, you still need to tell your system about that path so that it finds the binaries you put in the ~/.local/bin/ subdirectory. To do so, extend the PATH environment variable with that path. To have this done for every new terminal session, add it to your shell-specific rc-file, such as ~/.bashrc if you use bash (usually the default). Adding the following line
export PATH=$PATH:~/.local/bin/
should be enough. If you now open a new terminal, which reloads the rc-file, you should be able to type the command
mylogin@mymachine:~$ cscluster
which should output the help message of cscluster.
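If you prefer a repeatable setup step over editing ~/.bashrc by hand, the export line can be appended only when it is missing. The sketch below assumes bash and that ~/.bashrc is your rc-file; add_line_once is a hypothetical helper, not part of cscluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cscluster): append a line to an rc-file only
# if that exact line is not already present, so re-running it adds no duplicates.
add_line_once() {
  local line="$1" rcfile="$2"
  touch "$rcfile"
  # -F: fixed string, -x: whole-line match, -q: quiet (exit status only)
  if ! grep -Fxq -- "$line" "$rcfile"; then
    printf '%s\n' "$line" >> "$rcfile"
  fi
}

# Single quotes keep $PATH unexpanded here; it is expanded each time ~/.bashrc is read.
add_line_once 'export PATH=$PATH:~/.local/bin/' "$HOME/.bashrc"
```

Running it a second time leaves ~/.bashrc unchanged, which avoids growing PATH with repeated entries.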
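Before allocating a machine, make sure that you know the information the commands below ask for (your login, the cluster name, and either a reservation name or a partition and a walltime), and that the cscluster script works. For the latter, type in a terminal on your personal computer: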
mylogin@mymachine:~$ cscluster
You should see a help message.
To get help on the allocation command, ask cscluster like this:
mylogin@mymachine:~$ cscluster book --help
Booking a machine is thus done with a command like one of the following; replace *** with the information you should know. The two commands below show, respectively, how to use a reservation ticket and how to do last-minute booking.
mylogin@mymachine:~$ cscluster book --user *** --cluster *** --reservation ***
mylogin@mymachine:~$ cscluster book --user *** --cluster *** --partition *** --walltime **:**
After a short wait, a job id (an integer) is printed on the screen: the allocation is done.
Note that when you do not have a reservation name and have to specify a partition and a walltime, the walltime must not exceed the time limit allowed on the partition. The time limits can be listed with cscluster by omitting the partition:
mylogin@mymachine:~$ cscluster book --user *** --cluster ***
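To check a walltime against a partition time limit before booking, both can be converted to minutes and compared. This is a minimal sketch assuming, as the **:** placeholder suggests, that both values are written as HH:MM; the real format expected by cscluster and the format of the limits it prints may differ, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumption: durations are written as HH:MM.
to_minutes() {
  local hh="${1%%:*}" mm="${1##*:}"
  # 10# forces base 10, so that values such as "08" are not read as octal.
  echo $(( 10#$hh * 60 + 10#$mm ))
}

# Succeeds (exit status 0) if the requested walltime does not exceed the limit.
walltime_fits() {
  local walltime="$1" limit="$2"
  (( $(to_minutes "$walltime") <= $(to_minutes "$limit") ))
}

if walltime_fits 01:30 02:00; then
  echo "01:30 fits within a 02:00 limit"
fi
```

Comparing in minutes avoids string comparisons such as "10:00" < "9:00", which would give the wrong answer.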
To get help on the release command, ask cscluster like this:
mylogin@mymachine:~$ cscluster kill --help
Releasing an allocation whose job id you know is then done with:
mylogin@mymachine:~$ cscluster kill --user *** --cluster *** --jobid ***
If you omit the --jobid *** option, you get a list of the job ids you can provide, according to what you have allocated so far.
To get help on the log command, ask cscluster like this:
mylogin@mymachine:~$ cscluster log --help
Logging onto a machine allocated to you, whose job id you know, is then done with:
mylogin@mymachine:~$ cscluster log --user *** --cluster *** --jobid ***
If you omit the --jobid *** option, you get a list of the job ids you can provide, according to what you have allocated so far.
You are then logged on the machine, as if you had opened an ssh connection to it. Such a direct ssh connection is only possible with the frontends, which is why the log command is useful for allocated machines.
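The kill and log commands both take --user, --cluster and an optional --jobid, so retyping them can be avoided with a small wrapper. This is a sketch only: cs_job, CS_USER and CS_CLUSTER are hypothetical names you define yourself (cscluster does not read them), and only the options shown above are passed on.

```shell
#!/usr/bin/env bash
# Hypothetical settings: replace the defaults with your own login and cluster name.
CS_USER="${CS_USER:-mylogin}"
CS_CLUSTER="${CS_CLUSTER:-mycluster}"

# Usage: cs_job <kill|log> [jobid]
# Without a job id, --jobid is omitted, so cscluster lists the job ids you can use.
cs_job() {
  local action="$1" jobid="${2-}"
  # Job ids are integers; reject anything else before calling cscluster.
  if [[ -n "$jobid" && ! "$jobid" =~ ^[0-9]+$ ]]; then
    echo "job id must be an integer: $jobid" >&2
    return 1
  fi
  local args=("$action" --user "$CS_USER" --cluster "$CS_CLUSTER")
  if [[ -n "$jobid" ]]; then
    args+=(--jobid "$jobid")
  fi
  cscluster "${args[@]}"
}
```

Once these lines are in your ~/.bashrc (or sourced), cs_job log 42 runs the log command above with --jobid 42, and cs_job kill without a job id lists the job ids you can release.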
Some of the programs you may want to run on allocated machines start servers (TensorBoard, Jupyter notebooks/labs, VNC servers, …). A server is a program that waits for clients to connect, via the network, in order to send requests and get answers.
A waiting server listens at a door, to see whether some client knocks. In the TCP/IP protocol, this door is called a “port” and is identified by an integer (a 16-bit number, so at most 65535). So when a client in the world needs to talk to the server, it has to provide a machine name (where the server runs) and a port number (the door at which the server listens for visitors knocking). Both are often given in a compact string like name:port (e.g. www.centralesupelec.fr:80).
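To make the name:port notation concrete, the bash sketch below splits such a string with parameter expansion, taking the port after the last colon. split_endpoint is a hypothetical illustration, not something cscluster provides.

```shell
#!/usr/bin/env bash
# Split a "name:port" string: the port is what follows the last colon,
# the machine name is everything before it.
split_endpoint() {
  local endpoint="$1"
  local name="${endpoint%:*}" port="${endpoint##*:}"
  echo "machine=$name port=$port"
}

split_endpoint www.centralesupelec.fr:80
```

This prints machine=www.centralesupelec.fr port=80.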
The machines allocated on the cluster can host servers, but for safety reasons nobody in the world is allowed to come and knock directly at their doors. SSH offers a trick to overcome this limitation, called “port forwarding through encrypted ssh tunnels”.
The idea is to have a server on your personal computer listening on a door (not necessarily with the same number as the remote door) and forwarding any knock on your personal computer’s door to the remote machine’s door.
Clients have the impression of talking to your machine on a given port (door), but behind the scenes they talk to the remote server.
For example, suppose you have allocated the machine foo-026.cluster.centralesupelec.fr and got job id 42 for it. You can log on the machine using job id 42 and start a server listening on port 45678.
Clients would need to talk to foo-026.cluster.centralesupelec.fr:45678 to use the service, but this is forbidden by the cluster safety policy.
Back on your personal computer (in another terminal), open an ssh tunnel with the cscluster script:
mylogin@mymachine:~$ cscluster port_forward --user *** --cluster *** --jobid 42 --port 45678
This command blocks until you press CTRL-C, so keep it running in a separate terminal. Note that you can also bind the remote port to a different port number on your local computer. For example, if you want the remote foo-026.cluster.centralesupelec.fr:45678 to be bound to localhost:9385 on your local computer, you would invoke cscluster as follows:
mylogin@mymachine:~$ cscluster port_forward --user *** --cluster *** --jobid 42 --port 45678:9385
This is useful when port 45678 is already taken on your computer, for instance by another server listening on it.
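Since TCP port numbers are 16-bit, a port specification can be checked before it is handed to --port. The sketch below assumes the REMOTE or REMOTE:LOCAL form used in the commands above, with the local port defaulting to the remote one; parse_port_spec and valid_port are hypothetical helpers, and cscluster may perform its own checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: a valid TCP port is an integer from 1 to 65535.
valid_port() {
  [[ "$1" =~ ^[0-9]+$ ]] && (( 10#$1 >= 1 && 10#$1 <= 65535 ))
}

# Parse REMOTE or REMOTE:LOCAL and print both port numbers.
parse_port_spec() {
  local spec="$1" p
  local remote="${spec%%:*}"
  local local_port="${spec#*:}"   # same as $spec when there is no colon
  for p in "$remote" "$local_port"; do
    if ! valid_port "$p"; then
      echo "invalid port: $p" >&2
      return 1
    fi
  done
  echo "remote=$remote local=$local_port"
}

parse_port_spec 45678:9385
```

This prints remote=45678 local=9385, while a specification such as 45678:93856 is rejected because 93856 is above 65535.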
Back to our example: if your personal computer has an internet address, like spiderman.worldcompany.org, clients can connect to spiderman.worldcompany.org:45678 in order to talk to foo-026.cluster.centralesupelec.fr:45678.
The classical use case, however, is that your personal computer has no name on the internet, and clients are only programs running on your personal computer (they do not come from the wide world). Talking to foo-026.cluster.centralesupelec.fr:45678 can then be done by talking to “the current machine I am using”, whose address is 127.0.0.1, also called localhost.
After setting up the ssh tunnel, you can tell clients to connect to 127.0.0.1:45678, and they will talk to foo-026.cluster.centralesupelec.fr:45678.
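To check from your personal computer that the local end of the tunnel is listening, bash can open a TCP connection through its /dev/tcp redirection. This is a sketch assuming bash; port_is_open is a hypothetical helper. A success only means that something (normally the ssh tunnel) accepts connections on 127.0.0.1:45678: the tunnel accepts them even if the remote server is not running, so it does not prove that the server on the allocated machine answers.

```shell
#!/usr/bin/env bash
# Succeed if something accepts TCP connections on HOST:PORT.
# Uses bash's /dev/tcp redirection; the subshell closes the connection on exit.
port_is_open() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_is_open 127.0.0.1 45678; then
  echo "something listens on 127.0.0.1:45678"
else
  echo "nothing listens on 127.0.0.1:45678: is the tunnel running?"
fi
```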
See the help for more port forwarding options:
mylogin@mymachine:~$ cscluster port_forward --help