GPUH Cluster
This server is behind the campus firewall, so it is not directly accessible from off-campus. If you are off-campus, you will need to connect through an existing front-facing server first (e.g. alpine.cse.unr.edu).
ssh $CSE-ID@h1.cse.unr.edu
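If you are connecting from off-campus, one way to do the hop in a single command is shown below; this is a sketch that assumes a reasonably recent OpenSSH client with the -J (ProxyJump) option.
# Jump through the front-facing server, then on to the cluster head node
ssh -J $CSE-ID@alpine.cse.unr.edu $CSE-ID@h1.cse.unr.edu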
If you are unable to run jobs across multiple nodes following the instructions below, please email ehelp@cse.unr.edu.
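A quick way to confirm that tasks really do spread across more than one node is to launch a trivial command such as hostname over several tasks (a sketch; the node count you actually get depends on what is idle):
# Request 2 nodes and 16 tasks; each task prints the node it landed on
$ srun -N2 -n16 hostname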
Compiling SLURM Jobs
#!/bin/bash
# Some example code from Lawrence Livermore National Lab is stored in
# /opt/llnl/tutorials/mpi
# Copy it to your home directory
cp -r /opt/llnl/tutorials/mpi/samples/C ~/mpi
cd ~/mpi
# Compile an example
mpicc -lpmi -o mpi_hello mpi_hello.c
# Run the example
srun -n16 mpi_hello
Output
$ srun -n16 mpi_hello
Running Tasks
SRUN
https://slurm.schedmd.com/srun.html
srun is synchronous and blocking. Use sbatch to submit a job to the queue.
# -n indicates the number of cores
# --mem indicates the memory needed per node in megabytes
# --time indicates the specified run time of the job
$ srun -n16 --mem=2048 --time=00:05:00 ~/mpi/mpi_hello
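Since the compute nodes have GPUs, you may also want to request them explicitly. Whether this works depends on GPUs being configured as a generic resource (GRES) in SLURM on this cluster, which is an assumption here; if the request is rejected, ask the admins how GPUs are scheduled.
# Request 1 GPU per node alongside the usual CPU/memory/time limits (assumes a "gpu" GRES is defined)
$ srun -n16 --mem=2048 --time=00:05:00 --gres=gpu:1 ~/mpi/mpi_hello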
SBATCH
https://slurm.schedmd.com/sbatch.html
$ cat ~/mpi/run.sh
#!/bin/bash
#SBATCH -n 16
#SBATCH --mem=2048MB
#SBATCH --time=00:30:00
#SBATCH --mail-user=YOUR_EMAIL@DOMAIN.COM
#SBATCH --mail-type=ALL

srun ~/mpi/mpi_hello
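Two other commonly used #SBATCH directives are a job name and explicit output/error files. These are standard SLURM options, but the file layout below is only a suggested sketch, not a site requirement:
#!/bin/bash
#SBATCH --job-name=mpi_hello
#SBATCH -n 16
#SBATCH --mem=2048MB
#SBATCH --time=00:30:00
#SBATCH --output=mpi_hello_%j.out   # %j expands to the job ID
#SBATCH --error=mpi_hello_%j.err

srun ~/mpi/mpi_hello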
Submit the batch job:
$ sbatch ~/mpi/run.sh
Submitted batch job 536
$ squeue
 JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
   536      main   run.sh cse-admi  R  0:03      2 compute-0-[0,3]
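To follow up on a submitted job, you can filter the queue to your own user and cancel a job by its ID. sacct only returns job history if SLURM accounting is enabled on the cluster, which is an assumption here:
# Show only your own jobs
$ squeue -u $USER
# Cancel a job by its job ID (536 is the example job above)
$ scancel 536
# Show accounting information for a finished job (requires accounting to be enabled)
$ sacct -j 536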
Check the Cluster status:
$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE  NODELIST
main*          up   infinite      2  alloc  compute-0-[0,3]
main*          up   infinite      6  idle   compute-0-[4-9]
controller     up   infinite      1  idle   h1
test           up   infinite      1  down*  test
Node Hardware:
The cluster consists of 4 nodes, each with 64 GB of RAM, 2x 10-core CPUs, and 4x NVIDIA GTX 1080 GPUs.
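To see what SLURM itself reports for a node's hardware, the node-oriented sinfo listing and scontrol are useful; both are standard SLURM commands, and GPU GRES information appears only if it has been configured:
# One line per node with CPU, memory, and state columns
$ sinfo -N -l
# Full record for a single node (compute-0-0 is taken from the sinfo output above)
$ scontrol show node compute-0-0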
HOWTO: Set up SLURM on your personal computer
https://source2.cse.unr.edu/w/cse/tutorials/slurm-mpi-setup/