Version 1 vs 8
Content Changes
!!This server is behind the campus firewall, so it is not directly accessible from off-campus. If you are off-campus, you will need to go through an existing front-facing server first (e.g. **alpine.cse.unr.edu**).!!
```
ssh $CSE-ID@h1.cse.unr.edu
```
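With OpenSSH 7.3 or newer, the hop through **alpine.cse.unr.edu** can be done in a single command with -J (ProxyJump); otherwise just ssh to alpine first and then to the cluster:
```
ssh -J $CSE-ID@alpine.cse.unr.edu $CSE-ID@h1.cse.unr.edu
```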
If you are unable to run jobs across multiple nodes following the instructions below, please email [[mailto:ehelp@cse.unr.edu | ehelp@cse.unr.edu]].
==Compiling SLURM Jobs
```
#!/bin/bash
#We store some example code from Lawrence Livermore National Lab in
#/opt/llnl/tutorials/mpi
#Copy it to your home directory
cp -r /opt/llnl/tutorials/mpi/samples/C ~/mpi
cd ~/mpi
#Compile an example
mpicc -lpmi -o mpi_hello mpi_hello.c
#Run the example
srun -n16 ./mpi_hello
```
Output
```
$ srun -n16 ./mpi_hello
# each of the 16 tasks prints a short hello message
```
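A quick way to confirm that the tasks really spread across multiple nodes is to run a trivial command under srun and look at the hostnames:
```
srun -n16 hostname   # prints the name of the node each of the 16 tasks ran on
```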
==Running Tasks
===SRUN
https://slurm.schedmd.com/srun.html
srun is synchronous and blocking. Use sbatch to submit a job to the queue.
```
#-n sets the number of tasks (one per core here)
#--mem sets the memory needed per node, in megabytes
#--time sets the run time limit for the job
$ srun -n16 --mem=2048 --time=00:05:00 ~/mpi/mpi_hello
```
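srun can also hand out an interactive shell on a compute node for quick testing; --pty attaches a pseudo-terminal to the task:
```
srun -n1 --time=00:30:00 --pty bash   # exit the shell to release the allocation
```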
===SBATCH
https://slurm.schedmd.com/sbatch.html
```
$ cat ~/mpi/run.sh
#!/bin/bash
#SBATCH -n 16
#SBATCH --mem=2048M
#SBATCH --time=00:30:00
#SBATCH --mail-user=YOUR_EMAIL@DOMAIN.COM
#SBATCH --mail-type=ALL
srun ~/mpi/mpi_hello
```
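By default sbatch writes the job's stdout and stderr to slurm-<jobid>.out in the directory it was submitted from. Directives along these lines are commonly added to a script like run.sh above to control that (the filenames here are just examples):
```
#SBATCH --job-name=mpi_hello
#SBATCH --output=mpi_hello_%j.out   # %j expands to the job ID
#SBATCH --error=mpi_hello_%j.err
```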
Submit the job:
```
$ sbatch ~/mpi/run.sh
Submitted batch job 536
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
536 main run.sh cse-admi R 0:03 2 compute-0-[0,3]
```
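Useful follow-up commands for a submitted job, using the example job ID 536 from the output above:
```
scontrol show job 536   # full job record: nodes, limits, working directory
scancel 536             # cancel the job
sacct -j 536            # accounting info after completion (only if accounting is enabled)
```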
Check the cluster status:
```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
main* up infinite 2 alloc compute-0-[0,3]
main* up infinite 6 idle compute-0-[4-9]
controller up infinite 1 idle h1
test up infinite 1 down* test
```
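Per-node detail, using a node name from the sinfo output above:
```
sinfo -N -l                      # one line per node: CPUs, memory, state
scontrol show node compute-0-0   # full record for a single node
```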
==Node Hardware
The cluster consists of 4 nodes, each with 64 GB of RAM, two 10-core CPUs, and four NVIDIA GTX 1080 GPUs.
==HOWTO: Set up SLURM on your personal computer
https://source2.cse.unr.edu/w/cse/tutorials/slurm-mpi-setup/
!!This server is behind the campus firewall, so it is not directly accessible from off-campus. If you are off-campus, you will need to ssh into a jumphost first (e.g. **alpine.cse.unr.edu**).!!
```
ssh $CSE-ID@gpuh.cse.unr.edu
```
If you are unable to run jobs across multiple nodes following the instructions below, please email [[mailto:ehelp@cse.unr.edu | ehelp@cse.unr.edu]].
==Configuration
NFS Mounts
```
/scratch #Shared space for compiling and running jobs
/opt     #Shared software installs
/home    #Overrides the IPA1 home directory, i.e. /cse/home/$USER
```
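For example, a build can live on the shared /scratch mount so the same binaries are visible on every node (the per-user directory below is an assumed convention, not something enforced by the cluster):
```
mkdir -p /scratch/$USER/mpi                          # assumed per-user layout under /scratch
cp -r /opt/llnl/tutorials/mpi/samples/C/. /scratch/$USER/mpi/
cd /scratch/$USER/mpi
mpicc -lpmi -o mpi_hello mpi_hello.c                 # binary is now reachable from all nodes over NFS
```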
Playbook
https://source2.cse.unr.edu/diffusion/GPUH/
===Ansible Playbook===
```
sudo su
source /srv/python_env/bin/activate     # Python environment that provides ansible-playbook
ansible-playbook /srv/playbook/site.yml
```
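To preview what the playbook would change without actually applying anything, Ansible's check mode can be used from the same environment (note that not every task supports check mode):
```
sudo su
source /srv/python_env/bin/activate
ansible-playbook --check --diff /srv/playbook/site.yml   # dry run: report pending changes only
```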
===Ganglia
https://www.cse.unr.edu/gpuh/ganglia/
==Libraries
OpenMPI > /opt/openmpi (compiled with SLURM PMI and CUDA support)
CUDA > /usr/local/cuda
To see what else is installed: dpkg -l | grep $WHATEVER_YOU_ARE_LOOKING_FOR
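The layout under those prefixes is not spelled out here, but with a conventional install the tools are picked up along these lines (a sketch assuming standard bin/ and lib/ subdirectories):
```
export PATH=/opt/openmpi/bin:/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
which mpicc nvcc            # confirm both resolve to the cluster-wide installs
dpkg -l | grep -i openmpi   # example of the package query mentioned above
```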
===OpenMPI===
```
cd /opt/src/openmpi-2.0.2
./configure --prefix=/opt/openmpi --with-pmi=/usr \
  --with-pmi-libdir=/usr/lib/x86_64-linux-gnu --with-pmix=internal --with-cuda
make uninstall         # remove any previously installed copy first
make -j 20 all install
```
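A way to sanity-check the resulting build (assuming /opt/openmpi/bin is on PATH; the parameter name below is how recent OpenMPI releases report CUDA support):
```
ompi_info | grep -i "configure command"                         # shows the flags the install was built with
ompi_info --parsable --all | grep mpi_built_with_cuda_support   # value should be "true"
```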
==Compiling SLURM Jobs
```
#!/bin/bash
#We store some example code from Lawrence Livermore National Lab in
#/opt/llnl/tutorials/mpi
#Copy it to your home directory
cp -r /opt/llnl/tutorials/mpi/samples/C ~/mpi
cd ~/mpi
#Compile an example
mpicc -lpmi -o mpi_hello mpi_hello.c
#Run the example
srun -n16 ./mpi_hello
```
Output
```
$ srun -n16 ./mpi_hello
# each of the 16 tasks prints a short hello message
```
==Running Tasks
===SRUN
https://slurm.schedmd.com/srun.html
srun is synchronous and blocking. Use sbatch to submit a job to the queue.
```
#-n sets the number of tasks (one per core here)
#--mem sets the memory needed per node, in megabytes
#--time sets the run time limit for the job
$ srun -n16 --mem=2048 --time=00:05:00 ~/mpi/mpi_hello
```
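The nodes carry GTX 1080s (see Node Hardware below); if GPU scheduling (gres/gpu) is enabled in this cluster's SLURM configuration, which this page does not show, a GPU job would be requested roughly like this:
```
#--gres requests generic resources per node; "your_cuda_program" is a placeholder
srun -n4 --gres=gpu:1 --mem=4096 --time=00:10:00 ./your_cuda_program
```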
===SBATCH
https://slurm.schedmd.com/sbatch.html
```
$ cat ~/mpi/run.sh
#!/bin/bash
#SBATCH -n 16
#SBATCH --mem=2048M
#SBATCH --time=00:30:00
#SBATCH --mail-user=YOUR_EMAIL@DOMAIN.COM
#SBATCH --mail-type=ALL
srun ~/mpi/mpi_hello
```
Submit the job:
```
$ sbatch ~/mpi/run.sh
Submitted batch job 536
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
536 main run.sh cse-admi R 0:03 2 head,node[01-03]
```
Check the cluster status:
```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
main* up infinite 2 alloc node[01-02]
main* up infinite 2 idle node03,head
```
==Node Hardware
The cluster consists of 4 nodes, each with 64 GB of RAM, two 10-core CPUs, and four NVIDIA GTX 1080 GPUs.
Interconnect: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
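To inspect the hardware from a compute node (nvidia-smi ships with the NVIDIA driver; lspci is assumed to be installed on the nodes):
```
srun -n1 nvidia-smi -L              # enumerate the GTX 1080s on one node
srun -n1 lspci | grep -i mellanox   # confirm the ConnectX-3 Pro interconnect
```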
==HOWTO: Set up SLURM on your personal computer
https://source2.cse.unr.edu/w/cse/tutorials/slurm-mpi-setup/