Version 1 vs 8
Content Changes
!!This server is behind the campus firewall, so it is not directly accessible from off-campus. If you are off-campus, you will need to go through an existing front-facing server first (e.g. **alpine.cse.unr.edu**).!!
```
ssh $CSE-ID@h1.cse.unr.edu
```
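With OpenSSH 7.3 or newer, the hop through **alpine.cse.unr.edu** can be done in a single command with -J (ProxyJump); otherwise just ssh to alpine first and then to the cluster:
```
ssh -J $CSE-ID@alpine.cse.unr.edu $CSE-ID@h1.cse.unr.edu
```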
If you are unable to run jobs across multiple nodes following the instructions below, please email [[mailto:ehelp@cse.unr.edu | ehelp@cse.unr.edu]].
==Compiling SLURM Jobs
```
#!/bin/bash
#We store some example code from Lawrence Livermore National Lab in
#/opt/llnl/tutorials/mpi
#Copy it to your home directory
cp -r /opt/llnl/tutorials/mpi/samples/C ~/mpi
cd ~/mpi
#Compile an example
mpicc -lpmi -o mpi_hello mpi_hello.c
#Run the example
srun -n16 ./mpi_hello
```
Output
```
$ srun -n16 ./mpi_hello
# each of the 16 tasks prints a short hello message
```
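A quick way to confirm that the tasks really spread across multiple nodes is to run a trivial command under srun and look at the hostnames:
```
srun -n16 hostname   # prints the name of the node each of the 16 tasks ran on
```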
==Running Tasks
===SRUN
https://slurm.schedmd.com/srun.html
srun is synchronous and blocking. Use sbatch to submit a job to the queue.
```
#-n sets the number of tasks (one per core here)
#--mem sets the memory needed per node, in megabytes
#--time sets the run time limit for the job
$ srun -n16 --mem=2048 --time=00:05:00 ~/mpi/mpi_hello
```
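srun can also hand out an interactive shell on a compute node for quick testing; --pty attaches a pseudo-terminal to the task:
```
srun -n1 --time=00:30:00 --pty bash   # exit the shell to release the allocation
```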
===SBATCH
https://slurm.schedmd.com/sbatch.html
```
$ cat ~/mpi/run.sh
#!/bin/bash
#SBATCH -n 16
#SBATCH --mem=2048M
#SBATCH --time=00:30:00
#SBATCH --mail-user=YOUR_EMAIL@DOMAIN.COM
#SBATCH --mail-type=ALL
srun ~/mpi/mpi_hello
```
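By default sbatch writes the job's stdout and stderr to slurm-<jobid>.out in the directory it was submitted from. Directives along these lines are commonly added to a script like run.sh above to control that (the filenames here are just examples):
```
#SBATCH --job-name=mpi_hello
#SBATCH --output=mpi_hello_%j.out   # %j expands to the job ID
#SBATCH --error=mpi_hello_%j.err
```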
Submit the job:
```
$ sbatch ~/mpi/run.sh
Submitted batch job 536
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
536 main run.sh cse-admi R 0:03 2 compute-0-[0,3]
```
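Useful follow-up commands for a submitted job, using the example job ID 536 from the output above:
```
scontrol show job 536   # full job record: nodes, limits, working directory
scancel 536             # cancel the job
sacct -j 536            # accounting info after completion (only if accounting is enabled)
```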
Check the cluster status:
```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
main* up infinite 2 alloc compute-0-[0,3]
main* up infinite 6 idle compute-0-[4-9]
controller up infinite 1 idle h1
test up infinite 1 down* test
```
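Per-node detail, using a node name from the sinfo output above:
```
sinfo -N -l                      # one line per node: CPUs, memory, state
scontrol show node compute-0-0   # full record for a single node
```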
==Node Hardware
The cluster consists of 4 nodes, each with 64 GB of RAM, two 10-core CPUs, and four NVIDIA GTX 1080 GPUs.
==HOWTO: Set up SLURM on your personal computer
https://source2.cse.unr.edu/w/cse/tutorials/slurm-mpi-setup/
!!This server is behind the campus firewall, so it is not directly accessible from off-campus. If you are off-campus, you will need to ssh into a jumphost first (e.g. **alpine.cse.unr.edu**).!!
```
ssh $CSE-ID@gpuh.cse.unr.edu
```
If you are unable to run jobs across multiple nodes following the instructions below, please email [[mailto:ehelp@cse.unr.edu | ehelp@cse.unr.edu]].
==Configuration
NFS Mounts
```
/scratch #Shared space for compiling and running jobs
/opt     #Shared software installs
/home    #Overrides the IPA1 home directory, i.e. /cse/home/$USER
```
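For example, a build can live on the shared /scratch mount so the same binaries are visible on every node (the per-user directory below is an assumed convention, not something enforced by the cluster):
```
mkdir -p /scratch/$USER/mpi                          # assumed per-user layout under /scratch
cp -r /opt/llnl/tutorials/mpi/samples/C/. /scratch/$USER/mpi/
cd /scratch/$USER/mpi
mpicc -lpmi -o mpi_hello mpi_hello.c                 # binary is now reachable from all nodes over NFS
```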
Playbook
https://source2.cse.unr.edu/diffusion/GPUH/
===Ansible Playbook===
```
sudo su
source /srv/python_env/bin/activate     # Python environment that provides ansible-playbook
ansible-playbook /srv/playbook/site.yml
```
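To preview what the playbook would change without actually applying anything, Ansible's check mode can be used from the same environment (note that not every task supports check mode):
```
sudo su
source /srv/python_env/bin/activate
ansible-playbook --check --diff /srv/playbook/site.yml   # dry run: report pending changes only
```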
===Ganglia
https://www.cse.unr.edu/gpuh/ganglia/
==Libraries
OpenMPI > /opt/openmpi (compiled with SLURM PMI and CUDA support)
CUDA > /usr/local/cuda
To see what else is installed: dpkg -l | grep $WHATEVER_YOU_ARE_LOOKING_FOR
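The layout under those prefixes is not spelled out here, but with a conventional install the tools are picked up along these lines (a sketch assuming standard bin/ and lib/ subdirectories):
```
export PATH=/opt/openmpi/bin:/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
which mpicc nvcc            # confirm both resolve to the cluster-wide installs
dpkg -l | grep -i openmpi   # example of the package query mentioned above
```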
===OpenMPI===
```
cd /opt/src/openmpi-2.0.2
./configure --prefix=/opt/openmpi --with-pmi=/usr \
  --with-pmi-libdir=/usr/lib/x86_64-linux-gnu --with-pmix=internal --with-cuda
make uninstall         # remove any previously installed copy first
make -j 20 all install
```
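A way to sanity-check the resulting build (assuming /opt/openmpi/bin is on PATH; the parameter name below is how recent OpenMPI releases report CUDA support):
```
ompi_info | grep -i "configure command"                         # shows the flags the install was built with
ompi_info --parsable --all | grep mpi_built_with_cuda_support   # value should be "true"
```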
==Compiling SLURM Jobs
```
#!/bin/bash
#We store some example code from Lawrence Livermore National Lab in
#/opt/llnl/tutorials/mpi
#Copy it to your home directory
cp -r /opt/llnl/tutorials/mpi/samples/C ~/mpi
cd ~/mpi
#Compile an example
mpicc -lpmi -o mpi_hello mpi_hello.c
#Run the example
srun -n16 ./mpi_hello
```
Output
```
$ srun -n16 ./mpi_hello
# each of the 16 tasks prints a short hello message
```
==Running Tasks
===SRUN
https://slurm.schedmd.com/srun.html
srun is synchronous and blocking. Use sbatch to submit a job to the queue.
```
#-n sets the number of tasks (one per core here)
#--mem sets the memory needed per node, in megabytes
#--time sets the run time limit for the job
$ srun -n16 --mem=2048 --time=00:05:00 ~/mpi/mpi_hello
```
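The nodes carry GTX 1080s (see Node Hardware below); if GPU scheduling (gres/gpu) is enabled in this cluster's SLURM configuration, which this page does not show, a GPU job would be requested roughly like this:
```
#--gres requests generic resources per node; "your_cuda_program" is a placeholder
srun -n4 --gres=gpu:1 --mem=4096 --time=00:10:00 ./your_cuda_program
```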
===SBATCH
https://slurm.schedmd.com/sbatch.html
```
$ cat ~/mpi/run.sh
#!/bin/bash
#SBATCH -n 16
#SBATCH --mem=2048M
#SBATCH --time=00:30:00
#SBATCH --mail-user=YOUR_EMAIL@DOMAIN.COM
#SBATCH --mail-type=ALL
srun ~/mpi/mpi_hello
```
Submit the job:
```
$ sbatch ~/mpi/run.sh
Submitted batch job 536
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
536 main run.sh cse-admi R 0:03 2 head,node[01-03]
```
Check the cluster status:
```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
main* up infinite 2 alloc node[01-02]
main* up infinite 2 idle node03,head
```
==Node Hardware
The cluster consists of 4 nodes, each with 64 GB of RAM, two 10-core CPUs, and four NVIDIA GTX 1080 GPUs.
Interconnect: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
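To inspect the hardware from a compute node (nvidia-smi ships with the NVIDIA driver; lspci is assumed to be installed on the nodes):
```
srun -n1 nvidia-smi -L              # enumerate the GTX 1080s on one node
srun -n1 lspci | grep -i mellanox   # confirm the ConnectX-3 Pro interconnect
```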
==HOWTO: Set up SLURM on your personal computer
https://source2.cse.unr.edu/w/cse/tutorials/slurm-mpi-setup/