# Connecting to the CFAM Cluster
To connect to the cluster remotely, use either SSH or X2Go. Use X2Go if you would like a desktop interface, and SSH if you would like a command line interface.
**SSH:** If you are on Mac or Linux, open the terminal. If you are on Windows, you will have to download Cygwin and use that. [[ https://cygwin.com/install.html | Cygwin Installer ]]
During Cygwin's setup, it will ask you which packages to install. Under the "Net" category, there is a package called "openssh". Click the "Skip" button to the left of openssh to cycle it to version 7.2p1-1, so that it will install that version.
Once at a terminal, run the following command, replacing `<netid>` with your netid. It will prompt you for your password, and then you will be connected.
```
ssh -Y <netid>@cfam-cluster.engr.unr.edu
```
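If you connect often, an entry in your SSH config file can save retyping the host name and `-Y`. The following is a minimal sketch, not a required step: the alias `cfam` and the placeholder `your_netid` are made up for illustration, and the demo writes to a scratch file rather than `~/.ssh/config`. The `ForwardX11` and `ForwardX11Trusted` settings together correspond to the `-Y` flag.

```shell
# Sketch: append a host alias for the cluster to an SSH config file.
# "cfam" is a made-up alias and "your_netid" a placeholder; adjust both.
add_cfam_host() {
    config_file="$1"   # for real use this would be ~/.ssh/config
    netid="$2"
    cat >> "$config_file" <<EOF
Host cfam
    HostName cfam-cluster.engr.unr.edu
    User $netid
    ForwardX11 yes
    ForwardX11Trusted yes
EOF
}

# Demo against a scratch file so nothing in ~/.ssh is touched.
demo_config="$(mktemp)"
add_cfam_host "$demo_config" "your_netid"
cat "$demo_config"
```

With such an entry in `~/.ssh/config`, `ssh cfam` should behave like the full command above.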
**X2Go:** If you prefer a desktop environment, you can use X2Go to connect.
Download and install X2Go here:
[[ http://wiki.x2go.org/doku.php/download:start | X2Go Download ]]
Once in X2Go, type `cfam-cluster.engr.unr.edu` for the host, and your netid for the login.
Under the Session Type, select **XFCE** and you are ready to connect.
# SLURM Introduction
SLURM (Simple Linux Utility for Resource Management) is a resource manager and job scheduler for clusters. It lets commands run across all of the nodes in the CFAM cluster. You will likely use `srun` and `sbatch` most of the time, but additional documentation on SLURM can be found here: [[ http://slurm.schedmd.com/documentation.html | SLURM Documentation ]]
`srun` is used to run jobs in parallel, and there are two options you will primarily use: `-N` specifies the number of nodes to run the command on, and `-n` specifies the number of tasks to run (by default, each task gets one core). For example, to run the command `hostname` on all 8 of the nodes in the CFAM cluster, you would run `srun -N 8 hostname`, and to run `hostname` as 100 tasks on 100 cores you would run `srun -n 100 hostname`. Additional documentation for `srun` can be found here: [[ http://slurm.schedmd.com/srun.html | srun ]]
`sbatch` is used to queue batch scripts for the cluster to run. You would specify a batch script to add to the queue, and when the cluster has the resources necessary, it will execute the script. To add scripts to the queue, run `sbatch /path/of/script`. To view the queue of scripts, you can use the command `squeue`. Additional documentation for `sbatch` can be found here: [[ http://slurm.schedmd.com/sbatch.html | sbatch ]]
==Compiling SLURM Jobs
```
#!/bin/bash
# Example code from Lawrence Livermore National Laboratory is stored in
# /opt/mpi
# Copy it to your home directory
cp -r /opt/mpi/tutorials/mpi/samples/C ~/mpi
cd ~/mpi
#Compile an example
mpicc -o mpi_hello mpi_hello.c -lpmi
#Run the example
srun -n16 --mpi=pmi2 mpi_hello
```
Output
```
$ srun -n16 mpi_hello
Hello from task 10 on node03!
Hello from task 9 on node03!
Hello from task 11 on node03!
Hello from task 12 on node03!
Hello from task 14 on node03!
Hello from task 8 on node03!
Hello from task 15 on node03!
Hello from task 13 on node03!
Hello from task 2 on node03!
Hello from task 0 on node03!
MASTER: Number of MPI tasks is: 16
Hello from task 1 on node01!
Hello from task 4 on node01!
Hello from task 7 on node01!
Hello from task 5 on node01!
Hello from task 6 on node01!
Hello from task 3 on node01!
```
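To see at a glance how the tasks were spread across nodes, the output can be tallied per node. This is a minimal sketch that assumes the `Hello from task N on nodeXX!` line format shown above; the function name `count_tasks_per_node` is made up for illustration, and the demo uses a few sample lines so it runs without the cluster.

```shell
# Sketch: summarize "Hello from task N on nodeXX!" lines by node.
count_tasks_per_node() {
    # Field 6 is the node name with a trailing "!"; strip it, then tally.
    awk '/^Hello from task/ { sub(/!$/, "", $6); count[$6]++ }
         END { for (node in count) print node, count[node] }' | sort
}

# Demo on a few lines in the format shown above (no cluster needed).
printf '%s\n' \
    'Hello from task 10 on node03!' \
    'Hello from task 0 on node03!' \
    'MASTER: Number of MPI tasks is: 16' \
    'Hello from task 1 on node01!' | count_tasks_per_node
```

On the cluster, the same function could be fed directly: `srun -n16 --mpi=pmi2 mpi_hello | count_tasks_per_node`.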
==Running Tasks
===SRUN
https://slurm.schedmd.com/srun.html
`srun` is synchronous and blocking: it does not return until the job has finished. Use `sbatch` to submit a job to the queue instead.
```
# -n sets the number of tasks (one core each by default)
# --mem sets the memory needed per node, in megabytes
# --time sets the time limit of the job (HH:MM:SS)
$ srun -n16 --mem=2048 --time=00:05:00 ~/mpi/mpi_hello
```
===SBATCH
https://slurm.schedmd.com/sbatch.html
```
$ cat ~/mpi/run.sh
#!/bin/bash
#SBATCH -n 16
#SBATCH --mem=2048M
#SBATCH --time=00:30:00
#SBATCH --mail-user=YOUR_EMAIL@DOMAIN.COM
#SBATCH --mail-type=ALL
# --mpi is an srun option, so it is passed on the srun line
srun --mpi=pmi2 ~/mpi/mpi_hello
```
Submit the job:
```
$ sbatch ~/mpi/run.sh
Submitted batch job 536
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
536 main run.sh cse-admi R 0:03 2 node0[1,3]
```
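When scripting around `sbatch`, it is often handy to capture the job ID so that later commands can refer to it. The sketch below assumes the `Submitted batch job <id>` confirmation format shown above; the function name `job_id_from_sbatch` is made up for illustration, and the demo feeds it the sample line instead of calling `sbatch`.

```shell
# Sketch: pull the numeric job ID out of sbatch's confirmation line.
job_id_from_sbatch() {
    # Expects text like "Submitted batch job 536" on stdin.
    awk '/^Submitted batch job/ { print $4 }'
}

# Demo with the line shown above; on the cluster you would pipe sbatch itself:
#   jobid=$(sbatch ~/mpi/run.sh | job_id_from_sbatch)
#   squeue -j "$jobid"
jobid=$(echo "Submitted batch job 536" | job_id_from_sbatch)
echo "job id: $jobid"
```

Alternatively, `sbatch --parsable` prints the job ID without the surrounding text, which avoids the parsing step.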
Check the cluster status:
```
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
main* up infinite 2 alloc node0[1,3]
controller up infinite 1 idle head
test up infinite 1 down* test
```
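In `sinfo` output, a `*` after a partition name marks the default partition, and a `*` after a node state means the nodes are not responding. The following sketch filters the output for partitions with down nodes; the function name `down_partitions` is made up for illustration, it assumes the default column layout shown above, and the demo runs on the sample output rather than a live `sinfo` call.

```shell
# Sketch: list partitions whose nodes are in a "down" state.
down_partitions() {
    # Skip the header row; column 1 is PARTITION, 5 is STATE, 6 is NODELIST.
    awk 'NR > 1 && $5 ~ /^down/ { sub(/\*$/, "", $1); print $1 ": " $6 }'
}

# Demo on the sample output shown above (no cluster needed).
down_partitions <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
main* up infinite 2 alloc node0[1,3]
controller up infinite 1 idle head
test up infinite 1 down* test
EOF
```

On the cluster this would be run as `sinfo | down_partitions`.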
=Applications
==Interactive ANSYS
===salloc
```
lang=bash
salloc -N6 -w node01,node02,node03,node04,node05,node06 \
srun --x11 -N1 /opt/ansys17-2/v172/fluent/bin/fluent -r17.2.0 \
3ddp -t110 -pinfiniband -mpi=openmpi \
-cnf=node01,node02,node03,node04,node05,node06 -nm -ssh
```
**salloc** allocates 6 nodes and **srun --x11** runs an interactive job with X forwarding.
===sinteractive
```
lang=bash
sinteractive -p nodes -N 2
```
You can find the name of the allocated hosts with the following:
```
lang=bash
$ nodeset -e $SLURM_JOB_NODELIST
highmem01 highmem02
```
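If `nodeset` is not available, Slurm's own `scontrol show hostnames "$SLURM_JOB_NODELIST"` prints one host per line. To illustrate what the expansion does, here is a minimal sketch that handles a single bracketed expression such as `node0[1,3]` or `highmem[01-02]`. The function name `expand_nodes` is made up for illustration, and it does not cover the full node list syntax (for example, several bracket groups in one list).

```shell
# Sketch: expand one bracketed node expression without extra tools.
# Handles forms like node0[1,3] or highmem[01-02]; not the full Slurm syntax.
expand_nodes() {
    echo "$1" | awk '
    {
        lb = index($0, "[")
        if (lb == 0) { print; next }              # plain host name
        prefix = substr($0, 1, lb - 1)
        body = substr($0, lb + 1, length($0) - lb - 1)
        n = split(body, items, ",")
        for (k = 1; k <= n; k++) {
            if (split(items[k], r, "-") == 2) {   # range such as 01-02
                fmt = "%s%0" length(r[1]) "d\n"   # keep the zero padding
                for (i = r[1] + 0; i <= r[2] + 0; i++) printf fmt, prefix, i
            } else {
                print prefix items[k]
            }
        }
    }'
}

expand_nodes 'node0[1,3]'
expand_nodes 'highmem[01-02]'
```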
==SBATCH
Template **submit.sh**
```
lang=bash
#!/bin/bash
#SBATCH -N 6 # Nodes
#SBATCH -n 100 # Total number of tasks
#SBATCH --exclusive # exclusive lock
#SBATCH --mail-type=end
#SBATCH --mail-user=YOUR_EMAIL@DOMAIN.COM
#SBATCH --workdir=/home/YOUR/PROJECT/PATH
#SBATCH -w highmem01,highmem02,node01,node02,node03,node04
FLUENT_HOSTS=highmem01,highmem02,node01,node02,node03,node04
FLUENT_BIN=~/ansys16/v162/fluent/bin/fluent
export FLUENT_GUI=off
if [ -z "$SLURM_NPROCS" ]; then
N=$(( $(echo $SLURM_TASKS_PER_NODE | sed -r 's/([0-9]+)\(x([0-9]+)\)/\1 * \2/') ))
else
N=$SLURM_NPROCS
fi
echo -e "N: $N\n";
"$FLUENT_BIN" 3ddp -g -slurm -ssh \
-cnf=$FLUENT_HOSTS -t $N \
-mpi=openmpi -pinfiniband -i journal
```
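The fallback in the template computes the task count from `SLURM_TASKS_PER_NODE` and handles a single `count(xrepeat)` group such as `16(x2)`. Slurm can also report a comma-separated list such as `2(x3),1` when nodes receive different task counts. The sketch below sums such a list; the function name `total_tasks` is made up for illustration, and the demo uses literal strings so it runs outside a job.

```shell
# Sketch: total task count from a SLURM_TASKS_PER_NODE-style string.
total_tasks() {
    # "17(x4),16(x2)" becomes "17*4+16*2", then the shell does the arithmetic.
    expr_str=$(echo "$1" | sed -e 's/(x/*/g' -e 's/)//g' -e 's/,/+/g')
    echo $(( $expr_str ))
}

# Demo with literal values; inside a job you would pass "$SLURM_TASKS_PER_NODE".
total_tasks '17(x4),16(x2)'
total_tasks '16(x2)'
total_tasks '8'
```

Inside the template, the fallback branch could then read `N=$(total_tasks "$SLURM_TASKS_PER_NODE")`.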
= Administration
==Modules
```
lang=bash
$ module avail
--------------------------------------- /usr/share/Modules/modulefiles --------------
dot module-git module-info modules null use.own
-------------------------------------- /etc/modulefiles --------------------------------
mpi/mpich-3.2-x86_64
-------------------------------------- /act/modulefiles --------------------------------
impi mpich/gcc mvapich2-2.1/gcc openmpi-1.6/gcc openmpi-1.8/gcc
intel mpich/intel mvapich2-2.1/intel openmpi-1.6/intel openmpi-1.8/intel
```
===Load a module
```
module load mpich/intel
```
== Adding Users
For users to be able to run SLURM commands across the entire cluster, they need to have the same user account on each node in the cluster. An Ansible playbook has been created to do this, and it must be run as root. Run `su root` and enter the root password. Once logged in as root, enter the virtual environment for Ansible by running `source ~/ansible-env/bin/activate`.
Inside the virtual environment, edit the file that lists the users to add by running `nano ~/ansible-env/addusers/roles/common/tasks/main.yml`. Add the user and their UID to this file under both of the sections, following the scheme below. A user's UID can be found by running the command `id <netid>`. Make sure to keep the spacing consistent with the lines already in the file.
```
- { name: 'netid', uid: enter_uid_here }
```
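To avoid typos when copying the UID by hand, the entry can be generated from `id -u`. This is a minimal sketch; the function name `user_entry` is made up for illustration, and the demo uses `root` only because that account exists on any Linux system.

```shell
# Sketch: print a user entry in the format used by main.yml.
user_entry() {
    netid="$1"
    uid=$(id -u "$netid") || return 1   # fails if the account does not exist
    printf '%s\n' "- { name: '$netid', uid: $uid }"
}

# Demo with root; substitute the netid of the user you are adding.
user_entry root
```

The printed line can then be pasted into both sections of the file.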
After you have added the user to both sections of this file, press Ctrl-X to exit, then Y to save the file. The playbook is now ready to be run. Run the following command and watch the terminal for errors:
```
ansible-playbook ~/ansible-env/addusers/site.yml -i ~/ansible-env/addusers/hosts
```
If the playbook executed without any errors, then the user has been added across all of the nodes.
==HOWTO: Set up SLURM on your personal computer
https://source2.cse.unr.edu/w/cse/tutorials/slurm-mpi-setup/