# CME Atlas Cluster
## Setup
### Head Node
The head node will need an OS, a running DHCP/DNS server, a running TFTP server, and a few NFS exports.
I've used Ubuntu Server 16.04.
#### Some basics
Install a few packages that will be needed throughout this guide:
```
sudo apt update
sudo apt install openssh-server ansible build-essential nfs-kernel-server
```
#### Network
The server being used has four network ports, two embedded and two on a PCI expansion card. For this
cluster, port 0 (enp32s0) will be used to connect to the UNR network, and port 1 (enp34s0) will be used to
connect to a local network switch. This can be accomplished by editing `/etc/network/interfaces`.
In this case, enp32s0 is set to dhcp to get an IP address from the UNR network, and enp34s0 is set to static with an IP of 10.0.0.1.
```
# /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
source /etc/network/interfaces.d/*
# The loopback network interface
auto lo
iface lo inet loopback
auto enp32s0
iface enp32s0 inet dhcp
auto enp34s0
iface enp34s0 inet static
address 10.0.0.1
netmask 255.255.255.0
network 10.0.0.0
```
#### DHCP/DNS
I've used `dnsmasq` for the DHCP/DNS server, and it is fairly straightforward to set up.
First, install the appropriate packages:
```
sudo apt update
sudo apt install dnsmasq
```
Once installed, the config file for dnsmasq is located at `/etc/dnsmasq.conf`. Below is an example config file.
This config file specifies the interface for dnsmasq to run on, in this case enp34s0 (port 1), which ensures
DHCP is only served on the local network (10.0.0.0), and not the UNR network (134.197.0.0).
The `dhcp-option=3,10.0.0.1` line sets the default gateway handed to DHCP clients, and the
`dhcp-boot` lines tell DHCP clients which PXE files to boot with.
```
# /etc/dnsmasq.conf
interface=enp34s0
dhcp-range=10.0.0.100,10.0.0.254,12h
dhcp-option=3,10.0.0.1
dhcp-authoritative
dhcp-boot=pxelinux.0
dhcp-boot=net:normalarch,pxelinux.0
#Optionally define MAC/IP for specific nodes
#dhcp-host=xx:xx:xx:xx:xx:xx,compute-1-01,10.0.0.101
#dhcp-host=xx:xx:xx:xx:xx:xx,compute-1-02,10.0.0.102
```
Make the dnsmasq service start on boot, and restart it to ensure all changes are live.
```
sudo update-rc.d dnsmasq defaults
sudo service dnsmasq restart
```
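Before (or after) restarting, it is worth sanity-checking the configuration. A quick check, assuming the default config path and that `ss` from `iproute2` is available:

```shell
# Parse /etc/dnsmasq.conf without starting the daemon;
# prints "dnsmasq: syntax check OK" if the file is valid
sudo dnsmasq --test

# Confirm dnsmasq is listening for DNS (UDP 53) and DHCP (UDP 67)
sudo ss -ulnp | grep dnsmasq
```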
#### TFTP
PXE clients need boot files served by the head node. To accomplish this, a TFTP server must be configured; in this case `tftpd-hpa` was used. Install the appropriate packages:
```
sudo apt update
sudo apt install tftpd-hpa
```
The configuration file for tftpd-hpa is located at `/etc/default/tftpd-hpa`. Below is an example config file. This config file specifies some options for the tftpd-hpa service, as well as specifying the root directory of the tftp server, in this case `/tftp`.
```
# /etc/default/tftpd-hpa
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/tftp"
TFTP_ADDRESS=":69"
TFTP_OPTIONS="--secure"
RUN_DAEMON="yes"
OPTIONS="-l -s /tftp"
```
Once configured, the `/tftp` directory will need to be created and populated with some files. In order for PXE clients to boot, the following files and directories need to be in the `/tftp` directory:
```
boot/ images/ pxelinux.0 pxelinux.cfg/
```
Most of these can be created with the following commands:
```
sudo mkdir /tftp
sudo cp /usr/lib/PXELINUX/pxelinux.0 /tftp
sudo mkdir -p /tftp/boot
sudo cp -r /usr/lib/syslinux/modules/bios /tftp/boot/isolinux
sudo mkdir -p /tftp/pxelinux.cfg
sudo mkdir -p /tftp/images
sudo touch /tftp/pxelinux.cfg/default
```
Make the tftpd-hpa service start on boot, and restart it to ensure all changes are live.
```
sudo update-rc.d tftpd-hpa defaults
sudo service tftpd-hpa restart
```
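With the server running and `/tftp` populated, you can verify a file is actually retrievable over TFTP from the head node itself (the `tftp` client here is the one shipped in the `tftp-hpa` package, which may need to be installed first):

```shell
# Fetch pxelinux.0 from the local TFTP server into the current directory
tftp 10.0.0.1 -c get pxelinux.0
# Confirm the file arrived
ls -l pxelinux.0
```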
#### Creating the Filesystem
We will use `/exports/` as our exporting directory, so it will need to be created.
```
sudo mkdir /exports
```
Then we use `debootstrap` to build a filesystem for the booting nodes to mount. It can be installed via:
```
sudo apt update
sudo apt install debootstrap
```
Once installed, use debootstrap to create a filesystem with a specified architecture, distribution, and mirror; in our case amd64, xenial, and archive.ubuntu.com.
```
sudo debootstrap --arch amd64 xenial /exports/xenial http://archive.ubuntu.com/ubuntu
```
After debootstrap is finished, a few things will need to be configured within the created filesystem.
First, copy over the current apt sources from the head node:
```
sudo cp /etc/apt/sources.list /exports/xenial/etc/apt/sources.list
```
Now use `chroot` to enter the filesystem and install packages and make configuration changes.
```
sudo chroot /exports/xenial/
```
Some essential packages to install within the filesystem are:
```
sudo apt install linux-firmware nano build-essential openssh-server munge slurm-llnl ntp nfs-common
```
For clients to boot from this nfsroot, some changes to fstab will need to be made. The `nfs` type tells fstab to mount a directory via NFS, and the `tmpfs` type mounts a directory in memory.
Other NFS mounts are included to keep files synchronized across compute nodes (you'll see it all as we go).
Within the chroot environment, replace `/etc/fstab` with this:
```
#/etc/fstab
proc /proc proc defaults 0 0
/dev/nfs / nfs defaults,ro 1 1
none /tmp tmpfs defaults 0 0
none /var/tmp tmpfs defaults 0 0
none /var/log tmpfs defaults 0 0
none /var/lib/lightdm-data tmpfs defaults 0 0
none /var/lib/ubuntu-drivers-common tmpfs defaults 0 0
none /var/lib/pbis tmpfs defaults 0 0
none /var/lib/lightdm tmpfs defaults 0 0
#none /usr/local/home/cse-admin tmpfs defaults 0 0
none /var/lib/dhcp tmpfs defaults 0 0
none /var/spool/slurm tmpfs defaults,uid=slurm,gid=slurm 0 0
10.0.0.1:/opt /opt nfs defaults,ro,nolock 0 0
10.0.0.1:/usr /usr nfs defaults,ro,nolock 0 0
10.0.0.1:/home /home nfs defaults,rw,nolock 0 0
10.0.0.1:/scratch /scratch nfs defaults,rw,nolock 0 0
10.0.0.1:/etc/slurm-llnl /etc/slurm-llnl nfs defaults,ro,nolock 0 0
```
#### Updating network configuration
To ensure booting does not stall when an interface is not up, enable hotplugging of the interface.
Edit `/etc/network/interfaces` and ensure `allow-hotplug` is set for the primary PXE boot interface.
```
#/etc/network/interfaces
source-directory /etc/network/interfaces.d
allow-hotplug enp34s0
iface enp34s0 inet dhcp
```
Exit the chroot environment with `exit` when finished.
#### PXE
The menu file for PXE is located at `/tftp/pxelinux.cfg/default`. This can be configured to your liking, but here is a basic menu that will get the job done. The most important part to keep consistent if the menu is changed is the boot option for the NFSRoot label. This tells the PXE booting client to use the kernel located in the TFTP root, and to mount its root filesystem from 10.0.0.1:/exports/xenial (created above) as read-only.
```
# /tftp/pxelinux.cfg/default
default menu.c32
prompt 0
timeout 30
ONTIMEOUT AtlasNFSRoot
MENU TITLE PXE Boot Menu
LABEL AtlasNFSRoot
MENU LABEL Atlas NFS Root
KERNEL /images/ubuntu-1604/linux
APPEND root=/dev/nfs initrd=/images/ubuntu-1604/initrd.img nfsroot=10.0.0.1:/exports/xenial ip=dhcp ro
```
#### SSH Access Setup
Now we will generate an SSH key that will be distributed to each node, allowing seamless SSH access:
```
ssh-keygen
sudo mkdir -p /exports/xenial/root/.ssh/
sudo cp ~/.ssh/id_rsa.pub /exports/xenial/root/.ssh/authorized_keys
```
(If the last command doesn't work, just copy-paste the public key into the authorized_keys file manually.)
We can also put our key in our own authorized keys file, allowing other nodes to be accessed easily.
```
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```
(If this command doesn't work, just copy-paste the public key into the authorized_keys file manually.)
#### Management setup
Next we need to install a few things on both the head node and the PXE root filesystem.
Management software such as Slurm requires packages like munge (credentials) and ntp (synchronized time) to work correctly.
Ensure they are installed on the head node:
```
sudo apt update
sudo apt install ntp munge slurm-llnl
```
##### NTP and munge
Configure ntp as a local-net timeserver by adding the following lines to the end of `/etc/ntp.conf`:
```
server 127.127.1.0
fudge 127.127.1.0 stratum 10
```
Similarly, edit the ntp configuration of the PXE root filesystem, `/exports/xenial/etc/ntp.conf`,
and add the line:
```
server head-01 iburst
```
Here `head-01` is the hostname of the head node.
Restart the ntp service and check that munge works on the head node:
```
sudo service ntp restart
munge -n | unmunge
```
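To confirm the head node is actually serving time, query its peer list; the local clock driver added above should appear (`ntpq` ships with the ntp package):

```shell
# List ntp peers; the 127.127.1.0 local clock entry should be in the table
ntpq -p
```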
##### Slurm
Make the `/var/spool/slurm` and `/var/spool/slurm-state` directories on both the head node and the PXE root:
```
sudo mkdir -p /var/spool/slurm /var/spool/slurm-state
sudo mkdir -p /exports/xenial/var/spool/slurm /exports/xenial/var/spool/slurm-state
```
Now make sure they are owned by slurm
```
sudo chown -R slurm:slurm /var/spool/slurm /var/spool/slurm-state
sudo chown -R slurm:slurm /exports/xenial/var/spool/slurm /exports/xenial/var/spool/slurm-state
```
Edit the `/usr/lib/tmpfiles.d/[munge, slurmd, slurmctld].conf` files on the head node. These are mounted
directly in the PXE boot too, so only one set is needed.
```
#munge.conf
d /var/run/munge 0755 munge munge -
d /var/log/munge 0700 munge munge -
d /var/lib/munge 0711 munge munge -
```
```
#slurmd.conf
d /var/run/slurm-llnl 0755 slurm slurm - -
```
```
#slurmctld.conf
d /var/run/slurm-llnl 0755 slurm slurm - -
```
Now edit `/etc/slurm-llnl/slurm.conf` on the head node, changing the node configurations
as needed for your system. This is not a complete conf file; build one with the configurator below, then edit as needed.
https://slurm.schedmd.com/configurator.easy.html
```
ControlMachine=head-01
ControlAddr=10.0.0.1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm-state
SlurmctldTimeout=10
SlurmdTimeout=10
ClusterName=cme_atlas
# COMPUTE NODES
NodeName=compute-1-01 Sockets=1 CPUs=4 RealMemory=3500 CoresPerSocket=4 ThreadsPerCore=1 State=IDLE
NodeName=compute-1-02 Sockets=1 CPUs=4 RealMemory=7900 CoresPerSocket=4 ThreadsPerCore=1 State=IDLE
NodeName=compute-1-03 Sockets=1 CPUs=4 RealMemory=7900 CoresPerSocket=4 ThreadsPerCore=1 State=IDLE
NodeName=head-01 Sockets=1 CPUs=4 RealMemory=7900 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=debug Nodes=head-01 Default=NO MaxTime=INFINITE State=UP
PartitionName=comp Nodes=head-01,compute-1-[01-03] Default=YES MaxTime=INFINITE State=UP
```
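With the config in place, restart the services and check that the cluster reports in. A quick smoke test, assuming the service names used by the `slurm-llnl` packaging:

```shell
sudo service munge restart
sudo service slurmctld restart
# The head node is also a compute node in this config, so slurmd runs here too
sudo service slurmd restart

# Show partitions and node states, then run a trivial job on one node
sinfo
srun -N1 hostname
```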
#### Exportfs
Finally, we export all this glory over NFS.
Add our needed exports to `/etc/exports`, and sync the changes with `sudo exportfs -arv`.
```
# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients. See exports(5).
/exports/xenial 10.0.0.0/24(ro,async,no_root_squash,no_subtree_check,insecure)
/opt 10.0.0.0/24(ro,async,no_root_squash,no_subtree_check,insecure)
/home 10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check,insecure)
/scratch 10.0.0.0/24(rw,sync,no_subtree_check,insecure)
/usr 10.0.0.0/24(ro,async,no_root_squash,no_subtree_check,insecure)
/etc/slurm-llnl 10.0.0.0/24(ro,async,no_root_squash,no_subtree_check)
```
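You can confirm the exports are live from the head node itself with `showmount`:

```shell
# List all directories currently exported by the local NFS server;
# every entry from /etc/exports above should appear
showmount -e localhost
```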
#### Generating initramfs
Last but not least, you will need to generate a new initramfs so that it supports the nfsroot boot arguments. This can be done with `initramfs-tools`:
```
sudo apt update
sudo apt install initramfs-tools
```
Edit `/etc/initramfs-tools/initramfs.conf`, changing `BOOT` to `nfs`, `MODULES` to `most`, and `NFSROOT` to `auto`.
```
# initramfs.conf
# Configuration file for mkinitramfs(8). See initramfs.conf(5).
#
# Note that configuration options from this file can be overridden
# by config files in the /etc/initramfs-tools/conf.d directory.
BOOT=nfs
MODULES=most
BUSYBOX=auto
COMPRESS=gzip
NFSROOT=auto
```
Now, generate the initramfs, and copy it and the kernel to the tftp directory:
```
sudo mkdir -p /tftp/images/ubuntu-1604
sudo mkinitramfs -o /tftp/images/ubuntu-1604/initrd.img
sudo cp /boot/vmlinuz-$(uname -r) /tftp/images/ubuntu-1604/linux
```
IMPORTANT:
Once the initramfs is generated, edit `/etc/initramfs-tools/initramfs.conf` again and change the `BOOT=nfs` line back to `BOOT=local`.
This prevents `update-initramfs` from turning the head node into a PXE boot machine later on!
### Booting
At this point, you should have a bootable system. Add another node to the local network, enable PXE booting in its BIOS, and turn it on. The machine should come up with the PXE menu and boot from the AtlasNFSRoot entry.
After booting, attempt to connect to each compute node via ssh. Ensure munge, slurm, and other tools are operating normally.
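A small loop makes that check quick; the node names here assume the `dhcp-host` and slurm.conf examples above:

```shell
# For each compute node: confirm ssh works and munge credentials validate
for n in compute-1-01 compute-1-02 compute-1-03; do
  echo "== $n =="
  ssh root@"$n" 'hostname && munge -n | unmunge | head -n 1'
done
```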
## Infiniband
...Now the hard part. Maybe.
Usually InfiniBand has special drivers from the manufacturer, but let's try to do it using OpenSM and the OpenFabrics Enterprise Distribution (OFED).
IMPORTANT: Do all of this in both the head node and the PXE root environment!
Add a file to `/etc/udev/rules.d` called, say, `99-udev-umad.rules`. This will cause the correct entries to be created in `/sys`.
Insert the following:
```
#/etc/udev/rules.d/99-udev-umad.rules
KERNEL=="umad*", NAME="infiniband/%k", MODE="0666"
KERNEL=="issm*", NAME="infiniband/%k", MODE="0666"
```
Edit `/etc/modules` and add the following modules:
```
ib_sa
ib_cm
ib_umad
ib_addr
ib_uverbs
ib_ipoib
ib_ipath
ib_qib
```
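If you'd rather not wait for the reboot below, the non-hardware-specific modules can be loaded immediately (they will still load from `/etc/modules` on boot):

```shell
# Load the core InfiniBand modules now; the hardware drivers
# (ib_ipath, ib_qib) are left to load only where that hardware exists
for m in ib_sa ib_cm ib_umad ib_addr ib_uverbs ib_ipoib; do
  sudo modprobe "$m"
done
```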
Next, `sudo apt install opensm`. This will install the subnet manager and all the relevant dependencies (hopefully).
Then add the relevant entries for the interface into `/etc/network/interfaces` file:
```
auto ib0
iface ib0 inet static
address 10.0.1.1
netmask 255.255.255.0
```
Then reboot. This will create the relevant InfiniBand entries in `/sys`, load the IPoIB modules, and bring up the InfiniBand port with an IP address.
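After the reboot, a couple of quick checks (assuming the `infiniband-diags` package is installed for `ibstat`; the peer address is a hypothetical second node on the 10.0.1.0 network):

```shell
# Port state should read "Active" once OpenSM has brought the fabric up
ibstat

# Verify the IPoIB interface has its address and can reach another node
ip addr show ib0
ping -c 3 10.0.1.2
```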
You should now have a functioning InfiniBand port on your Ubuntu machines, provided the modules all loaded cleanly.
Now go test the network with `netperf` (or something) for speed and function, and hope things didn't get FUBAR while you blinked. :)
# CME Atlas Cluster
## Setup
### Head Node
The head node will need an OS, a running DHCP/DNS server, a running TFTP server, and a few NFS exports.
I've used Ubuntu Server 16.04.
#### Some basics
Install a couple things for good measure, just in case.
```
sudo apt update
sudo apt install openssh-server ansible build-essential openssh-server nfs-kernel-server
```
#### Network
The server being used has four network ports, two embedded and two on a PCI expansion card. For this
cluseter, port 0 (enp32s0) will be used to connect to the UNR network, and port 1 (enp34s0) will be used to
connect to a local network switch. This can be accomplished by editing `/etc/network/interfaces`.
In this case, enp32s0 is set to dhcp to get an IP address from the UNR network, and enp34s0 is set to static with an IP of 10.0.0.1.
```
# /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
source /etc/network/interfaces.d/*
# The loopback network interface
auto lo
iface lo inet loopback
auto enp32s0
iface enp32s0 inet dhcp
auto enp34s0
iface enp34s0 inet static
address 10.0.0.1
netmask 255.255.255.0
network 10.0.0.0
```
#### DHCP/DNS
I've used `dnsmasq` for the DHCP/DNS server, and it is fairly straightforward to setup.
First, install the appropriate packages:
```
sudo apt update
sudo apt install dnsmasq
```
Once installed, the config file for dnsmasq is located at `/etc/dnsmasq.conf`. Below is an example config file.
This config file specifies an interface for dnsmasq to run on, in this case enp34s0 (port 1), which ensures
dhcp is only run on the local network (10.0.0.0), and not the UNR network (134.197.0.0).
The `dchp-option` line tells dhcp clients which IP address to PXE boot from, and the
`dhcp-boot` lines tell dhcp clients which PXE files to boot with.
```
# /etc/dnsmasq.conf
interface=enp34s0
dhcp-range=10.0.0.100,10.0.0.254,12h
dhcp-option=3,10.0.0.1
dhcp-authoritative
dhcp-boot=pxelinux.0
dhcp-boot=net:normalarch,pxelinux.0
#Optionally define MAC/IP for specific nodes
#dhcp-host=xx:xx:xx:xx:xx:xx,compute-1-01,10.0.0.101
#dhcp-host=xx:xx:xx:xx:xx:xx,compute-1-02,10.0.0.102
```
Make the dnsmasq service start on boot, and restart it to ensure all changes are live.
```
sudo update-rc.d dnsmasq defaults
sudo service dnsmasq restart
```
#### TFTP
For PXE booting clients to boot, they will need some files to boot with, provided by the head node. To accomplish this, a TFTP server must be configured, in this case `tftpd-hpa` was used. Install the appropriate packages:
```
sudo apt update
sudo apt install tftpd-hpa
```
The configuration file for tftpd-hpa is located at `/etc/default/tftdp-hpa`. Below is an example config file. This config file specifies some options for the tftpd-hpa service, as well as specifying the root directorhy of the tftp server, in this case `/tftp`.
```
# /etc/default/tftpd-hpa
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/tftp"
TFTP_ADDRESS=":69"
TFTP_OPTIONS="--secure"
RUN_DAEMON="yes"
OPTIONS="-l -s /tftp"
```
Once configured, the `/tftp` directory will need to be created and populated with some files. In order for PXE clients to boot, the following files are needed to be in the `/tftp` directory:
```
boot/ images/ pxelinux.0 pxelinux.cfg/
```
Most of the files can be populated from these commands:
```
sudo mkdir /tftp
sudo cp /usr/lib/PXELINUX/pxelinux.0 /tftp
sudo mkdir -p /tftp/boot
sudo cp -r /usr/lib/syslinux/modules/bios /tftp/boot/isolinux
sudo mkdir -p /tftp/pxelinux.cfg
sudo mkdir -p /tftp/images
sudo touch /tftp/pxelinux.cfg/default
```
Make the tftpd-hpa service start on boot, and restart it to ensure all changes are live.
```
sudo update-rc.d tftpd-hpa defaults
sudo service tftpd-hpa restart
```
#### Creating the Filesystem
We will use `/exports/` as our exporting directory, so it will need to be created.
```
sudo mkdir /exports
```
Then we use `debootstrap` to create a filesystem for the booting nodes to mount, which can be installed via
```
sudo apt update
sudo apt install debootstrap
```
Once created, use debootstrap to create a filesystem with a specified archetecture, distribution, and mirror, in our case amd64, xenial, and archive.ubuntu.com.
```
sudo debootstrap --arch amd64 xenial /exports/xenial http://archive.ubuntu.com/ubuntu
```
After debootstrap is finished, a few things will need to be configured within the created filesystem.
First, copy over the current apt sources from the head node:
```
sudo cp /etc/apt/sources.list /exports/xenial/etc/apt/sources.list
```
Now use `chroot` to enter the filesystem and install packages and make configuration changes.
```
sudo chroot /exports/xenial/
```
Some essential packages to install within the filesystem are:
```
sudo apt install linux-firmware nano build-essential openssh-server munge slurm-llnl ntp nfs-common
```
For clients to boot from this nfsroot, some changes to fstab will need to be made. The nfs option tells fstab to mount a folder via nfs, and the tempfs option mounts a folder in memory.
Other NFS mounts are included for mirror synchronicity across compute nodes (you'll see it all as we go)
Within the chroot enviornment, replace `/ect/fstab` with this:
```
#/etc/fstab
proc /proc proc defaults 0 0
/dev/nfs / nfs defaults,ro 1 1
none /tmp tmpfs defaults 0 0
none /var/tmp tmpfs defaults 0 0
none /var/log tmpfs defaults 0 0
none /var/lib/lightdm-data tmpfs defaults 0 0
none /var/lib/ubuntu-drivers-common tmpfs defaults 0 0
none /var/lib/pbis tmpfs defaults 0 0
none /var/lib/lightdm tmpfs defaults 0 0
#none /usr/local/home/cse-admin tmpfs defaults 0 0
none /var/lib/dhcp tmpfs defaults 0 0
none /var/spool/slurm tmpfs defaults,uid=slurm,gid=slurm 0 0
10.0.0.1:/opt /opt nfs defaults,ro,nolock 0 0
10.0.0.1:/usr /usr nfs defaults,ro,nolock 0 0
10.0.0.1:/home /home nfs defaults,rw,nolock 0 0
10.0.0.1:/scratch /scratch nfs defaults,rw,nolock 0 0
10.0.0.1:/etc/slurm-llnl /etc/slurm-llnl nfs defaults,ro,nolock 0 0
```
#### Updating network configuration
In order to ensure booting does not lag if interfaces are not up, enable hotplugging of the interface.
Edit `/etc/network/interfaces` and ensure `allow-hotplug` is set for the primary PXE boot interface.
```
#/etc/network/interfaces
source-directory /etc/network/interfaces.d
allow-hotplug enp34s0
iface enp34s0 inet dhcp
```
Exit the chroot enviornment with `exit` when finished.
#### PXE
The menu file for PXE is now located at `/tftp/pxelinux.cfg/default`. This can be configured to your liking, but here is a basic menu that will get the job done. The most important part to keep consistant if the menu is changed is the boot option for the NFSRoot label. This tells the PXE booting client to use the kernel located in the TFTP root, and to mount it's root filesystem from 10.0.0.1:/exports/xenial (which will be created later) as readonly.
```
# /tftp/pxelinux.cfg/default
default menu.c32
prompt 0
timeout 30
ONTIMEOUT AtlasNFSRoot
MENU TITLE PXE Boot Menu
LABEL AtlasNFSRoot
MENU LABEL Atlas NFS Root
KERNEL /images/ubuntu-1604/linux
APPEND root=/dev/nfs initrd=/images/ubuntu-1604/initrd.img nfsroot=10.0.0.1:/exports/xenial ip=dhcp ro
```
#### SSH Access Setup
Now we will generate an ssh key that will be distributed to each node and allow seamless ssh access
#### Exports```
We will use `/exports/` as our exporting directory, so it will need to be created.ssh-keygen
sudo mkdir -p /exports/xenial/root/.ssh/
```sudo cp ~/.ssh/id_rsa.pub /exports/xenial/root/.ssh/authorized_keys
sudo mkdir /exports```
(If the last command doesnt work, just copy-pasta the hash into the authorized keys file manually)
We can also put our key in our own authorized keys file, allowing other nodes to be accessed easily.
```
Now, add this export to `/etc/exports`, and sync the changes with `sudo exportfs -arv`.sudo cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```
# /etc/exports: the access control list for filesystems which may be exported(If this command doesnt work, just copy-pasta the hash into the authorized keys file manually)
#### Management setup
Next we need to install a few things on both the head node and the PXE root filesystem.
Management software such as Slurm requires packages such as munge (credentials) and ntp (synced time) to work correctly.
Ensure they are installed on the head node:
```
sudo apt update
sudo apt install ntp munge slurm-llnl
```
##### NTP and munge
Configure ntp as a local-net timeserver by adding the following lines to the end of `/etc/ntp.conf`:
```
server 127.127.1.0
fudge 127.127.1.0 stratum 10
```
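Depending on your distribution's default `restrict` lines, you may also need to explicitly permit the compute subnet to query this server. A hedged example, to be adjusted to your own policy:
```
# allow clients on the local 10.0.0.0/24 network to sync from this server
restrict 10.0.0.0 mask 255.255.255.0 nomodify notrap
```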
Similarly, edit the ntp configuration of the PXE root filesystem, `/exports/xenial/etc/ntp.conf`, adding the line:
```
server head-01 iburst
```
where head-01 is the hostname of the head node.
Restart the ntp service and check munge on the head node:
```
sudo service ntp restart
munge -n | unmunge
```
##### Slurm
Make directories for Slurm at `/var/spool/slurm` and `/var/spool/slurm-state` on both the head node and the PXE root:
```
sudo mkdir -p /var/spool/slurm /var/spool/slurm-state
sudo mkdir -p /exports/xenial/var/spool/slurm /exports/xenial/var/spool/slurm-state
```
Now make sure they are owned by slurm:
```
sudo chown -R slurm:slurm /var/spool/slurm /var/spool/slurm-state
sudo chown -R slurm:slurm /exports/xenial/var/spool/slurm /exports/xenial/var/spool/slurm-state
```
Edit the `/usr/lib/tmpfiles.d/[munge, slurmd, slurmctld].conf` files on the head node. These are mounted
directly in the PXE boot too, so only one set is needed.
```
# munge.conf
d /var/run/munge 0755 munge munge -
d /var/log/munge 0700 munge munge -
d /var/lib/munge 0711 munge munge -
```
```
# slurmd.conf
d /var/run/slurm-llnl 0755 slurm slurm - -
```
```
# slurmctld.conf
d /var/run/slurm-llnl 0755 slurm slurm - -
```
Now edit `/etc/slurm-llnl/slurm.conf` on the head node, changing the node configurations
as needed for your system. This is not a complete conf file; you must make one by following the link below.
https://slurm.schedmd.com/configurator.easy.html
Edit as needed.
```
ControlMachine=head-01
ControlAddr=10.0.0.1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurm
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm-state
SlurmctldTimeout=10
SlurmdTimeout=10
ClusterName=cme_atlas
# COMPUTE NODES
NodeName=compute-1-01 Sockets=1 CPUs=4 RealMemory=3500 CoresPerSocket=4 ThreadsPerCore=1 State=IDLE
NodeName=compute-1-02 Sockets=1 CPUs=4 RealMemory=7900 CoresPerSocket=4 ThreadsPerCore=1 State=IDLE
NodeName=compute-1-03 Sockets=1 CPUs=4 RealMemory=7900 CoresPerSocket=4 ThreadsPerCore=1 State=IDLE
NodeName=head-01 Sockets=1 CPUs=4 RealMemory=7900 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=debug Nodes=head-01 Default=NO MaxTime=INFINITE State=UP
PartitionName=comp Nodes=head-01,compute-1-[01-03] Default=YES MaxTime=INFINITE State=UP
```
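The repetitive `NodeName` lines can also be generated with a small loop rather than typed by hand. A sketch, mirroring the example values above (adjust `CPUs`/`RealMemory` to your hardware and the node count to your rack):

```shell
# Emit one NodeName line per compute node, zero-padded to two digits
# to match the compute-1-NN naming scheme used above.
for i in $(seq -f "%02g" 1 3); do
  echo "NodeName=compute-1-$i Sockets=1 CPUs=4 RealMemory=7900 CoresPerSocket=4 ThreadsPerCore=1 State=IDLE"
done
```

Redirect the output with `>>` into your slurm.conf draft and edit from there.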
#### Exportfs
Finally, we export all this glory over NFS.
Add our needed exports to `/etc/exports`, and sync the changes with `sudo exportfs -arv`.
```
# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients. See exports(5).
/exports/xenial 10.0.0.0/24(ro,async,no_root_squash,no_subtree_check,insecure)
/opt 10.0.0.0/24(ro,async,no_root_squash,no_subtree_check,insecure)
/home 10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check,insecure)
/scratch 10.0.0.0/24(rw,sync,no_subtree_check,insecure)
/usr 10.0.0.0/24(ro,async,no_root_squash,no_subtree_check,insecure)
/etc/slurm-llnl 10.0.0.0/24(ro,async,no_root_squash,no_subtree_check)
```
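For the compute nodes to actually pick these up at boot, the PXE root's fstab (`/exports/xenial/etc/fstab`) needs matching NFS mount lines. A sketch, under the assumption that the mount points already exist in the image:
```
# /exports/xenial/etc/fstab (sketch)
10.0.0.1:/opt            /opt            nfs defaults 0 0
10.0.0.1:/home           /home           nfs defaults 0 0
10.0.0.1:/scratch        /scratch        nfs defaults 0 0
10.0.0.1:/etc/slurm-llnl /etc/slurm-llnl nfs ro       0 0
```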
#### Generating initramfs
Last but not least, you will need to generate a new initramfs in order for it to support nfsroot arguments. This can be done with `initramfs-tools`:
```
sudo apt update
sudo apt install initramfs-tools
```
Edit `/etc/initramfs-tools/initramfs.conf`, and change the entries `BOOT` to `nfs`, `MODULES` to `most`, and `NFSROOT` to `auto`.
```
# initramfs.conf
# Configuration file for mkinitramfs(8). See initramfs.conf(5).
#
# Note that configuration options from this file can be overridden
# by config files in the /etc/initramfs-tools/conf.d directory.
BOOT=nfs
#
# MODULES: [ most | netboot | dep | list ]
#
# most - Add most filesystem and all harddrive drivers.
#
# dep - Try and guess which modules to load.
#
# netboot - Add the base modules, network modules, but skip block devices.
#
# list - Only include modules from the 'additional modules' list
#
MODULES=most
#
# BUSYBOX: [ y | n | auto ]
#
# Use busybox shell and utilities. If set to n, klibc utilities will be used.
# If set to auto (or unset), busybox will be used if installed and klibc will
# be used otherwise.
#
BUSYBOX=auto
#
COMPRESS=gzip
NFSROOT=auto
```
Now, generate the initramfs, and copy it and the kernel to the tftp directory:
```
sudo mkinitramfs -o /tftp/images/ubuntu-1604/initrd.img
sudo cp /boot/vmlinuz-$(uname -r) /tftp/images/ubuntu-1604/linux
```
IMPORTANT:
Edit your `/etc/initramfs-tools/initramfs.conf` and comment out the `BOOT=nfs` line.
This prevents update-initramfs from turning the head node into a PXE boot machine later on!
### Booting
At this point, you should have a bootable system. Add another node to the local network, turn it on, and enable PXE booting in the BIOS. The machine should come up with a PXE menu, and boot from the AtlasNFSRoot.
After booting, attempt to connect to each compute node via ssh. Ensure munge, slurm, and other tools are operating normally.
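The per-node ssh check can be scripted as below. A sketch using the node names from the example slurm.conf; `BatchMode` stops ssh from hanging on a password prompt, and an unreachable or unresolvable host is reported rather than aborting the loop:

```shell
# Try a trivial command on each compute node over ssh and report status.
for node in compute-1-01 compute-1-02 compute-1-03; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
    echo "$node: ok"
  else
    echo "$node: unreachable"
  fi
done
```

If a node reports `unreachable`, check dnsmasq's leases and the key distribution steps above before digging deeper.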
## Infiniband
Now the hard part. Maybe.
Usually InfiniBand has special drivers from the manufacturer, but let's try to do it using OpenSM and the OpenFabrics Enterprise Distribution (OFED).
IMPORTANT: Do all this stuff in both the head node and PXE root environment!
Add a file to `/etc/udev/rules.d` called, say, `99-udev-umad.rules`. This will cause the correct entries to be created in `/sys`.
Insert the following:
```
# /etc/udev/rules.d/99-udev-umad.rules
KERNEL=="umad*", NAME="infiniband/%k", MODE="0666"
KERNEL=="issm*", NAME="infiniband/%k", MODE="0666"
```
Edit `/etc/modules` and add the following modules:
```
ib_sa
ib_cm
ib_umad
ib_addr
ib_uverbs
ib_ipoib
ib_ipath
ib_qib
```
Next, `sudo apt install opensm`. This will install the subnet manager and all the relevant dependencies (hopefully).
Then add the relevant entries for the interface to the `/etc/network/interfaces` file:
```
auto ib0
iface ib0 inet static
address 10.0.1.1
netmask 255.255.255.0
```
Then reboot. This will create the relevant InfiniBand entries in `/sys`, load the IPoIB modules, and bring up the InfiniBand port with an IP address.
You should now have a functioning InfiniBand port on your Ubuntu machines. Now go test the network with `netperf` (or something) for speed and function, and hope things didn't get FUBAR while you blinked. :)