Tutorial: Setting Up a 3-Node Raspberry Pi 5 SLURM Cluster (Rev2)
Created: 2025-04-13 17:33:58 | Last updated: 2025-04-13 17:56:45 | Status: Public
This tutorial guides you through setting up a small High-Performance Computing (HPC) cluster using three Raspberry Pi 5 devices, SLURM Workload Manager, and a specific network configuration involving both Wi-Fi and a private Ethernet network.
Cluster Configuration:
- Nodes: 3 x Raspberry Pi 5 (8GB RAM recommended)
- OS: Raspberry Pi OS Bookworm (64-bit recommended)
- Boot: From SSDs
- Cluster User: `cuser`
- Networking:
  - `pi-head`:
    - WLAN (`wlan0`): Connects to your main router via Wi-Fi, gets `192.168.1.20` via DHCP reservation (Gateway: `192.168.1.1`). Provides internet access.
    - Ethernet (`eth0`): Connects to private switch, static IP `10.0.0.1/24`.
  - `pi-c01`:
    - Ethernet (`eth0`): Connects to private switch, static IP `10.0.0.2/24`. Gateway via `pi-head` (`10.0.0.1`).
  - `pi-c02`:
    - Ethernet (`eth0`): Connects to private switch, static IP `10.0.0.3/24`. Gateway via `pi-head` (`10.0.0.1`).
- SLURM: Basic setup (`slurmctld`, `slurmd`, `munge`).
Prerequisites
- Hardware:
- 3 x Raspberry Pi 5 (8GB RAM)
- 3 x NVMe SSDs (or SATA SSDs with appropriate adapters) compatible with RPi 5 boot.
- 3 x Reliable Power Supplies for RPi 5 (5V/5A recommended).
- 1 x Gigabit Ethernet Switch (unmanaged is fine).
- 3 x Ethernet Cables.
- Access to your existing Wi-Fi network and router admin interface (for DHCP reservation).
- Software:
- Raspberry Pi Imager tool.
- Raspberry Pi OS Bookworm (64-bit recommended) flashed onto each SSD.
- Initial Setup:
  - Ensure each Pi boots correctly from its SSD.
  - Complete the initial Raspberry Pi OS setup wizard (create the initial user - this is NOT `cuser` yet; set locale, keyboard, etc.).
  - Enable SSH on each Pi: `sudo raspi-config` -> Interface Options -> SSH -> Enable.
  - Connect `pi-head` to your Wi-Fi network.
  - Configure the DHCP reservation on your OpenWrt router to assign `192.168.1.20` to `pi-head`'s WLAN MAC address. Verify `pi-head` gets this IP (`ip a show wlan0`).
  - Physically connect all three Pis to the Gigabit switch using Ethernet cables.
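The DHCP reservation step needs `pi-head`'s WLAN MAC address. A quick way to print it (a sketch of a helper, not part of the tutorial's required steps; run on `pi-head` after connecting to Wi-Fi):

```shell
# Print the MAC address of wlan0 for the router's DHCP reservation.
mac=$(ip link show wlan0 2>/dev/null | awk '/link\/ether/ {print $2}')
echo "wlan0 MAC: ${mac:-not found}"
```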
Phase 1: Basic OS Configuration & Hostnames
(Perform these steps on each Pi, adjusting hostnames accordingly. You’ll need SSH access.)
- Login: SSH into each Pi using the initial user you created during setup.
- Set Hostnames:
  - On the first Pi (intended as head node):
sudo hostnamectl set-hostname pi-head
  - On the second Pi (compute node 1):
sudo hostnamectl set-hostname pi-c01
  - On the third Pi (compute node 2):
sudo hostnamectl set-hostname pi-c02
  - Reboot each Pi (`sudo reboot`) or log out and log back in for the change to take effect in your shell prompt and network identity.
- Update System (`pi-head` only for now):
  - Ensure `pi-head` has internet via Wi-Fi.
  - SSH into `pi-head`:
sudo apt update
sudo apt full-upgrade -y
sudo apt install -y vim git build-essential # Essential tools
  - *Note: We will update `pi-c01` and `pi-c02` after setting up network routing.*
Phase 2: Network Configuration (Revised)
Goal: Configure network interfaces on all three Raspberry Pis. `pi-head` will connect to your home network/internet via Wi-Fi (`wlan0`) and to the private cluster network via Ethernet (`eth0`). `pi-c01` and `pi-c02` will connect only to the private cluster network via Ethernet (`eth0`) and use `pi-head` as their gateway to reach the internet. We will use `nmcli` for interface configuration and `nftables` for firewall/NAT on `pi-head`.
Recap of Target Configuration:
- `pi-head`:
  - `wlan0`: `192.168.1.20` (via DHCP reservation), Gateway `192.168.1.1` (Internet Access)
  - `eth0`: `10.0.0.1/24` (Static, Private Network)
- `pi-c01`:
  - `eth0`: `10.0.0.2/24` (Static, Private Network), Gateway `10.0.0.1`
  - `wlan0`: Disabled
- `pi-c02`:
  - `eth0`: `10.0.0.3/24` (Static, Private Network), Gateway `10.0.0.1`
  - `wlan0`: Disabled
Steps:
- Verify `pi-head` WLAN Connection:
  - SSH into `pi-head`.
  - Confirm it received the correct IP address from your router's DHCP reservation and has a default route via your main gateway:
ip addr show wlan0
# Look for 'inet 192.168.1.20/XX ...' (XX is your subnet mask, often 24)
ip route show default
# Should show 'default via 192.168.1.1 dev wlan0 ...'
  - If the IP or route is incorrect, double-check your router's DHCP reservation settings and ensure `pi-head`'s Wi-Fi is connected to the correct network.
- Configure `pi-head` Ethernet (`eth0` - Private Network):
  - Still on `pi-head`.
  - Identify the Ethernet interface name (usually `eth0`): `ip a`
  - Add a NetworkManager connection profile for `eth0` with the static private IP. We explicitly set no gateway and mark it as never the default route:
# Replace 'eth0' if your interface name is different
sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.1/24
# Critical: Prevent this interface from ever becoming the default route
sudo nmcli connection modify 'static-eth0' ipv4.gateway '' # Ensure no gateway is set
sudo nmcli connection modify 'static-eth0' ipv4.never-default yes # Prevent it from being the default route
sudo nmcli connection modify 'static-eth0' connection.autoconnect yes # Connect automatically
# Bring the connection up (may happen automatically)
sudo nmcli connection up 'static-eth0'
* **Verify `pi-head` Network State:**
ip addr show eth0
# Should show 'inet 10.0.0.1/24 ...'
ip route show default
# Should STILL show 'default via 192.168.1.1 dev wlan0 ...'
- Configure `pi-c01` Ethernet (`eth0` - Private Network):
  - SSH into `pi-c01`. (Use a temporary keyboard/monitor, or connect `eth0` temporarily to the main network if needed for first access.)
  - Identify the Ethernet interface name (usually `eth0`): `ip a`
  - Add the static IP configuration, setting `pi-head` (`10.0.0.1`) as the gateway and providing DNS servers:
# Replace 'eth0' if needed
sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.2/24 gw4 10.0.0.1
# Set DNS servers (e.g., Google DNS and Cloudflare DNS)
# These requests will be routed via pi-head
sudo nmcli connection modify 'static-eth0' ipv4.dns "8.8.8.8 1.1.1.1"
sudo nmcli connection modify 'static-eth0' ipv4.ignore-auto-dns yes # Use only the specified DNS
sudo nmcli connection modify 'static-eth0' connection.autoconnect yes
# Bring the connection up
sudo nmcli connection up 'static-eth0'
* **Verify `pi-c01` Network State:**
ip addr show eth0
# Should show 'inet 10.0.0.2/24 ...'
ip route show default
# Should show 'default via 10.0.0.1 dev eth0 ...'
cat /etc/resolv.conf
# Should show 'nameserver 8.8.8.8' and 'nameserver 1.1.1.1'
- Configure `pi-c02` Ethernet (`eth0` - Private Network):
  - SSH into `pi-c02`.
  - Identify the Ethernet interface name (usually `eth0`): `ip a`
  - Add the static IP configuration, similar to `pi-c01`:
# Replace 'eth0' if needed
sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.3/24 gw4 10.0.0.1
# Set DNS servers
sudo nmcli connection modify 'static-eth0' ipv4.dns "8.8.8.8 1.1.1.1"
sudo nmcli connection modify 'static-eth0' ipv4.ignore-auto-dns yes
sudo nmcli connection modify 'static-eth0' connection.autoconnect yes
# Bring the connection up
sudo nmcli connection up 'static-eth0'
* **Verify `pi-c02` Network State:**
ip addr show eth0
# Should show 'inet 10.0.0.3/24 ...'
ip route show default
# Should show 'default via 10.0.0.1 dev eth0 ...'
cat /etc/resolv.conf
# Should show nameservers 8.8.8.8 and 1.1.1.1
- Enable IP Forwarding and Configure `nftables` NAT/Firewall on `pi-head`:
  - SSH back into `pi-head`.
  - Enable kernel IP forwarding:
echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/99-ip_forward.conf
sudo sysctl -p /etc/sysctl.d/99-ip_forward.conf # Apply immediately
sudo sysctl net.ipv4.ip_forward # Verify output is '= 1'
* **Install `nftables` (if not already present):**
sudo apt update
sudo apt install -y nftables
* **Create `nftables` Configuration:**
* Backup the default config: `sudo cp /etc/nftables.conf /etc/nftables.conf.bak`
* Edit the configuration file: `sudo vim /etc/nftables.conf`
* **Replace the entire content** with this ruleset (adjust `wlan0`/`eth0` if needed):
#!/usr/sbin/nft -f
# Flush the entire previous ruleset
flush ruleset
# Table for IPv4 NAT
table ip nat {
chain postrouting {
type nat hook postrouting priority 100; policy accept;
# Masquerade traffic from private network (eth0) going OUT via wlan0
oifname "wlan0" ip saddr 10.0.0.0/24 masquerade comment "NAT cluster traffic to WAN"
}
# Optional: Add prerouting rules here if needed for port forwarding INTO the cluster
}
# Table for IPv4/IPv6 Filtering
table inet filter {
chain input {
type filter hook input priority 0; policy accept;
# Basic stateful firewall for input traffic to pi-head
ct state established,related accept
# Allow loopback traffic
iifname "lo" accept
# Allow SSH (port 22) - Recommended! Add source IP ranges if possible
tcp dport 22 accept
# Allow ICMP (ping)
icmp type echo-request accept
# Allow traffic from cluster nodes (optional, if needed for services hosted on pi-head)
iifname "eth0" ip saddr 10.0.0.0/24 accept comment "Allow traffic from cluster nodes"
# Uncomment below to drop other traffic instead of accept-all policy
# drop
}
chain forward {
type filter hook forward priority 0; policy drop; # Default: Drop forwarded traffic
# Allow established/related connections coming back IN from WAN (wlan0) to LAN (eth0)
iifname "wlan0" oifname "eth0" ct state related,established accept comment "Allow established WAN to LAN"
# Allow NEW and established connections going OUT from LAN (eth0) to WAN (wlan0)
iifname "eth0" oifname "wlan0" ip saddr 10.0.0.0/24 accept comment "Allow LAN to WAN"
}
chain output {
type filter hook output priority 0; policy accept;
# Basic stateful firewall for output traffic from pi-head
ct state established,related accept
# Allow loopback traffic
oifname "lo" accept
# Uncomment below to drop other traffic instead of accept-all policy
# drop
}
}
* **Apply and Persist `nftables` Rules:**
sudo nft -f /etc/nftables.conf # Apply the ruleset, check for errors
sudo systemctl enable nftables.service # Make rules persistent on boot
sudo systemctl restart nftables.service # Restart service to load rules definitively
sudo systemctl status nftables.service # Check service status
sudo nft list ruleset # Review the active ruleset
- Troubleshoot SSH Slowness on `pi-head` (Potential Fix):
  - Slow SSH logins are often caused by the SSH server trying to perform a reverse DNS lookup on the connecting client's IP address, which can time out if not configured correctly.
  - On `pi-head`:
# Edit the SSH server configuration file
sudo vim /etc/ssh/sshd_config
* Find the line `#UseDNS yes` or `UseDNS yes`. Uncomment it if needed, and change `yes` to `no`:
UseDNS no
* Save the file and restart the SSH service:
sudo systemctl restart sshd
* Try SSHing into `pi-head` again from your workstation. If the login is now significantly faster, this was likely the cause.
- Test Basic Network Connectivity:
  - From `pi-head`:
ping -c 2 10.0.0.2 # Ping pi-c01
ping -c 2 10.0.0.3 # Ping pi-c02
* From `pi-c01`:
ping -c 2 10.0.0.1 # Ping pi-head
ping -c 2 10.0.0.3 # Ping pi-c02
* From `pi-c02`:
ping -c 2 10.0.0.1 # Ping pi-head
ping -c 2 10.0.0.2 # Ping pi-c01
* All these pings over the `10.0.0.x` network should work.
- Verify Compute Node Internet Access (Initial Test):
  - From `pi-c01`:
ping -c 3 8.8.8.8 # Test internet IP reachability
ping -c 3 google.com # Test DNS resolution + internet reachability
* From `pi-c02`:
ping -c 3 1.1.1.1 # Test internet IP reachability (different target)
ping -c 3 cloudflare.com # Test DNS resolution + internet reachability
* These tests should now succeed, routing through `pi-head`. If not, re-check `nftables` rules (`sudo nft list ruleset`), IP forwarding (`sudo sysctl net.ipv4.ip_forward`), and routing tables (`ip route`) on all nodes.
- Disable Wi-Fi on Compute Nodes (`pi-c01`, `pi-c02`):
  - This confirms they rely solely on `eth0` for all traffic.
  - On `pi-c01`:
sudo nmcli radio wifi off
nmcli radio wifi # Verify output shows 'disabled'
ip a show wlan0 # Verify interface is DOWN or has no IP
* **On `pi-c02`:**
sudo nmcli radio wifi off
nmcli radio wifi # Verify output shows 'disabled'
ip a show wlan0 # Verify interface is DOWN or has no IP
- Final Connectivity Test (Compute Nodes via `eth0` only):
  - Repeat the internet connectivity tests from Step 8 on both `pi-c01` and `pi-c02`:
# On pi-c01
ping -c 3 8.8.8.8
ping -c 3 google.com
# On pi-c02
ping -c 3 1.1.1.1
ping -c 3 cloudflare.com
* If these tests still succeed with Wi-Fi disabled, your network routing is configured correctly.
- Update Compute Nodes:
  - Now that `pi-c01` and `pi-c02` have verified internet access via `pi-head`, ensure they are fully updated:
# On pi-c01 AND pi-c02
sudo apt update
sudo apt full-upgrade -y
# Install common tools if you haven't already
sudo apt install -y vim git build-essential
Phase 2 Completion: At this point, your network should be fully configured according to the requirements. `pi-head` acts as the gateway, and `pi-c01`/`pi-c02` rely solely on their Ethernet connection to the private network for all communication, including internet access routed through `pi-head`. You can now proceed to Phase 3: Common Cluster Environment Setup.
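As a convenience, the Phase 2 ping checks can be wrapped in a small script (a sketch; `check-net.sh` is a name chosen here, and the addresses match this tutorial's layout). Run it on any node:

```shell
#!/bin/bash
# check-net.sh: confirm private-network and internet reachability
# from this node. Addresses match the tutorial's 10.0.0.x layout.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"
EXTERNAL="8.8.8.8"

for ip in $NODES $EXTERNAL; do
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
        echo "OK   $ip"
    else
        echo "FAIL $ip"
    fi
done
```

Any `FAIL` line against a `10.0.0.x` address points at the `nmcli` profiles; a `FAIL` only on the external address points at NAT/forwarding on `pi-head`.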
Phase 3: Common Cluster Environment Setup
(Perform steps on all nodes unless specified)
- Configure Hostname Resolution (`/etc/hosts`):
  - Edit the hosts file on all three nodes: `sudo vim /etc/hosts`
  - Ensure the following lines exist (add them if missing, below the `127.0.0.1 localhost` line):
127.0.1.1 <current_hostname> # This line is usually added by hostnamectl
# Cluster Nodes
10.0.0.1 pi-head
10.0.0.2 pi-c01
10.0.0.3 pi-c02
* **Test:** From any node, ping the others by hostname (e.g., `ping -c 1 pi-c01` from `pi-head`).
- Create Common Cluster User (`cuser`):
  - Crucially, `cuser` must have the same User ID (UID) and Group ID (GID) on all nodes.
  - On `pi-head` first:
sudo adduser cuser
# Follow prompts to set password etc.
# Note the UID and GID displayed (e.g., uid=1001(cuser) gid=1001(cuser) groups=...)
# Optional: Add cuser to the sudo group if needed for administration tasks
# sudo usermod -aG sudo cuser
* **On `pi-c01` and `pi-c02`:**
* Get the UID and GID from `pi-head`. Use `id cuser` on `pi-head`. Let's assume it was `1001` for both UID and GID. **Replace `1001` below if yours is different.**
# Create the group first with the specific GID
sudo groupadd -g 1001 cuser
# Create the user with the specific UID and GID
sudo useradd -u 1001 -g 1001 -m -s /bin/bash cuser
# Set the password for the new user
sudo passwd cuser
# Optional: Add to sudo group (use the same groups as on pi-head if needed)
# sudo usermod -aG sudo cuser
* **Verify:** Run `id cuser` on **all three** nodes. Ensure the UID and GID match exactly.
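The verification can be done in one pass from `pi-head` (a sketch; assumes you can SSH to each node as your initial admin user, entering a password per node if key-based login is not yet set up):

```shell
# Compare cuser's UID/GID across all nodes; the three lines must match.
for h in pi-head pi-c01 pi-c02; do
    printf '%-8s ' "$h"
    ssh -o ConnectTimeout=5 "$h" id cuser || echo "(unreachable)"
done
```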
- Setup Passwordless SSH for `cuser`:
  - Log in as `cuser` on `pi-head`. You can use `su - cuser` if logged in as another user, or SSH directly: `ssh cuser@pi-head`.
  - Generate SSH key pair (run as `cuser`):
# Accept default file location (~/.ssh/id_rsa), press Enter for empty passphrase
ssh-keygen -t rsa -b 4096
* **Copy the public key to all nodes (including `pi-head` itself):**
# Run as cuser from pi-head
ssh-copy-id cuser@pi-head
ssh-copy-id cuser@pi-c01
ssh-copy-id cuser@pi-c02
# Enter the password for 'cuser' when prompted for each node
* **Test:** Still as `cuser` on `pi-head`, try SSHing to each node without a password:
ssh pi-head date
ssh pi-c01 date
ssh pi-c02 date
# The first time connecting to each might ask "Are you sure you want to continue connecting (yes/no/[fingerprint])?". Type 'yes'.
# If it prompts for a password after the first connection, the key setup failed. Check permissions in ~/.ssh directories.
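If key-based login fails, the usual culprit is permissions: sshd refuses keys when `~/.ssh` or `authorized_keys` is group- or world-writable. A hedged fix, run as `cuser` on the affected node:

```shell
# Tighten the permissions sshd requires for public-key authentication.
mkdir -p ~/.ssh
chmod 700 ~/.ssh
[ -f ~/.ssh/authorized_keys ] && chmod 600 ~/.ssh/authorized_keys
chmod go-w ~   # home directory must not be group/world writable
stat -c '%a' ~/.ssh   # should print 700
```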
- Install and Configure NFS (Shared Filesystem):
  - We'll share `/clusterfs` from `pi-head` to be used by all nodes. Do this as the primary user, not `cuser`.
  - On `pi-head` (NFS Server):
sudo apt update
sudo apt install -y nfs-kernel-server
sudo mkdir -p /clusterfs
# Option 1: Allow anyone to write (simple for cluster user)
sudo chown nobody:nogroup /clusterfs
sudo chmod 777 /clusterfs
# Option 2: Restrict to cuser (better security, requires consistent UID/GID)
# sudo chown cuser:cuser /clusterfs
# sudo chmod 770 /clusterfs # Or 750 if group members only need read
# Edit the NFS exports file
sudo nano /etc/exports
# Add this line to allow access from the private 10.0.0.x network:
# Use 'no_root_squash' carefully if you need root access over NFS
/clusterfs 10.0.0.0/24(rw,sync,no_subtree_check)
# Activate the exports
sudo exportfs -ra
# Restart and enable the NFS server service
sudo systemctl restart nfs-kernel-server
sudo systemctl enable nfs-kernel-server
* **On `pi-c01` and `pi-c02` (NFS Clients):**
sudo apt update
sudo apt install -y nfs-common
sudo mkdir -p /clusterfs
# Add the mount to /etc/fstab for automatic mounting on boot
sudo nano /etc/fstab
# Add this line at the end:
pi-head:/clusterfs /clusterfs nfs defaults,auto,nofail 0 0
# Mount all filesystems defined in fstab (including the new one)
sudo mount -a
# Verify the mount was successful
df -h | grep /clusterfs
# Check mount options (optional)
mount | grep /clusterfs
* From `pi-head` as `cuser`: `touch /clusterfs/test_head.txt`
* From `pi-c01` as `cuser`: `ls /clusterfs` (should see `test_head.txt`)
* From `pi-c02` as `cuser`: `touch /clusterfs/test_c02.txt`
* From `pi-head` as `cuser`: `ls /clusterfs` (should see both files)
- Install and Configure NTP (Time Synchronization): Accurate time is essential for SLURM.
  - Install `chrony` on all nodes:
sudo apt update
sudo apt install -y chrony
* Ensure it's enabled and running on **all nodes**:
sudo systemctl enable --now chrony
* `chrony` will automatically use internet time sources. Since all nodes now have internet (directly or via `pi-head`), this should work.
* **Verify sync status** (might take a minute or two after starting):
# Run on all nodes
chronyc sources
# Look for lines starting with '^*' (synced server) or '^+' (acceptable server).
timedatectl status | grep "NTP service"
# Should show 'active'.
Phase 4: Install and Configure SLURM & Munge
- Install Munge (Authentication Service): Munge provides secure authentication between SLURM daemons. Install on all nodes.
sudo apt update
sudo apt install -y munge libmunge-dev libmunge2
- Create and Distribute Munge Key: A shared secret key must be identical on all nodes.
  - On `pi-head` ONLY:
# Stop munge service if running
sudo systemctl stop munge
# Create the key (as root or using sudo)
sudo dd if=/dev/urandom of=/etc/munge/munge.key bs=1 count=1024
# Set correct ownership and permissions
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
* **Securely copy the key from `pi-head` to the compute nodes:**
# Run on pi-head. Writing directly into /etc/munge on the remote side
# requires root there, so stage the key in /tmp first:
sudo scp /etc/munge/munge.key pi-c01:/tmp/munge.key
sudo scp /etc/munge/munge.key pi-c02:/tmp/munge.key
# Enter the SSH password for the user on pi-c01/pi-c02 when prompted.
* **On `pi-c01` and `pi-c02`:**
# Ensure munge is stopped
sudo systemctl stop munge
# Move the staged key into place, then set ownership and permissions
sudo mv /tmp/munge.key /etc/munge/munge.key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
- Start and Enable Munge Service: On all nodes:
sudo systemctl start munge
sudo systemctl enable munge
# Verify status
sudo systemctl status munge
- Test Munge Communication:
  - From `pi-head`:
# Test local encoding/decoding
munge -n | unmunge
# Test head -> c01
munge -n | ssh pi-c01 unmunge
# Test head -> c02
munge -n | ssh pi-c02 unmunge
# Test c01 -> head (round trip)
ssh pi-c01 munge -n | unmunge
* All tests should return a `STATUS: Success (...)` line. If not, check `munge.key` consistency, permissions, and `munged` service status. Also check `/var/log/munge/munged.log`.
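One common failure mode is a key that silently differs between nodes. A quick consistency check (a sketch; run from `pi-head`, assumes SSH access and sudo rights on each node):

```shell
# The three checksums must be identical; a mismatch means the key
# was not copied correctly.
for h in pi-head pi-c01 pi-c02; do
    echo "--- $h ---"
    ssh -o ConnectTimeout=5 "$h" sudo md5sum /etc/munge/munge.key \
        || echo "(could not read key on $h)"
done
```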
- Install SLURM: Install the SLURM workload manager packages on all nodes.
sudo apt update
sudo apt install -y slurm-wlm slurm-wlm-doc # slurm-wlm pulls in slurmd, slurmctld etc.
- Configure SLURM (`slurm.conf`):
  - Create the configuration file on `pi-head` first. A minimal configuration is below.
  - Generate a node list helper (optional but good practice):
# On pi-head
scontrol show config | grep ClusterName # Find default if any
* Edit the main config file: `sudo vim /etc/slurm/slurm.conf`
* Replace the **entire content** with the following.
* **Adjust `RealMemory`**: `free -m` shows total memory in MiB. Leave some (~200-300MB) for the OS. For an 8GB Pi (approx 7850MB usable), `7600` is a safe starting point.
* **CPUs**: RPi 5 has 4 cores.
# /etc/slurm/slurm.conf
# Basic SLURM configuration for pi-cluster
ClusterName=pi-cluster
SlurmctldHost=pi-head #(Or use IP 10.0.0.1)
# SlurmctldHost=pi-head(10.0.0.1) # Optional: Specify both
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
TaskPlugin=task/cgroup
# LOGGING
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
JobCompType=jobcomp/none # No job completion logging for basic setup
# TIMERS
SlurmctldTimeout=120
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres # Use cons_tres for memory tracking
SelectTypeParameters=CR_Core_Memory # Track Cores and Memory
# NODES - Adjust RealMemory based on your Pi 5 8GB (~7600 is conservative)
NodeName=pi-head NodeAddr=10.0.0.1 CPUs=4 RealMemory=7600 State=UNKNOWN
NodeName=pi-c01 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7600 State=UNKNOWN
NodeName=pi-c02 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7600 State=UNKNOWN
# PARTITION
PartitionName=rpi_part Nodes=pi-head,pi-c01,pi-c02 Default=YES MaxTime=INFINITE State=UP Oversubscribe=NO
* Create the SLURM log and spool directories **on all nodes**:
sudo mkdir -p /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
# SLURM typically runs daemons as 'slurm' user/group created during package install
sudo chown slurm:slurm /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
sudo chmod 755 /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
# Verify user exists
id slurm
* Copy `slurm.conf` to the compute nodes (stage via /tmp, since writing directly into /etc/slurm over SSH requires root on the remote side):
# On pi-head
scp /etc/slurm/slurm.conf pi-c01:/tmp/slurm.conf
scp /etc/slurm/slurm.conf pi-c02:/tmp/slurm.conf
# Then on pi-c01 and pi-c02:
sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
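A loop can stage and install the file on both compute nodes in one go (a sketch; assumes your SSH user on each node can run sudo):

```shell
# Push slurm.conf to each compute node via /tmp, then move it into
# place with sudo (writing /etc/slurm directly over SSH needs root).
for h in pi-c01 pi-c02; do
    echo "installing slurm.conf on $h"
    scp /etc/slurm/slurm.conf "$h":/tmp/slurm.conf \
        && ssh -t "$h" sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
done
```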
- Configure Cgroup Plugin (`cgroup.conf`): Needed for resource constraint (`ProctrackType=proctrack/cgroup`, `TaskPlugin=task/cgroup`, `SelectType=select/cons_tres`).
  - Create `/etc/slurm/cgroup.conf` on `pi-head` first: `sudo vim /etc/slurm/cgroup.conf`
  - Add the following content:
# /etc/slurm/cgroup.conf
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
# If using systemd, TaskAffinity should generally be no
TaskAffinity=no
* Create the `CgroupReleaseAgentDir` **on all nodes**:
sudo mkdir -p /etc/slurm/cgroup
sudo chown slurm:slurm /etc/slurm/cgroup # Or root:root might be needed depending on systemd interactions
* Copy `cgroup.conf` to the compute nodes (staged via /tmp, since writing into /etc/slurm over SSH requires root on the remote side):
# On pi-head
scp /etc/slurm/cgroup.conf pi-c01:/tmp/cgroup.conf
scp /etc/slurm/cgroup.conf pi-c02:/tmp/cgroup.conf
# Then on pi-c01 and pi-c02:
sudo mv /tmp/cgroup.conf /etc/slurm/cgroup.conf
- Start SLURM Services:
  - On `pi-head` (Controller):
sudo systemctl enable slurmctld.service
sudo systemctl start slurmctld.service
# Check status immediately
sudo systemctl status slurmctld.service
journalctl -u slurmctld.service | tail -n 20 # Check logs
* **On ALL nodes (Compute Daemons - including `pi-head`):**
sudo systemctl enable slurmd.service
sudo systemctl start slurmd.service
# Check status
sudo systemctl status slurmd.service
# Check logs on each node
tail -n 20 /var/log/slurm/slurmd.log
- Verify SLURM Cluster Status:
  - Wait ~10-15 seconds for nodes to register. Run on `pi-head`:
sinfo
# Expected output:
# PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
# rpi_part* up infinite 3 idle pi-head,pi-c0[1-2]
# (State might be 'unk' or 'down' initially, or 'mix' if nodes are registering)
scontrol show node
# Check details for each node. Look for 'State=IDLE'. If 'State=DOWN' or 'State=DRAINED', check logs:
# - /var/log/slurm/slurmctld.log on pi-head
# - /var/log/slurm/slurmd.log on the affected node(s)
# If nodes are down/drained due to initial errors that are now fixed:
# sudo scontrol update nodename=pi-head,pi-c01,pi-c02 state=resume
* Common causes of nodes showing DOWN/DRAINED:
* Time synchronization errors between nodes. (Fix with `chrony`)
* Munge authentication errors. (Check `munge.key` and `munged` service)
* Firewall blocking ports `6817` (slurmctld) or `6818` (slurmd). (The `nftables` rules on `pi-head` shouldn't block the 10.0.0.x network, but check if `ufw` or other firewalls are active.)
* Incorrect hostnames or IP addresses in `slurm.conf` (`SlurmctldHost`, `NodeName`, `NodeAddr`). Use the `10.0.0.x` addresses for `NodeAddr`.
* Incorrect permissions or non-existent spool/log directories (`/var/spool/slurm*`, `/var/log/slurm`).
* `slurmd` fails to start due to resource limits or cgroup issues. Check `journalctl -u slurmd` and `dmesg`.
Phase 5: Testing the SLURM Cluster
(Run these commands as `cuser` on `pi-head`)
- Login as `cuser`:
su - cuser
# Or: ssh cuser@pi-head
cd /clusterfs # Work in the shared filesystem if desired
- Run a Simple Command Interactively:
srun hostname
# Runs 'hostname' on one available node in the default partition.
- Run Command on Specific Number of Nodes:
# Run hostname on 2 different nodes, 1 task per node
srun --nodes=2 --ntasks-per-node=1 hostname | sort
# Should show two different hostnames (e.g., pi-c01, pi-c02 or pi-head, pi-c01)
- Submit a Simple Batch Job:
  - Create a job script file, e.g., `/clusterfs/cuser/hello.sh` (ensure `/clusterfs/cuser` exists and is writable by `cuser`):
#!/bin/bash
#SBATCH --job-name=hello # Job name
#SBATCH --output=hello_job_%j.out # Standard output file (%j = job ID)
#SBATCH --error=hello_job_%j.err # Standard error file
#SBATCH --nodes=3 # Request all 3 nodes
#SBATCH --ntasks-per-node=2 # Request 2 tasks (processes) per node (total 6)
#SBATCH --cpus-per-task=1 # Request 1 CPU core per task
#SBATCH --partition=rpi_part # Specify partition (optional if default)
#SBATCH --time=00:05:00 # Time limit (5 minutes)
echo "Job running on nodes:"
srun hostname | sort # Use srun within sbatch to launch parallel tasks
echo "Tasks started at: $(date)"
sleep 20 # Simulate some work
echo "Tasks finished at: $(date)"
* Make the script executable: `chmod +x /clusterfs/cuser/hello.sh`
* Submit the job from the directory containing the script:
sbatch hello.sh
# Should print: Submitted batch job <JOB_ID>
* Check the queue:
squeue
# Shows running or pending jobs
watch squeue # Monitor queue updates
sinfo
# Should show nodes in 'alloc' or 'mix' state.
* Once the job finishes (disappears from `squeue`), check the output files (`hello_job_<JOB_ID>.out` and `.err`) in the submission directory:
cat hello_job_<JOB_ID>.out
# Should list each node's hostname twice: 3 nodes x 2 tasks per node = 6 tasks,
# so pi-head, pi-c01, and pi-c02 each appear twice (sorted).
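To exercise the scheduler a little further, a job-array variant makes a natural follow-up test (a sketch; the script name and output pattern are choices made here, not something the tutorial prescribes). Save as `array_hello.sh` and submit with `sbatch array_hello.sh`:

```shell
#!/bin/bash
#SBATCH --job-name=array_hello
#SBATCH --output=array_%A_%a.out   # %A = array job ID, %a = array index
#SBATCH --array=0-5                # six independent array tasks
#SBATCH --ntasks=1
#SBATCH --time=00:02:00
# Each array task runs as its own small job; SLURM sets
# SLURM_ARRAY_TASK_ID in the task's environment.
echo "Array task ${SLURM_ARRAY_TASK_ID:-unset} running on $(hostname)"
```

`squeue` will show one entry per pending/running array index, and six `array_*.out` files should appear when all tasks finish.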
Congratulations!
You should now have a functional 3-node Raspberry Pi 5 SLURM cluster. The compute nodes (`pi-c01`, `pi-c02`) use the head node (`pi-head`) as a gateway for internet access, while all cluster communication happens over the private `10.0.0.x` network.
Next Steps & Considerations
- Install MPI: Install OpenMPI or MPICH (`sudo apt install -y openmpi-bin libopenmpi-dev` on all nodes) to run parallel MPI applications. Update SLURM's `MpiDefault=pmix` or configure MPI properly if needed.
- Shared Software Stack: Install compilers, libraries, and applications needed for your HPC tasks onto the shared NFS filesystem (`/clusterfs`) so they are accessible from all nodes without needing installation everywhere. Module systems like Lmod can help manage this.
- Monitoring: Set up monitoring tools like `htop`, `glances`, or more comprehensive systems like Prometheus + Grafana or Ganglia to observe cluster load and resource usage.
- SLURM Tuning: Explore more advanced `slurm.conf` options: resource limits (memory, cores per job/user), Quality of Service (QoS), fair-share scheduling, job arrays.
- SLURM Accounting: For tracking resource usage over time, set up the SLURM accounting database (`slurmdbd`), which requires installing and configuring a database (like MariaDB/MySQL).
- Security: Review `nftables` rules, harden SSH (`/etc/ssh/sshd_config`), and consider user permissions carefully.
- Backup: Back up your `slurm.conf`, `munge.key`, and important data on `/clusterfs`.
Enjoy your mini HPC cluster!