Tutorial: Setting Up a 3-Node Raspberry Pi 5 SLURM Cluster (Rev3)
Created: 2025-04-13 18:59:37 | Last updated: 2025-04-13 18:59:37 | Status: Public
This tutorial guides you through setting up a small High-Performance Computing (HPC) cluster using three Raspberry Pi 5 devices, SLURM Workload Manager, and a specific network configuration involving both Wi-Fi and a private Ethernet network.
Cluster Configuration:
- Nodes: 3 x Raspberry Pi 5 (8GB RAM recommended)
- OS: Raspberry Pi OS Bookworm (64-bit recommended)
- Boot: From SSDs
- Cluster User: `cuser`
- Networking:
  - `pi-head`:
    - WLAN (`wlan0`): Connects to your main router via Wi-Fi, gets `192.168.1.20` via DHCP reservation (Gateway: `192.168.1.1`). Provides internet access.
    - Ethernet (`eth0`): Connects to the private switch, static IP `10.0.0.1/24`.
  - `pi-c01`:
    - Ethernet (`eth0`): Connects to the private switch, static IP `10.0.0.2/24`. Gateway via `pi-head` (`10.0.0.1`).
    - WLAN (`wlan0`): Disabled.
  - `pi-c02`:
    - Ethernet (`eth0`): Connects to the private switch, static IP `10.0.0.3/24`. Gateway via `pi-head` (`10.0.0.1`).
    - WLAN (`wlan0`): Disabled.
- SLURM: Basic setup (`slurmctld`, `slurmd`, `munge`).
Prerequisites
- Hardware:
- 3 x Raspberry Pi 5 (8GB RAM)
- 3 x NVMe SSDs (or SATA SSDs with appropriate adapters) compatible with RPi 5 boot.
- 3 x Reliable Power Supplies for RPi 5 (5V/5A recommended).
- 1 x Gigabit Ethernet Switch (unmanaged is fine).
- 3 x Ethernet Cables.
- Access to your existing Wi-Fi network and router admin interface (for DHCP reservation).
- Software:
- Raspberry Pi Imager tool.
- Raspberry Pi OS Bookworm (64-bit recommended) flashed onto each SSD.
- Initial Setup:
- Ensure each Pi boots correctly from its SSD.
- Complete the initial Raspberry Pi OS setup wizard (create the initial user - this is NOT `cuser` yet; set locale, keyboard, etc.).
- Enable SSH on each Pi: `sudo raspi-config` -> Interface Options -> SSH -> Enable.
- Connect `pi-head` to your Wi-Fi network.
- Configure the DHCP reservation on your OpenWRT router to assign `192.168.1.20` to `pi-head`'s WLAN MAC address (see the note after this list for finding the MAC). Verify `pi-head` gets this IP (`ip a show wlan0`).
- Physically connect all three Pis to the Gigabit switch using Ethernet cables.
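To find the WLAN MAC address needed for the DHCP reservation, run the following on `pi-head`; the address is the value after `link/ether`:
ip link show wlan0   # the MAC address appears in the 'link/ether' field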
Phase 1: Basic OS Configuration & Hostnames
(Perform these steps on each Pi, adjusting hostnames accordingly. You’ll need SSH access.)
- Login: SSH into each Pi using the initial user you created during setup.
- Set Hostnames:
- On the first Pi (intended as head node):
sudo hostnamectl set-hostname pi-head
* On the second Pi (compute node 1):
sudo hostnamectl set-hostname pi-c01
* On the third Pi (compute node 2):
sudo hostnamectl set-hostname pi-c02
* Reboot each Pi (`sudo reboot`) or log out and log back in for the change to take effect in your shell prompt and network identity.
- Update System (`pi-head` only for now):
- Ensure `pi-head` has internet via Wi-Fi.
- SSH into `pi-head` and run:
sudo apt update
sudo apt full-upgrade -y
sudo apt install -y vim git build-essential # Essential tools
* *Note: We will update `pi-c01` and `pi-c02` after setting up network routing.*
Phase 2: Network Configuration (Revised)
Goal: Configure network interfaces on all three Raspberry Pis. `pi-head` will connect to your home network/internet via Wi-Fi (`wlan0`) and to the private cluster network via Ethernet (`eth0`). `pi-c01` and `pi-c02` will connect only to the private cluster network via Ethernet (`eth0`) and use `pi-head` as their gateway to reach the internet. We will use `nmcli` for interface configuration and `nftables` for firewall/NAT on `pi-head`. (A quick way to inspect the current NetworkManager state is shown after the recap below.)
Recap of Target Configuration:
- `pi-head`:
  - `wlan0`: `192.168.1.20` (via DHCP reservation), Gateway `192.168.1.1` (Internet Access)
  - `eth0`: `10.0.0.1/24` (Static, Private Network)
- `pi-c01`:
  - `eth0`: `10.0.0.2/24` (Static, Private Network), Gateway `10.0.0.1`
  - `wlan0`: Disabled
- `pi-c02`:
  - `eth0`: `10.0.0.3/24` (Static, Private Network), Gateway `10.0.0.1`
  - `wlan0`: Disabled
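Before changing anything, it can help to list what NetworkManager currently manages on each node (purely informational; interface names occasionally differ from `eth0`/`wlan0`):
nmcli device status     # interfaces and their current state
nmcli connection show   # existing connection profiles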
Steps:
- Verify `pi-head` WLAN Connection:
- SSH into `pi-head`.
- Confirm it received the correct IP address from your router's DHCP reservation and has a default route via your main gateway:
ip addr show wlan0
# Look for 'inet 192.168.1.20/XX ...' (XX is your subnet mask, often 24)
ip route show default
# Should show 'default via 192.168.1.1 dev wlan0 ...'
* If the IP or route is incorrect, double-check your router's DHCP reservation settings and ensure `pi-head`'s Wi-Fi is connected to the correct network.
- Configure `pi-head` Ethernet (`eth0` - Private Network):
- Still on `pi-head`.
- Identify the Ethernet interface name (usually `eth0`): `ip a`
- Add a NetworkManager connection profile for `eth0` with the static private IP. We explicitly set no gateway and mark it as never the default route:
# Replace 'eth0' if your interface name is different
sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.1/24
# Critical: Prevent this interface from ever becoming the default route
sudo nmcli connection modify 'static-eth0' ipv4.gateway '' # Ensure no gateway is set
sudo nmcli connection modify 'static-eth0' ipv4.never-default yes # Prevent it from being the default route
sudo nmcli connection modify 'static-eth0' connection.autoconnect yes # Connect automatically
# Bring the connection up (may happen automatically)
sudo nmcli connection up 'static-eth0'
* **Verify `pi-head` Network State:**
ip addr show eth0
# Should show 'inet 10.0.0.1/24 ...'
ip route show default
# Should STILL show 'default via 192.168.1.1 dev wlan0 ...'
- Configure `pi-c01` Ethernet (`eth0` - Private Network):
- SSH into `pi-c01`. (Use a temporary keyboard/monitor, or connect `eth0` temporarily to the main network if needed for first access.)
- Identify the Ethernet interface name (usually `eth0`): `ip a`
- Add the static IP configuration, setting `pi-head` (`10.0.0.1`) as the gateway and providing DNS servers:
# Replace 'eth0' if needed
sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.2/24 gw4 10.0.0.1
# Set DNS servers (e.g., Google DNS and Cloudflare DNS)
# These requests will be routed via pi-head
sudo nmcli connection modify 'static-eth0' ipv4.dns "8.8.8.8 1.1.1.1"
sudo nmcli connection modify 'static-eth0' ipv4.ignore-auto-dns yes # Use only the specified DNS
sudo nmcli connection modify 'static-eth0' connection.autoconnect yes
# Bring the connection up
sudo nmcli connection up 'static-eth0'
* **Verify `pi-c01` Network State:**
ip addr show eth0
# Should show 'inet 10.0.0.2/24 ...'
ip route show default
# Should show 'default via 10.0.0.1 dev eth0 ...'
cat /etc/resolv.conf
# Should show 'nameserver 8.8.8.8' and 'nameserver 1.1.1.1'
- Configure `pi-c02` Ethernet (`eth0` - Private Network):
- SSH into `pi-c02`.
- Identify the Ethernet interface name (usually `eth0`): `ip a`
- Add the static IP configuration, similar to `pi-c01`:
# Replace 'eth0' if needed
sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.3/24 gw4 10.0.0.1
# Set DNS servers
sudo nmcli connection modify 'static-eth0' ipv4.dns "8.8.8.8 1.1.1.1"
sudo nmcli connection modify 'static-eth0' ipv4.ignore-auto-dns yes
sudo nmcli connection modify 'static-eth0' connection.autoconnect yes
# Bring the connection up
sudo nmcli connection up 'static-eth0'
* **Verify `pi-c02` Network State:**
ip addr show eth0
# Should show 'inet 10.0.0.3/24 ...'
ip route show default
# Should show 'default via 10.0.0.1 dev eth0 ...'
cat /etc/resolv.conf
# Should show nameservers 8.8.8.8 and 1.1.1.1
- Enable IP Forwarding and Configure `nftables` NAT/Firewall on `pi-head`:
- SSH back into `pi-head`.
- Enable kernel IP forwarding:
echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/99-ip_forward.conf
sudo sysctl -p /etc/sysctl.d/99-ip_forward.conf # Apply immediately
sudo sysctl net.ipv4.ip_forward # Verify output is '= 1'
* **Install `nftables` (if not already present):**
sudo apt update
sudo apt install -y nftables
* **Create `nftables` Configuration:**
* Backup the default config: `sudo cp /etc/nftables.conf /etc/nftables.conf.bak`
* Edit the configuration file: `sudo vim /etc/nftables.conf`
* **Replace the entire content** with this ruleset (adjust `wlan0`/`eth0` if needed):
#!/usr/sbin/nft -f
# Flush the entire previous ruleset
flush ruleset
# Table for IPv4 NAT
table ip nat {
chain postrouting {
type nat hook postrouting priority 100; policy accept;
# Masquerade traffic from private network (eth0) going OUT via wlan0
oifname "wlan0" ip saddr 10.0.0.0/24 masquerade comment "NAT cluster traffic to WAN"
}
# Optional: Add prerouting rules here if needed for port forwarding INTO the cluster
}
# Table for IPv4/IPv6 Filtering
table inet filter {
chain input {
type filter hook input priority 0; policy accept;
# Basic stateful firewall for input traffic to pi-head
ct state established,related accept
# Allow loopback traffic
iifname "lo" accept
# Allow SSH (port 22) - Recommended! Add source IP ranges if possible
tcp dport 22 accept
# Allow ICMP (ping)
icmp type echo-request accept
# Allow traffic from cluster nodes (optional, if needed for services hosted on pi-head)
iifname "eth0" ip saddr 10.0.0.0/24 accept comment "Allow traffic from cluster nodes"
# Uncomment below to drop other traffic instead of accept-all policy
# drop
}
chain forward {
type filter hook forward priority 0; policy drop; # Default: Drop forwarded traffic
# Allow established/related connections coming back IN from WAN (wlan0) to LAN (eth0)
iifname "wlan0" oifname "eth0" ct state related,established accept comment "Allow established WAN to LAN"
# Allow NEW and established connections going OUT from LAN (eth0) to WAN (wlan0)
iifname "eth0" oifname "wlan0" ip saddr 10.0.0.0/24 accept comment "Allow LAN to WAN"
}
chain output {
type filter hook output priority 0; policy accept;
# Basic stateful firewall for output traffic from pi-head
ct state established,related accept
# Allow loopback traffic
oifname "lo" accept
# Uncomment below to drop other traffic instead of accept-all policy
# drop
}
}
* **Apply and Persist `nftables` Rules:**
sudo nft -f /etc/nftables.conf # Apply the ruleset, check for errors
sudo systemctl enable nftables.service # Make rules persistent on boot
sudo systemctl restart nftables.service # Restart service to load rules definitively
sudo systemctl status nftables.service # Check service status
sudo nft list ruleset # Review the active ruleset
- Troubleshoot SSH Slowness on `pi-head` (Potential Fix):
- Slow SSH logins are often caused by the SSH server trying to perform a reverse DNS lookup on the connecting client's IP address, which can time out if DNS is not configured correctly.
- On `pi-head`:
# Edit the SSH server configuration file
sudo vim /etc/ssh/sshd_config
* Find the line `#UseDNS yes` or `UseDNS yes`. Uncomment it if needed, and change `yes` to `no`:
UseDNS no
* Save the file and restart the SSH service:
sudo systemctl restart ssh   # the OpenSSH service unit is named 'ssh' on Raspberry Pi OS / Debian
* Try SSHing into `pi-head` again from your workstation. If the login is now significantly faster, this was likely the cause.
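* If you prefer a non-interactive edit, a `sed` one-liner like the following should work (a sketch; it only rewrites an existing `UseDNS` line, so add the line manually if `grep` finds nothing afterwards):
# Flip an existing (possibly commented-out) UseDNS line to 'UseDNS no'
sudo sed -i 's/^#\?UseDNS.*/UseDNS no/' /etc/ssh/sshd_config
grep -i '^UseDNS' /etc/ssh/sshd_config   # confirm the result
sudo systemctl restart ssh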
- Test Basic Network Connectivity:
- From `pi-head`:
ping -c 2 10.0.0.2 # Ping pi-c01
ping -c 2 10.0.0.3 # Ping pi-c02
* From `pi-c01`:
ping -c 2 10.0.0.1 # Ping pi-head
ping -c 2 10.0.0.3 # Ping pi-c02
* From `pi-c02`:
ping -c 2 10.0.0.1 # Ping pi-head
ping -c 2 10.0.0.2 # Ping pi-c01
* All these pings over the `10.0.0.x` network should work.
- Verify Compute Node Internet Access (Initial Test):
- From `pi-c01`:
ping -c 3 8.8.8.8 # Test internet IP reachability
ping -c 3 google.com # Test DNS resolution + internet reachability
* From `pi-c02`:
ping -c 3 1.1.1.1 # Test internet IP reachability (different target)
ping -c 3 cloudflare.com # Test DNS resolution + internet reachability
* These tests should now succeed, routing through `pi-head`. If not, re-check `nftables` rules (`sudo nft list ruleset`), IP forwarding (`sudo sysctl net.ipv4.ip_forward`), and routing tables (`ip route`) on all nodes.
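* If the tests fail, one quick way to confirm whether NAT is actually happening is to watch the outgoing interface on `pi-head` while a compute node pings an external address (an optional diagnostic; it assumes you install `tcpdump`):
# On pi-head:
sudo apt install -y tcpdump
sudo tcpdump -ni wlan0 icmp
# In another terminal, on pi-c01:
ping -c 3 8.8.8.8
# The echo requests should appear on wlan0 with their source rewritten to 192.168.1.20 (masqueraded).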
- Disable Wi-Fi on Compute Nodes (`pi-c01`, `pi-c02`):
- This confirms they rely solely on `eth0` for all traffic.
- On `pi-c01`:
sudo nmcli radio wifi off
nmcli radio wifi # Verify output shows 'disabled'
ip a show wlan0 # Verify interface is DOWN or has no IP
* **On `pi-c02`:**
sudo nmcli radio wifi off
nmcli radio wifi # Verify output shows 'disabled'
ip a show wlan0 # Verify interface is DOWN or has no IP
- Final Connectivity Test (Compute Nodes via `eth0` only):
- Repeat the internet connectivity tests from Step 8 on both `pi-c01` and `pi-c02`:
# On pi-c01
ping -c 3 8.8.8.8
ping -c 3 google.com
# On pi-c02
ping -c 3 1.1.1.1
ping -c 3 cloudflare.com
* If these tests still succeed with Wi-Fi disabled, your network routing is configured correctly.
- Update Compute Nodes:
- Now that `pi-c01` and `pi-c02` have verified internet access via `pi-head`, ensure they are fully updated:
# On pi-c01 AND pi-c02
sudo apt update
sudo apt full-upgrade -y
# Install common tools if you haven't already
sudo apt install -y vim git build-essential
Phase 2 Completion: At this point, your network should be fully configured according to the requirements. `pi-head` acts as the gateway, and `pi-c01`/`pi-c02` rely solely on their Ethernet connection to the private network for all communication, including internet access routed through `pi-head`. You can now proceed to Phase 3: Common Cluster Environment Setup.
Phase 3: Common Cluster Environment Setup
(Perform steps on all nodes unless specified)
- Configure Hostname Resolution (`/etc/hosts`):
- Edit the hosts file on all three nodes: `sudo vim /etc/hosts`
- Ensure the following lines exist (add them if missing, below the `127.0.0.1 localhost` line):
127.0.1.1 <current_hostname> # This line is usually added by hostnamectl
# Cluster Nodes
10.0.0.1 pi-head
10.0.0.2 pi-c01
10.0.0.3 pi-c02
* **Test:** From any node, ping the others by hostname (e.g., `ping -c 1 pi-c01` from `pi-head`).
- Create Common Cluster User (`cuser`):
- Crucially, `cuser` must have the same User ID (UID) and Group ID (GID) on all nodes.
- On `pi-head` first:
sudo adduser cuser
# Follow prompts to set password etc.
# Note the UID and GID displayed (e.g., uid=1001(cuser) gid=1001(cuser) groups=...)
# Optional: Add cuser to the sudo group if needed for administration tasks
# sudo usermod -aG sudo cuser
* **On `pi-c01` and `pi-c02`:**
* Get the UID and GID from `pi-head`. Use `id cuser` on `pi-head`. Let's assume it was `1001` for both UID and GID. **Replace `1001` below if yours is different.**
# Create the group first with the specific GID
sudo groupadd -g 1001 cuser
# Create the user with the specific UID and GID
sudo useradd -u 1001 -g 1001 -m -s /bin/bash cuser
# Set the password for the new user
sudo passwd cuser
# Optional: Add to sudo group (use the same groups as on pi-head if needed)
# sudo usermod -aG sudo cuser
* **Verify:** Run `id cuser` on **all three** nodes. Ensure the UID and GID match exactly.
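* A quick way to compare the output across the cluster in one go (a small sketch run from `pi-head` as your administrative user; it assumes that user can SSH to each node and may prompt for passwords):
# All three lines should report identical uid/gid values for cuser
for h in pi-head pi-c01 pi-c02; do
  echo -n "$h: "; ssh "$h" id cuser
done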
- Setup Passwordless SSH for `cuser`:
- Log in as `cuser` on `pi-head`. You can use `su - cuser` if logged in as another user, or SSH directly: `ssh cuser@pi-head`.
- Generate an SSH key pair (run as `cuser`):
# Accept default file location (~/.ssh/id_rsa), press Enter for empty passphrase
ssh-keygen -t rsa -b 4096
* **Copy the public key to all nodes (including `pi-head` itself):**
# Run as cuser from pi-head
ssh-copy-id cuser@pi-head
ssh-copy-id cuser@pi-c01
ssh-copy-id cuser@pi-c02
# Enter the password for 'cuser' when prompted for each node
* **Test:** Still as `cuser` on `pi-head`, try SSHing to each node without a password:
ssh pi-head date
ssh pi-c01 date
ssh pi-c02 date
# The first time connecting to each might ask "Are you sure you want to continue connecting (yes/no/[fingerprint])?". Type 'yes'.
# If it prompts for a password after the first connection, the key setup failed. Check permissions in ~/.ssh directories.
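* If key-based login still prompts for a password, the usual cause is overly permissive modes on `cuser`'s home directory or `~/.ssh`; `sshd` silently ignores the key in that case. A typical fix (run as `cuser` on the node that rejects the key):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chmod go-w ~                              # home directory must not be group/world-writable
ls -ld ~ ~/.ssh ~/.ssh/authorized_keys    # verify the modes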
- Install and Configure NFS (Shared Filesystem):
- We'll share `/clusterfs` from `pi-head` with all nodes. Do this as the primary administrative user, not `cuser`.
- On `pi-head` (NFS Server):
sudo apt update
sudo apt install -y nfs-kernel-server
sudo mkdir -p /clusterfs
# Option 1: Allow anyone to write (simple for cluster user)
sudo chown nobody:nogroup /clusterfs
sudo chmod 777 /clusterfs
# Option 2: Restrict to cuser (better security, requires consistent UID/GID)
# sudo chown cuser:cuser /clusterfs
# sudo chmod 770 /clusterfs # Or 750 if group members only need read
# Edit the NFS exports file
sudo nano /etc/exports
# Add this line to allow access from the private 10.0.0.x network:
# Use 'no_root_squash' carefully if you need root access over NFS
/clusterfs 10.0.0.0/24(rw,sync,no_subtree_check)
# Activate the exports
sudo exportfs -ra
# Restart and enable the NFS server service
sudo systemctl restart nfs-kernel-server
sudo systemctl enable nfs-kernel-server
* **On `pi-c01` and `pi-c02` (NFS Clients):**
sudo apt update
sudo apt install -y nfs-common
sudo mkdir -p /clusterfs
# Add the mount to /etc/fstab for automatic mounting on boot
sudo nano /etc/fstab
# Add this line at the end:
pi-head:/clusterfs /clusterfs nfs defaults,auto,nofail 0 0
# Mount all filesystems defined in fstab (including the new one)
sudo mount -a
# Verify the mount was successful
df -h | grep /clusterfs
# Check mount options (optional)
mount | grep /clusterfs
* From `pi-head` as `cuser`: `touch /clusterfs/test_head.txt`
* From `pi-c01` as `cuser`: `ls /clusterfs` (should see `test_head.txt`)
* From `pi-c02` as `cuser`: `touch /clusterfs/test_c02.txt`
* From `pi-head` as `cuser`: `ls /clusterfs` (should see both files)
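* If the mount fails, you can confirm the export is actually visible from a client (`showmount` ships with `nfs-common`):
# On pi-c01 or pi-c02: list the exports offered by pi-head
showmount -e pi-head
# Expected output includes: /clusterfs 10.0.0.0/24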
- Install and Configure NTP (Time Synchronization): Accurate time is essential for SLURM.
- Install `chrony` on all nodes:
sudo apt update
sudo apt install -y chrony
* Ensure it's enabled and running on **all nodes**:
sudo systemctl enable --now chrony
* `chrony` will automatically use internet time sources. Since all nodes now have internet (directly or via `pi-head`), this should work.
* **Verify sync status** (might take a minute or two after starting):
# Run on all nodes
chronyc sources
# Look for lines starting with '^*' (synced server) or '^+' (acceptable server).
timedatectl status | grep "NTP service"
# Should show 'active'.
Phase 4: Install and Configure SLURM & Munge (Revised for RPi)
(Perform steps on nodes as indicated. Ensure you are logged in as your administrative user - `piadmin` in these examples - who has `sudo` privileges.)
- Install Munge (Authentication Service): Munge provides secure authentication between SLURM daemons. Install on all nodes.
# Run on pi-head, pi-c01, and pi-c02
sudo apt update
sudo apt install -y munge libmunge-dev libmunge2
- Create Munge Key: A shared secret key must be generated on one node.
- On `pi-head` ONLY:
# STILL ON PI-HEAD, logged in as piadmin
sudo systemctl stop munge # Stop service if running
# Create the key
sudo dd if=/dev/urandom of=/etc/munge/munge.key bs=1 count=1024
# Set correct ownership and permissions
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
echo "Munge key created on pi-head."
- Securely Distribute Munge Key: Copy the key from `pi-head` to the compute nodes via a temporary location, then move it into place with `sudo` remotely.
- Run these command blocks from `pi-head`, logged in as `piadmin`:
- For `pi-c01`:
# ON PI-HEAD, as 'piadmin'
echo "Copying munge.key to pi-c01:/tmp/..."
sudo scp /etc/munge/munge.key piadmin@pi-c01:/tmp/munge.key.tmp
# Enter piadmin's password for pi-c01 if prompted by scp
echo "Connecting to pi-c01 to move munge.key and set permissions..."
ssh -t piadmin@pi-c01 << EOF
sudo systemctl stop munge # Ensure service is stopped before replacing key
sudo mv /tmp/munge.key.tmp /etc/munge/munge.key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
echo "--- Verification on pi-c01 ---"
sudo ls -l /etc/munge/munge.key # Needs sudo to view details
echo "--- Done on pi-c01 ---"
EOF
# You will likely be prompted for piadmin's password for pi-c01 here by sudo
* **For `pi-c02`:**
# ON PI-HEAD, as 'piadmin'
echo "Copying munge.key to pi-c02:/tmp/..."
sudo scp /etc/munge/munge.key piadmin@pi-c02:/tmp/munge.key.tmp
# Enter piadmin's password for pi-c02 if prompted by scp
echo "Connecting to pi-c02 to move munge.key and set permissions..."
ssh -t piadmin@pi-c02 << EOF
sudo systemctl stop munge # Ensure service is stopped before replacing key
sudo mv /tmp/munge.key.tmp /etc/munge/munge.key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
echo "--- Verification on pi-c02 ---"
sudo ls -l /etc/munge/munge.key # Needs sudo to view details
echo "--- Done on pi-c02 ---"
EOF
# You will likely be prompted for piadmin's password for pi-c02 here by sudo
- Start and Enable Munge Service: On all nodes:
# Run on pi-head, pi-c01, and pi-c02
sudo systemctl start munge
sudo systemctl enable munge
# Verify status
sudo systemctl status munge
# Check the status is active (running) on all three nodes.
- Test Munge Communication:
- From `pi-head` (as `piadmin` or `cuser`):
# Test local encoding/decoding
munge -n | unmunge
# Test head -> c01 (may need passwordless SSH for cuser setup first, or run as piadmin)
munge -n | ssh pi-c01 unmunge
# Test head -> c02
munge -n | ssh pi-c02 unmunge
# Test c01 -> head (round trip)
ssh pi-c01 munge -n | unmunge
* All tests should return a `STATUS: Success (...)` line. If not, double-check `munge.key` consistency (e.g., file size), permissions (`sudo ls -l /etc/munge/munge.key` on all nodes), and `munged` service status (`sudo systemctl status munge` on all nodes). Also check `/var/log/munge/munged.log`.
- Install SLURM: Install the SLURM workload manager packages on all nodes.
# Run on pi-head, pi-c01, and pi-c02
sudo apt update
sudo apt install -y slurm-wlm slurm-wlm-doc # slurm-wlm pulls in slurmd, slurmctld etc.
- Configure SLURM (`slurm.conf`):
- Create the configuration file on `pi-head` first.
- Edit the main config file using `nano`: `sudo nano /etc/slurm/slurm.conf`
- Replace the entire content with the following.
- Adjust `RealMemory`: use `free -m` to see total memory in MiB. Leave some (~200-300 MB) for the OS. For an 8GB Pi (approx. 7850 MB usable), `7600` is a reasonable starting point.
- CPUs: the RPi 5 has 4 cores.
# /etc/slurm/slurm.conf
# Basic SLURM configuration for pi-cluster
ClusterName=pi-cluster
SlurmctldHost=pi-head #(Or use IP 10.0.0.1)
# SlurmctldHost=pi-head(10.0.0.1) # Optional: Specify both
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
TaskPlugin=task/cgroup
# LOGGING
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
JobCompType=jobcomp/none # No job completion logging for basic setup
# TIMERS
SlurmctldTimeout=120
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres # Use cons_tres for memory tracking
SelectTypeParameters=CR_Core_Memory # Track Cores and Memory
# NODES - Adjust RealMemory based on your Pi 5 8GB (~7600 is conservative)
NodeName=pi-head NodeAddr=10.0.0.1 CPUs=4 RealMemory=7600 State=UNKNOWN
NodeName=pi-c01 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7600 State=UNKNOWN
NodeName=pi-c02 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7600 State=UNKNOWN
# PARTITION
PartitionName=rpi_part Nodes=pi-head,pi-c01,pi-c02 Default=YES MaxTime=INFINITE State=UP Oversubscribe=NO
* Save the file in `nano` (Ctrl+O, Enter) and exit (Ctrl+X).
* Create the SLURM log and spool directories **on all nodes**:
# Run on pi-head, pi-c01, and pi-c02
sudo mkdir -p /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
# Verify slurm user/group exists (created by package install)
id slurm
# Set ownership to the 'slurm' user/group
sudo chown slurm:slurm /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
sudo chmod 755 /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
* Copy the `slurm.conf` file from `pi-head` to the compute nodes using the two-step method:
* **Run these command blocks *from* `pi-head`, logged in as `piadmin`:**
* **For `pi-c01`:**
# ON PI-HEAD, as 'piadmin'
echo "Copying slurm.conf to pi-c01:/tmp/..."
sudo scp /etc/slurm/slurm.conf piadmin@pi-c01:/tmp/slurm.conf
# Enter piadmin's password for pi-c01 if prompted by scp
echo "Connecting to pi-c01 to move slurm.conf and set permissions..."
ssh -t piadmin@pi-c01 << EOF
sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
sudo chown root:root /etc/slurm/slurm.conf # slurm.conf owned by root
sudo chmod 644 /etc/slurm/slurm.conf # Read access for all
echo "--- Verification on pi-c01 ---"
ls -l /etc/slurm/slurm.conf
echo "--- Done on pi-c01 ---"
EOF
# You will likely be prompted for piadmin's password for pi-c01 here by sudo
* **For `pi-c02`:**
# ON PI-HEAD, as 'piadmin'
echo "Copying slurm.conf to pi-c02:/tmp/..."
sudo scp /etc/slurm/slurm.conf piadmin@pi-c02:/tmp/slurm.conf
# Enter piadmin's password for pi-c02 if prompted by scp
echo "Connecting to pi-c02 to move slurm.conf and set permissions..."
ssh -t piadmin@pi-c02 << EOF
sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
sudo chown root:root /etc/slurm/slurm.conf # slurm.conf owned by root
sudo chmod 644 /etc/slurm/slurm.conf # Read access for all
echo "--- Verification on pi-c02 ---"
ls -l /etc/slurm/slurm.conf
echo "--- Done on pi-c02 ---"
EOF
# You will likely be prompted for piadmin's password for pi-c02 here by sudo
- Configure Cgroup Plugin (`cgroup.conf`): Needed for resource constraints (`ProctrackType=proctrack/cgroup`, `TaskPlugin=task/cgroup`, `SelectType=select/cons_tres`).
- Create `/etc/slurm/cgroup.conf` on `pi-head` first: `sudo nano /etc/slurm/cgroup.conf`
- Add the following content:
# /etc/slurm/cgroup.conf
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
# If using systemd (default on RPi OS Bookworm), TaskAffinity should generally be no
TaskAffinity=no
* Save the file (Ctrl+O, Enter) and exit (Ctrl+X).
* Create the `CgroupReleaseAgentDir` **on all nodes**:
# Run on pi-head, pi-c01, and pi-c02
sudo mkdir -p /etc/slurm/cgroup
# Ownership might vary, slurm:slurm or root:root can work. Start with slurm:slurm.
sudo chown slurm:slurm /etc/slurm/cgroup
* Copy `cgroup.conf` from `pi-head` to compute nodes using the two-step method:
* **Run these command blocks *from* `pi-head`, logged in as `piadmin`:**
* **For `pi-c01`:**
# ON PI-HEAD, as 'piadmin'
echo "Copying cgroup.conf to pi-c01:/tmp/..."
sudo scp /etc/slurm/cgroup.conf piadmin@pi-c01:/tmp/cgroup.conf
# Enter piadmin's password for pi-c01 if prompted by scp
echo "Connecting to pi-c01 to move cgroup.conf and set permissions..."
ssh -t piadmin@pi-c01 << EOF
sudo mv /tmp/cgroup.conf /etc/slurm/cgroup.conf
sudo chown root:root /etc/slurm/cgroup.conf # cgroup.conf owned by root
sudo chmod 644 /etc/slurm/cgroup.conf # Read access for all
echo "--- Verification on pi-c01 ---"
ls -l /etc/slurm/cgroup.conf
echo "--- Done on pi-c01 ---"
EOF
# You will likely be prompted for piadmin's password for pi-c01 here by sudo
* **For `pi-c02`:**
# ON PI-HEAD, as 'piadmin'
echo "Copying cgroup.conf to pi-c02:/tmp/..."
sudo scp /etc/slurm/cgroup.conf piadmin@pi-c02:/tmp/cgroup.conf
# Enter piadmin's password for pi-c02 if prompted by scp
echo "Connecting to pi-c02 to move cgroup.conf and set permissions..."
ssh -t piadmin@pi-c02 << EOF
sudo mv /tmp/cgroup.conf /etc/slurm/cgroup.conf
sudo chown root:root /etc/slurm/cgroup.conf # cgroup.conf owned by root
sudo chmod 644 /etc/slurm/cgroup.conf # Read access for all
echo "--- Verification on pi-c02 ---"
ls -l /etc/slurm/cgroup.conf
echo "--- Done on pi-c02 ---"
EOF
# You will likely be prompted for piadmin's password for pi-c02 here by sudo
- Start SLURM Services:
- On `pi-head` (Controller):
# ON PI-HEAD, as piadmin
sudo systemctl enable slurmctld.service
sudo systemctl start slurmctld.service
# Check status immediately
sudo systemctl status slurmctld.service
# Check logs if status isn't active (running)
journalctl -u slurmctld.service | tail -n 30
tail -n 30 /var/log/slurm/slurmctld.log
* **On ALL nodes (Compute Daemons - including `pi-head`):**
# Run ON pi-head, pi-c01, AND pi-c02, as piadmin
sudo systemctl enable slurmd.service
sudo systemctl start slurmd.service
# Check status on each node
sudo systemctl status slurmd.service
# Check logs on each node if status isn't active (running)
tail -n 30 /var/log/slurm/slurmd.log
- Verify SLURM Cluster Status:
- Wait ~10-20 seconds for nodes to register. Run on `pi-head` (as `piadmin` or `cuser`):
sinfo
# Expected output (might take a moment to show 'idle'):
# PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
# rpi_part* up infinite 3 idle pi-head,pi-c0[1-2]
# (State might be 'unk', 'down', 'drain' initially, or 'mix' if nodes are registering/alloc)
scontrol show node
# Check details for each node. Look for 'State=IDLE'. If 'State=DOWN' or 'State=DRAINED(Reason=...)', check logs:
# - /var/log/slurm/slurmctld.log on pi-head (essential for controller decisions)
# - /var/log/slurm/slurmd.log on the affected node(s) (essential for why a node isn't running)
# If nodes are down/drained due to initial errors that are now fixed:
# sudo scontrol update nodename=pi-head,pi-c01,pi-c02 state=resume
# Then check 'sinfo' again after a few seconds.
* **Common Issues & Log Checks:**
* **Time Sync:** `slurmctld` log might show "Time skew detected". Ensure `chrony` is running and synced on all nodes (`chronyc sources`).
* **Munge Auth:** `slurmctld` or `slurmd` logs might show "Invalid credential" or "Munge authentication failed". Re-verify Munge setup (key, permissions, service status, `munge -n | ssh ... unmunge` tests).
* **Network/Ports:** `slurmctld` log might show "Unable to contact slurmd" or `slurmd` log might show "Unable to contact slurmctld". Check firewall rules (if any beyond the `nftables` NAT setup) and ensure nodes can ping each other by hostname and IP (`10.0.0.x`). Ensure `slurm.conf` has correct `SlurmctldHost`, `NodeName`, `NodeAddr`.
* **Config Errors:** Logs might report invalid parameters in `slurm.conf` or `cgroup.conf`.
* **Spool/Log Dirs:** Logs might show permission errors writing to `/var/spool/slurm*` or `/var/log/slurm`. Verify ownership (`slurm:slurm`) and permissions (`755`).
* **Cgroup Issues:** `slurmd` logs might show errors related to cgroups. Ensure `TaskPlugin=task/cgroup` and `ProctrackType=proctrack/cgroup` are set, `cgroup.conf` is present and correct, and the `CgroupReleaseAgentDir` exists.
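* **Cgroup sanity check (optional):** Raspberry Pi OS Bookworm uses cgroup v2 by default, but the memory controller is sometimes disabled. A quick inspection on each node (a sketch; adjust paths if your boot partition differs):
stat -fc %T /sys/fs/cgroup              # 'cgroup2fs' indicates the unified (v2) hierarchy
cat /sys/fs/cgroup/cgroup.controllers   # 'memory' should be listed
# If 'memory' is missing, adding 'cgroup_enable=memory cgroup_memory=1' to
# /boot/firmware/cmdline.txt and rebooting usually enables it.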
Phase 5: Testing the SLURM Cluster
(Run these commands as `cuser` on `pi-head`.)
- Login as `cuser`:
su - cuser
# Or: ssh cuser@pi-head
cd /clusterfs # Work in the shared filesystem if desired
- Run a Simple Command Interactively:
srun hostname
# Runs 'hostname' on one available node in the default partition.
- Run Command on Specific Number of Nodes:
# Run hostname on 2 different nodes, 1 task per node
srun --nodes=2 --ntasks-per-node=1 hostname | sort
# Should show two different hostnames (e.g., pi-c01, pi-c02 or pi-head, pi-c01)
- Submit a Simple Batch Job:
- Create a job script file, e.g., `/clusterfs/cuser/hello.sh` (ensure `/clusterfs/cuser` exists and is writable by `cuser`):
#!/bin/bash
#SBATCH --job-name=hello # Job name
#SBATCH --output=hello_job_%j.out # Standard output file (%j = job ID)
#SBATCH --error=hello_job_%j.err # Standard error file
#SBATCH --nodes=3 # Request all 3 nodes
#SBATCH --ntasks-per-node=2 # Request 2 tasks (processes) per node (total 6)
#SBATCH --cpus-per-task=1 # Request 1 CPU core per task
#SBATCH --partition=rpi_part # Specify partition (optional if default)
#SBATCH --time=00:05:00 # Time limit (5 minutes)
echo "Job running on nodes:"
srun hostname | sort # Use srun within sbatch to launch parallel tasks
echo "Tasks started at: $(date)"
sleep 20 # Simulate some work
echo "Tasks finished at: $(date)"
* Make the script executable: `chmod +x /clusterfs/cuser/hello.sh`
* Submit the job from the directory containing the script:
sbatch hello.sh
# Should print: Submitted batch job <JOB_ID>
* Check the queue:
squeue
# Shows running or pending jobs
watch squeue # Monitor queue updates
sinfo
# Should show nodes in 'alloc' or 'mix' state.
* Once the job finishes (disappears from `squeue`), check the output files (`hello_job_<JOB_ID>.out` and `.err`) in the submission directory:
cat hello_job_<JOB_ID>.out
# Should show hostnames from all 3 nodes, repeated once per task.
# In this example (3 nodes x 2 tasks per node = 6 tasks), pi-head, pi-c01, and pi-c02 each appear twice.
Congratulations!
You should now have a functional 3-node Raspberry Pi 5 SLURM cluster. The compute nodes (`pi-c01`, `pi-c02`) use the head node (`pi-head`) as a gateway for internet access, while all cluster communication happens over the private `10.0.0.x` network.
Next Steps & Considerations
- Install MPI: Install OpenMPI or MPICH (`sudo apt install -y openmpi-bin libopenmpi-dev` on all nodes) to run parallel MPI applications. Update SLURM's `MpiDefault=pmix` or otherwise configure MPI support if needed; a sketch of an MPI batch job follows this list.
- Shared Software Stack: Install compilers, libraries, and applications needed for your HPC tasks onto the shared NFS filesystem (`/clusterfs`) so they are accessible from all nodes without installing them everywhere. Module systems like Lmod can help manage this.
- Monitoring: Set up monitoring tools like `htop`, `glances`, or more comprehensive systems like Prometheus + Grafana or Ganglia to observe cluster load and resource usage.
- SLURM Tuning: Explore more advanced `slurm.conf` options: resource limits (memory, cores per job/user), Quality of Service (QoS), fair-share scheduling, job arrays.
- SLURM Accounting: For tracking resource usage over time, set up the SLURM accounting database (`slurmdbd`), which requires installing and configuring a database (like MariaDB/MySQL).
- Security: Review your `nftables` rules, harden SSH (`/etc/ssh/sshd_config`), and consider user permissions carefully.
- Backup: Back up your `slurm.conf`, `munge.key`, and important data on `/clusterfs`.
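As a rough sketch of what an MPI batch job could look like once OpenMPI is installed on all nodes (illustrative only; the binary path `/clusterfs/cuser/hello_mpi` is an assumption, and the launch method depends on how your OpenMPI and SLURM builds integrate):
#!/bin/bash
#SBATCH --job-name=mpi_hello
#SBATCH --output=mpi_hello_%j.out
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:05:00
# Binary compiled beforehand on the shared filesystem, e.g.:
#   mpicc -o /clusterfs/cuser/hello_mpi hello_mpi.c
# mpirun usually detects the SLURM allocation; if not, try 'srun --mpi=pmix ./hello_mpi'
# (requires matching PMIx support in both SLURM and OpenMPI).
mpirun /clusterfs/cuser/hello_mpi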
Enjoy your mini HPC cluster!