Tutorial: Setting Up a 3-Node Raspberry Pi 5 SLURM Cluster (Rev3)
Created: 2025-04-15 03:33:03 | Last updated: 2025-04-15 03:33:03 | Status: Public
This tutorial guides you through setting up a small High-Performance Computing (HPC) cluster using three Raspberry Pi 5 devices, SLURM Workload Manager, and a specific network configuration involving both Wi-Fi and a private Ethernet network.
Cluster Configuration:
- Nodes: 3 x Raspberry Pi 5 (8GB RAM recommended)
- OS: Raspberry Pi OS Bookworm (64-bit recommended)
- Boot: From SSDs
- Cluster User: `cuser`
- Networking:
  - `pi-head`:
    - WLAN (`wlan0`): Connects to your main router via Wi-Fi, gets `192.168.1.20` via DHCP reservation (Gateway: `192.168.1.1`). Provides internet access.
    - Ethernet (`eth0`): Connects to private switch, static IP `10.0.0.1/24`.
  - `pi-c01`:
    - Ethernet (`eth0`): Connects to private switch, static IP `10.0.0.2/24`. Gateway via `pi-head` (`10.0.0.1`).
  - `pi-c02`:
    - Ethernet (`eth0`): Connects to private switch, static IP `10.0.0.3/24`. Gateway via `pi-head` (`10.0.0.1`).
- SLURM: Basic setup (`slurmctld`, `slurmd`, `munge`).
Prerequisites
- Hardware:
- 3 x Raspberry Pi 5 (8GB RAM)
- 3 x NVMe SSDs (or SATA SSDs with appropriate adapters) compatible with RPi 5 boot.
- 3 x Reliable Power Supplies for RPi 5 (5V/5A recommended).
- 1 x Gigabit Ethernet Switch (unmanaged is fine).
- 3 x Ethernet Cables.
- Access to your existing Wi-Fi network and router admin interface (for DHCP reservation).
- Software:
- Raspberry Pi Imager tool.
- Raspberry Pi OS Bookworm (64-bit recommended) flashed onto each SSD.
- Initial Setup:
  - Ensure each Pi boots correctly from its SSD.
  - Complete the initial Raspberry Pi OS setup wizard (create the initial user - this is NOT `cuser` yet; set locale, keyboard, etc.).
  - Enable SSH on each Pi: `sudo raspi-config` -> Interface Options -> SSH -> Enable.
  - Connect `pi-head` to your Wi-Fi network.
  - Configure the DHCP reservation on your OpenWRT router to assign `192.168.1.20` to `pi-head`'s WLAN MAC address. Verify `pi-head` gets this IP (`ip a show wlan0`).
  - Physically connect all three Pis to the Gigabit switch using Ethernet cables.
Phase 1: Basic OS Configuration & Hostnames
(Perform these steps on each Pi, adjusting hostnames accordingly. You’ll need SSH access.)
- Login: SSH into each Pi using the initial user you created during setup.
- Set Hostnames:
  - On the first Pi (intended as head node): `sudo hostnamectl set-hostname pi-head`
  - On the second Pi (compute node 1): `sudo hostnamectl set-hostname pi-c01`
  - On the third Pi (compute node 2): `sudo hostnamectl set-hostname pi-c02`
  - Reboot each Pi (`sudo reboot`) or log out and back in for the change to take effect in your shell prompt and network identity.
- Update System (`pi-head` only for now):
  - Ensure `pi-head` has internet via Wi-Fi.
  - SSH into `pi-head`:
sudo apt update
sudo apt full-upgrade -y
sudo apt install -y vim git build-essential # Essential tools
* *Note: We will update `pi-c01` and `pi-c02` after setting up network routing.*
Phase 2: Network Configuration (Revised)
Goal: Configure network interfaces on all three Raspberry Pis. `pi-head` will connect to your home network/internet via Wi-Fi (`wlan0`) and to the private cluster network via Ethernet (`eth0`). `pi-c01` and `pi-c02` will connect only to the private cluster network via Ethernet (`eth0`) and use `pi-head` as their gateway to reach the internet. We will use `nmcli` for interface configuration and `nftables` for firewall/NAT on `pi-head`.
Recap of Target Configuration:
- `pi-head`:
  - `wlan0`: `192.168.1.20` (via DHCP reservation), Gateway `192.168.1.1` (Internet Access)
  - `eth0`: `10.0.0.1/24` (Static, Private Network)
- `pi-c01`:
  - `eth0`: `10.0.0.2/24` (Static, Private Network), Gateway `10.0.0.1`
  - `wlan0`: Disabled
- `pi-c02`:
  - `eth0`: `10.0.0.3/24` (Static, Private Network), Gateway `10.0.0.1`
  - `wlan0`: Disabled
Steps:
- Verify `pi-head` WLAN Connection:
  - SSH into `pi-head`.
  - Confirm it received the correct IP address from your router's DHCP reservation and has a default route via your main gateway:
ip addr show wlan0
# Look for 'inet 192.168.1.20/XX ...' (XX is your subnet mask, often 24)
ip route show default
# Should show 'default via 192.168.1.1 dev wlan0 ...'
* If the IP or route is incorrect, double-check your router's DHCP reservation settings and ensure `pi-head`'s Wi-Fi is connected to the correct network.
- Configure `pi-head` Ethernet (`eth0` - Private Network):
  - Still on `pi-head`.
  - Identify the Ethernet interface name (usually `eth0`): `ip a`
  - Add a NetworkManager connection profile for `eth0` with the static private IP. We explicitly set no gateway and mark it as never the default route:
# Replace 'eth0' if your interface name is different
sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.1/24
# Critical: Prevent this interface from ever becoming the default route
sudo nmcli connection modify 'static-eth0' ipv4.gateway '' # Ensure no gateway is set
sudo nmcli connection modify 'static-eth0' ipv4.never-default yes # Prevent it from being the default route
sudo nmcli connection modify 'static-eth0' connection.autoconnect yes # Connect automatically
# Bring the connection up (may happen automatically)
sudo nmcli connection up 'static-eth0'
* **Verify `pi-head` Network State:**
ip addr show eth0
# Should show 'inet 10.0.0.1/24 ...'
ip route show default
# Should STILL show 'default via 192.168.1.1 dev wlan0 ...'
- Configure `pi-c01` Ethernet (`eth0` - Private Network):
  - SSH into `pi-c01`. (Use a temporary keyboard/monitor, or connect `eth0` temporarily to your main network if needed for first access.)
  - Identify the Ethernet interface name (usually `eth0`): `ip a`
  - Add the static IP configuration, setting `pi-head` (`10.0.0.1`) as the gateway and providing DNS servers:
# Replace 'eth0' if needed
sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.2/24 gw4 10.0.0.1
# Set DNS servers (e.g., Google DNS and Cloudflare DNS)
# These requests will be routed via pi-head
sudo nmcli connection modify 'static-eth0' ipv4.dns "8.8.8.8 1.1.1.1"
sudo nmcli connection modify 'static-eth0' ipv4.ignore-auto-dns yes # Use only the specified DNS
sudo nmcli connection modify 'static-eth0' connection.autoconnect yes
# Bring the connection up
sudo nmcli connection up 'static-eth0'
* **Verify `pi-c01` Network State:**
ip addr show eth0
# Should show 'inet 10.0.0.2/24 ...'
ip route show default
# Should show 'default via 10.0.0.1 dev eth0 ...'
cat /etc/resolv.conf
# Should show 'nameserver 8.8.8.8' and 'nameserver 1.1.1.1'
- Configure `pi-c02` Ethernet (`eth0` - Private Network):
  - SSH into `pi-c02`.
  - Identify the Ethernet interface name (usually `eth0`): `ip a`
  - Add the static IP configuration, similar to `pi-c01`:
# Replace 'eth0' if needed
sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.3/24 gw4 10.0.0.1
# Set DNS servers
sudo nmcli connection modify 'static-eth0' ipv4.dns "8.8.8.8 1.1.1.1"
sudo nmcli connection modify 'static-eth0' ipv4.ignore-auto-dns yes
sudo nmcli connection modify 'static-eth0' connection.autoconnect yes
# Bring the connection up
sudo nmcli connection up 'static-eth0'
* **Verify `pi-c02` Network State:**
ip addr show eth0
# Should show 'inet 10.0.0.3/24 ...'
ip route show default
# Should show 'default via 10.0.0.1 dev eth0 ...'
cat /etc/resolv.conf
# Should show nameservers 8.8.8.8 and 1.1.1.1
- Enable IP Forwarding and Configure `nftables` NAT/Firewall on `pi-head`:
  - SSH back into `pi-head`.
  - Enable kernel IP forwarding:
echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/99-ip_forward.conf
sudo sysctl -p /etc/sysctl.d/99-ip_forward.conf # Apply immediately
sudo sysctl net.ipv4.ip_forward # Verify output is '= 1'
* **Install `nftables` (if not already present):**
sudo apt update
sudo apt install -y nftables
* **Create `nftables` Configuration:**
* Backup the default config: `sudo cp /etc/nftables.conf /etc/nftables.conf.bak`
* Edit the configuration file: `sudo vim /etc/nftables.conf`
* **Replace the entire content** with this ruleset (adjust `wlan0`/`eth0` if needed):
#!/usr/sbin/nft -f
# Flush the entire previous ruleset
flush ruleset
# Table for IPv4 NAT
table ip nat {
chain postrouting {
type nat hook postrouting priority 100; policy accept;
# Masquerade traffic from private network (eth0) going OUT via wlan0
oifname "wlan0" ip saddr 10.0.0.0/24 masquerade comment "NAT cluster traffic to WAN"
}
# Optional: Add prerouting rules here if needed for port forwarding INTO the cluster
}
# Table for IPv4/IPv6 Filtering
table inet filter {
chain input {
type filter hook input priority 0; policy accept;
# Basic stateful firewall for input traffic to pi-head
ct state established,related accept
# Allow loopback traffic
iifname "lo" accept
# Allow SSH (port 22) - Recommended! Add source IP ranges if possible
tcp dport 22 accept
# Allow ICMP (ping)
icmp type echo-request accept
# Allow traffic from cluster nodes (optional, if needed for services hosted on pi-head)
iifname "eth0" ip saddr 10.0.0.0/24 accept comment "Allow traffic from cluster nodes"
# Uncomment below to drop other traffic instead of accept-all policy
# drop
}
chain forward {
type filter hook forward priority 0; policy drop; # Default: Drop forwarded traffic
# Allow established/related connections coming back IN from WAN (wlan0) to LAN (eth0)
iifname "wlan0" oifname "eth0" ct state related,established accept comment "Allow established WAN to LAN"
# Allow NEW and established connections going OUT from LAN (eth0) to WAN (wlan0)
iifname "eth0" oifname "wlan0" ip saddr 10.0.0.0/24 accept comment "Allow LAN to WAN"
}
chain output {
type filter hook output priority 0; policy accept;
# Basic stateful firewall for output traffic from pi-head
ct state established,related accept
# Allow loopback traffic
oifname "lo" accept
# Uncomment below to drop other traffic instead of accept-all policy
# drop
}
}
* **Apply and Persist `nftables` Rules:**
sudo nft -f /etc/nftables.conf # Apply the ruleset, check for errors
sudo systemctl enable nftables.service # Make rules persistent on boot
sudo systemctl restart nftables.service # Restart service to load rules definitively
sudo systemctl status nftables.service # Check service status
sudo nft list ruleset # Review the active ruleset
- Troubleshoot SSH Slowness on `pi-head` (Potential Fix):
  - Slow SSH logins are often caused by the SSH server attempting a reverse DNS lookup on the connecting client's IP address, which can time out if DNS is not configured correctly.
  - On `pi-head`:
# Edit the SSH server configuration file
sudo vim /etc/ssh/sshd_config
* Find the line `#UseDNS yes` or `UseDNS yes`. Uncomment it if needed, and change `yes` to `no`:
UseDNS no
* Save the file and restart the SSH service:
sudo systemctl restart ssh   # Debian/Raspberry Pi OS names the unit 'ssh'
* Try SSHing into `pi-head` again from your workstation. If the login is now significantly faster, this was likely the cause.
- Test Basic Network Connectivity:
  - From `pi-head`:
ping -c 2 10.0.0.2 # Ping pi-c01
ping -c 2 10.0.0.3 # Ping pi-c02
* From `pi-c01`:
ping -c 2 10.0.0.1 # Ping pi-head
ping -c 2 10.0.0.3 # Ping pi-c02
* From `pi-c02`:
ping -c 2 10.0.0.1 # Ping pi-head
ping -c 2 10.0.0.2 # Ping pi-c01
* All these pings over the `10.0.0.x` network should work.
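The six pings above can be wrapped in one loop you can rerun from any node. A minimal sketch; the `check_nodes` helper name and its optional probe-command override (handy for dry runs) are our own additions, not part of the tutorial:

```shell
#!/usr/bin/env bash
# check_nodes [probe-cmd...] -- report reachability of each private node IP.
# With no arguments it uses a 1-packet ping with a 2-second timeout; an
# alternative probe command can be substituted for testing.
check_nodes() {
  local -a probe
  if [ $# -gt 0 ]; then probe=("$@"); else probe=(ping -c 1 -W 2); fi
  local ip rc=0
  for ip in 10.0.0.1 10.0.0.2 10.0.0.3; do
    if "${probe[@]}" "$ip" >/dev/null 2>&1; then
      echo "$ip reachable"
    else
      echo "$ip UNREACHABLE"
      rc=1
    fi
  done
  return $rc
}
```

Run `check_nodes` on each node in turn; a nonzero exit status means at least one peer did not answer.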
- Verify Compute Node Internet Access (Initial Test):
  - From `pi-c01`:
ping -c 3 8.8.8.8 # Test internet IP reachability
ping -c 3 google.com # Test DNS resolution + internet reachability
* From `pi-c02`:
ping -c 3 1.1.1.1 # Test internet IP reachability (different target)
ping -c 3 cloudflare.com # Test DNS resolution + internet reachability
* These tests should now succeed, routing through `pi-head`. If not, re-check `nftables` rules (`sudo nft list ruleset`), IP forwarding (`sudo sysctl net.ipv4.ip_forward`), and routing tables (`ip route`) on all nodes.
- Disable Wi-Fi on Compute Nodes (`pi-c01`, `pi-c02`):
  - This confirms they rely solely on `eth0` for all traffic.
  - On `pi-c01`:
sudo nmcli radio wifi off
nmcli radio wifi # Verify output shows 'disabled'
ip a show wlan0 # Verify interface is DOWN or has no IP
* **On `pi-c02`:**
sudo nmcli radio wifi off
nmcli radio wifi # Verify output shows 'disabled'
ip a show wlan0 # Verify interface is DOWN or has no IP
- Final Connectivity Test (Compute Nodes via `eth0` only):
  - Repeat the internet connectivity tests from Step 8 on both `pi-c01` and `pi-c02`:
# On pi-c01
ping -c 3 8.8.8.8
ping -c 3 google.com
# On pi-c02
ping -c 3 1.1.1.1
ping -c 3 cloudflare.com
* If these tests still succeed with Wi-Fi disabled, your network routing is configured correctly.
- Update Compute Nodes:
  - Now that `pi-c01` and `pi-c02` have verified internet access via `pi-head`, ensure they are fully updated:
# On pi-c01 AND pi-c02
sudo apt update
sudo apt full-upgrade -y
# Install common tools if you haven't already
sudo apt install -y vim git build-essential
Phase 2 Completion: At this point, your network should be fully configured according to the requirements. `pi-head` acts as the gateway, and `pi-c01`/`pi-c02` rely solely on their Ethernet connection to the private network for all communication, including internet access routed through `pi-head`. You can now proceed to Phase 3: Common Cluster Environment Setup.
Phase 3: Common Cluster Environment Setup
(Perform steps on all nodes unless specified)
- Configure Hostname Resolution (`/etc/hosts`):
  - Edit the hosts file on all three nodes: `sudo vim /etc/hosts`
  - Ensure the following lines exist (add them if missing, below the `127.0.0.1 localhost` line):
127.0.1.1 <current_hostname> # This line is usually added by hostnamectl
# Cluster Nodes
10.0.0.1 pi-head
10.0.0.2 pi-c01
10.0.0.3 pi-c02
* **Test:** From any node, ping the others by hostname (e.g., `ping -c 1 pi-c01` from `pi-head`).
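Since a typo in `/etc/hosts` on one node is easy to miss, the expected entries can also be checked mechanically. A small sketch (the `check_hosts_file` helper is ours; point it at `/etc/hosts` on each node):

```shell
#!/usr/bin/env bash
# check_hosts_file FILE -- verify that an /etc/hosts-style file contains the
# three expected cluster entries; prints OK/MISSING per node.
check_hosts_file() {
  local file=$1 rc=0 ip name
  while read -r ip name; do
    if grep -qE "^${ip}[[:space:]]+${name}([[:space:]]|\$)" "$file"; then
      echo "OK $name"
    else
      echo "MISSING $name ($ip)"
      rc=1
    fi
  done <<'EOF'
10.0.0.1 pi-head
10.0.0.2 pi-c01
10.0.0.3 pi-c02
EOF
  return $rc
}
```

Usage: `check_hosts_file /etc/hosts` on each node; a nonzero exit status means an entry is absent.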
- Create Common Cluster User (`cuser`):
  - Crucially, `cuser` must have the same User ID (UID) and Group ID (GID) on all nodes.
  - On `pi-head` first:
sudo adduser cuser
# Follow prompts to set password etc.
# Note the UID and GID displayed (e.g., uid=1001(cuser) gid=1001(cuser) groups=...)
# Optional: Add cuser to the sudo group if needed for administration tasks
# sudo usermod -aG sudo cuser
* **On `pi-c01` and `pi-c02`:**
* Get the UID and GID from `pi-head`. Use `id cuser` on `pi-head`. Let's assume it was `1001` for both UID and GID. **Replace `1001` below if yours is different.**
# Create the group first with the specific GID
sudo groupadd -g 1001 cuser
# Create the user with the specific UID and GID
sudo useradd -u 1001 -g 1001 -m -s /bin/bash cuser
# Set the password for the new user
sudo passwd cuser
# Optional: Add to sudo group (use the same groups as on pi-head if needed)
# sudo usermod -aG sudo cuser
* **Verify:** Run `id cuser` on **all three** nodes. Ensure the UID and GID match exactly.
- Setup Passwordless SSH for `cuser`:
  - Log in as `cuser` on `pi-head`. You can use `su - cuser` if logged in as another user, or SSH directly: `ssh cuser@pi-head`.
  - Generate an SSH key pair (run as `cuser`):
# Accept default file location (~/.ssh/id_rsa), press Enter for empty passphrase
ssh-keygen -t rsa -b 4096
* **Copy the public key to all nodes (including `pi-head` itself):**
# Run as cuser from pi-head
ssh-copy-id cuser@pi-head
ssh-copy-id cuser@pi-c01
ssh-copy-id cuser@pi-c02
# Enter the password for 'cuser' when prompted for each node
* **Test:** Still as `cuser` on `pi-head`, try SSHing to each node without a password:
ssh pi-head date
ssh pi-c01 date
ssh pi-c02 date
# The first time connecting to each might ask "Are you sure you want to continue connecting (yes/no/[fingerprint])?". Type 'yes'.
# If it prompts for a password after the first connection, the key setup failed. Check permissions in ~/.ssh directories.
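The three manual `ssh ... date` checks can be automated with `BatchMode`, which makes ssh fail instead of prompting, so a broken key shows up as FAILED rather than hanging the loop. A sketch (the `check_ssh` name and probe override are our own; run it as `cuser`):

```shell
#!/usr/bin/env bash
# check_ssh [probe-cmd...] -- verify non-interactive login to every node.
# BatchMode forces ssh to fail instead of asking for a password. The probe
# command can be overridden for dry runs; the default is plain ssh.
check_ssh() {
  local -a probe
  if [ $# -gt 0 ]; then probe=("$@"); else probe=(ssh -o BatchMode=yes -o ConnectTimeout=5); fi
  local node rc=0
  for node in pi-head pi-c01 pi-c02; do
    if "${probe[@]}" "$node" true >/dev/null 2>&1; then
      echo "$node: passwordless OK"
    else
      echo "$node: key login FAILED"
      rc=1
    fi
  done
  return $rc
}
```

A FAILED line usually means the public key is missing from that node's `~/.ssh/authorized_keys` or the `~/.ssh` permissions are wrong.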
- Install and Configure NFS (Shared Filesystem):
  - We'll share `/clusterfs` from `pi-head` to be used by all nodes. Do this as the primary user, not `cuser`.
  - On `pi-head` (NFS Server):
sudo apt update
sudo apt install -y nfs-kernel-server
sudo mkdir -p /clusterfs
# Option 1: Allow anyone to write (simple for cluster user)
sudo chown nobody:nogroup /clusterfs
sudo chmod 777 /clusterfs
# Option 2: Restrict to cuser (better security, requires consistent UID/GID)
# sudo chown cuser:cuser /clusterfs
# sudo chmod 770 /clusterfs # Or 750 if group members only need read
# Edit the NFS exports file
sudo nano /etc/exports
# Add this line to allow access from the private 10.0.0.x network:
# Use 'no_root_squash' carefully if you need root access over NFS
/clusterfs 10.0.0.0/24(rw,sync,no_subtree_check)
# Activate the exports
sudo exportfs -ra
# Restart and enable the NFS server service
sudo systemctl restart nfs-kernel-server
sudo systemctl enable nfs-kernel-server
* **On `pi-c01` and `pi-c02` (NFS Clients):**
sudo apt update
sudo apt install -y nfs-common
sudo mkdir -p /clusterfs
# Add the mount to /etc/fstab for automatic mounting on boot
sudo nano /etc/fstab
# Add this line at the end:
pi-head:/clusterfs /clusterfs nfs defaults,auto,nofail 0 0
# Mount all filesystems defined in fstab (including the new one)
sudo mount -a
# Verify the mount was successful
df -h | grep /clusterfs
# Check mount options (optional)
mount | grep /clusterfs
* From `pi-head` as `cuser`: `touch /clusterfs/test_head.txt`
* From `pi-c01` as `cuser`: `ls /clusterfs` (should see `test_head.txt`)
* From `pi-c02` as `cuser`: `touch /clusterfs/test_c02.txt`
* From `pi-head` as `cuser`: `ls /clusterfs` (should see both files)
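The cross-node file test above can be scripted from `pi-head`. A hedged sketch, assuming passwordless SSH for the calling user; the `check_nfs` helper and its `NFS_DIR` override (for dry runs outside the cluster) are our own additions:

```shell
#!/usr/bin/env bash
# check_nfs [runner-cmd...] -- create a marker file on the share and confirm
# each client node sees it; defaults to plain ssh. NFS_DIR overrides the
# share path (useful for dry runs); assumes the caller can write there.
check_nfs() {
  local -a run
  if [ $# -gt 0 ]; then run=("$@"); else run=(ssh); fi
  local dir=${NFS_DIR:-/clusterfs} node rc=0
  local stamp="$dir/.nfs_check_$$"
  touch "$stamp" 2>/dev/null || { echo "cannot write to $dir"; return 1; }
  for node in pi-c01 pi-c02; do
    if "${run[@]}" "$node" test -f "$stamp" >/dev/null 2>&1; then
      echo "$node sees the shared file"
    else
      echo "$node does NOT see the shared file"
      rc=1
    fi
  done
  rm -f "$stamp"
  return $rc
}
```

If a node does NOT see the file, check its `/etc/fstab` entry and `df -h | grep /clusterfs` output on that node.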
- Install and Configure NTP (Time Synchronization): Accurate time is essential for SLURM.
  - Install `chrony` on all nodes:
sudo apt update
sudo apt install -y chrony
* Ensure it's enabled and running on **all nodes**:
sudo systemctl enable --now chrony
* `chrony` will automatically use internet time sources. Since all nodes now have internet (directly or via `pi-head`), this should work.
* **Verify sync status** (might take a minute or two after starting):
# Run on all nodes
chronyc sources
# Look for lines starting with '^*' (synced server) or '^+' (acceptable server).
timedatectl status | grep "NTP service"
# Should show 'active'.
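The "look for `^*`" check can be turned into a yes/no answer for scripting. A minimal sketch (the `check_time_sync` helper is ours; it accepts captured `chronyc sources` output as an argument so it can be tested offline):

```shell
#!/usr/bin/env bash
# check_time_sync [chronyc-output] -- succeed iff the `chronyc sources` output
# contains a '^*' line (the currently selected, synced server). With no
# argument it calls chronyc itself.
check_time_sync() {
  local out=${1:-$(chronyc sources 2>/dev/null)}
  if printf '%s\n' "$out" | grep -q '^\^\*'; then
    echo "synced"
  else
    echo "not synced"
    return 1
  fi
}
```

Run `check_time_sync` on each node once `chrony` has had a minute or two to pick a source.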
Phase 4: Install and Configure SLURM & Munge (Revised for RPi & SLURM 22.05.8)
(Perform steps on nodes as indicated. Ensure you are logged in as your administrative user - `davids` or `piadmin` - who has `sudo` privileges.)
- Install Munge (Authentication Service): Munge provides secure authentication between SLURM daemons. Install on all nodes.
# Run on pi-head, pi-c01, and pi-c02
sudo apt update
sudo apt install -y munge libmunge-dev libmunge2
- Create Munge Key: A shared secret key must be generated on one node.
  - On `pi-head` ONLY:
# STILL ON PI-HEAD, logged in as piadmin/davids
sudo systemctl stop munge # Stop service if running
# Create the key
sudo dd if=/dev/urandom of=/etc/munge/munge.key bs=1 count=1024
# Set correct ownership and permissions
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
echo "Munge key created on pi-head."
- Securely Distribute Munge Key: Copy the key from `pi-head` to the compute nodes via a temporary location, then move it into place with `sudo` remotely. Use your actual admin username (`davids` or `piadmin`).
  - Run these command blocks from `pi-head`, logged in as your admin user:
  - For `pi-c01`:
# ON PI-HEAD, as admin user (e.g., davids)
echo "Copying munge.key to pi-c01:/tmp/..."
sudo scp /etc/munge/munge.key davids@pi-c01:/tmp/munge.key.tmp
# Enter admin user's password for pi-c01 if prompted by scp
echo "Connecting to pi-c01 to move munge.key and set permissions..."
ssh -t davids@pi-c01 << EOF
sudo systemctl stop munge # Ensure service is stopped before replacing key
sudo mv /tmp/munge.key.tmp /etc/munge/munge.key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
echo "--- Verification on pi-c01 ---"
sudo ls -l /etc/munge/munge.key # Needs sudo to view details
echo "--- Done on pi-c01 ---"
EOF
# You will likely be prompted for admin user's password for pi-c01 here by sudo
* **For `pi-c02`:**
# ON PI-HEAD, as admin user (e.g., davids)
echo "Copying munge.key to pi-c02:/tmp/..."
sudo scp /etc/munge/munge.key davids@pi-c02:/tmp/munge.key.tmp
# Enter admin user's password for pi-c02 if prompted by scp
echo "Connecting to pi-c02 to move munge.key and set permissions..."
ssh -t davids@pi-c02 << EOF
sudo systemctl stop munge # Ensure service is stopped before replacing key
sudo mv /tmp/munge.key.tmp /etc/munge/munge.key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
echo "--- Verification on pi-c02 ---"
sudo ls -l /etc/munge/munge.key # Needs sudo to view details
echo "--- Done on pi-c02 ---"
EOF
# You will likely be prompted for admin user's password for pi-c02 here by sudo
- Start and Enable Munge Service: On all nodes:
# Run on pi-head, pi-c01, and pi-c02 (as admin user)
sudo systemctl start munge
sudo systemctl enable munge
# Verify status
sudo systemctl status munge
# Check the status is active (running) on all three nodes.
- Test Munge Communication:
  - From `pi-head` (as admin user or `cuser`):
# Test local encoding/decoding
munge -n | unmunge
# Test head -> c01 (use correct hostname and ensure passwordless SSH for cuser or run as admin)
munge -n | ssh pi-c01 unmunge
# Test head -> c02
munge -n | ssh pi-c02 unmunge
# Test c01 -> head (round trip)
ssh pi-c01 munge -n | unmunge
* All tests should return a `STATUS: Success (...)` line. If not, double-check `munge.key` consistency, permissions, and service status. Check `/var/log/munge/munged.log`.
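The remote munge tests can be looped over the compute nodes so a single key mismatch stands out. A sketch (the `check_munge` helper name and runner override are our own; run it from `pi-head` as a user with passwordless SSH):

```shell
#!/usr/bin/env bash
# check_munge [runner-cmd...] -- encode a credential locally and decode it on
# each compute node; defaults to plain ssh. A munge.key mismatch shows up as
# FAILED on that node.
check_munge() {
  local -a run
  if [ $# -gt 0 ]; then run=("$@"); else run=(ssh); fi
  local node rc=0
  for node in pi-c01 pi-c02; do
    if munge -n 2>/dev/null | "${run[@]}" "$node" unmunge >/dev/null 2>&1; then
      echo "$node: munge OK"
    else
      echo "$node: munge FAILED"
      rc=1
    fi
  done
  return $rc
}
```

On FAILED, re-check the key distribution step (ownership `munge:munge`, mode `400`) and `/var/log/munge/munged.log` on that node.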
- Install SLURM: Install the SLURM workload manager packages on all nodes.
# Run on pi-head, pi-c01, and pi-c02 (as admin user)
sudo apt update
sudo apt install -y slurm-wlm slurm-wlm-doc # slurm-wlm pulls in slurmd, slurmctld etc.
- Configure SLURM (`slurm.conf`):
  - Create the configuration file on `pi-head` first.
  - Edit the main config file: `sudo nano /etc/slurm/slurm.conf`
  - Replace the entire content with the following.
  - Adjust `RealMemory`: Use `free -m` to see total memory in MiB. Leave some (~300-500 MB) for the OS. For an 8GB Pi (approx. 7850 MB usable), `7500` is a reasonable starting point.
  - CPUs: The RPi 5 has 4 cores.
# /etc/slurm/slurm.conf
# Basic SLURM configuration for pi-cluster
ClusterName=pi-cluster
SlurmctldHost=pi-head #(Or use IP 10.0.0.1)
# SlurmctldHost=pi-head(10.0.0.1) # Optional: Specify both
MpiDefault=none # IMPORTANT: Keep as none unless MPI is fully configured.
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
TaskPlugin=task/cgroup # Enable task cgroup plugin
# LOGGING
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
JobCompType=jobcomp/none # No job completion logging for basic setup
# TIMERS
SlurmctldTimeout=120
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres # Use cons_tres for resource tracking (CPU, Mem)
SelectTypeParameters=CR_Core_Memory # Track Cores and Memory explicitly
# NODES - Adjust RealMemory based on your Pi 5 8GB (~7500 is conservative)
NodeName=pi-head NodeAddr=10.0.0.1 CPUs=4 RealMemory=7500 State=UNKNOWN
NodeName=pi-c01 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7500 State=UNKNOWN
NodeName=pi-c02 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7500 State=UNKNOWN
# PARTITION
PartitionName=rpi_part Nodes=pi-head,pi-c01,pi-c02 Default=YES MaxTime=INFINITE State=UP Oversubscribe=NO
* Save the file (Ctrl+O, Enter) and exit (Ctrl+X).
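The `RealMemory` guidance above (total MiB from `free -m` minus a few hundred MiB for the OS) can be computed mechanically. A sketch; the helper name and the exact 350 MiB reserve are our own choices, not a SLURM requirement:

```shell
#!/usr/bin/env bash
# suggest_real_memory [total_mib] -- propose a slurm.conf RealMemory value:
# total memory in MiB minus a ~350 MiB OS reserve (the reserve size is a
# guess; tune it for your workload). With no argument it reads `free -m`.
suggest_real_memory() {
  local total=${1:-$(free -m | awk '/^Mem:/{print $2}')}
  echo $(( total - 350 ))
}
```

For an 8GB Pi reporting about 7850 MiB, `suggest_real_memory 7850` yields 7500, matching the value used in `slurm.conf` above.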
* Create the SLURM log and spool directories **on all nodes**:
# Run on pi-head, pi-c01, and pi-c02 (as admin user)
sudo mkdir -p /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
# Verify slurm user/group exists (created by package install)
id slurm
# Set ownership to the 'slurm' user/group
sudo chown slurm:slurm /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
sudo chmod 755 /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
* Copy the `slurm.conf` file from `pi-head` to the compute nodes using the two-step method (replace `davids` with your admin username):
* **Run these command blocks *from* `pi-head`, logged in as your admin user:**
* **For `pi-c01`:**
# ON PI-HEAD, as admin user (e.g., davids)
echo "Copying slurm.conf to pi-c01:/tmp/..."
sudo scp /etc/slurm/slurm.conf davids@pi-c01:/tmp/slurm.conf
# Enter admin user's password for pi-c01 if prompted by scp
echo "Connecting to pi-c01 to move slurm.conf and set permissions..."
ssh -t davids@pi-c01 << EOF
sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
sudo chown root:root /etc/slurm/slurm.conf # slurm.conf owned by root
sudo chmod 644 /etc/slurm/slurm.conf # Read access for all
echo "--- Verification on pi-c01 ---"
ls -l /etc/slurm/slurm.conf
echo "--- Done on pi-c01 ---"
EOF
# You will likely be prompted for admin user's password for pi-c01 here by sudo
* **For `pi-c02`:**
# ON PI-HEAD, as admin user (e.g., davids)
echo "Copying slurm.conf to pi-c02:/tmp/..."
sudo scp /etc/slurm/slurm.conf davids@pi-c02:/tmp/slurm.conf
# Enter admin user's password for pi-c02 if prompted by scp
echo "Connecting to pi-c02 to move slurm.conf and set permissions..."
ssh -t davids@pi-c02 << EOF
sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
sudo chown root:root /etc/slurm/slurm.conf # slurm.conf owned by root
sudo chmod 644 /etc/slurm/slurm.conf # Read access for all
echo "--- Verification on pi-c02 ---"
ls -l /etc/slurm/slurm.conf
echo "--- Done on pi-c02 ---"
EOF
# You will likely be prompted for admin user's password for pi-c02 here by sudo
- Configure Cgroup Plugin (`cgroup.conf` - Corrected for SLURM 22.05.8): Needed for resource constraints (`TaskPlugin=task/cgroup`, `SelectType=select/cons_tres`).
  - Create `/etc/slurm/cgroup.conf` on `pi-head` first: `sudo nano /etc/slurm/cgroup.conf`
  - Add the following content. Note that `CgroupReleaseAgentDir` and `TaskAffinity` must be commented out or removed, as they are unrecognized or obsolete in SLURM 22.05.8 and cause fatal parsing errors if present.
# /etc/slurm/cgroup.conf
CgroupAutomount=yes
# CgroupReleaseAgentDir="/etc/slurm/cgroup" # Obsolete/Removed in this SLURM version
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes # Needed for memory enforcement
# TaskAffinity=no # Unrecognized key in this SLURM version's cgroup.conf
* Save the file (Ctrl+O, Enter) and exit (Ctrl+X).
* Copy the **corrected** `cgroup.conf` from `pi-head` to compute nodes using the two-step method (replace `davids` with your admin username):
* **Run these command blocks *from* `pi-head`, logged in as your admin user:**
* **For `pi-c01`:**
# ON PI-HEAD, as admin user (e.g., davids)
echo "Copying cgroup.conf to pi-c01:/tmp/..."
sudo scp /etc/slurm/cgroup.conf davids@pi-c01:/tmp/cgroup.conf
# Enter admin user's password for pi-c01 if prompted by scp
echo "Connecting to pi-c01 to move cgroup.conf and set permissions..."
ssh -t davids@pi-c01 << EOF
sudo mv /tmp/cgroup.conf /etc/slurm/cgroup.conf
sudo chown root:root /etc/slurm/cgroup.conf # cgroup.conf owned by root
sudo chmod 644 /etc/slurm/cgroup.conf # Read access for all
echo "--- Verification on pi-c01 ---"
ls -l /etc/slurm/cgroup.conf
echo "--- Done on pi-c01 ---"
EOF
# You will likely be prompted for admin user's password for pi-c01 here by sudo
* **For `pi-c02`:**
# ON PI-HEAD, as admin user (e.g., davids)
echo "Copying cgroup.conf to pi-c02:/tmp/..."
sudo scp /etc/slurm/cgroup.conf davids@pi-c02:/tmp/cgroup.conf
# Enter admin user's password for pi-c02 if prompted by scp
echo "Connecting to pi-c02 to move cgroup.conf and set permissions..."
ssh -t davids@pi-c02 << EOF
sudo mv /tmp/cgroup.conf /etc/slurm/cgroup.conf
sudo chown root:root /etc/slurm/cgroup.conf # cgroup.conf owned by root
sudo chmod 644 /etc/slurm/cgroup.conf # Read access for all
echo "--- Verification on pi-c02 ---"
ls -l /etc/slurm/cgroup.conf
echo "--- Done on pi-c02 ---"
EOF
# You will likely be prompted for admin user's password for pi-c02 here by sudo
- Enable Memory Cgroup Controller (Kernel Parameter - CRITICAL FIX): To allow SLURM's `task/cgroup` plugin to enforce memory limits (`ConstrainRAMSpace=yes`), the kernel's memory cgroup controller must be enabled. This requires a kernel command-line change and a reboot.
  - On EACH node (`pi-head`, `pi-c01`, `pi-c02`):
    - Log in as your administrative user (e.g., `davids`).
    - Edit the boot command line file:
sudo nano /boot/firmware/cmdline.txt
* Go to the **very end** of the single line of text.
* Add a space, then append these two parameters exactly: `cgroup_enable=memory cgroup_memory=1`
* **Ensure these are added to the existing line with a space before them, and DO NOT create a new line.**
* Save the file (Ctrl+O, Enter) and exit (Ctrl+X).
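Because `cmdline.txt` must remain a single line, editing it by hand is error-prone. A sketch of an idempotent alternative (the `append_cgroup_params` helper is our own; run it with `sudo` against `/boot/firmware/cmdline.txt`):

```shell
#!/usr/bin/env bash
# append_cgroup_params FILE -- append 'cgroup_enable=memory cgroup_memory=1'
# to the single kernel command line in FILE, unless already present. sed edits
# the existing line in place rather than echoing a new one, because
# cmdline.txt must stay one line.
append_cgroup_params() {
  local file=$1
  if ! grep -q 'cgroup_enable=memory' "$file"; then
    sed -i 's/$/ cgroup_enable=memory cgroup_memory=1/' "$file"
  fi
}
```

Running it twice is safe: the `grep` guard prevents duplicate parameters.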
* **Reboot ALL nodes:** This is required for the kernel parameter change to take effect.
# Run on pi-head (as admin user) to reboot compute nodes first:
ssh davids@pi-c01 'sudo reboot'
ssh davids@pi-c02 'sudo reboot'
# Then reboot the head node:
sudo reboot
* Wait several minutes for all nodes to fully reboot and reconnect to the network.
- Start SLURM Services (After Reboot):
  - On `pi-head` (Controller):
# SSH back into PI-HEAD as admin user after reboot
sudo systemctl enable slurmctld.service
sudo systemctl start slurmctld.service
# Check status immediately
sudo systemctl status slurmctld.service --no-pager
# Check logs if status isn't active (running)
# journalctl -u slurmctld.service -n 30 --no-pager
# sudo tail -n 30 /var/log/slurm/slurmctld.log
* **On ALL nodes (Compute Daemons - including `pi-head`):**
# Run ON pi-head (as admin user)
sudo systemctl enable slurmd.service
sudo systemctl start slurmd.service
sudo systemctl status slurmd.service --no-pager
# Run ON pi-c01 (remotely from pi-head, as admin user)
ssh davids@pi-c01 'sudo systemctl enable slurmd.service && sudo systemctl start slurmd.service && sudo systemctl status slurmd.service --no-pager'
# Run ON pi-c02 (remotely from pi-head, as admin user)
ssh davids@pi-c02 'sudo systemctl enable slurmd.service && sudo systemctl start slurmd.service && sudo systemctl status slurmd.service --no-pager'
# Check logs on any node if slurmd fails to start (e.g., ssh davids@pi-c01 'sudo tail -n 30 /var/log/slurm/slurmd.log')
- Verify SLURM Cluster Status:
  - Wait ~15-30 seconds after confirming all services are running for nodes to register. Run on `pi-head` (as admin user or `cuser`):
sinfo
# Expected output (all nodes should eventually be 'idle'):
# PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
# rpi_part* up infinite 3 idle pi-c[01-02],pi-head
scontrol show node
# Check details for each node. Look for 'State=IDLE'.
* **Troubleshooting Initial `down` State:** Sometimes, after a reboot or restart, nodes might initially appear as `down` in `sinfo` because `slurmctld` didn't hear back from `slurmd` quickly enough. If `slurmd` is confirmed running on the node (via `systemctl status`), you can often bring it back online using:
# Run on pi-head as admin user
sudo scontrol update nodename=<name_of_down_node> state=resume
# Example: sudo scontrol update nodename=pi-c01 state=resume
# Then check 'sinfo' again after a few seconds.
* **Log Check Reminder:** Check `/var/log/slurm/slurmctld.log` on `pi-head` and `/var/log/slurm/slurmd.log` on the specific node if nodes remain `down` or `unk` (unknown). The non-fatal `mpi/pmix_v4: ... can not load PMIx library` errors in the logs are expected with `MpiDefault=none` and can be ignored for basic operation.
Phase 5: Testing the SLURM Cluster
(This phase assumes you have completed Phase 4, all SLURM services are running, and `sinfo` shows all nodes as `idle`. Run these commands as `cuser` on `pi-head`.)
- Log in as `cuser`:
# On pi-head
su - cuser
# Or: ssh cuser@pi-head
# Optional: change to shared filesystem
cd /clusterfs
# Ensure user cuser has write permissions here or in a subdirectory like /clusterfs/cuser
# mkdir -p /clusterfs/cuser && cd /clusterfs/cuser
- Run a Simple Command Interactively:
# Runs 'hostname' on one available node in the default partition.
srun hostname
# Should print the hostname of one of the nodes (e.g., pi-head, pi-c01, or pi-c02)
- Run Command on Specific Number of Nodes:
# Run hostname on 2 different nodes, 1 task per node
srun --nodes=2 --ntasks-per-node=1 hostname | sort
# Should show two different hostnames (e.g., pi-c01, pi-c02 or pi-head, pi-c01)
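A quick way to confirm the two-node run really hit two distinct hosts is to count the unique lines of its output. A sketch using canned output (replace the `printf` with the actual `srun --nodes=2 --ntasks-per-node=1 hostname` call):

```shell
# Count distinct hostnames; the canned printf stands in for the srun output.
unique=$(printf 'pi-c01\npi-c02\n' | sort -u | wc -l)
echo "distinct hosts: $unique"   # expect 2 for a two-node run
```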
- Submit a Simple Batch Job:
  - Create a job script file, e.g., `/clusterfs/cuser/hello.sh` (ensure `/clusterfs/cuser` exists and is writable by `cuser`):
# Create the directory if it doesn't exist
mkdir -p /clusterfs/cuser
# Create the script file using nano or vim
nano /clusterfs/cuser/hello.sh
* Paste the following content into the editor:
#!/bin/bash
#SBATCH --job-name=hello_rpi # Job name
#SBATCH --output=hello_job_%j.out # Standard output file (%j = job ID)
#SBATCH --error=hello_job_%j.err # Standard error file
#SBATCH --nodes=3 # Request all 3 nodes
#SBATCH --ntasks-per-node=2 # Request 2 tasks (processes) per node (total 6)
#SBATCH --cpus-per-task=1 # Request 1 CPU core per task (slurm handles affinity)
#SBATCH --mem-per-cpu=100M # Optional: Request memory per allocated CPU
#SBATCH --partition=rpi_part # Specify partition (optional if default)
#SBATCH --time=00:05:00 # Time limit (5 minutes)
echo "Job ID: $SLURM_JOB_ID running on nodes:"
# srun hostname | sort # Use srun within sbatch to launch parallel tasks across allocated resources
# More reliable way to get unique nodes allocated to the job:
echo $SLURM_JOB_NODELIST
echo "Tasks started at: $(date)"
# Simple workload: Print hostname and sleep
srun bash -c 'echo "Hello from $(hostname) (Task $SLURM_PROCID of $SLURM_NTASKS)"; sleep 10'
echo "Tasks finished at: $(date)"
* Save the file (Ctrl+O, Enter) and exit (Ctrl+X in nano).
* Make the script executable: `chmod +x /clusterfs/cuser/hello.sh`
* Submit the job from the directory containing the script (or specify the full path):
# Ensure you are in /clusterfs/cuser or use the full path
sbatch hello.sh
# Should print: Submitted batch job <JOB_ID>
* Check the queue:
squeue
# Shows running (R) or pending (PD) jobs. Should be running quickly.
watch squeue # Monitor queue updates automatically
* Check node status while job runs:
sinfo
# Should show nodes in 'alloc' or 'mix' state.
* Once the job finishes (disappears from `squeue`), check the output files (`hello_job_<JOB_ID>.out` and `.err`) in the submission directory (`/clusterfs/cuser`):
cat hello_job_<JOB_ID>.out
# Should show the job ID, the nodelist, start/end times, and "Hello from ..." lines from each of the 6 tasks run across the 3 nodes.
cat hello_job_<JOB_ID>.err
# Should ideally be empty.
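As a final sanity check, the `.out` file can be verified programmatically: with `--nodes=3` and `--ntasks-per-node=2` there should be exactly 6 "Hello from" lines. A sketch that fabricates a file shaped like the real output for illustration (the job ID 42 is made up; point `out` at your actual `hello_job_<JOB_ID>.out` instead of generating one):

```shell
# Simulated output standing in for hello_job_<JOB_ID>.out (job ID 42 is
# hypothetical); with 3 nodes x 2 tasks we expect 6 "Hello from" lines.
out=hello_job_42.out
printf 'Hello from pi-head (Task %s of 6)\n' 0 1  > "$out"
printf 'Hello from pi-c01 (Task %s of 6)\n'  2 3 >> "$out"
printf 'Hello from pi-c02 (Task %s of 6)\n'  4 5 >> "$out"

got=$(grep -c '^Hello from' "$out")
echo "found $got task lines (expected 6)"
rm -f "$out"
```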
Congratulations!
You should now have a fully functional 3-node Raspberry Pi 5 SLURM cluster, with working resource constraints (CPU and Memory via Cgroups) and networking correctly configured.
Next Steps & Considerations
- **Install MPI:** Install OpenMPI or MPICH (`sudo apt install -y openmpi-bin libopenmpi-dev` on all nodes) to run parallel MPI applications. Explore SLURM's MPI integration (e.g., `--mpi=` options for `srun`, potentially installing `libpmix-dev` if using a PMIx-aware MPI).
- **Shared Software Stack:** Install compilers, libraries, and applications needed for your HPC tasks onto the shared NFS filesystem (`/clusterfs`) so they are accessible from all nodes without installing them everywhere. Module systems like Lmod can help manage this.
- **Monitoring:** Set up monitoring tools like `htop` or `glances`, or more comprehensive systems like Prometheus + Grafana or Ganglia, to observe cluster load and resource usage.
- **SLURM Tuning:** Explore more advanced `slurm.conf` options: resource limits (memory, cores per job/user), Quality of Service (QoS), fair-share scheduling, job arrays.
- **SLURM Accounting:** For tracking resource usage over time, set up the SLURM accounting daemon (`slurmdbd`), which requires installing and configuring a database (such as MariaDB/MySQL).
- **Security:** Review firewall rules (`sudo nft list ruleset`), harden SSH (`/etc/ssh/sshd_config`), and consider user permissions carefully.
- **Backup:** Back up your `slurm.conf`, `cgroup.conf`, `munge.key`, and important data on `/clusterfs`.
Enjoy your mini HPC cluster!