Tutorial: Setting Up a 3-Node Raspberry Pi 5 SLURM Cluster (Rev3)

Created: 2025-04-15 03:33:03 | Last updated: 2025-04-15 03:33:03 | Status: Public

This tutorial guides you through setting up a small High-Performance Computing (HPC) cluster using three Raspberry Pi 5 devices, SLURM Workload Manager, and a specific network configuration involving both Wi-Fi and a private Ethernet network.

Cluster Configuration:

  • Nodes: 3 x Raspberry Pi 5 (8GB RAM recommended)
  • OS: Raspberry Pi OS Bookworm (64-bit recommended)
  • Boot: From SSDs
  • Cluster User: cuser
  • Networking:
    • pi-head:
      • WLAN (wlan0): Connects to your main router via Wi-Fi, gets 192.168.1.20 via DHCP reservation (Gateway: 192.168.1.1). Provides internet access.
      • Ethernet (eth0): Connects to private switch, static IP 10.0.0.1/24.
    • pi-c01:
      • Ethernet (eth0): Connects to private switch, static IP 10.0.0.2/24. Gateway via pi-head (10.0.0.1).
    • pi-c02:
      • Ethernet (eth0): Connects to private switch, static IP 10.0.0.3/24. Gateway via pi-head (10.0.0.1).
  • SLURM: Basic setup (slurmctld, slurmd, munge).

Prerequisites

  1. Hardware:
    • 3 x Raspberry Pi 5 (8GB RAM)
    • 3 x NVMe SSDs (or SATA SSDs with appropriate adapters) compatible with RPi 5 boot.
    • 3 x Reliable Power Supplies for RPi 5 (5V/5A recommended).
    • 1 x Gigabit Ethernet Switch (unmanaged is fine).
    • 3 x Ethernet Cables.
    • Access to your existing Wi-Fi network and router admin interface (for DHCP reservation).
  2. Software:
    • Raspberry Pi Imager tool.
    • Raspberry Pi OS Bookworm (64-bit recommended) flashed onto each SSD.
  3. Initial Setup:
    • Ensure each Pi boots correctly from its SSD.
    • Complete the initial Raspberry Pi OS setup wizard: create the initial administrative user (this is NOT cuser yet) and set locale, keyboard, etc.
    • Enable SSH on each Pi: sudo raspi-config -> Interface Options -> SSH -> Enable.
    • Connect pi-head to your Wi-Fi network.
    • Configure the DHCP reservation on your OpenWrt router to assign 192.168.1.20 to pi-head’s WLAN MAC address. Verify pi-head gets this IP (ip a show wlan0).
    • Physically connect all three Pis to the Gigabit switch using Ethernet cables.

Phase 1: Basic OS Configuration & Hostnames

(Perform these steps on each Pi, adjusting hostnames accordingly. You’ll need SSH access.)

  1. Login: SSH into each Pi using the initial user you created during setup.
  2. Set Hostnames:
    • On the first Pi (intended as head node):
        sudo hostnamectl set-hostname pi-head
    • On the second Pi (compute node 1):
        sudo hostnamectl set-hostname pi-c01
    • On the third Pi (compute node 2):
        sudo hostnamectl set-hostname pi-c02
    • Reboot each Pi (`sudo reboot`) or log out and log back in for the change to take effect in your shell prompt and network identity.
  3. Update System (pi-head only for now):
    • Ensure pi-head has internet via Wi-Fi.
    • SSH into pi-head:
        sudo apt update
        sudo apt full-upgrade -y
        sudo apt install -y vim git build-essential # Essential tools
    • Note: We will update `pi-c01` and `pi-c02` after setting up network routing.

Phase 2: Network Configuration (Revised)

Goal: Configure network interfaces on all three Raspberry Pis. pi-head will connect to your home network/internet via Wi-Fi (wlan0) and to the private cluster network via Ethernet (eth0). pi-c01 and pi-c02 will connect only to the private cluster network via Ethernet (eth0) and use pi-head as their gateway to reach the internet. We will use nmcli for interface configuration and nftables for firewall/NAT on pi-head.

Recap of Target Configuration:

  • pi-head:
    • wlan0: 192.168.1.20 (via DHCP reservation), Gateway 192.168.1.1 (Internet Access)
    • eth0: 10.0.0.1/24 (Static, Private Network)
  • pi-c01:
    • eth0: 10.0.0.2/24 (Static, Private Network), Gateway 10.0.0.1
    • wlan0: Disabled
  • pi-c02:
    • eth0: 10.0.0.3/24 (Static, Private Network), Gateway 10.0.0.1
    • wlan0: Disabled

Steps:

  1. Verify pi-head WLAN Connection:
    • SSH into pi-head.
    • Confirm it received the correct IP address from your router’s DHCP reservation and has a default route via your main gateway:
        ip addr show wlan0
        # Look for 'inet 192.168.1.20/XX ...' (XX is your subnet mask, often 24)

        ip route show default
        # Should show 'default via 192.168.1.1 dev wlan0 ...'
    • If the IP or route is incorrect, double-check your router's DHCP reservation settings and ensure `pi-head`'s Wi-Fi is connected to the correct network.
  2. Configure pi-head Ethernet (eth0 - Private Network):
    • Still on pi-head.
    • Identify the Ethernet interface name (usually eth0): ip a
    • Add a NetworkManager connection profile for eth0 with the static private IP. We explicitly set no gateway and mark it as never the default route:
        # Replace 'eth0' if your interface name is different
        sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.1/24

        # Critical: Prevent this interface from ever becoming the default route
        sudo nmcli connection modify 'static-eth0' ipv4.gateway '' # Ensure no gateway is set
        sudo nmcli connection modify 'static-eth0' ipv4.never-default yes # Prevent it from being the default route
        sudo nmcli connection modify 'static-eth0' connection.autoconnect yes # Connect automatically

        # Bring the connection up (may happen automatically)
        sudo nmcli connection up 'static-eth0'
    • Verify `pi-head` network state:
        ip addr show eth0
        # Should show 'inet 10.0.0.1/24 ...'

        ip route show default
        # Should STILL show 'default via 192.168.1.1 dev wlan0 ...'
  3. Configure pi-c01 Ethernet (eth0 - Private Network):
    • SSH into pi-c01. (Use a temporary keyboard/monitor, or temporarily connect eth0 to your main network, if needed for first access.)
    • Identify the Ethernet interface name (usually eth0): ip a
    • Add the static IP configuration, setting pi-head (10.0.0.1) as the gateway and providing DNS servers:
        # Replace 'eth0' if needed
        sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.2/24 gw4 10.0.0.1

        # Set DNS servers (e.g., Google DNS and Cloudflare DNS)
        # These requests will be routed via pi-head
        sudo nmcli connection modify 'static-eth0' ipv4.dns "8.8.8.8 1.1.1.1"
        sudo nmcli connection modify 'static-eth0' ipv4.ignore-auto-dns yes # Use only the specified DNS
        sudo nmcli connection modify 'static-eth0' connection.autoconnect yes

        # Bring the connection up
        sudo nmcli connection up 'static-eth0'
    • Verify `pi-c01` network state:
        ip addr show eth0
        # Should show 'inet 10.0.0.2/24 ...'

        ip route show default
        # Should show 'default via 10.0.0.1 dev eth0 ...'

        cat /etc/resolv.conf
        # Should show 'nameserver 8.8.8.8' and 'nameserver 1.1.1.1'
  4. Configure pi-c02 Ethernet (eth0 - Private Network):
    • SSH into pi-c02.
    • Identify the Ethernet interface name (usually eth0): ip a
    • Add the static IP configuration, similar to pi-c01:
        # Replace 'eth0' if needed
        sudo nmcli connection add type ethernet con-name 'static-eth0' ifname eth0 ip4 10.0.0.3/24 gw4 10.0.0.1

        # Set DNS servers
        sudo nmcli connection modify 'static-eth0' ipv4.dns "8.8.8.8 1.1.1.1"
        sudo nmcli connection modify 'static-eth0' ipv4.ignore-auto-dns yes
        sudo nmcli connection modify 'static-eth0' connection.autoconnect yes

        # Bring the connection up
        sudo nmcli connection up 'static-eth0'
    • Verify `pi-c02` network state:
        ip addr show eth0
        # Should show 'inet 10.0.0.3/24 ...'

        ip route show default
        # Should show 'default via 10.0.0.1 dev eth0 ...'

        cat /etc/resolv.conf
        # Should show nameservers 8.8.8.8 and 1.1.1.1
  5. Enable IP Forwarding and Configure nftables NAT/Firewall on pi-head:
    • SSH back into pi-head.
    • Enable kernel IP forwarding:
        echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/99-ip_forward.conf
        sudo sysctl -p /etc/sysctl.d/99-ip_forward.conf # Apply immediately
        sudo sysctl net.ipv4.ip_forward # Verify output is '= 1'
    • Install `nftables` (if not already present):
        sudo apt update
        sudo apt install -y nftables
    • Create the `nftables` configuration:
      • Backup the default config: `sudo cp /etc/nftables.conf /etc/nftables.conf.bak`
      • Edit the configuration file: `sudo vim /etc/nftables.conf`
      • Replace the entire content with this ruleset (adjust `wlan0`/`eth0` if needed):
            #!/usr/sbin/nft -f

            # Flush the entire previous ruleset
            flush ruleset

            # Table for IPv4 NAT
            table ip nat {
                chain postrouting {
                    type nat hook postrouting priority 100; policy accept;
                    # Masquerade traffic from private network (eth0) going OUT via wlan0
                    oifname "wlan0" ip saddr 10.0.0.0/24 masquerade comment "NAT cluster traffic to WAN"
                }
                # Optional: Add prerouting rules here if needed for port forwarding INTO the cluster
            }

            # Table for IPv4/IPv6 Filtering
            table inet filter {
                chain input {
                    type filter hook input priority 0; policy accept;
                    # Basic stateful firewall for input traffic to pi-head
                    ct state established,related accept
                    # Allow loopback traffic
                    iifname "lo" accept
                    # Allow SSH (port 22) - Recommended! Add source IP ranges if possible
                    tcp dport 22 accept
                    # Allow ICMP (ping)
                    icmp type echo-request accept
                    # Allow traffic from cluster nodes (optional, if needed for services hosted on pi-head)
                    iifname "eth0" ip saddr 10.0.0.0/24 accept comment "Allow traffic from cluster nodes"

                    # Uncomment below to drop other traffic instead of accept-all policy
                    # drop
                }

                chain forward {
                    type filter hook forward priority 0; policy drop; # Default: Drop forwarded traffic

                    # Allow established/related connections coming back IN from WAN (wlan0) to LAN (eth0)
                    iifname "wlan0" oifname "eth0" ct state related,established accept comment "Allow established WAN to LAN"

                    # Allow NEW and established connections going OUT from LAN (eth0) to WAN (wlan0)
                    iifname "eth0" oifname "wlan0" ip saddr 10.0.0.0/24 accept comment "Allow LAN to WAN"
                }

                chain output {
                    type filter hook output priority 0; policy accept;
                    # Basic stateful firewall for output traffic from pi-head
                    ct state established,related accept
                    # Allow loopback traffic
                    oifname "lo" accept

                    # Uncomment below to drop other traffic instead of accept-all policy
                    # drop
                }
            }
    • Apply and persist the `nftables` rules:
        sudo nft -f /etc/nftables.conf # Apply the ruleset, check for errors
        sudo systemctl enable nftables.service # Make rules persistent on boot
        sudo systemctl restart nftables.service # Restart service to load rules definitively
        sudo systemctl status nftables.service # Check service status
        sudo nft list ruleset # Review the active ruleset
  6. Troubleshoot SSH Slowness on pi-head (Potential Fix):
    • Slow SSH logins are often caused by the SSH server trying to perform a reverse DNS lookup on the connecting client’s IP address, which can time out if not configured correctly.
    • On pi-head:
        # Edit the SSH server configuration file
        sudo vim /etc/ssh/sshd_config
    • Find the line `#UseDNS yes` or `UseDNS yes`. Uncomment it if needed, and change `yes` to `no`:
        UseDNS no
    • Save the file and restart the SSH service (the unit is named `ssh` on Debian/Raspberry Pi OS):
        sudo systemctl restart ssh
    • Try SSHing into `pi-head` again from your workstation. If the login is now significantly faster, this was likely the cause.
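    • If logins are still slow, a verbose, timed connection attempt can show where the handshake stalls (a diagnostic sketch; replace the username with your own admin user):
        # Watch for long pauses between debug lines; delays around GSSAPI or
        # reverse-DNS messages point at name-resolution problems.
        time ssh -v davids@pi-head exit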
  7. Test Basic Network Connectivity:
    • From pi-head:
        ping -c 2 10.0.0.2 # Ping pi-c01
        ping -c 2 10.0.0.3 # Ping pi-c02
    • From `pi-c01`:
        ping -c 2 10.0.0.1 # Ping pi-head
        ping -c 2 10.0.0.3 # Ping pi-c02
    • From `pi-c02`:
        ping -c 2 10.0.0.1 # Ping pi-head
        ping -c 2 10.0.0.2 # Ping pi-c01
    • All of these pings over the `10.0.0.x` network should succeed.
  8. Verify Compute Node Internet Access (Initial Test):
    • From pi-c01:
        ping -c 3 8.8.8.8      # Test internet IP reachability
        ping -c 3 google.com   # Test DNS resolution + internet reachability
    • From `pi-c02`:
        ping -c 3 1.1.1.1      # Test internet IP reachability (different target)
        ping -c 3 cloudflare.com # Test DNS resolution + internet reachability
    • These tests should now succeed, routing through `pi-head`. If not, re-check the `nftables` rules (`sudo nft list ruleset`), IP forwarding (`sudo sysctl net.ipv4.ip_forward`), and routing tables (`ip route`) on all nodes.
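    • To confirm the traffic really flows through pi-head, you can trace the route from a compute node (a sketch; `tracepath` is in the iputils-tracepath package if it is not already installed):
        # On pi-c01 or pi-c02: the first hop should be pi-head (10.0.0.1)
        tracepath -n 8.8.8.8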
  9. Disable Wi-Fi on Compute Nodes (pi-c01, pi-c02):
    • This confirms they rely solely on eth0 for all traffic.
    • On pi-c01:
        sudo nmcli radio wifi off
        nmcli radio wifi # Verify output shows 'disabled'
        ip a show wlan0 # Verify interface is DOWN or has no IP
    • On `pi-c02`:
        sudo nmcli radio wifi off
        nmcli radio wifi # Verify output shows 'disabled'
        ip a show wlan0 # Verify interface is DOWN or has no IP
  10. Final Connectivity Test (Compute Nodes via eth0 only):
    • Repeat the internet connectivity tests from Step 8 on both pi-c01 and pi-c02:
        # On pi-c01
        ping -c 3 8.8.8.8
        ping -c 3 google.com

        # On pi-c02
        ping -c 3 1.1.1.1
        ping -c 3 cloudflare.com
    • If these tests still succeed with Wi-Fi disabled, your network routing is configured correctly.
  11. Update Compute Nodes:
    • Now that pi-c01 and pi-c02 have verified internet access via pi-head, ensure they are fully updated:
        # On pi-c01 AND pi-c02
        sudo apt update
        sudo apt full-upgrade -y
        # Install common tools if you haven't already
        sudo apt install -y vim git build-essential

Phase 2 Completion: At this point, your network should be fully configured according to the requirements. pi-head acts as the gateway, and pi-c01/pi-c02 rely solely on their Ethernet connection to the private network for all communication, including internet access routed through pi-head. You can now proceed to Phase 3: Common Cluster Environment Setup.


Phase 3: Common Cluster Environment Setup

(Perform steps on all nodes unless specified)

  1. Configure Hostname Resolution (/etc/hosts):
    • Edit the hosts file on all three nodes: sudo vim /etc/hosts
    • Ensure the following lines exist (add them if missing, below the 127.0.0.1 localhost line):
        127.0.1.1       <current_hostname> # This line for the local hostname is usually already present

        # Cluster Nodes
        10.0.0.1    pi-head
        10.0.0.2    pi-c01
        10.0.0.3    pi-c02
    • Test: from any node, ping the others by hostname (e.g., `ping -c 1 pi-c01` from `pi-head`).
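    • To check all three mappings in one go, a short loop works on any node:
        # Each hostname should resolve to its 10.0.0.x address from /etc/hosts
        for h in pi-head pi-c01 pi-c02; do getent hosts "$h"; done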
  2. Create Common Cluster User (cuser):
    • Crucially, cuser must have the same User ID (UID) and Group ID (GID) on all nodes.
    • On pi-head first:
        sudo adduser cuser
        # Follow prompts to set password etc.
        # Note the UID and GID displayed (e.g., uid=1001(cuser) gid=1001(cuser) groups=...)
        # Optional: Add cuser to the sudo group if needed for administration tasks
        # sudo usermod -aG sudo cuser
    • On `pi-c01` and `pi-c02`:
      • Get the UID and GID from `pi-head` by running `id cuser` there. Let's assume it was `1001` for both UID and GID; replace `1001` below if yours is different.
        # Create the group first with the specific GID
        sudo groupadd -g 1001 cuser
        # Create the user with the specific UID and GID
        sudo useradd -u 1001 -g 1001 -m -s /bin/bash cuser
        # Set the password for the new user
        sudo passwd cuser
        # Optional: Add to sudo group (use the same groups as on pi-head if needed)
        # sudo usermod -aG sudo cuser
    • Verify: run `id cuser` on all three nodes. Ensure the UID and GID match exactly.
  3. Setup Passwordless SSH for cuser:
    • Log in as cuser on pi-head. You can use su - cuser if logged in as another user, or SSH directly: ssh cuser@pi-head.
    • Generate SSH key pair (run as cuser):
        # Accept default file location (~/.ssh/id_rsa), press Enter for empty passphrase
        ssh-keygen -t rsa -b 4096
    • Copy the public key to all nodes (including `pi-head` itself):
        # Run as cuser from pi-head
        ssh-copy-id cuser@pi-head
        ssh-copy-id cuser@pi-c01
        ssh-copy-id cuser@pi-c02
        # Enter the password for 'cuser' when prompted for each node
    • Test: still as `cuser` on `pi-head`, try SSHing to each node without a password:
        ssh pi-head date
        ssh pi-c01 date
        ssh pi-c02 date
        # The first time connecting to each might ask "Are you sure you want to continue connecting (yes/no/[fingerprint])?". Type 'yes'.
        # If it prompts for a password after the first connection, the key setup failed. Check permissions in ~/.ssh directories.
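    • If a node still prompts for a password, the usual culprit is over-permissive file modes; as `cuser` on the affected node, a conservative reset looks like this:
        chmod 700 ~/.ssh
        chmod 600 ~/.ssh/authorized_keys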
  4. Install and Configure NFS (Shared Filesystem):
    • We'll share /clusterfs from pi-head with all nodes. Do this as your administrative user, not cuser.
    • On pi-head (NFS Server):
        sudo apt update
        sudo apt install -y nfs-kernel-server
        sudo mkdir -p /clusterfs
        # Option 1: Allow anyone to write (simple for cluster user)
        sudo chown nobody:nogroup /clusterfs
        sudo chmod 777 /clusterfs
        # Option 2: Restrict to cuser (better security, requires consistent UID/GID)
        # sudo chown cuser:cuser /clusterfs
        # sudo chmod 770 /clusterfs # Or 750 if group members only need read
        # Edit the NFS exports file
        sudo nano /etc/exports
        # Add this line to allow access from the private 10.0.0.x network:
        # Use 'no_root_squash' carefully if you need root access over NFS
        /clusterfs    10.0.0.0/24(rw,sync,no_subtree_check)
        # Activate the exports
        sudo exportfs -ra
        # Restart and enable the NFS server service
        sudo systemctl restart nfs-kernel-server
        sudo systemctl enable nfs-kernel-server
    • On `pi-c01` and `pi-c02` (NFS Clients):
        sudo apt update
        sudo apt install -y nfs-common
        sudo mkdir -p /clusterfs
        # Add the mount to /etc/fstab for automatic mounting on boot
        sudo nano /etc/fstab
        # Add this line at the end:
        pi-head:/clusterfs    /clusterfs   nfs    defaults,auto,nofail    0    0
        # Mount all filesystems defined in fstab (including the new one)
        sudo mount -a
        # Verify the mount was successful
        df -h | grep /clusterfs
        # Check mount options (optional)
        mount | grep /clusterfs
    • Test the share:
      • From `pi-head` as `cuser`: `touch /clusterfs/test_head.txt`
      • From `pi-c01` as `cuser`: `ls /clusterfs` (should see `test_head.txt`)
      • From `pi-c02` as `cuser`: `touch /clusterfs/test_c02.txt`
      • From `pi-head` as `cuser`: `ls /clusterfs` (should see both files)
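    • For a rough sense of NFS write throughput, a quick test from any client node (a sketch; writes and then removes a 100 MB file):
        dd if=/dev/zero of=/clusterfs/nfs_speedtest bs=1M count=100 conv=fsync
        rm /clusterfs/nfs_speedtest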
  5. Install and Configure NTP (Time Synchronization): Accurate time is essential for SLURM.
    • Install chrony on all nodes:
        sudo apt update
        sudo apt install -y chrony
    • Ensure it is enabled and running on all nodes:
        sudo systemctl enable --now chrony
    • `chrony` will automatically use internet time sources. Since all nodes now have internet access (directly or via `pi-head`), this should work.
    • Verify sync status (it might take a minute or two after starting):
        # Run on all nodes
        chronyc sources
        # Look for lines starting with '^*' (synced server) or '^+' (acceptable server).
        timedatectl status | grep "NTP service"
        # Should show 'active'.
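    • Optional: to keep the compute nodes' time traffic on the private network, pi-head can serve time to the cluster. A sketch of the relevant chrony directives (paths assume Debian's default /etc/chrony/chrony.conf; restart chrony after editing):
        # On pi-head: allow clients from the private network
        allow 10.0.0.0/24

        # On pi-c01 and pi-c02: use pi-head as the (only) time source
        server 10.0.0.1 iburst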


Phase 4: Install and Configure SLURM & Munge (Revised for RPi & SLURM 22.05.8)

(Perform steps on nodes as indicated. Ensure you are logged in as your administrative user, e.g. davids or piadmin, who has sudo privileges.)

  1. Install Munge (Authentication Service): Munge provides secure authentication between SLURM daemons. Install on all nodes.
    # Run on pi-head, pi-c01, and pi-c02
    sudo apt update
    sudo apt install -y munge libmunge-dev libmunge2
  2. Create Munge Key: A shared secret key must be generated on one node.
    • On pi-head ONLY:
        # STILL ON PI-HEAD, logged in as piadmin/davids
        sudo systemctl stop munge # Stop service if running
        # Create the key
        sudo dd if=/dev/urandom of=/etc/munge/munge.key bs=1 count=1024
        # Set correct ownership and permissions
        sudo chown munge:munge /etc/munge/munge.key
        sudo chmod 400 /etc/munge/munge.key
        echo "Munge key created on pi-head."
  3. Securely Distribute Munge Key: Copy the key from pi-head to each compute node via a temporary location, then move it into place with sudo. Use your actual admin username (davids or piadmin).
    • Run these command blocks from pi-head, logged in as your admin user:

      • For pi-c01:
            # ON PI-HEAD, as admin user (e.g., davids)
            echo "Copying munge.key to pi-c01:/tmp/..."
            sudo scp /etc/munge/munge.key davids@pi-c01:/tmp/munge.key.tmp
            # Enter admin user's password for pi-c01 if prompted by scp

            echo "Connecting to pi-c01 to move munge.key and set permissions..."
            ssh -t davids@pi-c01 << EOF
            sudo systemctl stop munge # Ensure service is stopped before replacing key
            sudo mv /tmp/munge.key.tmp /etc/munge/munge.key
            sudo chown munge:munge /etc/munge/munge.key
            sudo chmod 400 /etc/munge/munge.key
            echo "--- Verification on pi-c01 ---"
            sudo ls -l /etc/munge/munge.key # Needs sudo to view details
            echo "--- Done on pi-c01 ---"
            EOF
            # With Raspberry Pi OS's default passwordless sudo there is no prompt here; if your admin user requires a sudo password, run these commands interactively on pi-c01 instead
      • For pi-c02:
            # ON PI-HEAD, as admin user (e.g., davids)
            echo "Copying munge.key to pi-c02:/tmp/..."
            sudo scp /etc/munge/munge.key davids@pi-c02:/tmp/munge.key.tmp
            # Enter admin user's password for pi-c02 if prompted by scp

            echo "Connecting to pi-c02 to move munge.key and set permissions..."
            ssh -t davids@pi-c02 << EOF
            sudo systemctl stop munge # Ensure service is stopped before replacing key
            sudo mv /tmp/munge.key.tmp /etc/munge/munge.key
            sudo chown munge:munge /etc/munge/munge.key
            sudo chmod 400 /etc/munge/munge.key
            echo "--- Verification on pi-c02 ---"
            sudo ls -l /etc/munge/munge.key # Needs sudo to view details
            echo "--- Done on pi-c02 ---"
            EOF
            # With Raspberry Pi OS's default passwordless sudo there is no prompt here; if your admin user requires a sudo password, run these commands interactively on pi-c02 instead
  4. Start and Enable Munge Service: On all nodes:
    # Run on pi-head, pi-c01, and pi-c02 (as admin user)
    sudo systemctl start munge
    sudo systemctl enable munge
    # Verify status
    sudo systemctl status munge
    # Check the status is active (running) on all three nodes.
  5. Test Munge Communication:
    • From pi-head (as admin user or cuser):
        # Test local encoding/decoding
        munge -n | unmunge
        # Test head -> c01 (use correct hostname and ensure passwordless SSH for cuser or run as admin)
        munge -n | ssh pi-c01 unmunge
        # Test head -> c02
        munge -n | ssh pi-c02 unmunge
        # Test c01 -> head (round trip)
        ssh pi-c01 munge -n | unmunge
    • All tests should return a `STATUS: Success (...)` line. If not, double-check `munge.key` consistency, permissions, and service status, and check `/var/log/munge/munged.log`.
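    • Munge credentials are time-sensitive, so decode failures (statuses such as "Expired credential" or "Rewound credential") often mean clock skew; a quick comparison from pi-head:
        # Timestamps should agree to within a few seconds across nodes
        for h in pi-head pi-c01 pi-c02; do ssh "$h" date; done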
  6. Install SLURM: Install the SLURM workload manager packages on all nodes.
    # Run on pi-head, pi-c01, and pi-c02 (as admin user)
    sudo apt update
    sudo apt install -y slurm-wlm slurm-wlm-doc # slurm-wlm pulls in slurmd, slurmctld etc.
  7. Configure SLURM (slurm.conf):
    • Create the configuration file on pi-head first.
    • Edit the main config file: sudo nano /etc/slurm/slurm.conf
    • Replace the entire content with the following.
      • Adjust RealMemory: Use free -m to see total memory in MiB. Leave some (~300-500MB) for the OS. For an 8GB Pi (approx 7850MB usable), 7500 is a reasonable starting point.
      • CPUs: RPi 5 has 4 cores.
        # /etc/slurm/slurm.conf
        # Basic SLURM configuration for pi-cluster
        ClusterName=pi-cluster
        SlurmctldHost=pi-head #(Or use IP 10.0.0.1)
        # SlurmctldHost=pi-head(10.0.0.1) # Optional: Specify both
        MpiDefault=none           # IMPORTANT: Keep as none unless MPI is fully configured.
        ProctrackType=proctrack/cgroup
        ReturnToService=1
        SlurmctldPidFile=/run/slurmctld.pid
        SlurmdPidFile=/run/slurmd.pid
        SlurmctldPort=6817
        SlurmdPort=6818
        AuthType=auth/munge
        StateSaveLocation=/var/spool/slurmctld
        SlurmdSpoolDir=/var/spool/slurmd
        SwitchType=switch/none
        TaskPlugin=task/cgroup      # Enable task cgroup plugin
        # LOGGING
        SlurmctldLogFile=/var/log/slurm/slurmctld.log
        SlurmdLogFile=/var/log/slurm/slurmd.log
        JobCompType=jobcomp/none # No job completion logging for basic setup
        # TIMERS
        SlurmctldTimeout=120
        SlurmdTimeout=300
        InactiveLimit=0
        MinJobAge=300
        KillWait=30
        Waittime=0
        # SCHEDULING
        SchedulerType=sched/backfill
        SelectType=select/cons_tres # Use cons_tres for resource tracking (CPU, Mem)
        SelectTypeParameters=CR_Core_Memory # Track Cores and Memory explicitly
        # NODES - Adjust RealMemory based on your Pi 5 8GB (~7500 is conservative)
        NodeName=pi-head NodeAddr=10.0.0.1 CPUs=4 RealMemory=7500 State=UNKNOWN
        NodeName=pi-c01 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7500 State=UNKNOWN
        NodeName=pi-c02 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7500 State=UNKNOWN
        # PARTITION
        PartitionName=rpi_part Nodes=pi-head,pi-c01,pi-c02 Default=YES MaxTime=INFINITE State=UP Oversubscribe=NO
    • Save the file (Ctrl+O, Enter) and exit (Ctrl+X).
    • Create the SLURM log and spool directories on all nodes:
        # Run on pi-head, pi-c01, and pi-c02 (as admin user)
        sudo mkdir -p /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
        # Verify slurm user/group exists (created by package install)
        id slurm
        # Set ownership to the 'slurm' user/group
        sudo chown slurm:slurm /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
        sudo chmod 755 /var/log/slurm /var/spool/slurmctld /var/spool/slurmd
    • Copy the `slurm.conf` file from `pi-head` to the compute nodes using the two-step method (replace `davids` with your admin username).
    • Run these command blocks from `pi-head`, logged in as your admin user:

      • For pi-c01:
                # ON PI-HEAD, as admin user (e.g., davids)
                echo "Copying slurm.conf to pi-c01:/tmp/..."
                sudo scp /etc/slurm/slurm.conf davids@pi-c01:/tmp/slurm.conf
                # Enter admin user's password for pi-c01 if prompted by scp

                echo "Connecting to pi-c01 to move slurm.conf and set permissions..."
                ssh -t davids@pi-c01 << EOF
                sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
                sudo chown root:root /etc/slurm/slurm.conf # slurm.conf owned by root
                sudo chmod 644 /etc/slurm/slurm.conf     # Read access for all
                echo "--- Verification on pi-c01 ---"
                ls -l /etc/slurm/slurm.conf
                echo "--- Done on pi-c01 ---"
                EOF
                # With Raspberry Pi OS's default passwordless sudo there is no prompt here; if your admin user requires a sudo password, run these commands interactively on pi-c01 instead
      • For pi-c02:
                # ON PI-HEAD, as admin user (e.g., davids)
                echo "Copying slurm.conf to pi-c02:/tmp/..."
                sudo scp /etc/slurm/slurm.conf davids@pi-c02:/tmp/slurm.conf
                # Enter admin user's password for pi-c02 if prompted by scp

                echo "Connecting to pi-c02 to move slurm.conf and set permissions..."
                ssh -t davids@pi-c02 << EOF
                sudo mv /tmp/slurm.conf /etc/slurm/slurm.conf
                sudo chown root:root /etc/slurm/slurm.conf # slurm.conf owned by root
                sudo chmod 644 /etc/slurm/slurm.conf     # Read access for all
                echo "--- Verification on pi-c02 ---"
                ls -l /etc/slurm/slurm.conf
                echo "--- Done on pi-c02 ---"
                EOF
                # With Raspberry Pi OS's default passwordless sudo there is no prompt here; if your admin user requires a sudo password, run these commands interactively on pi-c02 instead
  8. Configure Cgroup Plugin (cgroup.conf - Corrected for SLURM 22.05.8): Needed for resource constraints (TaskPlugin=task/cgroup, SelectType=select/cons_tres).
    • Create /etc/slurm/cgroup.conf on pi-head first: sudo nano /etc/slurm/cgroup.conf
    • Add the following content. Note that CgroupReleaseAgentDir and TaskAffinity must be commented out or removed as they are not recognized or are obsolete in SLURM 22.05.8 and cause fatal parsing errors if present.
        # /etc/slurm/cgroup.conf
        CgroupAutomount=yes
        # CgroupReleaseAgentDir="/etc/slurm/cgroup"  # Obsolete/Removed in this SLURM version
        ConstrainCores=yes
        ConstrainDevices=yes
        ConstrainRAMSpace=yes                      # Needed for memory enforcement
        # TaskAffinity=no                          # Unrecognized key in this SLURM version's cgroup.conf
    • Save the file (Ctrl+O, Enter) and exit (Ctrl+X).
    • Copy the corrected `cgroup.conf` from `pi-head` to the compute nodes using the two-step method (replace `davids` with your admin username).
    • Run these command blocks from `pi-head`, logged in as your admin user:

      • For pi-c01:
                # ON PI-HEAD, as admin user (e.g., davids)
                echo "Copying cgroup.conf to pi-c01:/tmp/..."
                sudo scp /etc/slurm/cgroup.conf davids@pi-c01:/tmp/cgroup.conf
                # Enter admin user's password for pi-c01 if prompted by scp

                echo "Connecting to pi-c01 to move cgroup.conf and set permissions..."
                ssh -t davids@pi-c01 << EOF
                sudo mv /tmp/cgroup.conf /etc/slurm/cgroup.conf
                sudo chown root:root /etc/slurm/cgroup.conf # cgroup.conf owned by root
                sudo chmod 644 /etc/slurm/cgroup.conf     # Read access for all
                echo "--- Verification on pi-c01 ---"
                ls -l /etc/slurm/cgroup.conf
                echo "--- Done on pi-c01 ---"
                EOF
                # With Raspberry Pi OS's default passwordless sudo there is no prompt here; if your admin user requires a sudo password, run these commands interactively on pi-c01 instead
      • For pi-c02:
                # ON PI-HEAD, as admin user (e.g., davids)
                echo "Copying cgroup.conf to pi-c02:/tmp/..."
                sudo scp /etc/slurm/cgroup.conf davids@pi-c02:/tmp/cgroup.conf
                # Enter admin user's password for pi-c02 if prompted by scp

                echo "Connecting to pi-c02 to move cgroup.conf and set permissions..."
                ssh -t davids@pi-c02 << EOF
                sudo mv /tmp/cgroup.conf /etc/slurm/cgroup.conf
                sudo chown root:root /etc/slurm/cgroup.conf # cgroup.conf owned by root
                sudo chmod 644 /etc/slurm/cgroup.conf     # Read access for all
                echo "--- Verification on pi-c02 ---"
                ls -l /etc/slurm/cgroup.conf
                echo "--- Done on pi-c02 ---"
                EOF
                # With Raspberry Pi OS's default passwordless sudo there is no prompt here; if your admin user requires a sudo password, run these commands interactively on pi-c02 instead
  9. Enable Memory Cgroup Controller (Kernel Parameter - CRITICAL FIX): To allow SLURM’s task/cgroup plugin to enforce memory limits (ConstrainRAMSpace=yes), the kernel’s memory cgroup controller must be enabled. This requires a kernel command line change and a reboot.
    • On EACH node (pi-head, pi-c01, pi-c02):
      • Log in as your administrative user (e.g., davids).
      • Edit the boot command line file:
            sudo nano /boot/firmware/cmdline.txt
      • Go to the very end of the single line of text.
      • Add a space, then append these two parameters exactly: `cgroup_enable=memory cgroup_memory=1`
      • Ensure they are added to the existing line with a space before them; do NOT create a new line.
      • Save the file (Ctrl+O, Enter) and exit (Ctrl+X).
    • Reboot ALL nodes: this is required for the kernel parameter change to take effect.
        # Run on pi-head (as admin user) to reboot compute nodes first:
        ssh davids@pi-c01 'sudo reboot'
        ssh davids@pi-c02 'sudo reboot'

        # Then reboot the head node:
        sudo reboot
    • Wait several minutes for all nodes to fully reboot and reconnect to the network.
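    • After the reboot, you can verify the memory controller is active (a quick check; the cgroup.controllers path is standard for cgroup v2, which Bookworm uses):
        # Confirm the parameters made it onto the kernel command line
        grep -o 'cgroup[^ ]*' /proc/cmdline
        # Confirm the memory controller is available to cgroup v2
        cat /sys/fs/cgroup/cgroup.controllers   # should include 'memory'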
  10. Start SLURM Services (After Reboot):
    • On pi-head (Controller):
        # SSH back into PI-HEAD as admin user after reboot
        sudo systemctl enable slurmctld.service
        sudo systemctl start slurmctld.service
        # Check status immediately
        sudo systemctl status slurmctld.service --no-pager
        # Check logs if status isn't active (running)
        # journalctl -u slurmctld.service -n 30 --no-pager
        # sudo tail -n 30 /var/log/slurm/slurmctld.log
    • On ALL nodes (compute daemons, including `pi-head`):
        # Run ON pi-head (as admin user)
        sudo systemctl enable slurmd.service
        sudo systemctl start slurmd.service
        sudo systemctl status slurmd.service --no-pager

        # Run ON pi-c01 (remotely from pi-head, as admin user)
        ssh davids@pi-c01 'sudo systemctl enable slurmd.service && sudo systemctl start slurmd.service && sudo systemctl status slurmd.service --no-pager'

        # Run ON pi-c02 (remotely from pi-head, as admin user)
        ssh davids@pi-c02 'sudo systemctl enable slurmd.service && sudo systemctl start slurmd.service && sudo systemctl status slurmd.service --no-pager'

        # Check logs on any node if slurmd fails to start (e.g., ssh davids@pi-c01 'sudo tail -n 30 /var/log/slurm/slurmd.log')
  11. Verify SLURM Cluster Status:
    • Wait ~15-30 seconds after confirming all services are running for nodes to register. Run on pi-head (as admin user or cuser):
        sinfo
        # Expected output (all nodes should eventually be 'idle'):
        # PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
        # rpi_part*    up   infinite      3   idle pi-c[01-02],pi-head

        scontrol show node
        # Check details for each node. Look for 'State=IDLE'.
    • Troubleshooting an initial `down` state: sometimes, after a reboot or restart, nodes may initially appear as `down` in `sinfo` because `slurmctld` didn't hear back from `slurmd` quickly enough. If `slurmd` is confirmed running on the node (via `systemctl status`), you can often bring it back online using:
        # Run on pi-head as admin user
        sudo scontrol update nodename=<name_of_down_node> state=resume
        # Example: sudo scontrol update nodename=pi-c01 state=resume
        # Then check 'sinfo' again after a few seconds.
    • Log check reminder: check `/var/log/slurm/slurmctld.log` on `pi-head` and `/var/log/slurm/slurmd.log` on the specific node if nodes remain `down` or `unk` (unknown). The non-fatal `mpi/pmix_v4: ... can not load PMIx library` errors in the logs are expected with `MpiDefault=none` and can be ignored for basic operation.
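    • A quick way to confirm the controller itself is reachable from any node is `scontrol ping`:
        scontrol ping
        # Expected output similar to: Slurmctld(primary) at pi-head is UP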

Phase 5: Testing the SLURM Cluster

(This phase assumes you have completed Phase 4, all SLURM services are running, and sinfo shows all nodes as idle. Run these commands as cuser on pi-head)

  1. Login as cuser:
    # On pi-head
    su - cuser
    # Or: ssh cuser@pi-head

    # Optional: change to shared filesystem
    cd /clusterfs
    # Ensure user cuser has write permissions here or in a subdirectory like /clusterfs/cuser
    # mkdir -p /clusterfs/cuser && cd /clusterfs/cuser
  2. Run a Simple Command Interactively:
    # Runs 'hostname' on one available node in the default partition.
    srun hostname
    # Should print the hostname of one of the nodes (e.g., pi-head, pi-c01, or pi-c02)
  3. Run a Command on a Specific Number of Nodes:
    # Run hostname on 2 different nodes, 1 task per node
    srun --nodes=2 --ntasks-per-node=1 hostname | sort
    # Should show two different hostnames (e.g., pi-c01, pi-c02 or pi-head, pi-c01)
  4. Submit a Simple Batch Job:
    • Create a job script file, e.g., /clusterfs/cuser/hello.sh (ensure /clusterfs/cuser exists and is writable by cuser):
        # Create the directory if it doesn't exist
        mkdir -p /clusterfs/cuser
        # Create the script file using nano or vim
        nano /clusterfs/cuser/hello.sh
    • Paste the following content into the editor:
        #!/bin/bash
        #SBATCH --job-name=hello_rpi    # Job name
        #SBATCH --output=hello_job_%j.out # Standard output file (%j = job ID)
        #SBATCH --error=hello_job_%j.err  # Standard error file
        #SBATCH --nodes=3                 # Request all 3 nodes
        #SBATCH --ntasks-per-node=2       # Request 2 tasks (processes) per node (total 6)
        #SBATCH --cpus-per-task=1         # Request 1 CPU core per task (slurm handles affinity)
        #SBATCH --mem-per-cpu=100M        # Optional: Request memory per allocated CPU
        #SBATCH --partition=rpi_part      # Specify partition (optional if default)
        #SBATCH --time=00:05:00           # Time limit (5 minutes)

        echo "Job ID: $SLURM_JOB_ID running on nodes:"
        # srun hostname | sort # Use srun within sbatch to launch parallel tasks across allocated resources
        # More reliable way to get unique nodes allocated to the job:
        echo $SLURM_JOB_NODELIST

        echo "Tasks started at: $(date)"
        # Simple workload: Print hostname and sleep
        srun bash -c 'echo "Hello from $(hostname) (Task $SLURM_PROCID of $SLURM_NTASKS)"; sleep 10'
        echo "Tasks finished at: $(date)"
    • Save the file (Ctrl+O, Enter) and exit (Ctrl+X in nano).
    • Make the script executable: `chmod +x /clusterfs/cuser/hello.sh`
    • Submit the job from the directory containing the script (or specify the full path):
        # Ensure you are in /clusterfs/cuser or use the full path
        sbatch hello.sh
        # Should print: Submitted batch job <JOB_ID>
    • Check the queue:
        squeue
        # Shows running (R) or pending (PD) jobs. Should be running quickly.
        watch squeue # Monitor queue updates automatically
    • Check node status while the job runs:
        sinfo
        # Should show nodes in 'alloc' or 'mix' state.
    • Once the job finishes (it disappears from `squeue`), check the output files (`hello_job_<JOB_ID>.out` and `.err`) in the submission directory (`/clusterfs/cuser`):
        cat hello_job_<JOB_ID>.out
        # Should show the job ID, the nodelist, start/end times, and "Hello from ..." lines from each of the 6 tasks run across the 3 nodes.
        cat hello_job_<JOB_ID>.err
        # Should ideally be empty.
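    • If a job hangs or misbehaves, it can be removed from the queue with `scancel`:
        scancel <JOB_ID>        # Cancel a specific job
        scancel --user=cuser    # Cancel all of cuser's jobs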

Congratulations!

You should now have a fully functional 3-node Raspberry Pi 5 SLURM cluster, with working resource constraints (CPU and Memory via Cgroups) and networking correctly configured.

Next Steps & Considerations

  • Install MPI: Install OpenMPI or MPICH (sudo apt install -y openmpi-bin libopenmpi-dev on all nodes) to run parallel MPI applications. Explore SLURM’s MPI integration (e.g., --mpi= options for srun, potentially installing libpmix-dev if using PMIx-aware MPI). A starter sketch follows this list.
  • Shared Software Stack: Install compilers, libraries, and applications needed for your HPC tasks onto the shared NFS filesystem (/clusterfs) so they are accessible from all nodes without needing installation everywhere. Modules systems like Lmod can help manage this.
  • Monitoring: Set up monitoring tools like htop, glances, or more comprehensive systems like Prometheus + Grafana or Ganglia to observe cluster load and resource usage.
  • SLURM Tuning: Explore more advanced slurm.conf options: resource limits (memory, cores per job/user), Quality of Service (QoS), fair-share scheduling, job arrays.
  • SLURM Accounting: For tracking resource usage over time, set up the SLURM accounting database (slurmdbd) which requires installing and configuring a database (like MariaDB/MySQL).
  • Security: Review firewall rules (sudo nft list ruleset), harden SSH (/etc/ssh/sshd_config), and consider user permissions carefully.
  • Backup: Back up your slurm.conf, cgroup.conf, munge.key, and important data on /clusterfs.
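As a starting point for the MPI item above, here is a minimal, hedged sketch. It assumes Debian's OpenMPI packages, whose mpirun generally detects SLURM allocations; adjust node and task counts to your setup.

    # Install OpenMPI on ALL nodes (the runtime libraries are needed everywhere)
    sudo apt install -y openmpi-bin libopenmpi-dev

    # See which MPI plugin types this SLURM build supports
    srun --mpi=list

    # Smoke test: inside an allocation, mpirun should spread processes across
    # the allocated nodes (here: 6 slots on 3 nodes). If mpirun does not detect
    # the allocation, 'srun hostname' exercises the same spread without MPI.
    sbatch --nodes=3 --ntasks-per-node=2 --output=mpi_smoke_%j.out \
        --wrap "mpirun hostname"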

Enjoy your mini HPC cluster!
