Training a LoRA for Pixel Art Tilemaps with SD 1.5 on Windows (RTX 2080 Ti)

Created: 2025-02-26 14:23:05 | Last updated: 2025-02-26 14:23:05 | Status: Public

This guide outlines the step-by-step process for training a LoRA (Low-Rank Adaptation) model specifically for pixel art tilemaps using Stable Diffusion 1.5 on a Windows system with an NVIDIA RTX 2080 Ti GPU.

Environment Setup

  1. Install Python 3.10 (recommended for compatibility)
  2. Install Git from https://git-scm.com/download/win
  3. Create a new conda environment:
conda create -n sd-lora python=3.10
conda activate sd-lora

Installing Kohya SS Training GUI

git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install xformers==0.0.20

(xformers 0.0.20 is built against torch 2.0.1; the older 0.0.14.dev0 Windows wheel that circulates in some guides targets torch 1.x and will not load with the PyTorch version installed above.)
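
Before moving on, a quick sanity check that the CUDA build of PyTorch is active (it should print True and your GPU name):

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"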

Dataset Preparation

Optimal Dataset Characteristics

  1. Consistency in style
    - All images should share a similar pixel art aesthetic
    - Consistent pixel size (e.g., all 16x16 tiles or all 32x32 tiles)
    - Similar color palette across images

  2. Diversity with focus
    - Include multiple variations of the same type of tilemap
    - Include various elements you want to generate (grass, water, paths, etc.)
    - Provide variety in arrangements while maintaining the core style

  3. Technical requirements
    - Clean, cropped images without watermarks or UI elements
    - PNG format with transparency where appropriate
    - Consistent resolution (square images - 512x512 or 256x256)

  4. Annotation
    - Each image paired with a detailed text file describing its content
    - Use consistent terminology across captions
    - Include specific descriptors like “16x16 grid,” “top-down view,” “forest tileset”

Tilesheets vs. Single Tiles

Use standard game asset tilesheets where tiles are laid out in a grid, as they:
- Show how different tiles relate to each other
- Pack more information into each training image
- Match real-world usage in game development
- Contain multiple terrain types and transition tiles in one organized example

Preparing Tilesheets

  1. Standardize image size
    - All training images must be the exact same dimensions
    - Recommended: 256×256 (faster training) or 512×512 (more detail)

  2. Handle rectangular tilesheets
    - Divide rectangular tilesets into multiple square segments
    - Ensure each segment contains complete tiles (don’t cut through tiles)
    - Add padding if needed to create complete squares (a slicing sketch follows the scaling script below)

  3. Scale smaller tilesheets
    - Use nearest neighbor scaling to maintain pixel-perfect edges
    - Scale all smaller tilesheets to match your standard size

from PIL import Image
import os

input_dir = "small_tilesheets"
output_dir = "processed_tilesheets"
target_size = 512  # Your standard size

os.makedirs(output_dir, exist_ok=True)

for filename in os.listdir(input_dir):
    if filename.lower().endswith((".png", ".jpg")):
        img = Image.open(os.path.join(input_dir, filename))
        width, height = img.size

        # Largest integer factor that keeps the image within target_size;
        # integer factors preserve the pixel grid exactly
        scale = max(1, target_size // max(width, height))
        if scale > 1:
            # Nearest neighbor keeps pixel art edges crisp
            img = img.resize((width * scale, height * scale), Image.NEAREST)

        # Pad onto a transparent square canvas so every training image comes out
        # exactly target_size x target_size (assumes inputs are no larger than
        # target_size, since this step is for smaller tilesheets)
        canvas = Image.new("RGBA", (target_size, target_size), (0, 0, 0, 0))
        canvas.paste(img.convert("RGBA"), (0, 0))
        canvas.save(os.path.join(output_dir, os.path.splitext(filename)[0] + ".png"))
        print(f"Processed {filename}")
  4. Create text captions
    - Create a text file for each image with the same filename but a .txt extension
    - Add descriptive captions like “pixel art forest tilemap, 16x16 grid, top-down view”
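
To make sure no image goes uncaptioned, a small helper can seed a caption file for every image; base_caption here is just a starting placeholder that you should then edit per image:

import os

image_dir = "processed_tilesheets"
# Starting placeholder: edit each generated .txt to describe its specific image
base_caption = "pixel art tilemap, 16x16 grid, top-down view"

for filename in os.listdir(image_dir):
    if filename.lower().endswith(".png"):
        txt_path = os.path.join(image_dir, os.path.splitext(filename)[0] + ".txt")
        # Don't overwrite captions that were already written by hand
        if not os.path.exists(txt_path):
            with open(txt_path, "w") as f:
                f.write(base_caption)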

Training Configuration

  1. Network parameters
    - Network Rank: 4-8 (lower for pixel art)
    - Network Alpha: Same as rank

  2. Training parameters
    - Batch size: 1-2
    - Resolution: 512x512 (or 256x256 for faster training)
    - Max train epochs: 10-15
    - Learning rate: 1e-4
    - Optimizer: AdamW8bit

  3. Memory optimization
    - Use 8bit Adam: Yes
    - Gradient checkpointing: Yes
    - Cache latents: Yes
    - Enable xformers memory efficient attention

  4. Sample generation
    - Sample every N epochs: 1
    - Include typical prompts you’ll use
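
For reference, the GUI settings above correspond roughly to the command line that the Kohya GUI hands to sd-scripts. This is a sketch, not the exact command: the model path, data/output directories, and output name are placeholders, and the dataset folder is assumed to follow kohya's <repeats>_<name> convention (e.g., a subfolder named 10_tilemap inside the train_data_dir):

accelerate launch train_network.py ^
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" ^
  --train_data_dir="D:\datasets\tilemaps" ^
  --output_dir="D:\lora_output" --output_name="pixel_tilemap_v1" ^
  --resolution="512,512" --network_module="networks.lora" ^
  --network_dim=8 --network_alpha=8 ^
  --learning_rate=1e-4 --optimizer_type="AdamW8bit" ^
  --train_batch_size=1 --max_train_epochs=10 ^
  --mixed_precision="fp16" --save_model_as="safetensors" ^
  --xformers --gradient_checkpointing --cache_latents ^
  --sample_every_n_epochs=1 --sample_prompts="prompts.txt"

Each line of prompts.txt is one sample prompt; sd-scripts lets you append generation options such as --w 512 --h 512 --s 28 to a line.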

Training Time Expectations

  • Small dataset (20-30 images): 1-3 hours
  • Medium dataset (30-50 images): 3-6 hours
  • Larger datasets: 6+ hours

Factors affecting training time:
- Dataset size
- Training resolution
- Number of epochs
- Network rank
- Batch size

Testing Your LoRA

  1. After training completes, find your LoRA file in the output folder
  2. Install A1111 Stable Diffusion Web UI:
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
webui-user.bat
  3. Place your LoRA file in the models/Lora folder
  4. Use prompts like:
    - “pixel art forest tilemap, top-down view”
    - “retro game tileset, 8-bit fantasy dungeon, 16x16 grid”
    - Adjust the LoRA weight in the prompt tag (e.g. <lora:pixel_tilemap_v1:0.7>) to control how strongly the style applies
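
Putting it together, a full positive prompt might look like the line below, where pixel_tilemap_v1 stands in for whatever you named your LoRA file:

pixel art forest tilemap, top-down view, 16x16 grid, <lora:pixel_tilemap_v1:0.7>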

Optimization Tips

  1. For faster training:
    - Use fewer epochs (8-10 may be sufficient)
    - Use a lower network rank (4-6)
    - Train at 256x256 resolution

  2. For better quality:
    - Ensure all tiles align to a consistent grid
    - Include both tilesheets and some composed maps
    - Use a mix of tile types (terrain, objects, decorations)
    - Keep color palettes consistent across training images

  3. If encountering VRAM issues:
    - Reduce batch size to 1
    - Lower resolution further
    - Enable gradient checkpointing

Additional Resources

  • Consider fine-tuning from “Pixel Art Diffusion” as your base model instead of standard SD 1.5
  • For additional help with Kohya SS, refer to their GitHub repository
  • Stable Diffusion WebUI extensions for pixel art: “Tiled Diffusion” and “Tiled VAE” can help with generating larger tilemap compositions