Training a LoRA for Pixel Art Tilemaps with SD 1.5 on Windows (RTX 2080 Ti)

Created: 2025-02-26 14:23:05 | Last updated: 2025-02-26 14:23:05 | Status: Public

This guide outlines the step-by-step process for training a LoRA (Low-Rank Adaptation) model specifically for pixel art tilemaps using Stable Diffusion 1.5 on a Windows system with an NVIDIA RTX 2080 Ti GPU.

Environment Setup

  1. Install Python 3.10 (recommended for compatibility)
  2. Install Git from https://git-scm.com/download/win
  3. Create a new conda environment:
conda create -n sd-lora python=3.10
conda activate sd-lora

Installing Kohya SS Training GUI

git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install xformers==0.0.20

(xformers 0.0.20 is built against torch 2.0.1; the older 0.0.14.dev0 Windows wheel that circulates in some guides targets torch 1.x and will not load with the PyTorch version installed above.)
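
Before moving on, a quick sanity check that the CUDA build of PyTorch is active (it should print True and your GPU name):

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"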

Dataset Preparation

Optimal Dataset Characteristics

  1. Consistency in style
    - All images should share a similar pixel art aesthetic
    - Consistent pixel size (e.g., all 16x16 tiles or all 32x32 tiles)
    - Similar color palette across images

  2. Diversity with focus
    - Include multiple variations of the same type of tilemap
    - Include various elements you want to generate (grass, water, paths, etc.)
    - Provide variety in arrangements while maintaining the core style

  3. Technical requirements
    - Clean, cropped images without watermarks or UI elements
    - PNG format with transparency where appropriate
    - Consistent resolution (square images - 512x512 or 256x256)

  4. Annotation
    - Each image paired with a detailed text file describing its content
    - Use consistent terminology across captions
    - Include specific descriptors like “16x16 grid,” “top-down view,” “forest tileset”

Tilesheets vs. Single Tiles

Use standard game asset tilesheets where tiles are laid out in a grid, as they:
- Show how different tiles relate to each other
- Pack more information into each training image
- Match real-world usage in game development
- Contain multiple terrain types and transition tiles in one organized example

Preparing Tilesheets

  1. Standardize image size
    - All training images must be the exact same dimensions
    - Recommended: 256×256 (faster training) or 512×512 (more detail)

  2. Handle rectangular tilesheets
    - Divide rectangular tilesets into multiple square segments
    - Ensure each segment contains complete tiles (don’t cut through tiles)
    - Add padding if needed to create complete squares (a slicing sketch follows the scaling script below)

  3. Scale smaller tilesheets
    - Use nearest neighbor scaling to maintain pixel-perfect edges
    - Scale all smaller tilesheets to match your standard size

from PIL import Image
import os

input_dir = "small_tilesheets"
output_dir = "processed_tilesheets"
target_size = 512  # Your standard size

os.makedirs(output_dir, exist_ok=True)

for filename in os.listdir(input_dir):
    if filename.lower().endswith((".png", ".jpg")):
        img = Image.open(os.path.join(input_dir, filename))
        width, height = img.size

        # Largest integer factor that keeps the image within target_size;
        # integer factors preserve the pixel grid exactly
        scale = max(1, target_size // max(width, height))
        if scale > 1:
            # Nearest neighbor keeps pixel art edges crisp
            img = img.resize((width * scale, height * scale), Image.NEAREST)

        # Pad onto a transparent square canvas so every training image comes out
        # exactly target_size x target_size (assumes inputs are no larger than
        # target_size, since this step is for smaller tilesheets)
        canvas = Image.new("RGBA", (target_size, target_size), (0, 0, 0, 0))
        canvas.paste(img.convert("RGBA"), (0, 0))
        canvas.save(os.path.join(output_dir, os.path.splitext(filename)[0] + ".png"))
        print(f"Processed {filename}")
  4. Create text captions
    - Create a text file for each image with the same filename but a .txt extension
    - Add descriptive captions like “pixel art forest tilemap, 16x16 grid, top-down view”
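
To make sure no image goes uncaptioned, a small helper can seed a caption file for every image; base_caption here is just a starting placeholder that you should then edit per image:

import os

image_dir = "processed_tilesheets"
# Starting placeholder: edit each generated .txt to describe its specific image
base_caption = "pixel art tilemap, 16x16 grid, top-down view"

for filename in os.listdir(image_dir):
    if filename.lower().endswith(".png"):
        txt_path = os.path.join(image_dir, os.path.splitext(filename)[0] + ".txt")
        # Don't overwrite captions that were already written by hand
        if not os.path.exists(txt_path):
            with open(txt_path, "w") as f:
                f.write(base_caption)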

Training Configuration

  1. Network parameters
    - Network Rank: 4-8 (lower for pixel art)
    - Network Alpha: Same as rank

  2. Training parameters
    - Batch size: 1-2
    - Resolution: 512x512 (or 256x256 for faster training)
    - Max train epochs: 10-15
    - Learning rate: 1e-4
    - Optimizer: AdamW8bit

  3. Memory optimization
    - Use 8bit Adam: Yes
    - Gradient checkpointing: Yes
    - Cache latents: Yes
    - Enable xformers memory efficient attention

  4. Sample generation
    - Sample every N epochs: 1
    - Include typical prompts you’ll use
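
For reference, the GUI settings above correspond roughly to the command line that the Kohya GUI hands to sd-scripts. This is a sketch, not the exact command: the model path, data/output directories, and output name are placeholders, and the dataset folder is assumed to follow kohya's <repeats>_<name> convention (e.g., a subfolder named 10_tilemap inside the train_data_dir):

accelerate launch train_network.py ^
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" ^
  --train_data_dir="D:\datasets\tilemaps" ^
  --output_dir="D:\lora_output" --output_name="pixel_tilemap_v1" ^
  --resolution="512,512" --network_module="networks.lora" ^
  --network_dim=8 --network_alpha=8 ^
  --learning_rate=1e-4 --optimizer_type="AdamW8bit" ^
  --train_batch_size=1 --max_train_epochs=10 ^
  --mixed_precision="fp16" --save_model_as="safetensors" ^
  --xformers --gradient_checkpointing --cache_latents ^
  --sample_every_n_epochs=1 --sample_prompts="prompts.txt"

Each line of prompts.txt is one sample prompt; sd-scripts lets you append generation options such as --w 512 --h 512 --s 28 to a line.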

Training Time Expectations

  • Small dataset (20-30 images): 1-3 hours
  • Medium dataset (30-50 images): 3-6 hours
  • Larger datasets: 6+ hours

Factors affecting training time:
- Dataset size
- Training resolution
- Number of epochs
- Network rank
- Batch size

Testing Your LoRA

  1. After training completes, find your LoRA file in the output folder
  2. Install A1111 Stable Diffusion Web UI:
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
webui-user.bat
  3. Place your LoRA file in the models/Lora folder
  4. Use prompts like:
    - “pixel art forest tilemap, top-down view”
    - “retro game tileset, 8-bit fantasy dungeon, 16x16 grid”
    - Adjust the LoRA weight in the prompt tag (e.g. <lora:pixel_tilemap_v1:0.7>) to control how strongly the style applies
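
Putting it together, a full positive prompt might look like the line below, where pixel_tilemap_v1 stands in for whatever you named your LoRA file:

pixel art forest tilemap, top-down view, 16x16 grid, <lora:pixel_tilemap_v1:0.7>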

Optimization Tips

  1. For faster training:
    - Use fewer epochs (8-10 may be sufficient)
    - Use a lower network rank (4-6)
    - Train at 256x256 resolution

  2. For better quality:
    - Ensure all tiles align to a consistent grid
    - Include both tilesheets and some composed maps
    - Use a mix of tile types (terrain, objects, decorations)
    - Keep color palettes consistent across training images

  3. If encountering VRAM issues:
    - Reduce batch size to 1
    - Lower resolution further
    - Enable gradient checkpointing

Additional Resources

  • Consider fine-tuning from “Pixel Art Diffusion” as your base model instead of standard SD 1.5
  • For additional help with Kohya SS, refer to their GitHub repository
  • Stable Diffusion WebUI extensions for pixel art: “Tiled Diffusion” and “Tiled VAE” can help with generating larger tilemap compositions