Running Large Language Models on Android

Created: 2025-01-27 20:36:20 | Last updated: 2025-01-27 20:36:20 | Status: Public

This guide covers two methods for running LLMs on Android devices: llama.cpp and llamafile. Both methods use Termux, a terminal emulator for Android.

Prerequisites

  • Install Termux from the F-Droid store (not Google Play, where the build is no longer updated)
  • An Android device with sufficient RAM (8GB+ recommended)
  • Available storage space (varies by model size; see the quick check below)
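
You can verify both from inside Termux before downloading anything. A quick check, assuming the standard Linux /proc interface and coreutils (install with pkg install coreutils if df is missing):

# Total RAM reported by the kernel
grep MemTotal /proc/meminfo

# Free space in Termux's home directory
df -h $HOME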

Method 1: llama.cpp

This method involves building llama.cpp from source.

Installation

# Install required packages
pkg update && pkg upgrade
pkg install git make clang python cmake

# Clone llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build llama.cpp (recent versions build with CMake; the old `make` path is deprecated)
cmake -B build
cmake --build build --config Release

# Create a models directory
mkdir models
cd models

# Download a model (example with a quantized TinyLlama GGUF build)
wget https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Return to main directory
cd ..
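
Before fetching larger models, it is worth confirming the build produced a working binary. A quick check, assuming the CMake build above (which places binaries under build/bin):

# Print version/build info to confirm the binary runs
./build/bin/llama-cli --version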

Running the Model

./build/bin/llama-cli -m ./models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --color \
    --ctx-size 2048 \
    -n 256 \
    -cnv -p "You are a helpful assistant."

Configuration Options

  • --ctx-size: Context window size; adjust based on your device’s capabilities
  • -n: Number of tokens to generate per response
  • -cnv: Enable conversation (chat) mode using the model’s chat template; -p then sets the system prompt
  • You can adjust temperature and other sampling parameters as needed (see the example below)
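
If you want to experiment with sampling behavior, here is a hedged example tuned toward a lower-RAM device; the flag names match current llama-cli builds, and the values are placeholders to adjust for your hardware:

./build/bin/llama-cli -m ./models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --ctx-size 1024 \
    -n 128 \
    --temp 0.7 \
    --top-p 0.9 \
    -cnv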

Method 2: llamafile

llamafile is generally easier to use, as it packages the model weights and everything needed to run them into a single executable file.

Installation

# Install required packages
pkg install curl coreutils

# Create a directory
mkdir ~/llamafile
cd ~/llamafile

# Download a llamafile (example with Mistral 7B)
curl -L https://huggingface.co/jartine/mistral-7b.llamafile/resolve/main/mistral-7b-v0.1-Q4_K_M.llamafile -o mistral.llamafile

# Make it executable
chmod +x mistral.llamafile
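
One caveat: llamafile binaries use the Actually Portable Executable format, and some shells fail to launch them directly with an "exec format error". The workaround suggested in the llamafile README is to run the file through sh:

# If ./mistral.llamafile fails with an exec format error, launch it via sh
sh -c ./mistral.llamafile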

Running the Model

Terminal mode (passing a prompt runs a one-shot completion instead of starting the server):

./mistral.llamafile -p '[INST]Hello, who are you?[/INST]'

Web server mode (the default when no prompt is given; accessible at http://localhost:8080):

./mistral.llamafile
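
The embedded server exposes the llama.cpp HTTP API, so you can also script against it from a second Termux session. A minimal sketch using the /completion endpoint (the JSON fields follow the upstream llama.cpp server; adjust if your llamafile version differs):

# Request a short completion from the running server
curl -s http://localhost:8080/completion \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "Hello, how are you?", "n_predict": 64}'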

Advantages of llamafile

  1. No compilation needed
  2. Bundles the model weights and runtime dependencies in one file
  3. Can run in terminal or web server mode
  4. Ships with optimizations for common CPU architectures, including the ARM processors in phones
  5. Web interface is mobile-friendly

Tips and Recommendations

  1. Model Selection
    - Smaller models (7B parameters or less) work better on mobile devices
    - Quantized models (Q4, Q5) use less RAM
    - Consider TinyLlama or Mistral 7B for a good balance of performance and resource usage

  2. Performance Optimization
    - Close other apps before running the model
    - Monitor your device’s temperature
    - Adjust context size and other parameters based on your device’s capabilities

  3. Troubleshooting
    - If you get memory errors, try a smaller model or reduce the context size (see the sketch after this list)
    - Web server mode may be more stable than terminal mode
    - Keep your device plugged in during long sessions
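
For the memory-error case in particular, a conservative starting point is to shrink everything at once: a small quantized model, a short context, and few generated tokens. The values below are illustrative rather than tuned:

# Low-memory invocation for llama.cpp (placeholder values; raise them as your device allows)
./build/bin/llama-cli -m ./models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --ctx-size 512 \
    -n 64 \
    -p "Hello"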

Additional Resources

  • Llamafile repository: https://github.com/Mozilla-Ocho/llamafile
  • llama.cpp repository: https://github.com/ggerganov/llama.cpp
  • Termux documentation: https://termux.dev/