LocalAI - Local AI Inference API¶
Overview¶
The localai command manages the LocalAI service using Podman Quadlet containers. It provides an OpenAI-compatible API for running AI models locally with GPU acceleration.
Key Features:
- OpenAI-compatible API endpoints
- GPU-specific container images (auto-selected)
- Multiple GPU support (NVIDIA, AMD, Intel)
- Cross-pod DNS via the bazzite-ai network
Quick Reference¶
| Action | Command | Description |
|---|---|---|
| Config | ujust localai config | Configure LocalAI |
| Delete | ujust localai delete | Remove instance config and container |
| Logs | ujust localai logs [--lines=N] | View container logs |
| Restart | ujust localai restart | Restart server |
| Shell | ujust localai shell [-- CMD] | Open shell or execute command in container |
| Start | ujust localai start | Start LocalAI server |
| Status | ujust localai status | Show instance status |
| Stop | ujust localai stop | Stop LocalAI server |
| URL | ujust localai url | Show OpenAI-compatible API URL |
Parameters¶
| Parameter | Long Flag | Short | Default | Description |
|---|---|---|---|---|
| Port | --port | -p | 8080 | Host port for API |
| Image | --image | -i | (auto by GPU) | Container image |
| Tag | --tag | -t | latest | Image tag |
| Bind | --bind | -b | 127.0.0.1 | Bind address |
| Config Dir | --config-dir | -c | ~/.config/localai/1 | Config/models directory |
| Workspace | --workspace-dir | -w | (empty) | Workspace mount |
| GPU Type | --gpu-type | -g | auto | GPU type |
| Instance | --instance | -n | 1 | Instance number or all |
| Lines | --lines | -l | 50 | Log lines to show |
GPU-Specific Images¶
LocalAI uses different container images optimized for each GPU type:
| GPU Type | Image | Auto-Selected? |
|---|---|---|
| CPU (none) | localai/localai:latest | Yes |
| NVIDIA | localai/localai:latest-gpu-nvidia-cuda-12 | Yes |
| AMD | localai/localai:latest-gpu-hipblas | Yes |
| Intel | localai/localai:latest-gpu-intel | Yes |
The appropriate image is automatically selected based on detected GPU hardware.
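If auto-detection picks the wrong image, the GPU type can be forced with the --gpu-type flag from the parameters table (the exact value strings shown here are assumptions mirroring the GPU table above):
# Force the AMD image regardless of detected hardware
ujust localai config --gpu-type=amd
# Short form
ujust localai config -g amd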
Configuration¶
# Default configuration (auto-detects GPU, port 8080)
ujust localai config
# Custom port (long form)
ujust localai config --port=8081
# Custom port (short form)
ujust localai config -p 8081
# Network-wide access
ujust localai config --bind=0.0.0.0
# Force CPU image (ignore GPU)
ujust localai config --image=localai/localai:latest
# Combine parameters (long form)
ujust localai config --port=8081 --bind=0.0.0.0
# Combine parameters (short form)
ujust localai config -p 8081 -b 0.0.0.0
Update Existing Configuration¶
Running config again on an already configured instance updates the existing settings:
# Change only the bind address
ujust localai config --bind=0.0.0.0
# Update port without affecting other settings
ujust localai config --port=8082
Lifecycle Management¶
# Start LocalAI
ujust localai start
# Stop service
ujust localai stop
# Restart (apply config changes)
ujust localai restart
# View logs (default 50 lines)
ujust localai logs
# View more logs (long form)
ujust localai logs --lines=200
# View more logs (short form)
ujust localai logs -l 200
# Check status
ujust localai status
# Show API URL
ujust localai url
Multi-Instance Support¶
# Start all instances (long form)
ujust localai start --instance=all
# Start all instances (short form)
ujust localai start -n all
# Stop specific instance
ujust localai stop --instance=2
# Delete all instances
ujust localai delete --instance=all
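Instances are configured independently, each with its own config directory (see Model Storage below). A sketch of bringing up a second instance on its own port, using the documented --instance and --port flags:
# Configure a second instance on port 8081
ujust localai config -n 2 -p 8081
# Start it
ujust localai start -n 2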
Shell Access¶
# Interactive shell
ujust localai shell
# Run specific command (use -- separator)
ujust localai shell -- ls -la /models
ujust localai shell -- nvidia-smi
Network Architecture¶
LocalAI uses the bazzite-ai bridge network for cross-container DNS:
+-------------------+    DNS     +-------------------+
| Open WebUI        | ---------> | LocalAI           |
| (openwebui)       |            | (localai)         |
| Port 3000         |            | Port 8080         |
+-------------------+            +-------------------+
          |                                |
          +------ bazzite-ai network ------+
                         |
+-------------------+    |    +-------------------+
| Ollama            |----+----| Jupyter           |
| (ollama)          |         | (jupyter)         |
| Port 11434        |         | Port 8888         |
+-------------------+         +-------------------+
Cross-Pod DNS:
- LocalAI is accessible as http://localai:8080 from other containers
- Can replace Ollama as the backend for OpenWebUI
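To verify cross-pod DNS works, run a one-off container on the same network (a minimal sketch; curlimages/curl is just a convenient curl-only image):
# Query LocalAI by its DNS name from inside the bazzite-ai network
podman run --rm --network=bazzite-ai docker.io/curlimages/curl:latest \
  http://localai:8080/v1/models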
API Endpoints (OpenAI-Compatible)¶
| Endpoint | Description |
|---|---|
| /v1/models | List available models |
| /v1/chat/completions | Chat completions |
| /v1/completions | Text completions |
| /v1/embeddings | Generate embeddings |
| /v1/images/generations | Image generation |
| /v1/audio/transcriptions | Speech-to-text |
Example API Usage¶
# List models
curl http://localhost:8080/v1/models
# Chat completion
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
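The other endpoints use the same OpenAI-compatible request shapes. An embeddings request, for example (the model name is a placeholder for whatever you have loaded):
# Generate embeddings
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-embedding-model",
    "input": "The quick brown fox"
  }'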
Model Storage¶
| Path | Description |
|---|---|
| ~/.config/localai/<INSTANCE>/models | Model files |
Models persist across container restarts. Each instance has isolated storage.
Loading Models¶
Place model files (GGUF, GGML) in the models directory:
# Copy a model
cp my-model.gguf ~/.config/localai/1/models/
# Or download directly
curl -L -o ~/.config/localai/1/models/model.gguf \
https://huggingface.co/.../model.gguf
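LocalAI can also load a model through a YAML definition placed in the same directory. A minimal sketch with placeholder names, following LocalAI's model configuration format:
# Create a minimal model definition next to the weights
cat > ~/.config/localai/1/models/my-model.yaml <<'EOF'
name: my-model
parameters:
  model: model.gguf
EOF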
Common Workflows¶
Initial Setup¶
# 1. Configure LocalAI (auto-detects GPU)
ujust localai config
# 2. Start the service
ujust localai start
# 3. Check the API
ujust localai url
# Output: http://127.0.0.1:8080
# 4. Test the API
curl http://localhost:8080/v1/models
Use with OpenWebUI¶
OpenWebUI can use LocalAI as an OpenAI-compatible backend:
# Start LocalAI
ujust localai start
# In OpenWebUI settings, add connection:
# URL: http://localai:8080/v1 (cross-pod DNS)
# Or: http://host.containers.internal:8080/v1 (from host)
Remote Access Setup¶
# Configure for network access
ujust localai config --bind=0.0.0.0
# Start the service
ujust localai start
# Or use Tailscale for secure access
ujust tailscale serve --service=localai
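From another machine on the network, the API should then be reachable at the host's address (HOST_IP is a placeholder for your machine's LAN IP):
# Verify remote access (replace HOST_IP)
curl http://HOST_IP:8080/v1/models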
GPU Support¶
GPU is automatically detected and the appropriate image is selected:
| GPU Type | Detection | Device Passthrough |
|---|---|---|
| NVIDIA | nvidia-smi | CDI (nvidia.com/gpu=all) |
| AMD | lspci | /dev/dri + /dev/kfd |
| Intel | lspci | /dev/dri |
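The detection commands from the table can be run on the host to preview what auto-selection will find:
# NVIDIA: is the driver loaded and the GPU visible?
nvidia-smi
# AMD / Intel: is a GPU present on the PCI bus?
lspci | grep -iE 'vga|3d|display'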
Check GPU in Container¶
# NVIDIA
ujust localai shell -- nvidia-smi
# Check GPU environment
ujust localai shell -- env | grep -i gpu
Troubleshooting¶
Service Won't Start¶
# Check status
ujust localai status
# View logs
ujust localai logs --lines=100
# Check image was pulled
podman images | grep localai
Common causes:
- Port 8080 already in use (see the quick check below)
- Container image not pulled
- GPU driver issues
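A quick way to rule out the first cause (ss ships with iproute2 on most systems):
# Is another process already listening on port 8080?
ss -ltn | grep ':8080'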
GPU Not Detected¶
NVIDIA:
# Check CDI configuration
nvidia-ctk cdi list
# Regenerate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
AMD:
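A basic sanity check, assuming the standard AMD device nodes from the passthrough table above:
# Verify the AMD GPU device nodes used for passthrough exist
ls -l /dev/kfd /dev/dri
# Confirm the GPU is visible on the PCI bus
lspci | grep -iE 'vga|display'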
API Errors¶
# Test API endpoint
curl http://localhost:8080/v1/models
# Check logs for errors
ujust localai logs --lines=100
Clear Data and Start Fresh¶
# Delete everything
ujust localai delete --instance=all
# Reconfigure
ujust localai config
ujust localai start
Cross-References¶
- Network peers: ollama, openwebui, jupyter, comfyui (all use bazzite-ai network)
- Alternative: ollama (simpler model management, different API)
- Client: openwebui (can use LocalAI as a backend)
- Docs: LocalAI Documentation
When to Use This Skill¶
Use when the user asks about:
- "install localai", "setup local inference", "openai-compatible api"
- "configure localai", "change port", "gpu acceleration"
- "localai not working", "api error", "model loading"
- "localai logs", "debug localai"
- "delete localai", "uninstall"