LocalAI - Local AI Inference API¶
Overview¶
The localai command manages the LocalAI service using Podman Quadlet containers. It provides an OpenAI-compatible API for running AI models locally with GPU acceleration.
Key Features:
- OpenAI-compatible API endpoints
- GPU-specific container images (auto-selected)
- Multiple GPU support (NVIDIA, AMD, Intel)
- Cross-pod DNS via the bazzite-ai network
Quick Reference¶
| Action | Command | Description |
|---|---|---|
| Config | ujust localai config | Configure LocalAI |
| Delete | ujust localai delete | Remove instance config and container |
| Logs | ujust localai logs [--lines=N] | View container logs |
| Restart | ujust localai restart | Restart server |
| Shell | ujust localai shell [-- CMD] | Open shell or execute command in container |
| Start | ujust localai start | Start LocalAI server |
| Status | ujust localai status | Show instance status |
| Stop | ujust localai stop | Stop LocalAI server |
| URL | ujust localai url | Show OpenAI-compatible API URL |
Parameters¶
| Parameter | Long Flag | Short | Default | Description |
|---|---|---|---|---|
| Port | --port | -p | 8080 | Host port for API |
| Image | --image | -i | (auto by GPU) | Container image |
| Tag | --tag | -t | latest | Image tag |
| Bind | --bind | -b | 127.0.0.1 | Bind address |
| Config Dir | --config-dir | -c | ~/.config/localai/1 | Config/models directory |
| Workspace | --workspace-dir | -w | (empty) | Workspace mount |
| GPU Type | --gpu-type | -g | auto | GPU type |
| Instance | --instance | -n | 1 | Instance number or all |
| Lines | --lines | -l | 50 | Log lines to show |
GPU-Specific Images¶
LocalAI uses different container images optimized for each GPU type:
| GPU Type | Image | Auto-Selected? |
|---|---|---|
| CPU (none) | localai/localai:latest | Yes |
| NVIDIA | localai/localai:latest-gpu-nvidia-cuda-12 | Yes |
| AMD | localai/localai:latest-gpu-hipblas | Yes |
| Intel | localai/localai:latest-gpu-intel | Yes |
The appropriate image is automatically selected based on detected GPU hardware.
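If auto-detection picks the wrong image, the GPU type can be forced with the --gpu-type flag from the parameters table (the exact value strings shown here are assumptions mirroring the GPU table above):
# Force the AMD image regardless of detected hardware
ujust localai config --gpu-type=amd
# Short form
ujust localai config -g amd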
Configuration¶
# Default configuration (auto-detects GPU, port 8080)
ujust localai config
# Custom port (long form)
ujust localai config --port=8081
# Custom port (short form)
ujust localai config -p 8081
# Network-wide access
ujust localai config --bind=0.0.0.0
# Force CPU image (ignore GPU)
ujust localai config --image=localai/localai:latest
# Combine parameters (long form)
ujust localai config --port=8081 --bind=0.0.0.0
# Combine parameters (short form)
ujust localai config -p 8081 -b 0.0.0.0
Update Existing Configuration¶
Running config again on an already configured instance updates the existing settings:
# Change only the bind address
ujust localai config --bind=0.0.0.0
# Update port without affecting other settings
ujust localai config --port=8082
Lifecycle Management¶
# Start LocalAI
ujust localai start
# Stop service
ujust localai stop
# Restart (apply config changes)
ujust localai restart
# View logs (default 50 lines)
ujust localai logs
# View more logs (long form)
ujust localai logs --lines=200
# View more logs (short form)
ujust localai logs -l 200
# Check status
ujust localai status
# Show API URL
ujust localai url
Multi-Instance Support¶
# Start all instances (long form)
ujust localai start --instance=all
# Start all instances (short form)
ujust localai start -n all
# Stop specific instance
ujust localai stop --instance=2
# Delete all instances
ujust localai delete --instance=all
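Instances are configured independently, each with its own config directory (see Model Storage below). A sketch of bringing up a second instance on its own port, using the documented --instance and --port flags:
# Configure a second instance on port 8081
ujust localai config -n 2 -p 8081
# Start it
ujust localai start -n 2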
Shell Access¶
# Interactive shell
ujust localai shell
# Run specific command (use -- separator)
ujust localai shell -- ls -la /models
ujust localai shell -- nvidia-smi
Network Architecture¶
LocalAI uses the bazzite-ai bridge network for cross-container DNS:
+-------------------+    DNS     +-------------------+
| Open WebUI        | ---------> | LocalAI           |
| (openwebui)       |            | (localai)         |
| Port 3000         |            | Port 8080         |
+-------------------+            +-------------------+
          |                                |
          +------ bazzite-ai network ------+
                         |
+-------------------+    |    +-------------------+
| Ollama            |----+----| Jupyter           |
| (ollama)          |         | (jupyter)         |
| Port 11434        |         | Port 8888         |
+-------------------+         +-------------------+
Cross-Pod DNS:
- LocalAI is accessible as http://localai:8080 from other containers
- Can replace Ollama as the backend for OpenWebUI
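To verify cross-pod DNS works, run a one-off container on the same network (a minimal sketch; curlimages/curl is just a convenient curl-only image):
# Query LocalAI by its DNS name from inside the bazzite-ai network
podman run --rm --network=bazzite-ai docker.io/curlimages/curl:latest \
  http://localai:8080/v1/models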
API Endpoints (OpenAI-Compatible)¶
| Endpoint | Description |
|---|---|
| /v1/models | List available models |
| /v1/chat/completions | Chat completions |
| /v1/completions | Text completions |
| /v1/embeddings | Generate embeddings |
| /v1/images/generations | Image generation |
| /v1/audio/transcriptions | Speech-to-text |
Example API Usage¶
# List models
curl http://localhost:8080/v1/models
# Chat completion
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
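The other endpoints use the same OpenAI-compatible request shapes. An embeddings request, for example (the model name is a placeholder for whatever you have loaded):
# Generate embeddings
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-embedding-model",
    "input": "The quick brown fox"
  }'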
Model Storage¶
| Path | Description |
|---|---|
| ~/.config/localai/<INSTANCE>/models | Model files |
Models persist across container restarts. Each instance has isolated storage.
Loading Models¶
Place model files (GGUF, GGML) in the models directory:
# Copy a model
cp my-model.gguf ~/.config/localai/1/models/
# Or download directly
curl -L -o ~/.config/localai/1/models/model.gguf \
https://huggingface.co/.../model.gguf
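LocalAI can also load a model through a YAML definition placed in the same directory. A minimal sketch with placeholder names, following LocalAI's model configuration format:
# Create a minimal model definition next to the weights
cat > ~/.config/localai/1/models/my-model.yaml <<'EOF'
name: my-model
parameters:
  model: model.gguf
EOF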
Common Workflows¶
Initial Setup¶
# 1. Configure LocalAI (auto-detects GPU)
ujust localai config
# 2. Start the service
ujust localai start
# 3. Check the API
ujust localai url
# Output: http://127.0.0.1:8080
# 4. Test the API
curl http://localhost:8080/v1/models
Use with OpenWebUI¶
OpenWebUI can use LocalAI as an OpenAI-compatible backend:
# Start LocalAI
ujust localai start
# In OpenWebUI settings, add connection:
# URL: http://localai:8080/v1 (cross-pod DNS)
# Or: http://host.containers.internal:8080/v1 (from host)
Remote Access Setup¶
# Configure for network access
ujust localai config --bind=0.0.0.0
# Start the service
ujust localai start
# Or use Tailscale for secure access
ujust tailscale serve --service=localai
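From another machine on the network, the API should then be reachable at the host's address (HOST_IP is a placeholder for your machine's LAN IP):
# Verify remote access (replace HOST_IP)
curl http://HOST_IP:8080/v1/models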
GPU Support¶
GPU is automatically detected and the appropriate image is selected:
| GPU Type | Detection | Device Passthrough |
|---|---|---|
| NVIDIA | nvidia-smi | CDI (nvidia.com/gpu=all) |
| AMD | lspci | /dev/dri + /dev/kfd |
| Intel | lspci | /dev/dri |
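The detection commands from the table can be run on the host to preview what auto-selection will find:
# NVIDIA: is the driver loaded and the GPU visible?
nvidia-smi
# AMD / Intel: is a GPU present on the PCI bus?
lspci | grep -iE 'vga|3d|display'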
Check GPU in Container¶
# NVIDIA
ujust localai shell -- nvidia-smi
# Check GPU environment
ujust localai shell -- env | grep -i gpu
Troubleshooting¶
Service Won't Start¶
# Check status
ujust localai status
# View logs
ujust localai logs --lines=100
# Check image was pulled
podman images | grep localai
Common causes:
- Port 8080 already in use (see the quick check below)
- Container image not pulled
- GPU driver issues
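A quick way to rule out the first cause (ss ships with iproute2 on most systems):
# Is another process already listening on port 8080?
ss -ltn | grep ':8080'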
GPU Not Detected¶
NVIDIA:
# Check CDI configuration
nvidia-ctk cdi list
# Regenerate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
AMD:
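A basic sanity check, assuming the standard AMD device nodes from the passthrough table above:
# Verify the AMD GPU device nodes used for passthrough exist
ls -l /dev/kfd /dev/dri
# Confirm the GPU is visible on the PCI bus
lspci | grep -iE 'vga|display'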
API Errors¶
# Test API endpoint
curl http://localhost:8080/v1/models
# Check logs for errors
ujust localai logs --lines=100
Clear Data and Start Fresh¶
# Delete everything
ujust localai delete --instance=all
# Reconfigure
ujust localai config
ujust localai start
Cross-References¶
- Network peers: ollama, openwebui, jupyter, comfyui (all use bazzite-ai network)
- Alternative: ollama (simpler model management, different API)
- Client: openwebui (can use LocalAI as a backend)
- Docs: LocalAI Documentation
When to Use This Skill¶
Use when the user asks about:
- "install localai", "setup local inference", "openai-compatible api"
- "configure localai", "change port", "gpu acceleration"
- "localai not working", "api error", "model loading"
- "localai logs", "debug localai"
- "delete localai", "uninstall"