Hardware & GPU Guide
Complete guide to running Ada on different hardware configurations.
Supported Hardware Acceleration
Ollama (Ada’s LLM backend) supports multiple hardware acceleration modes:
| Backend | Hardware | Status | Notes |
|---|---|---|---|
| CUDA | NVIDIA GPUs | ✅ Native | Best supported, widest model compatibility |
| ROCm | AMD GPUs | ✅ Native | Excellent performance, requires ROCm 5.7+ |
| Metal | Apple Silicon | ✅ Native | M1/M2/M3/M4 Macs, very efficient |
| Vulkan | Cross-platform | ⚠️ Experimental | Opt-in, may be unstable |
| CPU | Any CPU | ✅ Fallback | Works everywhere, but slow |
Quick Start by Hardware
NVIDIA GPUs (CUDA)
Prerequisites:
NVIDIA GPU with compute capability 6.0+ (Pascal or newer)
NVIDIA drivers installed
NVIDIA Container Toolkit (for Docker)
Setup:
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Using CUDA with Ada:
# Use the cuda profile (already configured in compose.profiles.yaml)
docker compose --profile cuda up -d
That’s it! Ada ships with CUDA support pre-configured. No manual compose.yaml editing needed.
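For reference, a CUDA profile like this typically amounts to a GPU reservation on the Ollama service. A minimal sketch in Compose syntax (service and volume names are illustrative; Ada's actual compose.profiles.yaml may differ):
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all              # or a specific number of GPUs
              capabilities: [gpu]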
Environment variables:
# .env
OLLAMA_GPU_DRIVER=cuda
CUDA_VISIBLE_DEVICES=0 # Limit to specific GPU(s)
AMD GPUs (ROCm)
Prerequisites:
AMD GPU with ROCm support (RDNA2+: RX 6000/7000 series, or MI series)
ROCm 5.7 or newer installed on host
/dev/kfd and /dev/dri devices accessible
Supported AMD GPUs:
Consumer: RX 6000 series, RX 7000 series (RDNA2/3)
Professional: Radeon Pro, MI100/MI200 series
Check ROCm compatibility list
Setup:
# Install ROCm (Ubuntu/Debian)
sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/focal/amdgpu-install_5.7.50700-1_all.deb
sudo apt-get install ./amdgpu-install_5.7.50700-1_all.deb
sudo amdgpu-install --usecase=rocm
# Verify installation
rocminfo
rocm-smi
Using ROCm with Ada:
# Use the rocm profile (already configured in compose.profiles.yaml)
docker compose --profile rocm up -d
# Optional environment variables (in .env):
# HIP_VISIBLE_DEVICES=0 # Limit to specific GPU
# HSA_OVERRIDE_GFX_VERSION=10.3.0 # ISA override for older GPUs
That’s it! Ada ships with ROCm support pre-configured.
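For reference, a ROCm profile like this typically uses the ROCm build of the Ollama image and passes through the kernel devices listed in the prerequisites. A minimal sketch in Compose syntax (service and group names are illustrative; Ada's actual compose.profiles.yaml may differ):
services:
  ollama:
    image: ollama/ollama:rocm
    devices:
      - /dev/kfd               # ROCm compute interface
      - /dev/dri               # GPU render nodes
    group_add:
      - video
      - render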
Troubleshooting:
If models fail to load: check HSA_OVERRIDE_GFX_VERSION (find your GPU's gfx version with rocminfo | grep gfx)
If Docker can't see the GPU: ensure your user is in the video and render groups
Performance issues: update to the latest ROCm version
Apple Silicon (Metal)
Prerequisites:
Mac with M1, M2, M3, or M4 chip
macOS 12.3 or later
Setup:
# Install Ollama natively (recommended, better than Docker)
curl -fsSL https://ollama.com/install.sh | sh
# Or use Docker (slower due to virtualization)
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Notes:
Native Metal acceleration is automatic
Unified memory makes smaller Macs (8GB) usable for 7B models
16GB+ recommended for larger models
Docker performance is reduced due to virtualization overhead
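If you run Ollama natively on macOS but keep the rest of Ada in Docker, the containers must reach Ollama on the host. A sketch, assuming Ada reads its Ollama endpoint from a variable in .env (the variable name below is illustrative; check Ada's .env.example for the real one):
# .env
# Docker Desktop on macOS exposes the host as host.docker.internal
OLLAMA_BASE_URL=http://host.docker.internal:11434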
Vulkan (Experimental)
Status: Opt-in experimental feature (as of Ollama v0.13+)
Supported hardware:
Any GPU with Vulkan 1.2+ support
Useful for Intel GPUs, older AMD GPUs, or non-standard setups
Setup:
# Set environment variable to enable Vulkan
OLLAMA_VULKAN=1 ollama serve
Limitations:
Not all models may work
Performance may be lower than CUDA/ROCm/Metal
Experimental - expect bugs
CPU-Only Mode
Use cases:
No GPU available
Testing/development
Low-power environments
Performance expectations:
7B models: 1-5 tokens/second (depending on CPU)
13B+ models: Very slow, may need quantized models (Q4)
Memory: RAM = model size + 2-4GB overhead
Setup:
# Ollama automatically falls back to CPU if no GPU detected
# No special configuration needed
# Optional: Limit CPU threads
OLLAMA_NUM_THREADS=8 ollama serve
Optimization tips:
Use smaller models (1B-7B)
Use heavily quantized models (Q4_0, Q4_K_M)
Close other applications to free RAM
Consider using swap if RAM constrained (will be very slow)
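As a concrete example of the first two tips, pulling a small model at an aggressive quantization looks like this (the model tag is illustrative; check ollama.com/library for the tags actually available):
# Small model with Q4_K_M quantization (roughly 2GB of RAM for the weights)
ollama pull llama3.2:3b-instruct-q4_K_M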
Hardware Recommendations
Budget Build (~$500)
GPU: Used RX 6600/6700 XT (8-12GB VRAM, ~$200-300)
CPU: AMD Ryzen 5 5600 or Intel i5-12400
RAM: 16GB DDR4
Storage: 500GB SSD
Power: 550W PSU
Use case: 7B-13B models, good for experimentation
Mid-Range Build (~$1200)
GPU: RTX 4060 Ti 16GB or RX 7800 XT (16GB VRAM)
CPU: AMD Ryzen 7 5700X3D or Intel i7-13700
RAM: 32GB DDR4/DDR5
Storage: 1TB NVMe SSD
Power: 750W PSU
Use case: 13B-30B models, solid daily driver
High-End Build (~$3000)
GPU: RTX 4090 (24GB VRAM) or RX 7900 XTX (24GB)
CPU: AMD Ryzen 9 7950X or Intel i9-13900K
RAM: 64GB DDR5
Storage: 2TB NVMe SSD (PCIe 4.0+)
Power: 1000W PSU
Use case: 30B-70B models, professional use
Ultra Budget (<$100 + existing hardware)
Option 1: Raspberry Pi 5 (8GB) + AI HAT (~$100)
Very slow, 1B-3B models only
Great for learning/tinkering
Option 2: Used office PC + used GPU
Dell OptiPlex (~$50) + RX 580 (~$40)
7B models workable
Option 3: Orange Pi 5 Plus (~$80)
NPU support, 3B models
For single-board computers: See Ada on Single-Board Computers (SBCs) for detailed recommendations on Raspberry Pi, Orange Pi, Rock Pi, and other ARM boards.
Model Size vs VRAM Requirements
| Model Size | Quantization | VRAM Needed | Recommended GPU |
|---|---|---|---|
| 1B | Q4 | ~1GB | Any GPU, even integrated |
| 3B | Q4 | ~2GB | GTX 1050, RX 560 |
| 7B | Q4 | ~4GB | GTX 1660, RX 5600 |
| 7B | Q8/FP16 | ~8GB | RTX 3060, RX 6600 |
| 13B | Q4 | ~8GB | RTX 3060, RX 6600 |
| 13B | Q8/FP16 | ~16GB | RTX 4060 Ti 16GB, RX 7800 XT |
| 30B | Q4 | ~16GB | RTX 4060 Ti 16GB, RX 7800 XT |
| 70B | Q4 | ~40GB | Multiple GPUs or A100/H100 |
Note: Add 2-4GB overhead for system/context
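These figures follow a rough rule of thumb: weight memory is about the parameter count times the bytes per weight implied by the quantization, plus the context and system overhead noted above. A back-of-envelope check of the 7B/Q4 row (an estimate, not an exact figure):
# 7B parameters at ~4 bits (0.5 bytes) per weight
# 7e9 * 0.5 bytes ≈ 3.5 GB of weights, i.e. the ~4GB row before overhead
awk 'BEGIN { printf "%.1f GB\n", 7e9 * 0.5 / 1e9 }'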
Performance Tuning
Environment Variables
# General
OLLAMA_NUM_PARALLEL=1 # Concurrent requests (increase for multi-user)
OLLAMA_MAX_LOADED_MODELS=1 # Max number of models kept loaded at once
OLLAMA_KEEP_ALIVE=24h # How long to keep model loaded
OLLAMA_CONTEXT_LENGTH=4096 # Context window size
# CUDA-specific
CUDA_VISIBLE_DEVICES=0 # Which GPU(s) to use
OLLAMA_DEBUG=INFO # Verbose logging
# ROCm-specific
HIP_VISIBLE_DEVICES=0 # Which GPU(s) to use
HSA_OVERRIDE_GFX_VERSION=10.3.0 # ISA override if needed
# CPU-specific
OLLAMA_NUM_THREADS=8 # CPU threads to use
Model Selection
For speed:
Use Q4_K_M quantization (good quality, fast)
Smaller context windows (2048 vs 8192)
Smaller models (7B vs 13B)
For quality:
Use Q8 or FP16 quantization
Larger context windows
Larger models with VRAM to spare
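As an example of the speed-oriented tips, Ollama can derive a variant of a model with a smaller context window from a Modelfile (the base model tag and the new name are illustrative):
# Create a speed-oriented variant with a 2048-token context window
cat > Modelfile <<'EOF'
FROM llama3.1:8b-instruct-q4_K_M
PARAMETER num_ctx 2048
EOF
ollama create llama3.1-fast -f Modelfile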
Power Consumption Benchmarks
| Configuration | Idle | Light Load (7B) | Heavy Load (30B) |
|---|---|---|---|
| RTX 4090 | ~30W | ~150W | ~400W |
| RX 7900 XTX | ~25W | ~180W | ~320W |
| RTX 3060 | ~15W | ~100W | ~170W |
| RX 6600 | ~10W | ~80W | ~130W |
| M2 Max | ~5W | ~25W | ~40W |
| CPU (Ryzen 7) | ~30W | ~90W | ~150W |
Monthly cost (at $0.12/kWh, 24/7 idle + 4h heavy use daily):
High-end GPU: ~$25-35/month
Mid-range GPU: ~$15-20/month
Apple Silicon: ~$5-8/month
CPU-only: ~$10-15/month
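As a rough check of these figures (a sketch assuming whole-system draw rather than the GPU-only numbers in the table above, e.g. about 120W at idle and 350W under load for a mid-range build):
# 24h/day idle + 4h/day heavy use, 30 days, $0.12/kWh
awk 'BEGIN {
  idle  = 0.120 * 24 * 30;   # ~86 kWh/month
  heavy = 0.350 *  4 * 30;   # ~42 kWh/month
  printf "~$%.0f/month\n", (idle + heavy) * 0.12
}'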
Hackable Hardware Projects
For detailed single-board computer builds: See Ada on Single-Board Computers (SBCs) - comprehensive guide to running Ada on Raspberry Pi, Orange Pi, Rock Pi, and other ARM boards.
Raspberry Pi 5 + AI HAT
Cost: ~$100 total
Models: 1B-3B quantized
Pros: Ultra low power, portable, great for learning
Cons: Very limited, slow inference
Details: See Ada on Single-Board Computers (SBCs)
Used Gaming Laptop
Cost: $300-600
Models: 7B-13B (depending on GPU)
Pros: Portable, all-in-one
Cons: Harder to upgrade, thermal limits
Look for: RTX 3060+ or RX 6600M+
Mini PC + eGPU
Cost: ~$400-700
Models: Depends on eGPU
Pros: Compact, upgradeable GPU
Cons: Thunderbolt bottleneck, complexity
Example: NUC + Razer Core X + used GPU
DIY NAS + GPU
Cost: ~$500-1000
Models: 7B-30B
Pros: Doubles as storage, always-on
Cons: Power consumption, noise
Build: Cheap server board + used workstation GPU
Cloud Alternatives
When privacy isn’t critical:
| Provider | Cost | GPU Options | Notes |
|---|---|---|---|
|  | ~$0.10-0.50/hr | RTX 3090, 4090, A100 | Spot instances, cheapest |
|  | ~$0.30-1.00/hr | RTX 4090, A100 | More reliable |
|  | ~$1.10/hr | A100, H100 | Professional tier |
Setup Ollama on cloud:
# SSH into instance
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull your-model
# Expose to local machine
ssh -L 11434:localhost:11434 user@cloud-ip
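Once the tunnel is up, the cloud instance looks like a local Ollama to Ada. A quick check from your local machine:
# Should list the models pulled on the cloud instance
curl http://localhost:11434/api/tags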
Troubleshooting
SELinux Volume Permissions (Fedora/RHEL)
On systems with SELinux enforcing (Fedora, RHEL, CentOS), you may see permission denied errors when containers try to access mounted volumes.
Symptom: Containers fail to start or can’t write to ./data/ directories.
Solution: Add :z suffix to volume mounts in compose.yaml:
volumes:
- ./data/chroma:/data:z
- ./data/brain:/data:z
The :z flag tells Docker to relabel the volume content with a shared label
that all containers can access.
Warning
The :z suffix is only needed on SELinux-enabled systems; it's harmless but unnecessary elsewhere.
GPU not detected
# NVIDIA
nvidia-smi # Should show GPU
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
# AMD
rocm-smi # Should show GPU
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/pytorch:latest rocminfo
# Check Ollama logs
docker logs ada-v1-ollama-1
Out of Memory (OOM)
Use smaller model (13B → 7B)
Use more aggressive quantization (Q8 → Q4)
Reduce context length: OLLAMA_CONTEXT_LENGTH=2048
Close other GPU applications
Slow inference
Check GPU utilization: nvidia-smi or rocm-smi
Ensure the model fully fits in VRAM
Check thermal throttling
Try different quantization
Model compatibility
Some models don’t work with all backends
ROCm may need specific model formats
Check model card on ollama.com/library
FAQ
Q: Can I mix NVIDIA and AMD GPUs?
A: No, Ollama uses one backend at a time. Choose the better GPU.
Q: Will integrated graphics work?
A: Technically yes (via Vulkan), but performance will be poor. Only for small models.
Q: What about Intel Arc GPUs?
A: Limited support via Vulkan. Experimental, your mileage may vary.
Q: Can I run multiple models on one GPU?
A: Yes, set OLLAMA_MAX_LOADED_MODELS=2 or higher, but total VRAM must fit all loaded models. (OLLAMA_NUM_PARALLEL controls concurrent requests to a single model.)
Q: Do I need a GPU for embeddings?
A: No, CPU embeddings are fast enough for most use cases.
Q: What’s the minimum for a good experience?
A: 8GB VRAM GPU + 16GB RAM for 7B models, 16GB VRAM + 32GB RAM for 13B+
Getting Help
Need help? Open an issue on GitHub with:
Your hardware (GPU, RAM, OS)
Ollama version:
docker exec ada-v1-ollama-1 ollama --version
Logs:
docker logs ada-v1-ollama-1 --tail 100