Running Ada Locally (No Docker Required!)
Ada now supports running entirely locally without Docker! This is the recommended deployment mode for most users.
Quick Start
Prerequisites
Python 3.13+
python3 --version # Should show 3.13 or higher
Ollama (for LLM inference)
# Install from https://ollama.ai
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama
ollama serve
# Pull a model
ollama pull qwen2.5-coder:7b
Installation
# Clone the repository
git clone https://github.com/luna-system/ada.git
cd ada
# Run setup wizard
python3 ada_cli.py setup
# Or manual setup:
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e .
cp .env.example .env
Running Ada
# Activate virtual environment
source .venv/bin/activate
# Start Ada (auto-detects local Ollama)
ada run
# Or explicitly use local mode
ada run --local
# Development mode (hot reload)
ada run --local --dev
Ada will:
✅ Detect your local Ollama installation
✅ Use embedded ChromaDB (no separate database process)
✅ Start the brain API on http://localhost:7000
✅ Store all data in the ./data/ directory
Quick Commands
# Check system health
ada doctor
# View running services
ada status
# Chat directly from terminal
ada chat "What's the weather today?"
# Stop Ada
ada stop
Architecture: Local Mode
Terminal                         Local Ollama (port 11434)
   │                                   │
   ├─ ada run ─────────────────────────┤
   │                                   │
   └─ Brain API ───────────────────────┘
        (port 7000)
           │
           └─ Embedded ChromaDB
                (./data/chroma/)
No Docker containers! Everything runs as local processes.
Comparison: Local vs Docker
| Feature | Local Mode | Docker Mode |
|---|---|---|
| Setup Time | < 5 minutes | 10-15 minutes |
| Memory Usage | ~2GB (Ollama + Brain) | ~4GB (all containers) |
| Startup Time | < 5 seconds | ~30 seconds |
| Model Storage | Single copy | Separate container volume |
| GPU Support | Native (CUDA/ROCm/Metal) | Requires GPU passthrough |
| Development | Hot reload built-in | Requires rebuild |
| Portability | Requires Python/Ollama | Requires Docker |
| Data Backup | Simple | Container volumes |
Recommendation: Use local mode unless you need multi-service orchestration.
Configuration
.env for Local Mode
# Minimal local configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:7b
CHROMA_MODE=embedded
DATA_DIR=./data
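These are ordinary environment variables, so your own scripts can consume the same settings. A minimal sketch, assuming python-dotenv is installed; Ada's own configuration loading may differ:

```python
# Illustrative only: reading the .env values above from a script.
# Defaults mirror the minimal local configuration shown in this section.
import os
from pathlib import Path

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # picks up ./.env if present

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "qwen2.5-coder:7b")
CHROMA_MODE = os.getenv("CHROMA_MODE", "embedded")
DATA_DIR = Path(os.getenv("DATA_DIR", "./data"))

print(f"Ollama: {OLLAMA_MODEL} @ {OLLAMA_BASE_URL}, data in {DATA_DIR} ({CHROMA_MODE})")
```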
Advanced Options
# Use different port
ada run --local --port 8080
# Bind to specific interface
ada run --local --host 127.0.0.1
# Development mode (auto-reload)
ada run --local --dev
Features
All core Ada features work in local mode:
✅ RAG (Retrieval-Augmented Generation) - Embedded ChromaDB
✅ Conversation Memory - Stored in ./data/chroma/
✅ Streaming Responses - Full SSE support
✅ Specialists - All specialists work (OCR, web search, wiki, etc.)
✅ Matrix Bridge - The containerized bridge can connect to the local brain
✅ CLI Adapter - Works out of the box
✅ Web UI - Use nginx proxy or run Astro dev server
What still requires Docker (all optional):
❌ Web UI (nginx) - Optional, can use Astro dev server instead
❌ Matrix Bridge - Optional, requires containerized setup
❌ Automated backups - Optional maintenance scripts
Accessing the API
From Browser
http://localhost:7000/v1/healthz
Using ada-cli
# Interactive mode
ada-cli
# One-shot query
ada-cli "Tell me about Python"
# JSON output
ada-cli --json "List 3 facts about Mars"
Using curl
curl -X POST http://localhost:7000/v1/chat/stream \
-H "Content-Type: application/json" \
-d '{"prompt":"Hello Ada!"}' \
--no-buffer
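To consume the same stream from Python, here is a minimal client sketch. It assumes the endpoint and payload from the curl call above and SSE-style `data:` lines; adjust the parsing to whatever event format your brain build actually emits.

```python
# Minimal streaming-client sketch for /v1/chat/stream (see curl example above).
# The "data:"-prefixed line handling assumes SSE framing; payloads may be JSON.
import requests

resp = requests.post(
    "http://localhost:7000/v1/chat/stream",
    json={"prompt": "Hello Ada!"},
    stream=True,
    timeout=300,
)
resp.raise_for_status()

for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data:"):
        continue  # skip keep-alives and non-data SSE lines
    print(line[len("data:"):].strip())
```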
Data Management
Data Directory Structure
./data/
├── chroma/ # Embedded vector database
│ ├── personas/ # Persona documents
│ ├── faqs/ # FAQ collection
│ ├── memories/ # Long-term memories
│ └── turns/ # Conversation history
├── brain/ # Brain service data
│ ├── notices.json # System notifications
│ └── uploads/ # Uploaded files
└── backups/ # Optional manual backups
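Because ChromaDB runs embedded, you can open the store directly when debugging. The sketch below assumes the collection names match the directories shown above (`personas`, `faqs`, `memories`, `turns`); verify against your install, and stop Ada first so the database isn't held open by another process.

```python
# Debugging sketch: count documents in the embedded ChromaDB store.
# Collection names are assumed from the ./data/chroma/ layout above.
import chromadb

client = chromadb.PersistentClient(path="./data/chroma")

for name in ["personas", "faqs", "memories", "turns"]:
    try:
        print(f"{name}: {client.get_collection(name).count()} documents")
    except Exception:
        print(f"{name}: not found")
```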
Backing Up
# Simple backup
tar -czf ada-backup-$(date +%Y%m%d).tar.gz data/
# Restore
tar -xzf ada-backup-20250116.tar.gz
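If you prefer timestamped backups with simple rotation over the one-liner, here is a minimal Python sketch; the retention count and the `./data/backups/` target are illustrative.

```python
# Sketch: timestamped backup of ./data/ into ./data/backups/, keeping the
# newest 5 archives. Run from the repository root; paths are illustrative.
import tarfile
import time
from pathlib import Path

DATA_DIR = Path("data")
BACKUP_DIR = DATA_DIR / "backups"
KEEP = 5

BACKUP_DIR.mkdir(parents=True, exist_ok=True)
archive = BACKUP_DIR / f"ada-backup-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"

with tarfile.open(archive, "w:gz") as tar:
    for child in DATA_DIR.iterdir():
        if child != BACKUP_DIR:      # don't back up previous backups
            tar.add(child, arcname=child.name)

for old in sorted(BACKUP_DIR.glob("ada-backup-*.tar.gz"))[:-KEEP]:
    old.unlink()  # prune archives beyond the newest KEEP
```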
Migrating from Docker
# 1. Stop Docker services
docker compose down
# 2. Copy data from Docker volumes
docker run --rm -v ada-v1_chroma-data:/data -v $(pwd)/data:/backup \
alpine tar -czf /backup/chroma.tar.gz -C /data .
# 3. Extract to local data directory
mkdir -p data/chroma
tar -xzf data/chroma.tar.gz -C data/chroma/
# 4. Update .env for local mode
cp .env.example .env
# Edit .env - use local mode configuration
# 5. Start local mode
ada run --local
Troubleshooting
“Ollama not running”
Solution:
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If not, start it
ollama serve
# Check if model is available
ollama list
ollama pull qwen2.5-coder:7b # If model missing
“ModuleNotFoundError: No module named ‘brain’”
Solution:
# Ensure you're in virtual environment
source .venv/bin/activate
# Reinstall in editable mode
pip install -e .
# Verify installation
ada --version
“Port 7000 already in use”
Solution:
# Find process using port
lsof -i :7000
# Kill it
kill -9 <PID>
# Or use different port
ada run --local --port 8000
“Permission denied: ./data/chroma”
Solution:
# Fix permissions
chmod -R 755 data/
# Or run with sudo (not recommended)
sudo ada run --local
Ollama connection refused from brain
Issue: Brain can’t reach Ollama even though it’s running.
Solution:
# Check Ollama is listening on all interfaces (not just 127.0.0.1)
ss -tlnp | grep 11434
# If only on 127.0.0.1, restart with:
OLLAMA_HOST=0.0.0.0:11434 ollama serve
ChromaDB errors
Solution:
# Clear ChromaDB data (WARNING: loses history!)
rm -rf data/chroma/
# Restart Ada
ada run --local
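If only one collection is misbehaving, you can drop just that collection instead of wiping the whole store; a sketch assuming the collection names from the data layout earlier (stop Ada before running it):

```python
# Targeted alternative to `rm -rf data/chroma/`: delete a single collection.
# The collection name is an assumption based on the data layout above.
import chromadb

client = chromadb.PersistentClient(path="./data/chroma")
client.delete_collection("turns")  # e.g. reset only the conversation history
```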
Performance Tips
Memory Optimization
# Use a smaller model for faster responses
# Edit .env: OLLAMA_MODEL=qwen2.5-coder:3b
ada run --local
# Retrieve less RAG context per response (in .env)
RAG_TURN_TOP_K=2
RAG_MEMORY_TOP_K=1
CPU-Only Performance
If you don’t have a GPU, use a smaller model:
ollama pull qwen2.5-coder:3b
OLLAMA_MODEL=qwen2.5-coder:3b ada run --local
GPU Acceleration
Ada automatically uses GPU if available:
NVIDIA: CUDA toolkit (automatically detected by Ollama)
AMD: ROCm (Linux only)
Apple: Metal (macOS M1/M2)
No configuration needed - Ollama handles it!
Development Workflow
Hot Reload
# Start in development mode
ada run --local --dev
# Edit brain/*.py files
# Changes apply immediately (no restart needed)
Running Tests
# Unit tests
pytest tests/
# Integration tests (requires brain running)
ada run --local &
pytest tests/test_integration.py
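A minimal integration test could look like the sketch below; the test names are illustrative rather than the repo's actual suite, and it assumes the endpoints shown earlier (`/v1/healthz`, `/v1/chat/stream`) on port 7000.

```python
# Illustrative integration-test sketch (not the repo's actual tests).
# Requires a locally running brain: `ada run --local`.
import requests

BRAIN_URL = "http://localhost:7000"


def test_healthz_responds():
    resp = requests.get(f"{BRAIN_URL}/v1/healthz", timeout=5)
    assert resp.status_code == 200


def test_chat_stream_returns_data():
    resp = requests.post(
        f"{BRAIN_URL}/v1/chat/stream",
        json={"prompt": "ping"},
        stream=True,
        timeout=120,
    )
    assert resp.status_code == 200
    assert next(resp.iter_content(chunk_size=None), b"")  # at least one chunk
```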
Adding Specialists
# 1. Create specialist
vim brain/specialists/my_specialist.py
# 2. Register in __init__.py
vim brain/specialists/__init__.py
# 3. Restart brain (dev mode auto-reloads)
# Changes take effect immediately
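The real specialist interface is defined in brain/specialists/__init__.py; the skeleton below is purely hypothetical (class name, hook names, and signatures are invented for illustration) and only shows the overall shape of the change. Mirror an existing specialist in the repo for the actual contract.

```python
# brain/specialists/my_specialist.py -- hypothetical skeleton only.
# The real base class, hook names, and registration mechanism live in
# brain/specialists/__init__.py; copy an existing specialist, not this sketch.

class MySpecialist:
    """Illustrative placeholder specialist."""

    name = "my_specialist"

    def can_handle(self, prompt: str) -> bool:
        # Hypothetical routing hook: claim prompts that mention our topic.
        return "my topic" in prompt.lower()

    def run(self, prompt: str) -> str:
        # Hypothetical execution hook: return text for the brain to use.
        return f"MySpecialist handled: {prompt}"
```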
Hybrid Setups
Local Brain + Docker Services
Run brain locally but use Docker for optional services:
# Start only web UI and matrix bridge
docker compose up -d frontend matrix-bridge
# Run brain locally
ada run --local
# Access web UI at http://localhost:5000
Local Development + Docker Ollama
Use Docker for Ollama isolation:
# Start Ollama in Docker
docker compose up -d ollama
# Run brain locally pointing to Docker Ollama
OLLAMA_BASE_URL=http://localhost:11434 ada run --local
Next Steps
Web UI: See docs/adapters.rst for Astro dev server
Matrix Bot: See matrix-bridge/QUICKSTART.md
Custom Persona: Edit persona.md and restart
Specialists: See docs/specialists.rst
Why Local Mode?
Benefits
Faster startup - No container orchestration
Less memory - One Ollama instance shared across tools
Native GPU - Direct hardware access, no passthrough
Easier development - Hot reload, native debugging
Simpler backups - Just copy the ./data/ folder
Lower disk usage - No duplicate model storage
When to use Docker instead
Production deployment with multiple services
Isolated environments (testing, CI/CD)
Teams sharing consistent setup
Remote deployment without shell access
Community & Support
Issues: https://github.com/luna-system/ada/issues
Discussions: https://github.com/luna-system/ada/discussions
Matrix:
#ada-assistant:matrix.org
Made with 💜 by the Ada team
Docker optional, autonomy essential.