Testing Guide

This guide covers testing Ada’s Brain service, including manual testing, automated tests with pytest, and API endpoint testing.

For API endpoint documentation, see API Reference. For configuration options, see Configuration Reference. For architecture overview, see Architecture.

Running Services for Testing

Option 2: Local Development

If testing the FastAPI backend directly without Docker:

cd /home/luna/Code/ada-v1
source .venv/bin/activate
python -m uvicorn brain.app:app --host 0.0.0.0 --port 7000

The API will be available at http://localhost:7000/v1/* with interactive docs at http://localhost:7000/docs.

Manual Testing

Testing Markdown Rendering

With the frontend running (via Docker Compose or local dev server):

  1. Type these messages into the chat composer and submit them (note: inline markdown is rendered; block-level markdown/code fences render with syntax highlighting):

    • Hello **bold text**

    • *italic* and **bold** together

    • A link: [OpenAI](https://openai.com)

    • Inline code: `const a = 1;`

    • Code block (wrap it in triple-backtick fences):

    function test() {
      return 42;
    }
    
  2. Verify:

    • Messages render with bold/italic/links/inline code formatting

    • Links open in a new tab and have rel="noopener noreferrer" set

    • Code fences (```…```) render with a monospace, sanitized <pre><code> block and basic syntax highlighting

    • The ‘Thinking’ bubble (if enabled) renders inline markdown similarly

    • If markdown does not render, check the small header status next to the brand for marked:✓ DOMPurify:✓ hljs:✓ — these indicate the client-side libraries loaded successfully

    • If any are missing (✕), open DevTools console to see errors and clear browser cache (or refresh with Ctrl/Cmd+Shift+R)

  3. Optional: Check that the chat input and memory list still work as before.

Testing API Endpoints Directly

To test API endpoints without the frontend:

# Health check
curl http://localhost:7000/v1/healthz

# Get recent conversations
curl http://localhost:7000/v1/conversations/recent

# Query memory
curl "http://localhost:7000/v1/memory?query=example"

# Create a memory
curl -X POST http://localhost:7000/v1/memory \
  -H "Content-Type: application/json" \
  -d '{"content": "Test memory", "memory_type": "important"}'

# Interactive API docs
# Open http://localhost:7000/docs in your browser

For complete endpoint documentation, see API Reference and API Usage Guide. For code examples, see Examples.
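
The same checks can be scripted. Below is a minimal Python sketch using httpx that mirrors the curl commands above (it assumes the endpoints return JSON):

import httpx

# Health check
r = httpx.get("http://localhost:7000/v1/healthz")
print(r.status_code, r.json())

# Query memory
r = httpx.get("http://localhost:7000/v1/memory", params={"query": "example"})
print(r.json())

# Create a memory
r = httpx.post(
    "http://localhost:7000/v1/memory",
    json={"content": "Test memory", "memory_type": "important"},
)
print(r.status_code, r.json())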

Testing with Frontend Proxy

All API calls from the frontend go through the Nginx reverse proxy:

# Same endpoints available via proxy
curl http://localhost:5000/api/health
curl http://localhost:5000/api/conversations/recent

# Stream endpoint (SSE)
curl http://localhost:5000/api/chat/stream -X POST \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello", "conversation_id": "test"}'

The proxy remaps /api/* to /v1/* on the brain service with proper SSE handling.
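
To consume the SSE stream from a script instead of curl, here is a minimal sketch using httpx streaming (same endpoint and payload as above; the exact event format depends on the brain service):

import httpx

payload = {"message": "Hello", "conversation_id": "test"}

# Stream the response through the Nginx proxy and print each SSE data event.
with httpx.stream(
    "POST",
    "http://localhost:5000/api/chat/stream",
    json=payload,
    timeout=None,  # model responses can take a while
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if line.startswith("data: "):
            print(line[len("data: "):])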

Automated Testing with Pytest

Ada uses a hybrid testing approach combining traditional example-based tests with property-based testing using Hypothesis. All tests are located in the tests/ directory and run in a dedicated Docker container for consistency.

Test Infrastructure

The testing infrastructure includes:

  • Tests Container: Dedicated Docker service with Python 3.13 and all dependencies

  • Pytest Configuration: pytest.ini with sensible defaults

  • Fixtures: Shared test fixtures in tests/conftest.py and feature-specific conftest.py files

  • Hypothesis: Property-based testing for algorithmic correctness

  • Convenience Script: scripts/run.sh wrapper for common test commands

  • Organized Structure: Tests grouped by feature and type (traditional vs property-based)

Running Tests

Direct Way (Full Control)

# Run all tests
docker compose run --rm scripts pytest

# Run specific file with verbose output
docker compose run --rm scripts pytest tests/test_prompt_builder.py -vv

# Run only async tests
docker compose run --rm scripts pytest -k "async"

Test Structure

The test suite uses an organized directory structure:

tests/
  conftest.py                   # Global fixtures (rag_store, conversation_id)

  prompt_builder/               # Traditional example-based tests
    conftest.py                 # Shared fixtures for prompt_builder
    test_context_retriever.py   # Context retrieval from RAG (14 tests)
    test_section_builder.py     # Prompt section formatting
    test_prompt_assembler.py    # Final prompt assembly

  property/                     # Property-based tests (Hypothesis)
    conftest.py                 # Hypothesis configuration
    test_token_properties.py    # Token counting invariants (11 tests)
    test_memory_properties.py   # Memory decay, ranking (v2.0)

  test_rag.py                   # RAG store tests (6 tests)
  test_specialists.py           # Specialist system tests
  test_ai_documentation.py      # Documentation validation tests

Traditional Tests (example-based):

Test specific API behavior, integration workflows, and regression cases. Use parametrization to reduce duplication.

Property Tests (Hypothesis):

Test mathematical properties that should ALWAYS hold (bounds, monotonicity, etc.). Hypothesis generates 100+ random test cases automatically.

Test Types

Traditional Tests verify specific behavior:

def test_context_retriever_initialization():
    """ContextRetriever initializes correctly."""
    retriever = ContextRetriever()
    assert retriever is not None

@pytest.mark.parametrize("method,kwargs,expected_len", [
    ("get_memories", {"query": "test", "k": 5}, 2),
    ("get_faqs", {"query": "test", "k": 3}, 2),
])
def test_retrieval_methods(retriever, method, kwargs, expected_len):
    """Test multiple similar methods with one parametrized test."""
    method = getattr(retriever, method)
    result = method(**kwargs)
    assert len(result) == expected_len

Property Tests verify universal invariants:

from hypothesis import given, strategies as st, example

@given(st.text(min_size=1, max_size=10000))
@example("🎵" * 100)  # Always test this edge case
def test_positive_token_count(text):
    """Non-empty text ALWAYS produces positive tokens."""
    monitor = TokenBudgetMonitor()
    tokens = monitor.count_tokens(text)
    assert tokens > 0

Hypothesis will generate random inputs to try to falsify your assertions!

Adding New Tests

When to use which pattern:

Traditional Tests (example-based)
  • ✅ Specific API behavior

  • ✅ Integration tests

  • ✅ Regression tests for known bugs

  • ✅ Business logic and workflows

Property Tests (Hypothesis)
  • ✅ Mathematical invariants

  • ✅ Algorithmic properties

  • ✅ Edge case discovery

  • ✅ Universal constraints

Example: Traditional Test

# tests/prompt_builder/test_my_feature.py
import pytest
from brain.my_module import my_function

def test_my_feature(rag_store):
    """Test description."""
    result = my_function(rag_store)
    assert result == expected_value
    assert len(result) > 0

@pytest.mark.parametrize("input,expected", [
    ("hello", 2),
    ("hello world", 3),
])
def test_multiple_cases(input, expected):
    """Test multiple similar cases with parametrization."""
    result = my_function(input)
    assert result == expected

Example: Property Test

# tests/property/test_my_properties.py
from hypothesis import given, strategies as st, example

@given(st.text(min_size=1, max_size=1000))
@example("edge case")  # Always test specific cases
def test_universal_property(text):
    """This should ALWAYS be true for ANY input."""
    result = my_function(text)
    assert result > 0  # Non-empty input always produces positive result
    assert isinstance(result, int)  # Result is always an integer

No rebuild needed! Tests are volume-mounted, so you can add/edit tests and run them immediately.

# Run only property tests
docker compose run --rm scripts pytest tests/property/

# Run only traditional tests for a feature
docker compose run --rm scripts pytest tests/prompt_builder/

Hypothesis Configuration

Property tests use different profiles for different environments:

# Local development (100 examples, fast)
pytest tests/property/

# CI/thorough testing (1000 examples)
HYPOTHESIS_PROFILE=ci pytest tests/property/

# Debugging (10 examples, verbose)
HYPOTHESIS_PROFILE=debug pytest tests/property/ -v

Configuration is in tests/property/conftest.py.
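
A minimal sketch of how such profiles are typically registered with Hypothesis (the actual settings in tests/property/conftest.py may differ; the example counts follow the ones listed above):

# tests/property/conftest.py (sketch)
import os

from hypothesis import Verbosity, settings

settings.register_profile("default", max_examples=100)
settings.register_profile("ci", max_examples=1000)
settings.register_profile("debug", max_examples=10, verbosity=Verbosity.verbose)

# Selected via the HYPOTHESIS_PROFILE environment variable.
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "default"))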

Test Markers

Use markers to categorize tests:

@pytest.mark.slow
def test_expensive_operation():
    """This test takes a while."""
    pass

@pytest.mark.integration
def test_full_workflow():
    """Tests multiple components together."""
    pass

@pytest.mark.unit
def test_isolated_function():
    """Tests a single function."""
    pass

Run specific markers:

# Skip slow tests
docker compose run --rm scripts pytest -m "not slow"

# Run only integration tests
docker compose run --rm scripts pytest -m "integration"
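
Custom markers should be registered so pytest does not warn about unknown marks. If they are not already declared in pytest.ini, a minimal sketch registers them from conftest.py:

# tests/conftest.py (sketch; only needed if the markers are not declared in pytest.ini)
def pytest_configure(config):
    config.addinivalue_line("markers", "slow: test takes a long time to run")
    config.addinivalue_line("markers", "integration: exercises multiple components together")
    config.addinivalue_line("markers", "unit: tests a single isolated function")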

Health Check Script

Comprehensive health check for operational validation (not a test):

./scripts/run.sh health
# Or directly:
docker compose run --rm scripts python /app/scripts/health_check_chroma.py

Checks:

  • Chroma server connectivity

  • Collection existence and document count

  • Embedding generation

  • Memory/FAQ/turn query functionality

  • Specialist docs retrieval

  • Query consistency

  • Persona loading

Exit codes: 0 (healthy), 1 (unhealthy)

CI/CD Integration

The pytest infrastructure is designed for easy CI/CD integration:

GitLab CI Example

# .gitlab-ci.yml
test:
  script:
    - docker compose build scripts
    - docker compose run --rm scripts pytest
    - docker compose run --rm scripts python /app/scripts/health_check_chroma.py

GitHub Actions Example

# .github/workflows/test.yml
name: Test
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run tests
        run: |
          docker compose build scripts
          docker compose run --rm scripts pytest
      - name: Health check
        run: docker compose run --rm scripts python /app/scripts/health_check_chroma.py

Pre-Deployment Health Check

#!/bin/bash
# pre-deploy.sh
if ./scripts/run.sh health && ./scripts/run.sh test; then
  echo "All checks passed, proceeding with deployment"
  docker compose up -d
else
  echo "Checks failed, aborting deployment"
  exit 1
fi

Interactive Debugging

Start an interactive Python session with all dependencies:

./scripts/run.sh shell
# Or directly:
docker compose run --rm scripts python

Then in Python:

>>> from brain.rag_store import RagStore
>>> from brain import config
>>>
>>> store = RagStore()
>>> results = store.retrieve_memories("test query", k=5)
>>> print(results)
>>>
>>> # Test embedding generation
>>> embedding = store.embed_text("sample text")
>>> print(len(embedding))  # Should be 768

Test Coverage Priorities

Current Coverage

✅ Well-Covered:

  • RAG retrieval (6 tests)

  • Prompt building (2 tests)

  • Specialist system (1 test)

⚠️ Needs Coverage:

  • API endpoints (streaming, non-streaming)

  • Error handling (Chroma down, Ollama down)

  • Edge cases (invalid input, timeouts)

  • Specialist execution

  • Memory consolidation

Example API Endpoint Test

# tests/test_api.py
import pytest
import httpx
import json

@pytest.mark.asyncio
async def test_chat_stream_endpoint():
    """Test streaming chat endpoint."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:7000/v1/chat/stream",
            json={"messages": [{"role": "user", "content": "hi"}]}
        )
        assert response.status_code == 200

        chunks = []
        async for line in response.aiter_lines():
            if line.startswith("data: "):
                chunks.append(json.loads(line[6:]))

        assert len(chunks) > 0

@pytest.mark.asyncio
async def test_handles_chroma_down(monkeypatch):
    """Test graceful handling when Chroma is unavailable."""
    def mock_fail(*args, **kwargs):
        raise ConnectionError("Chroma unavailable")

    monkeypatch.setattr("chromadb.HttpClient", mock_fail)

    # Test that API returns 503 instead of crashing
    async with httpx.AsyncClient() as client:
        response = await client.get("http://localhost:7000/v1/healthz")
        assert response.status_code == 503
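
Note that monkeypatch only affects the test process, so a patch like the one above is not visible to a server running in another container. A sketch of one way around this, assuming the FastAPI app is importable as brain.app:app (the same module the uvicorn command above uses), is to drive the app in-process with httpx's ASGITransport:

import httpx
import pytest

from brain.app import app  # the same app object uvicorn serves

@pytest.mark.asyncio
async def test_healthz_in_process():
    """Call the FastAPI app directly, without a running server."""
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/v1/healthz")
        assert response.status_code == 200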

Troubleshooting Tests

Tests Fail Locally but Pass in Container

Solution: Always run tests in the scripts container, not on your host machine. The container has the correct environment:

# ✅ Correct
./scripts/run.sh test

# ❌ Wrong
pytest  # Don't run on host

Import Errors

Problem: ModuleNotFoundError: No module named 'brain'

Solution: Use the scripts container. The PYTHONPATH is configured correctly:

docker compose run --rm scripts pytest

Can’t Reach Services

Problem: httpx.ConnectError: Connection refused

Solution:

  1. Ensure services are running: docker compose ps

  2. Use service names (chroma, ollama), not localhost

  3. Check environment variables in compose.yaml (see the sketch after this list)
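
The sketch below illustrates points 2 and 3: inside the Docker network, containers reach each other by service name, so test code should read hosts from environment variables rather than hardcoding localhost (the variable names here are hypothetical; check compose.yaml for the real ones):

import os

# Hypothetical variable names; the defaults use Docker service names, not localhost.
CHROMA_URL = os.getenv("CHROMA_URL", "http://chroma:8000")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://ollama:11434")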

Test Discovery Issues

Problem: Pytest doesn’t find your tests

Solution:

  1. Name files test_*.py or *_test.py

  2. Name functions test_*

  3. Place files in tests/ directory

  4. Check pytest.ini for testpaths configuration

Best Practices

  1. Always use scripts container: Consistent environment across machines

  2. Exit codes matter: Tests should return 0 (success) or 1 (failure)

  3. Use fixtures: Share common setup via conftest.py

  4. Test in isolation: Each test should be independent

  5. Mock external services: Use pytest-mock or monkeypatch for external APIs

  6. Document test purpose: Clear docstrings for each test function

  7. Run tests before commits: Catch issues early

  8. Add tests for bug fixes: Prevent regression

Resources