API Usage Guide
The Brain service is a pure REST API built on FastAPI, fully documented with docstrings and OpenAPI/Swagger specifications. It handles LLM orchestration with retrieval-augmented generation (RAG).
Quick Start
This guide covers API usage patterns and integration. For setup instructions, see Getting Started. For configuration options, see Configuration Reference.
Viewing API Documentation
Option 1: Interactive API Docs (Recommended)
Start the service and open the interactive docs in your browser:
cd /home/luna/Code/ada-v1
# Option A: Docker Compose (all services)
docker compose up
# Then visit: http://localhost:7000/docs
# Option B: Local development
source .venv/bin/activate
python -m uvicorn brain.app:app --host 0.0.0.0 --port 7000
# Then visit: http://localhost:7000/docs
The /docs endpoint provides:
Swagger UI - Interactive endpoint testing
OpenAPI 3.0 schema - Auto-generated from FastAPI decorators and type hints
Request/response examples - With actual JSON schemas
Try it out - Send real requests directly from the UI
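The raw schema behind the Swagger UI can also be fetched directly. A minimal sketch, assuming the default FastAPI schema path /openapi.json (adjust if the app overrides it):

```python
# Sketch: download the auto-generated OpenAPI schema from a running instance.
import requests

schema = requests.get("http://localhost:7000/openapi.json", timeout=10).json()
print(schema["info"]["title"], schema["info"]["version"])
print(f"{len(schema['paths'])} documented paths")
```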
Option 2: Using pydoc (Built-in)
cd /home/luna/Code/ada-v1
python -m pydoc brain.app | less
Option 3: VS Code Pylance Hover Tooltips
Hover over any function/endpoint name in brain/app.py to see full documentation inline.
API Architecture
Framework
Framework: FastAPI 0.109.0+
Server: Gunicorn + Uvicorn workers (async ASGI)
Port: 7000 (direct) / 5000 (via Nginx proxy)
Workers: 25 Uvicorn workers (CPU × 2 + 1)
Timeout: 300 seconds (for long LLM operations)
Base URL
Direct: http://localhost:7000/v1
Proxied: http://localhost:5000/api (via Nginx, remapped to /v1)
Request/Response Format
Content-Type: application/json
Streaming: Server-Sent Events (SSE) for /v1/chat/stream (see Streaming)
Error Handling: JSON with HTTP status codes and error messages
API Endpoints
All endpoints are fully documented in brain/app.py with docstrings. Below is a summary:
Health & Status
GET /v1/healthz
Purpose: Service health check with dependency status
Returns: JSON with service status, config, persona, Chroma connectivity
Example:
curl http://localhost:7000/v1/healthz
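The same check from Python, useful as a readiness probe in scripts. A minimal sketch; the response fields are whatever /v1/healthz returns:

```python
# Sketch: poll the health endpoint before sending chat requests.
import requests

resp = requests.get("http://localhost:7000/v1/healthz", timeout=5)
resp.raise_for_status()  # a non-2xx status (e.g. 503) raises an HTTPError
print(resp.json())       # service status, config, persona, Chroma connectivity
```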
Media Integration
GET /v1/media/listenbrainz
Purpose: Get user’s recent listening context from ListenBrainz
Returns: User listening data or empty dict if not configured
Example:
curl http://localhost:7000/v1/media/listenbrainz
Chat - Streaming (Recommended)
POST /v1/chat/stream
Purpose: Stream LLM responses token-by-token via Server-Sent Events (SSE)
Request Body:
{
"message": "Your question here",
"conversation_id": "optional-id"
}
Response: Server-Sent Events stream (each line is a JSON event)
Events:
data: {"token": "string"}- Streamed tokendata: {"done": true}- Stream completedata: {"error": "message"}- Error occurred
Example:
curl http://localhost:7000/v1/chat/stream -X POST \
-H "Content-Type: application/json" \
-d '{"message": "Hello", "conversation_id": "chat-1"}'
Note
Nginx reverse proxy configured with proxy_buffering off for real-time SSE
Memory Management
List Memories
GET /v1/memory?query=string&limit=10&conversation_id=id
Purpose: Search long-term memories with semantic query
Query Parameters:
query (string, required): Search text
limit (int, optional): Max results (default: 10)
conversation_id (string, optional): Filter by conversation
Returns: List of memory objects with embeddings and metadata
Example:
curl "http://localhost:7000/v1/memory?query=previous%20topic&limit=5"
Create Memory
POST /v1/memory
Purpose: Create new long-term memory entry
Request Body:
{
"content": "Memory text",
"memory_type": "important|context|fact",
"conversation_id": "optional-conversation-id"
}
Returns: Memory object with ID, embeddings, metadata
Status: 201 Created
Example:
curl -X POST http://localhost:7000/v1/memory \
-H "Content-Type: application/json" \
-d '{"content": "User likes Python", "memory_type": "fact"}'
Delete Memory
DELETE /v1/memory/{mem_id}
Purpose: Delete a specific memory entry
Path Parameter:
mem_id (string): Memory UUID
Returns: Confirmation message
Example:
curl -X DELETE http://localhost:7000/v1/memory/abc-123-def
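A minimal end-to-end sketch of the three memory operations in Python. Field names follow the request shapes documented above; the ID key in the create response is an assumption:

```python
# Sketch: create, search, then delete a memory entry.
import requests

BASE = "http://localhost:7000/v1"

# Create
created = requests.post(f"{BASE}/memory", json={
    "content": "User likes Python",
    "memory_type": "fact",
}).json()
mem_id = created["id"]  # assumed key name for the returned memory ID

# Search
hits = requests.get(f"{BASE}/memory", params={"query": "Python", "limit": 5}).json()
print(hits)

# Delete
requests.delete(f"{BASE}/memory/{mem_id}").raise_for_status()
```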
For detailed memory patterns and best practices, see Memory. For memory schema definitions, see Data Model Reference.
Debug Endpoints
RAG System Stats
GET /v1/debug/rag
Purpose: Get information about RAG system (Chroma vector DB)
Returns: Database stats, indexed document count, embedding model info
Requires: RAG_DEBUG=true environment variable
Example:
curl http://localhost:7000/v1/debug/rag
Assembled Prompt Debug
GET /v1/debug/prompt
Purpose: See the final prompt that will be sent to the LLM
Returns: Full prompt text with context, persona, memory, etc.
Requires: RAG_DEBUG=true environment variable
Example:
curl http://localhost:7000/v1/debug/prompt
Conversation History
Recent Conversations
GET /v1/conversations/recent?limit=10
Purpose: Get list of recent conversations
Query Parameters:
limit (int, optional): Max results (default: 10)
Returns: List of conversation summaries
Example:
curl "http://localhost:7000/v1/conversations/recent?limit=5"
Get Conversation Turns
GET /v1/conversations/{conversation_id}
Purpose: Get all turns (messages) in a specific conversation
Path Parameter:
conversation_id (string): Conversation UUID
Returns: List of turns with timestamps, roles (user/assistant), content
Example:
curl http://localhost:7000/v1/conversations/chat-session-001
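A short Python sketch that lists recent conversations and fetches the turns of the first one. Key names such as `conversation_id` inside the summary objects are assumptions:

```python
# Sketch: browse conversation history via the two endpoints above.
import requests

BASE = "http://localhost:7000/v1"

recent = requests.get(f"{BASE}/conversations/recent", params={"limit": 5}).json()
if recent:
    conv_id = recent[0]["conversation_id"]  # assumed key name in the summary object
    turns = requests.get(f"{BASE}/conversations/{conv_id}").json()
    for turn in turns:
        print(turn)  # each turn carries timestamp, role (user/assistant), content
```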
Quick Reference
Common Operations
Health Check
curl http://localhost:7000/v1/healthz
Streaming Chat
curl -N -X POST http://localhost:7000/v1/chat/stream \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello!", "conversation_id": "chat-123"}'
Search Memories
curl "http://localhost:7000/v1/memory?search=preferences&limit=5"
Add Memory
curl -X POST http://localhost:7000/v1/memory \
-H "Content-Type: application/json" \
-d '{"text": "User prefers concise answers", "importance": 4}'
Response Status Codes
| Code | Endpoint | Meaning |
|---|---|---|
| 200 | All | Success |
| 201 | /memory | Created |
| 400 | /chat* | Invalid parameters |
| 404 | /debug/* | Debug disabled |
| 500 | All | Server error (Ollama/DB issue) |
| 503 | /memory* | RAG not available |
| 503 | /healthz | Critical dependency unavailable |
Python Examples
Non-Streaming Request
import requests
response = requests.post(
'http://localhost:7000/v1/chat',
json={'prompt': 'Hello!', 'include_thinking': True}
)
data = response.json()
print(data['response'])
Streaming Request
import requests
import json
response = requests.post(
    'http://localhost:7000/v1/chat/stream',
    json={'message': 'Hello!'},
    stream=True,
)
for line in response.iter_lines():
    if line.startswith(b'data: '):
        event = json.loads(line[6:])
        if 'token' in event:
            print(event['token'], end='', flush=True)
JavaScript Examples
Using Fetch + EventSource
// Using EventSource (simpler, but GET-only; use the fetch version below to POST a message body)
const es = new EventSource('/api/chat/stream');
es.addEventListener('message', (e) => {
  const data = JSON.parse(e.data);
  if (data.token) {
    document.body.innerHTML += data.token;
  }
  if (data.done) es.close();
});
Using Fetch + ReadableStream
const res = await fetch('/api/chat/stream', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({message: 'Hello!'})
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const {done, value} = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, {stream: true});
  const lines = buffer.split('\n');
  buffer = lines.pop();
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.token) {
        document.body.innerHTML += data.token;
      }
    }
  }
}
Environment Variables
Key environment variables affecting the API:
| Variable | Default | Description |
|---|---|---|
| | | LLM backend |
| | | Vector database |
| | /app/persona.md | Persona configuration |
| RAG_DEBUG | false | Enable debug endpoints |
| | | ListenBrainz username |
| | | ListenBrainz API token |
| | | Web search service URL |
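These variables are read at startup in brain/config.py. A minimal sketch of the pattern; RAG_DEBUG is the only variable name confirmed elsewhere in this guide, and the second variable name is hypothetical:

```python
# Sketch: environment-driven configuration, illustrative only.
import os

RAG_DEBUG = os.getenv("RAG_DEBUG", "false").lower() == "true"  # enables /v1/debug/* endpoints
PERSONA_PATH = os.getenv("PERSONA_PATH", "/app/persona.md")    # hypothetical variable name
```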
Code Structure
Main Files
brain/app.py - Main FastAPI application (850+ lines)
All 10 endpoints with async handlers
Type hints throughout
Full docstrings for each endpoint
Lifespan context manager for startup/shutdown
Modular Components
brain/config.py - Configuration management (60 lines)
brain/rag_store.py - RAG system integration (600+ lines)
brain/llm.py - LLM provider interface (60 lines)
brain/media.py - External media integration (100 lines)
brain/prompt_builder/ - Modular prompt assembly with caching:
  - context_retriever.py - RAG data retrieval (cache-aware)
  - section_builder.py - Section formatting (150 lines)
  - prompt_assembler.py - Final assembly with caching (170 lines)
brain/context_cache.py - Multi-timescale context caching (220 lines)
Type Hints & Documentation
All endpoints use FastAPI type hints:
from fastapi import FastAPI, Query
from fastapi.responses import StreamingResponse
@app.get('/v1/memory')
async def list_memory(
query: str = Query(..., description="Search query text"),
limit: int = Query(10, description="Max results"),
) -> dict:
"""
Search long-term memories with semantic query.
Full docstring with details...
"""
This enables:
✅ Automatic request validation
✅ Type checking with Pylance
✅ Interactive API docs at /docs
✅ OpenAPI 3.0 schema export
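The schema can also be exported programmatically, for example to commit it alongside the code. A short sketch using FastAPI's app.openapi():

```python
# Sketch: dump the OpenAPI 3.0 schema to a file.
import json

from brain.app import app  # the FastAPI instance documented above

with open("openapi.json", "w") as f:
    json.dump(app.openapi(), f, indent=2)
```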
Error Handling
All errors return JSON with HTTP status codes:
{
"detail": "Error message describing what went wrong"
}
Common status codes:
200 - Success
201 - Created
400 - Bad request (invalid parameters)
404 - Not found
500 - Server error
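A client-side sketch of surfacing the `detail` field from these error responses, using the /v1/debug/rag endpoint (404 when debug is disabled) as an example:

```python
# Sketch: read the API's "detail" message on failure.
import requests

resp = requests.get("http://localhost:7000/v1/debug/rag")
if resp.status_code == 404:
    print("Debug endpoints disabled (set RAG_DEBUG=true):", resp.json().get("detail"))
elif not resp.ok:
    print(f"Request failed ({resp.status_code}):", resp.json().get("detail"))
else:
    print(resp.json())
```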
Deployment
Docker Compose (Recommended)
cd /home/luna/Code/ada-v1
docker compose up
Services:
web (port 5000): Nginx reverse proxy + static frontend
brain (port 7000): FastAPI backend with Gunicorn + Uvicorn workers
chroma (port 8000): Vector database
ollama (port 11434): LLM inference server
Local Development
cd /home/luna/Code/ada-v1
source .venv/bin/activate
# Install dependencies
uv sync
# Run development server
python -m uvicorn brain.app:app --host 0.0.0.0 --port 7000 --reload
The --reload flag auto-restarts on code changes.
Production Deployment
The brain/gunicorn_config.py file is configured for production:
gunicorn -c brain/gunicorn_config.py brain.wsgi:app
Configuration:
Uvicorn workers (ASGI)
25 worker processes
300-second timeout (for LLM calls)
Access logging
Graceful shutdown
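A sketch of what such a Gunicorn config file contains, with illustrative values matching the settings listed above; the actual brain/gunicorn_config.py may differ:

```python
# Sketch: Gunicorn settings corresponding to the bullets above (illustrative values).
bind = "0.0.0.0:7000"
workers = 25                                    # CPU x 2 + 1 on this host
worker_class = "uvicorn.workers.UvicornWorker"  # async ASGI workers
timeout = 300                                   # allow long LLM calls
graceful_timeout = 30                           # graceful shutdown window
accesslog = "-"                                 # access logging to stdout
```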
Integration with Frontend
The Nginx reverse proxy (frontend/nginx.conf.template) maps:
/api/* → http://brain:7000/v1/*
So frontend calls to http://localhost:5000/api/chat/stream are proxied to the backend at http://brain:7000/v1/chat/stream.
Special handling:
SSE streaming: proxy_buffering off for real-time events
Headers: X-Forwarded-* headers preserved for logging
Adding New Endpoints
When adding new routes, follow this template:
from fastapi import FastAPI, Query
from fastapi.responses import JSONResponse
@app.get('/v1/new-endpoint', tags=['category'])
async def new_endpoint(
param1: str = Query(..., description="Parameter description"),
param2: int = Query(default=10, description="Optional param"),
) -> dict:
"""
Brief one-line description.
Longer explanation of what this endpoint does and when to use it.
**Parameters:**
- param1 (str): Required parameter
- param2 (int): Optional parameter (default: 10)
**Returns:**
- dict with keys: result_key1, result_key2
**Raises:**
- ValueError: If param1 is empty
**Example:**
.. code-block:: bash
curl "http://localhost:7000/v1/new-endpoint?param1=value&param2=20"
Response:
.. code-block:: json
{"result": "value"}
"""
try:
# Your implementation
return {"result": "value"}
except ValueError as e:
return JSONResponse(
status_code=400,
content={"detail": str(e)}
)
The docstring will automatically appear in:
FastAPI /docs (Swagger UI)
python -m pydoc brain.app
VS Code Pylance tooltips