API Usage Guide
===============

The Brain service is a **FastAPI** REST API, fully documented with docstrings and OpenAPI/Swagger specifications. It handles LLM orchestration with retrieval-augmented generation (RAG).

Quick Start
-----------

This guide covers API usage patterns and integration. For setup instructions, see :doc:`getting_started`. For configuration options, see :doc:`configuration`.

Viewing API Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Option 1: Interactive API Docs (Recommended)**

Start the service and open the interactive docs in your browser:

.. code-block:: bash

   cd /home/luna/Code/ada-v1

   # Option A: Docker Compose (all services)
   docker compose up
   # Then visit: http://localhost:7000/docs

   # Option B: Local development
   source .venv/bin/activate
   python -m uvicorn brain.app:app --host 0.0.0.0 --port 7000
   # Then visit: http://localhost:7000/docs

The ``/docs`` endpoint provides:

- **Swagger UI** - Interactive endpoint testing
- **OpenAPI 3.0 schema** - Auto-generated from FastAPI decorators and type hints
- **Request/response examples** - With actual JSON schemas
- **Try it out** - Send real requests directly from the UI

**Option 2: Using pydoc (Built-in)**

.. code-block:: bash

   cd /home/luna/Code/ada-v1
   python -m pydoc brain.app | less

**Option 3: VS Code Pylance Hover Tooltips**

Hover over any function/endpoint name in ``brain/app.py`` to see its full documentation inline.

API Architecture
----------------

Framework
~~~~~~~~~

- **Framework:** FastAPI 0.109.0+
- **Server:** Gunicorn + Uvicorn workers (async ASGI)
- **Port:** 7000 (direct) / 5000 (via Nginx proxy)
- **Workers:** 25 Uvicorn workers (CPU × 2 + 1)
- **Timeout:** 300 seconds (for long LLM operations)

Base URL
~~~~~~~~

::

   Direct:  http://localhost:7000/v1
   Proxied: http://localhost:5000/api (via Nginx, remapped to /v1)

Request/Response Format
~~~~~~~~~~~~~~~~~~~~~~~

- **Content-Type:** ``application/json``
- **Streaming:** Server-Sent Events (SSE) for ``/v1/chat/stream`` (see :doc:`streaming`)
- **Error Handling:** JSON with HTTP status codes and error messages

API Endpoints
-------------

All endpoints are fully documented in ``brain/app.py`` with docstrings. Below is a summary.

Health & Status
~~~~~~~~~~~~~~~

``GET /v1/healthz``

- **Purpose:** Service health check with dependency status
- **Returns:** JSON with service status, config, persona, Chroma connectivity
- **Example:**

  .. code-block:: bash

     curl http://localhost:7000/v1/healthz

Media Integration
~~~~~~~~~~~~~~~~~

``GET /v1/media/listenbrainz``

- **Purpose:** Get the user's recent listening context from ListenBrainz
- **Returns:** User listening data, or an empty dict if not configured
- **Example:**

  .. code-block:: bash

     curl http://localhost:7000/v1/media/listenbrainz

Chat - Streaming (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``POST /v1/chat/stream``

- **Purpose:** Stream LLM responses token-by-token via Server-Sent Events (SSE)
- **Request Body:**

  .. code-block:: json

     {
       "message": "Your question here",
       "conversation_id": "optional-id"
     }

- **Response:** Server-Sent Events stream (each line is a JSON event)
- **Events:**

  - ``data: {"token": "string"}`` - Streamed token
  - ``data: {"done": true}`` - Stream complete
  - ``data: {"error": "message"}`` - Error occurred

- **Example:**

  .. code-block:: bash

     curl http://localhost:7000/v1/chat/stream -X POST \
       -H "Content-Type: application/json" \
       -d '{"message": "Hello", "conversation_id": "chat-1"}'

.. note:: The Nginx reverse proxy is configured with ``proxy_buffering off`` so SSE events reach the client in real time.
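For programmatic clients, the stream can be consumed with any HTTP library that supports chunked responses. Below is a minimal Python sketch using ``requests``; the request body and the ``token``/``done``/``error`` event keys follow the schema documented above, so verify them against the live ``/docs`` schema for your deployment.

.. code-block:: python

   import json
   import requests

   # Minimal SSE consumer for /v1/chat/stream (sketch; event keys assumed
   # to match the "token"/"done"/"error" shapes documented above).
   with requests.post(
       "http://localhost:7000/v1/chat/stream",
       json={"message": "Hello", "conversation_id": "chat-1"},
       stream=True,
       timeout=300,
   ) as response:
       response.raise_for_status()
       for raw_line in response.iter_lines():
           if not raw_line.startswith(b"data: "):
               continue  # skip blank lines and keep-alives
           event = json.loads(raw_line[len(b"data: "):])
           if "token" in event:
               print(event["token"], end="", flush=True)
           elif event.get("done"):
               break
           elif "error" in event:
               raise RuntimeError(event["error"])

``stream=True`` matters here: without it, ``requests`` buffers the whole response and the token-by-token behaviour is lost.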
Memory Management
~~~~~~~~~~~~~~~~~

**List Memories**

``GET /v1/memory?query=string&limit=10&conversation_id=id``

- **Purpose:** Search long-term memories with a semantic query
- **Query Parameters:**

  - ``query`` (string, required): Search text
  - ``limit`` (int, optional): Max results (default: 10)
  - ``conversation_id`` (string, optional): Filter by conversation

- **Returns:** List of memory objects with embeddings and metadata
- **Example:**

  .. code-block:: bash

     curl "http://localhost:7000/v1/memory?query=previous%20topic&limit=5"

**Create Memory**

``POST /v1/memory``

- **Purpose:** Create a new long-term memory entry
- **Request Body:**

  .. code-block:: json

     {
       "content": "Memory text",
       "memory_type": "important|context|fact",
       "conversation_id": "optional-conversation-id"
     }

- **Returns:** Memory object with ID, embeddings, metadata
- **Status:** 201 Created
- **Example:**

  .. code-block:: bash

     curl -X POST http://localhost:7000/v1/memory \
       -H "Content-Type: application/json" \
       -d '{"content": "User likes Python", "memory_type": "fact"}'

**Delete Memory**

``DELETE /v1/memory/{mem_id}``

- **Purpose:** Delete a specific memory entry
- **Path Parameter:** ``mem_id`` (string): Memory UUID
- **Returns:** Confirmation message
- **Example:**

  .. code-block:: bash

     curl -X DELETE http://localhost:7000/v1/memory/abc-123-def

For detailed memory patterns and best practices, see :doc:`memory`. For memory schema definitions, see :doc:`data_model`.

Debug Endpoints
~~~~~~~~~~~~~~~

**RAG System Stats**

``GET /v1/debug/rag``

- **Purpose:** Get information about the RAG system (Chroma vector DB)
- **Returns:** Database stats, indexed document count, embedding model info
- **Requires:** ``RAG_DEBUG=true`` environment variable
- **Example:**

  .. code-block:: bash

     curl http://localhost:7000/v1/debug/rag

**Assembled Prompt Debug**

``GET /v1/debug/prompt``

- **Purpose:** See the final prompt that will be sent to the LLM
- **Returns:** Full prompt text with context, persona, memory, etc.
- **Requires:** ``RAG_DEBUG=true`` environment variable
- **Example:**

  .. code-block:: bash

     curl http://localhost:7000/v1/debug/prompt

Conversation History
~~~~~~~~~~~~~~~~~~~~

**Recent Conversations**

``GET /v1/conversations/recent?limit=10``

- **Purpose:** Get a list of recent conversations
- **Query Parameters:**

  - ``limit`` (int, optional): Max results (default: 10)

- **Returns:** List of conversation summaries
- **Example:**

  .. code-block:: bash

     curl "http://localhost:7000/v1/conversations/recent?limit=5"

**Get Conversation Turns**

``GET /v1/conversations/{conversation_id}``

- **Purpose:** Get all turns (messages) in a specific conversation
- **Path Parameter:** ``conversation_id`` (string): Conversation UUID
- **Returns:** List of turns with timestamps, roles (user/assistant), content
- **Example:**

  .. code-block:: bash

     curl http://localhost:7000/v1/conversations/chat-session-001

Quick Reference
---------------

Common Operations
~~~~~~~~~~~~~~~~~

Health Check
^^^^^^^^^^^^

.. code-block:: bash

   curl http://localhost:7000/v1/healthz

Streaming Chat
^^^^^^^^^^^^^^

.. code-block:: bash

   curl -N -X POST http://localhost:7000/v1/chat/stream \
     -H "Content-Type: application/json" \
     -d '{"message": "Hello!", "conversation_id": "chat-123"}'

Search Memories
^^^^^^^^^^^^^^^

.. code-block:: bash

   curl "http://localhost:7000/v1/memory?query=preferences&limit=5"

Add Memory
^^^^^^^^^^

.. code-block:: bash

   curl -X POST http://localhost:7000/v1/memory \
     -H "Content-Type: application/json" \
     -d '{"content": "User prefers concise answers", "memory_type": "fact"}'
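Memory Operations (Python)
^^^^^^^^^^^^^^^^^^^^^^^^^^

The same memory operations can be scripted. This is a sketch using ``requests`` with the field names from the Memory Management section above; response shapes are assumed to match the JSON summarized there.

.. code-block:: python

   import requests

   BASE_URL = "http://localhost:7000/v1"


   def add_memory(content: str, memory_type: str = "fact") -> dict:
       """Create a long-term memory entry (POST /v1/memory)."""
       resp = requests.post(
           f"{BASE_URL}/memory",
           json={"content": content, "memory_type": memory_type},
           timeout=30,
       )
       resp.raise_for_status()
       return resp.json()


   def search_memories(query: str, limit: int = 5) -> list:
       """Semantic search over memories (GET /v1/memory)."""
       resp = requests.get(
           f"{BASE_URL}/memory",
           params={"query": query, "limit": limit},
           timeout=30,
       )
       resp.raise_for_status()
       return resp.json()


   def delete_memory(mem_id: str) -> None:
       """Remove a memory by UUID (DELETE /v1/memory/{mem_id})."""
       resp = requests.delete(f"{BASE_URL}/memory/{mem_id}", timeout=30)
       resp.raise_for_status()


   if __name__ == "__main__":
       created = add_memory("User prefers concise answers")
       print(created)
       print(search_memories("answer style"))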
Response Status Codes
~~~~~~~~~~~~~~~~~~~~~

========= ============ =============================================
Code      Endpoint     Meaning
========= ============ =============================================
200       All          Success
201       /memory      Created
400       /chat*       Invalid parameters
404       /debug/*     Debug disabled
500       All          Server error (Ollama/DB issue)
503       /memory*     RAG not available
503       /healthz     Critical dependency unavailable
========= ============ =============================================

Python Examples
~~~~~~~~~~~~~~~

Non-Streaming Request
^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   import requests

   response = requests.post(
       'http://localhost:7000/v1/chat',
       json={'prompt': 'Hello!', 'include_thinking': True}
   )
   data = response.json()
   print(data['response'])

Streaming Request
^^^^^^^^^^^^^^^^^

.. code-block:: python

   import requests
   import json

   response = requests.post(
       'http://localhost:7000/v1/chat/stream',
       json={'prompt': 'Hello!'},
       stream=True
   )

   for line in response.iter_lines():
       if line.startswith(b'data: '):
           event = json.loads(line[6:])
           if event['type'] == 'token':
               print(event['content'], end='', flush=True)

JavaScript Examples
~~~~~~~~~~~~~~~~~~~

Using Fetch + EventSource
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: javascript

   // Using EventSource (simpler). Note: EventSource always issues a GET
   // request, so this only works if the stream route also accepts GET.
   const es = new EventSource('/api/chat/stream');
   es.addEventListener('message', (e) => {
     const data = JSON.parse(e.data);
     if (data.type === 'token') {
       document.body.innerHTML += data.content;
     }
     if (data.type === 'done') es.close();
   });

Using Fetch + ReadableStream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: javascript

   const res = await fetch('/api/chat/stream', {
     method: 'POST',
     headers: {'Content-Type': 'application/json'},
     body: JSON.stringify({prompt: 'Hello!'})
   });

   const reader = res.body.getReader();
   const decoder = new TextDecoder();
   let buffer = '';

   while (true) {
     const {done, value} = await reader.read();
     if (done) break;

     buffer += decoder.decode(value);
     const lines = buffer.split('\n');
     buffer = lines.pop();

     for (const line of lines) {
       if (line.startsWith('data: ')) {
         const data = JSON.parse(line.slice(6));
         if (data.type === 'token') {
           document.body.innerHTML += data.content;
         }
       }
     }
   }

Environment Variables
---------------------

Key environment variables affecting the API:

========================= ================================== ==================================================
Variable                  Default                            Description
========================= ================================== ==================================================
``OLLAMA_BASE_URL``       http://localhost:11434             LLM backend
``CHROMA_HOST``           http://chroma:8000                 Vector database
``PERSONA_FILE``          /app/persona.md                    Persona configuration
``RAG_DEBUG``             false                              Enable debug endpoints
``LISTENBRAINZ_USER``     -                                  ListenBrainz username
``LISTENBRAINZ_TOKEN``    -                                  ListenBrainz API token
``SEARXNG_URL``           -                                  Web search service URL
========================= ================================== ==================================================
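As an illustration of how these settings are typically consumed, the sketch below reads each variable with ``os.getenv`` using the defaults from the table. It is a minimal stand-in, not the actual ``brain/config.py`` implementation.

.. code-block:: python

   import os

   # Minimal settings reader mirroring the table above
   # (a sketch, not the real brain/config.py).
   OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
   CHROMA_HOST = os.getenv("CHROMA_HOST", "http://chroma:8000")
   PERSONA_FILE = os.getenv("PERSONA_FILE", "/app/persona.md")
   RAG_DEBUG = os.getenv("RAG_DEBUG", "false").lower() == "true"
   LISTENBRAINZ_USER = os.getenv("LISTENBRAINZ_USER")    # optional
   LISTENBRAINZ_TOKEN = os.getenv("LISTENBRAINZ_TOKEN")  # optional
   SEARXNG_URL = os.getenv("SEARXNG_URL")                # optional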
Code Structure
--------------

Main Files
~~~~~~~~~~

- ``brain/app.py`` - Main FastAPI application (850+ lines)

  - All 10 endpoints with async handlers
  - Type hints throughout
  - Full docstrings for each endpoint
  - Lifespan context manager for startup/shutdown

Modular Components
~~~~~~~~~~~~~~~~~~

- ``brain/config.py`` - Configuration management (60 lines)
- ``brain/rag_store.py`` - RAG system integration (600+ lines)
- ``brain/llm.py`` - LLM provider interface (60 lines)
- ``brain/media.py`` - External media integration (100 lines)
- ``brain/prompt_builder/`` - Modular prompt assembly with caching:

  - ``context_retriever.py`` - RAG data retrieval (cache-aware)
  - ``section_builder.py`` - Section formatting (150 lines)
  - ``prompt_assembler.py`` - Final assembly with caching (170 lines)

- ``brain/context_cache.py`` - Multi-timescale context caching (220 lines)

Type Hints & Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

All endpoints use FastAPI type hints:

.. code-block:: python

   from fastapi import FastAPI, Query
   from fastapi.responses import StreamingResponse

   @app.get('/v1/memory')
   async def list_memory(
       query: str = Query(..., description="Search query text"),
       limit: int = Query(10, description="Max results"),
   ) -> dict:
       """
       Search long-term memories with semantic query.

       Full docstring with details...
       """

This enables:

- ✅ Automatic request validation
- ✅ Type checking with Pylance
- ✅ Interactive API docs at ``/docs``
- ✅ OpenAPI 3.0 schema export

Error Handling
--------------

All errors return JSON with HTTP status codes:

.. code-block:: json

   {
     "detail": "Error message describing what went wrong"
   }

Common status codes:

- **200** - Success
- **201** - Created
- **400** - Bad request (invalid parameters)
- **404** - Not found
- **500** - Server error
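On the client side, the ``detail`` field can be surfaced directly. A minimal ``requests`` sketch, using the memory search endpoint as an example:

.. code-block:: python

   import requests

   # Sketch: surface the JSON "detail" field returned on errors.
   resp = requests.get(
       "http://localhost:7000/v1/memory",
       params={"query": "preferences", "limit": 5},
       timeout=30,
   )
   if resp.ok:
       print(resp.json())
   else:
       # Error bodies look like {"detail": "..."} as shown above.
       try:
           detail = resp.json().get("detail", resp.text)
       except ValueError:
           detail = resp.text
       print(f"Request failed ({resp.status_code}): {detail}")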
Deployment
----------

Docker Compose (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   cd /home/luna/Code/ada-v1
   docker compose up

Services:

- **web** (port 5000): Nginx reverse proxy + static frontend
- **brain** (port 7000): FastAPI backend with Gunicorn + Uvicorn workers
- **chroma** (port 8000): Vector database
- **ollama** (port 11434): LLM inference server

Local Development
~~~~~~~~~~~~~~~~~

.. code-block:: bash

   cd /home/luna/Code/ada-v1
   source .venv/bin/activate

   # Install dependencies
   uv sync

   # Run development server
   python -m uvicorn brain.app:app --host 0.0.0.0 --port 7000 --reload

The ``--reload`` flag auto-restarts the server on code changes.

Production Deployment
~~~~~~~~~~~~~~~~~~~~~

The ``brain/gunicorn_config.py`` file is configured for production:

.. code-block:: bash

   gunicorn -c brain/gunicorn_config.py brain.wsgi:app

Configuration:

- Uvicorn workers (ASGI)
- 25 worker processes
- 300-second timeout (for LLM calls)
- Access logging
- Graceful shutdown

Integration with Frontend
-------------------------

The Nginx reverse proxy (``frontend/nginx.conf.template``) maps:

::

   /api/* → http://brain:7000/v1/*

So frontend calls to ``http://localhost:5000/api/chat/stream`` are proxied to the backend at ``http://brain:7000/v1/chat/stream``.

Special handling:

- **SSE streaming:** ``proxy_buffering off`` for real-time events
- **Headers:** X-Forwarded-* headers preserved for logging

Adding New Endpoints
--------------------

When adding new routes, follow this template:

.. code-block:: python

   from fastapi import FastAPI, Query
   from fastapi.responses import JSONResponse

   @app.get('/v1/new-endpoint', tags=['category'])
   async def new_endpoint(
       param1: str = Query(..., description="Parameter description"),
       param2: int = Query(default=10, description="Optional param"),
   ) -> dict:
       """
       Brief one-line description.

       Longer explanation of what this endpoint does and when to use it.

       **Parameters:**

       - param1 (str): Required parameter
       - param2 (int): Optional parameter (default: 10)

       **Returns:**

       - dict with keys: result_key1, result_key2

       **Raises:**

       - ValueError: If param1 is empty

       **Example:**

       .. code-block:: bash

          curl "http://localhost:7000/v1/new-endpoint?param1=value&param2=20"

       Response:

       .. code-block:: json

          {"result": "value"}
       """
       try:
           # Your implementation
           return {"result": "value"}
       except ValueError as e:
           return JSONResponse(
               status_code=400,
               content={"detail": str(e)}
           )

The docstring will automatically appear in:

- FastAPI ``/docs`` (Swagger UI)
- ``python -m pydoc brain.app``
- VS Code Pylance tooltips

Resources
---------

- `FastAPI Documentation <https://fastapi.tiangolo.com/>`_
- `Uvicorn Documentation <https://www.uvicorn.org/>`_
- `OpenAPI 3.0 Specification <https://spec.openapis.org/oas/v3.0.3>`_
- `Server-Sent Events (SSE) <https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events>`_