=========
Streaming
=========

The Brain API supports real-time token delivery via **Server-Sent Events
(SSE)** on ``POST /v1/chat/stream``. Use this endpoint for low-latency,
progressive rendering of LLM responses.

For complete endpoint documentation, see :doc:`api_reference`. For client
examples, see :doc:`examples`.

Endpoint
--------

.. code-block:: bash

   POST /v1/chat/stream

Request Body (JSON)
-------------------

Same as ``/v1/chat``:

- ``prompt`` (str, required)
- ``conversation_id`` (str, optional)
- ``include_thinking`` (bool, optional)
- ``entity`` (str, optional)
- ``save_memory`` (bool, optional)
- ``memory_text`` (str, optional)

Response Format (SSE)
---------------------

``Content-Type: text/event-stream``

Events are newline-delimited and prefixed with ``data: ``:

.. code-block:: text

   data: {"type": "thinking", "content": "Reasoning token"}
   data: {"type": "token", "content": "Hello "}
   data: {"type": "token", "content": "world"}
   data: {"type": "done", "conversation_id": "...", "used_context": {...}}

Event Types
-----------

- ``thinking``: Reasoning tokens (sent only when ``include_thinking`` is true)
- ``token``: Assistant response tokens
- ``done``: Final metadata (``conversation_id``, ``used_context``, timestamps,
  ``request_id``)
- ``error``: Error details if the stream fails (see `Handling Stream Errors`_
  below)

curl Example
------------

.. code-block:: bash

   curl -N -X POST http://localhost:7000/v1/chat/stream \
     -H "Content-Type: application/json" \
     -d '{"prompt": "Hello!", "include_thinking": true}'

Python (requests)
-----------------

.. code-block:: python

   import json

   import requests

   resp = requests.post(
       'http://localhost:7000/v1/chat/stream',
       json={'prompt': 'What is AI?', 'include_thinking': True},
       stream=True,
   )
   for line in resp.iter_lines():
       if line.startswith(b'data: '):
           event = json.loads(line[6:])
           if event['type'] == 'token':
               print(event['content'], end='', flush=True)
           elif event['type'] == 'thinking':
               print('\n🤔', event['content'], end='', flush=True)
           elif event['type'] == 'done':
               print('\n✓ done')

JavaScript (fetch + ReadableStream)
-----------------------------------

.. code-block:: javascript

   async function streamChat(prompt) {
     const res = await fetch('/v1/chat/stream', {
       method: 'POST',
       headers: { 'Content-Type': 'application/json' },
       body: JSON.stringify({ prompt, include_thinking: true })
     });
     const reader = res.body.getReader();
     const decoder = new TextDecoder();
     let buffer = '';
     while (true) {
       const { done, value } = await reader.read();
       if (done) break;
       buffer += decoder.decode(value, { stream: true });
       const lines = buffer.split('\n');
       buffer = lines.pop();  // keep any partial line for the next chunk
       for (const line of lines) {
         if (line.startsWith('data: ')) {
           const event = JSON.parse(line.slice(6));
           if (event.type === 'token') console.log(event.content);
           if (event.type === 'done') console.log('done');
         }
       }
     }
   }

Best Practices
--------------

- Use ``-N`` with curl to disable output buffering.
- In UIs, batch DOM updates (e.g., via ``requestAnimationFrame``) for smoother
  rendering; see `Batching DOM Updates`_ below.
- Handle both HTTP errors and in-stream ``error`` events; see
  `Handling Stream Errors`_ below.
- Use ``AbortController`` (fetch) to cancel long-running streams; see
  `Cancelling a Stream`_ below.
- Memory is persisted only after the ``done`` event; a stream that fails or is
  cancelled partway does not save. See :doc:`memory` for memory management.
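
Handling Stream Errors
----------------------

A robust client must handle two failure modes: a non-2xx HTTP response before
any event is sent, and an ``error`` event arriving mid-stream. The sketch
below extends the fetch example above accordingly. It is a minimal sketch:
the ``message`` field read off the ``error`` event is an assumption about the
payload shape, not a documented field, so inspect what your server actually
emits.

.. code-block:: javascript

   async function streamChatSafe(prompt) {
     const res = await fetch('/v1/chat/stream', {
       method: 'POST',
       headers: { 'Content-Type': 'application/json' },
       body: JSON.stringify({ prompt })
     });
     // Failure mode 1: the request is rejected before the stream starts.
     if (!res.ok) {
       throw new Error(`HTTP ${res.status}: ${await res.text()}`);
     }

     const reader = res.body.getReader();
     const decoder = new TextDecoder();
     let buffer = '';
     let text = '';
     while (true) {
       const { done, value } = await reader.read();
       if (done) break;
       buffer += decoder.decode(value, { stream: true });
       const lines = buffer.split('\n');
       buffer = lines.pop();  // keep any partial line for the next chunk
       for (const line of lines) {
         if (!line.startsWith('data: ')) continue;
         const event = JSON.parse(line.slice(6));
         // Failure mode 2: the server reports a failure mid-stream.
         if (event.type === 'error') {
           // `event.message` is a hypothetical field name.
           throw new Error(`stream error: ${event.message ?? JSON.stringify(event)}`);
         }
         if (event.type === 'token') text += event.content;
         if (event.type === 'done') return text;
       }
     }
     return text;  // stream closed without a `done` event
   }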
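
Cancelling a Stream
-------------------

``fetch`` will not stop a stream on its own; pass an ``AbortController``
signal and call ``abort()`` when the user clicks "stop" or navigates away.
A minimal sketch, where the 30-second timeout is illustrative rather than a
server default:

.. code-block:: javascript

   async function streamWithCancel(prompt) {
     const controller = new AbortController();
     // Wire the same controller to a "stop generating" button, a route
     // change, or (as here) a client-side timeout.
     const timer = setTimeout(() => controller.abort(), 30000);

     try {
       const res = await fetch('/v1/chat/stream', {
         method: 'POST',
         headers: { 'Content-Type': 'application/json' },
         body: JSON.stringify({ prompt }),
         signal: controller.signal  // aborts the request and the body stream
       });
       // ... consume res.body as in the examples above ...
     } catch (err) {
       if (err.name === 'AbortError') {
         // Expected on cancellation. The stream never reached `done`, so
         // per the note above no memory is persisted for this turn.
         console.log('stream cancelled');
       } else {
         throw err;
       }
     } finally {
       clearTimeout(timer);
     }
   }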
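
Batching DOM Updates
--------------------

Writing to the DOM on every ``token`` event can cause layout thrashing at
high token rates. Instead, buffer tokens and flush once per animation frame.
A minimal sketch, assuming a target element with id ``output``:

.. code-block:: javascript

   const output = document.getElementById('output');  // assumed target element
   let pending = '';           // tokens received since the last paint
   let frameScheduled = false;

   function onToken(content) {
     pending += content;
     if (frameScheduled) return;
     frameScheduled = true;
     requestAnimationFrame(() => {
       // One DOM write per frame, no matter how many tokens arrived.
       output.textContent += pending;
       pending = '';
       frameScheduled = false;
     });
   }

Call ``onToken(event.content)`` from the streaming loop in place of the
``console.log`` calls in the examples above.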

Related Documentation
---------------------

- :doc:`api_reference` - Complete endpoint documentation
- :doc:`examples` - Code examples in Python, JavaScript, and curl
- :doc:`memory` - Memory persistence and retrieval
- :doc:`configuration` - Streaming and timeout configuration