Streaming
The Brain API supports real-time token delivery via Server-Sent Events (SSE) on POST /v1/chat/stream. Use this for low-latency, progressive rendering of LLM responses.
For complete endpoint documentation, see API Reference. For client examples, see Examples.
Endpoint
POST /v1/chat/stream
Request Body (JSON)
The request body is the same as for /v1/chat:
- `prompt` (str, required)
- `conversation_id` (str, optional)
- `include_thinking` (bool, optional)
- `entity` (str, optional)
- `save_memory` (bool, optional)
- `memory_text` (str, optional)
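For example, a body that enables thinking tokens and continues an existing conversation might look like this (the values are illustrative):

```json
{
  "prompt": "Summarize our discussion so far.",
  "conversation_id": "abc123",
  "include_thinking": true
}
```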
Response Format (SSE)
Content-Type: text/event-stream
Each event is sent on its own line, prefixed with `data: `:
data: {"type": "thinking", "content": "Reasoning token"}
data: {"type": "token", "content": "Hello "}
data: {"type": "token", "content": "world"}
data: {"type": "done", "conversation_id": "...", "used_context": {...}}
Event Types
- `thinking`: Reasoning tokens (only emitted if `include_thinking` is true)
- `token`: Assistant response tokens
- `done`: Final metadata (`conversation_id`, `used_context`, timestamps, `request_id`)
- `error`: Error details if the stream fails
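If you want typed event handling in a Python client, a rough sketch of the event shapes is below. It models only the fields shown on this page (the `done` metadata is trimmed to two keys, and the `error` payload is left open), so adjust it to what your deployment actually returns:

```python
from typing import Any, Dict, Literal, TypedDict, Union

class ThinkingEvent(TypedDict):
    type: Literal['thinking']
    content: str

class TokenEvent(TypedDict):
    type: Literal['token']
    content: str

class DoneEvent(TypedDict):
    type: Literal['done']
    conversation_id: str
    used_context: Dict[str, Any]  # timestamps and request_id are also returned but not modeled here

class ErrorEvent(TypedDict):
    type: Literal['error']
    # error details are not specified on this page; treat any extra keys as opaque

StreamEvent = Union[ThinkingEvent, TokenEvent, DoneEvent, ErrorEvent]
```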
curl Example
curl -N -X POST http://localhost:7000/v1/chat/stream \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello!", "include_thinking": true}'
Python (requests)
import requests, json

resp = requests.post(
    'http://localhost:7000/v1/chat/stream',
    json={'prompt': 'What is AI?', 'include_thinking': True},
    stream=True,  # keep the connection open and read the SSE body incrementally
)

for line in resp.iter_lines():
    if line.startswith(b'data: '):
        event = json.loads(line[6:])  # strip the "data: " prefix and parse the JSON payload
        if event['type'] == 'token':
            print(event['content'], end='', flush=True)
        elif event['type'] == 'thinking':
            print('\n🤔', event['content'], end='', flush=True)
        elif event['type'] == 'done':
            print('\n✓ done')
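Since the `done` event returns the `conversation_id`, a client can thread a follow-up request onto the same conversation. A minimal sketch under the same assumptions as the example above (the helper name and prompts are illustrative):

```python
import requests, json

def stream_once(prompt, conversation_id=None):
    """Send one streaming request and return the conversation_id from the done event."""
    body = {'prompt': prompt}
    if conversation_id:
        body['conversation_id'] = conversation_id
    resp = requests.post('http://localhost:7000/v1/chat/stream', json=body, stream=True)
    conv_id = None
    for line in resp.iter_lines():
        if line.startswith(b'data: '):
            event = json.loads(line[6:])
            if event['type'] == 'token':
                print(event['content'], end='', flush=True)
            elif event['type'] == 'done':
                conv_id = event['conversation_id']
    print()
    return conv_id

# First turn starts a new conversation; the second turn continues it.
conv_id = stream_once('What is AI?')
stream_once('Give me a one-sentence summary.', conversation_id=conv_id)
```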
JavaScript (fetch + ReadableStream)
async function streamChat(prompt) {
  // This path assumes a frontend proxy; call http://localhost:7000/v1/chat/stream directly otherwise.
  const res = await fetch('/api/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, include_thinking: true })
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Split on newlines and keep any partial line in the buffer until the next chunk arrives.
    const lines = buffer.split('\n');
    buffer = lines.pop();
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const event = JSON.parse(line.slice(6));
        if (event.type === 'token') console.log(event.content);
        if (event.type === 'done') console.log('done');
      }
    }
  }
}
Best Practices
- Use `-N` with curl to disable buffering.
- In UIs, batch DOM updates (e.g., via `requestAnimationFrame`) for smoother rendering.
- Handle both HTTP errors and stream `error` events; a Python sketch of both appears after this list.
- Use `AbortController` (fetch) to cancel long-running streams.
- Memory is persisted after `done`; partial streams do not save. See Memory for memory management.
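A rough Python illustration of the error-handling and early-cancellation points above, using the same `requests` client as earlier. The `error` event's payload keys are not documented on this page, so the sketch treats them as opaque, and the timeout values are only an assumption:

```python
import requests, json

def stream_chat(prompt):
    """Stream a chat response, surfacing both HTTP-level and stream-level errors."""
    resp = requests.post(
        'http://localhost:7000/v1/chat/stream',
        json={'prompt': prompt},
        stream=True,
        timeout=(5, 300),  # connect/read timeouts; tune for your deployment
    )
    resp.raise_for_status()  # surface HTTP errors (4xx/5xx) before reading the stream
    try:
        for line in resp.iter_lines():
            if not line.startswith(b'data: '):
                continue
            event = json.loads(line[6:])
            if event['type'] == 'error':
                # exact error payload is unspecified here, so include the whole event
                raise RuntimeError(f'stream failed: {event}')
            if event['type'] == 'token':
                print(event['content'], end='', flush=True)
            elif event['type'] == 'done':
                return event.get('conversation_id')
    finally:
        resp.close()  # closing the response cancels a long-running stream early
```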