Bidirectional Specialist System
Overview
The bidirectional specialist system enables LLM ↔ Specialist communication during inference. Unlike one-way specialists (OCR, media) that only inject context upfront, bidirectional specialists can be requested by the LLM mid-response.
Architecture
Flow Diagram
User Message → Prompt Assembly (with upfront specialists)
↓
LLM starts generating
↓
Streaming tokens to frontend
↓
[LLM emits SPECIALIST_REQUEST[...]]
↓
Handler detects request syntax
↓
Execute requested specialist
↓
Inject result back into stream
↓
LLM continues with new context
Key Components
1. Bidirectional Handler (brain/specialists/bidirectional.py)
Monitors streaming output for specialist requests
Parses request syntax and parameters
Executes specialists on-demand
Injects results back into token stream
Safety limits (max 5 calls per turn)
2. System Prompt (brain/config.py)
Teaches LLM the specialist request syntax
Documents available specialists
Provides usage examples
Sets expectations (results format, limits)
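For illustration, the instructions that brain/config.py adds to the system prompt might look like the constant below. The constant name, the specialist identifiers, and the exact wording are assumptions, not the shipped prompt.

# Illustrative only -- the actual prompt text in brain/config.py may differ.
SPECIALIST_INSTRUCTIONS = """
When you need extra context mid-response, emit a request on its own line:

    SPECIALIST_REQUEST[specialist_name:{"param":"value"}]

Available specialists: web_search, wiki_lookup, docs_lookup, ocr.
The result is injected back into your response between
[SPECIALIST_RESULT: name] and [/SPECIALIST_RESULT] markers.
You may make at most 5 specialist requests per turn.
"""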
3. Stream Wrapper (brain/app.py)
Wraps base LLM stream with specialist handler
Passes request context for specialist execution
Seamlessly integrates with existing SSE streaming
Request Syntax
Primary Syntax
SPECIALIST_REQUEST[specialist_name:{"param":"value","another":"param"}]
Example:
SPECIALIST_REQUEST[vision:{"focus":"technical_diagrams"}]
Alternative Mention Syntax (future)
@vision_specialist(focus=architecture)
Usage Examples
Example 1: Vision Analysis
User: “What’s in this image?”
LLM Response:
Let me analyze the image in detail.
SPECIALIST_REQUEST[vision:{"focus":"content_description"}]
[SPECIALIST_RESULT: vision]
👁️ VISUAL ANALYSIS from 'diagram.png'
The image contains a system architecture diagram showing microservices...
[/SPECIALIST_RESULT]
Based on the visual analysis, this is a microservices architecture with...
Example 2: Enhanced OCR
User: “Extract text from this complex document”
LLM Response:
I'll analyze the document structure first.
SPECIALIST_REQUEST[ocr:{"enhance":true,"language":"en"}]
[SPECIALIST_RESULT: ocr]
📄 OCR EXTRACTED TEXT from 'document.pdf'
[Text content here...]
[/SPECIALIST_RESULT]
The document appears to be a contract with the following key sections...
Current Specialists
One-Way (Auto-activated)
OCR: Activated when ocr_context is present
Media: Activated when media is present
Bidirectional (LLM-requested)
Web Search: Real-time information from the web
Wiki Lookup: MediaWiki-based encyclopedia queries
Docs Lookup: Ada’s own documentation self-reference
OCR (can also be requested): Enhanced text extraction
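These specialists are looked up from a registry when the LLM requests them. A minimal sketch of how registration and listing might work is shown below; the Specialist interface, register(), and _REGISTRY names are illustrative, not necessarily what brain/specialists/ uses.

# Illustrative sketch -- the actual interfaces in brain/specialists/ may differ.
from typing import Any, Dict, List, Protocol


class Specialist(Protocol):
    name: str

    def should_activate(self, context: Dict[str, Any]) -> bool: ...

    async def run(self, context: Dict[str, Any], params: Dict[str, Any]) -> str: ...


_REGISTRY: Dict[str, Specialist] = {}


def register(specialist: Specialist) -> None:
    _REGISTRY[specialist.name] = specialist


def list_specialists() -> List[str]:
    """Names the LLM can request; only these pass validation."""
    return sorted(_REGISTRY)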
Safety & Limits
Built-in Protections
Max Calls Per Turn: 5 specialist requests maximum
Error Handling: Graceful failures don’t crash the stream
Timeout Protection: Specialists should complete quickly
Validation: Only registered specialists can be invoked
Error Messages
When specialists fail, the LLM sees:
[SPECIALIST_ERROR: specialist_name failed - reason]
The LLM can acknowledge this and handle gracefully.
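One possible way the handler could render a failure into that marker before injecting it into the stream (the helper name is hypothetical):

# Hypothetical helper -- the real handler may format errors differently.
def format_specialist_error(name: str, exc: Exception) -> str:
    """Render a failed specialist call as an in-stream error marker."""
    return f"[SPECIALIST_ERROR: {name} failed - {exc}]"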
Implementation Details
Handler Detection Logic
import json
import re

# Monitors the accumulating stream for the request pattern
SPECIALIST_REQUEST_PATTERN = re.compile(
    r'SPECIALIST_REQUEST\[(\w+):(.*?)\]',
    re.DOTALL
)

async def detect_request(text: str):
    """Return the first specialist request found in `text`, or None."""
    match = SPECIALIST_REQUEST_PATTERN.search(text)
    if not match:
        return None
    specialist_name = match.group(1)
    try:
        params = json.loads(match.group(2))
    except json.JSONDecodeError:
        return None  # malformed parameters; keep streaming
    return {'specialist': specialist_name, 'params': params}
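For example, running the detector over an accumulated chunk of text:

import asyncio

text = 'Let me look closer. SPECIALIST_REQUEST[vision:{"focus":"technical_diagrams"}]'
print(asyncio.run(detect_request(text)))
# -> {'specialist': 'vision', 'params': {'focus': 'technical_diagrams'}}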
Stream Integration
# Wrap base stream with specialist handler
base_stream = stream_chat_async(prompt, model=MODEL)
specialist_stream = stream_with_specialists(
    base_stream,
    request_context,
    max_specialist_calls=5
)

# Frontend receives enriched stream
async for chunk in specialist_stream:
    yield chunk  # Normal token OR specialist result
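Internally, the wrapper might look roughly like the sketch below, reusing the registry, detect_request(), and error helper sketched earlier. Treat this as an illustration of the detect-execute-inject loop, not the exact implementation.

# Illustrative sketch only -- the real stream_with_specialists may differ in detail.
async def stream_with_specialists(base_stream, request_context, max_specialist_calls=5):
    buffer = ""
    calls = 0
    async for chunk in base_stream:
        yield chunk                          # pass every token through unchanged
        buffer += chunk.get('token', '')
        request = await detect_request(buffer)
        if request is None or calls >= max_specialist_calls:
            continue
        calls += 1
        buffer = ""                          # avoid re-detecting the same request
        try:
            specialist = _REGISTRY[request['specialist']]   # only registered specialists
            result = await specialist.run(request_context, request['params'])
            payload = (f"[SPECIALIST_RESULT: {request['specialist']}]\n"
                       f"{result}\n"
                       f"[/SPECIALIST_RESULT]")
        except Exception as exc:
            payload = format_specialist_error(request['specialist'], exc)
        yield {'token': payload}             # inject the result (or error) into the stream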
Future Enhancements
Phase 2: Pause & Resume
Currently, specialists inject results mid-stream. A higher-quality flow (sketched after the steps below) would:
Detect specialist request
Pause LLM generation
Execute specialist
Build new prompt with result
Resume generation from enriched context
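A rough outline of that loop, assuming hypothetical helpers generate_until_request() and build_prompt_with_result() plus the registry and detector sketched earlier:

# Rough outline only -- pause-and-resume is a planned enhancement, not current behaviour.
async def generate_with_pauses(prompt, request_context, max_rounds=5):
    text = ""
    for _ in range(max_rounds):
        # Hypothetical helper: generate until a SPECIALIST_REQUEST marker (or the end).
        partial = await generate_until_request(prompt)
        text = partial.text
        request = await detect_request(text)
        if request is None:
            return text                       # no request emitted; generation finished
        specialist = _REGISTRY[request['specialist']]
        result = await specialist.run(request_context, request['params'])
        # Hypothetical helper: rebuild the prompt with the result so generation
        # resumes from the enriched context.
        prompt = build_prompt_with_result(prompt, text, result)
    return text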
Phase 3: Multi-turn Planning
Enable agentic workflows:
User: Analyze these 3 images
LLM: I'll analyze each image systematically.
Image 1: SPECIALIST_REQUEST[vision:{"image_id":1}]
[Result 1...]
Image 2: SPECIALIST_REQUEST[vision:{"image_id":2}]
[Result 2...]
Comparing the three images, I notice...
Phase 4: Function Calling API
If the selected LLM gains native tool calling:
# Auto-generate tool definitions from specialists
tools = [s.capability.to_openai_tool() for s in specialists]

response = ollama.chat(
    model="qwen2.5-coder:7b",
    messages=messages,
    tools=tools
)
Testing
Manual Test via UI
Upload an image via OCR
Ask: “Analyze this image in more detail”
Watch logs for specialist detection
See if LLM requests vision specialist
Programmatic Test
# Simulate LLM output containing a specialist request
test_chunks = [
    {'token': 'Let me analyze that. '},
    {'token': 'SPECIALIST_REQUEST[vision:{"focus":"test"}]'},
    {'token': ' Based on the analysis...'}
]

async def test_stream():
    for chunk in test_chunks:
        yield chunk

# Should detect the request and inject a result
# (context is the request context built for this turn, as in brain/app.py)
results = []
async for chunk in stream_with_specialists(test_stream(), context):
    results.append(chunk)

# Verify a specialist result appears in the stream
assert any('[SPECIALIST_RESULT' in chunk.get('token', '') for chunk in results)
Best Practices
For System Operators
Monitor logs: Watch for specialist detection/execution
Track usage: Which specialists are requested most?
Tune limits: Adjust max_calls based on patterns
Update prompts: Refine the instructions as you learn how the LLM uses them
Troubleshooting
Specialist Not Detected
Check syntax: Must be SPECIALIST_REQUEST[name:{"json"}]
Verify specialist registered: list_specialists()
Check logs: Look for detection attempts
Specialist Fails
Check should_activate(): Does it accept the context?
Verify parameters: Does JSON match expected schema?
Review error logs: What exception occurred?
Max Calls Reached
LLM requesting too many specialists in one turn
Refine system prompt to reduce over-requesting
Increase limit if legitimate use case
Summary
The bidirectional specialist system provides a flexible framework for LLM-initiated tool use:
✅ Syntax-based: Simple, no fine-tuning needed
✅ Extensible: Add specialists, LLM can request them
✅ Safe: Built-in limits and error handling
✅ Future-proof: Can upgrade to native function calling later
This bridges the gap between static context injection and full agentic workflows, enabling Ada to request capabilities as needed!