Bidirectional Specialist System

Overview

The bidirectional specialist system enables LLM ↔ Specialist communication during inference. Unlike one-way specialists (OCR, media) that only inject context upfront, bidirectional specialists can be requested by the LLM mid-response.

Architecture

Flow Diagram

User Message → Prompt Assembly (with upfront specialists)
                      ↓
              LLM starts generating
                      ↓
         Streaming tokens to frontend
                      ↓
    [LLM emits SPECIALIST_REQUEST[...]]
                      ↓
       Handler detects request syntax
                      ↓
          Execute requested specialist
                      ↓
       Inject result back into stream
                      ↓
      LLM continues with new context

Key Components

1. Bidirectional Handler (brain/specialists/bidirectional.py)

  • Monitors streaming output for specialist requests

  • Parses request syntax and parameters

  • Executes specialists on-demand

  • Injects results back into token stream

  • Enforces safety limits (max 5 calls per turn)

2. System Prompt (brain/config.py)

  • Teaches LLM the specialist request syntax

  • Documents available specialists

  • Provides usage examples

  • Sets expectations (results format, limits)
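
A hedged sketch of what this instruction block might look like (the exact wording in brain/config.py may differ):

# Hypothetical fragment of the system prompt; the real text may differ.
SPECIALIST_INSTRUCTIONS = """
You can request a specialist mid-response using this exact syntax:

    SPECIALIST_REQUEST[specialist_name:{"param":"value"}]

Available specialists: web_search, wiki_lookup, docs_lookup, ocr.
Results are injected into your response between
[SPECIALIST_RESULT: name] and [/SPECIALIST_RESULT] markers.
You may make at most 5 specialist requests per turn.
"""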

3. Stream Wrapper (brain/app.py)

  • Wraps base LLM stream with specialist handler

  • Passes request context for specialist execution

  • Seamlessly integrates with existing SSE streaming

Request Syntax

Primary Syntax

SPECIALIST_REQUEST[specialist_name:{"param":"value","another":"param"}]

Example:

SPECIALIST_REQUEST[vision:{"focus":"technical_diagrams"}]

Alternative Mention Syntax (future)

@vision_specialist(focus=architecture)
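
If this mention form is adopted, a detector for it might look like the sketch below (the pattern and parameter handling are assumptions, since the syntax is not implemented yet):

import re

# Hypothetical detector for the future @mention form
MENTION_PATTERN = re.compile(r'@(\w+)_specialist\(([^)]*)\)')

def parse_mention(text):
    match = MENTION_PATTERN.search(text)
    if not match:
        return None
    name, raw_params = match.groups()
    # "focus=architecture" -> {"focus": "architecture"}
    params = dict(p.strip().split('=', 1) for p in raw_params.split(',') if '=' in p)
    return {'specialist': name, 'params': params}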

Usage Examples

Example 1: Vision Analysis

User: “What’s in this image?”

LLM Response:

Let me analyze the image in detail.

SPECIALIST_REQUEST[vision:{"focus":"content_description"}]

[SPECIALIST_RESULT: vision]
👁️ VISUAL ANALYSIS from 'diagram.png'
The image contains a system architecture diagram showing microservices...
[/SPECIALIST_RESULT]

Based on the visual analysis, this is a microservices architecture with...

Example 2: Enhanced OCR

User: “Extract text from this complex document”

LLM Response:

I'll analyze the document structure first.

SPECIALIST_REQUEST[ocr:{"enhance":true,"language":"en"}]

[SPECIALIST_RESULT: ocr]
📄 OCR EXTRACTED TEXT from 'document.pdf'
[Text content here...]
[/SPECIALIST_RESULT]

The document appears to be a contract with the following key sections...

Current Specialists

One-Way (Auto-activated)

  • OCR: Activated when ocr_context present

  • Media: Activated when media present

Bidirectional (LLM-requested)

  • Web Search: Real-time information from the web

  • Wiki Lookup: MediaWiki-based encyclopedia queries

  • Docs Lookup: queries Ada’s own documentation (self-reference)

  • OCR (can also be requested): Enhanced text extraction
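
A minimal sketch of the registration side, assuming a simple name-to-instance registry (the real structure under brain/specialists/ may differ):

# Hypothetical registry; the actual implementation may differ
SPECIALIST_REGISTRY = {}

def register_specialist(name, specialist):
    """Make a specialist requestable by name."""
    SPECIALIST_REGISTRY[name] = specialist

def list_specialists():
    """Names the LLM is allowed to request (also used for validation)."""
    return sorted(SPECIALIST_REGISTRY)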

Safety & Limits

Built-in Protections

  1. Max Calls Per Turn: 5 specialist requests maximum

  2. Error Handling: Graceful failures don’t crash the stream

  3. Timeout Protection: Specialists are expected to complete quickly (under ~5 s; see Best Practices)

  4. Validation: Only registered specialists can be invoked
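
For example, checks 1 and 4 might be enforced with something like this (names are illustrative, not the actual implementation):

MAX_SPECIALIST_CALLS = 5

def validate_request(name, calls_made, registered_names):
    """Return an error string if the request must be rejected, else None."""
    if calls_made >= MAX_SPECIALIST_CALLS:
        return "max specialist calls reached for this turn"
    if name not in registered_names:
        return f"unknown specialist '{name}'"
    return None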

Error Messages

When specialists fail, the LLM sees:

[SPECIALIST_ERROR: specialist_name failed - reason]

The LLM can acknowledge the failure and continue the response gracefully.
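
A small helper could produce that marker so it always matches the documented format (the function name is an assumption):

def format_specialist_error(name, reason):
    # Emitted into the stream so the LLM can see and acknowledge the failure
    return f"[SPECIALIST_ERROR: {name} failed - {reason}]"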

Implementation Details

Handler Detection Logic

import json
import re

# Monitors the accumulating stream for the request pattern
SPECIALIST_REQUEST_PATTERN = re.compile(
    r'SPECIALIST_REQUEST\[(\w+):(.*?)\]',
    re.DOTALL
)

async def detect_request(text: str):
    match = SPECIALIST_REQUEST_PATTERN.search(text)
    if not match:
        return None
    specialist_name = match.group(1)
    params_json = match.group(2)
    try:
        params = json.loads(params_json)
    except json.JSONDecodeError:
        # Malformed or partially streamed JSON: treat as no request yet
        return None
    return {'specialist': specialist_name, 'params': params}
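
Exercised against accumulated text, the detector behaves like this:

import asyncio

# Illustrative usage of detect_request above
text = 'Let me check. SPECIALIST_REQUEST[vision:{"focus":"test"}]'
print(asyncio.run(detect_request(text)))
# {'specialist': 'vision', 'params': {'focus': 'test'}}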

Stream Integration

# Wrap base stream with specialist handler
base_stream = stream_chat_async(prompt, model=MODEL)
specialist_stream = stream_with_specialists(
    base_stream,
    request_context,
    max_specialist_calls=5
)

# Frontend receives enriched stream
async for chunk in specialist_stream:
    yield chunk  # Normal token OR specialist result
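
A hedged sketch of what the wrapper itself could look like, assuming detect_request from above and an execute_specialist(name, params, context) coroutine that returns formatted result text (neither is confirmed by the source):

# Minimal sketch; brain/specialists/bidirectional.py handles more edge cases
async def stream_with_specialists(base_stream, request_context,
                                  max_specialist_calls=5):
    buffer = ""      # accumulated text, scanned for request syntax
    calls_made = 0

    async for chunk in base_stream:
        buffer += chunk.get('token', '')
        yield chunk  # pass the original token through unchanged

        request = await detect_request(buffer)
        if request and calls_made < max_specialist_calls:
            calls_made += 1
            buffer = ""  # avoid re-detecting the same request
            result = await execute_specialist(
                request['specialist'], request['params'], request_context
            )
            yield {'token': (f"\n[SPECIALIST_RESULT: {request['specialist']}]\n"
                             f"{result}\n[/SPECIALIST_RESULT]\n")}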

Future Enhancements

Phase 2: Pause & Resume

Currently, specialist results are injected into the stream without pausing generation. For better quality:

  1. Detect specialist request

  2. Pause LLM generation

  3. Execute specialist

  4. Build new prompt with result

  5. Resume generation from enriched context
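
One possible shape for that loop, reusing detect_request and the assumed execute_specialist helper (all names are illustrative):

# Sketch of pause-and-resume: stop at the request, enrich the prompt, regenerate
async def generate_with_pause_resume(prompt, request_context, max_rounds=5):
    response = ""
    for _ in range(max_rounds):
        response, request = "", None
        async for chunk in stream_chat_async(prompt, model=MODEL):
            response += chunk.get('token', '')
            request = await detect_request(response)
            if request:
                break  # pause: stop consuming the base stream
        if request is None:
            return response  # no specialist requested; generation is complete

        result = await execute_specialist(
            request['specialist'], request['params'], request_context
        )
        # Resume: rebuild the prompt with the specialist result appended
        prompt = (f"{prompt}\n{response}\n"
                  f"[SPECIALIST_RESULT: {request['specialist']}]\n"
                  f"{result}\n[/SPECIALIST_RESULT]\n")
    return response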

Phase 3: Multi-turn Planning

Enable agentic workflows:

User: Analyze these 3 images
LLM: I'll analyze each image systematically.

     Image 1: SPECIALIST_REQUEST[vision:{"image_id":1}]
     [Result 1...]

     Image 2: SPECIALIST_REQUEST[vision:{"image_id":2}]
     [Result 2...]

     Comparing the three images, I notice...

Phase 4: Function Calling API

If the selected LLM gains native tool calling:

# Auto-generate tool definitions from specialists
tools = [s.capability.to_openai_tool() for s in specialists]

response = ollama.chat(
    model="qwen2.5-coder:7b",
    messages=messages,
    tools=tools
)
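
For reference, to_openai_tool() could emit the OpenAI-style tool schema that Ollama accepts; the capability fields below are assumptions:

from dataclasses import dataclass

@dataclass
class SpecialistCapability:
    name: str            # e.g. "web_search"
    description: str     # shown to the model
    params_schema: dict  # JSON Schema for the request parameters

    def to_openai_tool(self) -> dict:
        # OpenAI-style tool definition
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.params_schema,
            },
        }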

Testing

Manual Test via UI

  1. Upload an image via OCR

  2. Ask: “Analyze this image in more detail”

  3. Watch logs for specialist detection

  4. See if LLM requests vision specialist

Programmatic Test

# Simulate LLM output with a specialist request
test_chunks = [
    {'token': 'Let me analyze that. '},
    {'token': 'SPECIALIST_REQUEST[vision:{"focus":"test"}]'},
    {'token': ' Based on the analysis...'}
]

async def test_stream():
    # The handler expects an async stream, so wrap the list
    for chunk in test_chunks:
        yield chunk

# Should detect the request and inject a result
results = []
async for chunk in stream_with_specialists(test_stream(), context):
    results.append(chunk)

# Verify the specialist result appears in the stream
assert any('SPECIALIST_RESULT' in chunk.get('token', '') for chunk in results)

Best Practices

For Specialist Authors

  1. Fast execution: Specialists should complete in <5s

  2. Clear results: Format output for LLM readability

  3. Graceful failures: Return error results, don’t raise (see the sketch after this list)

  4. Document parameters: Clear schemas in capability
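
Practice 3, for instance, can be applied by wrapping every specialist call, as in this sketch (the helper name, the async run(params, context) method, and the result shape are assumptions):

async def run_specialist_safely(specialist, params, context):
    """Turn failures into error results instead of exceptions."""
    try:
        text = await specialist.run(params, context)
        return {"ok": True, "text": text}
    except Exception as exc:
        # The handler converts this into a [SPECIALIST_ERROR: ...] marker
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}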

For System Operators

  1. Monitor logs: Watch for specialist detection/execution

  2. Track usage: Which specialists are requested most?

  3. Tune limits: Adjust max_calls based on patterns

  4. Update prompts: Refine instructions as LLM learns

Troubleshooting

Specialist Not Detected

  • Check syntax: Must be SPECIALIST_REQUEST[name:{...}] with valid JSON parameters

  • Verify specialist registered: list_specialists()

  • Check logs: Look for detection attempts

Specialist Fails

  • Check should_activate(): Does it accept the context?

  • Verify parameters: Does JSON match expected schema?

  • Review error logs: What exception occurred?

Max Calls Reached

  • LLM requesting too many specialists in one turn

  • Refine system prompt to reduce over-requesting

  • Increase limit if legitimate use case

Summary

The bidirectional specialist system provides a flexible framework for LLM-initiated tool use:

  • Syntax-based: Simple, no fine-tuning needed

  • Extensible: Add specialists, LLM can request them

  • Safe: Built-in limits and error handling

  • Future-proof: Can upgrade to native function calling later

This bridges the gap between static context injection and full agentic workflows, enabling Ada to request capabilities as needed!