Bidirectional Specialist System

Overview

The bidirectional specialist system enables LLM ↔ Specialist communication during inference. Unlike one-way specialists (OCR, media) that only inject context upfront, bidirectional specialists can be requested by the LLM mid-response.

Architecture

Flow Diagram

User Message → Prompt Assembly (with upfront specialists)
                      ↓
              LLM starts generating
                      ↓
         Streaming tokens to frontend
                      ↓
    [LLM emits SPECIALIST_REQUEST[...]]
                      ↓
       Handler detects request syntax
                      ↓
          Execute requested specialist
                      ↓
       Inject result back into stream
                      ↓
      LLM continues with new context

Key Components

1. Bidirectional Handler (brain/specialists/bidirectional.py)

  • Monitors streaming output for specialist requests

  • Parses request syntax and parameters

  • Executes specialists on-demand

  • Injects results back into token stream

  • Enforces safety limits (max 5 calls per turn)

2. System Prompt (brain/config.py)

  • Teaches LLM the specialist request syntax

  • Documents available specialists

  • Provides usage examples

  • Sets expectations (results format, limits)
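
A hedged sketch of what this instruction block might look like (the exact wording in brain/config.py may differ):

# Hypothetical fragment of the system prompt; the real text may differ.
SPECIALIST_INSTRUCTIONS = """
You can request a specialist mid-response using this exact syntax:

    SPECIALIST_REQUEST[specialist_name:{"param":"value"}]

Available specialists: web_search, wiki_lookup, docs_lookup, ocr.
Results are injected into your response between
[SPECIALIST_RESULT: name] and [/SPECIALIST_RESULT] markers.
You may make at most 5 specialist requests per turn.
"""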

3. Stream Wrapper (brain/app.py)

  • Wraps base LLM stream with specialist handler

  • Passes request context for specialist execution

  • Seamlessly integrates with existing SSE streaming

Request Syntax

Primary Syntax

SPECIALIST_REQUEST[specialist_name:{"param":"value","another":"param"}]

Example:

SPECIALIST_REQUEST[vision:{"focus":"technical_diagrams"}]

Alternative Mention Syntax (future)

@vision_specialist(focus=architecture)
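
If this mention form is adopted, a detector for it might look like the sketch below (the pattern and parameter handling are assumptions, since the syntax is not implemented yet):

import re

# Hypothetical detector for the future @mention form
MENTION_PATTERN = re.compile(r'@(\w+)_specialist\(([^)]*)\)')

def parse_mention(text):
    match = MENTION_PATTERN.search(text)
    if not match:
        return None
    name, raw_params = match.groups()
    # "focus=architecture" -> {"focus": "architecture"}
    params = dict(p.strip().split('=', 1) for p in raw_params.split(',') if '=' in p)
    return {'specialist': name, 'params': params}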

Usage Examples

Example 1: Vision Analysis

User: “What’s in this image?”

LLM Response:

Let me analyze the image in detail.

SPECIALIST_REQUEST[vision:{"focus":"content_description"}]

[SPECIALIST_RESULT: vision]
👁️ VISUAL ANALYSIS from 'diagram.png'
The image contains a system architecture diagram showing microservices...
[/SPECIALIST_RESULT]

Based on the visual analysis, this is a microservices architecture with...

Example 2: Enhanced OCR

User: “Extract text from this complex document”

LLM Response:

I'll analyze the document structure first.

SPECIALIST_REQUEST[ocr:{"enhance":true,"language":"en"}]

[SPECIALIST_RESULT: ocr]
📄 OCR EXTRACTED TEXT from 'document.pdf'
[Text content here...]
[/SPECIALIST_RESULT]

The document appears to be a contract with the following key sections...

Current Specialists

One-Way (Auto-activated)

  • OCR: Activated when ocr_context present

  • Media: Activated when media present

Bidirectional (LLM-requested)

  • Web Search: Real-time information from the web

  • Wiki Lookup: MediaWiki-based encyclopedia queries

  • Docs Lookup: queries Ada’s own documentation (self-reference)

  • OCR (can also be requested): Enhanced text extraction
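
A minimal sketch of the registration side, assuming a simple name-to-instance registry (the real structure under brain/specialists/ may differ):

# Hypothetical registry; the actual implementation may differ
SPECIALIST_REGISTRY = {}

def register_specialist(name, specialist):
    """Make a specialist requestable by name."""
    SPECIALIST_REGISTRY[name] = specialist

def list_specialists():
    """Names the LLM is allowed to request (also used for validation)."""
    return sorted(SPECIALIST_REGISTRY)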

Safety & Limits

Built-in Protections

  1. Max Calls Per Turn: 5 specialist requests maximum

  2. Error Handling: Graceful failures don’t crash the stream

  3. Timeout Protection: Specialists are expected to complete quickly (under ~5 s; see Best Practices)

  4. Validation: Only registered specialists can be invoked
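
For example, checks 1 and 4 might be enforced with something like this (names are illustrative, not the actual implementation):

MAX_SPECIALIST_CALLS = 5

def validate_request(name, calls_made, registered_names):
    """Return an error string if the request must be rejected, else None."""
    if calls_made >= MAX_SPECIALIST_CALLS:
        return "max specialist calls reached for this turn"
    if name not in registered_names:
        return f"unknown specialist '{name}'"
    return None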

Error Messages

When specialists fail, the LLM sees:

[SPECIALIST_ERROR: specialist_name failed - reason]

The LLM can acknowledge the failure and continue the response gracefully.
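
A small helper could produce that marker so it always matches the documented format (the function name is an assumption):

def format_specialist_error(name, reason):
    # Emitted into the stream so the LLM can see and acknowledge the failure
    return f"[SPECIALIST_ERROR: {name} failed - {reason}]"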

Implementation Details

Handler Detection Logic

import json
import re

# Monitors the accumulating stream for the request pattern
SPECIALIST_REQUEST_PATTERN = re.compile(
    r'SPECIALIST_REQUEST\[(\w+):(.*?)\]',
    re.DOTALL
)

async def detect_request(text: str):
    match = SPECIALIST_REQUEST_PATTERN.search(text)
    if not match:
        return None
    specialist_name = match.group(1)
    params_json = match.group(2)
    try:
        params = json.loads(params_json)
    except json.JSONDecodeError:
        # Malformed or partially streamed JSON: treat as no request yet
        return None
    return {'specialist': specialist_name, 'params': params}
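
Exercised against accumulated text, the detector behaves like this:

import asyncio

# Illustrative usage of detect_request above
text = 'Let me check. SPECIALIST_REQUEST[vision:{"focus":"test"}]'
print(asyncio.run(detect_request(text)))
# {'specialist': 'vision', 'params': {'focus': 'test'}}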

Stream Integration

# Wrap base stream with specialist handler
base_stream = stream_chat_async(prompt, model=MODEL)
specialist_stream = stream_with_specialists(
    base_stream,
    request_context,
    max_specialist_calls=5
)

# Frontend receives enriched stream
async for chunk in specialist_stream:
    yield chunk  # Normal token OR specialist result
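
A hedged sketch of what the wrapper itself could look like, assuming detect_request from above and an execute_specialist(name, params, context) coroutine that returns formatted result text (neither is confirmed by the source):

# Minimal sketch; brain/specialists/bidirectional.py handles more edge cases
async def stream_with_specialists(base_stream, request_context,
                                  max_specialist_calls=5):
    buffer = ""      # accumulated text, scanned for request syntax
    calls_made = 0

    async for chunk in base_stream:
        buffer += chunk.get('token', '')
        yield chunk  # pass the original token through unchanged

        request = await detect_request(buffer)
        if request and calls_made < max_specialist_calls:
            calls_made += 1
            buffer = ""  # avoid re-detecting the same request
            result = await execute_specialist(
                request['specialist'], request['params'], request_context
            )
            yield {'token': (f"\n[SPECIALIST_RESULT: {request['specialist']}]\n"
                             f"{result}\n[/SPECIALIST_RESULT]\n")}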

Future Enhancements

Phase 2: Pause & Resume

Currently, specialist results are injected into the stream without pausing generation. For better quality:

  1. Detect specialist request

  2. Pause LLM generation

  3. Execute specialist

  4. Build new prompt with result

  5. Resume generation from enriched context
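
One possible shape for that loop, reusing detect_request and the assumed execute_specialist helper (all names are illustrative):

# Sketch of pause-and-resume: stop at the request, enrich the prompt, regenerate
async def generate_with_pause_resume(prompt, request_context, max_rounds=5):
    response = ""
    for _ in range(max_rounds):
        response, request = "", None
        async for chunk in stream_chat_async(prompt, model=MODEL):
            response += chunk.get('token', '')
            request = await detect_request(response)
            if request:
                break  # pause: stop consuming the base stream
        if request is None:
            return response  # no specialist requested; generation is complete

        result = await execute_specialist(
            request['specialist'], request['params'], request_context
        )
        # Resume: rebuild the prompt with the specialist result appended
        prompt = (f"{prompt}\n{response}\n"
                  f"[SPECIALIST_RESULT: {request['specialist']}]\n"
                  f"{result}\n[/SPECIALIST_RESULT]\n")
    return response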

Phase 3: Multi-turn Planning

Enable agentic workflows:

User: Analyze these 3 images
LLM: I'll analyze each image systematically.

     Image 1: SPECIALIST_REQUEST[vision:{"image_id":1}]
     [Result 1...]

     Image 2: SPECIALIST_REQUEST[vision:{"image_id":2}]
     [Result 2...]

     Comparing the three images, I notice...

Phase 4: Function Calling API

If the selected LLM gains native tool calling:

# Auto-generate tool definitions from specialists
tools = [s.capability.to_openai_tool() for s in specialists]

response = ollama.chat(
    model="qwen2.5-coder:7b",
    messages=messages,
    tools=tools
)
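
For reference, to_openai_tool() could emit the OpenAI-style tool schema that Ollama accepts; the capability fields below are assumptions:

from dataclasses import dataclass

@dataclass
class SpecialistCapability:
    name: str            # e.g. "web_search"
    description: str     # shown to the model
    params_schema: dict  # JSON Schema for the request parameters

    def to_openai_tool(self) -> dict:
        # OpenAI-style tool definition
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.params_schema,
            },
        }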

Testing

Manual Test via UI

  1. Upload an image via OCR

  2. Ask: “Analyze this image in more detail”

  3. Watch logs for specialist detection

  4. See if LLM requests vision specialist

Programmatic Test

# Simulate LLM output with a specialist request
test_chunks = [
    {'token': 'Let me analyze that. '},
    {'token': 'SPECIALIST_REQUEST[vision:{"focus":"test"}]'},
    {'token': ' Based on the analysis...'}
]

async def test_stream():
    # The handler expects an async stream, so wrap the list
    for chunk in test_chunks:
        yield chunk

# Should detect the request and inject a result
results = []
async for chunk in stream_with_specialists(test_stream(), context):
    results.append(chunk)

# Verify the specialist result appears in the stream
assert any('SPECIALIST_RESULT' in chunk.get('token', '') for chunk in results)

Best Practices

For Specialist Authors

  1. Fast execution: Specialists should complete in <5s

  2. Clear results: Format output for LLM readability

  3. Graceful failures: Return error results, don’t raise (see the sketch after this list)

  4. Document parameters: Clear schemas in capability
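
Practice 3, for instance, can be applied by wrapping every specialist call, as in this sketch (the helper name, the async run(params, context) method, and the result shape are assumptions):

async def run_specialist_safely(specialist, params, context):
    """Turn failures into error results instead of exceptions."""
    try:
        text = await specialist.run(params, context)
        return {"ok": True, "text": text}
    except Exception as exc:
        # The handler converts this into a [SPECIALIST_ERROR: ...] marker
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}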

For System Operators

  1. Monitor logs: Watch for specialist detection/execution

  2. Track usage: Which specialists are requested most?

  3. Tune limits: Adjust max_calls based on patterns

  4. Update prompts: Refine instructions as LLM learns

Troubleshooting

Specialist Not Detected

  • Check syntax: Must be SPECIALIST_REQUEST[name:{...}] with valid JSON parameters

  • Verify specialist registered: list_specialists()

  • Check logs: Look for detection attempts

Specialist Fails

  • Check should_activate(): Does it accept the context?

  • Verify parameters: Does JSON match expected schema?

  • Review error logs: What exception occurred?

Max Calls Reached

  • LLM requesting too many specialists in one turn

  • Refine system prompt to reduce over-requesting

  • Increase limit if legitimate use case

Summary

The bidirectional specialist system provides a flexible framework for LLM-initiated tool use:

  • Syntax-based: Simple, no fine-tuning needed

  • Extensible: Add specialists, LLM can request them

  • Safe: Built-in limits and error handling

  • Future-proof: Can upgrade to native function calling later

This bridges the gap between static context injection and full agentic workflows, enabling Ada to request capabilities as needed!