================================
Bidirectional Specialist System
================================

Overview
========

The bidirectional specialist system enables **LLM ↔ Specialist** communication
during inference. Unlike one-way specialists (OCR, media), which only inject
context upfront, bidirectional specialists can be requested by the LLM
mid-response.

Architecture
============

Flow Diagram
------------

::

    User Message → Prompt Assembly (with upfront specialists)
            ↓
    LLM starts generating
            ↓
    Streaming tokens to frontend
            ↓
    [LLM emits SPECIALIST_REQUEST[...]]
            ↓
    Handler detects request syntax
            ↓
    Execute requested specialist
            ↓
    Inject result back into stream
            ↓
    LLM continues with new context

Key Components
--------------

**1. Bidirectional Handler** (``brain/specialists/bidirectional.py``)

- Monitors streaming output for specialist requests
- Parses request syntax and parameters
- Executes specialists on demand
- Injects results back into the token stream
- Enforces safety limits (max 5 calls per turn)

**2. System Prompt** (``brain/config.py``)

- Teaches the LLM the specialist request syntax
- Documents available specialists
- Provides usage examples
- Sets expectations (result format, limits)

**3. Stream Wrapper** (``brain/app.py``)

- Wraps the base LLM stream with the specialist handler
- Passes request context for specialist execution
- Integrates seamlessly with existing SSE streaming

Request Syntax
==============

Primary Syntax
--------------

::

    SPECIALIST_REQUEST[specialist_name:{"param":"value","another":"param"}]

**Example:**

::

    SPECIALIST_REQUEST[vision:{"focus":"technical_diagrams"}]

Alternative Mention Syntax (future)
-----------------------------------

::

    @vision_specialist(focus=architecture)

Usage Examples
==============

Example 1: Vision Analysis
--------------------------

**User:** "What's in this image?"

**LLM Response:**

.. code-block:: text

    Let me analyze the image in detail.

    SPECIALIST_REQUEST[vision:{"focus":"content_description"}]

    [SPECIALIST_RESULT: vision]
    👁️ VISUAL ANALYSIS from 'diagram.png'

    The image contains a system architecture diagram showing microservices...
    [/SPECIALIST_RESULT]

    Based on the visual analysis, this is a microservices architecture with...

Example 2: Enhanced OCR
-----------------------

**User:** "Extract text from this complex document"

**LLM Response:**

.. code-block:: text

    I'll analyze the document structure first.

    SPECIALIST_REQUEST[ocr:{"enhance":true,"language":"en"}]

    [SPECIALIST_RESULT: ocr]
    📄 OCR EXTRACTED TEXT from 'document.pdf'

    [Text content here...]
    [/SPECIALIST_RESULT]

    The document appears to be a contract with the following key sections...

Current Specialists
===================

One-Way (Auto-activated)
-------------------------

- **OCR**: Activated when ``ocr_context`` is present
- **Media**: Activated when ``media`` is present

Bidirectional (LLM-requested)
------------------------------

- **Web Search**: Real-time information from the web
- **Wiki Lookup**: MediaWiki-based encyclopedia queries
- **Docs Lookup**: Ada's own documentation (self-reference)
- **OCR** (can also be requested): Enhanced text extraction

Safety & Limits
===============

Built-in Protections
--------------------

1. **Max Calls Per Turn**: A maximum of 5 specialist requests per turn
2. **Error Handling**: Graceful failures don't crash the stream
3. **Timeout Protection**: Specialists are expected to complete quickly (under ~5 seconds; see Best Practices)
4. **Validation**: Only registered specialists can be invoked
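The budget and validation checks are small enough to sketch. The snippet below
illustrates how the handler *might* track them; the class name, registry
contents, and method are assumptions made for this document, not the actual
code in ``brain/specialists/bidirectional.py``. The error strings follow the
format documented under Error Messages just below.

.. code-block:: python

    from typing import Optional

    MAX_SPECIALIST_CALLS = 5

    # Hypothetical set of registered specialist names.
    SPECIALIST_REGISTRY = {"vision", "ocr", "web_search", "wiki", "docs"}


    class SpecialistGuard:
        """Tracks specialist invocations during a single LLM turn (sketch)."""

        def __init__(self, max_calls: int = MAX_SPECIALIST_CALLS):
            self.max_calls = max_calls
            self.calls_made = 0

        def check(self, specialist_name: str) -> Optional[str]:
            """Return an error string if the request must be rejected, else None."""
            if specialist_name not in SPECIALIST_REGISTRY:
                return f"[SPECIALIST_ERROR: {specialist_name} failed - not a registered specialist]"
            if self.calls_made >= self.max_calls:
                return f"[SPECIALIST_ERROR: {specialist_name} failed - max calls per turn reached]"
            self.calls_made += 1
            return None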
Error Messages
--------------

When a specialist fails, the LLM sees::

    [SPECIALIST_ERROR: specialist_name failed - reason]

The LLM can acknowledge the failure and handle it gracefully.

Implementation Details
======================

Handler Detection Logic
-----------------------

.. code-block:: python

    import json
    import re

    # Monitors the accumulating stream for the request pattern
    SPECIALIST_REQUEST_PATTERN = re.compile(
        r'SPECIALIST_REQUEST\[(\w+):(.*?)\]',
        re.DOTALL
    )

    async def detect_request(text: str):
        match = SPECIALIST_REQUEST_PATTERN.search(text)
        if match:
            specialist_name = match.group(1)
            params_json = match.group(2)
            params = json.loads(params_json)
            return {'specialist': specialist_name, 'params': params}
        return None

Stream Integration
------------------

.. code-block:: python

    # Wrap the base stream with the specialist handler
    base_stream = stream_chat_async(prompt, model=MODEL)
    specialist_stream = stream_with_specialists(
        base_stream,
        request_context,
        max_specialist_calls=5
    )

    # Frontend receives the enriched stream
    async for chunk in specialist_stream:
        yield chunk  # Normal token OR specialist result

Future Enhancements
===================

Phase 2: Pause & Resume
-----------------------

Currently, specialist results are injected mid-stream. For better quality:

1. Detect the specialist request
2. **Pause** LLM generation
3. Execute the specialist
4. Build a **new prompt** containing the result
5. **Resume** generation from the enriched context

Phase 3: Multi-turn Planning
-----------------------------

Enable agentic workflows:

.. code-block:: text

    User: Analyze these 3 images

    LLM: I'll analyze each image systematically.

    Image 1: SPECIALIST_REQUEST[vision:{"image_id":1}]
    [Result 1...]

    Image 2: SPECIALIST_REQUEST[vision:{"image_id":2}]
    [Result 2...]

    Image 3: SPECIALIST_REQUEST[vision:{"image_id":3}]
    [Result 3...]

    Comparing the three images, I notice...

Phase 4: Function Calling API
------------------------------

If the selected LLM gains native tool calling:

.. code-block:: python

    import ollama

    # Auto-generate tool definitions from specialists
    tools = [s.capability.to_openai_tool() for s in specialists]

    response = ollama.chat(
        model="qwen2.5-coder:7b",
        messages=messages,
        tools=tools
    )

Testing
=======

Manual Test via UI
------------------

1. Upload an image via OCR
2. Ask: "Analyze this image in more detail"
3. Watch the logs for specialist detection
4. See whether the LLM requests the vision specialist

Programmatic Test
-----------------

.. code-block:: python

    # Simulate LLM output containing a specialist request
    test_tokens = [
        {'token': 'Let me analyze that. '},
        {'token': 'SPECIALIST_REQUEST[vision:{"focus":"test"}]'},
        {'token': ' Based on the analysis...'}
    ]

    # stream_with_specialists expects an async stream, so wrap the list
    async def test_stream():
        for chunk in test_tokens:
            yield chunk

    # Should detect the request and inject a result
    results = []
    async for chunk in stream_with_specialists(test_stream(), context):
        results.append(chunk)

    # Verify the specialist result appears in the stream

Best Practices
==============

For Specialist Authors
----------------------

1. **Fast execution**: Specialists should complete in under 5 seconds
2. **Clear results**: Format output for LLM readability
3. **Graceful failures**: Return error results, don't raise
4. **Document parameters**: Provide clear schemas in the capability

A minimal sketch illustrating these points follows the operator checklist
below.

For System Operators
--------------------

1. **Monitor logs**: Watch for specialist detection and execution
2. **Track usage**: Which specialists are requested most?
3. **Tune limits**: Adjust ``max_calls`` based on observed patterns
4. **Update prompts**: Refine the instructions as the LLM learns
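The following is a rough, self-contained sketch of a specialist written to
these guidelines. The ``EchoSpecialist`` class, its ``run``/``_process``
methods, and the context shape are assumptions made for illustration; only
``should_activate()`` and the result/error wrappers come from this document.

.. code-block:: python

    import asyncio


    class EchoSpecialist:
        """Toy specialist demonstrating the author best practices above."""

        name = "echo"

        def should_activate(self, context: dict) -> bool:
            # Accept any context that carries a 'text' field (hypothetical shape).
            return bool(context.get("text"))

        async def run(self, params: dict, context: dict) -> str:
            try:
                # 1. Fast execution: bound the work with a timeout.
                text = await asyncio.wait_for(self._process(params, context), timeout=5)
                # 2. Clear results: use the documented result wrapper.
                return f"[SPECIALIST_RESULT: {self.name}]\n{text}\n[/SPECIALIST_RESULT]"
            except Exception as exc:
                # 3. Graceful failures: return an error result, never raise.
                return f"[SPECIALIST_ERROR: {self.name} failed - {exc}]"

        async def _process(self, params: dict, context: dict) -> str:
            return context["text"].upper()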
Troubleshooting
===============

Specialist Not Detected
-----------------------

- Check the syntax: it must be ``SPECIALIST_REQUEST[name:{"param":"value"}]``
- Verify the specialist is registered: ``list_specialists()``
- Check the logs: look for detection attempts

A quick sanity check for this case appears at the end of this document.

Specialist Fails
----------------

- Check ``should_activate()``: does it accept the context?
- Verify the parameters: does the JSON match the expected schema?
- Review the error logs: what exception occurred?

Max Calls Reached
-----------------

- The LLM is requesting too many specialists in one turn
- Refine the system prompt to reduce over-requesting
- Increase the limit if there is a legitimate use case

Summary
=======

The bidirectional specialist system provides a flexible framework for
LLM-initiated tool use:

- ✅ **Syntax-based**: Simple, no fine-tuning needed
- ✅ **Extensible**: Add a specialist and the LLM can request it
- ✅ **Safe**: Built-in limits and error handling
- ✅ **Future-proof**: Can upgrade to native function calling later

This bridges the gap between static context injection and full agentic
workflows, enabling Ada to request capabilities as needed!
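As a quick offline check for the "Specialist Not Detected" case, the detection
pattern from Implementation Details can be run against a captured chunk of LLM
output; the sample string and specialist name below are arbitrary.

.. code-block:: python

    import json
    import re

    SPECIALIST_REQUEST_PATTERN = re.compile(
        r'SPECIALIST_REQUEST\[(\w+):(.*?)\]',
        re.DOTALL
    )

    sample = 'Let me check. SPECIALIST_REQUEST[web_search:{"query":"ada docs"}]'

    match = SPECIALIST_REQUEST_PATTERN.search(sample)
    if match is None:
        print("No request detected - check the syntax")
    else:
        name, params = match.group(1), json.loads(match.group(2))
        print(f"Detected specialist '{name}' with params {params}")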