Specialist RAG Documentation System

Overview

Instead of using static specialist instructions in the system prompt, Ada now uses RAG-based dynamic documentation that retrieves relevant specialist guidance based on the user’s query context.

How It Works

1. Automatic Sync on Startup

When the brain service starts:

[BRAIN] Discovered 2 specialists: 🎧 media, 📄 ocr
[BRAIN] Synced 8 specialist FAQ entries to RAG

The system:

  • Discovers all registered specialists via the plugin registry

  • Generates FAQ entries from specialist capabilities

  • Stores them in Chroma with type="faq" and topic="specialists"

  • Removes old entries to ensure idempotency
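The sync steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `SPECIALISTS` records, `build_specialist_faqs`, and `sync_to_store` names are hypothetical, and the store is assumed to expose a Chroma-like `delete`/`add` interface.

```python
import hashlib

# Hypothetical specialist records, shaped like what a plugin registry might expose.
SPECIALISTS = [
    {"name": "media", "icon": "🎧", "description": "Control and query media playback"},
    {"name": "ocr", "icon": "📄", "description": "Extract text from images using Tesseract OCR"},
]

def build_specialist_faqs(specialists):
    """Generate FAQ entries (document text + metadata) from specialist capabilities."""
    entries = []
    for spec in specialists:
        doc = (
            f"Q: What does the {spec['name']} specialist do?\n"
            f"A: {spec['description']} (Icon: {spec['icon']})"
        )
        entries.append({
            "id": hashlib.sha1(doc.encode()).hexdigest(),  # stable ID aids idempotent re-sync
            "document": doc,
            "metadata": {"type": "faq", "topic": "specialists"},
        })
    return entries

def sync_to_store(store, specialists):
    """Delete stale specialist FAQs, then insert freshly generated ones."""
    store.delete(where={"type": "faq", "topic": "specialists"})  # remove old entries
    entries = build_specialist_faqs(specialists)
    store.add(
        ids=[e["id"] for e in entries],
        documents=[e["document"] for e in entries],
        metadatas=[e["metadata"] for e in entries],
    )
    return len(entries)
```

Because entry IDs are derived from content, re-running the sync after a restart replaces the same logical entries rather than accumulating duplicates.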

2. Context-Aware Retrieval

During prompt building (handled by PromptAssembler in the brain/prompt_builder/ package, with caching):

  • User’s query is embedded

  • RAG retrieves the top K most relevant specialist FAQs (with caching for frequent queries)

  • Retrieved docs are injected into the prompt before specialist execution

  • This provides just-in-time specialist guidance instead of static instructions
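The retrieval-and-injection flow can be sketched in miniature. The real system embeds queries with a dense embedding model and queries Chroma; the toy bag-of-words embedding and `get_relevant_docs` helper below are illustrative stand-ins that show the ranking and formatting shape only.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; the real system uses a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def get_relevant_docs(query, faq_docs, k=2):
    """Rank FAQ docs by similarity to the query, keep top k, format as a prompt section."""
    q = embed(query)
    top = sorted(faq_docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
    lines = ["📚 Specialist Capabilities Reference:"] + [f"  • {d}" for d in top]
    return "\n".join(lines)
```

The formatted section is appended to the prompt only when at least one FAQ clears the ranking, so non-specialist queries pay no token cost.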

3. FAQ Entry Types

The system generates multiple FAQ types:

Overview FAQ:

Q: What specialist capabilities are available?
A: Ada has 2 specialist capabilities integrated: 🎧 media, 📄 ocr.
   These can be invoked mid-conversation using SPECIALIST_REQUEST[name:{params}] syntax...

Per-Specialist Capability FAQ:

Q: What does the ocr specialist do?
A: Extract text from images using Tesseract OCR (Priority: HIGH, Icon: 📄)

Per-Specialist Syntax FAQ:

Q: How do I invoke the ocr specialist?
A: Use the syntax SPECIALIST_REQUEST[ocr:{}] in your response. I will detect this pattern,
   pause generation, execute the specialist, and resume with enriched context.

Usage Pattern FAQs:

  • When to use specialists

  • Chaining multiple specialists

  • How pause/resume works

Configuration

Enable/Disable RAG Docs

SPECIALIST_RAG_DOCS=true  # Use dynamic RAG retrieval (default)
SPECIALIST_RAG_DOCS=false # Use static SPECIALIST_INSTRUCTIONS only

Set in brain/config.py:

SPECIALIST_RAG_DOCS = os.getenv("SPECIALIST_RAG_DOCS", "true").lower() == "true"

Retrieval Count

The prompt builder retrieves 2 specialist FAQs by default:

specialist_docs = get_relevant_specialist_docs(user_prompt, rag_store, k=2)

Benefits

1. Context-Aware Guidance

  • User asks “can you analyze this image?” → OCR/vision docs retrieved

  • User asks “what’s playing?” → Media specialist docs retrieved

  • Irrelevant specialists don’t clutter the prompt

2. Automatic Updates

  • Add a new specialist → FAQ entries auto-generated on next startup

  • Modify specialist capability → Updated in RAG automatically

  • No manual prompt engineering required

3. Token Efficiency

  • Static SPECIALIST_INSTRUCTIONS = ~300 tokens always present

  • Dynamic RAG retrieval = ~100-200 tokens only when relevant

  • Reduces prompt bloat for non-specialist queries

4. Semantic Matching

  • User query: “What can you see in this photo?”

  • RAG retrieves: OCR + vision specialist documentation

  • LLM learns specialist syntax contextually

Implementation Files

brain/specialists/specialist_docs.py

  • sync_specialist_docs_to_faq() - Generate FAQ entries from all specialists

  • get_relevant_specialist_docs() - Retrieve relevant docs for a query

brain/app.py (lifespan)

# Sync specialist documentation to FAQ system
if rag_store is not None:
    doc_count = sync_specialist_docs_to_faq(rag_store)
    print(f"[BRAIN] Synced {doc_count} specialist FAQ entries to RAG")

brain/prompt_builder/ (modular package) and brain/_legacy_prompt_builder.py (legacy compatibility)

# --- Dynamic Specialist Documentation (RAG-based) ---
if SPECIALIST_RAG_DOCS and rag_store is not None:
    specialist_docs = get_relevant_specialist_docs(user_prompt, rag_store, k=2)
    if specialist_docs:
        sections.append(specialist_docs)
        used_context['specialist_docs'] = True

Example Workflow

User Query: “Can you read the text in this image?”

  1. Prompt Building Phase:

    • Embed query: “Can you read the text in this image?”

    • RAG retrieves from type="faq", topic="specialists":

      • “How do I invoke the ocr specialist?”

      • “What does the ocr specialist do?”

    • Format and inject into prompt:

      📚 Specialist Capabilities Reference:
        • Use the syntax SPECIALIST_REQUEST[ocr:{}]...
        • Extract text from images using Tesseract OCR...
      
  2. LLM Generation:

    • Ada sees relevant OCR documentation in context

    • Generates: “I can extract the text using OCR. SPECIALIST_REQUEST[ocr:{}]”

  3. Specialist Execution (Pause/Resume):

    • Generation pauses

    • OCR specialist extracts: “Annual Report 2024…”

    • Generation resumes with OCR result injected

  4. Final Response:

    • “The image contains: Annual Report 2024…”
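The pause/resume step hinges on detecting the `SPECIALIST_REQUEST[name:{params}]` pattern in the generated text. A minimal detector might look like the following sketch; the regex and `detect_specialist_request` helper are illustrative assumptions, not the actual parser.

```python
import json
import re

# Matches SPECIALIST_REQUEST[name:{...json params...}] in LLM output.
PATTERN = re.compile(r"SPECIALIST_REQUEST\[(\w+):(\{.*?\})\]")

def detect_specialist_request(text):
    """Return (specialist_name, params_dict) if text contains a request, else None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    name, raw_params = match.group(1), match.group(2)
    try:
        params = json.loads(raw_params)
    except json.JSONDecodeError:
        params = {}  # malformed params: fall back to a parameterless call
    return name, params
```

On a match, the runtime would pause generation, dispatch to the named specialist with the parsed params, and resume generation with the specialist's output injected into context.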

Future Enhancements

Phase 1: Embedding-Based Specialist Discovery

Instead of the LLM explicitly requesting specialists, the system could:

  • Embed user query

  • Check similarity to specialist capabilities

  • Auto-suggest specialists in prompt
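A thresholded similarity check is one plausible shape for this auto-suggestion. The sketch below assumes precomputed embedding vectors for the query and each specialist's capability description; `suggest_specialists` and the threshold value are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def suggest_specialists(query_vec, capability_vecs, threshold=0.35):
    """Return (name, score) pairs for specialists similar enough to the query."""
    suggestions = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in capability_vecs.items()
    ]
    # Keep only matches above the threshold, best first.
    return sorted(
        [s for s in suggestions if s[1] >= threshold],
        key=lambda s: s[1],
        reverse=True,
    )
```

Suggested names could then be surfaced in the prompt (e.g., "the ocr specialist may help here") without waiting for the LLM to request one explicitly.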

Phase 2: Example-Based Learning

Store successful specialist invocations as FAQs:

Q: User uploaded a diagram and asked "what's the architecture?"
A: I used SPECIALIST_REQUEST[vision:{"focus":"architecture"}] to analyze...

Phase 3: Failure Case Documentation

Track failed specialist calls and add FAQ warnings:

Q: Can OCR read handwritten text?
A: OCR works best with printed text. Handwritten text may have lower accuracy.

Debugging

Check Synced FAQs

# View all specialist FAQs
docker exec ada-v1-brain-1 python -c "
from rag_store import RagStore
store = RagStore()
result = store.col.query(
    query_texts=['specialists'],
    n_results=10,
    where={'type': 'faq', 'topic': 'specialists'}
)
for doc in result['documents'][0]:
    print(doc)
    print('---')
"

Test Retrieval

Query the debug endpoint (if RAG_DEBUG=true):

curl "http://localhost:7000/v1/debug/prompt?prompt=analyze%20image&faq_k=5"

Check used_context.specialist_docs field to see if docs were injected.

Monitor Logs

docker logs ada-v1-brain-1 | grep "Synced.*FAQ"
# [BRAIN] Synced 8 specialist FAQ entries to RAG

Configuration Reference

| Variable | Default | Description |
|---|---|---|
| SPECIALIST_RAG_DOCS | true | Enable dynamic RAG-based specialist documentation |
| SPECIALIST_PAUSE_RESUME | true | Enable pause/resume for specialist execution |
| SPECIALIST_MAX_TURNS | 5 | Maximum specialist calls per conversation |
| RAG_FAQ_TOP_K | 2 | Number of FAQ entries to retrieve (includes specialist docs) |

Migration Notes

Before (Static Instructions)

  • All specialist syntax in SPECIALIST_INSTRUCTIONS (300+ tokens)

  • Present in every prompt regardless of relevance

  • Manual updates required for new specialists

After (Dynamic RAG)

  • Specialist syntax retrieved on-demand from FAQ system

  • Only relevant specialists injected based on query

  • Automatic updates when specialists added/changed

  • Backward compatible: static instructions still present as fallback