=====================================
Specialist RAG Documentation System
=====================================

Overview
========

Instead of using static specialist instructions in the system prompt, Ada now uses **RAG-based dynamic documentation**: relevant specialist guidance is retrieved based on the context of the user's query.

How It Works
============

1. Automatic Sync on Startup
-----------------------------

When the brain service starts:

.. code-block:: text

   [BRAIN] Discovered 2 specialists: 🎧 media, 📄 ocr
   [BRAIN] Synced 8 specialist FAQ entries to RAG

The system:

- Discovers all registered specialists via the plugin registry
- Generates FAQ entries from specialist capabilities
- Stores them in Chroma with ``type="faq"`` and ``topic="specialists"``
- Removes old entries to ensure idempotency

2. Context-Aware Retrieval
---------------------------

During prompt building (``PromptAssembler`` in the ``brain/prompt_builder/`` package, with caching):

- The user's query is embedded
- RAG retrieves the top-K most relevant specialist FAQs (with caching for frequent queries)
- Retrieved docs are injected into the prompt before specialist execution

This provides **just-in-time** specialist guidance instead of static instructions.

3. FAQ Entry Types
------------------

The system generates multiple FAQ types:

**Overview FAQ:**

.. code-block:: text

   Q: What specialist capabilities are available?
   A: Ada has 2 specialist capabilities integrated: 🎧 media, 📄 ocr. These can be
      invoked mid-conversation using SPECIALIST_REQUEST[name:{params}] syntax...

**Per-Specialist Capability FAQ:**

.. code-block:: text

   Q: What does the ocr specialist do?
   A: Extract text from images using Tesseract OCR (Priority: HIGH, Icon: 📄)

**Per-Specialist Syntax FAQ:**

.. code-block:: text

   Q: How do I invoke the ocr specialist?
   A: Use the syntax SPECIALIST_REQUEST[ocr:{}] in your response. I will detect this
      pattern, pause generation, execute the specialist, and resume with enriched context.
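
The per-specialist entries above are templated from specialist metadata. As a rough illustration of that templating (a hypothetical sketch, not the actual plugin protocol: the ``Specialist`` fields and the ``build_faq_entries`` helper are assumptions):

.. code-block:: python

   from dataclasses import dataclass

   @dataclass
   class Specialist:
       # Hypothetical metadata fields; the real plugin protocol may differ.
       name: str
       description: str
       priority: str
       icon: str

   def build_faq_entries(spec: Specialist) -> list[dict]:
       """Build the capability and syntax Q/A pairs for one specialist."""
       meta = {"type": "faq", "topic": "specialists"}
       return [
           {
               "question": f"What does the {spec.name} specialist do?",
               "answer": f"{spec.description} (Priority: {spec.priority}, Icon: {spec.icon})",
               "metadata": meta,
           },
           {
               "question": f"How do I invoke the {spec.name} specialist?",
               "answer": (
                   f"Use the syntax SPECIALIST_REQUEST[{spec.name}:{{}}] in your response. "
                   "I will detect this pattern, pause generation, execute the "
                   "specialist, and resume with enriched context."
               ),
               "metadata": meta,
           },
       ]

   ocr = Specialist("ocr", "Extract text from images using Tesseract OCR", "HIGH", "📄")
   entries = build_faq_entries(ocr)
   print(entries[0]["question"])  # → What does the ocr specialist do?

Each dict would then be embedded and stored in Chroma with the metadata shown, so retrieval can filter on ``type="faq", topic="specialists"``.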

**Usage Pattern FAQs:**

- When to use specialists
- Chaining multiple specialists
- How pause/resume works

Configuration
=============

Enable/Disable RAG Docs
-----------------------

.. code-block:: bash

   SPECIALIST_RAG_DOCS=true   # Use dynamic RAG retrieval (default)
   SPECIALIST_RAG_DOCS=false  # Use static SPECIALIST_INSTRUCTIONS only

Set in ``brain/config.py``:

.. code-block:: python

   SPECIALIST_RAG_DOCS = os.getenv("SPECIALIST_RAG_DOCS", "true").lower() == "true"

Retrieval Count
---------------

The prompt building system retrieves two specialist-doc FAQs by default:

.. code-block:: python

   specialist_docs = get_relevant_specialist_docs(user_prompt, rag_store, k=2)

Benefits
========

1. Context-Aware Guidance
--------------------------

- User asks "can you analyze this image?" → OCR/vision docs are retrieved
- User asks "what's playing?" → Media specialist docs are retrieved
- Irrelevant specialists don't clutter the prompt

2. Automatic Updates
---------------------

- Add a new specialist → FAQ entries are auto-generated on the next startup
- Modify a specialist capability → updated in RAG automatically
- No manual prompt engineering required

3. Token Efficiency
-------------------

- Static ``SPECIALIST_INSTRUCTIONS``: ~300 tokens, always present
- Dynamic RAG retrieval: ~100-200 tokens, only when relevant
- Reduces prompt bloat for non-specialist queries

4. Semantic Matching
---------------------

- User query: "What can you see in this photo?"
- RAG retrieves: OCR + vision specialist documentation
- The LLM learns specialist syntax contextually

Implementation Files
====================

brain/specialists/specialist_docs.py
------------------------------------

- ``sync_specialist_docs_to_faq()`` - generates FAQ entries from all specialists
- ``get_relevant_specialist_docs()`` - retrieves relevant docs for a query

brain/app.py (lifespan)
-----------------------

.. code-block:: python

   # Sync specialist documentation to FAQ system
   if rag_store is not None:
       doc_count = sync_specialist_docs_to_faq(rag_store)
       print(f"[BRAIN] Synced {doc_count} specialist FAQ entries to RAG")

brain/prompt_builder/ (modular package) / brain/_legacy_prompt_builder.py (legacy compatibility)
------------------------------------------------------------------------------------------------

.. code-block:: python

   # --- Dynamic Specialist Documentation (RAG-based) ---
   if SPECIALIST_RAG_DOCS and rag_store is not None:
       specialist_docs = get_relevant_specialist_docs(user_prompt, rag_store, k=2)
       if specialist_docs:
           sections.append(specialist_docs)
           used_context['specialist_docs'] = True

Example Workflow
================

User Query: "Can you read the text in this image?"
---------------------------------------------------

1. **Prompt Building Phase:**

   - Embed the query: "Can you read the text in this image?"
   - RAG retrieves from ``type="faq", topic="specialists"``:

     - "How do I invoke the ocr specialist?"
     - "What does the ocr specialist do?"

   - Format and inject into the prompt:

     .. code-block:: text

        📚 Specialist Capabilities Reference:
        • Use the syntax SPECIALIST_REQUEST[ocr:{}]...
        • Extract text from images using Tesseract OCR...

2. **LLM Generation:**

   - Ada sees the relevant OCR documentation in context
   - Generates: "I can extract the text using OCR. SPECIALIST_REQUEST[ocr:{}]"

3. **Specialist Execution (Pause/Resume):**

   - Generation pauses
   - The OCR specialist extracts: "Annual Report 2024..."
   - Generation resumes with the OCR result injected

4. **Final Response:**

   - "The image contains: Annual Report 2024..."

Future Enhancements
===================

Phase 1: Embedding-Based Specialist Discovery
----------------------------------------------

Instead of the LLM explicitly requesting specialists, the system could:

- Embed the user query
- Check its similarity to specialist capabilities
- Auto-suggest specialists in the prompt

Phase 2: Example-Based Learning
--------------------------------

Store successful specialist invocations as FAQs:

.. code-block:: text

   Q: User uploaded a diagram and asked "what's the architecture?"
   A: I used SPECIALIST_REQUEST[vision:{"focus":"architecture"}] to analyze...

Phase 3: Failure Case Documentation
------------------------------------

Track failed specialist calls and add FAQ warnings:

.. code-block:: text

   Q: Can OCR read handwritten text?
   A: OCR works best with printed text. Handwritten text may have lower accuracy.

Debugging
=========

Check Synced FAQs
-----------------

.. code-block:: bash

   # View all specialist FAQs
   docker exec ada-v1-brain-1 python -c "
   from rag_store import RagStore
   store = RagStore()
   result = store.col.query(
       query_texts=['specialists'],
       n_results=10,
       where={'type': 'faq', 'topic': 'specialists'}
   )
   for doc in result['documents'][0]:
       print(doc)
       print('---')
   "

Test Retrieval
--------------

Query the debug endpoint (if ``RAG_DEBUG=true``):

.. code-block:: bash

   curl "http://localhost:7000/v1/debug/prompt?prompt=analyze%20image&faq_k=5"

Check the ``used_context.specialist_docs`` field to see whether docs were injected.

Monitor Logs
------------

.. code-block:: bash

   docker logs ada-v1-brain-1 | grep "Synced.*FAQ"
   # [BRAIN] Synced 8 specialist FAQ entries to RAG

Configuration Reference
=======================

.. list-table::
   :header-rows: 1
   :widths: 30 15 55

   * - Variable
     - Default
     - Description
   * - ``SPECIALIST_RAG_DOCS``
     - ``true``
     - Enable dynamic RAG-based specialist documentation
   * - ``SPECIALIST_PAUSE_RESUME``
     - ``true``
     - Enable pause/resume for specialist execution
   * - ``SPECIALIST_MAX_TURNS``
     - ``5``
     - Maximum specialist calls per conversation
   * - ``RAG_FAQ_TOP_K``
     - ``2``
     - Number of FAQ entries to retrieve (includes specialist docs)

Migration Notes
===============

Before (Static Instructions)
-----------------------------

- All specialist syntax in ``SPECIALIST_INSTRUCTIONS`` (300+ tokens)
- Present in every prompt regardless of relevance
- Manual updates required for new specialists

After (Dynamic RAG)
-------------------

- Specialist syntax retrieved on demand from the FAQ system
- Only relevant specialists injected, based on the query
- Automatic updates when specialists are added or changed
- Backward compatible: static instructions still present as a fallback

Related Documentation
=====================

- :doc:`specialists` - Plugin protocol and creation guide
- :doc:`bidirectional` - Communication patterns
- API documentation
- RAG system overview
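
For reference, the backward-compatible fallback described in Migration Notes can be sketched as a single branch in prompt assembly (a simplified sketch: ``get_relevant_specialist_docs`` is stubbed here, and the ``STATIC_INSTRUCTIONS`` text is illustrative, not the real fallback content):

.. code-block:: python

   SPECIALIST_RAG_DOCS = True
   # Illustrative stand-in for the ~300-token static fallback block.
   STATIC_INSTRUCTIONS = "Invoke specialists with SPECIALIST_REQUEST[name:{params}] ..."

   def get_relevant_specialist_docs(user_prompt: str, rag_store, k: int = 2) -> str:
       # Stub: pretend the RAG store matched OCR docs for an image-related query.
       if "image" in user_prompt.lower():
           return "📚 Specialist Capabilities Reference:\n• Use SPECIALIST_REQUEST[ocr:{}]..."
       return ""

   def specialist_section(user_prompt: str, rag_store) -> str:
       # Dynamic retrieval when enabled and a store is available...
       if SPECIALIST_RAG_DOCS and rag_store is not None:
           docs = get_relevant_specialist_docs(user_prompt, rag_store, k=2)
           if docs:
               return docs            # just-in-time guidance
       # ...otherwise fall back to the always-present static block.
       return STATIC_INSTRUCTIONS

   print(specialist_section("Can you read the text in this image?", rag_store=object()))

With a store present and a matching query the dynamic docs win; with no store (or no match) the static instructions are used, which is what keeps the migration backward compatible.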