Data Model Reference

Ada stores all documents in a Chroma vector database with structured metadata schemas. This page documents the complete data model, including schemas, metadata fields, and usage patterns.

Overview

Document Structure

Every document in Ada’s vector database has four components:

  1. ID: Unique identifier (UUID format: doc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)

  2. Text: The actual content of the document (plain text)

  3. Embeddings: 768-dimensional vector from nomic-embed-text model

  4. Metadata: Structured fields describing the document (see schemas below)
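
A minimal sketch of inspecting these four components through the underlying Chroma collection (rag_store.col, as used in the query examples later on this page; the include flags are standard Chroma):

from brain.rag_store import RagStore

# rag_store.col is the underlying Chroma collection used throughout this page.
rag_store = RagStore()
record = rag_store.col.get(limit=1, include=["documents", "embeddings", "metadatas"])

print(record["ids"][0])                # ID, e.g. "doc-1a2b3c4d-..."
print(record["documents"][0][:80])     # plain-text content
print(len(record["embeddings"][0]))    # 768 (nomic-embed-text vector)
print(record["metadatas"][0])          # structured metadata fields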

Collection Organization

Ada uses a single Chroma collection named conversations that contains all document types. Documents are differentiated by their type metadata field.

Available document types:

  • persona: Identity and behavioral guidelines

  • faq: Knowledge base entries and specialist documentation

  • memory: Long-term facts and context

  • turn: Conversation history (user/assistant pairs)

  • summary: Conversation summaries
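
A sketch of slicing the single collection by the type field (a standard Chroma get with a where filter; RagStore usage follows the examples later on this page):

from brain.rag_store import RagStore

rag_store = RagStore()

# Count stored documents of each type by filtering on the type metadata field.
for doc_type in ("persona", "faq", "memory", "turn", "summary"):
    matches = rag_store.col.get(where={"type": doc_type})
    print(f"{doc_type}: {len(matches['ids'])} documents")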

Schema Introspection

Live Schema API

Ada provides a schema endpoint at /api/schema that returns JSON Schema definitions for all document types. This makes the data model fully introspectable at runtime.

Get all schemas:

curl http://localhost:5000/api/schema

Get specific schema:

curl http://localhost:5000/api/schema?doc_type=memory

Response structure:

{
  "document_type": "memory",
  "schema": { ... JSON Schema ... },
  "fields": ["type", "timestamp", "source", "scope", ...],
  "required_fields": ["type", "timestamp", "importance"]
}
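
A minimal Python sketch of consuming this endpoint (the host, port, and path follow the curl examples above; requests is an assumed dependency):

import requests

# Fetch the JSON Schema for memory documents from a running Ada instance.
resp = requests.get(
    "http://localhost:5000/api/schema",
    params={"doc_type": "memory"},
    timeout=10,
)
resp.raise_for_status()
payload = resp.json()

print(payload["required_fields"])   # e.g. ["type", "timestamp", "importance"]
print(sorted(payload["fields"]))    # every metadata field for this document type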

Python Schema Access

Import Pydantic models directly for type-safe document creation:

from brain.schemas import (
    PersonaMetadata,
    FAQMetadata,
    MemoryMetadata,
    TurnMetadata,
    SummaryMetadata,
    validate_metadata,
    get_all_schemas,
)

# Validate metadata
meta = {"type": "memory", "timestamp": "2025-12-16T06:00:00+00:00", "importance": 4}
validated = validate_metadata("memory", meta)

# Get JSON Schema
schemas = get_all_schemas()
memory_schema = schemas["memory"]

Document Schemas

Base Metadata (All Documents)

All documents share these common fields:

type:
  Type: string (enum)
  Required: Yes
  Values: persona, faq, memory, turn, summary
  Description: Document type identifier

timestamp:
  Type: string (ISO 8601 UTC)
  Required: Yes
  Example: 2025-12-16T06:00:00+00:00
  Description: When the document was created or last updated

source:
  Type: string
  Required: Yes
  Default: system
  Values: chat, kb, system, import
  Description: Origin of the document

scope:
  Type: string
  Required: Yes
  Default: global
  Format: global or entity:<name>
  Description: Access scope for document retrieval
  Example: entity:project-alpha for project-specific documents
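
The scope format can be split into its entity component with a small helper; a hypothetical sketch (parse_scope is illustrative, not part of brain.schemas):

def parse_scope(scope: str) -> str | None:
    """Return the entity name from a scope value, or None for global scope."""
    if scope.startswith("entity:"):
        return scope.split(":", 1)[1]
    return None

assert parse_scope("global") is None
assert parse_scope("entity:project-alpha") == "project-alpha"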

Persona Documents

Purpose: Define Ada’s identity, tone, and behavioral guidelines.

When used: Loaded from persona.md on startup, retrieved when identity/tone questions arise.

Metadata fields:

type:
  Fixed value: persona

version:
  Type: string
  Required: Yes
  Description: Version timestamp for persona updates
  Example: 2025-12-14T20:12:53.986582Z

topic:
  Type: string (optional)
  Description: Topic or section of the persona if chunked
  Examples: tone, safety, reasoning

Example document:

{
  "id": "doc-abc123...",
  "document": "You are Ada, a helpful assistant...",
  "metadata": {
    "type": "persona",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "kb",
    "scope": "global",
    "version": "2025-12-14T20:12:53.986582Z",
    "topic": "tone"
  }
}

Query example:

# Retrieve persona context for identity questions
results = rag_store.col.query(
    query_texts=["What is my tone?"],
    where={"type": "persona"},
    n_results=2
)

FAQ Documents

Purpose: Knowledge base entries, specialist documentation, general reference.

When used: Retrieved via semantic similarity for relevant context.

Metadata fields:

type:
  Fixed value: faq

topic:
  Type: string (optional)
  Description: Category for organization
  Examples: specialists, api, configuration, troubleshooting

specialist_name:
  Type: string (optional)
  Field name: _specialist_name (with underscore)
  Description: Name of the specialist this doc describes
  Examples: web_search, ocr, vision

version:
  Type: string (optional)
  Field name: _version (with underscore)
  Default: auto
  Description: Version of the FAQ entry

Example documents:

{
  "id": "doc-faq-001",
  "document": "Q: How do I use the web search specialist?\nA: Use SPECIALIST_REQUEST[web_search:{\"query\":\"...\"}]",
  "metadata": {
    "type": "faq",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "system",
    "scope": "global",
    "topic": "specialists",
    "_specialist_name": "web_search",
    "_version": "auto"
  }
}
{
  "id": "doc-faq-002",
  "document": "Q: How do I configure RAG?\nA: Set RAG_ENABLED=true in .env...",
  "metadata": {
    "type": "faq",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "kb",
    "scope": "global",
    "topic": "configuration"
  }
}

Query example:

# Retrieve FAQ entries about specialists
results = rag_store.col.query(
    query_texts=["How do I invoke a specialist?"],
    where={"type": "faq", "topic": "specialists"},
    n_results=3
)

Memory Documents

Purpose: Long-term facts and context that persist across conversations.

When used: Retrieved via semantic similarity, weighted by importance and recency.

Metadata fields:

type:
  Fixed value: memory

importance:
  Type: integer
  Required: Yes
  Default: 3
  Range: 1-5 (1=low, 5=critical)
  Description: Importance level affecting retrieval ranking

tags:
  Type: array of strings (optional)
  Description: Tags for categorization and filtering
  Examples: ["python", "preferences"], ["project-alpha", "deadline"]

entity:
  Type: string (optional)
  Description: Entity this memory pertains to (extracted from scope)
  Examples: project-alpha, team-beta

conversation_id:
  Type: string (optional)
  Format: conv-<uuid>
  Description: Original conversation where the memory was created

Example documents:

{
  "id": "doc-mem-001",
  "document": "luna prefers Python over JavaScript for backend development",
  "metadata": {
    "type": "memory",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "chat",
    "scope": "global",
    "importance": 4,
    "tags": ["preferences", "programming"]
  }
}
{
  "id": "doc-mem-002",
  "document": "project-alpha deadline is December 20th, 2025",
  "metadata": {
    "type": "memory",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "chat",
    "scope": "entity:project-alpha",
    "entity": "project-alpha",
    "importance": 5,
    "tags": ["deadline", "critical"],
    "conversation_id": "conv-abc123"
  }
}

Query example:

# Retrieve memories with entity scoping
results = rag_store.retrieve_memories(
    query="project-alpha status",
    k=5,
    entity="project-alpha"  # Filter to entity-scoped memories
)

Turn Documents

Purpose: Conversation history (user messages and assistant responses).

When used: Retrieved for recent context in ongoing conversations.

Metadata fields:

type:
  Fixed value: turn

conversation_id:
  Type: string
  Required: Yes
  Format: conv-<uuid>
  Description: Conversation this turn belongs to

role:
  Type: string
  Required: Yes
  Values: user or assistant
  Description: Speaker role

turn_index:
  Type: integer (optional)
  Description: Sequential turn number within the conversation
  Examples: 1, 2, 3

Example documents:

{
  "id": "doc-turn-001",
  "document": "How do I configure the web search specialist?",
  "metadata": {
    "type": "turn",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "chat",
    "scope": "global",
    "conversation_id": "conv-abc123",
    "role": "user",
    "turn_index": 1
  }
}
{
  "id": "doc-turn-002",
  "document": "To configure web search, set SEARXNG_URL in your .env file...",
  "metadata": {
    "type": "turn",
    "timestamp": "2025-12-16T06:00:01+00:00",
    "source": "chat",
    "scope": "global",
    "conversation_id": "conv-abc123",
    "role": "assistant",
    "turn_index": 1
  }
}

Query example:

# Retrieve recent turns from specific conversation
results = rag_store.col.query(
    query_texts=["conversation context"],
    where={"type": "turn", "conversation_id": "conv-abc123"},
    n_results=10
)

Summary Documents

Purpose: Compressed summaries of conversation turns for efficient context.

When used: Generated periodically (every N turns) to maintain context efficiency.
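
A hypothetical sketch of that trigger (the interval constant and helper are illustrative, not Ada's actual code):

# Illustrative only: summarize every N turns, matching turn_range values like "1-8".
SUMMARY_EVERY_N_TURNS = 8

def should_summarize(turn_index: int) -> bool:
    """Return True when a summary covering the last N turns is due."""
    return turn_index > 0 and turn_index % SUMMARY_EVERY_N_TURNS == 0

# After turn 8 a summary with turn_range "1-8" would be written; after turn 16, "9-16".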

Metadata fields:

type:
  Fixed value: summary

conversation_id:
  Type: string
  Required: Yes
  Format: conv-<uuid>
  Description: Conversation this summary belongs to

turn_range:
  Type: string (optional)
  Format: <start>-<end>
  Description: Range of turns covered by this summary
  Examples: 1-8, 9-16

summary_index:
  Type: integer (optional)
  Description: Sequential summary number within the conversation
  Examples: 1, 2, 3

Example document:

{
  "id": "doc-sum-001",
  "document": "Discussed web search configuration. User wants to set up SearxNG...",
  "metadata": {
    "type": "summary",
    "timestamp": "2025-12-16T06:08:00+00:00",
    "source": "system",
    "scope": "global",
    "conversation_id": "conv-abc123",
    "turn_range": "1-8",
    "summary_index": 1
  }
}

Query example:

# Retrieve conversation summaries
results = rag_store.col.query(
    query_texts=["web search configuration"],
    where={"type": "summary"},
    n_results=3
)

Retrieval Strategies

Semantic Similarity

All documents are retrieved via semantic similarity using the nomic-embed-text embedding model (768 dimensions).

# Basic semantic query
results = rag_store.col.query(
    query_texts=["How do I use specialists?"],
    n_results=5
)

Metadata Filtering

Chroma supports filtering by metadata fields using where clauses:

# Filter by type
where = {"type": "faq"}

# Filter by multiple fields
where = {"type": "memory", "importance": {"$gte": 4}}

# Filter by scope
where = {"scope": "entity:project-alpha"}

# Complex filters
where = {
    "$and": [
        {"type": "memory"},
        {"importance": {"$gte": 4}},
        {"tags": {"$contains": "critical"}}
    ]
}

Importance-Weighted Retrieval

For memory documents, retrieval can be weighted by importance and recency:

# Memory retrieval with importance weighting
memories = rag_store.retrieve_memories(
    query="project status",
    k=5,
    entity="project-alpha"  # Optional entity scoping
)

# Scoring formula:
# score = (similarity * 0.5) + (importance * 0.5) + (recency_bonus)
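
A hypothetical rendering of a score with this shape (the 0-1 normalization of importance and the 30-day recency decay are assumptions, not the exact behaviour of retrieve_memories):

import math
from datetime import datetime, timezone

def blended_score(similarity: float, importance: int, timestamp: str) -> float:
    """Blend similarity, importance, and recency along the lines of the formula above."""
    importance_norm = (importance - 1) / 4          # map the 1-5 scale onto 0-1
    age_days = (
        datetime.now(timezone.utc) - datetime.fromisoformat(timestamp)
    ).total_seconds() / 86400
    recency_bonus = 0.1 * math.exp(-age_days / 30)  # assumed 30-day decay
    return similarity * 0.5 + importance_norm * 0.5 + recency_bonus

# A strong match (similarity 0.9) with importance 5 and a recent timestamp
# scores roughly 0.45 + 0.5 + 0.1 = 1.05.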

Recency-Based Ranking

Turn and summary documents use timestamp-based ranking for temporal awareness:

# Retrieve turns from a specific conversation (re-ordered by timestamp below)
recent_turns = rag_store.col.query(
    query_texts=["context"],
    where={
        "type": "turn",
        "conversation_id": conversation_id
    },
    n_results=RAG_TURN_TOP_K
)
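
Chroma itself returns query results in similarity order, so a minimal sketch of the re-ordering step (ISO 8601 UTC timestamps sort correctly as plain strings):

# recent_turns is the query result from the example above; [0] selects the first query text.
metas = recent_turns["metadatas"][0]
docs = recent_turns["documents"][0]

# Pair each turn with its metadata and order newest-first by timestamp.
ordered = sorted(zip(metas, docs), key=lambda pair: pair[0]["timestamp"], reverse=True)
for meta, text in ordered:
    print(meta.get("turn_index"), meta["role"], text[:60])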

Usage Examples

Creating Documents

Persona:

from brain.rag_store import RagStore

rag_store = RagStore()
rag_store.upsert_doc(
    text="You are Ada, a helpful assistant for luna.",
    type="persona",
    scope="global",
    timestamp="2025-12-16T06:00:00+00:00",
    source="kb",
    version="2025-12-16T06:00:00+00:00"
)

FAQ:

rag_store.upsert_doc(
    text="Q: How do I use web search?\nA: Use SPECIALIST_REQUEST[web_search:...]",
    type="faq",
    scope="global",
    timestamp="2025-12-16T06:00:00+00:00",
    source="system",
    topic="specialists",
    _specialist_name="web_search"
)

Memory:

rag_store.upsert_memory(
    text="luna prefers Python over JavaScript",
    scope="global",
    importance=4,
    tags=["preferences", "programming"],
    timestamp="2025-12-16T06:00:00+00:00",
    source="chat"
)

Turn:

conversation_id = "conv-abc123"
rag_store.upsert_turn(
    conversation_id=conversation_id,
    user_text="How do I configure web search?",
    assistant_text="Set SEARXNG_URL in .env...",
    user_ts="2025-12-16T06:00:00+00:00",
    assistant_ts="2025-12-16T06:00:01+00:00",
    source="chat"
)

Querying Documents

Get persona context:

persona_docs = rag_store.col.query(
    query_texts=["What is my tone?"],
    where={"type": "persona"},
    n_results=2
)

Search FAQs:

faq_results = rag_store.col.query(
    query_texts=["specialist documentation"],
    where={"type": "faq", "topic": "specialists"},
    n_results=3
)

Retrieve memories:

memories = rag_store.retrieve_memories(
    query="user preferences",
    k=5,
    entity=None  # Global scope
)

Get conversation history:

turns = rag_store.col.query(
    query_texts=["recent discussion"],
    where={"type": "turn", "conversation_id": "conv-abc123"},
    n_results=10
)

Validating Metadata

Use Pydantic schemas for validation:

from brain.schemas import validate_metadata

# Valid metadata
meta = {
    "type": "memory",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "chat",
    "scope": "global",
    "importance": 4,
    "tags": ["test"]
}
validated = validate_metadata("memory", meta)

# Invalid metadata (raises ValidationError)
bad_meta = {
    "type": "memory",
    "importance": 10  # Out of range (1-5)
}
# Raises: ValidationError: importance must be between 1 and 5
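
A sketch of handling the failure case, assuming validate_metadata raises Pydantic's ValidationError as indicated above:

from pydantic import ValidationError
from brain.schemas import validate_metadata

bad_meta = {"type": "memory", "importance": 10}  # missing timestamp, importance out of range

try:
    validate_metadata("memory", bad_meta)
except ValidationError as exc:
    # Each error names the offending field and the constraint it violated.
    for err in exc.errors():
        print(err["loc"], err["msg"])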

Configuration

RAG-related configuration options are listed below; an example .env snippet follows the list:

RAG_ENABLED: Master toggle for the RAG system (default: true)
RAG_ENABLE_PERSONA: Enable persona document retrieval (default: true)
RAG_ENABLE_FAQ: Enable FAQ document retrieval (default: true)
RAG_ENABLE_MEMORY: Enable memory document retrieval (default: true)
RAG_ENABLE_TURN: Enable turn document retrieval (default: true)
RAG_ENABLE_SUMMARY: Enable summary document retrieval (default: true)
RAG_TURN_TOP_K: Number of turns to retrieve (default: 4, range: 1-20)
RAG_FAQ_TOP_K: Number of FAQs to retrieve (default: 2, range: 1-10)
RAG_MEMORY_TOP_K: Number of memories to retrieve (default: 3, range: 1-20)
RAG_SUMMARY_TOP_K: Number of summaries to retrieve (default: 2, range: 1-10)
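
A sketch of these settings as they might appear in .env, using the defaults listed above:

# RAG document retrieval (defaults shown)
RAG_ENABLED=true
RAG_ENABLE_PERSONA=true
RAG_ENABLE_FAQ=true
RAG_ENABLE_MEMORY=true
RAG_ENABLE_TURN=true
RAG_ENABLE_SUMMARY=true
RAG_TURN_TOP_K=4
RAG_FAQ_TOP_K=2
RAG_MEMORY_TOP_K=3
RAG_SUMMARY_TOP_K=2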

See Configuration Reference for complete details on all RAG settings.

See Also