Data Model Reference
Ada stores all documents in a Chroma vector database with structured metadata schemas. This page documents the complete data model, including schemas, metadata fields, and usage patterns.
Overview
Document Structure
Every document in Ada’s vector database has four components:
- ID: Unique identifier (UUID format: doc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
- Text: The actual content of the document (plain text)
- Embeddings: 768-dimensional vector from the nomic-embed-text model
- Metadata: Structured fields describing the document (see schemas below)
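As a rough illustration of the four components, the sketch below models a document as a plain dataclass. The names `AdaDocument` and `new_doc_id` are hypothetical and not part of Ada's codebase; only the `doc-` prefix, the 768-dimension embedding size, and the role of each field come from the description above.

```python
import uuid
from dataclasses import dataclass, field

EMBEDDING_DIM = 768  # size of nomic-embed-text vectors, per the description above


def new_doc_id() -> str:
    """Return an ID shaped like doc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (hypothetical helper)."""
    return f"doc-{uuid.uuid4()}"


@dataclass
class AdaDocument:
    """Hypothetical container mirroring the four components of a stored document."""

    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)
    id: str = field(default_factory=new_doc_id)

    def __post_init__(self) -> None:
        # Reject vectors that do not match the expected embedding size.
        if len(self.embedding) != EMBEDDING_DIM:
            raise ValueError(
                f"expected {EMBEDDING_DIM} dimensions, got {len(self.embedding)}"
            )


doc = AdaDocument(
    text="You are Ada.",
    embedding=[0.0] * EMBEDDING_DIM,
    metadata={"type": "persona"},
)
```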
Collection Organization
Ada uses a single Chroma collection named conversations that contains all document types. Documents are differentiated by their type metadata field.
Available document types:
- persona: Identity and behavioral guidelines
- faq: Knowledge base entries and specialist documentation
- memory: Long-term facts and context
- turn: Conversation history (user/assistant pairs)
- summary: Conversation summaries
Schema Introspection
Live Schema API
Ada provides a /v1/schema endpoint that returns JSON Schema definitions for all document types. This makes the data model fully introspectable at runtime.
Get all schemas:
curl http://localhost:5000/api/schema
Get specific schema:
curl "http://localhost:5000/api/schema?doc_type=memory"
Response structure:
{
  "document_type": "memory",
  "schema": { ... JSON Schema ... },
  "fields": ["type", "timestamp", "source", "scope", ...],
  "required_fields": ["type", "timestamp", "importance"]
}
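One way a client could use this response is to pre-check a metadata dict before sending it anywhere. The sketch below works on an already-parsed response; the helper names `missing_required` and `unknown_fields` are hypothetical, and the sample payload is shortened for illustration rather than copied from a live server.

```python
import json

# Sample payload shaped like the response structure above (illustrative values only).
SAMPLE_RESPONSE = json.loads(
    """
    {
      "document_type": "memory",
      "schema": {},
      "fields": ["type", "timestamp", "source", "scope", "importance"],
      "required_fields": ["type", "timestamp", "importance"]
    }
    """
)


def missing_required(schema_response: dict, metadata: dict) -> list[str]:
    """Return required fields (per the schema response) that are absent from metadata."""
    return [
        name for name in schema_response["required_fields"] if name not in metadata
    ]


def unknown_fields(schema_response: dict, metadata: dict) -> list[str]:
    """Return metadata keys that the schema response does not list under "fields"."""
    known = set(schema_response["fields"])
    return sorted(key for key in metadata if key not in known)
```

This only checks field presence; type and range checks would still come from the JSON Schema itself.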
Python Schema Access
Import Pydantic models directly for type-safe document creation:
from brain.schemas import (
    PersonaMetadata,
    FAQMetadata,
    MemoryMetadata,
    TurnMetadata,
    SummaryMetadata,
    validate_metadata,
    get_all_schemas,
)
# Validate metadata
meta = {"type": "memory", "timestamp": "2025-12-16T06:00:00+00:00", "importance": 4}
validated = validate_metadata("memory", meta)
# Get JSON Schema
schemas = get_all_schemas()
memory_schema = schemas["memory"]
Document Schemas
Base Metadata (All Documents)
All documents share these common fields:
- type:
  - Type: string (enum)
  - Required: Yes
  - Values: persona, faq, memory, turn, summary
  - Description: Document type identifier
- timestamp:
  - Type: string (ISO 8601 UTC)
  - Required: Yes
  - Example: 2025-12-16T06:00:00+00:00
  - Description: When document was created or last updated
- source:
  - Type: string
  - Required: Yes
  - Default: system
  - Values: chat, kb, system, import
  - Description: Origin of the document
- scope:
  - Type: string
  - Required: Yes
  - Default: global
  - Format: global or entity:<name>
  - Description: Access scope for document retrieval
  - Example: entity:project-alpha for project-specific documents
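The scope format is simple enough to parse by hand. The following is a minimal sketch of how the `global` / `entity:<name>` convention could be handled; `parse_scope` and `make_scope` are hypothetical helpers, not functions exported by Ada.

```python
from typing import Optional

ENTITY_PREFIX = "entity:"


def parse_scope(scope: str) -> Optional[str]:
    """Return None for "global", or the entity name for "entity:<name>".

    Raises ValueError for any other value, including an empty entity name.
    """
    if scope == "global":
        return None
    if scope.startswith(ENTITY_PREFIX) and len(scope) > len(ENTITY_PREFIX):
        return scope[len(ENTITY_PREFIX):]
    raise ValueError(f"invalid scope: {scope!r}")


def make_scope(entity: Optional[str] = None) -> str:
    """Build a scope string from an optional entity name."""
    return f"{ENTITY_PREFIX}{entity}" if entity else "global"
```

Round-tripping a value through `make_scope` and `parse_scope` returns the original entity name, which makes the pair easy to test.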
Persona Documents
Purpose: Define Ada’s identity, tone, and behavioral guidelines.
When used: Loaded from persona.md on startup, retrieved when identity/tone questions arise.
Metadata fields:
- type:
  - Fixed value: persona
- version:
  - Type: string
  - Required: Yes
  - Description: Version timestamp for persona updates
  - Example: 2025-12-14T20:12:53.986582Z
- topic:
  - Type: string (optional)
  - Description: Topic or section of persona if chunked
  - Examples: tone, safety, reasoning
Example document:
{
  "id": "doc-abc123...",
  "document": "You are Ada, a helpful assistant...",
  "metadata": {
    "type": "persona",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "kb",
    "scope": "global",
    "version": "2025-12-14T20:12:53.986582Z",
    "topic": "tone"
  }
}
Query example:
# Retrieve persona context for identity questions
results = rag_store.col.query(
    query_texts=["What is my tone?"],
    where={"type": "persona"},
    n_results=2,
)
FAQ Documents
Purpose: Knowledge base entries, specialist documentation, general reference.
When used: Retrieved via semantic similarity for relevant context.
Metadata fields:
- type:
  - Fixed value: faq
- topic:
  - Type: string (optional)
  - Description: Category for organization
  - Examples: specialists, api, configuration, troubleshooting
- specialist_name:
  - Type: string (optional)
  - Field name: _specialist_name (with underscore)
  - Description: Name of specialist this doc describes
  - Examples: web_search, ocr, vision
- version:
  - Type: string (optional)
  - Field name: _version (with underscore)
  - Default: auto
  - Description: Version of the FAQ entry
Example documents:
{
  "id": "doc-faq-001",
  "document": "Q: How do I use the web search specialist?\nA: Use SPECIALIST_REQUEST[web_search:{\"query\":\"...\"}]",
  "metadata": {
    "type": "faq",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "system",
    "scope": "global",
    "topic": "specialists",
    "_specialist_name": "web_search",
    "_version": "auto"
  }
}

{
  "id": "doc-faq-002",
  "document": "Q: How do I configure RAG?\nA: Set RAG_ENABLED=true in .env...",
  "metadata": {
    "type": "faq",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "kb",
    "scope": "global",
    "topic": "configuration"
  }
}
Query example:
# Retrieve FAQ entries about specialists
# Chroma expects a single top-level key in where, so combine fields with $and
results = rag_store.col.query(
    query_texts=["How do I invoke a specialist?"],
    where={"$and": [{"type": "faq"}, {"topic": "specialists"}]},
    n_results=3,
)
Memory Documents
Purpose: Long-term facts and context that persist across conversations.
When used: Retrieved via semantic similarity, weighted by importance and recency.
Metadata fields:
- type:
  - Fixed value: memory
- importance:
  - Type: integer
  - Required: Yes
  - Range: 1-5 (1=low, 5=critical)
  - Description: Importance level affecting retrieval ranking
  - Default: 3
- tags:
  - Type: array of strings (optional)
  - Description: Tags for categorization and filtering
  - Examples: ["python", "preferences"], ["project-alpha", "deadline"]
- entity:
  - Type: string (optional)
  - Description: Entity this memory pertains to (extracted from scope)
  - Examples: project-alpha, team-beta
- conversation_id:
  - Type: string (optional)
  - Description: Original conversation where memory was created
  - Format: conv-<uuid>
Example documents:
{
  "id": "doc-mem-001",
  "document": "luna prefers Python over JavaScript for backend development",
  "metadata": {
    "type": "memory",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "chat",
    "scope": "global",
    "importance": 4,
    "tags": ["preferences", "programming"]
  }
}

{
  "id": "doc-mem-002",
  "document": "project-alpha deadline is December 20th, 2025",
  "metadata": {
    "type": "memory",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "chat",
    "scope": "entity:project-alpha",
    "entity": "project-alpha",
    "importance": 5,
    "tags": ["deadline", "critical"],
    "conversation_id": "conv-abc123"
  }
}
Query example:
# Retrieve memories with entity scoping
results = rag_store.retrieve_memories(
    query="project-alpha status",
    k=5,
    entity="project-alpha",  # Filter to entity-scoped memories
)
Turn Documents
Purpose: Conversation history (user messages and assistant responses).
When used: Retrieved for recent context in ongoing conversations.
Metadata fields:
- type:
  - Fixed value: turn
- conversation_id:
  - Type: string
  - Required: Yes
  - Format: conv-<uuid>
  - Description: Conversation this turn belongs to
- role:
  - Type: string
  - Required: Yes
  - Values: user or assistant
  - Description: Speaker role
- turn_index:
  - Type: integer (optional)
  - Description: Sequential turn number within conversation
  - Examples: 1, 2, 3
Example documents:
{
  "id": "doc-turn-001",
  "document": "How do I configure the web search specialist?",
  "metadata": {
    "type": "turn",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "chat",
    "scope": "global",
    "conversation_id": "conv-abc123",
    "role": "user",
    "turn_index": 1
  }
}

{
  "id": "doc-turn-002",
  "document": "To configure web search, set SEARXNG_URL in your .env file...",
  "metadata": {
    "type": "turn",
    "timestamp": "2025-12-16T06:00:01+00:00",
    "source": "chat",
    "scope": "global",
    "conversation_id": "conv-abc123",
    "role": "assistant",
    "turn_index": 1
  }
}
Query example:
# Retrieve recent turns from specific conversation
# Chroma expects a single top-level key in where, so combine fields with $and
results = rag_store.col.query(
    query_texts=["conversation context"],
    where={"$and": [{"type": "turn"}, {"conversation_id": "conv-abc123"}]},
    n_results=10,
)
Summary Documents
Purpose: Compressed summaries of conversation turns for efficient context.
When used: Generated periodically (every N turns) to maintain context efficiency.
Metadata fields:
- type:
  - Fixed value: summary
- conversation_id:
  - Type: string
  - Required: Yes
  - Format: conv-<uuid>
  - Description: Conversation this summary belongs to
- turn_range:
  - Type: string (optional)
  - Description: Range of turns covered by this summary
  - Format: <start>-<end>
  - Examples: 1-8, 9-16
- summary_index:
  - Type: integer (optional)
  - Description: Sequential summary number within conversation
  - Examples: 1, 2, 3
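Because turn_range is stored as a string, code that needs to know which summary covers a given turn has to parse it first. The sketch below shows one way to do that; `parse_turn_range` and `covers_turn` are hypothetical helpers, and the assumption that both ends of the range are inclusive is inferred from the examples (1-8 followed by 9-16).

```python
def parse_turn_range(turn_range: str) -> tuple[int, int]:
    """Parse "<start>-<end>" into a (start, end) pair of turn indexes."""
    start_text, sep, end_text = turn_range.partition("-")
    if not sep:
        raise ValueError(f"expected '<start>-<end>', got {turn_range!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid turn range: {turn_range!r}")
    return start, end


def covers_turn(turn_range: str, turn_index: int) -> bool:
    """Return True if turn_index falls inside the range (both ends inclusive, an assumption)."""
    start, end = parse_turn_range(turn_range)
    return start <= turn_index <= end
```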
Example document:
{
  "id": "doc-sum-001",
  "document": "Discussed web search configuration. User wants to set up SearxNG...",
  "metadata": {
    "type": "summary",
    "timestamp": "2025-12-16T06:08:00+00:00",
    "source": "system",
    "scope": "global",
    "conversation_id": "conv-abc123",
    "turn_range": "1-8",
    "summary_index": 1
  }
}
Query example:
# Retrieve conversation summaries
results = rag_store.col.query(
    query_texts=["web search configuration"],
    where={"type": "summary"},
    n_results=3,
)
Retrieval Strategies
Semantic Similarity
All documents are retrieved via semantic similarity using the nomic-embed-text embedding model (768 dimensions).
# Basic semantic query
results = rag_store.col.query(
    query_texts=["How do I use specialists?"],
    n_results=5,
)
Metadata Filtering
Chroma supports filtering by metadata fields using where clauses:
# Filter by type
where = {"type": "faq"}
# Filter by multiple fields (Chroma expects one top-level key, so use $and)
where = {"$and": [{"type": "memory"}, {"importance": {"$gte": 4}}]}
# Filter by scope
where = {"scope": "entity:project-alpha"}
# Complex filters
where = {
    "$and": [
        {"type": "memory"},
        {"importance": {"$gte": 4}},
        {"tags": {"$contains": "critical"}},
    ]
}
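Filters like these are often assembled programmatically. Here is a minimal sketch of a helper that does so, assuming the installed Chroma version accepts only one top-level key in a where clause and therefore needs `$and` whenever several fields are combined. `build_where` is a hypothetical name, not part of Chroma or Ada.

```python
from typing import Any, Optional


def build_where(*conditions: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Combine single-field conditions into one Chroma-style where clause.

    Returns None when no conditions are given, the condition itself when
    there is exactly one, and an "$and" wrapper otherwise.
    """
    if not conditions:
        return None
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": list(conditions)}


# Example: important memories only
where = build_where({"type": "memory"}, {"importance": {"$gte": 4}})
```

Returning None for the empty case lets the result be passed straight through as an optional filter.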
Importance-Weighted Retrieval
For memory documents, retrieval can be weighted by importance and recency:
# Memory retrieval with importance weighting
memories = rag_store.retrieve_memories(
    query="project status",
    k=5,
    entity="project-alpha",  # Optional entity scoping
)
# Scoring formula:
# score = (similarity * 0.5) + (importance * 0.5) + (recency_bonus)
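To make the formula concrete, the sketch below computes a score for a single memory. It is not Ada's implementation: the source does not say how importance is scaled or how the recency bonus is derived, so the 1-5 to 0-1 normalization, the 30-day half-life, and the 0.1 maximum bonus are all assumptions chosen for illustration, and both function names are hypothetical.

```python
import math
from datetime import datetime, timezone


def recency_bonus(
    timestamp: str,
    now: datetime,
    half_life_days: float = 30.0,  # assumption: bonus halves every 30 days
    max_bonus: float = 0.1,        # assumption: bonus for a brand-new memory
) -> float:
    """Hypothetical exponentially decaying bonus based on the memory's age."""
    created = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    age_days = max((now - created).total_seconds() / 86400.0, 0.0)
    return max_bonus * math.pow(0.5, age_days / half_life_days)


def memory_score(similarity: float, importance: int, timestamp: str, now: datetime) -> float:
    """score = similarity * 0.5 + importance * 0.5 + recency_bonus.

    Importance is first mapped from 1..5 onto 0..1 (an assumption) so that it
    is on the same scale as a 0..1 similarity value.
    """
    importance_norm = (importance - 1) / 4.0
    return similarity * 0.5 + importance_norm * 0.5 + recency_bonus(timestamp, now)


now = datetime(2025, 12, 16, 6, 0, 0, tzinfo=timezone.utc)
score = memory_score(0.8, 4, "2025-12-16T06:00:00+00:00", now)
```

Timestamps are expected to carry a UTC offset, as in the examples on this page; a naive timestamp would fail when subtracted from the timezone-aware `now`.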
Recency-Based Ranking
Turn and summary documents use timestamp-based ranking for temporal awareness:
# Retrieve turns for one conversation. Chroma orders query results by
# similarity, so sort on the timestamp metadata afterwards if needed.
recent_turns = rag_store.col.query(
    query_texts=["context"],
    where={
        "$and": [
            {"type": "turn"},
            {"conversation_id": conversation_id},
        ]
    },
    n_results=RAG_TURN_TOP_K,
)
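If the raw query results need to be put into chronological order on the client side, something like the following sketch could be used. It assumes the result has the usual Chroma query shape, where ids, documents, and metadatas each hold one inner list per query text; `sort_by_timestamp` is a hypothetical helper.

```python
from datetime import datetime


def _parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing "Z" for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sort_by_timestamp(result: dict, newest_first: bool = True) -> list[dict]:
    """Flatten the hits for the first query text and order them by metadata["timestamp"]."""
    hits = [
        {"id": doc_id, "document": text, "metadata": meta}
        for doc_id, text, meta in zip(
            result["ids"][0], result["documents"][0], result["metadatas"][0]
        )
    ]
    return sorted(
        hits,
        key=lambda hit: _parse_ts(hit["metadata"]["timestamp"]),
        reverse=newest_first,
    )
```

Parsing the timestamps instead of comparing strings keeps the ordering correct when entries mix the `Z` and `+00:00` notations.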
Usage Examples
Creating Documents
Persona:
from brain.rag_store import RagStore
rag_store = RagStore()
rag_store.upsert_doc(
    text="You are Ada, a helpful assistant for luna.",
    type="persona",
    scope="global",
    timestamp="2025-12-16T06:00:00+00:00",
    source="kb",
    version="2025-12-16T06:00:00+00:00",
)
FAQ:
rag_store.upsert_doc(
    text="Q: How do I use web search?\nA: Use SPECIALIST_REQUEST[web_search:...]",
    type="faq",
    scope="global",
    timestamp="2025-12-16T06:00:00+00:00",
    source="system",
    topic="specialists",
    _specialist_name="web_search",
)
Memory:
rag_store.upsert_memory(
    text="luna prefers Python over JavaScript",
    scope="global",
    importance=4,
    tags=["preferences", "programming"],
    timestamp="2025-12-16T06:00:00+00:00",
    source="chat",
)
Turn:
conversation_id = "conv-abc123"
rag_store.upsert_turn(
    conversation_id=conversation_id,
    user_text="How do I configure web search?",
    assistant_text="Set SEARXNG_URL in .env...",
    user_ts="2025-12-16T06:00:00+00:00",
    assistant_ts="2025-12-16T06:00:01+00:00",
    source="chat",
)
Querying Documents
Get persona context:
persona_docs = rag_store.col.query(
    query_texts=["What is my tone?"],
    where={"type": "persona"},
    n_results=2,
)
Search FAQs:
# Chroma expects a single top-level key in where, so combine fields with $and
faq_results = rag_store.col.query(
    query_texts=["specialist documentation"],
    where={"$and": [{"type": "faq"}, {"topic": "specialists"}]},
    n_results=3,
)
Retrieve memories:
memories = rag_store.retrieve_memories(
    query="user preferences",
    k=5,
    entity=None,  # Global scope
)
Get conversation history:
# Chroma expects a single top-level key in where, so combine fields with $and
turns = rag_store.col.query(
    query_texts=["recent discussion"],
    where={"$and": [{"type": "turn"}, {"conversation_id": "conv-abc123"}]},
    n_results=10,
)
Validating Metadata
Use Pydantic schemas for validation:
from brain.schemas import validate_metadata
# Valid metadata
meta = {
    "type": "memory",
    "timestamp": "2025-12-16T06:00:00+00:00",
    "source": "chat",
    "scope": "global",
    "importance": 4,
    "tags": ["test"],
}
validated = validate_metadata("memory", meta)
# Invalid metadata (raises ValidationError)
bad_meta = {
    "type": "memory",
    "importance": 10,  # Out of range (1-5)
}
validate_metadata("memory", bad_meta)
# Raises: ValidationError: importance must be between 1 and 5
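For scripts that cannot import `brain.schemas`, a dependency-free pre-check can catch the most common mistakes before data reaches Ada. This is only a sketch of the kind of rules described on this page (type, timestamp format, importance range); it does not reproduce the Pydantic models, and `memory_metadata_problems` is a hypothetical name.

```python
from datetime import datetime


def memory_metadata_problems(meta: dict) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []

    if meta.get("type") != "memory":
        problems.append("type must be 'memory'")

    # The timestamp must parse as ISO 8601; a missing value fails the same way.
    try:
        datetime.fromisoformat(str(meta.get("timestamp", "")).replace("Z", "+00:00"))
    except ValueError:
        problems.append("timestamp must be an ISO 8601 string")

    # bool is a subclass of int in Python, so exclude it explicitly.
    importance = meta.get("importance")
    if (
        not isinstance(importance, int)
        or isinstance(importance, bool)
        or not 1 <= importance <= 5
    ):
        problems.append("importance must be an integer between 1 and 5")

    return problems
```

Collecting every problem in a list, instead of raising on the first one, makes it easier to report all issues with a document at once.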
Configuration
RAG-related configuration options (see Configuration Reference for full reference):
- RAG_ENABLED: Master toggle for RAG system (default: true)
- RAG_ENABLE_PERSONA: Enable persona document retrieval (default: true)
- RAG_ENABLE_FAQ: Enable FAQ document retrieval (default: true)
- RAG_ENABLE_MEMORY: Enable memory document retrieval (default: true)
- RAG_ENABLE_TURN: Enable turn document retrieval (default: true)
- RAG_ENABLE_SUMMARY: Enable summary document retrieval (default: true)
- RAG_TURN_TOP_K: Number of turns to retrieve (default: 4, range: 1-20)
- RAG_FAQ_TOP_K: Number of FAQs to retrieve (default: 2, range: 1-10)
- RAG_MEMORY_TOP_K: Number of memories to retrieve (default: 3, range: 1-20)
- RAG_SUMMARY_TOP_K: Number of summaries to retrieve (default: 2, range: 1-10)
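The options above could be read from the environment roughly as follows. This is a sketch, not Ada's loader: the variable names, defaults, and ranges come from the list above, but the helper names, the accepted spellings of "true", and the choice to clamp out-of-range values (rather than reject them) are assumptions.

```python
import os
from typing import Mapping


def env_bool(env: Mapping[str, str], name: str, default: bool = True) -> bool:
    """Read a boolean flag; unset variables fall back to the default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(env: Mapping[str, str], name: str, default: int, low: int, high: int) -> int:
    """Read an integer and clamp it into [low, high] (clamping is an assumption)."""
    raw = env.get(name)
    value = default if raw is None else int(raw)
    return max(low, min(high, value))


def load_rag_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect a subset of the RAG settings documented above into a dict."""
    return {
        "enabled": env_bool(env, "RAG_ENABLED"),
        "enable_memory": env_bool(env, "RAG_ENABLE_MEMORY"),
        "turn_top_k": env_int(env, "RAG_TURN_TOP_K", 4, 1, 20),
        "faq_top_k": env_int(env, "RAG_FAQ_TOP_K", 2, 1, 10),
        "memory_top_k": env_int(env, "RAG_MEMORY_TOP_K", 3, 1, 20),
        "summary_top_k": env_int(env, "RAG_SUMMARY_TOP_K", 2, 1, 10),
    }
```

Passing the environment as a mapping keeps the loader easy to test: a plain dict can stand in for `os.environ`.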
See Configuration Reference for complete details on all RAG settings.
See Also
Configuration Reference - Environment variable configuration
API Usage Guide - API usage and endpoints
Memory - Memory management and consolidation
Architecture - System architecture overview
Testing Guide - Testing and validation