Haven Data Dictionary¶
Last Updated: January 2025
Status: Comprehensive Reference
Scope: Complete field definitions and mappings across all services
Table of Contents¶
- Overview
- Database Schema
- API Models
- Swift Data Structures
- Field Mappings Between Services
- Enrichment Metadata Structures
- Intent Signal Structures
- People Normalization Structures
Overview¶
This data dictionary provides comprehensive definitions for all data structures used throughout the Haven platform, including:
- Database Tables: Postgres schema with field definitions, constraints, and relationships
- API Models: Request/response structures for Gateway, Catalog, Search, and Worker services
- Swift Structures: Data models used in Haven.app and hostagent collectors
- Field Mappings: How fields transform as data flows between services
- Metadata Schemas: Enrichment, intent, and type-specific metadata structures
Data Flow Overview¶
Haven.app (Swift)
↓ CollectorDocument
↓ EnrichedDocument
↓ GatewaySubmissionClient
Gateway API (:8085)
↓ IngestRequestModel
↓ DocumentIngestRequest
Catalog API (:8081)
↓ documents table (with metadata.attachments)
↓ chunks table
↓ chunk_documents table
Worker Service
↓ EmbeddingSubmitRequest
↓ IntentSignalCreateRequest
Search Service (:8080)
↓ SearchDocument
↓ SearchChunk
Database Schema¶
Core Tables¶
documents¶
Primary table for all atomic units of information (messages, files, notes, reminders, etc.).
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
doc_id |
UUID | NO | Primary key, auto-generated | PRIMARY KEY |
external_id |
TEXT | NO | Source-specific identifier | UNIQUE(source_type, source_provider, source_account_id, external_id, version_number) |
source_type |
TEXT | NO | Source system type | CHECK: imessage, sms, email, email_local, localfs, gdrive, note, reminder, macos_reminders, calendar_event, contact |
source_provider |
TEXT | YES | Provider name (e.g., "apple_messages", "gmail") | |
source_account_id |
TEXT | YES | Stable account identifier for multi-account sources | |
version_number |
INTEGER | NO | Document version (increments on edits) | DEFAULT 1 |
previous_version_id |
UUID | YES | Reference to previous version | FK → documents(doc_id) |
is_active_version |
BOOLEAN | NO | True for current version | DEFAULT true |
superseded_at |
TIMESTAMPTZ | YES | When this version was superseded | |
superseded_by_id |
UUID | YES | Reference to newer version | FK → documents(doc_id) |
title |
TEXT | YES | Document title/name | |
text |
TEXT | NO | Full searchable text content | |
text_sha256 |
TEXT | NO | SHA256 hash of text for deduplication | |
mime_type |
TEXT | YES | MIME type of content | |
canonical_uri |
TEXT | YES | Canonical URL/path reference | |
content_timestamp |
TIMESTAMPTZ | NO | Primary timestamp (sent, created, modified, due) | |
content_timestamp_type |
TEXT | NO | Meaning of timestamp | CHECK: sent, received, modified, created, event_start, event_end, due, completed |
people |
JSONB | NO | Array of person identifiers | DEFAULT '[]'::jsonb |
thread_id |
UUID | YES | Reference to parent thread | FK → threads(thread_id) |
parent_doc_id |
UUID | YES | Reference to parent document | FK → documents(doc_id) |
source_doc_ids |
UUID[] | YES | Documents this came from | |
related_doc_ids |
UUID[] | YES | Related document references | |
has_attachments |
BOOLEAN | NO | True if document has attachments | DEFAULT false |
attachment_count |
INTEGER | NO | Number of attachments | DEFAULT 0 |
has_location |
BOOLEAN | NO | True if document has location data | DEFAULT false |
has_due_date |
BOOLEAN | NO | True if document has due date | DEFAULT false |
due_date |
TIMESTAMPTZ | YES | Due date for tasks/reminders | |
is_completed |
BOOLEAN | YES | Completion status for tasks | |
completed_at |
TIMESTAMPTZ | YES | When task was completed | |
metadata |
JSONB | NO | Type-specific structured metadata | DEFAULT '{}'::jsonb |
status |
TEXT | NO | Processing workflow status | CHECK: submitted, extracting, extracted, enriching, enriched, indexed, failed |
extraction_failed |
BOOLEAN | NO | True if text extraction failed | DEFAULT false |
enrichment_failed |
BOOLEAN | NO | True if enrichment failed | DEFAULT false |
error_details |
JSONB | YES | Error information if processing failed | |
intent |
JSONB | YES | Intent classification data | |
relevance_score |
FLOAT | YES | Relevance score (0.0-1.0) for noise filtering | |
intent_status |
TEXT | NO | Intent processing status | CHECK: pending, processing, processed, failed, skipped |
intent_processing_started_at |
TIMESTAMPTZ | YES | When intent processing started | |
intent_processing_completed_at |
TIMESTAMPTZ | YES | When intent processing completed | |
intent_processing_error |
TEXT | YES | Error message if intent processing failed | |
ingested_at |
TIMESTAMPTZ | NO | When document was ingested | DEFAULT NOW() |
created_at |
TIMESTAMPTZ | NO | Record creation timestamp | DEFAULT NOW() |
updated_at |
TIMESTAMPTZ | NO | Last update timestamp | DEFAULT NOW() |
Indexes:
- idx_documents_source_type: source_type
- idx_documents_external_id: external_id
- idx_documents_source_account_id: source_account_id (partial, WHERE source_account_id IS NOT NULL)
- idx_documents_active_version: is_active_version (partial, WHERE is_active_version = true)
- idx_documents_thread: thread_id (partial, WHERE thread_id IS NOT NULL)
- idx_documents_content_timestamp: content_timestamp DESC
- idx_documents_status: status
- idx_documents_people: people (GIN)
- idx_documents_metadata: metadata (GIN)
- idx_documents_text_search: to_tsvector('english', text) (GIN)
- idx_documents_intent: intent (GIN)
- idx_documents_relevance_score: relevance_score (partial, WHERE relevance_score IS NOT NULL)
threads¶
First-class entity for conversations, chat threads, email threads.
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
thread_id |
UUID | NO | Primary key | PRIMARY KEY |
external_id |
TEXT | NO | Source-specific thread identifier | UNIQUE(source_type, source_provider, source_account_id, external_id) |
source_type |
TEXT | NO | Source system type | CHECK: imessage, sms, email, slack, whatsapp, signal |
source_provider |
TEXT | YES | Provider name | |
source_account_id |
TEXT | YES | Stable account identifier for multi-account sources | |
title |
TEXT | YES | Thread title/name | |
participants |
JSONB | NO | Array of participant identifiers | DEFAULT '[]'::jsonb |
thread_type |
TEXT | YES | Thread type classification | |
is_group |
BOOLEAN | YES | True for group conversations | |
participant_count |
INTEGER | YES | Number of participants | |
metadata |
JSONB | NO | Thread-specific metadata | DEFAULT '{}'::jsonb |
first_message_at |
TIMESTAMPTZ | YES | Timestamp of first message | |
last_message_at |
TIMESTAMPTZ | YES | Timestamp of last message | |
created_at |
TIMESTAMPTZ | NO | Record creation timestamp | DEFAULT NOW() |
updated_at |
TIMESTAMPTZ | NO | Last update timestamp | DEFAULT NOW() |
Indexes:
- idx_threads_external_id: external_id
- idx_threads_source_type: source_type
- idx_threads_last_message: last_message_at DESC
- idx_threads_participants: participants (GIN)
- idx_threads_source_account_id: source_account_id (partial, WHERE source_account_id IS NOT NULL)
Note: Files and document_files tables have been removed. All attachment/file data is now stored in metadata.attachments within the documents table.
chunks¶
Text segments for semantic search with embeddings.
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
chunk_id |
UUID | NO | Primary key | PRIMARY KEY |
text |
TEXT | NO | Chunk text content | |
text_sha256 |
TEXT | NO | SHA256 hash of chunk text | |
source_ref |
JSONB | YES | Source reference metadata (e.g., text span info) | |
embedding_status |
TEXT | NO | Embedding processing status | CHECK: pending, processing, embedded, failed, DEFAULT 'pending' |
embedding_model |
TEXT | YES | Model used for embedding | |
embedding_vector |
VECTOR(1024) | YES | Embedding vector | |
created_at |
TIMESTAMPTZ | NO | Record creation timestamp | DEFAULT NOW() |
updated_at |
TIMESTAMPTZ | NO | Last update timestamp | DEFAULT NOW() |
Indexes:
- idx_chunks_embedding_status: embedding_status
- idx_chunks_text_search: to_tsvector('english', text) (GIN)
chunk_documents¶
Junction table linking chunks to documents (many-to-many).
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
chunk_id |
UUID | NO | Chunk reference | FK → chunks(chunk_id), PRIMARY KEY |
doc_id |
UUID | NO | Document reference | FK → documents(doc_id), PRIMARY KEY |
ordinal |
INTEGER | YES | Order within document | |
weight |
DECIMAL(3,2) | YES | Relevance weight (0.0-1.0) | CHECK: weight >= 0.0 AND weight <= 1.0 |
created_at |
TIMESTAMPTZ | NO | Record creation timestamp | DEFAULT NOW() |
Indexes:
- idx_chunk_documents_chunk: chunk_id
- idx_chunk_documents_doc: doc_id
ingest_submissions¶
Idempotency tracking for document ingestion.
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
submission_id |
UUID | NO | Primary key | PRIMARY KEY |
idempotency_key |
TEXT | NO | Unique idempotency key | UNIQUE |
source_type |
TEXT | NO | Source system type | |
source_id |
TEXT | NO | Source-specific identifier | |
content_sha256 |
TEXT | NO | SHA256 hash of content | |
status |
TEXT | NO | Submission status | CHECK: submitted, processing, cataloged, completed, failed, DEFAULT 'submitted' |
result_doc_id |
UUID | YES | Resulting document ID | FK → documents(doc_id) |
batch_id |
UUID | YES | Batch reference | FK → ingest_batches(batch_id) |
error_details |
JSONB | YES | Error information if failed | |
created_at |
TIMESTAMPTZ | NO | Record creation timestamp | DEFAULT NOW() |
updated_at |
TIMESTAMPTZ | NO | Last update timestamp | DEFAULT NOW() |
Indexes:
- idx_ingest_submissions_status: status
- idx_ingest_submissions_source: source_type, source_id
- idx_ingest_submissions_batch_id: batch_id
ingest_batches¶
Batch tracking for bulk ingestion operations.
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
batch_id |
UUID | NO | Primary key | PRIMARY KEY |
idempotency_key |
TEXT | NO | Unique batch idempotency key | UNIQUE |
status |
TEXT | NO | Batch status | CHECK: submitted, processing, completed, partial, failed, DEFAULT 'submitted' |
total_count |
INTEGER | NO | Total items in batch | DEFAULT 0 |
success_count |
INTEGER | NO | Successful items | DEFAULT 0 |
failure_count |
INTEGER | NO | Failed items | DEFAULT 0 |
error_details |
JSONB | YES | Batch-level error information | |
created_at |
TIMESTAMPTZ | NO | Record creation timestamp | DEFAULT NOW() |
updated_at |
TIMESTAMPTZ | NO | Last update timestamp | DEFAULT NOW() |
Indexes:
- idx_ingest_batches_status: status
- idx_ingest_batches_created_at: created_at DESC
People Normalization Tables¶
people¶
Canonical person records.
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
person_id |
UUID | NO | Primary key | PRIMARY KEY |
display_name |
TEXT | NO | Display name | |
given_name |
TEXT | YES | Given/first name | |
family_name |
TEXT | YES | Family/last name | |
organization |
TEXT | YES | Organization name | |
nicknames |
TEXT[] | YES | Array of nicknames | DEFAULT '{}' |
notes |
TEXT | YES | Notes about person | |
photo_hash |
TEXT | YES | Hash of photo | |
source |
TEXT | NO | Source system | |
version |
INTEGER | NO | Version number | DEFAULT 1 |
deleted |
BOOLEAN | NO | Deletion flag | DEFAULT false |
merged_into |
UUID | YES | Reference to merged person | FK → people(person_id) |
created_at |
TIMESTAMPTZ | NO | Record creation timestamp | DEFAULT NOW() |
updated_at |
TIMESTAMPTZ | NO | Last update timestamp | DEFAULT NOW() |
person_identifiers¶
Normalized phone/email identifiers for people.
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
person_id |
UUID | NO | Person reference | FK → people(person_id), PRIMARY KEY |
kind |
identifier_kind | NO | Identifier type | PRIMARY KEY |
value_raw |
TEXT | NO | Raw identifier value | PRIMARY KEY |
value_canonical |
TEXT | NO | Canonical (normalized) value | PRIMARY KEY |
label |
TEXT | YES | Label (home, work, mobile) | |
priority |
INTEGER | NO | Priority/order | DEFAULT 100 |
verified |
BOOLEAN | NO | Verification status | DEFAULT true |
Enum: identifier_kind: phone, email, imessage, shortcode, social
people_source_map¶
Maps external contact IDs to person_id.
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
source |
TEXT | NO | Source system | PRIMARY KEY |
external_id |
TEXT | NO | External identifier | PRIMARY KEY |
person_id |
UUID | NO | Person reference | FK → people(person_id) |
Intent Signals Tables¶
intent_signals¶
Intent signals extracted from documents.
| Field | Type | Nullable | Description | Constraints |
|---|---|---|---|---|
signal_id |
UUID | NO | Primary key | PRIMARY KEY |
artifact_id |
UUID | NO | Document reference | FK → documents(doc_id) |
taxonomy_version |
VARCHAR(50) | NO | Intent taxonomy version | |
parent_thread_id |
UUID | YES | Parent thread reference | FK → threads(thread_id) |
signal_data |
JSONB | NO | IntentSignalData structure | |
status |
VARCHAR(20) | NO | User feedback status | CHECK: pending, confirmed, edited, rejected, snoozed, DEFAULT 'pending' |
user_feedback |
JSONB | YES | User feedback data | |
conflict |
BOOLEAN | NO | Conflict flag | DEFAULT FALSE |
conflicting_fields |
TEXT[] | YES | Conflicting field names | |
created_at |
TIMESTAMPTZ | NO | Record creation timestamp | DEFAULT NOW() |
updated_at |
TIMESTAMPTZ | NO | Last update timestamp | DEFAULT NOW() |
Indexes:
- idx_intent_signals_artifact: artifact_id
- idx_intent_signals_thread: parent_thread_id (partial, WHERE parent_thread_id IS NOT NULL)
- idx_intent_signals_status: status
- idx_intent_signals_data: signal_data (GIN)
API Models¶
Envelope v2 Models (Preferred)¶
Standardized transport wrapper and payloads used across Haven.app → Gateway → Catalog.
Envelope¶
| Field | Type | Required | Description |
|---|---|---|---|
schema_version |
string | YES | Semantic version of envelope + payload schema (e.g., "2.0") |
kind |
string | YES | "document" or "person" |
source |
object | YES | { source_type, source_provider, source_account_id } |
payload |
object | YES | Document or Person payload (see below) |
Document (payload)¶
Transport representation of a documents row plus structured metadata.
- Top-level fields:
external_id,version_number,title,text,text_sha256,mime_type,canonical_uri,content_timestamp,content_timestamp_type,people[],thread,relationships,facets,metadata,intent. - Mappings match the Database Schema section;
metadata.attachmentsis the sole owner of OCR/caption/vision/EXIF.
Person (payload)¶
Standardized person/contact object for people normalization.
- Core fields:
external_id,display_name,given_name,family_name,organization,nicknames[],notes,photo_hash,change_token,version,deleted,identifiers[]. - Identifiers align with
person_identifiersschema:{ kind, value_raw, value_canonical, label, priority, verified }.
Endpoints (Gateway)¶
POST /v2/ingest/document— acceptsEnvelope(kind=document).POST /v2/ingest/person— acceptsEnvelope(kind=person).POST /v2/ingest:batch— array of envelopes.
Gateway validates envelopes, normalizes timestamps, computes idempotency, and forwards payloads unchanged to Catalog v2.
Gateway API Models¶
Note: This section documents legacy v1 request/response models. Prefer the v2 envelope models above for all new integrations.
IngestRequestModel¶
Request model for document ingestion via Gateway API.
| Field | Type | Required | Description |
|---|---|---|---|
source_type |
string | YES | Source system type |
source_id |
string | YES | Source-specific identifier |
source_provider |
string | NO | Provider name |
title |
string | NO | Document title |
canonical_uri |
string | NO | Canonical URL/path |
content |
IngestContentModel | YES | Content data |
metadata |
object | NO | Type-specific metadata |
content_timestamp |
datetime | NO | Primary timestamp |
content_timestamp_type |
string | NO | Timestamp type |
source_account_id |
string | NO | Account identifier for multi-account sources |
people |
DocumentPerson[] | NO | Person identifiers |
thread_id |
UUID | NO | Thread reference |
thread |
ThreadPayloadModel | NO | Thread payload |
IngestContentModel¶
Content data for ingestion.
| Field | Type | Required | Description |
|---|---|---|---|
mime_type |
string | NO | MIME type (default: "text/plain") |
data |
string | YES | Base64-encoded content |
encoding |
string | NO | Encoding type |
DocumentPerson¶
Person identifier in document.
| Field | Type | Required | Description |
|---|---|---|---|
identifier |
string | YES | Identifier value (phone, email, etc.) |
identifier_type |
string | NO | Type (phone, email, etc.) |
role |
string | NO | Role (sender, recipient, participant, mentioned) |
display_name |
string | NO | Display name |
metadata |
object | NO | Additional metadata |
ThreadPayloadModel¶
Thread payload for ingestion.
| Field | Type | Required | Description |
|---|---|---|---|
external_id |
string | YES | Thread external ID |
source_type |
string | NO | Source type |
source_provider |
string | NO | Provider name |
title |
string | NO | Thread title |
participants |
DocumentPerson[] | NO | Participant list |
thread_type |
string | NO | Thread type |
is_group |
boolean | NO | Group conversation flag |
participant_count |
integer | NO | Participant count |
metadata |
object | NO | Thread metadata |
first_message_at |
datetime | NO | First message timestamp |
last_message_at |
datetime | NO | Last message timestamp |
IngestSubmissionResponse¶
Response from document ingestion.
| Field | Type | Description |
|---|---|---|
submission_id |
UUID | Submission identifier |
doc_id |
UUID | Created document ID |
external_id |
string | Document external ID |
version_number |
integer | Document version |
status |
string | Submission status |
thread_id |
UUID | Thread ID (if created) |
file_ids |
UUID[] | Associated file IDs |
duplicate |
boolean | True if duplicate submission |
total_chunks |
integer | Number of chunks created |
Catalog API Models¶
DocumentIngestRequest¶
Request model for Catalog API document ingestion.
| Field | Type | Required | Description |
|---|---|---|---|
idempotency_key |
string | YES | Unique idempotency key |
source_type |
string | YES | Source system type |
source_provider |
string | NO | Provider name |
source_id |
string | YES | Source-specific identifier |
content_sha256 |
string | YES | SHA256 hash of content |
external_id |
string | NO | External identifier |
title |
string | NO | Document title |
text |
string | YES | Text content |
mime_type |
string | NO | MIME type |
canonical_uri |
string | NO | Canonical URI |
metadata |
object | NO | Type-specific metadata |
content_timestamp |
datetime | YES | Primary timestamp |
content_timestamp_type |
string | YES | Timestamp type |
source_account_id |
string | NO | Account identifier for multi-account sources |
people |
PersonPayload[] | NO | Person identifiers |
thread_id |
UUID | NO | Thread reference |
thread |
ThreadPayload | NO | Thread payload |
parent_doc_id |
UUID | NO | Parent document reference |
source_doc_ids |
UUID[] | NO | Source document IDs |
related_doc_ids |
UUID[] | NO | Related document IDs |
has_location |
boolean | NO | Location flag |
has_due_date |
boolean | NO | Due date flag |
due_date |
datetime | NO | Due date |
is_completed |
boolean | NO | Completion status |
completed_at |
datetime | NO | Completion timestamp |
attachments |
DocumentFileLink[] | NO | File attachments |
DocumentFileLink¶
File attachment link (stored in metadata.attachments).
| Field | Type | Required | Description |
|---|---|---|---|
index |
integer | YES | Attachment index |
kind |
string | YES | Kind (image, pdf, file, other) |
role |
string | YES | Role (attachment, inline, thumbnail, related) |
mime_type |
string | YES | MIME type |
size_bytes |
integer | NO | File size in bytes |
source_ref |
AttachmentSourceRef | NO | Source reference (path, message_attachment_id, page) |
ocr |
AttachmentOCR | NO | OCR results |
caption |
AttachmentCaption | NO | Caption results |
vision |
AttachmentVision | NO | Vision results (faces, objects, scene) |
exif |
AttachmentEXIF | NO | EXIF metadata |
Note: FileDescriptor model is deprecated. Attachments are now stored directly in metadata.attachments with full enrichment data (OCR, caption, vision, exif).
| Field | Type | Required | Description |
|---|---|---|---|
content_sha256 |
string | YES | SHA256 hash |
object_key |
string | YES | Storage object key |
storage_backend |
string | NO | Storage backend |
filename |
string | NO | Filename |
mime_type |
string | NO | MIME type |
size_bytes |
integer | NO | File size |
enrichment_status |
string | NO | Enrichment status |
enrichment |
object | NO | Enrichment data |
DocumentIngestResponse¶
Response from Catalog API document ingestion.
| Field | Type | Description |
|---|---|---|
submission_id |
UUID | Submission identifier |
doc_id |
UUID | Created document ID |
external_id |
string | Document external ID |
version_number |
integer | Document version |
thread_id |
UUID | Thread ID (if created) |
file_ids |
UUID[] | Associated file IDs |
status |
string | Document status |
duplicate |
boolean | True if duplicate |
Search Service Models¶
SearchDocument¶
Document model for search service.
| Field | Type | Description |
|---|---|---|
doc_id |
UUID | Document ID |
external_id |
string | External ID |
source_type |
string | Source type |
source_provider |
string | Provider name |
title |
string | Document title |
canonical_uri |
string | Canonical URI |
mime_type |
string | MIME type |
content_timestamp |
datetime | Primary timestamp |
content_timestamp_type |
string | Timestamp type |
people |
SearchPerson[] | Person identifiers |
has_attachments |
boolean | Has attachments flag |
attachment_count |
integer | Attachment count |
has_location |
boolean | Has location flag |
has_due_date |
boolean | Has due date flag |
due_date |
datetime | Due date |
is_completed |
boolean | Completion status |
metadata |
object | Type-specific metadata |
thread_id |
UUID | Thread reference |
SearchChunk¶
Chunk model for search service.
| Field | Type | Description |
|---|---|---|
chunk_id |
UUID | Chunk ID |
text |
string | Chunk text |
ordinal |
integer | Order index |
Swift Data Structures¶
CollectorDocument¶
Base document structure from collectors.
| Field | Type | Description |
|---|---|---|
content |
String | Markdown text extracted from source |
sourceType |
String | Source system type |
externalId |
String | External identifier |
metadata |
DocumentMetadata | Document metadata |
images |
[ImageAttachment] | Array of extracted images (metadata only, files not retained) |
contentType |
DocumentContentType | Content type enum |
title |
String? | Document title |
canonicalUri |
String? | Canonical URI |
DocumentMetadata¶
Document metadata structure.
| Field | Type | Description |
|---|---|---|
contentHash |
String | Content hash |
mimeType |
String | MIME type |
timestamp |
Date? | Primary timestamp |
timestampType |
String? | Timestamp type |
createdAt |
Date? | Creation timestamp |
modifiedAt |
Date? | Modification timestamp |
additionalMetadata |
[String: String] | Additional metadata |
EnrichedDocument¶
Enriched document with progressive enhancements.
| Field | Type | Description |
|---|---|---|
base |
CollectorDocument | Base document |
documentEnrichment |
DocumentEnrichment? | Enrichment for primary document |
imageEnrichments |
[ImageEnrichment] | One per image, parallel to base.images array |
DocumentEnrichment¶
Enrichment for the primary document (text content).
| Field | Type | Description |
|---|---|---|
entities |
[Entity]? | Entities extracted from text + OCR text from all images |
enrichmentTimestamp |
Date | When enrichment was performed |
ImageEnrichment¶
Enrichment for a single image attachment.
| Field | Type | Description |
|---|---|---|
ocr |
OCRResult? | OCR results for this image |
faces |
FaceDetectionResult? | Face detection results for this image |
caption |
String? | Caption for this image (NOT enriched further) |
enrichmentTimestamp |
Date | When enrichment was performed |
EmailDocumentMetadata¶
Email-specific metadata structure.
| Field | Type | Description |
|---|---|---|
messageId |
String? | Email message ID |
subject |
String? | Email subject |
snippet |
String? | Email snippet |
listUnsubscribe |
String? | List unsubscribe header |
headers |
[String: String] | Email headers |
hasAttachments |
Bool | Has attachments flag |
attachmentCount |
Int | Attachment count |
contentHash |
String | Content hash |
references |
[String] | Email references |
inReplyTo |
String? | In-reply-to header |
intent |
EmailIntentPayload? | Intent classification |
relevanceScore |
Double? | Relevance score |
imageCaptions |
[String]? | Image captions |
bodyProcessed |
Bool? | Body processed flag |
enrichmentEntities |
[String: Any]? | Enrichment entities |
EmailIntentPayload¶
Email intent classification payload.
| Field | Type | Description |
|---|---|---|
primaryIntent |
String | Primary intent name |
confidence |
Double | Confidence score |
secondaryIntents |
[String] | Secondary intent names |
extractedEntities |
[String: String] | Extracted entities |
Field Mappings Between Services¶
Haven.app → Gateway API¶
| Haven.app Field | Gateway API Field | Transformation |
|---|---|---|
CollectorDocument.externalId |
IngestRequestModel.source_id |
Direct mapping |
CollectorDocument.sourceType |
IngestRequestModel.source_type |
Direct mapping |
CollectorDocument.content |
IngestRequestModel.content.data |
Base64 encoded |
CollectorDocument.metadata.mimeType |
IngestRequestModel.content.mime_type |
Direct mapping |
CollectorDocument.metadata.timestamp |
IngestRequestModel.content_timestamp |
ISO8601 format |
CollectorDocument.metadata.timestampType |
IngestRequestModel.content_timestamp_type |
Direct mapping |
CollectorDocument.metadata.timestamps.source_specific.* |
IngestRequestModel.metadata.timestamps.source_specific.* |
Timestamps stored in metadata.timestamps.source_specific |
CollectorDocument.title |
IngestRequestModel.title |
Direct mapping |
CollectorDocument.canonicalUri |
IngestRequestModel.canonical_uri |
Direct mapping |
EnrichedDocument.base.images |
Not sent | Image files NOT sent to gateway (business rule) |
EnrichedDocument.imageEnrichments |
IngestRequestModel.metadata.enrichment.images |
Embedded as metadata |
EnrichedDocument.documentEnrichment.entities |
IngestRequestModel.metadata.enrichment.entities |
Embedded as metadata |
Gateway API → Catalog API¶
| Gateway API Field | Catalog API Field | Transformation |
|---|---|---|
IngestRequestModel.source_id |
DocumentIngestRequest.source_id |
Direct mapping |
IngestRequestModel.source_type |
DocumentIngestRequest.source_type |
Direct mapping |
IngestRequestModel.content.data |
DocumentIngestRequest.text |
Base64 decoded |
IngestRequestModel.content.mime_type |
DocumentIngestRequest.mime_type |
Direct mapping |
IngestRequestModel.content_timestamp |
DocumentIngestRequest.content_timestamp |
Direct mapping |
IngestRequestModel.content_timestamp_type |
DocumentIngestRequest.content_timestamp_type |
Normalized to lowercase |
IngestRequestModel.people |
DocumentIngestRequest.people |
Direct mapping |
IngestRequestModel.thread |
DocumentIngestRequest.thread |
Direct mapping |
IngestRequestModel.metadata |
DocumentIngestRequest.metadata |
Direct mapping |
Generated idempotency_key |
DocumentIngestRequest.idempotency_key |
Generated from source_id + content_sha256 |
Computed content_sha256 |
DocumentIngestRequest.content_sha256 |
SHA256 of text |
Catalog API → Database¶
| Catalog API Field | Database Field | Transformation |
|---|---|---|
DocumentIngestRequest.source_id |
documents.external_id |
Direct mapping |
DocumentIngestRequest.source_type |
documents.source_type |
Direct mapping |
DocumentIngestRequest.text |
documents.text |
Direct mapping |
DocumentIngestRequest.content_sha256 |
documents.text_sha256 |
Direct mapping |
DocumentIngestRequest.content_timestamp |
documents.content_timestamp |
Direct mapping |
DocumentIngestRequest.content_timestamp_type |
documents.content_timestamp_type |
Normalized |
DocumentIngestRequest.people |
documents.people |
JSONB array |
DocumentIngestRequest.thread |
threads table |
Creates thread if not exists |
DocumentIngestRequest.thread.external_id |
threads.external_id |
Direct mapping |
DocumentIngestRequest.thread.source_account_id |
threads.source_account_id |
Direct mapping |
DocumentIngestRequest.source_account_id |
documents.source_account_id |
Direct mapping |
DocumentIngestRequest.attachments[] |
documents.metadata.attachments[] |
Stored in metadata.attachments array |
DocumentIngestRequest.attachments[].index |
metadata.attachments[].index |
Direct mapping |
DocumentIngestRequest.attachments[].kind |
metadata.attachments[].kind |
Direct mapping |
DocumentIngestRequest.attachments[].ocr |
metadata.attachments[].ocr |
Direct mapping |
DocumentIngestRequest.attachments[].caption |
metadata.attachments[].caption |
Direct mapping |
DocumentIngestRequest.attachments[].vision |
metadata.attachments[].vision |
Direct mapping |
DocumentIngestRequest.attachments[].exif |
metadata.attachments[].exif |
Direct mapping |
Database → Search Service¶
| Database Field | Search Service Field | Transformation |
|---|---|---|
documents.doc_id |
SearchDocument.doc_id |
Direct mapping |
documents.external_id |
SearchDocument.external_id |
Direct mapping |
documents.source_type |
SearchDocument.source_type |
Direct mapping |
documents.title |
SearchDocument.title |
Direct mapping |
documents.text |
SearchDocument.raw_text |
Direct mapping |
documents.content_timestamp |
SearchDocument.content_timestamp |
Direct mapping |
documents.people |
SearchDocument.people |
Converted to SearchPerson[] |
chunks.chunk_id |
SearchChunk.chunk_id |
Direct mapping |
chunks.text |
SearchChunk.text |
Direct mapping |
chunk_documents.ordinal |
Order within document | Ordinal stored in chunk_documents, not chunks |
Metadata Structure¶
The documents.metadata JSONB field has a fixed set of top-level keys to ensure consistent structure across all document types.
Top-Level Metadata Keys¶
{
"ingested_at": "<ISO-8601 UTC timestamp>",
"timestamps": { ... },
"attachments": [ ... ],
"source": { ... },
"type": { ... },
"enrichment": { ... },
"extraction": { ... }
}
| Key | Type | Description |
|---|---|---|
ingested_at |
string | ISO-8601 UTC timestamp when Catalog stored this document |
timestamps |
object | Timestamp structure (see Timestamps section) |
attachments |
array | Full description of attached images/files (see Attachments section) |
source |
object | Raw source-system oriented details |
type |
object | Normalized, type-specific semantics |
enrichment |
object | ML-derived enrichment over document text |
extraction |
object | Ingestion and parsing diagnostics |
metadata.timestamps¶
The timestamps structure mirrors the document-level content_timestamp and content_timestamp_type fields.
{
"timestamps": {
"primary": {
"value": "<ISO-8601 UTC timestamp>",
"type": "<enum string matching content_timestamp_type>"
},
"source_specific": {
"<source_field_name>": "<ISO-8601 or raw string>",
"...": "..."
}
}
}
Rules:
- primary.value must equal content_timestamp
- primary.type must equal content_timestamp_type
- source_specific keys are source-defined (e.g., sent_at, received_at, internaldate, header_date, fs_created, fs_modified, exif_taken_at)
metadata.attachments¶
Attachments represent any file/image that is part of the document. All OCR, caption, face detection, and EXIF information is stored here.
See the Attachments section for the full schema.
Note: There is no separate files table; all file-level enrichment is embedded in metadata.attachments.
metadata.source¶
Contains raw source-system details for debugging or advanced features.
Examples:
- iMessage: { "imessage": { "chat_guid": "...", "handle_id": 42, "service": "iMessage", "row_id": 123456 } }
- Email: { "email": { "folder": "INBOX", "uid": 12345, "raw_flags": [...], "header_map": {...} } }
metadata.type¶
Exposes normalized semantics by document kind.
Base structure:
{
"type": {
"kind": "email" // or imessage | sms | note | reminder | calendar_event | file | ...
}
}
Type-specific examples:
- Email: { "kind": "email", "email": { "subject": "...", "is_outbound": true, "in_reply_to_message_id": "..." } }
- iMessage: { "kind": "imessage", "imessage": { "direction": "outgoing", "is_group": true } }
- Reminder: { "kind": "reminder", "reminder": { "status": "open", "priority": 1, "due_date": "..." } }
metadata.enrichment¶
ML-derived document-level signals over text.
{
"enrichment": {
"entities": [
{ "text": "Acme HVAC", "type": "organization", "offset": 10, "length": 9 }
],
"classification": {
"categories": [
{ "label": "home_maintenance", "confidence": 0.92 }
]
}
}
}
metadata.extraction¶
Tracks how the document was ingested and processed (diagnostic information).
{
"extraction": {
"collector_name": "imessage",
"collector_version": "1.3.0",
"hostagent_modules": ["ocr", "entities", "faces"],
"warnings": [
{ "code": "ATTACHMENT_OCR_FAILED", "attachment_index": 2 }
]
}
}
Image Token → Slug Flow¶
Haven uses a token-based approach to preserve positional context for images in text content:
Token Insertion (Collection Phase)¶
During document collection, collectors insert {IMG:<id>} tokens where images appear in text:
- iMessage: Replaces
\u{FFFC}(object replacement characters) with tokens using file path as id - Embedded images: Uses MD5 hash of image data as id
- File images: Uses canonical file path as id
Slug Replacement (Enrichment Phase)¶
After enrichment, EnrichmentMerger replaces tokens with image slugs:
{IMG:abc123} → [Image: <caption> | <filename or hash>]
Slug Format:
- caption: Image caption text, or "No caption" if captioning failed
- filename or hash: Image filename/path for files, MD5 hash for embedded images
Example:
Input: "Check this out: {IMG:sunset.jpg}"
Output: "Check this out: [Image: Beautiful sunset over ocean | sunset.jpg]"
This ensures exactly one slug per image at the correct positional context.
Enrichment Metadata Structures¶
Document Metadata Enrichment¶
Stored in documents.metadata.enrichment JSONB field.
{
"enrichment": {
"entities": [
{
"type": "PERSON",
"text": "John Doe",
"start": 0,
"end": 8,
"confidence": 0.95
}
],
// Image slugs are now embedded directly in text content, not stored separately
"images": [
{
"filename": "photo.jpg",
"caption": "A group of people standing in front of a building",
"ocr": "Extracted text from image",
"faces": [
{
"x": 0.1,
"y": 0.2,
"w": 0.15,
"h": 0.2,
"confidence": 0.92
}
]
}
]
}
}
Field Definitions:
| Field | Type | Description |
|---|---|---|
enrichment.entities |
array | Named entity recognition results |
enrichment.entities[].type |
string | Entity type (PERSON, ORGANIZATION, LOCATION, DATE, etc.) |
enrichment.entities[].text |
string | Entity text |
enrichment.entities[].start |
integer | Start offset in text |
enrichment.entities[].end |
integer | End offset in text |
enrichment.entities[].confidence |
float | Confidence score (0.0-1.0) |
enrichment.captions |
array | DEPRECATED: Image slugs now embedded directly in text content |
enrichment.images |
array | Image enrichment metadata |
enrichment.images[].filename |
string | Image filename |
enrichment.images[].caption |
string | Image caption |
enrichment.images[].ocr |
string | OCR extracted text |
enrichment.images[].faces |
array | Face detection results |
enrichment.images[].faces[].x |
float | Face bounding box x (normalized 0-1) |
enrichment.images[].faces[].y |
float | Face bounding box y (normalized 0-1) |
enrichment.images[].faces[].w |
float | Face bounding box width (normalized 0-1) |
enrichment.images[].faces[].h |
float | Face bounding box height (normalized 0-1) |
enrichment.images[].faces[].confidence |
float | Face detection confidence (0.0-1.0) |
File Enrichment¶
Stored in documents.metadata.attachments[].ocr, metadata.attachments[].caption, metadata.attachments[].vision, metadata.attachments[].exif JSONB fields.
{
"ocr": {
"text": "Extracted text from image",
"confidence": 0.95,
"language": "en",
"boxes": [
{
"text": "specific text region",
"x": 0.005,
"y": 0.861,
"w": 0.854,
"h": 0.065,
"confidence": 0.97
}
],
"entities": {
"dates": ["2024-10-15"],
"phone_numbers": ["+15551234567"],
"emails": ["test@example.com"],
"urls": ["https://example.com"],
"addresses": ["123 Main St, San Francisco"]
}
},
"caption": {
"text": "A group of people standing in front of a building",
"model": "llava:13b",
"confidence": 0.85,
"generated_at": "2025-10-08T19:25:30.000Z"
},
"vision": {
"faces": [
{
"x": 0.1,
"y": 0.2,
"w": 0.15,
"h": 0.2,
"confidence": 0.92
}
],
"objects": [
{
"label": "person",
"confidence": 0.95,
"count": 3
}
],
"scene": "outdoor",
"colors": {
"dominant": ["#F0F0F0", "#333333"]
}
},
"exif": {
"camera": "iPhone 14 Pro",
"taken_at": "2023-03-27T23:45:00.000Z",
"location": {
"latitude": 37.7749,
"longitude": -122.4194
},
"width": 1920,
"height": 1080
}
}
Field Definitions:
| Field | Type | Description |
|---|---|---|
ocr.text |
string | Full OCR extracted text |
ocr.confidence |
float | Overall OCR confidence (0.0-1.0) |
ocr.language |
string | Detected language code |
ocr.boxes |
array | Text region bounding boxes |
ocr.boxes[].text |
string | Text in this region |
ocr.boxes[].x |
float | X coordinate (normalized 0-1) |
ocr.boxes[].y |
float | Y coordinate (normalized 0-1) |
ocr.boxes[].w |
float | Width (normalized 0-1) |
ocr.boxes[].h |
float | Height (normalized 0-1) |
ocr.boxes[].confidence |
float | Region confidence (0.0-1.0) |
ocr.entities |
object | Extracted entities from OCR text |
caption.text |
string | Image caption |
caption.model |
string | Model used for captioning |
caption.confidence |
float | Caption confidence (0.0-1.0) |
caption.generated_at |
string | ISO8601 timestamp |
vision.faces |
array | Face detection results |
vision.objects |
array | Object detection results |
vision.scene |
string | Scene classification |
vision.colors.dominant |
array | Dominant color hex codes |
exif.camera |
string | Camera model |
exif.taken_at |
string | ISO8601 timestamp |
exif.location.latitude |
float | GPS latitude |
exif.location.longitude |
float | GPS longitude |
exif.width |
integer | Image width in pixels |
exif.height |
integer | Image height in pixels |
Type-Specific Metadata¶
iMessage/SMS Metadata¶
Stored in documents.metadata JSONB field.
{
"source": "imessage",
"message": {
"guid": "6F15DDA0-9D80-4872-9B99-51D509AD24BE",
"sent_at": "2020-09-10T22:42:46.195666+00:00",
"received_at": "2020-09-10T22:42:46.195666+00:00",
"sender": "+14197330824",
"is_from_me": false,
"service": "iMessage",
"direction": "received",
"read": true,
"row_id": 124398
},
"attachments": {
"known": [
{
"attachment_index": 0,
"filename": "IMG_2930.png",
"mime_type": "image/png",
"size_bytes": 359915,
"file_id": "uuid-of-file",
"enriched": true
}
]
},
"reply_to_guid": "parent-message-guid",
"thread_originator_guid": "thread-start-guid",
"associated_message_guid": "sticker-parent-guid",
"associated_message_type": 1000,
"is_audio_message": false,
"expire_state": null,
"expressive_send_style_id": null,
"extraction": {
"status": "ready",
"method": "direct_text"
},
"ingested_at": "2025-10-15T16:13:39.525813+00:00"
}
Field Definitions:
| Field | Type | Description |
|---|---|---|
message.guid |
string | Source-specific message ID (matches external_id suffix) |
message.row_id |
integer | Original database row ID for lookups |
message.sent_at |
string | ISO8601 sent timestamp |
message.received_at |
string | ISO8601 received timestamp |
message.sender |
string | Sender identifier (phone/email) |
message.is_from_me |
boolean | True if sent by user |
message.service |
string | Service type (iMessage, SMS) |
message.direction |
string | Message direction (sent, received) |
message.read |
boolean | Read status |
reply_to_guid |
string | Parent message GUID for thread reconstruction |
associated_message_type |
integer | 1000 = sticker, 2000 = reaction |
associated_message_guid |
string | Parent message for stickers/reactions |
is_audio_message |
boolean | True for voice messages |
expire_state |
integer | Voice message expiration (1=expired, 3=saved) |
expressive_send_style_id |
integer | Message effects ID (Bloom, Echo, Confetti, etc.) |
Email Metadata¶
Stored in documents.metadata JSONB field.
{
"source": "email_local",
"message_id": "message-id@example.com",
"subject": "Email Subject",
"snippet": "Email snippet text",
"list_unsubscribe": "mailto:unsubscribe@example.com",
"headers": {
"From": "sender@example.com",
"To": "recipient@example.com",
"Date": "Mon, 1 Jan 2024 12:00:00 +0000"
},
"has_attachments": true,
"attachment_count": 2,
"content_hash": "sha256-hash",
"references": ["ref1@example.com", "ref2@example.com"],
"in_reply_to": "parent@example.com",
"intent": {
"primary_intent": "bill",
"confidence": 0.85,
"secondary_intents": ["receipt"],
"extracted_entities": {
"amount": "$123.45",
"merchant": "Example Store"
}
},
"relevance_score": 0.92,
"image_captions": ["Caption 1", "Caption 2"],
"body_processed": true,
"enrichment_entities": {
"PERSON": ["John Doe"],
"ORGANIZATION": ["Example Corp"]
}
}
Field Definitions:
| Field | Type | Description |
|---|---|---|
message_id |
string | Email message ID header |
subject |
string | Email subject line |
snippet |
string | Email snippet/preview |
list_unsubscribe |
string | List unsubscribe header |
headers |
object | Email headers (key-value pairs) |
has_attachments |
boolean | Has attachments flag |
attachment_count |
integer | Number of attachments |
content_hash |
string | Content hash |
references |
array | Email references header |
in_reply_to |
string | In-reply-to header |
intent.primary_intent |
string | Primary intent (bill, receipt, appointment, etc.) |
intent.confidence |
float | Intent confidence (0.0-1.0) |
intent.secondary_intents |
array | Secondary intent names |
intent.extracted_entities |
object | Extracted entities from intent processing |
relevance_score |
float | Relevance score (0.0-1.0) |
image_captions |
array | Image captions |
body_processed |
boolean | Body processed flag |
enrichment_entities |
object | NER entities by type |
Intent Signal Structures¶
IntentSignalData¶
Complete intent signal schema stored in intent_signals.signal_data JSONB field.
{
"signal_id": "uuid",
"artifact_id": "doc-uuid",
"taxonomy_version": "1.0.0",
"intents": [
{
"name": "bill",
"confidence": 0.92,
"slots": {
"amount": "$123.45",
"merchant": "Example Store",
"due_date": "2024-12-31"
},
"missing_slots": ["account_number"],
"follow_up_needed": true,
"follow_up_reason": "Missing account number",
"evidence": {
"text_spans": [
{
"start_offset": 0,
"end_offset": 50,
"preview": "Your bill for $123.45 is due..."
}
],
"layout_refs": [
{
"attachment_id": "file-uuid",
"page": 1,
"block_id": "block-1",
"line_id": "line-1"
}
],
"entity_refs": [
{
"type": "MONEY",
"index": 0
}
]
}
}
],
"global_confidence": 0.92,
"processing_notes": ["NER completed", "Slot filling completed"],
"processing_timestamps": {
"ner_started_at": "2024-01-01T12:00:00Z",
"ner_completed_at": "2024-01-01T12:00:05Z",
"received_at": "2024-01-01T12:00:00Z",
"intent_started_at": "2024-01-01T12:00:05Z",
"intent_completed_at": "2024-01-01T12:00:10Z",
"emitted_at": "2024-01-01T12:00:10Z"
},
"provenance": {
"ner_version": "1.0.0",
"ner_framework": "spacy",
"classifier_version": "1.0.0",
"slot_filler_version": "1.0.0",
"config_snapshot_id": "config-123",
"processing_location": "server"
},
"parent_thread_id": "thread-uuid",
"conflict": false,
"conflicting_fields": []
}
Field Definitions:
| Field | Type | Description |
|---|---|---|
signal_id |
string | Unique signal identifier |
artifact_id |
string | Document ID that generated this signal |
taxonomy_version |
string | Intent taxonomy version |
intents |
array | List of detected intents |
intents[].name |
string | Intent name (bill, receipt, appointment, etc.) |
intents[].confidence |
float | Intent confidence (0.0-1.0) |
intents[].slots |
object | Filled slot values |
intents[].missing_slots |
array | Required slots that are missing |
intents[].follow_up_needed |
boolean | True if follow-up needed |
intents[].follow_up_reason |
string | Reason for follow-up |
intents[].evidence |
object | Evidence supporting intent |
intents[].evidence.text_spans |
array | Text span evidence |
intents[].evidence.text_spans[].start_offset |
integer | Start character offset |
intents[].evidence.text_spans[].end_offset |
integer | End character offset |
intents[].evidence.text_spans[].preview |
string | Text preview |
intents[].evidence.layout_refs |
array | Layout/OCR evidence |
intents[].evidence.layout_refs[].attachment_id |
string | Attachment file ID |
intents[].evidence.layout_refs[].page |
integer | Page number |
intents[].evidence.layout_refs[].block_id |
string | Block identifier |
intents[].evidence.layout_refs[].line_id |
string | Line identifier |
intents[].evidence.entity_refs |
array | Entity references |
intents[].evidence.entity_refs[].type |
string | Entity type |
intents[].evidence.entity_refs[].index |
integer | Entity index |
global_confidence |
float | Overall confidence score |
processing_notes |
array | Processing notes |
processing_timestamps |
object | Timing information |
processing_timestamps.ner_started_at |
string | NER start timestamp |
processing_timestamps.ner_completed_at |
string | NER completion timestamp |
processing_timestamps.received_at |
string | When signal was received |
processing_timestamps.intent_started_at |
string | Intent processing start |
processing_timestamps.intent_completed_at |
string | Intent processing completion |
processing_timestamps.emitted_at |
string | When signal was emitted |
provenance |
object | Processing provenance |
provenance.ner_version |
string | NER model version |
provenance.ner_framework |
string | NER framework (spacy, etc.) |
provenance.classifier_version |
string | Intent classifier version |
provenance.slot_filler_version |
string | Slot filler version |
provenance.config_snapshot_id |
string | Config snapshot ID |
provenance.processing_location |
string | Processing location (client, server, hybrid) |
parent_thread_id |
string | Parent thread ID |
conflict |
boolean | True if conflicts with other signals |
conflicting_fields |
array | Conflicting field names |
Document Intent Field¶
Stored in documents.intent JSONB field (simplified intent classification).
{
"primary_intent": "bill",
"confidence": 0.85,
"secondary_intents": ["receipt"],
"extracted_entities": {
"amount": "$123.45",
"merchant": "Example Store"
}
}
Field Definitions:
| Field | Type | Description |
|---|---|---|
primary_intent |
string | Primary intent name |
confidence |
float | Confidence score (0.0-1.0) |
secondary_intents |
array | Secondary intent names |
extracted_entities |
object | Extracted entities from intent processing |
People Normalization Structures¶
Person Payload¶
Person identifier payload used in API requests.
| Field | Type | Description |
|---|---|---|
identifier |
string | Identifier value (phone, email, etc.) |
identifier_type |
string | Type (phone, email, imessage, shortcode, social) |
role |
string | Role (sender, recipient, participant, mentioned, contact) |
display_name |
string | Display name |
metadata |
object | Additional metadata |
People JSONB Array¶
Stored in documents.people JSONB field.
[
{
"identifier": "+15551234567",
"identifier_type": "phone",
"role": "sender",
"display_name": "John Doe",
"metadata": {}
},
{
"identifier": "user@example.com",
"identifier_type": "email",
"role": "recipient",
"display_name": "Jane Smith",
"metadata": {}
}
]
Contact Payload¶
Contact ingestion payload for people normalization.
| Field | Type | Description |
|---|---|---|
external_id |
string | External contact ID |
display_name |
string | Display name |
given_name |
string | Given/first name |
family_name |
string | Family/last name |
organization |
string | Organization name |
nicknames |
array | Array of nicknames |
notes |
string | Notes |
photo_hash |
string | Photo hash |
emails |
ContactValue[] | Email addresses |
phones |
ContactValue[] | Phone numbers |
addresses |
ContactAddress[] | Addresses |
urls |
ContactUrl[] | URLs |
change_token |
string | Change token for incremental sync |
version |
integer | Version number |
deleted |
boolean | Deletion flag |
ContactValue¶
Contact identifier value.
| Field | Type | Description |
|---|---|---|
value |
string | Canonical value |
value_raw |
string | Raw value |
label |
string | Label (home, work, mobile) |
priority |
integer | Priority/order (default: 100) |
verified |
boolean | Verification status (default: true) |
ContactAddress¶
Contact address.
| Field | Type | Description |
|---|---|---|
label |
string | Label (home, work) |
street |
string | Street address |
city |
string | City |
region |
string | State/region |
postal_code |
string | Postal/ZIP code |
country |
string | Country |
ContactUrl¶
Contact URL.
| Field | Type | Description |
|---|---|---|
label |
string | Label (homepage, blog, etc.) |
url |
string | URL value |
Summary¶
This data dictionary provides comprehensive definitions for all data structures used throughout the Haven platform. Key points:
- Universal Document Model: All content (messages, files, notes, reminders) stored in unified
documentstable - Deduplication: Files deduplicated by SHA256, documents versioned with full history
- Progressive Enhancement: Support for partial data with clear status tracking
- Enrichment: OCR, face detection, entity extraction, and captioning stored in metadata
- Intent Signals: Structured intent classification with evidence and provenance
- People Normalization: Canonical person records with identifier mapping
For additional details, see: - Schema V2 Reference - Gateway API Reference - Architecture Overview