Email Collectors

Haven supports two email collection methods: IMAP for remote mailboxes, and a local collector that reads the macOS Mail.app cache.

Overview

Email collectors:

  • Extract emails from IMAP servers or macOS Mail.app
  • Preserve email structure (threads, attachments, metadata)
  • Enrich attachments with OCR and captioning
  • Link emails to contacts and conversations

Prerequisites

  • macOS (for Mail.app collector)
  • IMAP access (for IMAP collector)
  • Gateway API running and accessible

IMAP Collector

The IMAP collector fetches emails directly from IMAP servers.

Using Haven.app

  1. Configure IMAP Account:

    • Open Settings (⌘,)
    • Navigate to Email Collector → IMAP
    • Configure:

      • Server hostname
      • Port (usually 993 for SSL)
      • Username
      • Password or app-specific password
      • Folders to sync (e.g., INBOX, Sent)

  2. Run Collector:

    • Open Collectors window (⌘2)
    • Select Email collector
    • Click "Run"

Configuration

Haven.app Settings:

collectors:
  email:
    imap:
      enabled: true
      server: "imap.example.com"
      port: 993
      tls: true
      username: "user@example.com"
      password: "password"  # or use keychain reference
      folders:
        - "INBOX"
        - "Sent"
        - "Receipts"

Authentication:

  • Password: Direct password (stored securely)
  • App Password: For accounts with 2FA
  • OAuth2: XOAUTH2 support (via keychain)
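
As an illustration of the XOAUTH2 option, here is a minimal Python sketch of the SASL XOAUTH2 initial response, using the standard-library imaplib calling convention as a stand-in. Haven's own client code is not shown in this document, so the function names are hypothetical, and obtaining the access token (for example from the keychain) is out of scope here.

```python
def build_xoauth2_string(username: str, access_token: str) -> bytes:
    """Return the raw (not yet base64-encoded) SASL XOAUTH2 initial response."""
    # Fields are separated by Ctrl-A (0x01) and the string ends with two of them.
    return f"user={username}\x01auth=Bearer {access_token}\x01\x01".encode("utf-8")


def authenticate_xoauth2(conn, username: str, access_token: str):
    """Authenticate an imaplib.IMAP4_SSL-style connection with XOAUTH2.

    imaplib base64-encodes whatever the callback returns, so the raw
    string is passed through unchanged.
    """
    raw = build_xoauth2_string(username, access_token)
    return conn.authenticate("XOAUTH2", lambda _challenge: raw)
```

With a real connection this would be called as `authenticate_xoauth2(imaplib.IMAP4_SSL("imap.example.com", 993), username, token)`.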

IMAP Collector Details

The IMAP collector:

  • Connects to IMAP server
  • Fetches emails from specified folders
  • Processes emails in batches
  • Handles attachments and inline content
  • Preserves email threading

Supported Features:

  • Multiple folders
  • SSL/TLS encryption
  • App-specific passwords
  • OAuth2 authentication
  • Attachment extraction

Limitations:

  • Requires MailCore2 framework (x86_64, Rosetta on Apple Silicon)
  • Messages kept in-memory during processing
  • Large mailboxes may take time
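
The batch-oriented fetch described above can be sketched as follows. This is not Haven's implementation (which uses MailCore2); it is a rough Python equivalent built on the standard-library imaplib interface. The function names and the default batch size of 50 are illustrative assumptions.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[bytes], size: int) -> Iterator[Sequence[bytes]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_folder(conn, folder: str, batch_size: int = 50) -> Iterator[bytes]:
    """Yield raw RFC 822 messages from one folder, one batch at a time.

    `conn` is expected to behave like a logged-in imaplib.IMAP4_SSL object.
    Folder names containing spaces must already be quoted by the caller.
    """
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select folder {folder!r}")
    status, data = conn.uid("SEARCH", None, "ALL")
    if status != "OK":
        raise RuntimeError("UID SEARCH failed")
    uids = (data[0] or b"").split()
    for batch in batched(uids, batch_size):
        uid_set = b",".join(batch).decode("ascii")
        status, parts = conn.uid("FETCH", uid_set, "(RFC822)")
        if status != "OK":
            continue  # skip a failed batch instead of aborting the whole folder
        for part in parts:
            # Message payloads arrive as (envelope, body) tuples; the
            # remaining entries are closing parentheses and are ignored.
            if isinstance(part, tuple):
                yield part[1]
```

Because the function is a generator, only one batch of messages is held in memory at a time, which matters for large mailboxes.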

Local Mail.app Collector

The local collector reads emails from macOS Mail.app's local cache.

Using Haven.app

  1. Configure Mail.app Collector:

    • Open Settings (⌘,)
    • Navigate to Email Collector → Local
    • Configure:

      • Mail data directory (usually ~/Library/Mail)
      • Account selection
      • Folders to sync

  2. Run Collector:

    • Open Collectors window (⌘2)
    • Select Email collector
    • Click "Run"

Configuration

Haven.app Settings:

collectors:
  email:
    local:
      enabled: true
      mail_directory: "~/Library/Mail"
      accounts:
        - "iCloud"
        - "Gmail"
      folders:
        - "INBOX"
        - "Sent Messages"

Mail.app Collector Details

The local collector:

  • Reads .emlx files from Mail.app cache
  • Extracts email content and metadata
  • Processes attachments
  • Preserves folder structure

File Structure:

Mail.app stores emails in:

~/Library/Mail/
  └── V[version]/
      └── [Account]/
          └── [Mailbox]/
              └── [UID].emlx
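
To illustrate walking this layout, the following Python sketch finds every .emlx file under a mail directory. The function name is hypothetical, and the recursive search is an assumption chosen so that the exact nesting depth below `V[version]` does not matter.

```python
from pathlib import Path
from typing import Iterator


def find_emlx_files(mail_directory: str) -> Iterator[Path]:
    """Yield every .emlx file below the given Mail.app data directory."""
    root = Path(mail_directory).expanduser()  # resolves a leading "~"
    for path in sorted(root.rglob("*.emlx")):
        if path.is_file():
            yield path
```

For example, `find_emlx_files("~/Library/Mail")` would iterate over all cached messages, provided the process has Full Disk Access.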

Supported Formats:

  • .emlx - Mail.app message format
  • Extracts headers, body, attachments
  • Preserves threading via Message-ID
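
A minimal sketch of reading one .emlx file in Python follows. It assumes the commonly observed layout: a first line holding the message length in bytes, then the RFC 822 message, then a trailing property list with Mail.app metadata. The function name is hypothetical and this is not Haven's parser.

```python
import plistlib
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Tuple


def parse_emlx(data: bytes) -> Tuple[EmailMessage, dict]:
    """Split .emlx bytes into the parsed message and its trailing plist."""
    first_line, _, rest = data.partition(b"\n")
    length = int(first_line.strip())      # byte count of the message part
    raw_message = rest[:length]
    trailer = rest[length:].strip()       # Mail.app metadata, may be absent
    message = message_from_bytes(raw_message, policy=policy.default)
    metadata = plistlib.loads(trailer) if trailer else {}
    return message, metadata
```

The returned message exposes headers such as `message["Message-ID"]`, which is what threading relies on.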

How Email Collection Works

Email Processing

  1. Email Extraction:

    • Reads email from source (IMAP or Mail.app)
    • Extracts headers, body, attachments
    • Parses MIME structure

  2. Metadata Extraction:

    • From, To, CC, BCC addresses
    • Subject, Date, Message-ID
    • Threading information
    • Folder/mailbox name

  3. People Resolution:

    • Normalizes email addresses
    • Links to people table
    • Creates document_people relationships

  4. Attachment Processing:

    • Extracts attachments
    • Uploads to Gateway/MinIO
    • Enriches images with OCR
    • Links to email document

  5. Document Creation:

    • Creates document with email content
    • Links to thread (if applicable)
    • Adds metadata (source, folder, date)
    • Creates chunks for search
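
The metadata extraction and address normalization steps above could look roughly like this in Python, using only the standard-library email package. The field names in the returned dictionary are illustrative, not Haven's schema, and lower-casing the whole address is an assumed normalization rule.

```python
from email import message_from_bytes, policy


def _addresses(msg, header: str) -> list:
    """Return lower-cased addresses from every occurrence of one header."""
    found = []
    for value in msg.get_all(header, []):
        # With policy.default, address headers expose parsed Address objects.
        for address in getattr(value, "addresses", ()):
            if "@" in address.addr_spec:
                found.append(address.addr_spec.strip().lower())
    return found


def extract_metadata(raw: bytes, folder: str) -> dict:
    """Parse a raw RFC 822 message and collect the fields listed above."""
    msg = message_from_bytes(raw, policy=policy.default)
    return {
        "from": _addresses(msg, "From"),
        "to": _addresses(msg, "To"),
        "cc": _addresses(msg, "Cc"),
        "bcc": _addresses(msg, "Bcc"),
        "subject": str(msg["Subject"] or ""),
        # Date headers carry a parsed datetime; None if missing or invalid.
        "date": getattr(msg["Date"], "datetime", None),
        "message_id": str(msg["Message-ID"] or "").strip(),
        "in_reply_to": str(msg["In-Reply-To"] or "").strip(),
        "references": str(msg["References"] or "").split(),
        "folder": folder,
    }
```

The normalized addresses are what a people-resolution step would match against existing contact records.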

Threading

Emails are linked via:

  • Message-ID: Original message identifier
  • In-Reply-To: Parent message
  • References: Thread chain
  • Subject: Thread matching (fallback)
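
One way to combine these signals is sketched below in Python. It is a simplified illustration rather than Haven's algorithm: it assumes messages arrive oldest first, uses the root message's Message-ID as the thread identifier, and applies the subject fallback only to messages whose subject carries a reply or forward prefix.

```python
import re

_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading Re:/Fwd: prefixes and lower-case the rest."""
    return _PREFIX.sub("", subject).strip().lower()


def assign_threads(messages) -> dict:
    """Map each message_id to a thread id (the root's message_id)."""
    thread_of = {}    # message id -> thread id
    by_subject = {}   # normalized subject -> thread id
    for msg in messages:
        ancestors = list(msg.get("references") or [])
        if msg.get("in_reply_to"):
            ancestors.append(msg["in_reply_to"])
        # Prefer header-based linking: any known ancestor decides the thread.
        thread = next((thread_of[a] for a in ancestors if a in thread_of), None)
        raw_subject = msg.get("subject") or ""
        subject = normalize_subject(raw_subject)
        looks_like_reply = subject != raw_subject.strip().lower()
        if thread is None and looks_like_reply:
            thread = by_subject.get(subject)  # subject fallback
        if thread is None:
            thread = msg["message_id"]        # start a new thread
        thread_of[msg["message_id"]] = thread
        if subject:
            by_subject.setdefault(subject, thread)
    return thread_of
```

Restricting the fallback to reply-like subjects is a design choice that avoids merging unrelated conversations that merely share a subject line.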

Deduplication

Emails are deduplicated by:

  • Message-ID (if present)
  • Content hash
  • Source + external ID
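
The document does not say how these three criteria are combined. The Python sketch below assumes that a message counts as a duplicate when any one of its keys has been seen before; the key prefixes, the choice of SHA-256, and the class name are all illustrative.

```python
import hashlib
from typing import Iterable, List, Optional


def dedup_keys(source: str, external_id: str, body: bytes,
               message_id: Optional[str] = None) -> List[str]:
    """Build the candidate deduplication keys for one email."""
    keys = []
    if message_id:
        keys.append("mid:" + message_id.strip())
    keys.append("sha256:" + hashlib.sha256(body).hexdigest())
    keys.append(f"src:{source}:{external_id}")
    return keys


class Deduplicator:
    """Remembers keys of emails already processed (in memory only)."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, keys: Iterable[str]) -> bool:
        """Return True and record the keys if none of them was seen before."""
        keys = list(keys)
        if any(key in self._seen for key in keys):
            return False
        self._seen.update(keys)
        return True
```

Under this assumption, the same message found in two folders is caught by its Message-ID even though its source-specific identifier differs.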

Configuration

Environment Variables

See the Configuration Reference for the complete list. Key variables:

  • AUTH_TOKEN - Gateway API authentication
  • GATEWAY_URL - Gateway API base URL
  • OLLAMA_ENABLED - Enable image captioning
  • OLLAMA_API_URL - Ollama server URL

Haven.app Configuration

Per-Collector Enrichment:

Control enrichment behavior for the Email collector via Settings (⌘,) → Enrichment Settings:

  • Skip Enrichment: When enabled, email documents are submitted without OCR, face detection, entity extraction, or captioning
  • Default: Enrichment is enabled (skipEnrichment: false)

Global enrichment module settings (OCR quality, entity types, captioning models) are configured in Advanced Settings. See Configuration Reference for details.

IMAP Configuration

Server Settings:

Provider   Server                  Port   TLS
Gmail      imap.gmail.com          993    Yes
iCloud     imap.mail.me.com        993    Yes
Outlook    outlook.office365.com   993    Yes
Yahoo      imap.mail.yahoo.com     993    Yes

App Passwords:

For accounts with 2FA, use app-specific passwords:

  • Gmail: Settings → Security → App passwords
  • iCloud: appleid.apple.com → App-Specific Passwords
  • Outlook: Security → App passwords

Mail.app Configuration

Default Mail Directory:

  • ~/Library/Mail - Standard location
  • ~/Library/Mail/V[version] - Version-specific

Account Discovery:

  • Automatically discovers accounts
  • Lists available mailboxes
  • Can filter by account name

Troubleshooting

IMAP Connection Issues

Error: Cannot connect to IMAP server

Solutions:

  • Verify server hostname and port
  • Check SSL/TLS settings
  • Verify credentials
  • Check firewall/network settings
  • Try app-specific password for 2FA accounts

Mail.app Access Issues

Error: Cannot read Mail.app data

Solutions:

  • Grant Full Disk Access permission
  • Verify Mail.app directory path
  • Check file permissions
  • Ensure Mail.app is not running (may lock files)

Missing Emails

Issue: Some emails not being collected

Solutions:

  • Check folder configuration
  • Verify email filters
  • Review deduplication logs
  • Check for permission issues
  • Verify email format is supported

Attachment Issues

Error: Attachments not processing

Solutions:

  • Check attachment size limits
  • Verify Gateway/MinIO is accessible
  • Review attachment extraction logs
  • Check for corrupted attachments

Performance Considerations

IMAP Collector

  • Batch Size: Process emails in batches (default: 50)
  • Folders: Limit folders for faster syncs
  • Date Range: Use date filters for initial sync
  • Network: Ensure stable connection

Mail.app Collector

  • Large Mailboxes: May take time for initial sync
  • Incremental: Faster after initial sync
  • File Access: Reading .emlx files is I/O intensive

General Tips

  • Start with recent emails (date filter)
  • Process in smaller batches for large mailboxes
  • Use incremental sync after initial import
  • Monitor memory usage for large attachments