Email Local Collector¶
The Email Local collector orchestrates ingestion of macOS Mail.app .emlx messages through HostAgent. It provides HTTP endpoints that launch a collection run and expose the most recent state so other services (Gateway, CLI tools) can monitor progress.
Overview¶
- Module toggle: controlled via
modules.mail.enabledinhostagent.yaml. - Modes:
simulate(fixture-based) andreal(scans live Mail.app cache; currently stubbed). - State tracking: HostAgent records whether a run is in progress, the last result, timestamps, counters, and the last error string.
- Permissions: simulate mode does not require Full Disk Access; real mode will require Mail cache access once implemented.
Run Endpoint — POST /v1/collectors/email_local:run¶
Launches a collection run. For now, only simulate mode is implemented.
Request Body¶
| Field | Type | Required | Description |
|---|---|---|---|
mode |
string |
Optional (default simulate) |
Either simulate or real. |
limit |
integer |
Optional (default 100) |
Maximum number of .emlx files to process (1–10,000). |
simulate_path |
string |
Required for simulate |
Absolute path to a single .emlx file or directory containing fixtures. |
Example (simulate mode):
curl -X POST http://localhost:7090/v1/collectors/email_local:run \
-H "Content-Type: application/json" \
-H "x-auth: changeme" \
-d '{
"mode": "simulate",
"simulate_path": "/Users/alex/haven-fixtures/emlx",
"limit": 50
}'
Success Response (200 OK)¶
{
"status": "completed",
"mode": "simulate",
"limit": 50,
"simulate_path": "/Users/alex/haven-fixtures/emlx",
"stats": {
"messages_processed": 48,
"documents_created": 48,
"attachments_processed": 6,
"errors_encountered": 0,
"start_time": "2025-10-21T17:46:10.934Z",
"end_time": "2025-10-21T17:46:11.121Z",
"duration_ms": 187
},
"warnings": [
"Failed to parse 1043.emlx: Email file not found: /Users/alex/haven-fixtures/emlx/1043.emlx"
]
}
Run Status Values¶
completed— run finished without parse errors.partial— some messages failed to parse; warnings array is populated.failed— run aborted due to a fatal error (response includes"error"payload instead).
Validation & Error Responses¶
| HTTP Status | Reason |
|---|---|
400 Bad Request |
Invalid mode, malformed JSON body, limit outside allowed range, or missing simulate_path in simulate mode. |
404 Not Found |
Provided simulate_path does not exist or contains no .emlx files. |
409 Conflict |
A run is already in progress. |
503 Service Unavailable |
Mail module disabled in config. |
501 Not Implemented |
Real mode invoked (placeholder). |
State Endpoint — GET /v1/collectors/email_local/state¶
Returns the last known state without triggering a run.
Example:
{
"is_running": false,
"status": "completed",
"last_run_time": "2025-10-21T17:46:11.121Z",
"last_run_stats": {
"messages_processed": 48,
"documents_created": 48,
"attachments_processed": 6,
"errors_encountered": 0,
"start_time": "2025-10-21T17:46:10.934Z",
"end_time": "2025-10-21T17:46:11.121Z",
"duration_ms": 187
},
"last_run_error": null
}
Fields¶
is_running— boolean flag indicating an active run.status— last terminal status (idle,running,completed,partial,failed).last_run_time— ISO-8601 timestamp of the most recent run completion (if any).last_run_stats— same structure as the run responsestatsobject; omitted if no prior runs.last_run_error— string containing the last fatal error, when applicable.
Configuration¶
Enable the mail module and point to filters (advanced filtering is handled by HavenCore mail filters):
modules:
mail:
enabled: true
filters:
combination_mode: any
default_action: include
files:
- ~/.haven/email_collector_filters.yaml
Ensure the HostAgent process has permissions to read the directories referenced by simulate_path. For CI, commit .emlx fixtures into hostagent/Tests/HostHTTPTests/Fixtures and point the run request to the extracted fixture path on disk.
Current Limitations¶
realmode is stubbed until the Mail cache crawler (haven-54/30/31) is delivered.- Attachments are counted but not yet enriched; enrichment happens inside the collector implementation.
- The handler focuses on orchestration and reporting; ingestion to Gateway is handled by the downstream collector tasks (see Email Gateway Submission for payload details).