When a chat.agent turn runs out of memory, the worker process dies and everything in it is gone: the in-flight LLM call, the accumulator, any tool execution mid-flight. By default, Trigger.dev surfaces the OOM as a run failure.
Setting oomMachine opts the agent into automatic recovery: the failed turn re-runs on a larger machine, picks up the user message that triggered the OOM (without re-processing earlier completed turns), and produces a normal response.
Setup
With oomMachine set, the agent gets:
- retry.maxAttempts: 2 internally. One retry for OOM only; non-OOM errors don’t retry.
- retry.outOfMemory.machine: oomMachine. The fresh attempt boots on the larger machine.
- session.in cursor recovery. The new attempt skips records belonging to turns that already completed on the prior attempt and only re-runs the OOM’d turn.
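To make the mapping concrete, here is a minimal sketch of how an oomMachine setting could translate into the retry settings listed above. The AgentMachineOptions and DerivedRetry types and the deriveRetry function are hypothetical local models, not part of the Trigger.dev SDK, and the machine names are illustrative values. The single-attempt default when oomMachine is unset is an assumption based on the statement that an OOM fails the run by default.

```typescript
// Hypothetical local model; these types are NOT the Trigger.dev SDK's own.
interface AgentMachineOptions {
  machine: string; // machine used for normal attempts
  oomMachine?: string; // larger machine used for the OOM retry
}

interface DerivedRetry {
  maxAttempts: number;
  outOfMemory?: { machine: string };
}

// Derives the retry settings described above from the agent's machine options.
function deriveRetry(options: AgentMachineOptions): DerivedRetry {
  if (options.oomMachine === undefined) {
    // Assumption: with no oomMachine there is a single attempt,
    // so an OOM surfaces as a run failure.
    return { maxAttempts: 1 };
  }
  // One extra attempt, reserved for OOM, booted on the larger machine.
  return { maxAttempts: 2, outOfMemory: { machine: options.oomMachine } };
}

const derived = deriveRetry({ machine: "small-1x", oomMachine: "medium-2x" });
console.log(derived);
```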
chat.agent does not expose generic retry options. OOM recovery is the only retry path because retrying an LLM-driven loop on non-OOM errors tends to be expensive and side-effecting. Drop down to a raw task() with chat primitives if you need richer retry semantics.
How recovery works
The recovery doesn’t need any customer-side persistence to avoid duplicate processing. It uses two pieces of durable state Trigger already maintains for every chat:
- session.out: the durable response stream. Every successful turn writes a trigger:turn-complete chunk here.
- session.in: the durable input stream. Every user message after the first turn lands here as a record with a server-assigned timestamp.
When the retry attempt boots, it:
- Scans session.out for the latest trigger:turn-complete chunk and reads its timestamp. Call this T_last_complete.
- Sets a per-stream filter on session.in so any record with timestamp <= T_last_complete is dropped before it reaches the turn loop.
- Begins normal processing. The first record that passes the filter is the message that triggered the OOM (or any newer message that arrived during the retry window).
The session.out scan is streaming and bounded in memory: each chunk is inspected and discarded one at a time, so a long-running chat doesn’t bloat the retry-boot worker. Bandwidth scales linearly with session.out size, but only on the OOM-retry path, which is a rare event.
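The boot sequence can be sketched as a single-pass scan followed by a filter. This is an illustrative model only: the OutChunk and InRecord shapes, the "text-delta" chunk type, and both function names are assumptions made for the example, and the real streams are durable rather than in-memory arrays. The scan keeps one number in memory regardless of how many chunks it reads, which is the bounded-memory property described above.

```typescript
// Hypothetical record shapes; the real stream formats are not specified here.
interface OutChunk { type: string; timestamp: number }
interface InRecord { timestamp: number; text: string }

// Step 1: scan session.out one chunk at a time, retaining only the latest
// turn-complete timestamp (T_last_complete). Each chunk is then discarded.
function findLastComplete(out: Iterable<OutChunk>): number | undefined {
  let lastComplete: number | undefined;
  for (const chunk of out) {
    if (chunk.type === "trigger:turn-complete") {
      lastComplete =
        lastComplete === undefined
          ? chunk.timestamp
          : Math.max(lastComplete, chunk.timestamp);
    }
  }
  return lastComplete;
}

// Step 2: drop session.in records with timestamp <= T_last_complete.
function* filterInput(
  input: Iterable<InRecord>,
  cutoff: number | undefined
): Generator<InRecord> {
  for (const record of input) {
    if (cutoff !== undefined && record.timestamp <= cutoff) continue;
    yield record;
  }
}

const sessionOut: OutChunk[] = [
  { type: "text-delta", timestamp: 100 },
  { type: "trigger:turn-complete", timestamp: 110 }, // turn 1 done
  { type: "text-delta", timestamp: 200 },
  { type: "trigger:turn-complete", timestamp: 210 }, // turn 2 done
  { type: "text-delta", timestamp: 300 }, // partial output of the OOM'd turn
];
const sessionIn: InRecord[] = [
  { timestamp: 150, text: "u2" }, // already handled by turn 2
  { timestamp: 250, text: "u3" }, // the message that triggered the OOM
];

// Step 3: normal processing starts with whatever passes the filter.
const cutoff = findLastComplete(sessionOut);
const pending = Array.from(filterInput(sessionIn, cutoff));
console.log(cutoff, pending.map((r) => r.text));
```

In this example only u3 survives the filter, so the completed turns are not re-processed.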
With hydrateMessages
If your agent uses hydrateMessages to load the durable conversation history per turn, the OOM’d turn re-runs against the full prior accumulator: the model sees [u1, a1, u2, a2, ..., u_N] and responds in context. This is the recommended pattern for production chats.
Without hydrateMessages
Recovery boot reconstructs context automatically. The boot reads both the durable session.out snapshot (settled turns) and the session.out tail past the snapshot cursor (the partial assistant chunks the OOM’d turn streamed before dying). When the new attempt processes the OOM’d user message, the model sees the full prior conversation plus the partial assistant response that was cut off, so a “keep going” follow-up continues naturally, and any other follow-up has the same context the original turn had.
hydrateMessages is still the right choice if you want a single source of truth in your own database (branching conversations, message-level access control, etc.). It’s no longer required for OOM continuity.
For full control over recovery — drop the partial, synthesize tool results for an interrupted tool call, emit a recovery banner to the UI — register onRecoveryBoot.
Tool execute idempotency
If an OOM hits mid-tool-execution, the new attempt re-runs the entire turn, including the tool call. Make tool execute functions idempotent or checkpoint their progress externally. Trigger doesn’t roll back side effects automatically.
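One common way to get idempotency is to key each side effect and record its result, so a re-run returns the recorded result instead of repeating the work. The sketch below is a minimal illustration under stated assumptions: completedCalls, chargeCustomer, executeCharge, and the key format are all hypothetical, and the in-memory Map stands in for an external durable store, since anything held in worker memory is lost when the process dies.

```typescript
// Hypothetical stand-in for an external durable store (for example a database
// table). A real in-memory Map would NOT survive the OOM.
const completedCalls = new Map<string, string>();
let chargesMade = 0; // counts real side effects, for demonstration only

// Hypothetical side-effecting operation.
function chargeCustomer(customerId: string, amountCents: number): string {
  chargesMade += 1;
  return `receipt-${customerId}-${amountCents}`;
}

// Execute-style function guarded by an idempotency key. The key is assumed to
// be derived from inputs that stay stable across the retry.
function executeCharge(
  idempotencyKey: string,
  customerId: string,
  amountCents: number
): string {
  const existing = completedCalls.get(idempotencyKey);
  if (existing !== undefined) return existing; // already done: skip the side effect
  const receipt = chargeCustomer(customerId, amountCents);
  completedCalls.set(idempotencyKey, receipt);
  return receipt;
}

const first = executeCharge("turn-3:charge", "cust_1", 500);
const second = executeCharge("turn-3:charge", "cust_1", 500); // simulated re-run
console.log(first === second, chargesMade);
```

A crash between the side effect and the record write can still cause a duplicate, so where the downstream service accepts an idempotency key of its own, passing the same key through closes that gap.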
Limitations
- One OOM retry per run. chat.agent sets maxAttempts: 2. If attempt 2 also OOMs, the run fails. Use a sufficiently large oomMachine to avoid this.
- Single fallback tier. Only one oomMachine. There’s no “tiered retry” (small → medium → large). If you need that, drop down to a raw task() with chat primitives and configure retry directly.
- Non-OOM errors don’t retry. Schema errors, model-call rejections, tool throws, etc. fail the run as before. Out-of-memory is the only retry trigger.
- Tools mid-execution are not checkpointed. A partially-run tool re-runs from scratch on the new attempt. Make them idempotent.
See also
- Recovery boot: the underlying hook + smart default that gives OOM recovery its full-context behavior
- Lifecycle hooks: onChatResume fires on every retry attempt with phase: "preload" or "turn"
- Database persistence: the hydrateMessages pattern for branching, ACL, and DB-as-source-of-truth scenarios

