onPreload / onChatStart hooks, and only then the LLM call. Two features address this from different angles.
Picking an approach
| | Preload | Head Start |
|---|---|---|
| What it does | Eagerly triggers the run before the first message | Runs step 1’s LLM call in your warm process while the agent boots in parallel |
| First-turn TTFC win | Hides agent boot if the user does send a message | ~50% reduction (LLM TTFB floor); boot fully overlaps with TTFB |
| When to fire | Page load / input focus — your call | First message arrival — automatic |
| Cost when user never sends | Idle compute until the preload window times out | Zero (no run was triggered) |
| Requires a warm server process | No — works for browser-only surfaces | Yes — your route handler runs step 1 |
| Requires LLM keys client-side? | No | No — keys stay in your warm server |
| Bundle constraints | None | Route handler must import schema-only tools (no heavy executes) |
Preload
Preload eagerly triggers a run for a chat before the first message is sent. Initialization (DB setup, context loading) happens while the user is still typing, reducing first-response latency.
Frontend
Call transport.preload(chatId) to start a run early:
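As a minimal sketch of wiring this to input focus, the helper below calls preload at most once per chat. Only transport.preload(chatId) comes from the text; the PreloadableTransport interface, its Promise return type, and createPreloader are hypothetical names used for illustration.

```typescript
// Hypothetical minimal shape: only the preload method named in the text.
// The Promise<void> return type is an assumption.
interface PreloadableTransport {
  preload(chatId: string): Promise<void>;
}

// Returns a function that calls transport.preload at most once per chat id.
function createPreloader(transport: PreloadableTransport): (chatId: string) => void {
  const started = new Set<string>();
  return (chatId: string): void => {
    if (started.has(chatId)) return;
    started.add(chatId);
    // Fire and forget; on failure, allow a later retry for this chat.
    transport.preload(chatId).catch(() => started.delete(chatId));
  };
}

// Stub transport for demonstration; a real app would pass its chat transport.
const preloadCalls: string[] = [];
const stubTransport: PreloadableTransport = {
  preload: async (chatId) => {
    preloadCalls.push(chatId);
  },
};

const preloadChat = createPreloader(stubTransport);
preloadChat("chat-1"); // e.g. on input focus
preloadChat("chat-1"); // repeated focus events are ignored
preloadChat("chat-2");
```

Deduplicating on the client keeps repeated focus events from triggering more than one idle run per chat.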
The accessToken callback receives { chatId } and is invoked the same way on preload as on any other refresh, with no special branching by purpose. See TriggerChatTransport options.
Backend
The onPreload hook fires immediately. The run then waits for the first message. When the user sends a message, onChatStart fires with preloaded: true so you can skip work that already ran:
With chat.createSession() or raw tasks, check payload.trigger === "preload" and wait for the first message:
Head Start
Head Start runs step 1’s LLM call in your warm server process while the agent run boots in parallel. The user sees one continuous turn: text first from your server, then a clean handover to the agent for tool execution and any further steps. The agent you hand off to can be a chat.agent, a chat.customAgent, or a chat.createSession loop (see Handover with custom agents).
chat.headStart returns a standard Web Fetch API handler — (req: Request) => Promise<Response> — so it slots into any runtime that speaks Web Fetch.
Verified runtimes: Node 18+, Bun, Deno, Cloudflare Workers, Vercel (Node and Edge), Netlify (Functions and Edge). The handler uses only fetch and Web ReadableStream / TransformStream (no node:* imports), and the S2 streaming dependency picks the right transport for each runtime automatically (HTTP/2 on Node/Deno, HTTP/1.1 on Bun/Workers/browsers).
Compatible frameworks (native Web Fetch): Next.js App Router, Hono, SvelteKit, Remix, React Router v7, TanStack Start, Astro, Nitro/Nuxt, Elysia. Mount the handler directly.
Node-only frameworks (Express, Fastify, Koa): the handler still works, but the framework gives you a Node IncomingMessage instead of a Web Request. Use a small adapter — examples in Mounting in your framework below.
When the first turn is pure text (no tool calls), the agent run boots and exits without ever calling an LLM. You only pay for what the conversation actually needed.
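The pure-text versus tool-call branch can be sketched as a small decision function. The "handover-skip" name comes from this page; the "handover" signal name and the signalForStepOne helper are hypothetical, and treating every finish other than tool-calls as pure text is a simplifying assumption.

```typescript
// Finish reasons a step can report; only "tool-calls" needs the agent to continue.
type StepFinishReason =
  | "stop"
  | "length"
  | "content-filter"
  | "tool-calls"
  | "error"
  | "other"
  | "unknown";

// "handover-skip" is named in the docs; "handover" is a placeholder name.
type HandoverSignal = "handover" | "handover-skip";

// Decides what the route handler tells the booted agent run after step 1.
function signalForStepOne(finishReason: StepFinishReason): HandoverSignal {
  return finishReason === "tool-calls" ? "handover" : "handover-skip";
}

const afterToolCall = signalForStepOne("tool-calls"); // agent runs tools and step 2+
const afterPlainText = signalForStepOne("stop"); // agent exits without an LLM call
```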
Measured TTFC
3 runs each, prompt "say hi in five words", same model both sides (Anthropic Claude Sonnet 4):
| | Without Head Start | With Head Start | Δ |
|---|---|---|---|
| TTFT (avg) | 2801 ms | 1218 ms | −57% |
| TTFT (range) | 2351–3101 ms | 1201–1252 ms | |
| Total turn | 4180 ms | 2345 ms | −44% |
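The Δ column follows directly from the averages in the table. A short check, with percentChange as an illustrative helper name:

```typescript
// Relative change from `before` to `after`, rounded to a whole percent.
function percentChange(before: number, after: number): number {
  return Math.round(((after - before) / before) * 100);
}

const ttftDelta = percentChange(2801, 1218); // -57
const totalTurnDelta = percentChange(4180, 2345); // -44
```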
How it works
Browser POSTs the first message to your route handler
The transport sees that headStart: "/api/chat" is set and there’s no session yet for this chat. It POSTs the wire payload (messages, chatId, metadata) to your route handler.
Your handler creates the session and triggers the agent run
A single apiClient.createSession round-trip both creates the chat session and triggers an agent run with trigger: "handover-prepare". The agent run boots into a wait state on session.in.
Your handler runs streamText step 1
streamText runs in your warm process with stopWhen: stepCountIs(1). The output is streamed to the browser as SSE while the agent run boots in parallel. Boot time (~488ms) overlaps with LLM TTFB (~389ms), fully hidden.
Mid-turn handover
On step 1’s tool-calls finish, your handler signals the agent and the SDK splices the agent’s step-2+ stream into the same SSE response. On pure-text finish, your handler signals handover-skip and the agent run exits clean, with no LLM call from the trigger side.
Setup
Split your tool definitions into schemas + executes
Schemas in one module (light deps), executes in another (heavy deps). The agent task pulls in both; the route handler pulls in schemas only.
lib/chat-tools/schemas.ts
trigger/chat-tools.ts
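The split can be sketched without any SDK imports. Everything below is a hypothetical illustration: the ToolSchema and ToolWithExecute types, the searchDocs tool, and its stand-in execute are not taken from the original files, and a real project would use its AI SDK tool helper and schema library.

```typescript
// Hypothetical local types for illustration only.
type ToolSchema = {
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
};

type ToolWithExecute<I, O> = ToolSchema & {
  execute: (input: I) => Promise<O>;
};

// Schemas module: light dependencies, safe to import from the route handler.
const searchDocsSchema: ToolSchema = {
  description: "Search the documentation index",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" } },
    required: ["query"],
  },
};
const schemaOnlyTools = { searchDocs: searchDocsSchema };

// Executes module: imported by the agent task only.
const searchDocs: ToolWithExecute<{ query: string }, string[]> = {
  ...searchDocsSchema,
  // Stand-in for a heavy implementation such as a database or vector search.
  execute: async (input) => [`result for ${input.query}`],
};
const fullTools = { searchDocs };
```

Because the execute is layered on top of the shared schema object, the two modules cannot drift apart on descriptions or input shapes.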
Define your chat.agent (heavy executes)
The agent uses the full tool set — these are the executes that run when step 2+ needs them.
trigger/chat.ts
Build the head-start handler
Call chat.headStart({ agentId, run }). It returns a standard Web Fetch handler: (req: Request) => Promise<Response>. Inside the run callback you call streamText yourself and spread chat.toStreamTextOptions({ tools }) to inherit the SDK-owned wiring (messages, schema-only tools, stopWhen: stepCountIs(1), abort signal). Add your own model and system on top.
lib/chat-handler.ts
Mount the handler in whatever framework you use; see Mounting in your framework below.
Mounting in your framework
chat.headStart returns a Web Fetch handler — (req: Request) => Promise<Response>. Frameworks that natively pass Web Request objects mount it as-is. Node-only frameworks (Express, Fastify, Koa) need a small adapter.
Web Fetch frameworks (recommended)
Edge / standalone runtimes
Node-only frameworks
Express, Fastify, and Koa pass Node IncomingMessage / ServerResponse objects rather than Web Request / Response. The SDK ships chat.toNodeListener that wraps any Web Fetch handler as a Node (req, res) listener: body bytes are read upfront, headers are translated, the response body is streamed chunk by chunk, and client disconnect is propagated to the handler via AbortSignal.
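To make the header-translation step concrete, here is a rough sketch of one way to flatten Node-style headers into pairs that the Web Headers constructor accepts. It is not the SDK's implementation; NodeHeaders and toHeaderEntries are hypothetical names.

```typescript
// Shape of Node's incoming headers: values may be repeated or absent.
type NodeHeaders = Record<string, string | string[] | undefined>;

// Flattens Node headers into [name, value] pairs; repeated values become
// repeated pairs, which `new Headers(pairs)` appends under the same name.
function toHeaderEntries(nodeHeaders: NodeHeaders): Array<[string, string]> {
  const entries: Array<[string, string]> = [];
  for (const name of Object.keys(nodeHeaders)) {
    const value = nodeHeaders[name];
    if (value === undefined) continue;
    for (const single of Array.isArray(value) ? value : [value]) {
      entries.push([name.toLowerCase(), single]);
    }
  }
  return entries;
}

const headerEntries = toHeaderEntries({
  "content-type": "application/json",
  "set-cookie": ["a=1", "b=2"],
  "x-missing": undefined,
});
```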
Streaming response timeouts
The handler keeps the SSE response open until the agent run signals turn-complete (or skip, on a pure-text turn). Make sure your framework / serverless function timeout accommodates that:
- Pure-text first turns: ~LLM TTFB (1–3 s typically).
- Tool-calling first turns: LLM step 1 + agent boot + tool execution + step 2 LLM call. Usually 5–15 s; longer for multi-step tool use.
- Vercel: default function timeout is 10 s on Hobby, 60 s on Pro. Set export const maxDuration = N; on the route segment.
- Cloudflare Workers: default 30 s CPU time (paid plans up to 5 min). Streaming wall time is generally not the bottleneck.
- AWS Lambda behind API Gateway: 29 s API Gateway hard limit; Lambda Function URL allows up to 15 min.
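A quick way to sanity-check a deployment target against these numbers is sketched below. The limits are the ones listed above; the 15 s worst case is an assumption taken from the upper end of the "usually 5–15 s" range, and platformsTooShort is an illustrative helper.

```typescript
// Wall-clock limits from the list above, in seconds.
const platformLimits = [
  { name: "Vercel Hobby (default)", limitSeconds: 10 },
  { name: "Vercel Pro (default)", limitSeconds: 60 },
  { name: "API Gateway + Lambda", limitSeconds: 29 },
  { name: "Lambda Function URL", limitSeconds: 15 * 60 },
];

// Returns the platforms whose limit is shorter than the worst-case turn.
function platformsTooShort(worstCaseTurnSeconds: number): string[] {
  return platformLimits
    .filter((platform) => platform.limitSeconds < worstCaseTurnSeconds)
    .map((platform) => platform.name);
}

// Assumed worst case for a tool-calling first turn.
const atRisk = platformsTooShort(15);
```

Under that assumption only the Vercel Hobby default falls short, which is the case where raising maxDuration matters most.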
What gets routed where
| | First turn (handover) | Subsequent turns |
|---|---|---|
| Browser sends message via | POST to headStart URL | Direct write to session.in |
| Step 1 LLM call runs in | Your warm process | Trigger.dev agent run |
| Tool execution runs in | Trigger.dev agent run | Trigger.dev agent run |
| Step 2+ LLM call runs in | Trigger.dev agent run | Trigger.dev agent run |
| onChatStart / onTurnStart fire | After handover signal arrives | Normally |
| hydrateMessages fires (if registered) | After handover, with the first-turn history as incomingMessages | Normally |
| onTurnComplete fires | After turn finishes (handover) or skipped (handover-skip) | Normally |
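The first row of the table reduces to one condition on the client: use the head-start URL only when it is configured and the chat has no session yet. A sketch of that rule, where SendRoute and routeForMessage are hypothetical names rather than transport internals:

```typescript
type SendRoute =
  | { kind: "head-start"; url: string } // POST to the headStart URL
  | { kind: "session-in" }; // direct write to session.in

// Picks where a message goes, based on the rule described in this page.
function routeForMessage(opts: { headStart?: string; hasSession: boolean }): SendRoute {
  if (opts.headStart !== undefined && !opts.hasSession) {
    return { kind: "head-start", url: opts.headStart };
  }
  return { kind: "session-in" };
}

const firstTurn = routeForMessage({ headStart: "/api/chat", hasSession: false });
const laterTurn = routeForMessage({ headStart: "/api/chat", hasSession: true });
const noHeadStart = routeForMessage({ hasSession: false });
```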
Persistence and the handover contract
A head-start turn persists exactly like a normal turn; the handover machinery is invisible to your hooks. The guarantees:
- One stable assistant messageId across the whole turn. The route handler generates the id, the handover signal carries it to the agent, and the agent’s step 2+ stream reuses it, so the browser merges step 1 and step 2+ into a single assistant message, and you can merge-by-id when persisting.
- onTurnComplete is the canonical persistence point, same as any turn. It carries the full assistant message under that one id: step-1 text, reasoning, and tool calls plus step-2+ tool results and text. The database persistence patterns apply unchanged.
- Reasoning parts survive the handover. When step 1 runs on an extended-thinking model, the reasoning streamed by your route handler lands in the durable session history (and onTurnComplete) under the same messageId, with provider metadata intact: Anthropic thinking signatures survive a replay back to the model. Step-2 reasoning appends to the same message rather than replacing it.
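Because the id is stable, persistence can be a plain upsert keyed by message id. The sketch below uses an in-memory Map as a stand-in for a database table; the MessagePart and AssistantMessage shapes are simplified assumptions, not the SDK's message types.

```typescript
// Simplified, hypothetical message shapes for illustration.
type MessagePart = {
  type: "text" | "reasoning" | "tool-call" | "tool-result";
  content: string;
};
type AssistantMessage = { id: string; role: "assistant"; parts: MessagePart[] };

// Stand-in for a messages table with a unique key on id.
const messageTable = new Map<string, AssistantMessage>();

// Upsert keyed by id: a later write for the same id replaces the earlier row.
function upsertById(message: AssistantMessage): void {
  messageTable.set(message.id, message);
}

// An early write of the step-1 partial...
upsertById({
  id: "msg-1",
  role: "assistant",
  parts: [{ type: "text", content: "Checking the docs." }],
});

// ...followed by the full message for the same turn, under the same id.
upsertById({
  id: "msg-1",
  role: "assistant",
  parts: [
    { type: "text", content: "Checking the docs." },
    { type: "tool-call", content: "searchDocs" },
    { type: "tool-result", content: "3 hits" },
    { type: "text", content: "Found three matches." },
  ],
});
```

The table ends up with a single row for the turn, regardless of how many times the message was written along the way.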
With hydrateMessages
Head Start composes with hydrateMessages. On the first turn, the hook receives the route handler’s first-turn history as incomingMessages — the canonical upsert-and-return pattern persists the user message exactly as it would on a direct-trigger turn. The runtime splices the warm handler’s partial assistant onto your hydrated chain after the hook returns, deduplicated by the assistant messageId, so your hook never needs to include the in-flight partial.
Your hydrate hook shapes model context, not the transcript — dropping reasoning-only entries or unresolved tool rows from the returned chain is fine and does not affect what onTurnComplete persists or what the UI renders.
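The splice-and-deduplicate behavior can be pictured with a small pure function. This is an illustration of the described outcome, not the runtime's code; ChainMessage and splicePartial are hypothetical, and letting the warm partial win over a stale copy with the same id is an assumption.

```typescript
// Simplified, hypothetical message shape.
type ChainMessage = { id: string; role: "user" | "assistant"; text: string };

// Appends the warm partial to the hydrated chain, dropping any entry that
// already carries the same message id so the partial appears exactly once.
function splicePartial(hydrated: ChainMessage[], partial: ChainMessage): ChainMessage[] {
  const withoutDuplicate = hydrated.filter((message) => message.id !== partial.id);
  return [...withoutDuplicate, partial];
}

const userMessage: ChainMessage = { id: "u1", role: "user", text: "hi" };
const warmPartial: ChainMessage = { id: "a1", role: "assistant", text: "Hello" };

// Hook returned only the user message.
const splicedOnce = splicePartial([userMessage], warmPartial);

// Hook happened to return a copy of the partial as well.
const splicedDeduped = splicePartial(
  [userMessage, { id: "a1", role: "assistant", text: "stale" }],
  warmPartial,
);
```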
Handover with custom agents
The route handler is backend-agnostic: agentId can point at a chat.agent, a chat.customAgent, or a chat.createSession loop. With chat.agent the handover is consumed for you (the steps above). The two hand-rolled backends consume it explicitly on turn 0.
chat.createSession
The turn iterator surfaces the handover as turn.handover. On a final (pure-text) handover, call turn.complete() with no source to finalize the warm partial without streaming; otherwise stream as usual. The iterator threads the spliced partial as originalMessages for you, so a resumed tool round merges into the handed-over assistant.
trigger/chat.ts
chat.customAgent
In a hand-rolled loop, call conversation.consumeHandover({ payload }) at the top of turn 0. It waits for the handover signal, seeds prior history from payload.headStartMessages, splices the warm step-1 partial into the accumulator, and returns { isFinal, skipped }.
trigger/chat.ts
trigger === "handover-prepare" — consumeHandover consumes the warm handover, not a normal first message. See Custom agents for the full loop (continuation seeding, stop handling, persistence). The lower-level chat.waitForHandover({ payload }) and accumulator.applyHandover(signal) are exported if you need to wait and splice in separate steps.
Always pass originalMessages: conversation.uiMessages to pipeAndCapture in a custom loop. It keeps assistant message IDs stable across turns and lets a tool-approval or handover resume merge into the trailing assistant, the same threading chat.agent does internally.
The chat.headStart API
triggerConfig sets run options on the auto-triggered handover-prepare run: tags, queue, machine, maxAttempts, maxDuration, region, and lockToVersion. The chat:{chatId} tag is prepended automatically. Because the session is created once on the first head-start turn (idempotent on the chat id), this is the only place to set those options for a head-start chat’s lifetime, mirroring what chat.createStartSessionAction sets for the direct-trigger path.
lib/chat-handler.ts
The run callback receives:
- messages: UIMessage[], the user messages parsed from the request body.
- signal: AbortSignal, which fires when the request closes or the SDK times out the handover.
- chat: HeadStartChatHelper<TTools>, which exposes chat.toStreamTextOptions({ tools }) and a chat.session escape hatch for power users.
chat.toStreamTextOptions({ tools }) returns options to spread into streamText. The SDK owns these keys — overriding them will break the protocol:
| Key | What the SDK sets | Why |
|---|---|---|
| messages | convertToModelMessages(uiMessages) | First-turn user history |
| tools | What you pass | Schema-only tools for step 1 |
| stopWhen | stepCountIs(1) | Step 1 only; the agent picks up step 2+ |
| abortSignal | Combined request + idle timeout | Safe cleanup on disconnect |
You can freely set model, system, providerOptions, prepareStep, and anything else streamText accepts.
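One way to guard against accidental overrides is a small check run over your own options before they are merged with the SDK's. The key list comes from the table above; findOverriddenKeys is a hypothetical helper, not part of the SDK.

```typescript
// Keys the SDK sets via chat.toStreamTextOptions, per the table above.
const SDK_OWNED_KEYS = ["messages", "tools", "stopWhen", "abortSignal"] as const;

// Returns any SDK-owned keys that the caller's own options would override.
function findOverriddenKeys(userOptions: Record<string, unknown>): string[] {
  return SDK_OWNED_KEYS.filter((key) => key in userOptions);
}

// Safe: only caller-owned keys.
const safeOptions = findOverriddenKeys({ model: "my-model", system: "Be brief." });

// Unsafe: stopWhen would replace the single-step limit.
const unsafeOptions = findOverriddenKeys({ model: "my-model", stopWhen: "custom" });
```

Running this in development and throwing on a non-empty result is one possible policy; whether to warn or throw is a design choice left to the caller.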
The transport option
useChat endpoint — it’s not the canonical request URL for every turn, just the first-turn shortcut.
Limitations
- First turn only. Step 2+ and turn 2+ run on the trigger side. There’s no per-turn “head start every turn” mode — the win comes from amortizing agent boot across the LLM call once.
- Single step on the warm-server side. The handler runs stopWhen: stepCountIs(1). Multi-step handover (handler does step 1 + step 2 + …) is out of scope.
- Your server needs an LLM provider key. The first-turn LLM call runs in your warm process, so that environment needs whatever keys the model requires. The agent’s executes still run on the Trigger.dev side with whatever environment variables they need there.
- Browser-only chat surfaces don’t apply. Without a warm server process, there’s nowhere to run step 1 ahead of the agent run. Use Preload or eat the cold-start tax.
- Streaming-capable runtime required. Your framework / runtime has to support streaming HTTP responses (Web Fetch Response body or equivalent). Most modern hosts do (Next.js, Hono, SvelteKit, Workers, Bun, Deno, Vercel, etc.). Some legacy platforms that buffer full responses won’t deliver chunks until the turn is over, which negates the TTFC benefit (correctness still holds).
- Non-useChat chat surfaces (Slack bots, Discord bots, custom protocols) don’t fit the chat.headStart shape: the API expects the AI SDK transport’s wire payload on input. For those, trigger the chat.agent directly from your bot handler.
Reference
- chat.headStart factory and types: full signatures for HeadStartRunArgs, HeadStartChatHelper, HeadStartSession, HeadStartHandlerOptions.
- headStart transport option: alongside accessToken, startSession, etc.
- onPreload hook: the backend hook that fires when a run is preloaded.
- Custom agents: the chat.customAgent and chat.createSession loops that consumeHandover / turn.handover plug into.

