The first turn of a brand-new conversation pays for the chat.agent run’s cold start: dequeue, process boot, onPreload / onChatStart hooks, and only then the LLM call. Two features address this from different angles.
Picking an approach
| | Preload | Head Start |
|---|---|---|
| What it does | Eagerly triggers the run before the first message | Runs step 1’s LLM call in your warm process while the agent boots in parallel |
| First-turn TTFC win | Hides agent boot if the user does send a message | ~50% reduction (LLM TTFB floor); boot fully overlaps with TTFB |
| When to fire | Page load / input focus — your call | First message arrival — automatic |
| Cost when user never sends | Idle compute until the preload window times out | Zero (no run was triggered) |
| Requires a warm server process | No — works for browser-only surfaces | Yes — your route handler runs step 1 |
| Requires LLM keys client-side? | No | No — keys stay in your warm server |
| Bundle constraints | None | Route handler must import schema-only tools (no heavy executes) |
Preload
Preload eagerly triggers a run for a chat before the first message is sent. Initialization (DB setup, context loading) happens while the user is still typing, reducing first-response latency.
Frontend
Call transport.preload(chatId) to start a run early.
The accessToken callback receives { chatId } and is invoked the same way on preload as on any other refresh, with no special branching by purpose. See TriggerChatTransport options.
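A minimal sketch of firing the preload at most once per chat, for example from an input's focus handler. Only `transport.preload(chatId)` comes from the text above; the `PreloadableTransport` interface, its `Promise<void>` return type, and the `createPreloader` helper are illustrative assumptions, not SDK API. The real transport may already deduplicate on its own.

```typescript
// Assumed minimal shape of the transport. Only `preload(chatId)` is taken
// from the docs; the Promise<void> return type is a guess.
interface PreloadableTransport {
  preload(chatId: string): Promise<void>;
}

// Hypothetical helper: preloads each chat at most once, so repeated focus
// events do not fire repeated preload calls.
function createPreloader(transport: PreloadableTransport) {
  const started = new Set<string>();
  return function preloadOnce(chatId: string): boolean {
    if (started.has(chatId)) return false; // already preloading this chat
    started.add(chatId);
    // Fire and forget; if the preload fails, allow a later retry.
    transport.preload(chatId).catch(() => {
      started.delete(chatId);
    });
    return true;
  };
}

// Usage with a stand-in transport that only records calls.
const preloadCalls: string[] = [];
const fakeTransport: PreloadableTransport = {
  preload: async (chatId) => {
    preloadCalls.push(chatId);
  },
};
const preloadOnce = createPreloader(fakeTransport);
preloadOnce("chat-1"); // e.g. from the input's focus handler
preloadOnce("chat-1"); // second focus event: no second preload
```

Whether to call this on page load or on input focus is the trade-off from the comparison table: earlier calls hide more boot time but cost idle compute when the user never sends a message.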
Backend
The onPreload hook fires immediately. The run then waits for the first message. When the user sends a message, onChatStart fires with preloaded: true so you can skip work that already ran.
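The skip logic can be sketched as follows. This is not the chat.agent configuration itself: the event shapes, the `chatId` field, and `initializeChat` are hypothetical stand-ins, and real hooks are likely async. Only the hook names and the `preloaded` flag come from the text above.

```typescript
// Hypothetical event shapes; only the `preloaded` flag is described in the docs.
interface PreloadEvent {
  chatId: string;
}
interface ChatStartEvent {
  chatId: string;
  preloaded: boolean;
}

const initLog: string[] = [];

// Stand-in for expensive setup such as DB rows or context loading.
function initializeChat(chatId: string): void {
  initLog.push(`init:${chatId}`);
}

const hooks = {
  // Runs while the user is still typing.
  onPreload(event: PreloadEvent): void {
    initializeChat(event.chatId);
  },
  // Runs when the first message arrives; skips setup that preload already did.
  onChatStart(event: ChatStartEvent): void {
    if (event.preloaded) return;
    initializeChat(event.chatId);
  },
};

hooks.onPreload({ chatId: "chat-1" });
hooks.onChatStart({ chatId: "chat-1", preloaded: true }); // no second init
hooks.onChatStart({ chatId: "chat-2", preloaded: false }); // cold path: init here
```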
With chat.createSession() or raw tasks, check payload.trigger === "preload" and wait for the first message.
Head Start
Head Start runs step 1’s LLM call in your warm server process while the chat.agent run boots in parallel. The user sees one continuous turn: text first from your server, then a clean handover to the agent for tool execution and any further steps.
chat.headStart returns a standard Web Fetch API handler, (req: Request) => Promise<Response>, so it slots into any runtime that speaks Web Fetch.
Verified runtimes: Node 18+, Bun, Deno, Cloudflare Workers, Vercel (Node and Edge), Netlify (Functions and Edge). The handler uses only fetch and Web ReadableStream / TransformStream (no node:* imports), and the S2 streaming dependency picks the right transport for each runtime automatically (HTTP/2 on Node/Deno, HTTP/1.1 on Bun/Workers/browsers).
Compatible frameworks (native Web Fetch): Next.js App Router, Hono, SvelteKit, Remix, React Router v7, TanStack Start, Astro, Nitro/Nuxt, Elysia. Mount the handler directly.
Node-only frameworks (Express, Fastify, Koa): the handler still works, but the framework gives you a Node IncomingMessage instead of a Web Request. Use a small adapter — examples in Mounting in your framework below.
When the first turn is pure text (no tool calls), the agent run boots and exits without ever calling an LLM. You only pay for what the conversation actually needed.
Measured TTFC
3 runs each, prompt "say hi in five words", same model both sides (Anthropic Claude Sonnet 4):
| | Without Head Start | With Head Start | Δ |
|---|---|---|---|
| TTFT (avg) | 2801 ms | 1218 ms | −57% |
| TTFT (range) | 2351–3101 ms | 1201–1252 ms | |
| Total turn | 4180 ms | 2345 ms | −44% |
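The Δ column is the relative change between the two averages. A small sketch of that arithmetic, using the numbers from the table (the `percentChange` helper is illustrative):

```typescript
// Relative change from `before` to `after`, as a rounded percentage.
function percentChange(before: number, after: number): number {
  return Math.round(((after - before) / before) * 100);
}

const ttftDelta = percentChange(2801, 1218); // -57
const totalTurnDelta = percentChange(4180, 2345); // -44
```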
How it works
Browser POSTs the first message to your route handler
The transport sees that headStart: "/api/chat" is set and that there’s no session yet for this chat. It POSTs the wire payload (messages, chatId, metadata) to your route handler.
Your handler creates the session and triggers the agent run
A single apiClient.createSession round-trip both creates the chat session and triggers an agent run with trigger: "handover-prepare". The agent run boots into a wait state on session.in.
Your handler runs streamText step 1
streamText runs in your warm process with stopWhen: stepCountIs(1). The output is streamed to the browser as SSE while the agent run boots in parallel. Boot time (~488ms) overlaps with LLM TTFB (~389ms), fully hidden.
Mid-turn handover
On step 1’s tool-calls finish, your handler signals the agent and the SDK splices the agent’s step-2+ stream into the same SSE response. On pure-text finish, your handler signals handover-skip and the agent run exits clean, with no LLM call from the trigger side.
Setup
Split your tool definitions into schemas + executes
Schemas in one module (light deps), executes in another (heavy deps). The agent task pulls in both; the route handler pulls in schemas only.
lib/chat-tools/schemas.ts
trigger/chat-tools.ts
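A minimal sketch of the split, collapsed into one listing for illustration. It uses plain objects as a stand-in: a real project would typically define tools with its AI SDK's tool helper and a schema library. The `getWeather` tool, the `ToolSchema` shape, and the stubbed result are all hypothetical.

```typescript
// --- lib/chat-tools/schemas.ts (light: no heavy dependencies) ---
type ToolName = "getWeather";

interface ToolSchema {
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
}

const toolSchemas: Record<ToolName, ToolSchema> = {
  getWeather: {
    description: "Look up the current weather for a city",
    inputSchema: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

// --- trigger/chat-tools.ts (heavy: imports whatever the executes need) ---
type ToolExecute = (input: Record<string, unknown>) => Promise<unknown>;

const toolExecutes: Record<ToolName, ToolExecute> = {
  // Stubbed result; a real execute would call a database or external API.
  getWeather: async (input) => ({ city: input.city, tempC: 21 }),
};

// The agent task combines schema and execute. The route handler imports
// only `toolSchemas`, so it never bundles the heavy execute dependencies.
const agentTools: Record<ToolName, ToolSchema & { execute: ToolExecute }> = {
  getWeather: { ...toolSchemas.getWeather, execute: toolExecutes.getWeather },
};
```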
Define your chat.agent (heavy executes)
The agent uses the full tool set — these are the executes that run when step 2+ needs them.
trigger/chat.ts
Build the head-start handler
Call chat.headStart({ agentId, run }). It returns a standard Web Fetch handler: (req: Request) => Promise<Response>. Inside the run callback you call streamText yourself and spread chat.toStreamTextOptions({ tools }) to inherit the SDK-owned wiring (messages, schema-only tools, stopWhen: stepCountIs(1), abort signal). Add your own model and system on top.
lib/chat-handler.ts
Mount the handler in whatever framework you use; see Mounting in your framework below.
Mounting in your framework
chat.headStart returns a Web Fetch handler — (req: Request) => Promise<Response>. Frameworks that natively pass Web Request objects mount it as-is. Node-only frameworks (Express, Fastify, Koa) need a small adapter.
Web Fetch frameworks (recommended)
Edge / standalone runtimes
Node-only frameworks
Express, Fastify, and Koa pass Node IncomingMessage / ServerResponse objects rather than Web Request / Response. The SDK ships chat.toNodeListener, which wraps any Web Fetch handler as a Node (req, res) listener: body bytes are read upfront, headers are translated, the response body is streamed chunk by chunk, and client disconnect is propagated to the handler via AbortSignal.
Streaming response timeouts
The handler keeps the SSE response open until the agent run signals turn-complete (or skip, on a pure-text turn). Make sure your framework / serverless function timeout accommodates that:
- Pure-text first turns: ~LLM TTFB (1–3 s typically).
- Tool-calling first turns: LLM step 1 + agent boot + tool execution + step 2 LLM call. Usually 5–15 s; longer for multi-step tool use.
- Vercel: default function timeout is 10 s on Hobby, 60 s on Pro. Set export const maxDuration = N; on the route segment.
- Cloudflare Workers: default 30 s CPU time (paid plans up to 5 min). Streaming wall time is generally not the bottleneck.
- AWS Lambda behind API Gateway: 29 s API Gateway hard limit; Lambda Function URL allows up to 15 min.
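One way to choose a value for the timeout is to add up the phases of a tool-calling first turn and leave some headroom. The sketch below does that. The phase durations and the 1.5× headroom factor are made-up illustrative values, not measurements, and the helper names are hypothetical.

```typescript
// Rough wall-clock phases of a tool-calling first turn, in seconds.
interface TurnPhases {
  llmStep1: number;
  agentBoot: number;
  toolExecution: number;
  llmStep2: number;
}

// Conservative estimate: sums the phases as listed above, ignoring overlap.
function totalSeconds(p: TurnPhases): number {
  return p.llmStep1 + p.agentBoot + p.toolExecution + p.llmStep2;
}

// Rounds up after applying headroom for slow tools or slow model responses.
function suggestMaxDuration(p: TurnPhases, headroom: number = 1.5): number {
  return Math.ceil(totalSeconds(p) * headroom);
}

// Illustrative numbers only.
const toolTurn: TurnPhases = { llmStep1: 2, agentBoot: 0.5, toolExecution: 4, llmStep2: 3 };
const suggested = suggestMaxDuration(toolTurn); // 15
const fitsTenSecondLimit = suggested <= 10; // false: raise the limit for this route
```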
What gets routed where
| | First turn (handover) | Subsequent turns |
|---|---|---|
| Browser sends message via | POST to headStart URL | Direct write to session.in |
| Step 1 LLM call runs in | Your warm process | Trigger.dev agent run |
| Tool execution runs in | Trigger.dev agent run | Trigger.dev agent run |
| Step 2+ LLM call runs in | Trigger.dev agent run | Trigger.dev agent run |
| onChatStart / onTurnStart fire | After handover signal arrives | Normally |
| onTurnComplete fires | After turn finishes (handover) or skipped (handover-skip) | Normally |
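The routing in the table, together with the handover decision at the end of step 1, can be sketched as two small functions. This models the described behavior only and is not SDK code. The "handover" signal name, the `FinishReason` values other than "tool-calls", and both function names are assumptions; the text names only the tool-calls finish and handover-skip.

```typescript
type FinishReason = "tool-calls" | "stop" | "length";
type HandoverSignal = "handover" | "handover-skip";

// A tool-calls finish hands the turn to the agent for tool execution and
// step 2+; a pure-text finish skips the handover.
function signalForStep1(finishReason: FinishReason): HandoverSignal {
  return finishReason === "tool-calls" ? "handover" : "handover-skip";
}

type Phase = "step1-llm" | "tool-execution" | "step2plus-llm";
type Location = "warm-process" | "agent-run";

// Only step 1 of the first turn runs in the warm process; everything else
// runs in the Trigger.dev agent run.
function whereItRuns(turn: number, phase: Phase): Location {
  return turn === 1 && phase === "step1-llm" ? "warm-process" : "agent-run";
}
```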
The chat.headStart API
The run callback receives:
- messages: UIMessage[] - user messages parsed from the request body.
- signal: AbortSignal - fires when the request closes or the SDK times out the handover.
- chat: HeadStartChatHelper<TTools> - exposes chat.toStreamTextOptions({ tools }) and a chat.session escape hatch for power users.
chat.toStreamTextOptions({ tools }) returns options to spread into streamText. The SDK owns these keys — overriding them will break the protocol:
| Key | What the SDK sets | Why |
|---|---|---|
| messages | convertToModelMessages(uiMessages) | First-turn user history |
| tools | What you pass | Schema-only tools for step 1 |
| stopWhen | stepCountIs(1) | Step 1 only; agent picks up step 2+ |
| abortSignal | Combined request + idle timeout | Safe cleanup on disconnect |
You remain free to set model, system, providerOptions, prepareStep, and anything else streamText accepts.
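A sketch of the merge rule, using plain objects instead of the real SDK return value. The `mergeStreamTextOptions` guard is hypothetical: it spreads your own options on top of the SDK-provided ones and refuses to override any of the four SDK-owned keys from the table above. The placeholder values in `fromSdk` are stand-ins, not what the SDK actually returns.

```typescript
// Keys the table above lists as SDK-owned.
const SDK_OWNED_KEYS = ["messages", "tools", "stopWhen", "abortSignal"] as const;

type Options = Record<string, unknown>;

// Hypothetical guard: merges your options over the SDK-provided ones, but
// throws if you try to override an SDK-owned key.
function mergeStreamTextOptions(sdkOptions: Options, own: Options): Options {
  for (const key of SDK_OWNED_KEYS) {
    if (key in own) {
      throw new Error(`"${key}" is owned by the SDK and must not be overridden`);
    }
  }
  return { ...sdkOptions, ...own };
}

// Stand-in for the object returned by chat.toStreamTextOptions({ tools }).
const fromSdk: Options = {
  messages: [],
  tools: {},
  stopWhen: "stepCountIs(1)", // placeholder for the real stop condition
  abortSignal: null, // placeholder for the combined abort signal
};

const merged = mergeStreamTextOptions(fromSdk, {
  model: "my-model", // placeholder for a real model instance
  system: "Be brief.",
});
```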
The transport option
useChat endpoint — it’s not the canonical request URL for every turn, just the first-turn shortcut.
Limitations
- First turn only. Step 2+ and turn 2+ run on the trigger side. There’s no per-turn “head start every turn” mode — the win comes from amortizing agent boot across the LLM call once.
- Single step on the warm-server side. The handler runs stopWhen: stepCountIs(1). Multi-step handover (handler does step 1 + step 2 + …) is out of scope.
- Your server needs an LLM provider key. The first-turn LLM call runs in your warm process, so that environment needs whatever keys the model requires. The agent’s executes still run on the Trigger.dev side with whatever environment variables they need there.
- Browser-only chat surfaces don’t apply. Without a warm server process, there’s nowhere to run step 1 ahead of the agent run. Use Preload or eat the cold-start tax.
- Streaming-capable runtime required. Your framework / runtime has to support streaming HTTP responses (Web Fetch Response body or equivalent). Most modern hosts do: Next.js, Hono, SvelteKit, Workers, Bun, Deno, Vercel, etc. Some legacy platforms that buffer full responses won’t deliver chunks until the turn is over, which negates the TTFC benefit (correctness still holds).
- Non-useChat chat surfaces (Slack bots, Discord bots, custom protocols) don’t fit the chat.headStart shape, because the API expects the AI SDK transport’s wire payload on input. For those, trigger the chat.agent directly from your bot handler.
Reference
- chat.headStart factory and types: full signatures for HeadStartRunArgs, HeadStartChatHelper, HeadStartSession, HeadStartHandlerOptions.
- headStart transport option: alongside accessToken, startSession, etc.
- onPreload hook: the backend hook that fires when a run is preloaded.

