In this customer story, Justin Sun, Co-founder and CTO of Capy, explains how the team uses Trigger.dev to orchestrate parallel agents through an automated triage system. After hitting serverless limits and struggling with manual coordination, they built a "PM that never sleeps": a triage agent that manages dozens of concurrent coding agents. Trigger.dev provides the durable execution and observability that make this possible at scale.
Identifying the problem
We noticed our team constantly running multiple Claude sessions in parallel: alt-tabbing between terminals, manually checking progress, and restarting failed runs. It worked, but it was clunky. That's when we realized that if coding agents are going to scale, they need proper orchestration that goes beyond "open more tabs."
At Capy, we'd already built VM infrastructure for computer-use agents (Scrapybara). But real usage patterns (1,000+ weekly active users doing general coding tasks, not just quick websites) showed us that the bottleneck wasn't the agents themselves; it was managing many of them efficiently.
The limitations of our existing approach
We started hitting fundamental limits, which led us to Trigger.dev:
- Serverless timeouts broke long-running tasks. Vercel's runtime caps meant agents were cut off mid-execution on anything substantial.
- Streaming was a hack. Our Python websockets were brittle, unobservable, and fell apart with concurrent runs.
- Manual coordination didn't scale. Starting agents, checking progress, and restarting failures by hand meant our operational overhead grew with usage.
- Current interfaces weren't fit for purpose. CLIs and chat UIs dump walls of text, which makes for a poor user experience.
We needed infrastructure that could handle hours-long execution, coordinate parallel work, and present results in a form developers actually use.
The triage agent concept
Our solution: a triage agent that acts like an engineering manager. It takes a queue of work (features, bugs, refactors), breaks it down, assigns tasks to specialized coding agents, monitors progress, and consolidates results. We visualize this as an intelligent kanban board in which the board itself is active. Instead of manually juggling terminal sessions, the triage agent:
- Spawns child coding agents with appropriate context
- Monitors their progress via structured events
- Handles retries, timeouts, and escalations
- Aggregates outputs into actionable summaries
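The spawn, monitor, and aggregate loop described above can be sketched in a few lines of TypeScript. This is a minimal, in-process illustration under stated assumptions, not Capy's implementation and not Trigger.dev's API: `WorkItem`, `AgentEvent`, `CodingAgent`, `triage`, and `demoAgent` are hypothetical names, the "break it down" step is assumed to have already produced the queue, retries and timeouts are left out, and each child agent is modelled as a plain async function that reports structured events through a callback. In a real deployment each child would run as its own durable task rather than as a promise in one process.

```typescript
// Hypothetical names throughout: none of these come from Capy or Trigger.dev.
type WorkItem = { id: string; kind: "feature" | "bug" | "refactor"; description: string };

// Structured events a child agent reports back to the triage agent.
type AgentEvent =
  | { type: "started"; taskId: string }
  | { type: "progress"; taskId: string; note: string }
  | { type: "completed"; taskId: string; summary: string }
  | { type: "failed"; taskId: string; error: string };

// A child coding agent, modelled as an async function that emits events
// and resolves with a short summary of what it did.
type CodingAgent = (item: WorkItem, emit: (e: AgentEvent) => void) => Promise<string>;

interface TriageReport {
  completed: { id: string; summary: string }[];
  failed: { id: string; error: string }[];
  events: AgentEvent[]; // full event log, e.g. for a board or trace view
}

async function triage(queue: WorkItem[], agent: CodingAgent): Promise<TriageReport> {
  const events: AgentEvent[] = [];
  const emit = (e: AgentEvent): void => {
    events.push(e);
  };

  // Spawn one child per work item; all children run concurrently.
  const outcomes = await Promise.allSettled(
    queue.map(async (item) => {
      emit({ type: "started", taskId: item.id });
      try {
        const summary = await agent(item, emit);
        emit({ type: "completed", taskId: item.id, summary });
        return summary;
      } catch (err) {
        emit({ type: "failed", taskId: item.id, error: String(err) });
        throw err; // surface the failure to allSettled
      }
    }),
  );

  // Aggregate child outcomes into one summary the board can display.
  const report: TriageReport = { completed: [], failed: [], events };
  outcomes.forEach((outcome, i) => {
    const id = queue[i].id;
    if (outcome.status === "fulfilled") report.completed.push({ id, summary: outcome.value });
    else report.failed.push({ id, error: String(outcome.reason) });
  });
  return report;
}

// Usage: a stand-in agent that "fails" on any bug described as flaky.
const demoAgent: CodingAgent = async (item, emit) => {
  emit({ type: "progress", taskId: item.id, note: "editing files" });
  if (item.kind === "bug" && item.description.includes("flaky")) {
    throw new Error("tests failed");
  }
  return `done: ${item.description}`;
};
```

Using `Promise.allSettled` rather than `Promise.all` means one failed child doesn't discard the results of its siblings; the failure is recorded in the report, where a retry or escalation policy could pick it up.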
Building it with Trigger.dev
Trigger.dev became the execution layer that made this architecture possible:
Developer-friendly integration
- TypeScript-native SDK fit our monorepo perfectly
- Frontend stays on Vercel; long-running work moves to Trigger
- Migration from our hacky Python scripts was straightforward
Durable execution beyond serverless limits
- Coding agents run for hours, not seconds
- Step-level checkpoints mean failures don't waste completed work
- Automatic retries with exponential backoff to handle transient issues
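To make the retry and checkpoint bullets concrete, here is a toy in-memory model. It is a sketch under assumptions, not how Trigger.dev implements durability: `RetryOptions`, `backoffDelayMs`, `withRetry`, `step`, and `Checkpoints` are hypothetical names, and checkpoints live in a `Map` instead of durable storage. The wait before retry n is `minDelayMs * factor^(n - 1)`, capped at `maxDelayMs`.

```typescript
// Toy model only: names are hypothetical and this is not Trigger.dev's mechanism.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  minDelayMs: number; // wait before the first retry
  factor: number; // growth factor applied to each subsequent wait
  maxDelayMs: number; // ceiling on any single wait
}

// Exponential backoff: wait before retry n (n = 1, 2, ...) is minDelayMs * factor^(n - 1), capped.
function backoffDelayMs(retry: number, o: RetryOptions): number {
  return Math.min(o.maxDelayMs, o.minDelayMs * Math.pow(o.factor, retry - 1));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-run fn until it succeeds or maxAttempts is reached, sleeping between attempts.
async function withRetry<T>(fn: () => Promise<T>, o: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= o.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < o.maxAttempts) await sleep(backoffDelayMs(attempt, o));
    }
  }
  throw lastError; // all attempts failed: surface the last error
}

// Step-level checkpoints: a finished step's result is stored under its name,
// so re-running the workflow replays it instead of redoing the work.
// In-memory Map here; a real system would persist this durably.
type Checkpoints = Map<string, unknown>;

async function step<T>(store: Checkpoints, name: string, fn: () => Promise<T>): Promise<T> {
  if (store.has(name)) return store.get(name) as T;
  const result = await fn();
  store.set(name, result); // only reached if fn succeeded
  return result;
}
```

Wrapping a single `step` call in `withRetry` retries only that step. Steps whose results are already in the store are replayed rather than re-executed, which is the property behind "failures don't waste completed work."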
Observable, structured workflows
- A dashboard with a detailed runs list, trace views, logs, and more
- Alerts for when things go wrong
- Advanced filtering and search
First-class orchestration
- The triage agent uses Trigger's job scheduling to manage child agents
- Concurrency limits prevent resource exhaustion
- State management and queuing happen at the platform level, not in our code
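In Capy's setup, concurrency limits are handled by the platform rather than in application code. Purely to illustrate what such a limit does, the sketch below implements the same idea in-process as a small worker pool. `runWithConcurrencyLimit` is a hypothetical helper, not part of the Trigger.dev SDK.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
async function runWithConcurrencyLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item no lane has claimed yet

  // Each lane repeatedly claims the next unclaimed item until none remain.
  // Claiming is safe without locks: JS is single-threaded and there is
  // no await between the bounds check and the increment.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results; // same order as `items`, regardless of completion order
}
```

With `limit = 3`, at most three child agents run at once; as each finishes, its lane picks up the next queued item, so the pool stays full without ever exceeding the cap.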
The result
We went from brittle scripts and manual coordination to a self-managing system. The triage agent spawns dozens of coding tasks in parallel, each running as a durable workflow with full observability. When something fails, we see exactly where and why. When tasks complete, results flow back to a unified view.
More importantly, we can focus on making the agents smarter instead of keeping them running. The orchestration layer just works, with no more babysitting processes or opaque debugging.