Logs moved to ClickHouse

Logs are now stored in ClickHouse for faster performance and better scalability.

Eric Allam

CTO, Trigger.dev

We’ve migrated run logs/spans to ClickHouse from PostgreSQL, which enables a faster dashboard experience and prepares us for a long requested feature: log search and filtering.

What’s new

The OpenTelemetry logs and spans associated with your runs are now stored in ClickHouse, which is far better suited to this type of data. It enables efficient querying and aggregation that wasn't possible while logs and spans lived in PostgreSQL. Because of this move, we're able to increase the number of logs and spans we can display in the dashboard (up to 50,000 in the Trigger.dev Cloud).

Under the hood

This migration represents a fundamental shift in how we store and query OpenTelemetry data. Here's what changed architecturally:

Why ClickHouse over PostgreSQL?

Columnar vs Row-based storage: ClickHouse's columnar storage format is specifically designed for analytical workloads like log aggregation and time-series queries. Unlike PostgreSQL's row-oriented model (optimized for transactional operations), ClickHouse stores data by column, enabling:

  • Massive compression gains: Similar log attributes compress extremely well when stored together
  • Faster analytical queries: Only the columns needed for a query are read from disk
  • Better aggregation performance: Sum, count, and group-by operations are significantly faster

Schema design differences: Our new ClickHouse schema leverages native partitioning and advanced data types:


-- Actual ClickHouse table structure for task events
CREATE TABLE task_events_v1 (
  environment_id String,
  organization_id String,
  project_id String,
  task_identifier String CODEC(ZSTD(1)),
  run_id String CODEC(ZSTD(1)),
  start_time DateTime64(9) CODEC(Delta(8), ZSTD(1)),
  trace_id String CODEC(ZSTD(1)),
  span_id String CODEC(ZSTD(1)),
  parent_span_id String CODEC(ZSTD(1)),
  message String CODEC(ZSTD(1)),
  -- Unified event type classification
  kind LowCardinality(String) CODEC(ZSTD(1)),
  status LowCardinality(String) CODEC(ZSTD(1)),
  -- Native JSON type for complex attributes
  attributes JSON CODEC(ZSTD(1)),
  attributes_text String MATERIALIZED toJSONString(attributes),
  metadata String CODEC(ZSTD(1)),
  duration UInt64 CODEC(ZSTD(1)),
  expires_at DateTime64(3),
  -- Specialized indexes for different query patterns
  INDEX idx_run_id run_id TYPE bloom_filter(0.001) GRANULARITY 1,
  INDEX idx_span_id span_id TYPE bloom_filter(0.001) GRANULARITY 1,
  INDEX idx_duration duration TYPE minmax GRANULARITY 1,
  INDEX idx_attributes_text attributes_text TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
) ENGINE = MergeTree
PARTITION BY toDate(start_time)
ORDER BY (environment_id, toUnixTimestamp(start_time), trace_id)
TTL toDateTime(expires_at) + INTERVAL 7 DAY;

Performance improvements: The migration solved the core limitation we hit with PostgreSQL: dynamic attribute search. PostgreSQL's JSON support made it difficult to efficiently search across the varied OpenTelemetry attributes, which change per task and span, and creating indexes for every possible attribute path wasn't feasible. Searching log messages and attributes required complex queries that didn't scale to billions of records, and grouping and filtering by dynamic JSON attributes was slow enough to make dashboard queries that aggregate across runs impractical.

Technical implementation details

Dual-store architecture: We implemented a feature-flagged approach that allows gradual migration:

  • Both PostgreSQL and ClickHouse event repositories implement the same IEventRepository interface
  • Environment variables control which store is used for reads/writes
  • This enables zero-downtime migration and easy rollback if needed
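As a sketch of how such a feature flag can work (the `IEventRepository` name comes from the post, but the method shapes, class names, and the `EVENT_REPOSITORY` environment variable below are illustrative assumptions, not Trigger.dev's actual internals):

```typescript
// Two stores behind one interface; an env var picks the active one,
// so switching back is a config change rather than a redeploy.
type TaskEvent = { traceId: string; spanId: string; message: string };

interface IEventRepository {
  insertEvents(events: TaskEvent[]): Promise<void>;
  getTraceEvents(traceId: string): Promise<TaskEvent[]>;
}

class PostgresEventRepository implements IEventRepository {
  async insertEvents(_events: TaskEvent[]): Promise<void> {
    /* INSERT into the PostgreSQL events table */
  }
  async getTraceEvents(_traceId: string): Promise<TaskEvent[]> {
    return [];
  }
}

class ClickHouseEventRepository implements IEventRepository {
  async insertEvents(_events: TaskEvent[]): Promise<void> {
    /* async insert into task_events_v1 */
  }
  async getTraceEvents(_traceId: string): Promise<TaskEvent[]> {
    return [];
  }
}

// Hypothetical flag name; defaults to the legacy store for safe rollback.
function createEventRepository(
  env: Record<string, string | undefined>
): IEventRepository {
  return env.EVENT_REPOSITORY === "clickhouse"
    ? new ClickHouseEventRepository()
    : new PostgresEventRepository();
}
```

Because callers only see the interface, the read path and write path can be flipped independently during the migration window.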

OpenTelemetry data handling: The new system better handles the unique characteristics of OTEL data:

  • Native JSON attributes: ClickHouse's JSON data type preserves the full structure of OpenTelemetry attributes without flattening, while the materialized attributes_text column enables full-text search
  • Advanced indexing: Bloom filters for exact ID lookups, token-based full-text search on attributes, and minmax indexes for duration queries
  • Compression optimizations: ZSTD compression with delta encoding for timestamps and specialized codecs for different data patterns
  • Time-based queries: Most log queries are time-range based, perfectly suited for ClickHouse's date partitioning
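To build intuition for why a Bloom filter index makes exact run_id lookups cheap, here is a toy Bloom filter in TypeScript (illustrative only; ClickHouse's implementation, sizing, and hash functions differ):

```typescript
// Toy Bloom filter: a bit array plus k salted hashes. Lookups can return
// false positives but never false negatives, which is why an index like
// this can safely skip data granules that definitely lack a given run_id.
class BloomFilter {
  private bits: Uint8Array;

  constructor(private size: number, private k: number) {
    this.bits = new Uint8Array(Math.ceil(size / 8));
  }

  // Simple FNV-1a hash, salted per hash function (illustrative choice).
  private hash(value: string, seed: number): number {
    let h = (2166136261 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.size;
  }

  add(value: string): void {
    for (let i = 0; i < this.k; i++) {
      const bit = this.hash(value, i);
      this.bits[bit >> 3] |= 1 << (bit & 7);
    }
  }

  mightContain(value: string): boolean {
    for (let i = 0; i < this.k; i++) {
      const bit = this.hash(value, i);
      if (!(this.bits[bit >> 3] & (1 << (bit & 7)))) return false;
    }
    return true;
  }
}
```

The `0.001` in `bloom_filter(0.001)` is the target false-positive rate: a smaller value spends more bits per granule in exchange for fewer wasted reads.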

Data consistency model: Unlike PostgreSQL's ACID guarantees, ClickHouse provides eventual consistency, which is acceptable for observability data where slight delays are tolerable in exchange for massive performance gains.

Materialized views for metrics: The ClickHouse migration also enabled us to build real-time usage aggregations using materialized views:


-- Pre-aggregated usage metrics by minute (an hourly view follows the same pattern)
CREATE MATERIALIZED VIEW mv_task_event_usage_by_minute_v1
-- A materialized view needs a storage target: partial counts from each
-- insert block are summed together at merge time
ENGINE = SummingMergeTree
ORDER BY (organization_id, project_id, environment_id, bucket_start)
AS
SELECT
  organization_id,
  project_id,
  environment_id,
  toStartOfMinute(start_time) AS bucket_start,
  count() AS event_count
FROM task_events_v1
GROUP BY organization_id, project_id, environment_id, bucket_start;

These materialized views automatically maintain real-time usage statistics as events are inserted, powering an upcoming logs metrics feature that will give you insights into your task execution patterns and resource usage without expensive on-the-fly aggregations.
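The bucketing the view performs is simple to picture outside SQL: `toStartOfMinute` floors a timestamp to its minute, and the group-by counts events per (environment, bucket) pair. A minimal TypeScript sketch (type and function names here are illustrative):

```typescript
// Mirror ClickHouse's toStartOfMinute(): floor a timestamp to the
// start of its minute, expressed as epoch milliseconds.
type UsageEvent = { environmentId: string; startTime: Date };

function toStartOfMinute(t: Date): number {
  return Math.floor(t.getTime() / 60_000) * 60_000;
}

// Count events per (environment, minute-bucket) pair, the same
// aggregation the materialized view maintains incrementally.
function aggregateByMinute(events: UsageEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const key = `${e.environmentId}:${toStartOfMinute(e.startTime)}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

The difference is that the materialized view does this work once, at insert time, so dashboard reads never re-scan raw events.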

Migration challenges solved

During implementation, we addressed several technical hurdles:

  • Unicode handling: Implemented safeguards against unpaired Unicode surrogate pairs that could cause ClickHouse insert failures
  • Async insertions: Enabled asynchronous insertions to improve write throughput while maintaining data durability
  • DateTime precision: Ensured proper handling of nanosecond-precision timestamps (DateTime64(9)) for accurate trace correlation
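One way to guard against unpaired surrogates looks like the following (a sketch of the general technique, not the exact code Trigger.dev ships): scan the UTF-16 code units and replace any high surrogate without a following low surrogate, or any orphaned low surrogate, with U+FFFD so the string serializes to valid UTF-8.

```typescript
// Replace unpaired UTF-16 surrogate code units with U+FFFD (the
// replacement character) so the string is valid when encoded as
// UTF-8 for a ClickHouse insert.
function sanitizeSurrogates(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: valid only when followed by a low surrogate.
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += input[i] + input[i + 1];
        i++; // consume the pair
      } else {
        out += "\uFFFD";
      }
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      out += "\uFFFD";
    } else {
      out += input[i];
    }
  }
  return out;
}
```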

The result is a system that can efficiently store and query millions of log entries while maintaining the real-time dashboard experience developers expect.

For self-hosted users

If you’re running Trigger.dev yourself, you’ll need to add a ClickHouse instance to your deployment and enable the new logging backend via environment variables. The migration is non-breaking, and the PostgreSQL event store remains supported until you switch over. Check the self-hosting guide for setup details and schema configuration.
