ClickHouse Integration
Monitor ClickHouse with MergeTree merge metrics, query performance percentiles, replication lag, and part count visibility — via the native Prometheus endpoint.
How It Works
Enable Prometheus Endpoint
Add a <prometheus> section to config.xml (or a file in config.d/) to expose ClickHouse metrics on a Prometheus-compatible HTTP endpoint. No external exporter process is required — ClickHouse has this built in.
Configure Metrics Exposure
Set endpoint, port, and the metrics/asynchronous_metrics/events/errors booleans in the prometheus config block to control which metric families are exposed. Enable all four for full visibility.
Add to TigerOps Scrape Config
Point the TigerOps Collector or your Prometheus remote_write configuration at the ClickHouse Prometheus endpoint. For replicated clusters, add a scrape target for each replica node.
Queries, Merges & Parts Flow
Within minutes TigerOps dashboards show query throughput, merge queue depth, active part counts, replication queue lag, memory usage per query, and disk I/O for MergeTree tables.
What You Get Out of the Box
MergeTree Merge Metrics
Active merge count, merge queue depth, bytes merged per second, and parts count per table from the system.merges table. Background merge storms that impact query latency are detected automatically.
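The same merge activity can be inspected directly in ClickHouse. A sketch against the system.merges table (column names as documented by ClickHouse):

```sql
-- Currently running merges and their progress (system.merges).
SELECT
    database,
    table,
    elapsed,                 -- seconds since the merge started
    progress,                -- 0.0 to 1.0
    num_parts,               -- parts being merged together
    formatReadableSize(total_size_bytes_compressed) AS size
FROM system.merges
ORDER BY elapsed DESC;
```

A sustained, large result set here is the raw signal behind the "merge storm" detection described above.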
Query Performance
Queries per second, query duration percentiles (P50, P95, P99), failed query count, and memory used per query from system.query_log. Slow queries are identified and surfaced in the TigerOps query explorer.
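These percentiles come straight out of system.query_log; a sketch of the equivalent ad-hoc query, assuming query logging is enabled:

```sql
-- Query duration percentiles over the last hour (system.query_log).
SELECT
    quantiles(0.5, 0.95, 0.99)(query_duration_ms) AS p50_p95_p99_ms,
    count() AS finished_queries
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR;
```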
Replication Queue & Lag
Replication queue depth, replica lag in seconds, replication errors, and node-level replication health for ReplicatedMergeTree and ReplicatedReplacingMergeTree tables across all shards.
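The underlying data lives in the system.replicas table; a sketch for spotting lagging replicas by hand:

```sql
-- Replication health per replicated table (system.replicas).
SELECT
    database,
    table,
    is_leader,
    absolute_delay,          -- replica lag, in seconds
    queue_size,
    inserts_in_queue,
    merges_in_queue
FROM system.replicas
WHERE queue_size > 0 OR absolute_delay > 0;
```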
Part Count & Storage
Active part count, inactive part count, total bytes on disk, and bytes in memory for each MergeTree table. Excessive part counts that degrade query performance are flagged with recommended merge actions.
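To see the same part counts from inside ClickHouse, a sketch against system.parts:

```sql
-- Active part counts and on-disk size per MergeTree table (system.parts).
SELECT
    database,
    table,
    count() AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS disk_size
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC;
```

Tables at the top of this list are the ones most likely to be flagged for merge actions.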
Memory & Resource Utilization
ClickHouse process memory usage, jemalloc allocator stats, background thread counts, and file descriptor usage. Memory-intensive queries are correlated with memory metric spikes.
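These values are exposed as asynchronous metrics and can be checked directly; a sketch using pattern matching rather than exact metric names, since the set of metrics varies by version:

```sql
-- Memory and allocator metrics (system.asynchronous_metrics).
SELECT metric, value
FROM system.asynchronous_metrics
WHERE metric LIKE '%Memory%' OR metric LIKE 'jemalloc%'
ORDER BY metric;
```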
ZooKeeper / ClickHouse Keeper
ZooKeeper/Keeper session count, outstanding requests, latency, and node count metrics for replicated table coordination. High ZooKeeper latency events are correlated with replication delays.
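The coordination counters are visible in system.metrics as well; a sketch (metric names prefixed ZooKeeper apply to both ZooKeeper and ClickHouse Keeper):

```sql
-- Coordination session, watch, and in-flight request counts (system.metrics).
SELECT metric, value
FROM system.metrics
WHERE metric LIKE 'ZooKeeper%';
```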
Enable Prometheus Endpoint
Add one short <prometheus> block to config.xml and ClickHouse starts exposing metrics immediately.
<!-- config.xml — enable the Prometheus endpoint -->
<clickhouse>
    <prometheus>
        <endpoint>/metrics</endpoint>
        <port>9363</port>
        <metrics>true</metrics>
        <asynchronous_metrics>true</asynchronous_metrics>
        <events>true</events>
        <errors>true</errors>
    </prometheus>
</clickhouse>
<!-- Verify metrics are exposed -->
<!-- curl http://localhost:9363/metrics | head -20 -->
# TigerOps Collector config (otel-collector.yaml)
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: clickhouse
          scrape_interval: 15s
          static_configs:
            - targets:
                - clickhouse-shard1-replica1:9363
                - clickhouse-shard1-replica2:9363
                - clickhouse-shard2-replica1:9363

exporters:
  otlphttp:
    endpoint: https://ingest.tigerops.io/v1/metrics
    headers:
      Authorization: "Bearer ${TIGEROPS_API_KEY}"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlphttp]
# Key metrics to alert on
# ClickHouseMetrics_ReplicasMaxQueueSize > 1000 → replication lag
# ClickHouseAsyncMetrics_TotalPartsOfMergeTreeTables > 5000 → too many parts
# ClickHouseProfileEvents_QueryMemoryLimitExceeded → OOM queries
# ClickHouseMetrics_BackgroundMergesAndMutationsPoolTask → merge queue
Common Questions
Which ClickHouse versions support the built-in Prometheus endpoint?
The native Prometheus endpoint has been available since ClickHouse 20.1. For older versions, use the third-party clickhouse_exporter. ClickHouse 22.x, 23.x, and later releases are fully supported with the native endpoint.
Does TigerOps support ClickHouse Cloud?
ClickHouse Cloud exposes metrics via its monitoring API. TigerOps supports ClickHouse Cloud via a cloud metrics integration that polls the ClickHouse Cloud metrics API using your service credentials.
How does TigerOps handle ClickHouse clusters with multiple shards?
Add a scrape target for each ClickHouse shard replica. TigerOps aggregates metrics across shards and provides both cluster-level aggregate views and per-shard breakdowns in dashboards.
Can TigerOps alert on slow ClickHouse queries?
Yes. Enable the query_log system table and configure TigerOps to scrape system.query_log via the HTTP interface. Queries exceeding a configurable duration threshold trigger alerts with the query hash, user, and execution plan.
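The duration threshold check maps onto a straightforward system.query_log query; a sketch with a 5-second threshold (the 5000 ms cutoff and the 20-row limit are illustrative values, not TigerOps defaults):

```sql
-- Recent queries slower than 5 s, most expensive first (system.query_log).
SELECT
    event_time,
    user,
    normalized_query_hash,
    query_duration_ms,
    formatReadableSize(memory_usage) AS memory
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_duration_ms > 5000
ORDER BY query_duration_ms DESC
LIMIT 20;
```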
What is the performance impact of enabling all metric families?
Enabling all four metric families (metrics, asynchronous_metrics, events, errors) adds minimal overhead — under 1% CPU at 15-second scrape intervals. The asynchronous_metrics are pre-calculated by a background thread, so serving them is essentially free.
Full ClickHouse Observability via Native Prometheus
MergeTree metrics, query performance, replication lag, and part counts — one small config block to enable.