Apache Druid Integration
Monitor segment loading metrics, query broker latency, and ingestion task throughput across your Druid clusters. Get predictive alerts and AI root cause analysis before slow ingestion impacts your real-time analytics.
How It Works
Install the TigerOps Druid Agent
Deploy the TigerOps agent alongside your Druid coordinator and broker nodes. The agent auto-discovers all Druid services — historicals, middleManagers, routers, and overlords — via the service discovery API.
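Under the hood, discovery of data-serving nodes (historicals and indexing peons) can be driven by the coordinator's /druid/coordinator/v1/servers?simple endpoint, which returns one entry per server with its host, type, and tier. A minimal sketch of the grouping step, assuming the response has already been fetched and parsed; the helper function is illustrative, not part of the TigerOps agent:

```python
from collections import defaultdict

def group_druid_servers(servers):
    """Group a /druid/coordinator/v1/servers?simple response by node type.

    `servers` is the parsed JSON list; each entry carries at least
    "host", "type", and "tier" fields.
    """
    inventory = defaultdict(list)
    for server in servers:
        inventory[server["type"]].append(server["host"])
    return dict(inventory)

# Sample payload shaped like the coordinator's response (illustrative values)
sample = [
    {"host": "hist-1:8083", "type": "historical", "tier": "_default_tier"},
    {"host": "hist-2:8083", "type": "historical", "tier": "_default_tier"},
    {"host": "mm-peon-1:8100", "type": "indexer-executor", "tier": "_default_tier"},
]
inventory = group_druid_servers(sample)
```

Routers and overlords are not reported by this endpoint, so a real agent would combine this with Druid's internal service discovery.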
Enable Druid Emitter Metrics
Configure the Druid HTTP emitter to forward metrics to the TigerOps ingest endpoint. Set druid.emitter=http in your common.runtime.properties and point druid.emitter.http.recipientBaseUrl at your local TigerOps agent collector.
Configure Segment & Ingestion Alerts
Define alert thresholds for segment loading time, failed ingestion tasks, and broker query latency p99. TigerOps auto-creates dashboards for each data source it discovers in your cluster.
Correlate with Upstream Pipelines
TigerOps links Druid ingestion failures back to upstream Kafka topic lag or S3 batch job delays, giving you end-to-end pipeline visibility from raw data to query results.
What You Get Out of the Box
Segment Loading & Health
Track segment counts per data source, loading queue depth, historical node capacity utilization, and tier replication status across your entire Druid cluster.
Query Broker Latency
Per-data-source broker query latency at p50, p95, and p99. Monitor query merge time, cache hit rates, and subquery count to diagnose slow query patterns proactively.
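Percentile latency is computed from raw per-query timing samples. A minimal nearest-rank implementation shows the idea; the agent's actual aggregation may differ:

```python
def percentile(samples, pct):
    """Nearest-rank percentile over raw latency samples in milliseconds."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # nearest-rank: ceil(pct/100 * N), expressed with integer arithmetic
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[rank - 1]

# Illustrative broker latency samples for one data source
latencies_ms = [12, 15, 14, 120, 18, 16, 2400, 17, 13, 19]
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

Note how a single 2400 ms outlier dominates p95 and p99 while leaving p50 untouched, which is why tail percentiles are the right signal for slow-query alerting.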
Ingestion Task Throughput
Real-time and batch ingestion task event rates, row throughput, task failure counts, and middleManager slot utilization across all running supervisor specs.
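Slot utilization can be derived from the overlord's /druid/indexer/v1/workers response, which reports each worker's task capacity and currently used capacity. A sketch under that assumption; the helper and sample values are illustrative:

```python
def slot_utilization(workers):
    """Cluster-wide middleManager slot utilization (0..1) computed from an
    overlord /druid/indexer/v1/workers response."""
    capacity = sum(w["worker"]["capacity"] for w in workers)
    used = sum(w["currCapacityUsed"] for w in workers)
    return used / capacity if capacity else 0.0

# Sample payload shaped like the overlord's response (illustrative values)
sample = [
    {"worker": {"host": "mm-1:8091", "capacity": 4}, "currCapacityUsed": 3},
    {"worker": {"host": "mm-2:8091", "capacity": 4}, "currCapacityUsed": 1},
]
utilization = slot_utilization(sample)  # 4 of 8 slots busy
```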
Coordinator & Overlord Health
Coordinator duty run times, segment assignment latency, load queue size, and overlord task queue depth to ensure your cluster management plane is healthy.
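Load queue size comes from the coordinator's /druid/coordinator/v1/loadqueue?simple endpoint, which reports per-server counts of segments waiting to load or drop. A sketch of the depth check against the segment_loading_queue_max threshold; the helper is illustrative and the response shape is abbreviated:

```python
def load_queue_depth(loadqueue):
    """Total segments waiting to load, summed across servers from a
    /druid/coordinator/v1/loadqueue?simple response."""
    return sum(entry["segmentsToLoad"] for entry in loadqueue.values())

# Sample payload keyed by server, as the coordinator returns it
sample = {
    "hist-1:8083": {"segmentsToLoad": 120, "segmentsToDrop": 2},
    "hist-2:8083": {"segmentsToLoad": 80, "segmentsToDrop": 0},
}
depth = load_queue_depth(sample)
breached = depth > 500  # mirrors segment_loading_queue_max in the agent config
```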
Deep Storage I/O Metrics
Segment upload and download throughput to deep storage (S3, GCS, HDFS), push and pull latency, and merge task I/O rates for complete storage pipeline visibility.
AI Ingestion Root Cause Analysis
When ingestion tasks fail or fall behind, TigerOps AI correlates segment loading pressure, middleManager JVM GC pauses, and upstream Kafka lag to surface the root cause instantly.
Druid HTTP Emitter Configuration
Add these properties to your Druid common.runtime.properties, then configure the TigerOps agent (tigerops-agent.yaml) to receive the emitted metrics and forward them upstream.
# TigerOps Apache Druid HTTP Emitter Configuration
# Add to conf/druid/cluster/_common/common.runtime.properties

# Enable HTTP emitter
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://localhost:9411/druid/metrics

# TigerOps agent config (tigerops-agent.yaml)
receivers:
  druid:
    endpoint: "http://localhost:9411/druid/metrics"
    auth_token: "${TIGEROPS_API_KEY}"

exporters:
  tigerops:
    endpoint: "https://ingest.tigerops.net/api/v1/write"
    bearer_token: "${TIGEROPS_API_KEY}"
    send_interval: 15s

# Druid coordinator API for segment metadata
coordinator:
  url: "http://druid-coordinator:8081"
  poll_interval: 30s

# Overlord API for ingestion task monitoring
overlord:
  url: "http://druid-overlord:8090"
  poll_interval: 15s

# Alert thresholds
alerts:
  segment_loading_queue_max: 500
  query_latency_p99_ms: 2000
  ingestion_task_failures_per_hour: 3
  historical_capacity_warning_pct: 80

Common Questions
Which Druid versions does TigerOps support?
TigerOps supports Apache Druid 0.22 and later via the built-in HTTP emitter. The TigerOps agent handles metric normalization across minor versions and supports both the legacy and current emitter metric namespaces automatically.
How do I monitor real-time ingestion lag from Kafka supervisors?
The TigerOps Druid agent reads supervisor status via the Druid overlord API and exposes Kafka consumer lag per topic partition. You can set lag-based alerts that fire before your ingestion falls more than N seconds behind real-time.
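A lag check over the per-partition lag map reported in a Kafka supervisor's status payload might look like the sketch below. The exact field layout varies by Druid version, so treat the input shape as an assumption; both helpers are illustrative:

```python
def worst_partition_lag(lag_by_partition):
    """Return the (partition, lag) pair with the largest Kafka offset lag,
    given the per-partition lag map from a supervisor status payload."""
    partition = max(lag_by_partition, key=lag_by_partition.get)
    return partition, lag_by_partition[partition]

def lag_alert(lag_by_partition, max_lag):
    """True if any partition's lag exceeds the configured threshold."""
    return any(lag > max_lag for lag in lag_by_partition.values())

# Illustrative per-partition offset lag for one supervisor
sample = {"0": 1200, "1": 350, "2": 87000}
hot_partition = worst_partition_lag(sample)
```

Alerting on the worst partition rather than the aggregate catches a single stuck partition that an average would hide.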
Can TigerOps alert me on segment availability during a rolling restart?
Yes. TigerOps tracks under-replicated and unavailable segments in real time. During rolling restarts you can configure a maintenance window that suppresses availability alerts while still recording the metric data for post-restart review.
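The suppression logic amounts to a time-window check before notification, while metric recording continues regardless. A sketch, with a hypothetical window representation:

```python
from datetime import datetime, timezone

def should_fire(alert_time, windows):
    """Return False if the alert falls inside any maintenance window;
    only the notification is suppressed, not the underlying metric."""
    return not any(start <= alert_time < end for start, end in windows)

# Hypothetical two-hour maintenance window for a rolling restart
window = (datetime(2024, 6, 1, 2, 0, tzinfo=timezone.utc),
          datetime(2024, 6, 1, 4, 0, tzinfo=timezone.utc))

during_restart = datetime(2024, 6, 1, 3, 0, tzinfo=timezone.utc)
after_restart = datetime(2024, 6, 1, 5, 0, tzinfo=timezone.utc)
```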
Does TigerOps support multi-tier Druid deployments?
Yes. TigerOps discovers all historical tiers automatically and provides per-tier segment counts, capacity, and query routing metrics. You can create tier-specific alerts and dashboards to monitor hot, warm, and cold tier health independently.
How are Druid query anomalies correlated with other services?
TigerOps links Druid query latency spikes with upstream ingestion lag, historical node GC events, and downstream application error rates. The AI correlation engine surfaces the full causal chain so you know which layer is the root cause.
Stop Discovering Druid Ingestion Failures After the Fact
Segment health monitoring, ingestion task alerting, and AI root cause analysis. Deploy in 5 minutes.