Apache Druid Integration
Monitor segment loading metrics, query broker latency, and ingestion task throughput across your Druid clusters. Get predictive alerts and AI root cause analysis before slow ingestion impacts your real-time analytics.
How It Works
Install the TigerOps Druid Agent
Deploy the TigerOps agent alongside your Druid coordinator and broker nodes. The agent auto-discovers all Druid services — historicals, middleManagers, routers, and overlords — via the service discovery API.
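Under the hood, discovery of data-serving nodes (historicals and indexing peons) can be driven by the coordinator's /druid/coordinator/v1/servers?simple endpoint, which returns one entry per server with its host, type, and tier. A minimal sketch of the grouping step, assuming the response has already been fetched and parsed; the helper function is illustrative, not part of the TigerOps agent:

```python
from collections import defaultdict

def group_druid_servers(servers):
    """Group a /druid/coordinator/v1/servers?simple response by node type.

    `servers` is the parsed JSON list; each entry carries at least
    "host", "type", and "tier" fields.
    """
    inventory = defaultdict(list)
    for server in servers:
        inventory[server["type"]].append(server["host"])
    return dict(inventory)

# Sample payload shaped like the coordinator's response (illustrative values)
sample = [
    {"host": "hist-1:8083", "type": "historical", "tier": "_default_tier"},
    {"host": "hist-2:8083", "type": "historical", "tier": "_default_tier"},
    {"host": "mm-peon-1:8100", "type": "indexer-executor", "tier": "_default_tier"},
]
inventory = group_druid_servers(sample)
```

Routers and overlords are not reported by this endpoint, so a real agent would combine this with Druid's internal service discovery.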
Enable Druid Emitter Metrics
Configure the Druid HTTP emitter to forward metrics to the TigerOps ingest endpoint. Set druid.emitter=http in your common.runtime.properties and point druid.emitter.http.recipientBaseUrl at your local TigerOps agent collector.
Configure Segment & Ingestion Alerts
Define alert thresholds for segment loading time, failed ingestion tasks, and broker query latency p99. TigerOps auto-creates dashboards for each data source it discovers in your cluster.
Correlate with Upstream Pipelines
TigerOps links Druid ingestion failures back to upstream Kafka topic lag or S3 batch job delays, giving you end-to-end pipeline visibility from raw data to query results.
What You Get Out of the Box
Segment Loading & Health
Track segment counts per data source, loading queue depth, historical node capacity utilization, and tier replication status across your entire Druid cluster.
Query Broker Latency
Per-data-source broker query latency at p50, p95, and p99. Monitor query merge time, cache hit rates, and subquery count to diagnose slow query patterns proactively.
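Percentile latency is computed from raw per-query timing samples. A minimal nearest-rank implementation shows the idea; the agent's actual aggregation may differ:

```python
def percentile(samples, pct):
    """Nearest-rank percentile over raw latency samples in milliseconds."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # nearest-rank: ceil(pct/100 * N), expressed with integer arithmetic
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[rank - 1]

# Illustrative broker latency samples for one data source
latencies_ms = [12, 15, 14, 120, 18, 16, 2400, 17, 13, 19]
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

Note how a single 2400 ms outlier dominates p95 and p99 while leaving p50 untouched, which is why tail percentiles are the right signal for slow-query alerting.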
Ingestion Task Throughput
Real-time and batch ingestion task event rates, row throughput, task failure counts, and middleManager slot utilization across all running supervisor specs.
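Slot utilization can be derived from the overlord's /druid/indexer/v1/workers response, which reports each worker's task capacity and currently used capacity. A sketch under that assumption; the helper and sample values are illustrative:

```python
def slot_utilization(workers):
    """Cluster-wide middleManager slot utilization (0..1) computed from an
    overlord /druid/indexer/v1/workers response."""
    capacity = sum(w["worker"]["capacity"] for w in workers)
    used = sum(w["currCapacityUsed"] for w in workers)
    return used / capacity if capacity else 0.0

# Sample payload shaped like the overlord's response (illustrative values)
sample = [
    {"worker": {"host": "mm-1:8091", "capacity": 4}, "currCapacityUsed": 3},
    {"worker": {"host": "mm-2:8091", "capacity": 4}, "currCapacityUsed": 1},
]
utilization = slot_utilization(sample)  # 4 of 8 slots busy
```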
Coordinator & Overlord Health
Coordinator duty run times, segment assignment latency, load queue size, and overlord task queue depth to ensure your cluster management plane is healthy.
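Load queue size comes from the coordinator's /druid/coordinator/v1/loadqueue?simple endpoint, which reports per-server counts of segments waiting to load or drop. A sketch of the depth check against the segment_loading_queue_max threshold; the helper is illustrative and the response shape is abbreviated:

```python
def load_queue_depth(loadqueue):
    """Total segments waiting to load, summed across servers from a
    /druid/coordinator/v1/loadqueue?simple response."""
    return sum(entry["segmentsToLoad"] for entry in loadqueue.values())

# Sample payload keyed by server, as the coordinator returns it
sample = {
    "hist-1:8083": {"segmentsToLoad": 120, "segmentsToDrop": 2},
    "hist-2:8083": {"segmentsToLoad": 80, "segmentsToDrop": 0},
}
depth = load_queue_depth(sample)
breached = depth > 500  # mirrors segment_loading_queue_max in the agent config
```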
Deep Storage I/O Metrics
Segment upload and download throughput to deep storage (S3, GCS, HDFS), push and pull latency, and merge task I/O rates for complete storage pipeline visibility.
AI Ingestion Root Cause Analysis
When ingestion tasks fail or fall behind, TigerOps AI correlates segment loading pressure, middleManager JVM GC pauses, and upstream Kafka lag to surface the root cause instantly.
Druid HTTP Emitter Configuration
Add these properties to your Druid common.runtime.properties, then configure the TigerOps agent (tigerops-agent.yaml) to receive the emitted metrics and forward them upstream.
# TigerOps Apache Druid HTTP Emitter Configuration
# Add to conf/druid/cluster/_common/common.runtime.properties

# Enable HTTP emitter
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://localhost:9411/druid/metrics

# TigerOps agent config (tigerops-agent.yaml)
receivers:
  druid:
    endpoint: "http://localhost:9411/druid/metrics"
    auth_token: "${TIGEROPS_API_KEY}"

exporters:
  tigerops:
    endpoint: "https://ingest.tigerops.net/api/v1/write"
    bearer_token: "${TIGEROPS_API_KEY}"
    send_interval: 15s

# Druid coordinator API for segment metadata
coordinator:
  url: "http://druid-coordinator:8081"
  poll_interval: 30s

# Overlord API for ingestion task monitoring
overlord:
  url: "http://druid-overlord:8090"
  poll_interval: 15s

# Alert thresholds
alerts:
  segment_loading_queue_max: 500
  query_latency_p99_ms: 2000
  ingestion_task_failures_per_hour: 3
  historical_capacity_warning_pct: 80

Common Questions
Which Druid versions does TigerOps support?
TigerOps supports Apache Druid 0.22 and later via the built-in HTTP emitter. The TigerOps agent handles metric normalization across minor versions and supports both the legacy and current emitter metric namespaces automatically.
How do I monitor real-time ingestion lag from Kafka supervisors?
The TigerOps Druid agent reads supervisor status via the Druid overlord API and exposes Kafka consumer lag per topic partition. You can set lag-based alerts that fire before your ingestion falls more than N seconds behind real-time.
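A lag check over the per-partition lag map reported in a Kafka supervisor's status payload might look like the sketch below. The exact field layout varies by Druid version, so treat the input shape as an assumption; both helpers are illustrative:

```python
def worst_partition_lag(lag_by_partition):
    """Return the (partition, lag) pair with the largest Kafka offset lag,
    given the per-partition lag map from a supervisor status payload."""
    partition = max(lag_by_partition, key=lag_by_partition.get)
    return partition, lag_by_partition[partition]

def lag_alert(lag_by_partition, max_lag):
    """True if any partition's lag exceeds the configured threshold."""
    return any(lag > max_lag for lag in lag_by_partition.values())

# Illustrative per-partition offset lag for one supervisor
sample = {"0": 1200, "1": 350, "2": 87000}
hot_partition = worst_partition_lag(sample)
```

Alerting on the worst partition rather than the aggregate catches a single stuck partition that an average would hide.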
Can TigerOps alert me on segment availability during a rolling restart?
Yes. TigerOps tracks under-replicated and unavailable segments in real time. During rolling restarts you can configure a maintenance window that suppresses availability alerts while still recording the metric data for post-restart review.
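The suppression logic amounts to a time-window check before notification, while metric recording continues regardless. A sketch, with a hypothetical window representation:

```python
from datetime import datetime, timezone

def should_fire(alert_time, windows):
    """Return False if the alert falls inside any maintenance window;
    only the notification is suppressed, not the underlying metric."""
    return not any(start <= alert_time < end for start, end in windows)

# Hypothetical two-hour maintenance window for a rolling restart
window = (datetime(2024, 6, 1, 2, 0, tzinfo=timezone.utc),
          datetime(2024, 6, 1, 4, 0, tzinfo=timezone.utc))

during_restart = datetime(2024, 6, 1, 3, 0, tzinfo=timezone.utc)
after_restart = datetime(2024, 6, 1, 5, 0, tzinfo=timezone.utc)
```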
Does TigerOps support multi-tier Druid deployments?
Yes. TigerOps discovers all historical tiers automatically and provides per-tier segment counts, capacity, and query routing metrics. You can create tier-specific alerts and dashboards to monitor hot, warm, and cold tier health independently.
How are Druid query anomalies correlated with other services?
TigerOps links Druid query latency spikes with upstream ingestion lag, historical node GC events, and downstream application error rates. The AI correlation engine surfaces the full causal chain so you know which layer is the root cause.
Stop Discovering Druid Ingestion Failures After the Fact
Segment health monitoring, ingestion task alerting, and AI root cause analysis. Deploy in 5 minutes.