YugabyteDB Integration

Monitor tablet leader distribution, YSQL query performance, and DocDB compaction stats across your YugabyteDB clusters. Get AI-powered root cause analysis for distributed SQL and NoSQL workloads.

Setup

How It Works

01

Scrape YugabyteDB Prometheus Endpoints

YugabyteDB YB-Master and YB-TServer expose native Prometheus metrics on ports 7000 and 9000 respectively. The TigerOps agent scrapes both for complete cluster visibility — no additional exporters required.
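
As a rough illustration of what scraping these endpoints yields, the sketch below parses Prometheus text exposition output into samples. It assumes the payload was already fetched (YugabyteDB documents a `/prometheus-metrics` path on these ports); the metric names in `SAMPLE` are made up for illustration, and this is not the TigerOps agent's actual parser.

```python
import re

# Matches `name{labels} value` or `name value`, optionally followed by a timestamp.
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simple parser does not understand
        name, raw_labels, raw_value = match.groups()
        # Label values are kept as written; escape sequences are not decoded.
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Illustrative payload; metric and label names are placeholders.
SAMPLE = '''
# TYPE example_gauge gauge
example_gauge{table_name="orders",metric_type="tablet"} 42 1700000000000
example_counter 7
'''

samples = parse_prometheus_text(SAMPLE)
```

Each tuple carries the metric name, its label set, and a float value, which is the shape the later processing steps on this page operate on.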

02

Deploy TigerOps Agent

Install the TigerOps agent as a Kubernetes DaemonSet or on each YugabyteDB node. The agent enriches metrics with cluster topology labels — zone, region, and node type — from the YB-Master API.
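
The label enrichment step can be pictured as a simple merge of placement data into each sample's labels. The sketch below assumes a hypothetical `topology` mapping keyed by host name and a hypothetical sample dictionary shape; how the agent actually queries the YB-Master API is not shown here.

```python
def enrich_with_topology(samples, topology):
    """Attach zone/region/node_type labels to each sample by its host."""
    enriched = []
    for sample in samples:
        placement = topology.get(sample["host"], {})
        # Labels already on the sample win, so scrape-time labels are never overwritten.
        labels = {**placement, **sample["labels"]}
        enriched.append({**sample, "labels": labels})
    return enriched


# Hypothetical placement data and sample, for illustration only.
topology = {
    "yb-tserver-0": {"zone": "us-west-2a", "region": "us-west-2", "node_type": "tserver"},
}
samples = [
    {"host": "yb-tserver-0", "name": "example_metric",
     "labels": {"component": "yb-tserver"}, "value": 1.0},
]
result = enrich_with_topology(samples, topology)
```

The function returns new dictionaries rather than mutating its input, so the raw scrape can still be inspected after enrichment.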

03

Enable YSQL and YCQL Monitoring

Configure the agent to collect YSQL query latency metrics from pg_stat_statements and YCQL metrics from the YB-TServer endpoint. Each API layer is tracked separately, and both appear in the same dashboards.

04

Set Tablet and Latency Alerts

Define SLOs for tablet leader imbalance, YSQL p99 latency, and DocDB compaction backlog. TigerOps correlates Raft leader elections with query latency spikes so you can see whether an election preceded a slowdown.
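
One simple way to picture this correlation is a time-window join: a latency spike is attributed to an election if it starts shortly after one. The sketch below is an assumption about how such matching could work, not TigerOps's actual method; the function name and the 30-second window are hypothetical.

```python
from bisect import bisect_right


def spikes_near_elections(election_times, spike_times, window_s=30.0):
    """Pair each latency spike with the most recent leader election,
    if that election happened at most window_s seconds earlier.
    All inputs are Unix timestamps in seconds."""
    elections = sorted(election_times)
    pairs = []
    for spike in sorted(spike_times):
        # Index of the latest election at or before the spike.
        idx = bisect_right(elections, spike) - 1
        if idx >= 0 and spike - elections[idx] <= window_s:
            pairs.append((spike, elections[idx]))
    return pairs


# Illustrative timestamps only.
pairs = spikes_near_elections(
    election_times=[100.0, 500.0],
    spike_times=[110.0, 300.0, 505.0, 90.0],
)
```

Spikes at 90 and 300 are left unmatched here: the first precedes any election and the second falls outside the window, so they would need a different explanation such as compaction pressure.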

Capabilities

What You Get Out of the Box

Tablet Leader Distribution

Per-node tablet leader counts, under-replicated tablet tracking, and leader load balance scores. TigerOps detects leader imbalance across nodes and zones that causes uneven query latency.

YSQL Query Performance

Per-query p50/p95/p99 latency via pg_stat_statements, connection counts by state, and transaction throughput. Surface slow YSQL queries and correlate them with DocDB read/write amplification.
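
To illustrate how slow statements can be surfaced from pg_stat_statements data, the sketch below ranks rows by mean execution time. Standard pg_stat_statements reports cumulative call counts and total execution time per statement (the column is `total_time` or `total_exec_time` depending on the PostgreSQL version), so this example works from means; percentiles need histogram data on top of that. The row shape and the `total_time_ms` key are hypothetical.

```python
def slowest_queries(rows, top_n=5, min_calls=1):
    """Rank statements by mean execution time in milliseconds."""
    ranked = []
    for row in rows:
        calls = row["calls"]
        # Require at least one call so the division below is always safe.
        if calls < max(min_calls, 1):
            continue
        ranked.append({
            "query": row["query"],
            "calls": calls,
            "mean_ms": row["total_time_ms"] / calls,
        })
    ranked.sort(key=lambda r: r["mean_ms"], reverse=True)
    return ranked[:top_n]


# Illustrative rows, shaped like a result set fetched from pg_stat_statements.
rows = [
    {"query": "SELECT * FROM orders WHERE id = $1", "calls": 100, "total_time_ms": 500.0},
    {"query": "UPDATE accounts SET balance = $1 WHERE id = $2", "calls": 10, "total_time_ms": 900.0},
    {"query": "SELECT 1", "calls": 0, "total_time_ms": 0.0},
]
top = slowest_queries(rows, top_n=2)
```

Raising `min_calls` filters out one-off statements whose mean would otherwise dominate the ranking.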

DocDB Compaction Statistics

Compaction pending bytes, compaction input/output throughput, SSTable file counts per level, and write amplification ratios. Alert when compaction debt is building faster than it is being resolved.
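
"Debt building faster than it is being resolved" can be approximated as a positive trend in pending compaction bytes. The sketch below fits a least-squares slope to recent samples; the function names and threshold are hypothetical, and the real alert logic may differ.

```python
def debt_growth_rate(samples):
    """Least-squares slope of pending bytes over time, in bytes per second.
    samples is a list of (unix_time_seconds, pending_bytes) pairs."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return 0.0  # all samples share one timestamp; no trend can be fitted
    cov = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    return cov / var_t


def compaction_debt_building(samples, threshold_bytes_per_s=0.0):
    """True when pending bytes grow faster than the given threshold."""
    return debt_growth_rate(samples) > threshold_bytes_per_s


# Four illustrative scrapes at a 15 s interval, growing by 150 bytes each.
window = [(0, 100), (15, 250), (30, 400), (45, 550)]
rate = debt_growth_rate(window)
```

Fitting a slope over a window is less noisy than comparing two consecutive scrapes, since pending bytes naturally rise and fall as individual compactions start and finish.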

Raft Replication Health

Per-tablet Raft log lag, consensus round-trip latency, and leader election frequency. TigerOps tracks Raft heartbeat timeouts and surfaces tablets experiencing replication instability.

Read & Write Path Latency

DocDB read and write latency histograms at the RocksDB layer, including bloom filter miss rates, block cache hit ratios, and write buffer flush durations per tablet server.
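
Ratios such as block cache hit ratio are usually derived from two cumulative counters sampled at successive scrapes. The sketch below shows that arithmetic under assumed counter names (`hits`, `misses`); the actual RocksDB metric names exposed by the YB-TServer are not reproduced here.

```python
def hit_ratio(prev, curr):
    """Block cache hit ratio over one scrape interval.
    prev and curr are counter snapshots with 'hits' and 'misses' keys.
    Returns None when the ratio is undefined."""
    hits = curr["hits"] - prev["hits"]
    misses = curr["misses"] - prev["misses"]
    if hits < 0 or misses < 0:
        return None  # a counter went backwards, e.g. after a process restart
    total = hits + misses
    if total == 0:
        return None  # no cache lookups in this interval
    return hits / total


# Illustrative snapshots from two consecutive scrapes.
ratio = hit_ratio({"hits": 1000, "misses": 200}, {"hits": 1900, "misses": 300})
```

Working from deltas rather than raw counter values keeps the ratio specific to the latest interval instead of averaging over the whole process lifetime.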

AI Root Cause Analysis

When YSQL query latency degrades, TigerOps AI examines DocDB compaction pressure, Raft election events, tablet leader movement, and connection pool saturation simultaneously to identify the root cause.

Configuration

TigerOps Agent Config for YugabyteDB

Scrape YB-Master and YB-TServer Prometheus endpoints for complete YugabyteDB cluster visibility.

tigerops-yugabytedb.yaml
# TigerOps YugabyteDB integration config
# Place at /etc/tigerops/conf.d/yugabytedb.yaml

integrations:
  # YB-Master metrics (cluster topology, tablet distribution)
  - name: yugabyte-master
    type: prometheus_scrape
    config:
      targets:
        - yb-master-0.yb-masters.yb-demo.svc.cluster.local:7000
        - yb-master-1.yb-masters.yb-demo.svc.cluster.local:7000
        - yb-master-2.yb-masters.yb-demo.svc.cluster.local:7000
      labels:
        component: yb-master
      metric_filters:
        - leader_*
        - tablet_*
        - consensus_*
        - server_*
    scrape_interval: 15s

  # YB-TServer metrics (DocDB, YSQL, YCQL)
  - name: yugabyte-tserver
    type: prometheus_scrape
    config:
      targets:
        - yb-tserver-0.yb-tservers.yb-demo.svc.cluster.local:9000
        - yb-tserver-1.yb-tservers.yb-demo.svc.cluster.local:9000
        - yb-tserver-2.yb-tservers.yb-demo.svc.cluster.local:9000
      labels:
        component: yb-tserver
      metric_filters:
        - rocksdb_*          # DocDB storage metrics
        - handler_latency_*  # YSQL/YCQL query latency
        - rpcs_in_queue_*    # RPC queue depths
        - tablet_*           # Tablet-level metrics
        - raft_*             # Raft consensus metrics
    scrape_interval: 15s

  # YSQL query metrics via pg_stat_statements
  - name: yugabyte-ysql
    type: postgres
    config:
      host: yb-tservers.yb-demo.svc.cluster.local
      port: 5433
      database: yugabyte
      user: tigerops_monitor
      password: "${YUGABYTE_MONITOR_PASSWORD}"
      collect_pg_stat_statements: true
    scrape_interval: 60s

remote_write:
  endpoint: https://ingest.atatus.net/api/v1/write
  bearer_token: "${TIGEROPS_API_KEY}"
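
The config above references secrets through `${...}` placeholders. Assuming the agent expands these from its environment (the source implies this but does not spell it out), a small pre-flight check like the following sketch can catch unset variables before deployment. The helper name is hypothetical.

```python
import re

_PLACEHOLDER_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text, env):
    """Return placeholder names in config_text that are unset or empty in env,
    in order of first appearance and without duplicates."""
    missing = []
    for name in _PLACEHOLDER_RE.findall(config_text):
        if not env.get(name) and name not in missing:
            missing.append(name)
    return missing


# Excerpt of the config shown above.
CONFIG = '''
      password: "${YUGABYTE_MONITOR_PASSWORD}"
remote_write:
  bearer_token: "${TIGEROPS_API_KEY}"
'''

# Pass os.environ in real use; a plain dict keeps this example deterministic.
unset = missing_env_vars(CONFIG, {"TIGEROPS_API_KEY": "example-key"})
```

Running the check with `os.environ` as the second argument reports which variables still need to be exported on the node or injected into the pod.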

FAQ

Common Questions

Which YugabyteDB versions does TigerOps support?

TigerOps supports YugabyteDB 2.14 and later, including YugabyteDB Anywhere (self-managed) and the YugabyteDB Aeon cloud service. The Prometheus endpoints used are available in all supported versions on both YB-Master and YB-TServer.

How does TigerOps monitor tablet leader rebalancing?

TigerOps tracks the leader_count metric per TServer and computes a leader balance score across the cluster. When the YB-Master initiates leader rebalancing (after node additions or failures), TigerOps records the event and shows its impact on query routing latency.
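
The exact formula behind the leader balance score is not given here. One plausible definition, sketched below as an assumption, normalizes the total deviation from an even spread so that 1.0 means perfectly balanced and 0.0 means every leader sits on a single TServer.

```python
def leader_balance_score(leader_counts):
    """Score in [0, 1] for how evenly tablet leaders are spread.
    leader_counts maps a TServer name to its tablet leader count."""
    counts = list(leader_counts.values())
    total = sum(counts)
    n = len(counts)
    if n < 2 or total == 0:
        return 1.0  # nothing to balance
    ideal = total / n
    deviation = sum(abs(c - ideal) for c in counts)
    # Worst case: all leaders on one node, giving deviation 2 * total * (n - 1) / n.
    worst = 2 * total * (n - 1) / n
    return 1.0 - deviation / worst


# Illustrative leader counts per TServer.
balanced = leader_balance_score({"ts-0": 10, "ts-1": 10, "ts-2": 10})
skewed = leader_balance_score({"ts-0": 20, "ts-1": 10, "ts-2": 0})
worst_case = leader_balance_score({"ts-0": 30, "ts-1": 0, "ts-2": 0})
```

Tracking this score before and after a rebalancing event gives a single number to plot next to query latency, which makes the effect of leader movement easier to see.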

Can TigerOps detect DocDB write stalls?

Yes. TigerOps monitors rocksdb_write_stall and rocksdb_write_slowdown metrics from the YB-TServer endpoint. Write stalls are immediately correlated with compaction backlog metrics and YSQL write latency to confirm the impact on application performance.

How does TigerOps handle multi-zone YugabyteDB deployments?

The TigerOps agent reads zone and region placement labels from the YB-Master cluster configuration API and applies them to all metrics. You can filter dashboards by zone, compare cross-zone latency, and detect zone-level tablet leader skew without manual configuration.

Does TigerOps support YCQL monitoring alongside YSQL?

Yes. TigerOps collects YCQL-specific metrics including CQL request rates, request latency per operation type (SELECT, INSERT, UPDATE), and CQL connection counts from the YB-TServer Prometheus endpoint. Both YSQL and YCQL metrics appear in unified dashboards.

Get Started

Get Full-Stack Visibility Into Your YugabyteDB Cluster

Tablet leader monitoring, DocDB compaction tracking, and AI root cause analysis. Deploy in 5 minutes.