ScyllaDB Integration
Monitor shard-per-core metrics, reactor scheduling latency, and compaction throughput across your ScyllaDB clusters. Get AI-powered root cause analysis before latency SLOs are breached.
How It Works
Enable Prometheus Metrics Endpoint
ScyllaDB exposes Prometheus metrics natively on port 9180. Verify the endpoint is accessible and configure your TigerOps agent to scrape it — no additional exporter is required.
Deploy TigerOps Agent
Install the TigerOps agent on each ScyllaDB node or as a Kubernetes DaemonSet. The agent scrapes per-shard metrics and aggregates them with topology-aware labels for node, datacenter, and rack.
Configure Keyspace Monitoring
Specify keyspace and table patterns to monitor. TigerOps collects per-table read/write latency, SSTable counts, bloom filter hit rates, and compaction statistics at the shard level.
Set Latency and Compaction Alerts
Define p99 read/write latency SLOs and compaction backlog thresholds. TigerOps correlates reactor stalls with scheduling violations and compaction pressure to surface root causes instantly.
What You Get Out of the Box
Shard-Per-Core Visibility
Per-shard CPU utilization, task queue depth, and reactor utilization across every ScyllaDB core. Identify hot shards and cross-shard task imbalances that degrade throughput.
Reactor Scheduling Latency
Track reactor task scheduling delays (task_quota violations), stall detector activations, and preemption latency. TigerOps alerts when scheduling stalls breach your latency SLO.
Compaction Throughput & Backlog
Monitor compaction input/output byte rates, pending compaction bytes, SSTables per level, and tombstone ratios. Detect compaction debt before it causes read amplification.
Read & Write Latency Histograms
p50, p95, p99, and p999 coordinator and replica read/write latency per keyspace and table. TigerOps baselines seasonal patterns and alerts on statistical anomalies.
Cache Hit Ratios
Row cache and key cache hit rates, eviction counts, and memory utilization per keyspace. Track how cache pressure correlates with disk read amplification and latency spikes.
AI Root Cause Analysis
When latency degrades, TigerOps AI examines reactor stalls, compaction pressure, cache evictions, and network I/O concurrently to identify whether the root cause is CPU, disk, or network saturation.
TigerOps Agent Config for ScyllaDB
Configure the TigerOps agent to scrape ScyllaDB's native Prometheus endpoint on each node.
# TigerOps ScyllaDB integration config
# Place at /etc/tigerops/conf.d/scylladb.yaml
integrations:
- name: scylladb
type: prometheus_scrape
config:
# ScyllaDB exposes Prometheus metrics natively on port 9180
targets:
- scylla-node-1.internal:9180
- scylla-node-2.internal:9180
- scylla-node-3.internal:9180
# Add topology labels from Scylla REST API
topology_enrichment:
enabled: true
rest_api_port: 10000 # Scylla REST API port
# Keyspace/table filtering
metrics_filter:
keyspaces:
- my_app_keyspace
- analytics
# Per-table metrics (can be high cardinality)
per_table_metrics: true
# Metric families to collect
collect:
- scylla_reactor_*
- scylla_scheduler_*
- scylla_storage_proxy_*
- scylla_compaction_manager_*
- scylla_cache_*
- scylla_commitlog_*
scrape_interval: 15s
remote_write:
endpoint: https://ingest.atatus.net/api/v1/write
bearer_token: "${TIGEROPS_API_KEY}"
# Alert rules
alerts:
reactorStallMs: 50 # fire if any stall exceeds 50ms
compactionPendingGb: 10 # pending compaction backlog
readLatencyP99Ms: 10 # p99 read latency SLO
writeLatencyP99Ms: 5 # p99 write latency SLO
cacheHitRatioMin: 0.80 # alert if cache hit rate drops below 80%Common Questions
Which ScyllaDB versions does TigerOps support?
TigerOps supports ScyllaDB Open Source 4.x, 5.x, and 6.x, as well as ScyllaDB Enterprise and ScyllaDB Cloud. The integration uses the native Prometheus endpoint (port 9180), which is available in all supported versions.
How are per-shard metrics aggregated?
TigerOps collects raw per-shard metrics and stores them with a shard label for drill-down analysis. It also computes node-level and cluster-level aggregates automatically so dashboards work at any level of granularity without custom PromQL.
Can TigerOps detect reactor stalls in real time?
Yes. TigerOps monitors scylla_reactor_stalls and scylla_scheduler_stalls counters and fires alerts within one scrape interval (default 15s) of a stall being recorded. The alert payload includes the affected shard ID and concurrent metric context.
How does TigerOps handle multi-datacenter ScyllaDB deployments?
The TigerOps agent uses datacenter and rack labels from ScyllaDB gossip metadata to tag all metrics. You can scope dashboards and alert rules to a specific datacenter or compare cross-DC replication latency without manual label management.
Does TigerOps integrate with Scylla Manager?
Yes. TigerOps can ingest Scylla Manager repair and backup job metrics via its REST API bridge, giving you a unified view of repair progress, backup duration, and cluster health alerts alongside real-time performance metrics.
Get Full Shard-Level Visibility into ScyllaDB
Reactor scheduling latency, compaction backlog monitoring, and AI root cause analysis. Deploy in 5 minutes.