
Databricks Integration

Monitor cluster utilization, job run metrics, and Delta Lake table statistics across your Databricks workspaces. Get DBU cost anomaly detection and AI pipeline failure analysis without deploying any agents.

Setup

How It Works

01

Connect via Databricks REST API

Generate a Databricks personal access token, or create a service principal and use its token, then add it to TigerOps. The API integration requires no agent deployment — TigerOps polls your workspace REST API for cluster, job, and Delta Lake metrics.
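To make the polling model concrete, here is a minimal sketch of what an agentless poll against the Databricks Clusters API looks like. It uses only the documented `GET /api/2.0/clusters/list` endpoint with a bearer token; the environment variable names mirror the YAML example below, and the summarized fields are an illustrative selection, not TigerOps's exact metric schema.

```python
import json
import os
import urllib.request


def list_clusters(host: str, token: str) -> dict:
    """Poll the Databricks REST API for cluster state -- no agent on the cluster."""
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_clusters(payload: dict) -> list[dict]:
    """Reduce the API payload to the handful of fields a poller would ship."""
    return [
        {
            "cluster_id": c["cluster_id"],
            "state": c["state"],
            "num_workers": c.get("num_workers", 0),
        }
        for c in payload.get("clusters", [])
    ]


# Usage (assumes the env vars from the YAML config below are set):
#   payload = list_clusters(os.environ["DATABRICKS_HOST"],
#                           os.environ["DATABRICKS_TOKEN_PROD"])
#   rows = summarize_clusters(payload)
```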

02

Configure Workspace Scopes

Select which Databricks workspaces, job namespaces, and cluster tags to monitor. TigerOps auto-discovers all job definitions, cluster pools, and SQL warehouses in your workspace.
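Tag-based scoping amounts to matching each discovered cluster's `custom_tags` (the field the Clusters API returns) against your configured filters. A hedged sketch of that matching rule, using the same example tags as the YAML config below:

```python
def matches_tag_filters(cluster: dict, tag_filters: dict[str, str]) -> bool:
    """True when every configured tag filter matches the cluster's custom tags.

    `custom_tags` is the tag field returned by the Databricks Clusters API;
    the all-keys-must-match semantics here is an assumption for illustration.
    """
    tags = cluster.get("custom_tags", {})
    return all(tags.get(key) == value for key, value in tag_filters.items())
```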

03

Set Cost & Runtime Alerts

Define DBU spend thresholds, job runtime SLOs, and cluster auto-termination compliance rules. TigerOps fires alerts when a job run exceeds its expected duration or DBU cost envelope.
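The runtime-SLO check reduces to matching a job name against the configured patterns and comparing the run duration to the matched limit. A sketch under the assumption that patterns like `nightly_etl_*` are glob-style (the defaults and evaluation order here are illustrative, not TigerOps internals):

```python
from fnmatch import fnmatch

# Mirrors the slo_overrides section of the YAML config below.
SLO_OVERRIDES = [
    {"job_name_pattern": "nightly_etl_*", "max_duration_minutes": 120},
    {"job_name_pattern": "hourly_aggregation_*", "max_duration_minutes": 45},
]


def slo_for(job_name: str, default_minutes: int = 60) -> int:
    """First matching override wins; the 60-minute default is an assumption."""
    for rule in SLO_OVERRIDES:
        if fnmatch(job_name, rule["job_name_pattern"]):
            return rule["max_duration_minutes"]
    return default_minutes


def breaches_slo(job_name: str, duration_minutes: float) -> bool:
    """True when a job run exceeds its expected duration window."""
    return duration_minutes > slo_for(job_name)
```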

04

Monitor Delta Lake Table Health

TigerOps polls the Delta Lake transaction log to track table growth, OPTIMIZE and VACUUM operation history, Z-order clustering effectiveness, and file count bloat that impacts query performance.
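Tracking file-count bloat from the transaction log boils down to replaying the `add` and `remove` actions recorded in the `_delta_log` JSON commit files. This is a deliberately simplified sketch — it ignores checkpoints and assumes the lines are supplied in commit order — but it shows the shape of the bookkeeping:

```python
import json


def active_file_count(delta_log_lines: list[str]) -> int:
    """Count files currently active in a Delta table by replaying log actions.

    Simplified: real Delta logs also use Parquet checkpoints; this sketch
    assumes plain JSON commit lines in commit order.
    """
    active: set[str] = set()
    for line in delta_log_lines:
        action = json.loads(line)
        if "add" in action:
            active.add(action["add"]["path"])
        elif "remove" in action:
            active.discard(action["remove"]["path"])
    return len(active)
```

A rising count relative to table size is exactly the small-file bloat signal that motivates scheduling OPTIMIZE.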

Capabilities

What You Get Out of the Box

Cluster Utilization Tracking

Driver and worker node CPU, memory, and disk utilization across all-purpose and job clusters. Track cluster uptime, DBU consumption per cluster, and autoscaling event history.

Job Run Metrics & SLOs

Job run durations, failure rates, retry counts, and task-level execution times. Set per-job runtime SLOs and get alerted when a critical pipeline job misses its completion window.

Delta Lake Table Statistics

Table file counts, total size, transaction log entry rates, OPTIMIZE and VACUUM run history, and Z-order effectiveness metrics to keep your lakehouse performant.

SQL Warehouse Performance

Serverless and classic SQL warehouse query queue depth, execution times, concurrency utilization, and warehouse auto-stop compliance for cost-effective analytics workloads.

DBU Cost Anomaly Detection

Daily and hourly DBU spend tracking with AI-powered anomaly detection. TigerOps alerts when spending deviates from historical baselines and attributes the spike to the responsible cluster or job.
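How anomaly detection against a historical baseline can work is easiest to see with a toy version. TigerOps's actual model is AI-powered and not documented here; this sketch uses a simple z-score over recent daily spend to show the idea:

```python
from statistics import mean, stdev


def dbu_anomaly(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Flag today's DBU spend when it deviates more than `threshold` standard
    deviations from the historical baseline.

    Illustrative only -- a stand-in for TigerOps's AI-powered detection.
    """
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # flat baseline: any change is an anomaly
    return abs(today - mu) / sigma > threshold
```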

AI Pipeline Failure Analysis

When a Databricks job fails, TigerOps AI examines the Spark event log, driver logs, and cluster metrics to identify whether the cause was OOM, data skew, or a downstream dependency failure.
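A crude way to picture this triage is pattern matching over the driver log. The real analysis is AI-driven and also consults the Spark event log and cluster metrics; the patterns and labels below are purely illustrative:

```python
import re

# Illustrative heuristic only -- stands in for TigerOps's AI failure analysis.
FAILURE_PATTERNS = [
    ("out_of_memory", re.compile(
        r"java\.lang\.OutOfMemoryError|Container killed.*memory", re.I)),
    ("data_skew", re.compile(r"skew", re.I)),
    ("dependency_failure", re.compile(
        r"upstream|dependency.*(failed|unavailable)", re.I)),
]


def classify_failure(driver_log: str) -> str:
    """Return the first failure category whose pattern matches the driver log."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(driver_log):
            return label
    return "unknown"
```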

Configuration

Databricks API Integration Setup

Connect TigerOps to your Databricks workspace using the REST API — no agent deployment required.

tigerops-databricks.yaml
# TigerOps Databricks API Integration
# No agent required — configure via TigerOps dashboard or YAML

integrations:
  databricks:
    workspaces:
      - name: "production"
        host: "https://adb-1234567890.12.azuredatabricks.net"
        token_env: DATABRICKS_TOKEN_PROD
        # Optional: filter by cluster tags
        cluster_tag_filters:
          environment: production
          team: data-engineering

      - name: "staging"
        host: "https://adb-9876543210.12.azuredatabricks.net"
        token_env: DATABRICKS_TOKEN_STAGING

    # Job monitoring
    jobs:
      poll_interval: 60s
      track_task_level_metrics: true
      # Alert on jobs missing their SLO window
      slo_overrides:
        - job_name_pattern: "nightly_etl_*"
          max_duration_minutes: 120
        - job_name_pattern: "hourly_aggregation_*"
          max_duration_minutes: 45

    # Delta Lake table health
    delta_lake:
      enabled: true
      scan_schedule: "0 */6 * * *"   # every 6 hours
      catalogs:
        - name: "main"
          schemas: ["bronze", "silver", "gold"]

exporters:
  tigerops:
    endpoint: "https://ingest.atatus.net/api/v1/write"
    bearer_token: "${TIGEROPS_API_KEY}"

FAQ

Common Questions

Does TigerOps require deploying an agent inside Databricks?

No. TigerOps uses the Databricks REST API and optionally the Ganglia metrics endpoint exposed on cluster nodes. For deeper Spark metrics, you can configure the TigerOps init script to install the agent on cluster startup — this takes under 30 seconds.

Can TigerOps monitor multiple Databricks workspaces?

Yes. TigerOps supports multi-workspace monitoring for enterprise accounts. Each workspace is configured with its own API token and optional tag filters. Cross-workspace dashboards let you compare cost and performance across environments.

How does TigerOps track Delta Lake table health?

TigerOps reads the Delta Lake transaction log (_delta_log) via a lightweight Databricks notebook job that runs on a schedule. It reports file counts, small file ratios, last OPTIMIZE timestamp, and row count estimates without impacting production queries.
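Of the reported metrics, the small file ratio is the simplest to define: the fraction of a table's data files below some size cutoff. The 128 MB threshold in this sketch is an illustrative default, not TigerOps's documented cutoff:

```python
def small_file_ratio(file_sizes_bytes: list[int],
                     threshold_bytes: int = 128 * 1024 * 1024) -> float:
    """Fraction of table files below the small-file threshold.

    128 MB is an assumed default for illustration; many small files drag down
    Delta query performance, which is why OPTIMIZE compacts them.
    """
    if not file_sizes_bytes:
        return 0.0
    small = sum(1 for size in file_sizes_bytes if size < threshold_bytes)
    return small / len(file_sizes_bytes)
```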

Can I set alerts when a specific Databricks job fails?

Yes. TigerOps monitors the Databricks Jobs API and fires alerts on job run failure, timeout, or retry exhaustion. Alerts include the failed task name, error message, cluster ID, and a direct link to the job run in the Databricks UI.

Does TigerOps support Unity Catalog and workspace-level lineage?

Yes. TigerOps integrates with Unity Catalog to enrich metrics with catalog and schema context. Table-level metrics are tagged with their Unity Catalog path so you can filter dashboards by catalog, schema, or table name for precise data observability.

Get Started

Stop Discovering Databricks Cost Spikes on Your Monthly Bill

DBU cost anomaly detection, job SLO monitoring, and Delta Lake health tracking. Connect in 5 minutes.