CloudWatch Metric Streams + IAM

AWS Glue Integration

Monitor ETL job run metrics, DPU utilization, and crawler execution across your AWS Glue data integration pipelines. Get AI-powered duration baselines and data quality correlation so you catch pipeline delays before they impact downstream consumers.

Setup

How It Works

01

Enable Glue Job Metrics

Enable continuous logging and monitoring for your Glue jobs. Set the --enable-metrics job argument to activate CloudWatch metrics publishing to the Glue namespace.
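To confirm the flag is actually set on an existing job, you can read back its default arguments with the AWS CLI (the job name `my-etl-job` below is a placeholder):

```shell
# Show the value of --enable-metrics on an existing Glue job;
# an empty string or "true" means CloudWatch metrics are enabled.
aws glue get-job \
  --job-name my-etl-job \
  --query 'Job.DefaultArguments."--enable-metrics"'
```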

02

Deploy CloudWatch Metric Streams

Run the TigerOps CloudFormation stack to stream the Glue namespace metrics to TigerOps. ETL job run stats, DPU consumption, and crawler results begin flowing immediately.
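After deploying, a quick sanity check on the stack status confirms the resources were created (assuming the stack name `tigerops-glue` from the deploy command below):

```shell
# Verify the TigerOps stack deployed cleanly
aws cloudformation describe-stacks \
  --stack-name tigerops-glue \
  --query 'Stacks[0].StackStatus'
# A healthy deploy reports CREATE_COMPLETE
```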

03

Tag Jobs and Crawlers

Apply pipeline and team tags to your Glue jobs and crawlers. TigerOps uses these tags to route alerts and build cost attribution dashboards per data pipeline.
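Tags can be applied from the CLI with `aws glue tag-resource`. The ARN, pipeline, and team values here are placeholders for your own:

```shell
# Tag a Glue job so TigerOps can route alerts and attribute cost
aws glue tag-resource \
  --resource-arn arn:aws:glue:us-east-1:123456789012:job/my-etl-job \
  --tags-to-add pipeline=orders,team=data-eng
```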

04

Configure Job Duration and Failure Alerts

Set SLOs on job run duration and failure rates. TigerOps fires alerts when jobs exceed historical duration baselines or when failure rates spike above configured thresholds.
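Before setting an SLO, it helps to look at recent run durations to pick a realistic target (job name is a placeholder):

```shell
# List execution times (seconds) for the last 20 runs
aws glue get-job-runs \
  --job-name my-etl-job \
  --max-results 20 \
  --query 'JobRuns[].ExecutionTime'
```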

Capabilities

What You Get Out of the Box

ETL Job Run Metrics

Job run duration, bytes read/written, records processed, and shuffle bytes per Glue job. Track ETL throughput trends and detect performance regressions in data pipelines.
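You can confirm which jobs are publishing these metrics by listing the Glue namespace in CloudWatch, for example:

```shell
# List jobs emitting the aggregate bytesRead metric
aws cloudwatch list-metrics \
  --namespace Glue \
  --metric-name glue.driver.aggregate.bytesRead
```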

DPU Utilization Tracking

Data Processing Unit consumption per job run with historical trending. Identify jobs consuming more DPUs than expected and optimize worker allocation to reduce costs.
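A rough per-run DPU-hours figure can be derived from the job run API as capacity times duration (job name is a placeholder):

```shell
# DPU-hours per run is approximately MaxCapacity * ExecutionTime / 3600
aws glue get-job-runs \
  --job-name my-etl-job \
  --query 'JobRuns[].{dpu:MaxCapacity,seconds:ExecutionTime}'
```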

Crawler Execution Monitoring

Crawler run duration, tables created/updated/deleted, and partition counts per crawl. Detect schema changes and unexpected table drops from catalog crawler runs.
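The same figures are available on demand from the Glue API, which is handy for spot checks (crawler name is a placeholder):

```shell
# Latest crawler run stats: tables created/updated/deleted
aws glue get-crawler-metrics \
  --crawler-name-list my-catalog-crawler \
  --query 'CrawlerMetricsList[0].{created:TablesCreated,updated:TablesUpdated,deleted:TablesDeleted}'
```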

Job Failure Rate Analysis

Track Glue job failure rates by job name, failure reason, and error class. Group systemic failures to distinguish data quality issues from infrastructure problems.
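To inspect the raw failure reasons feeding this analysis, you can filter job runs by state (job name is a placeholder):

```shell
# Error messages from failed runs, useful for spotting systemic failures
aws glue get-job-runs \
  --job-name my-etl-job \
  --query 'JobRuns[?JobRunState==`FAILED`].ErrorMessage'
```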

Spark Driver & Executor Metrics

Spark driver memory, executor active tasks, shuffle read/write bytes, and GC time for Glue 3.0+ jobs. Deep JVM visibility for Spark-based ETL workloads.

AI Pipeline Duration Baselines

TigerOps builds per-job duration baselines from historical runs and alerts when a job runs significantly longer than expected, catching data skew and hot partition issues early.

Configuration

CloudFormation Stack for Glue Metric Streams

Deploy the TigerOps CloudFormation stack to stream Glue ETL job and crawler metrics in minutes.

tigerops-glue-streams.yaml
# TigerOps CloudFormation — AWS Glue Metric Streams
# aws cloudformation deploy \
#   --template-file tigerops-glue-streams.yaml \
#   --stack-name tigerops-glue \
#   --capabilities CAPABILITY_IAM

Parameters:
  TigerOpsApiKey:
    Type: String
    NoEcho: true

Resources:
  TigerOpsGlueStream:
    Type: AWS::CloudWatch::MetricStream
    Properties:
      Name: tigerops-glue-stream
      FirehoseArn: !GetAtt TigerOpsDeliveryStream.Arn
      RoleArn: !GetAtt MetricStreamRole.Arn
      OutputFormat: opentelemetry0.7
      IncludeFilters:
        - Namespace: Glue
        - Namespace: AWS/Glue

  TigerOpsDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      HttpEndpointDestinationConfiguration:
        EndpointConfiguration:
          Url: https://ingest.atatus.net/api/v1/cloudwatch
          AccessKey: !Ref TigerOpsApiKey
        RequestConfiguration:
          CommonAttributes:
            - AttributeName: service
              AttributeValue: glue
        RetryOptions:
          DurationInSeconds: 60

# Enable metrics on existing Glue jobs via CLI:
# aws glue update-job --job-name my-etl-job \
#   --job-update DefaultArguments='{
#     "--enable-metrics": "",
#     "--enable-continuous-cloudwatch-log": "true",
#     "--enable-continuous-log-filter": "true"
#   }'
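Once the stack is up, you can verify that the metric stream is active:

```shell
# Confirm the metric stream created by the stack is running
aws cloudwatch get-metric-stream \
  --name tigerops-glue-stream \
  --query 'State'
# "running" means metrics are flowing to the Firehose endpoint
```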

FAQ

Common Questions

What Glue metrics does TigerOps collect?

TigerOps collects Glue job system metrics (--enable-metrics) including glue.driver.aggregate.bytesRead, glue.driver.aggregate.recordsRead, glue.driver.aggregate.shuffleBytesWritten, glue.driver.jvm.heap.usage, and executor metrics, plus crawler run CloudWatch metrics from the AWS/Glue namespace.

Does TigerOps support Glue Streaming jobs?

Yes. Glue Streaming ETL jobs emit continuous metrics including micro-batch processing time, records per batch, and backlog bytes. TigerOps provides dedicated streaming job dashboards separate from batch ETL job dashboards.

Can TigerOps monitor Glue Data Quality results?

Yes. Glue Data Quality rule evaluation results are published to CloudWatch as custom metrics. TigerOps ingests these alongside operational metrics, so data quality pass/fail rates appear next to job performance in the same dashboard.

How does TigerOps handle Glue job bookmarks and reprocessing detection?

TigerOps tracks the bytes read per job run over time. When a bookmark reset causes a job to reprocess a significantly larger dataset, TigerOps detects the anomaly in DPU consumption and run duration and surfaces it as an alert.
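The current bookmark state, which determines how much data a reset would reprocess, can be inspected directly (job name is a placeholder):

```shell
# Show the job bookmark state for a Glue job
aws glue get-job-bookmark --job-name my-etl-job
```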

Can I correlate Glue job performance with downstream data freshness?

Yes. TigerOps lets you link Glue job completion events with downstream metrics from Redshift, Athena, or Glue tables. If a Glue ETL delay causes stale data in downstream systems, TigerOps captures the full dependency chain.

Get Started

Stop Discovering Glue Pipeline Delays After Downstream Data Goes Stale

ETL job metrics, DPU cost tracking, and AI duration anomaly detection. Deploy in 5 minutes.