AWS Glue Integration
Monitor ETL job run metrics, DPU utilization, and crawler execution across your AWS Glue data integration pipelines. Get AI-powered duration baselines and data quality correlation before pipeline delays impact downstream consumers.
How It Works
Enable Glue Job Metrics
Enable continuous logging and monitoring for your Glue jobs. Set the --enable-metrics job argument to activate CloudWatch metrics publishing to the Glue namespace.
Deploy CloudWatch Metric Streams
Run the TigerOps CloudFormation stack to stream the Glue namespace metrics to TigerOps. ETL job run stats, DPU consumption, and crawler results begin flowing immediately.
Tag Jobs and Crawlers
Apply pipeline and team tags to your Glue jobs and crawlers. TigerOps uses these tags to route alerts and build cost attribution dashboards per data pipeline.
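Tagging can be scripted with boto3's `glue.tag_resource`, which takes the resource ARN and a tag map. A minimal sketch, assuming illustrative tag keys (`pipeline`, `team`), job name, region, and account ID:

```python
def glue_job_arn(region: str, account_id: str, job_name: str) -> str:
    """Build the ARN that glue.tag_resource expects for a Glue job."""
    return f"arn:aws:glue:{region}:{account_id}:job/{job_name}"


def tag_job(job_name: str, tags: dict, region: str, account_id: str) -> None:
    """Apply pipeline/team tags so alert routing can key off them."""
    import boto3  # imported here so the ARN helper above stays dependency-free

    glue = boto3.client("glue", region_name=region)
    glue.tag_resource(
        ResourceArn=glue_job_arn(region, account_id, job_name),
        TagsToAdd=tags,
    )


# Usage (needs AWS credentials, so not executed here):
# tag_job("my-etl-job", {"pipeline": "orders", "team": "data-platform"},
#         region="us-east-1", account_id="123456789012")
```

Crawlers follow the same pattern with a `crawler/{name}` ARN suffix instead of `job/{name}`.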
Configure Job Duration and Failure Alerts
Set SLOs on job run duration and failure rates. TigerOps fires alerts when jobs exceed historical duration baselines or when failure rates spike above configured thresholds.
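TigerOps evaluates these SLOs server-side; the logic of the failure-rate side can be sketched as a windowed check over Glue's `JobRunState` values. The 5% threshold and window contents below are illustrative assumptions, not TigerOps defaults:

```python
# Terminal failure states reported by Glue's GetJobRuns API.
FAILED_STATES = {"FAILED", "ERROR", "TIMEOUT"}


def failure_rate(run_states: list[str]) -> float:
    """Fraction of runs in a window that ended in a failed state."""
    if not run_states:
        return 0.0
    failed = sum(1 for state in run_states if state in FAILED_STATES)
    return failed / len(run_states)


def breaches_slo(run_states: list[str], threshold: float = 0.05) -> bool:
    """True when the windowed failure rate exceeds the SLO threshold."""
    return failure_rate(run_states) > threshold
```

In practice the window would be the last N runs or a time range pulled from run history, whichever matches how the SLO is defined.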
What You Get Out of the Box
ETL Job Run Metrics
Job run duration, bytes read/written, records processed, and shuffle bytes per Glue job. Track ETL throughput trends and detect performance regressions in data pipelines.
DPU Utilization Tracking
Data Processing Unit consumption per job run with historical trending. Identify jobs consuming more DPUs than expected and optimize worker allocation to reduce costs.
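Per-run DPU cost can be estimated from `get_job_runs` output. A sketch, assuming the common $0.44/DPU-hour Glue ETL list price (it varies by region): Glue reports exact `DPUSeconds` for auto-scaling jobs, otherwise it is approximated as `ExecutionTime * MaxCapacity`.

```python
def dpu_seconds(run: dict) -> float:
    """Prefer the exact DPUSeconds field; fall back to an estimate."""
    if run.get("DPUSeconds") is not None:
        return float(run["DPUSeconds"])
    # ExecutionTime is in seconds; MaxCapacity is the allocated DPU count.
    return float(run.get("ExecutionTime", 0)) * float(run.get("MaxCapacity", 0))


def dpu_cost(run: dict, price_per_dpu_hour: float = 0.44) -> float:
    """Approximate run cost in USD; price is an assumption, check your region."""
    return dpu_seconds(run) / 3600.0 * price_per_dpu_hour
```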
Crawler Execution Monitoring
Crawler run duration, tables created/updated/deleted, and partition counts per crawl. Detect schema changes and unexpected table drops from catalog crawler runs.
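The same table-drop check can be spot-run against boto3's `glue.get_crawler_metrics`, which returns `TablesCreated`, `TablesUpdated`, and `TablesDeleted` per crawler. A minimal sketch:

```python
def crawlers_with_drops(metrics_list: list[dict]) -> list[str]:
    """Names of crawlers whose last run deleted catalog tables."""
    return [m["CrawlerName"] for m in metrics_list if m.get("TablesDeleted", 0) > 0]


def fetch_crawler_metrics(crawler_names: list[str]) -> list[dict]:
    """Pull raw metrics from Glue (requires AWS credentials)."""
    import boto3  # imported here so the pure helper above stays dependency-free

    glue = boto3.client("glue")
    resp = glue.get_crawler_metrics(CrawlerNameList=crawler_names)
    return resp["CrawlerMetricsList"]


# Usage (not executed here):
# print(crawlers_with_drops(fetch_crawler_metrics(["orders-crawler"])))
```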
Job Failure Rate Analysis
Track Glue job failure rates by job name, failure reason, and error class. Group systemic failures to distinguish data quality issues from infrastructure problems.
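The grouping step can be illustrated over `get_job_runs` output, which carries `JobRunState` and `ErrorMessage` per run. The keyword-to-class mapping below is a hypothetical example; TigerOps derives classes from its own parsing of `ErrorMessage`:

```python
from collections import Counter

# Hypothetical keyword buckets for sorting ErrorMessage strings.
ERROR_CLASSES = {
    "OutOfMemoryError": "resource",
    "AccessDenied": "permissions",
    "SchemaColumnConvertNotSupported": "data-quality",
}


def classify(error_message: str) -> str:
    """Map a raw error message to a coarse failure class."""
    for keyword, cls in ERROR_CLASSES.items():
        if keyword in error_message:
            return cls
    return "other"


def failure_breakdown(job_runs: list[dict]) -> Counter:
    """Count failed runs (get_job_runs shape) per error class."""
    return Counter(
        classify(run.get("ErrorMessage", ""))
        for run in job_runs
        if run.get("JobRunState") == "FAILED"
    )
```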
Spark Driver & Executor Metrics
Spark driver memory, executor active tasks, shuffle read/write bytes, and GC time for Glue 3.0+ jobs. Deep JVM visibility for Spark-based ETL workloads.
AI Pipeline Duration Baselines
TigerOps builds per-job duration baselines from historical runs and alerts when a job runs significantly longer than expected, catching data skew and hot partition issues early.
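TigerOps' baseline model is not public; one common way to phrase this kind of check is a median-plus-MAD band over prior run durations. A minimal sketch under that assumption, with an illustrative sensitivity factor `k`:

```python
from statistics import median


def is_duration_anomaly(history: list[float], latest: float, k: float = 5.0) -> bool:
    """True when the latest run exceeds median + k * MAD of prior runs."""
    m = median(history)
    mad = median(abs(x - m) for x in history)
    # Guard against a perfectly flat history, where MAD collapses to zero.
    band = max(mad, 0.05 * m)
    return latest > m + k * band
```

Median/MAD is preferred over mean/stddev here because a single past slow run (the very thing being detected) would otherwise inflate the baseline.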
CloudFormation Stack for Glue Metric Streams
Deploy the TigerOps CloudFormation stack to stream Glue ETL job and crawler metrics in minutes.
# TigerOps CloudFormation — AWS Glue Metric Streams
# aws cloudformation deploy \
# --template-file tigerops-glue-streams.yaml \
# --stack-name tigerops-glue \
# --capabilities CAPABILITY_IAM
Parameters:
  TigerOpsApiKey:
    Type: String
    NoEcho: true

Resources:
  TigerOpsGlueStream:
    Type: AWS::CloudWatch::MetricStream
    Properties:
      Name: tigerops-glue-stream
      FirehoseArn: !GetAtt TigerOpsDeliveryStream.Arn
      RoleArn: !GetAtt MetricStreamRole.Arn  # IAM role resource omitted here for brevity
      OutputFormat: opentelemetry0.7
      IncludeFilters:
        - Namespace: Glue      # job metrics published via --enable-metrics
        - Namespace: AWS/Glue  # crawler and service metrics

  TigerOpsDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      HttpEndpointDestinationConfiguration:
        EndpointConfiguration:
          Url: https://ingest.atatus.net/api/v1/cloudwatch
          AccessKey: !Ref TigerOpsApiKey
        RequestConfiguration:
          CommonAttributes:
            - AttributeName: service
              AttributeValue: glue
        RetryOptions:
          DurationInSeconds: 60
        # Firehose also requires an S3Configuration (failed-delivery backup
        # bucket) in this destination block; omitted here for brevity.
# Enable metrics on existing Glue jobs via CLI:
# aws glue update-job --job-name my-etl-job \
#   --job-update DefaultArguments='{
#     "--enable-metrics": "",
#     "--enable-continuous-cloudwatch-log": "true",
#     "--enable-continuous-log-filter": "true"
#   }'

Common Questions
What Glue metrics does TigerOps collect?
TigerOps collects Glue job system metrics (--enable-metrics) including glue.driver.aggregate.bytesRead, glue.driver.aggregate.recordsRead, glue.driver.aggregate.shuffleBytesWritten, glue.driver.jvm.heap.usage, and executor metrics, plus crawler run CloudWatch metrics from the AWS/Glue namespace.
Does TigerOps support Glue Streaming jobs?
Yes. Glue Streaming ETL jobs emit continuous metrics including micro-batch processing time, records per batch, and backlog bytes. TigerOps provides dedicated streaming job dashboards separate from batch ETL job dashboards.
Can TigerOps monitor Glue Data Quality results?
Yes. Glue Data Quality rule evaluation results are published to CloudWatch as custom metrics. TigerOps ingests these alongside operational metrics, so data quality pass/fail rates appear next to job performance in the same dashboard.
How does TigerOps handle Glue job bookmarks and reprocessing detection?
TigerOps tracks the bytes read per job run over time. When a bookmark reset causes a job to reprocess a significantly larger dataset, TigerOps detects the anomaly in DPU consumption and run duration and surfaces it as an alert.
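The bytes-read side of that check can be sketched as a jump test against the job's typical run. The 3x factor below is an illustrative assumption; in practice the factor would be tuned per job from the same historical baselines used for durations:

```python
from statistics import median


def looks_like_reprocessing(bytes_history: list[int], latest_bytes: int,
                            factor: float = 3.0) -> bool:
    """Flag a run that reads far more data than the job's typical run."""
    typical = median(bytes_history)
    return typical > 0 and latest_bytes > factor * typical
```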
Can I correlate Glue job performance with downstream data freshness?
Yes. TigerOps lets you link Glue job completion events with downstream metrics from Redshift, Athena, or Glue tables. If a Glue ETL delay causes stale data in downstream systems, TigerOps captures the full dependency chain.
Stop Discovering Glue Pipeline Delays After Downstream Data Goes Stale
ETL job metrics, DPU cost tracking, and AI duration anomaly detection. Deploy in 5 minutes.