AWS EMR Integration
Monitor cluster step metrics, HDFS utilization, and Spark/MapReduce job performance across your EMR big data clusters. Get AI-powered job-duration baselines and spot-interruption correlation for resilient data processing.
How It Works
Create IAM Role for Metric Streams
Provision an IAM role with CloudWatch permissions scoped to the AWS/ElasticMapReduce namespace. TigerOps uses this role to deliver EMR cluster and step metrics.
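A minimal sketch of the two policy documents such a role typically needs. The trust principal is the standard CloudWatch Metric Streams service principal; the account ID, region, and delivery-stream ARN pattern are placeholders, not values TigerOps prescribes.

```python
import json

# Trust policy: CloudWatch Metric Streams assumes this role to write
# metric records into the Kinesis Firehose delivery stream.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "streams.metrics.cloudwatch.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: write access limited to the TigerOps delivery stream.
# Account ID and region below are placeholders -- substitute your own.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
        "Resource": "arn:aws:firehose:us-east-1:123456789012:deliverystream/tigerops-emr-*",
    }],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(permissions_policy, indent=2))
```

Namespace filtering itself happens in the Metric Stream's IncludeFilters rather than in IAM, so the role only needs Firehose write access.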
Deploy CloudWatch Metric Streams
Run the TigerOps CloudFormation stack to stream the AWS/ElasticMapReduce namespace. Cluster health, HDFS, and YARN metrics begin flowing to TigerOps within minutes.
Enable Spark History Server Metrics
Configure your EMR cluster to publish Spark metrics to CloudWatch using the Spark metrics sink. TigerOps ingests executor, driver, and job-level Spark metrics automatically.
Set Cluster Saturation Alerts
Configure alerts on HDFS utilization, YARN memory pending, and task failure rates. TigerOps correlates saturation signals with specific running steps or Spark applications.
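The alert logic can be sketched as simple threshold rules over the CloudWatch metric names. The thresholds and the ContainerPendingRatio rule below are illustrative assumptions, not TigerOps defaults; tune them to your cluster profile.

```python
# Illustrative saturation rules keyed by AWS/ElasticMapReduce metric names.
# Threshold values are hypothetical -- adjust per workload.
SATURATION_RULES = {
    "HDFSUtilization": lambda v: v > 80.0,             # percent of HDFS used
    "YARNMemoryAvailablePercentage": lambda v: v < 15.0,
    "ContainerPendingRatio": lambda v: v > 0.75,       # pending vs allocated
}

def evaluate(sample: dict) -> list:
    """Return the names of the saturation rules a metric sample violates."""
    return [name for name, breached in SATURATION_RULES.items()
            if name in sample and breached(sample[name])]

fired = evaluate({"HDFSUtilization": 91.2, "YARNMemoryAvailablePercentage": 22.0})
print(fired)  # ['HDFSUtilization']
```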
What You Get Out of the Box
Cluster Step Metrics
Running, pending, and completed step counts with step duration tracking. Receive alerts when steps fail or when step queue depth indicates a processing backlog.
HDFS Utilization
HDFS bytes read/written, HDFS utilization percentage, missing blocks, and corrupt blocks. Alert before HDFS saturation causes task failures or data loss.
YARN Resource Metrics
YARN memory available, memory allocated, pending containers, and vCores available per cluster. Identify resource contention causing job queuing or slow task scheduling.
Spark Application Performance
Active executors, failed tasks, shuffle read/write bytes, and Spark SQL query execution time. Detect data skew and executor failures in long-running Spark applications.
Node Health & Instance Metrics
Core and task node CPU utilization, memory usage, and disk I/O per instance group. Identify hot nodes causing HDFS imbalance or task execution bottlenecks.
AI Job Duration Baseline Detection
TigerOps builds per-application duration baselines and fires alerts when Spark or MapReduce jobs exceed expected runtime, catching data volume anomalies and skewed partitions.
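The baseline idea can be illustrated with a robust statistic over recent run durations. This median/MAD sketch is a simplified stand-in for TigerOps's learned baselines, not the actual model; the run durations are hypothetical.

```python
import statistics

def exceeds_baseline(history, new_duration, k=3.0):
    """Flag a run whose duration sits more than k median-absolute-deviations
    above the median of recent runs. A simplified stand-in for a learned
    per-application duration baseline."""
    median = statistics.median(history)
    mad = statistics.median(abs(d - median) for d in history) or 1.0
    return (new_duration - median) / mad > k

# Hypothetical recent durations (seconds) for one Spark application.
runs = [610, 595, 640, 620, 605, 615, 630, 600]
print(exceeds_baseline(runs, 612))   # False: within normal variance
print(exceeds_baseline(runs, 1450))  # True: likely data-volume anomaly or skew
```

Median and MAD resist being dragged upward by the occasional slow run, which is why they suit duration baselines better than a plain mean.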
CloudFormation Stack for EMR Metric Streams
Deploy the TigerOps CloudFormation stack and enable Spark metrics for complete EMR visibility.
# TigerOps CloudFormation — EMR Metric Streams
# aws cloudformation deploy \
# --template-file tigerops-emr-streams.yaml \
# --stack-name tigerops-emr \
# --capabilities CAPABILITY_IAM
Parameters:
  TigerOpsApiKey:
    Type: String
    NoEcho: true

Resources:
  TigerOpsEMRStream:
    Type: AWS::CloudWatch::MetricStream
    Properties:
      Name: tigerops-emr-stream
      FirehoseArn: !GetAtt TigerOpsDeliveryStream.Arn
      RoleArn: !GetAtt MetricStreamRole.Arn
      OutputFormat: opentelemetry0.7
      IncludeFilters:
        - Namespace: AWS/ElasticMapReduce
        - Namespace: AWS/EMR-on-EKS

  TigerOpsDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      HttpEndpointDestinationConfiguration:
        EndpointConfiguration:
          Url: https://ingest.tigerops.net/api/v1/cloudwatch
          AccessKey: !Ref TigerOpsApiKey
        RequestConfiguration:
          CommonAttributes:
            - AttributeName: service
              AttributeValue: emr
        RetryOptions:
          DurationInSeconds: 60
# Enable Spark metrics at EMR cluster launch with the spark-metrics
# configuration classification:
# [
#   {
#     "Classification": "spark-metrics",
#     "Properties": {
#       "*.sink.cloudwatch.class":
#         "org.apache.spark.metrics.sink.CloudWatchSink",
#       "*.sink.cloudwatch.namespace": "SparkMetrics"
#     }
#   }
# ]

Common Questions
Which EMR metrics does TigerOps collect?
TigerOps collects all AWS/ElasticMapReduce CloudWatch metrics including IsIdle, CoreNodesRunning, HDFSUtilization, MissingBlocks, CapacityRemainingGB, YARNMemoryAvailablePercentage, ContainerAllocated, and AppsPending, plus Spark-level metrics when the CloudWatch metrics sink is configured.
Does TigerOps support EMR on EKS?
Yes. EMR on EKS virtual clusters emit metrics to CloudWatch under the AWS/EMR-on-EKS namespace. TigerOps includes this namespace in the Metric Stream filter and provides dedicated dashboards for EKS-based Spark job monitoring.
How do I get Spark executor metrics into TigerOps?
Configure the Spark CloudWatch metrics sink in your EMR cluster bootstrap action or EMR configuration classification. Set spark.metrics.conf to publish executor and driver metrics to a custom namespace such as SparkMetrics, and add that namespace to the Metric Stream's IncludeFilters so TigerOps picks the metrics up via the stream.
Can TigerOps monitor long-running EMR clusters vs transient clusters?
Yes. For long-running clusters, TigerOps tracks continuous metrics and applies rolling anomaly baselines. For transient job clusters, TigerOps supports job-scoped dashboards that capture the full cluster lifecycle from launch to termination.
Does TigerOps alert on EMR spot node interruptions?
Yes. TigerOps monitors the CoreNodesRunning and TaskNodesRunning metrics for sudden drops caused by spot reclamation. It correlates node count drops with increases in failed tasks and YARN container requeue events to confirm spot-related job impact.
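The correlation described above can be sketched as a windowed check over the two metric series. The drop and spike thresholds, sampling intervals, and sample values below are illustrative assumptions, not TigerOps's actual detection logic.

```python
# Toy correlation check: a sudden drop in TaskNodesRunning paired with a
# spike in failed tasks in the same sampling interval suggests spot
# reclamation impact. Thresholds are hypothetical.
def spot_impact_windows(node_counts, failed_tasks, drop=2, spike=5):
    """Return the interval indices where the node count fell by >= `drop`
    while failed tasks rose by >= `spike`."""
    hits = []
    for i in range(1, len(node_counts)):
        node_drop = node_counts[i - 1] - node_counts[i]
        task_spike = failed_tasks[i] - failed_tasks[i - 1]
        if node_drop >= drop and task_spike >= spike:
            hits.append(i)
    return hits

nodes  = [20, 20, 20, 14, 14, 14]  # six task nodes reclaimed at interval 3
failed = [1, 0, 2, 19, 4, 3]       # failed-task count per interval
print(spot_impact_windows(nodes, failed))  # [3]
```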
Stop Losing Hours Debugging Slow EMR Spark Jobs Without Metrics
Cluster health, HDFS utilization, Spark metrics, and AI job baseline detection. Deploy in 5 minutes.