All Integrations
CloudWatch Metric Streams + IAM

AWS EMR Integration

Monitor cluster step metrics, HDFS utilization, and Spark/MapReduce job performance across your EMR big data clusters. Get AI-powered job duration baselines and spot-interruption correlation to keep your data processing resilient.

Setup

How It Works

01

Create IAM Role for Metric Streams

Provision an IAM role with CloudWatch permissions scoped to the AWS/ElasticMapReduce namespace. TigerOps uses this role to deliver EMR cluster and step metrics.

02

Deploy CloudWatch Metric Streams

Run the TigerOps CloudFormation stack to stream the AWS/ElasticMapReduce namespace. Cluster health, HDFS, and YARN metrics begin flowing to TigerOps within minutes.

03

Publish Spark Metrics to CloudWatch

Configure your EMR cluster to publish Spark metrics to CloudWatch using the Spark metrics sink. TigerOps ingests executor, driver, and job-level Spark metrics automatically.
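As a sketch, this can be done with an EMR configuration classification at cluster launch, e.g. `aws emr create-cluster --configurations file://spark-metrics.json` (the filename is illustrative; the `CloudWatchSink` class must be available on the cluster, and the `JvmSource` line is an optional standard Spark setting that adds executor JVM metrics):

```json
[
  {
    "Classification": "spark-metrics",
    "Properties": {
      "*.sink.cloudwatch.class": "org.apache.spark.metrics.sink.CloudWatchSink",
      "*.sink.cloudwatch.namespace": "SparkMetrics",
      "executor.source.jvm.class": "org.apache.spark.metrics.source.JvmSource"
    }
  }
]
```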

04

Set Cluster Saturation Alerts

Configure alerts on HDFS utilization, YARN memory pending, and task failure rates. TigerOps correlates saturation signals with specific running steps or Spark applications.
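One way to picture what these alerts encode — an illustrative sketch, not TigerOps' actual alert engine — is a threshold check over a snapshot of EMR metrics. `HDFSUtilization` and `YARNMemoryAvailablePercentage` are real AWS/ElasticMapReduce metric names; `TaskFailureRate` is an assumed derived rate, and the default thresholds are examples:

```python
def saturation_alerts(metrics, hdfs_util_pct=80.0,
                      yarn_mem_avail_pct=15.0, task_failure_rate=0.05):
    """Return the list of saturation conditions a metric snapshot violates.

    metrics: dict of CloudWatch-style EMR metric values for one interval.
    Thresholds are illustrative defaults, not TigerOps-recommended values.
    """
    alerts = []
    # HDFS nearly full: task failures and data loss become likely.
    if metrics.get("HDFSUtilization", 0.0) >= hdfs_util_pct:
        alerts.append("hdfs-saturation")
    # Little YARN memory left: containers queue instead of scheduling.
    if metrics.get("YARNMemoryAvailablePercentage", 100.0) <= yarn_mem_avail_pct:
        alerts.append("yarn-memory-pressure")
    # Derived failed-task rate (assumed metric, not raw CloudWatch).
    if metrics.get("TaskFailureRate", 0.0) >= task_failure_rate:
        alerts.append("task-failures")
    return alerts
```

In a real deployment the correlation with running steps or Spark applications happens on the TigerOps side; this only shows the shape of the threshold logic.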

Capabilities

What You Get Out of the Box

Cluster Step Metrics

Running, pending, and completed step counts with step duration tracking. Receive alerts when steps fail or when step queue depth indicates a processing backlog.

HDFS Utilization

HDFS bytes read/written, HDFS utilization percentage, missing blocks, and corrupt blocks. Alert before HDFS saturation causes task failures or data loss.

YARN Resource Metrics

YARN memory available, memory allocated, pending containers, and vCores available per cluster. Identify resource contention causing job queuing or slow task scheduling.

Spark Application Performance

Active executors, failed tasks, shuffle read/write bytes, and Spark SQL query execution time. Detect data skew and executor failures in long-running Spark applications.

Node Health & Instance Metrics

Core and task node CPU utilization, memory usage, and disk I/O per instance group. Identify hot nodes causing HDFS imbalance or task execution bottlenecks.
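A "hot node" can be pictured as an instance whose CPU utilization sits well above the fleet mean. A minimal sketch of that comparison (illustrative only, not TigerOps internals; node IDs and the 1.5x threshold are assumptions):

```python
def hot_nodes(cpu_by_node, rel_threshold=1.5):
    """Return node IDs whose CPU utilization exceeds rel_threshold x the fleet mean.

    cpu_by_node: dict mapping instance ID -> CPU utilization percentage.
    """
    if not cpu_by_node:
        return []
    fleet_mean = sum(cpu_by_node.values()) / len(cpu_by_node)
    return sorted(n for n, cpu in cpu_by_node.items()
                  if cpu > rel_threshold * fleet_mean)
```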

AI Job Duration Baseline Detection

TigerOps builds per-application duration baselines and fires alerts when Spark or MapReduce jobs exceed expected runtime, catching data volume anomalies and skewed partitions.
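One way such a baseline can work — an illustrative sketch under simple assumptions, not TigerOps' actual model — is a rolling window of past run durations per application, flagging a run that exceeds mean + k·stddev:

```python
import statistics
from collections import defaultdict, deque

class DurationBaseline:
    """Per-application runtime baseline sketch (window, k, min_runs are assumptions)."""

    def __init__(self, window=20, k=3.0, min_runs=5):
        self.k = k
        self.min_runs = min_runs
        # Rolling window of recent durations per application name.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, app, duration_s):
        """Record a completed run; return True if it breached the baseline."""
        runs = self.history[app]
        breach = False
        if len(runs) >= self.min_runs:
            mean = statistics.fmean(runs)
            stdev = statistics.pstdev(runs)
            breach = duration_s > mean + self.k * stdev
        runs.append(duration_s)
        return breach
```

A run that takes 5x its usual time (e.g. from a skewed partition or an unexpected input-volume spike) breaches the baseline once enough history exists, without any hand-set per-job threshold.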

Configuration

CloudFormation Stack for EMR Metric Streams

Deploy the TigerOps CloudFormation stack and enable Spark metrics for complete EMR visibility.

tigerops-emr-streams.yaml
# TigerOps CloudFormation — EMR Metric Streams
# aws cloudformation deploy \
#   --template-file tigerops-emr-streams.yaml \
#   --stack-name tigerops-emr \
#   --capabilities CAPABILITY_IAM

Parameters:
  TigerOpsApiKey:
    Type: String
    NoEcho: true

Resources:
  TigerOpsEMRStream:
    Type: AWS::CloudWatch::MetricStream
    Properties:
      Name: tigerops-emr-stream
      FirehoseArn: !GetAtt TigerOpsDeliveryStream.Arn
      RoleArn: !GetAtt MetricStreamRole.Arn
      OutputFormat: opentelemetry0.7
      IncludeFilters:
        - Namespace: AWS/ElasticMapReduce
        - Namespace: AWS/EMR-on-EKS

  # IAM role referenced above (see Step 01): lets CloudWatch Metric
  # Streams write to the Firehose delivery stream.
  MetricStreamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: streams.metrics.cloudwatch.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: tigerops-firehose-put
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - firehose:PutRecord
                  - firehose:PutRecordBatch
                Resource: !GetAtt TigerOpsDeliveryStream.Arn

  TigerOpsDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      HttpEndpointDestinationConfiguration:
        EndpointConfiguration:
          Url: https://ingest.atatus.net/api/v1/cloudwatch
          AccessKey: !Ref TigerOpsApiKey
        RequestConfiguration:
          CommonAttributes:
            - AttributeName: service
              AttributeValue: emr
        RetryOptions:
          DurationInSeconds: 60
        # Note: a complete template also needs an S3Configuration here —
        # Firehose requires an S3 backup destination for failed deliveries.

# Enable Spark metrics on EMR cluster launch:
# EMR configuration classification for spark-metrics:
# [
#   {
#     "Classification": "spark-metrics",
#     "Properties": {
#       "*.sink.cloudwatch.class":
#         "org.apache.spark.metrics.sink.CloudWatchSink",
#       "*.sink.cloudwatch.namespace": "SparkMetrics"
#     }
#   }
# ]

FAQ

Common Questions

Which EMR metrics does TigerOps collect?

TigerOps collects all AWS/ElasticMapReduce CloudWatch metrics including IsIdle, CoreNodesRunning, HDFSUtilization, MissingBlocks, CapacityRemainingGB, YARNMemoryAvailablePercentage, ContainerAllocated, and AppsPending, plus Spark-level metrics when the CloudWatch metrics sink is configured.

Does TigerOps support EMR on EKS?

Yes. EMR on EKS virtual clusters emit metrics to CloudWatch under the AWS/EMR-on-EKS namespace. TigerOps includes this namespace in the Metric Stream filter and provides dedicated dashboards for EKS-based Spark job monitoring.

How do I get Spark executor metrics into TigerOps?

Configure the Spark CloudWatch metrics sink in your EMR cluster bootstrap action or EMR configuration classification. Set spark.metrics.conf to publish executor and driver metrics to the custom namespace, and TigerOps picks them up via the Metric Stream.

Can TigerOps monitor long-running EMR clusters vs transient clusters?

Yes. For long-running clusters, TigerOps tracks continuous metrics and applies rolling anomaly baselines. For transient job clusters, TigerOps supports job-scoped dashboards that capture the full cluster lifecycle from launch to termination.

Does TigerOps alert on EMR spot node interruptions?

Yes. TigerOps monitors the CoreNodesRunning and TaskNodesRunning metrics for sudden drops caused by spot reclamation. It correlates node count drops with increases in failed tasks and YARN container requeue events to confirm spot-related job impact.
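As an illustrative sketch of that correlation (not TigerOps internals; `FailedTasks` stands in for a cumulative failed-task counter, which is not a raw AWS/ElasticMapReduce metric), compare two consecutive metric intervals:

```python
def spot_impact(prev, curr, node_drop=1, task_failure_jump=1):
    """Flag an interval where running node counts dropped while failed tasks rose.

    prev, curr: metric snapshots for consecutive intervals, containing
    CoreNodesRunning, TaskNodesRunning, and a cumulative FailedTasks counter.
    """
    nodes_lost = (prev["CoreNodesRunning"] + prev["TaskNodesRunning"]) \
               - (curr["CoreNodesRunning"] + curr["TaskNodesRunning"])
    new_failures = curr["FailedTasks"] - prev["FailedTasks"]
    # Both signals together suggest spot reclamation hit running work.
    return nodes_lost >= node_drop and new_failures >= task_failure_jump
```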

Get Started

Stop Losing Hours Debugging Slow EMR Spark Jobs Without Metrics

Cluster health, HDFS utilization, Spark metrics, and AI job baseline detection. Deploy in 5 minutes.