All Integrations
CloudCloudWatch Metric Streams + IAM

AWS Step Functions Integration

Monitor workflow execution metrics, state transition latency, and failure tracking. X-Ray integration gives you per-state latency breakdowns with AI regression detection for workflow deployments.

Setup

How It Works

01

Deploy CloudFormation Stack

Launch the TigerOps CloudFormation template to configure Metric Streams for the AWS/States namespace and grant read permissions for state machine execution history.

02

Enable X-Ray Tracing

Enable AWS X-Ray tracing on your state machines. TigerOps ingests the X-Ray traces and correlates them with execution metrics to give per-state-transition latency breakdowns.

03

Configure Execution Logging

Enable Step Functions execution logging to CloudWatch Logs at the ERROR or ALL level. TigerOps subscribes to the log group and indexes execution events for fast incident investigation.

04

Set SLA Alerts per State Machine

Define execution duration SLOs and failure rate thresholds per state machine. TigerOps fires alerts when execution duration breaches your SLA and provides the failed execution ARN for immediate investigation.

Capabilities

What You Get Out of the Box

Execution Success & Failure Rates

ExecutionsSucceeded, ExecutionsFailed, ExecutionsTimedOut, and ExecutionsAborted per state machine. TigerOps alerts on failure rate increases and links each failure to the execution ARN and error cause.

Execution Duration Tracking

p50, p95, and p99 execution duration per state machine. TigerOps alerts when execution duration SLOs are breached and identifies whether the slowdown is in a specific state transition or the overall orchestration.

State-Level Latency

Per-state duration tracked via X-Ray integration. TigerOps identifies the slowest states in your workflows — typically Lambda invoke timeouts or external service waits — and surfaces them in a state-level flame chart.

Failed Execution Analysis

Failed execution events enriched with the error cause, caught exception message, and the state where the failure occurred. TigerOps groups repeated failures by error type for efficient triage.

Concurrent Execution Visibility

ExecutionsThrottled and concurrent execution count tracked against your account limits. TigerOps alerts when concurrency approaches limits and projects when throttling will impact workflow throughput.

AI Workflow Regression Detection

TigerOps AI baselines execution duration distributions per state machine and alerts when a deployment causes statistically significant execution slowdowns or increased failure rates.

Configuration

Step Functions Logging & Tracing Setup

Enable execution logging and X-Ray tracing on your state machines.

step-functions-setup.sh
# Enable execution logging and X-Ray tracing on a state machine
aws stepfunctions update-state-machine \
  --state-machine-arn arn:aws:states:us-east-1:123456789:stateMachine:my-workflow \
  --logging-configuration '{
    "level": "ERROR",
    "includeExecutionData": false,
    "destinations": [{
      "cloudWatchLogsLogGroup": {
        "logGroupArn": "arn:aws:logs:us-east-1:123456789:log-group:/aws/states/my-workflow:*"
      }
    }]
  }' \
  --tracing-configuration '{"enabled": true}'

# Deploy TigerOps Step Functions monitoring stack
aws cloudformation deploy \
  --template-url https://tigerops-cfn.s3.amazonaws.com/stepfunctions-integration.yaml \
  --stack-name tigerops-stepfunctions \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides \
    TigerOpsApiKey=${TIGEROPS_API_KEY} \
    ExecutionDurationP99ThresholdSeconds=30 \
    FailureRateWarningPercent=1 \
    FailureRateCriticalPercent=5

# Create CloudWatch log group for execution logs
aws logs create-log-group --log-group-name /aws/states/my-workflow
aws logs put-retention-policy \
  --log-group-name /aws/states/my-workflow \
  --retention-in-days 14
FAQ

Common Questions

Does TigerOps support both Standard and Express Step Functions workflows?

Yes. For Standard workflows, TigerOps collects CloudWatch metrics and can query execution history via the Step Functions API. For Express workflows, TigerOps relies on CloudWatch Logs (execution logging must be enabled) since Express execution history is not stored in the Step Functions API.

How does TigerOps get per-state latency for Step Functions?

TigerOps uses AWS X-Ray traces from Step Functions. When X-Ray is enabled on a state machine, each state transition generates a subsegment with its duration. TigerOps ingests these traces and aggregates per-state latency statistics.

Can TigerOps alert on a specific state failing repeatedly?

Yes. TigerOps parses Step Functions execution logs from CloudWatch and can alert when a specific state (by name) has a failure rate above your configured threshold within a rolling time window.

How does TigerOps correlate Step Functions failures with downstream service issues?

When a Step Functions execution fails in a Lambda invoke state, TigerOps links the execution failure to the corresponding Lambda function invocation error via X-Ray trace IDs. This gives you the Lambda error message and logs in the same incident view.

Does TigerOps support monitoring Step Functions activity tasks?

Yes. Activity task metrics (ActivitiesScheduled, ActivitiesStarted, ActivitiesSucceeded, ActivitiesFailed, ActivityScheduleTime) are collected via Metric Streams. TigerOps alerts when the activity schedule time grows, indicating workers are not polling fast enough.

Get Started

Full Visibility Into Every Workflow Execution

Execution metrics, state-level latency, and failure analysis. Connect your state machines in minutes.