AWS Step Functions Integration
Monitor workflow execution metrics, state transition latency, and failure tracking. X-Ray integration gives you per-state latency breakdowns with AI regression detection for workflow deployments.
How It Works
Deploy CloudFormation Stack
Launch the TigerOps CloudFormation template to configure Metric Streams for the AWS/States namespace and grant read permissions for state machine execution history.
Enable X-Ray Tracing
Enable AWS X-Ray tracing on your state machines. TigerOps ingests the X-Ray traces and correlates them with execution metrics to give per-state-transition latency breakdowns.
Configure Execution Logging
Enable Step Functions execution logging to CloudWatch Logs at the ERROR or ALL level. TigerOps subscribes to the log group and indexes execution events for fast incident investigation.
Set SLA Alerts per State Machine
Define execution duration SLOs and failure rate thresholds per state machine. TigerOps fires alerts when execution duration breaches your SLA and provides the failed execution ARN for immediate investigation.
What You Get Out of the Box
Execution Success & Failure Rates
ExecutionsSucceeded, ExecutionsFailed, ExecutionsTimedOut, and ExecutionsAborted per state machine. TigerOps alerts on failure rate increases and links each failure to the execution ARN and error cause.
Execution Duration Tracking
p50, p95, and p99 execution duration per state machine. TigerOps alerts when execution duration SLOs are breached and identifies whether the slowdown is in a specific state transition or the overall orchestration.
State-Level Latency
Per-state duration tracked via X-Ray integration. TigerOps identifies the slowest states in your workflows — typically Lambda invoke timeouts or external service waits — and surfaces them in a state-level flame chart.
Failed Execution Analysis
Failed execution events enriched with the error cause, caught exception message, and the state where the failure occurred. TigerOps groups repeated failures by error type for efficient triage.
Concurrent Execution Visibility
ExecutionsThrottled and concurrent execution count tracked against your account limits. TigerOps alerts when concurrency approaches limits and projects when throttling will impact workflow throughput.
AI Workflow Regression Detection
TigerOps AI baselines execution duration distributions per state machine and alerts when a deployment causes statistically significant execution slowdowns or increased failure rates.
Step Functions Logging & Tracing Setup
Enable execution logging and X-Ray tracing on your state machines.
# Enable execution logging and X-Ray tracing on a state machine
aws stepfunctions update-state-machine \
--state-machine-arn arn:aws:states:us-east-1:123456789:stateMachine:my-workflow \
--logging-configuration '{
"level": "ERROR",
"includeExecutionData": false,
"destinations": [{
"cloudWatchLogsLogGroup": {
"logGroupArn": "arn:aws:logs:us-east-1:123456789:log-group:/aws/states/my-workflow:*"
}
}]
}' \
--tracing-configuration '{"enabled": true}'
# Deploy TigerOps Step Functions monitoring stack
aws cloudformation deploy \
--template-url https://tigerops-cfn.s3.amazonaws.com/stepfunctions-integration.yaml \
--stack-name tigerops-stepfunctions \
--capabilities CAPABILITY_IAM \
--parameter-overrides \
TigerOpsApiKey=${TIGEROPS_API_KEY} \
ExecutionDurationP99ThresholdSeconds=30 \
FailureRateWarningPercent=1 \
FailureRateCriticalPercent=5
# Create CloudWatch log group for execution logs
aws logs create-log-group --log-group-name /aws/states/my-workflow
aws logs put-retention-policy \
--log-group-name /aws/states/my-workflow \
--retention-in-days 14Common Questions
Does TigerOps support both Standard and Express Step Functions workflows?
Yes. For Standard workflows, TigerOps collects CloudWatch metrics and can query execution history via the Step Functions API. For Express workflows, TigerOps relies on CloudWatch Logs (execution logging must be enabled) since Express execution history is not stored in the Step Functions API.
How does TigerOps get per-state latency for Step Functions?
TigerOps uses AWS X-Ray traces from Step Functions. When X-Ray is enabled on a state machine, each state transition generates a subsegment with its duration. TigerOps ingests these traces and aggregates per-state latency statistics.
Can TigerOps alert on a specific state failing repeatedly?
Yes. TigerOps parses Step Functions execution logs from CloudWatch and can alert when a specific state (by name) has a failure rate above your configured threshold within a rolling time window.
How does TigerOps correlate Step Functions failures with downstream service issues?
When a Step Functions execution fails in a Lambda invoke state, TigerOps links the execution failure to the corresponding Lambda function invocation error via X-Ray trace IDs. This gives you the Lambda error message and logs in the same incident view.
Does TigerOps support monitoring Step Functions activity tasks?
Yes. Activity task metrics (ActivitiesScheduled, ActivitiesStarted, ActivitiesSucceeded, ActivitiesFailed, ActivityScheduleTime) are collected via Metric Streams. TigerOps alerts when the activity schedule time grows, indicating workers are not polling fast enough.
Full Visibility Into Every Workflow Execution
Execution metrics, state-level latency, and failure analysis. Connect your state machines in minutes.