Containerd Integration

Monitor low-level container runtime metrics, lifecycle events, and cgroup resource usage for containerd. Get full visibility into image pulls, sandbox health, and shim stability.

Setup

How It Works

01

Install the TigerOps Agent

Deploy the TigerOps agent on each node running containerd. The agent auto-connects to the containerd gRPC socket at /run/containerd/containerd.sock and begins scraping runtime metrics immediately.
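Before deploying, it can help to confirm that the socket is present and accessible to the user the agent will run as. The following is a minimal pre-flight sketch, not part of the TigerOps agent; the function name `check_socket` and its return strings are hypothetical.

```python
import os
import stat

# Socket path taken from the setup step above.
DEFAULT_SOCKET = "/run/containerd/containerd.sock"


def check_socket(path: str = DEFAULT_SOCKET) -> str:
    """Return "ok", "missing", "not-a-socket", or "no-access" for the path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "no-access"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    # The containerd socket is usually root-owned, so the calling user
    # needs read and write permission to talk gRPC over it.
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-access"
    return "ok"
```

Calling `check_socket()` on a node returns "ok" only when the default path exists, is a Unix socket, and is readable and writable by the current user.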

02

Enable CRI Metrics Endpoint

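Configure containerd to expose its metrics endpoint by setting address under the [metrics] table in /etc/containerd/config.toml (for example, 127.0.0.1:1338), then restart containerd. The endpoint is disabled by default. TigerOps scrapes this Prometheus-compatible endpoint every 15 seconds.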
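To verify that the endpoint is live before pointing the agent at it, the Prometheus text output can be fetched and parsed by hand. The sketch below assumes the address 127.0.0.1:1338 used in the agent config on this page and containerd's /v1/metrics path; `parse_samples` and `fetch_samples` are illustrative helpers, and the parser assumes samples carry no trailing timestamp.

```python
import urllib.request

# Assumed address; it must match [metrics] address in config.toml.
METRICS_URL = "http://127.0.0.1:1338/v1/metrics"


def parse_samples(text: str) -> dict:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value follows the last space; label values may contain spaces.
        series, _, value = line.rpartition(" ")
        if not series:
            continue
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # skip lines this sketch does not understand
    return samples


def fetch_samples(url: str = METRICS_URL, timeout: float = 5.0) -> dict:
    """Fetch the endpoint once and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

If `fetch_samples()` raises a connection error, the metrics address is most likely unset or containerd has not been restarted since the config change.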

03

Configure cgroup Scraping

Enable cgroup metrics collection in the TigerOps agent config to capture per-container CPU throttling, memory pressure, and block I/O stats directly from the Linux cgroup hierarchy. The agent auto-detects whether the node uses cgroup v1 or v2.
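As an illustration of what CPU throttling data looks like at the source, the sketch below derives a throttle percentage from the contents of a cgroup v2 cpu.stat file, using the kernel's nr_periods and nr_throttled counters. How the TigerOps agent computes its own figure is not documented here; the function names are hypothetical.

```python
def parse_flat_keyed(text: str) -> dict:
    """Parse "key value" lines, as found in cgroup v2 cpu.stat."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key and value:
            stats[key] = int(value)
    return stats


def throttle_pct(cpu_stat_text: str) -> float:
    """Percentage of CFS enforcement periods in which the cgroup was throttled."""
    stats = parse_flat_keyed(cpu_stat_text)
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit set, or no periods elapsed yet
    return 100.0 * stats.get("nr_throttled", 0) / periods
```

The counters are cumulative, so a monitoring loop would normally compute this over the difference between two successive reads rather than over the lifetime totals.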

04

Set Runtime Alerts

Define alert thresholds for container restart rates, image pull failures, and sandbox creation latency. TigerOps correlates runtime anomalies with Kubernetes pod events for full context.
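The threshold check itself is simple to reason about. The sketch below reuses the threshold names and values from the agent config later on this page; the `breached` helper and the "greater than or equal" comparison are assumptions for illustration, not documented agent behavior.

```python
# Threshold names and values mirror the alerts block of the agent config.
THRESHOLDS = {
    "containerRestartRatePerMin": 5,
    "imagePullLatencySeconds": 30,
    "cgroupMemoryUsagePct": 90,
    "sandboxCreationFailures": 1,
}


def breached(observed: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the sorted names of thresholds that the observed values meet or exceed."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if observed.get(name, 0) >= limit  # missing observations count as zero
    )
```

For example, a node reporting 7 restarts per minute and 90% memory usage would breach two of the four thresholds.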

Capabilities

What You Get Out of the Box

Container Lifecycle Event Tracking

Track container create, start, stop, and delete events in real time. TigerOps records lifecycle event latency and surfaces containers with abnormal churn rates or repeated crash loops.
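One way to picture crash-loop detection is a sliding window over task exit events. The sketch below is a simplified stand-in for whatever TigerOps does internally: it takes (timestamp, topic, container ID) tuples, with the topic name /tasks/exit taken from the agent config on this page, and flags containers that exit too often. The window size and exit limit are arbitrary example values.

```python
from collections import defaultdict, deque


def churning_containers(events, window_s: float = 60.0, max_exits: int = 5) -> set:
    """Flag containers with at least max_exits exits inside any window_s span.

    events: iterable of (timestamp_seconds, topic, container_id), sorted by time.
    """
    recent = defaultdict(deque)  # container_id -> exit timestamps in the window
    flagged = set()
    for ts, topic, cid in events:
        if topic != "/tasks/exit":
            continue
        window = recent[cid]
        window.append(ts)
        # Drop exits that have fallen out of the sliding window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) >= max_exits:
            flagged.add(cid)
    return flagged
```

A container that exits five times in 40 seconds is flagged, while one that exits five times over several minutes is not.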

Image Pull Latency Monitoring

Monitor image pull duration, layer download throughput, and snapshot unpack time. Identify slow registry responses and large image layers that degrade pod startup performance.

cgroup Resource Metrics

Collect per-container CPU quota usage, throttle percentage, memory working set, cache, and swap from the cgroup hierarchy (v1 or v2). Detect containers approaching resource limits before OOMKill events occur.
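For reference, the working set is conventionally computed as current memory usage minus inactive file cache. The sketch below applies that convention to the contents of the cgroup v2 files memory.current, memory.stat, and memory.max; it is an assumption that the TigerOps agent uses the same definition, and the function names are hypothetical.

```python
from typing import Optional


def working_set_bytes(memory_current: int, memory_stat_text: str) -> int:
    """Working set = memory.current minus inactive_file from memory.stat."""
    inactive_file = 0
    for line in memory_stat_text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key == "inactive_file":
            inactive_file = int(value)
    return max(memory_current - inactive_file, 0)


def usage_pct(working_set: int, memory_max_text: str) -> Optional[float]:
    """Working set as a percentage of the limit, or None when no limit is set."""
    limit = memory_max_text.strip()
    if limit == "max":  # cgroup v2 writes the literal "max" for "unlimited"
        return None
    return 100.0 * working_set / int(limit)
```

A container with 900 bytes in use, 100 bytes of inactive file cache, and a 1000-byte limit has a working set of 800 bytes, or 80% of its limit.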

Sandbox & Shim Health

Monitor containerd-shim process counts, sandbox creation success rates, and pause container health. Alert on shim crashes that indicate runtime instability at the node level.
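A rough shim count can be obtained on any Linux node by scanning /proc. The sketch below matches on the kernel's process name (comm), which is truncated to 15 characters and therefore reads "containerd-shim" for shim binaries such as containerd-shim-runc-v2. This is an illustrative approach, not necessarily how the agent collects the figure.

```python
import os


def count_shims(proc_root: str = "/proc") -> int:
    """Count processes whose comm starts with "containerd-shim"."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited between listdir and open
        if name.startswith("containerd-shim"):
            count += 1
    return count
```

A sudden drop in this count without matching container exits can point to shim crashes.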

Snapshot & Content Store Metrics

Track overlay snapshot creation latency, content store utilization, and garbage collection duration. Identify disk pressure caused by orphaned snapshots and un-GCed image layers.

AI Runtime Anomaly Detection

TigerOps AI baselines container startup times, image pull rates, and resource usage per workload. Automatic alerts fire when runtime behavior deviates from established patterns.
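The details of the TigerOps model are not described on this page. As a deliberately simple stand-in for the general idea of baselining, the sketch below flags a value that sits more than a chosen number of standard deviations from the mean of recent history (a z-score test); the threshold of 3.0 is an arbitrary example.

```python
from statistics import mean, stdev


def is_anomalous(history, value: float, z_threshold: float = 3.0) -> bool:
    """Return True when value deviates strongly from the historical baseline."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # a perfectly flat baseline: any change is a deviation
    return abs(value - mu) / sigma > z_threshold
```

With startup times hovering around one second, a five-second startup is flagged while a 1.02-second startup is not.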

Configuration

TigerOps Agent Config for Containerd

Configure the TigerOps agent to scrape containerd metrics and cgroup stats on each node.

tigerops-containerd.yaml
# TigerOps Agent — containerd integration config
# Place at /etc/tigerops/agent.yaml on each node

containerd:
  enabled: true
  socket: /run/containerd/containerd.sock
  # Namespaces to monitor (empty = all namespaces)
  namespaces:
    - k8s.io
    - default

  # Scrape the built-in Prometheus metrics endpoint
  metricsEndpoint:
    address: "127.0.0.1:1338"
    scrapeInterval: 15s

  # Lifecycle event subscription
  events:
    enabled: true
    topics:
      - /tasks/start
      - /tasks/exit
      - /containers/create
      - /containers/delete
      - /images/pull

# cgroup resource collection (v1 or v2 auto-detected)
cgroups:
  enabled: true
  scrapeInterval: 15s
  # Collect per-container breakdown
  perContainer: true

remoteWrite:
  endpoint: https://ingest.atatus.net/api/v1/write
  bearerToken: "${TIGEROPS_API_KEY}"

# Alert thresholds
alerts:
  containerRestartRatePerMin: 5
  imagePullLatencySeconds: 30
  cgroupMemoryUsagePct: 90
  sandboxCreationFailures: 1
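The bearerToken value above is a shell-style placeholder, which suggests it is resolved from the environment at startup. The sketch below shows one way such a placeholder can be expanded and checked before the agent starts, so that a missing TIGEROPS_API_KEY fails loudly; `resolve_token` is a hypothetical helper and the agent's actual substitution rules are an assumption.

```python
import os
from string import Template


def resolve_token(raw: str, env=None) -> str:
    """Expand ${NAME} placeholders in raw, failing if a variable is unset."""
    env = os.environ if env is None else env
    try:
        return Template(raw).substitute(env)
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
```

Running `resolve_token("${TIGEROPS_API_KEY}")` in the agent's environment returns the key, or raises a RuntimeError naming the missing variable.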

FAQ

Common Questions

Does TigerOps support both cgroup v1 and cgroup v2 for containerd monitoring?

Yes. The TigerOps agent detects the cgroup version automatically. For cgroup v1, it reads per-controller files under /sys/fs/cgroup. For cgroup v2 (unified hierarchy), it reads from the single unified tree, which is normally also mounted at /sys/fs/cgroup. Both paths produce equivalent CPU, memory, and block I/O metric sets (the blkio controller on v1, the io controller on v2).
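A common way to tell the two versions apart is to look for the cgroup.controllers file, which exists at the root of a cgroup v2 unified hierarchy only. The sketch below shows that check; whether the agent uses the same test is an assumption, and hybrid setups (v1 with a secondary unified mount) are reported as v1 here.

```python
import os


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 when root is a cgroup v2 unified hierarchy, otherwise 1."""
    # cgroup.controllers is present at the root of a v2 hierarchy only.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return 2
    return 1
```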

How does TigerOps connect to the containerd gRPC socket?

The TigerOps agent mounts the containerd socket (/run/containerd/containerd.sock) via a hostPath volume when running in Kubernetes, or directly on the host. It uses the containerd gRPC API to subscribe to events and query namespace-scoped container state.

Can I monitor containerd on nodes not running Kubernetes?

Yes. TigerOps supports standalone containerd deployments. Install the agent directly on the host, point it at the containerd socket, and it will discover all namespaces and containers without any Kubernetes context required.

How are containerd metrics correlated with Kubernetes pod metrics?

TigerOps joins containerd container IDs with Kubernetes pod metadata via the CRI API. This maps low-level runtime metrics (shim health, cgroup usage) to pod names, namespaces, and deployments so you can trace runtime issues to specific workloads.
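Containers created through CRI carry labels set by the kubelet, including io.kubernetes.pod.name, io.kubernetes.pod.namespace, and io.kubernetes.container.name. The sketch below shows how a join on those labels could work; it assumes the labels and per-container metrics have already been collected into dictionaries keyed by container ID, and it is not a description of the agent's internal implementation.

```python
from typing import Optional

POD_NAME = "io.kubernetes.pod.name"
POD_NAMESPACE = "io.kubernetes.pod.namespace"
CONTAINER_NAME = "io.kubernetes.container.name"


def pod_identity(labels: dict) -> Optional[dict]:
    """Extract pod identity from container labels, or None if not a pod container."""
    if POD_NAME not in labels or POD_NAMESPACE not in labels:
        return None
    return {
        "namespace": labels[POD_NAMESPACE],
        "pod": labels[POD_NAME],
        "container": labels.get(CONTAINER_NAME, ""),
    }


def join_metrics(metrics_by_id: dict, labels_by_id: dict) -> list:
    """Attach pod identity to each container's metrics; skip unlabeled containers."""
    joined = []
    for cid, metrics in metrics_by_id.items():
        identity = pod_identity(labels_by_id.get(cid, {}))
        if identity is not None:
            joined.append({"container_id": cid, **identity, **metrics})
    return joined
```

Containers started outside Kubernetes (for example, in the default namespace) have no such labels and are left out of the joined result.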

What containerd versions are supported?

TigerOps supports containerd 1.5 and later. Versions 1.6+ are fully supported through the built-in Prometheus metrics endpoint, which containerd serves at /v1/metrics on the configured metrics address. For older versions, the agent falls back to direct cgroup collection and event subscription.

Get Started

Get Full Visibility Into Your Container Runtime Layer

Lifecycle events, cgroup metrics, and image pull telemetry — all correlated with your Kubernetes workloads. Deploy in minutes.