All Integrations
CloudAzure Monitor + Service Principal

Azure AKS Integration

Monitor Azure Kubernetes Service clusters with full node, workload, and network telemetry. Get predictive alerts and AI root cause analysis before incidents impact your workloads.

Setup

How It Works

01

Create a Service Principal

Register an Azure Service Principal with the Monitoring Reader role scoped to your AKS resource group. TigerOps uses this identity to pull metrics from Azure Monitor without cluster-level credentials.

02

Enable Azure Monitor Diagnostics

Turn on Diagnostic Settings for your AKS cluster and route kube-apiserver, kube-controller-manager, and node metrics to a Log Analytics workspace or Event Hub.

03

Deploy the TigerOps Collector

Install the TigerOps Helm chart into your AKS cluster. The collector scrapes kubelet, cAdvisor, and kube-state-metrics endpoints and forwards data to TigerOps remote-write.

04

Configure Alert Policies

Set node CPU, memory, and pod restart thresholds. TigerOps AI correlates pod eviction events with node resource pressure and surfaces root cause automatically.

Capabilities

What You Get Out of the Box

Node-Level Resource Metrics

Track CPU, memory, disk I/O, and network throughput per node and node pool. Spot resource imbalances across system and user pools before workloads are evicted.

Workload Health Tracking

Deployment, DaemonSet, StatefulSet, and Job status with pod restart counts, OOMKill events, and container crash loop detection across every namespace.

Network Telemetry

Ingress controller request rates, pod-to-pod latency via eBPF, and Azure CNI plugin metrics including IP allocation exhaustion warnings.

Control Plane Visibility

API server request latency, etcd health, scheduler queue depth, and controller-manager reconciliation lag pulled directly from Azure Monitor Diagnostics.

Node Pool Autoscaler Insights

Cluster Autoscaler decisions, scale-out trigger reasons, and node provisioning duration. Understand why your cluster scaled and whether the scale event resolved the pressure.

AI Root Cause Correlation

When a pod enters CrashLoopBackOff, TigerOps AI links the event to node memory pressure, upstream throttling, or config changes deployed in the same window.

Configuration

ARM Template — Diagnostic Settings

Enable Azure Monitor Diagnostic Settings on your AKS cluster to stream logs and metrics to TigerOps.

aks-diagnostics.json
// ARM template — AKS Diagnostic Settings for TigerOps
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "aksClusterName": { "type": "string" },
    "workspaceId":    { "type": "string" }
  },
  "resources": [
    {
      "type": "Microsoft.ContainerService/managedClusters/providers/diagnosticSettings",
      "apiVersion": "2021-05-01-preview",
      "name": "[concat(parameters('aksClusterName'), '/Microsoft.Insights/tigerops')]",
      "properties": {
        "workspaceId": "[parameters('workspaceId')]",
        "logs": [
          { "category": "kube-apiserver",          "enabled": true },
          { "category": "kube-controller-manager", "enabled": true },
          { "category": "kube-scheduler",          "enabled": true },
          { "category": "kube-audit",              "enabled": true }
        ],
        "metrics": [
          { "category": "AllMetrics", "enabled": true, "retentionPolicy": { "days": 30, "enabled": true } }
        ]
      }
    }
  ]
}

# TigerOps collector Helm install
helm repo add tigerops https://charts.atatus.net
helm install tigerops-collector tigerops/collector \
  --namespace tigerops --create-namespace \
  --set remoteWrite.endpoint=https://ingest.atatus.net/api/v1/write \
  --set remoteWrite.bearerToken="${TIGEROPS_API_KEY}" \
  --set cluster.name="${AKS_CLUSTER_NAME}" \
  --set azure.subscriptionId="${AZURE_SUBSCRIPTION_ID}"
FAQ

Common Questions

Does TigerOps support AKS with Azure Active Directory integration?

Yes. TigerOps authenticates via a Service Principal with Monitoring Reader permissions. It does not need AAD pod identity or Workload Identity for metric collection, though Workload Identity is supported if you prefer it for the collector pod.

Can I monitor multiple AKS clusters in one TigerOps workspace?

Yes. Each cluster is identified by a cluster label injected at collection time. You can filter dashboards by cluster, region, or environment tag and set independent alert policies per cluster.

How does TigerOps handle AKS node pool upgrades?

The TigerOps collector uses a DaemonSet that reschedules automatically as nodes drain and reprovision during an upgrade. Metric continuity is maintained and upgrade-related pod disruptions are annotated on your dashboards.

What is the overhead of the TigerOps collector on AKS?

The collector DaemonSet requests 50m CPU and 64Mi memory per node by default. It scrapes metrics every 15 seconds and batches remote-write payloads to minimise egress cost from Azure.

Does TigerOps support Windows node pools in AKS?

Yes. The TigerOps Windows agent collects CPU, memory, and disk metrics from Windows node pools. Container-level metrics are available for Windows containers running on those nodes via the Windows Container runtime metrics endpoint.

Get Started

Full Visibility Into Your AKS Clusters

Node telemetry, workload health, and AI-powered root cause analysis. Deploy in 5 minutes.