The Problem: $50K/Month Kubernetes Bill
This is a real case study (details anonymized) from a Series B SaaS company running 200+ microservices on EKS. Their monthly infrastructure bill was $52,000. After a 3-month optimization sprint, they reduced it to $23,000, a 56% reduction, without any application performance degradation.
The waste was typical of fast-growing startups: engineers requested generous resource limits to avoid getting paged, nobody tracked actual utilization, and the cluster was never right-sized after the initial setup. The average CPU utilization was 12% across all nodes.
Step 1: Measure Before You Cut - Getting Actual Utilization
# Install Metrics Server if not present
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Top nodes: current CPU/memory utilization
kubectl top nodes
# Top pods: find the highest consumers
kubectl top pods -A --sort-by=cpu | head -30
kubectl top pods -A --sort-by=memory | head -30
# List each running pod's resource requests (compare against `kubectl top` to spot over-provisioning)
kubectl get pods -A -o json | jq '
  .items[]
  | select(.status.phase == "Running")
  | {
      name: .metadata.name,
      namespace: .metadata.namespace,
      # first container only; extend with .spec.containers[] for multi-container pods
      cpu_request: .spec.containers[0].resources.requests.cpu,
      mem_request: .spec.containers[0].resources.requests.memory
    }
' | head -50
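The jq listing above only shows requests; one way to eyeball efficiency is to join it with `kubectl top` output. A minimal sketch over canned sample data (the pod names and all numbers are made up for illustration):

```shell
# Hypothetical "kubectl top pods" sample: pod, cpu(m), memory(Mi)
cat <<'EOF' > /tmp/top.txt
payment-service-abc 12 260
report-gen-xyz 480 900
EOF
# Hypothetical requests for the same pods: pod, cpu(m), memory(Mi)
cat <<'EOF' > /tmp/requests.txt
payment-service-abc 500 1024
report-gen-xyz 500 1024
EOF
# join on pod name, then compute actual/requested CPU as a percentage
join /tmp/top.txt /tmp/requests.txt | awk '{
  printf "%s cpu_efficiency=%.0f%%\n", $1, 100 * $2 / $4
}'
```

In this sample, payment-service uses about 2% of its requested CPU while report-gen uses 96%; the first is a right-sizing candidate, the second is already efficient.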
# Prometheus query to find over-provisioned deployments
# (Run in Grafana Explore or via promtool)
#
# CPU efficiency (actual/requested):
# sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace, pod)
# /
# sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace, pod)
#
# Memory efficiency:
# sum(container_memory_working_set_bytes) by (namespace, pod)
# /
# sum(kube_pod_container_resource_requests{resource="memory"}) by (namespace, pod)
Step 2: Fix Resource Requests with VPA
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Or via Helm
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  --namespace kube-system \
  --set updater.enabled=true \
  --set recommender.enabled=true \
  --set admissionController.enabled=true
# Deploy VPA in Recommendation-Only mode first (safe: no restarts)
# Apply to your highest-cost deployments first
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: payment-service
  updatePolicy:
    updateMode: "Off"   # Only recommend - no automatic changes yet
---
# After reviewing recommendations for 1 week, switch to Auto
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: payment-service
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: payment-service
        minAllowed:
          cpu: "50m"
          memory: "64Mi"
        maxAllowed:
          cpu: "2"
          memory: "2Gi"
        controlledResources: ["cpu", "memory"]
# Check VPA recommendations after 7+ days
kubectl describe vpa payment-service-vpa -n production
# Look for:
# Recommendation:
# Container Recommendations:
# Container Name: payment-service
# Lower Bound: cpu: 12m, memory: 105M
# Target: cpu: 25m, memory: 262M (this is what VPA will set)
# Upper Bound: cpu: 108m, memory: 768M
#
# Original request was: cpu: 500m, memory: 1Gi
# VPA recommends: cpu: 25m, memory: 262M
# That is a 20x CPU reduction and 4x memory reduction!
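Once you trust a recommendation, the new requests can be applied without waiting for VPA's Auto mode. A sketch using a locally built patch so it can be reviewed before applying; the 25m/262Mi values come from the recommendation above, everything else is illustrative:

```shell
# Strategic-merge patch carrying the VPA target values
cat > /tmp/payment-service-patch.json <<'EOF'
{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "payment-service",
            "resources": {
              "requests": { "cpu": "25m", "memory": "262Mi" }
            }
          }
        ]
      }
    }
  }
}
EOF
# Review, then apply with:
#   kubectl -n production patch deployment payment-service --patch-file /tmp/payment-service-patch.json
jq -r '.spec.template.spec.containers[0].resources.requests | "\(.cpu) \(.memory)"' /tmp/payment-service-patch.json
```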
Step 3: Replace Cluster Autoscaler with Karpenter
# Cluster Autoscaler limitations:
# - Provisions from fixed node groups (slow, 2-5 min)
# - Cannot select cheapest instance type automatically
# - No automatic node consolidation
# - Does not understand pod-level constraints well
# Karpenter advantages:
# - Provisions in ~60 seconds
# - Automatically selects cheapest instance that fits the workload
# - Consolidates underutilized nodes automatically
# - Understands topology spread, affinities, taints
# Install Karpenter (EKS)
export CLUSTER_NAME=my-cluster
export AWS_REGION=us-east-1
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export KARPENTER_VERSION=1.1.0
# Create IAM role for Karpenter controller
eksctl create iamserviceaccount \
  --cluster "${CLUSTER_NAME}" \
  --name karpenter \
  --namespace karpenter \
  --role-name "${CLUSTER_NAME}-karpenter" \
  --attach-policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}" \
  --approve
# Install via Helm
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace karpenter --create-namespace \
  --set serviceAccount.annotations."eks.amazonaws.com/role-arn"="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter" \
  --set settings.clusterName="${CLUSTER_NAME}" \
  --set settings.interruptionQueue="${CLUSTER_NAME}"
# Karpenter NodePool - cost-optimized configuration
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        workload-type: general
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 168h   # Recycle nodes weekly (lives under template.spec in karpenter.sh/v1)
      requirements:
        # Allow all modern generation instances
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["5"]
        # Prefer ARM64 (Graviton) for ~20% cost savings
        - key: "kubernetes.io/arch"
          operator: In
          values: ["arm64", "amd64"]
        # Prefer Spot, fall back to On-Demand
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot", "on-demand"]
        # Use compute/memory/general-purpose instance families
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        # Avoid very small instances (per-node overhead is too high)
        - key: "karpenter.k8s.aws/instance-size"
          operator: NotIn
          values: ["nano", "micro", "small"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # renamed from WhenUnderutilized in v1
    consolidateAfter: 1m   # Very aggressive bin-packing
  limits:
    cpu: "1000"       # Maximum cluster size
    memory: "4000Gi"
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # In karpenter.k8s.aws/v1 the AMI is chosen via amiSelectorTerms;
  # the al2023 alias selects Amazon Linux 2023
  amiSelectorTerms:
    - alias: al2023@latest
  role: "KarpenterNodeRole-CLUSTER_NAME"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "CLUSTER_NAME"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "CLUSTER_NAME"
  # Larger root volume for container image caching
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 3000
        encrypted: true
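To guarantee that critical services never land on Spot, a second NodePool can be dedicated to On-Demand capacity. A sketch reusing the same EC2NodeClass; the pool name, label, and taint key are hypothetical:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-critical   # hypothetical name
spec:
  template:
    metadata:
      labels:
        workload-type: critical
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand"]
      # Taint so only workloads that explicitly tolerate it schedule here
      taints:
        - key: workload-type
          value: critical
          effect: NoSchedule
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m
```

Workloads like the payment-api Deployment in Step 5 would then add a matching toleration alongside their on-demand nodeSelector.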
Step 4: Namespace Resource Quotas
# Without quotas, any deployment can request unlimited resources
# Setting quotas forces teams to be intentional about resource requests
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"        # Total CPU requests allowed in the namespace
    requests.memory: "40Gi"   # Total memory requests
    limits.cpu: "40"          # Total CPU limits
    limits.memory: "80Gi"     # Total memory limits
    pods: "100"               # Max pod count
    count/deployments.apps: "20"
    count/services: "20"
---
# LimitRange: give containers without explicit requests/limits sensible defaults
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
    - type: Container
      default:          # Applied if no limits specified
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:   # Applied if no requests specified
        cpu: "100m"
        memory: "128Mi"
      max:              # A container cannot exceed these
        cpu: "4"
        memory: "8Gi"
      min:              # A container must request at least this
        cpu: "10m"
        memory: "32Mi"
Step 5: Priority Classes for Spot Eviction Handling
# When Spot instances are reclaimed, high-priority pods are rescheduled first;
# low-priority batch jobs can be safely evicted
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical
value: 1000000
globalDefault: false
description: "Critical production workloads - never evict"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high
value: 100000
globalDefault: false
description: "Production services"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low
value: 1000
globalDefault: true
description: "Default for new workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch
value: 100
preemptionPolicy: Never   # This class never preempts other pods
description: "Batch jobs, ML training - okay to evict and reschedule"
---
# Apply to workloads:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  template:
    spec:
      priorityClassName: critical   # Never evict this
      nodeSelector:
        karpenter.sh/capacity-type: on-demand   # Always On-Demand
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-generator
spec:
  template:
    spec:
      priorityClassName: batch   # Can be evicted
      tolerations:
        # Assumes Spot nodes carry this taint (Karpenter does not taint Spot nodes by default)
        - key: "karpenter.sh/capacity-type"
          operator: Equal
          value: "spot"
          effect: NoSchedule
Results After 3-Month Optimization
Before optimization:
Total EKS nodes: 45 × m5.2xlarge (8 vCPU, 32GB)
Average CPU utilization: 12%
Average memory utilization: 22%
Monthly compute cost: $48,000
After optimization:
Karpenter-managed mix:
- 8 × r7g.xlarge (Graviton, On-Demand) for critical workloads
- 12-35 × mixed Spot (c7g.xlarge, m7g.xlarge, r7g.xlarge) for general
- Scales down to 8 nodes at night (from 45 permanent)
Average CPU utilization: 48% (VPA corrected requests)
Average memory utilization: 61%
Monthly compute cost: $21,500
Savings breakdown:
VPA right-sizing (smaller pods = denser packing): -$8,000
Karpenter consolidation (fewer, fuller nodes): -$5,000
Spot instances (mix of spot/on-demand): -$7,000
Graviton (ARM) for 60% of workload: -$4,500
Night-time scale-down (staging workloads): -$2,000
Total monthly compute savings: $26,500 (55% reduction)
Additional: moved the On-Demand baseline to 1-year Savings Plans
Additional saving: $3,500/month
Total: $30,000/month in savings
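The line items can be sanity-checked against the before/after compute numbers: 8,000 + 5,000 + 7,000 + 4,500 + 2,000 = 26,500, which matches the $48,000 → $21,500 delta.

```shell
# Sum the savings breakdown and show what remains of the $48,000 compute bill
awk 'BEGIN {
  saved = 8000 + 5000 + 7000 + 4500 + 2000
  printf "line items: %d, remaining compute: %d\n", saved, 48000 - saved
}'
```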
Common Mistakes to Avoid
Mistake 1: Running stateful workloads on Spot
Spot instances get a 2-minute termination notice.
Databases, Kafka brokers, Elasticsearch nodes = On-Demand only.
Stateless apps, batch jobs, CI/CD workers = ideal for Spot.
Mistake 2: VPA + HPA on the same deployment
VPA changes pod resources (restarts pods).
HPA changes replica count.
Running both on the same CPU metric = conflicts.
Solution: Use VPA for memory only, HPA for CPU scaling.
Or use KEDA for event-driven scaling instead of HPA.
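A sketch of the safe split: VPA owns memory only, HPA owns replica count driven by CPU. The names mirror the payment-service example above; treat the thresholds as illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: payment-service
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: payment-service
        controlledResources: ["memory"]   # VPA adjusts memory only
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # HPA scales replicas on CPU only
```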
Mistake 3: Too-aggressive consolidation causing churn
consolidateAfter: 1s causes constant pod rescheduling.
Set to at least 30s-5m depending on your tolerance.
Monitor Karpenter metrics for disruption rate.
Mistake 4: Not accounting for node overhead
Each node has OS, kubelet, kube-proxy overhead: ~300m CPU, 1GB memory.
A t3.small (2 vCPU, 2GB) has only 1.7 vCPU, 1GB available for pods.
Small instances are inefficient β use xlarge or larger for cost efficiency.
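Rough allocatable math under the assumed ~0.3 vCPU / 1 GiB per-node overhead shows why bigger nodes pack better:

```shell
# usable fraction = (capacity - overhead) / capacity, per resource
awk 'BEGIN {
  printf "t3.small:    %.0f%% CPU, %.0f%% memory usable\n", 100*(2-0.3)/2,  100*(2-1)/2
  printf "m5.2xlarge:  %.0f%% CPU, %.0f%% memory usable\n", 100*(8-0.3)/8,  100*(32-1)/32
}'
```

On the small node half the memory goes to overhead; on the 2xlarge the loss is a few percent.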
Mistake 5: Cutting resource requests too low
Under-provisioned containers = OOMKilled or CPU-throttled.
Set requests at p50 usage, limits at p99 usage.
CPU limits cause hard throttling; either omit them entirely or keep them well above requests.
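The p50/p99 rule can be computed directly from usage samples (e.g. exported from Prometheus). A sketch with made-up millicore samples and a simple nearest-rank percentile:

```shell
# 10 hypothetical CPU samples (millicores) for one container
printf '%s\n' 40 45 42 60 48 52 300 47 44 50 > /tmp/cpu_samples.txt
sort -n /tmp/cpu_samples.txt | awk '
  { v[NR] = $1 }
  END {
    p50 = v[int(NR * 0.50)]                    # nearest-rank, fine for a sketch
    i99 = int(NR * 0.99); if (NR * 0.99 > i99) i99++
    p99 = v[i99]
    printf "request=%dm limit=%dm\n", p50, p99
  }'
```

Note how the one outlier (300m) lands in the limit, not the request, so steady-state bin-packing stays dense.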
Conclusion
Kubernetes cost optimization follows a clear playbook: measure actual usage, right-size with VPA, enable intelligent node scaling with Karpenter, shift appropriate workloads to Spot, and enforce resource governance with quotas. The technical work takes 4-8 weeks. The cultural work, getting teams to care about resource efficiency, is ongoing.
The 56% reduction shown here is achievable at most organizations. The typical starting point is 10-15% CPU utilization; the target is 40-60%. Every percentage point of utilization improvement directly translates to fewer nodes and lower bills.
Marcus Rodriguez
Lead DevOps Engineer specializing in CI/CD pipelines, container orchestration, and infrastructure automation.