Why Database Costs Are Hard to Optimize
Unlike EC2 compute — which you can terminate and replace — databases hold persistent state. You cannot just switch instance types without a maintenance window, and getting it wrong means customer-facing downtime. This caution leads most teams to over-provision databases and never touch them once they are running. The result: databases running at 10-15% CPU utilization on expensive db.r6g.4xlarge instances.
The good news: RDS and Aurora have gotten better at online modifications. Many right-sizing operations can be done with minimal downtime using blue-green deployments, read replica promotion, or Aurora's online resize capability.
Audit Your Current Database Utilization
# Get CPU utilization for all RDS instances in last 30 days
aws cloudwatch get-metric-statistics --namespace AWS/RDS --metric-name CPUUtilization --dimensions Name=DBInstanceIdentifier,Value=my-postgres-db --start-time 2026-02-10T00:00:00Z --end-time 2026-03-10T00:00:00Z --period 86400 --statistics Average Maximum --output table
# Check FreeableMemory — how much RAM is unused
aws cloudwatch get-metric-statistics --namespace AWS/RDS --metric-name FreeableMemory --dimensions Name=DBInstanceIdentifier,Value=my-postgres-db --start-time 2026-02-10T00:00:00Z --end-time 2026-03-10T00:00:00Z --period 86400 --statistics Average Minimum --output table
# Key metrics to gather:
# CPUUtilization: if avg < 20%, strong right-size candidate
# FreeableMemory: if consistently > 50% of total, too much RAM
# DatabaseConnections: if well under max_connections, smaller instance fine
# ReadIOPS / WriteIOPS: if well under provisioned, over-provisioned storage
# ReplicaLag: sustained lag means the replicas (not the primary) need more capacity; don't downsize them
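The CloudWatch numbers above can be screened programmatically. A minimal sketch (the function name and candidate flag are ours; the thresholds match the heuristics listed). Feed it the `Datapoints` arrays returned by the `get-metric-statistics` calls:

```python
def summarize_utilization(cpu_datapoints, freeable_datapoints, total_ram_bytes):
    """Return avg/max CPU, the tightest freeable-RAM fraction, and a flag."""
    cpu_avg = sum(dp["Average"] for dp in cpu_datapoints) / len(cpu_datapoints)
    cpu_max = max(dp["Maximum"] for dp in cpu_datapoints)
    # Use the *minimum* freeable memory: the tightest point over the month
    freeable_min = min(dp["Minimum"] for dp in freeable_datapoints)
    freeable_frac = freeable_min / total_ram_bytes
    return {
        "cpu_avg": round(cpu_avg, 1),
        "cpu_max": round(cpu_max, 1),
        "freeable_frac": round(freeable_frac, 2),
        # Conservative flag: low average CPU AND lots of RAM never touched
        "right_size_candidate": cpu_avg < 20 and freeable_frac > 0.5,
    }

# Example: a db.r6g.4xlarge (128GB RAM) coasting at ~12% CPU
cpu = [{"Average": 12.0, "Maximum": 31.0}, {"Average": 11.5, "Maximum": 28.0}]
mem = [{"Minimum": 80e9}, {"Minimum": 78e9}]
print(summarize_utilization(cpu, mem, total_ram_bytes=128e9))
```

The minimum (rather than average) freeable memory is the conservative choice: it captures the worst moment of the month, so a database flagged here was never close to using its RAM.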
-- Query from inside PostgreSQL to understand actual load
-- Run on your RDS instance
-- Top slow queries
SELECT
query,
calls,
total_exec_time / calls AS avg_time_ms,
rows / calls AS avg_rows,
stddev_exec_time AS stddev_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Connection pool usage
SELECT
count(*) as total_connections,
count(*) FILTER (WHERE state = 'active') as active,
count(*) FILTER (WHERE state = 'idle') as idle,
count(*) FILTER (WHERE state = 'idle in transaction') as idle_in_tx
FROM pg_stat_activity;
-- Database sizes
SELECT
datname,
pg_size_pretty(pg_database_size(datname)) as size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
-- Largest tables by size (for true bloat estimates, use the pgstattuple extension)
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as total_size,
pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) as table_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;
Right-Sizing Decision Matrix
Decision framework based on CloudWatch metrics:
CPU avg < 10%, Memory freeable > 60%:
Action: Downsize 2 tiers (e.g., r6g.4xlarge → r6g.xlarge)
Method: RDS Blue/Green Deployment (zero downtime)
Savings: ~75%
CPU avg 10-25%, Memory freeable 40-60%:
Action: Downsize 1 tier
Method: Blue/Green or maintenance window resize
Savings: ~50%
CPU avg 25-50%, Memory freeable 20-40%:
Action: Current size appropriate, consider Graviton upgrade
Method: Upgrade to r7g/r6g (ARM) for 20% cost reduction
Savings: ~20%
CPU avg > 70% or Memory freeable < 10%:
Action: Consider scaling up OR query optimization first
Check: Is this a sustained pattern or a daily peak?
Instance family considerations:
m7g (general): 4GB/vCPU — good for balanced OLTP
r7g (memory): 8GB/vCPU — for large working sets
c7g (compute): 2GB/vCPU — for CPU-intensive query workloads
For PostgreSQL: RAM = cache = performance
Working set fits in RAM = CPU-bound queries fast
Working set exceeds RAM = constant disk I/O = VERY slow
Rule: Freeable memory should be 20-30% of total, not 60-80%
If freeable is 70%: you are over-provisioned on memory
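The matrix above reduces to a small lookup. A sketch (thresholds are the article's; the function name and return strings are ours):

```python
def right_size_action(cpu_avg: float, freeable_pct: float) -> str:
    """Map 30-day CPU average (%) and freeable-memory share (%) to an action."""
    if cpu_avg > 70 or freeable_pct < 10:
        return "scale up or optimize queries (check: sustained vs daily peak)"
    if cpu_avg < 10 and freeable_pct > 60:
        return "downsize 2 tiers via Blue/Green (~75% savings)"
    if cpu_avg < 25 and freeable_pct > 40:
        return "downsize 1 tier (~50% savings)"
    if cpu_avg < 50:
        return "size OK; consider Graviton (r7g/m7g) for ~20% savings"
    return "size OK; monitor"

print(right_size_action(8, 70))   # deep over-provisioning
print(right_size_action(35, 30))  # healthy utilization
```

Note the scale-up check runs first: a hot database should never be downsized, no matter what the memory numbers say.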
RDS Blue/Green Deployment for Zero-Downtime Right-Sizing
# Blue/Green is the safest way to resize production databases
# Creates a staging (green) environment, syncs via logical replication,
# lets you test, then switches over with typically under a minute of downtime
# Create Green environment (new, smaller instance)
aws rds create-blue-green-deployment \
  --blue-green-deployment-name prod-resize \
  --source arn:aws:rds:us-east-1:123456789:db:my-postgres-prod \
  --target-db-instance-class db.r7g.xlarge \
  --target-db-parameter-group-name prod-pg16-params \
  --target-engine-version "16.4"
# db.r7g.xlarge: the smaller Graviton instance we are resizing to
# Monitor sync status
aws rds describe-blue-green-deployments --query 'BlueGreenDeployments[0].{Status:Status,SwitchoverDetails:SwitchoverDetails}'
# Run performance tests against Green endpoint
# Green endpoint: my-postgres-prod-green.cluster-xxxxx.us-east-1.rds.amazonaws.com
# When ready, switchover (takes 30-60 seconds)
aws rds switchover-blue-green-deployment --blue-green-deployment-identifier bgd-xxxxxxxxx --switchover-timeout 300 # Max wait in seconds
# After switchover: the old Blue environment still exists for rollback.
# When confident, delete the deployment object; the old Blue instance is
# kept, so delete it separately once you no longer need a rollback path:
aws rds delete-blue-green-deployment --blue-green-deployment-identifier bgd-xxxxxxxxx
Aurora vs RDS: When Aurora Saves Money
Aurora advantages for cost:
1. Storage autoscaling (no pre-provisioning needed)
RDS: You pre-provision storage + IOPS, pay for max capacity
Aurora: Pay for what you use, scales automatically to 128TB
2. Read replicas are cheaper overall
RDS Read Replica: a full second instance plus its own copy of the storage
Aurora Replica: same hourly rate, but it shares the cluster storage layer (no duplicate storage cost) and doubles as the failover target, so you skip a separate Multi-AZ standby
3. Aurora I/O-Optimized (2023+)
For I/O heavy workloads: flat per-GB price, no per-I/O charge
Traditional Aurora: $0.20 per million I/Os
Aurora I/O-Optimized: higher storage price but no I/O charges
Break-even: when I/O charges exceed 25% of total Aurora cost
4. Global Database — cheaper than manual cross-region replication
5. Aurora Serverless v2 — scale to zero for dev/test
When RDS is cheaper than Aurora:
- Small databases (< 50GB) — Aurora's per-cluster overhead outweighs its storage savings
- Postgres-specific features Aurora doesn't support
- Predictable, steady I/O patterns (Aurora I/O charges add up)
- Single-instance dev/test (Aurora needs a second instance for HA; a lone RDS instance is cheaper)
Aurora vs RDS for 500GB PostgreSQL database, 100M I/Os/month:
RDS db.r6g.2xlarge Multi-AZ: $1,154/month
Aurora PostgreSQL r6g.2xlarge (1 writer + 1 replica): $1,048/month
Aurora I/O-Optimized r6g.2xlarge: $1,125/month (if I/O heavy)
Aurora is ~10% cheaper for typical workloads, more for read-heavy.
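The I/O-Optimized break-even rule above is easy to check for your own bill. A hedged sketch (the function is ours; rates are the illustrative us-east-1 figures quoted earlier):

```python
def io_optimized_worth_it(instance_cost, storage_gb, monthly_ios_millions,
                          std_storage_rate=0.10, io_rate_per_million=0.20):
    """Apply the 25%-of-total-bill heuristic for Aurora I/O-Optimized."""
    storage_cost = storage_gb * std_storage_rate
    io_cost = monthly_ios_millions * io_rate_per_million
    total = instance_cost + storage_cost + io_cost
    # I/O-Optimized generally pays off once I/O exceeds ~25% of the Aurora bill
    return io_cost / total > 0.25, round(io_cost, 2), round(total, 2)

# 500GB database, 100M I/Os/month, ~$800/month of instances
print(io_optimized_worth_it(800, 500, 100))
# Same database doing 2 billion I/Os/month
print(io_optimized_worth_it(800, 500, 2000))
```

At 100M I/Os the I/O charge is only $20 on an ~$870 bill, so standard Aurora wins; at 2B I/Os the $400 I/O charge crosses the 25% line and I/O-Optimized becomes the cheaper plan.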
Aurora Serverless v2 for Variable Workloads
# Aurora Serverless v2 pricing model:
# $0.12 per ACU-hour
# 1 ACU = 2GB RAM + proportional CPU
# Min 0.5 ACU = $0.06/hr = $43/month floor
# Max 128 ACU = $15.36/hr (only at peak burst)
# Use case evaluation:
workloads = {
"e-commerce (Black Friday peaks)": {
"normal_acus": 4, # $0.48/hr = $346/month
"peak_acus": 64, # $7.68/hr (only 2 days/year)
"provisioned_needed": "db.r6g.8xlarge", # $2,700/month
"serverless_cost": "$346/month avg + $370 peak burst = ~$450/month",
"provisioned_cost": "$2,700/month",
"verdict": "Serverless WINS — 83% savings"
},
"24/7 saas app": {
"normal_acus": 8,
"steady_utilization": True,
"serverless_cost": "$0.96/hr * 730 = $701/month",
"provisioned_cost": "db.r6g.2xlarge = $540/month",
"verdict": "Provisioned WINS for steady 24/7 load"
},
"dev/staging database": {
"usage_hours": "9am-7pm weekdays = 50 hrs/week",
"normal_acus": 2,
"serverless_cost": "50 hrs * $0.24 = $12/week = $52/month",
"provisioned_cost": "db.t4g.medium = $60/month (always on)",
"verdict": "Serverless slightly cheaper + scales for load tests"
}
}
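Each verdict above comes down to one comparison: average ACUs times the ACU rate versus the provisioned monthly price. A sketch (the $0.12/ACU-hour rate is the one quoted earlier; a 730-hour month is assumed):

```python
ACU_HOUR = 0.12  # us-east-1 rate quoted above; check your region

def serverless_monthly(avg_acus, hours=730):
    """Average-ACU approximation; real bills follow actual per-second scaling."""
    return avg_acus * ACU_HOUR * hours

def cheaper_option(avg_acus, provisioned_monthly, hours=730):
    sv2 = serverless_monthly(avg_acus, hours)
    return ("serverless" if sv2 < provisioned_monthly else "provisioned",
            round(sv2))

print(cheaper_option(8, 540))            # steady 24/7 SaaS load
print(cheaper_option(2, 60, hours=217))  # dev DB, ~50 hrs/week at 2 ACUs
```

The crossover is intuitive: a steady 8-ACU load costs about $701/month on Serverless v2, more than the $540 provisioned instance, while a part-time dev database that only accrues ~217 ACU-hours a week of the month comes in around $52.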
resource "aws_rds_cluster" "serverless" {
cluster_identifier = "app-db-serverless"
engine = "aurora-postgresql"
engine_mode = "provisioned" # Required for Serverless v2
engine_version = "16.4"
serverlessv2_scaling_configuration {
min_capacity = 0.5 # Minimum 0.5 ACU — scales down to this when idle
max_capacity = 64 # Max ACUs during burst
}
backup_retention_period = 7
preferred_backup_window = "03:00-04:00"
preferred_maintenance_window = "sun:05:00-sun:06:00"
  # Auto-pause (dev/test only, NOT production): setting min_capacity = 0 lets the
  # cluster pause entirely when idle; waking adds a cold-start delay of roughly
  # 15 seconds. Requires a recent engine version and provider with scale-to-zero.
enable_http_endpoint = true # Data API for connection-less queries
master_username = "admin"
manage_master_user_password = true # Secrets Manager auto-rotation
}
resource "aws_rds_cluster_instance" "serverless" {
identifier = "app-db-serverless-1"
cluster_identifier = aws_rds_cluster.serverless.id
instance_class = "db.serverless"
engine = aws_rds_cluster.serverless.engine
engine_version = aws_rds_cluster.serverless.engine_version
}
Storage Optimization
# RDS storage is often the second-largest cost after instance
# gp2 → gp3 migration saves 20% with better baseline performance
# List all RDS instances with storage type and size
aws rds describe-db-instances --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,AllocatedStorage,StorageType]' --output table
# Modify storage type from gp2 to gp3 (online, no restart needed)
aws rds modify-db-instance \
  --db-instance-identifier my-postgres-db \
  --storage-type gp3 \
  --iops 3000 \
  --storage-throughput 125 \
  --apply-immediately
# gp3 includes a 3000 IOPS / 125 MB/s baseline at no extra charge
# For Aurora: storage is managed automatically by the Aurora cluster volume;
# you cannot choose a storage type — it is always Aurora's distributed storage
# Aurora charges: $0.10/GB-month for standard, $0.225/GB for I/O-Optimized
# Check if you have unused allocated storage on RDS
# aws rds describe-db-instances shows AllocatedStorage (what you pay for)
# But actual used storage is in FreeStorageSpace CloudWatch metric
aws cloudwatch get-metric-statistics --namespace AWS/RDS --metric-name FreeStorageSpace --dimensions Name=DBInstanceIdentifier,Value=my-postgres-db --start-time 2026-03-01T00:00:00Z --end-time 2026-03-10T00:00:00Z --period 86400 --statistics Average --output table
# If FreeStorageSpace avg is 80%+ of AllocatedStorage, you over-provisioned.
# Note: RDS cannot shrink allocated storage in place; reclaiming it means
# migrating to a new, smaller instance (e.g., via dump/restore or DMS).
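The check above is simple to script across a fleet. A sketch (function name and threshold are ours; decimal GB is used for simplicity): compare `AllocatedStorage` from `describe-db-instances` with the average `FreeStorageSpace` from CloudWatch.

```python
def storage_waste(allocated_gb, avg_free_bytes, threshold=0.8):
    """Return the free fraction and whether it crosses the waste threshold."""
    free_gb = avg_free_bytes / 1e9  # decimal GB, close enough for a screen
    free_frac = free_gb / allocated_gb
    return round(free_frac, 2), free_frac >= threshold

# 1TB allocated, ~850GB never used
print(storage_waste(1000, 850e9))
```

Flagged instances are candidates for a migration to smaller allocated storage, not an in-place modify, since RDS storage only grows.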
Reserved Instances for Databases
Database Reserved Instance savings (1-year, no upfront):
PostgreSQL db.r7g.2xlarge (8 vCPU, 64GB):
On-demand: $0.960/hr = $700/month
1-yr reserved: $0.576/hr = $420/month (40% savings)
3-yr reserved: $0.384/hr = $280/month (60% savings)
Strategy: Reserve your production databases on 1-year terms
- After 6 months of operation (know your load profile)
- Reserve for the right-sized instance (resize BEFORE reserving)
- Multi-AZ reservations: purchase the RI with the Multi-AZ option so a single
  reservation covers both primary and standby; Aurora RIs apply per instance,
  so reserve the writer and any steady replicas
What NOT to reserve:
- Development databases (use Serverless or small on-demand)
- Staging databases
- Databases you plan to migrate to Aurora Serverless
- Any database where workload is uncertain
A note on resale: unlike EC2, RDS Reserved Instances cannot be sold on the
Reserved Instance Marketplace. If you right-size after reserving, the old
reservation is a sunk cost — so resize first, size conservatively, and prefer
1-year no-upfront terms when the workload might change.
Conclusion
Database cost optimization requires a level of care that compute optimization does not, but the payoff is significant: a typical over-provisioned RDS setup costs 3-4x what it should. Blue/Green Deployments make right-sizing production databases far safer than it used to be, and Aurora Serverless v2 removes the need to predict database load, since you pay only for the capacity you consume.
The formula: audit with CloudWatch, right-size with Blue/Green, move variable workloads to Aurora Serverless v2, buy Reserved Instances for stable production loads, and migrate to gp3 storage across the board. These five steps typically cut database costs by 40-60% with little or no performance impact.
Priya Sharma
Full-Stack Developer and open-source contributor with a passion for performance and developer experience.