Why Database Costs Are Hard to Optimize
Unlike EC2 compute — which you can terminate and replace — databases hold persistent state. You cannot just switch instance types without a maintenance window, and getting it wrong means customer-facing downtime. This caution leads most teams to over-provision databases and never touch them once they are running. The result: databases running at 10-15% CPU utilization on expensive db.r6g.4xlarge instances.
The good news: RDS and Aurora have gotten better at online modifications. Many right-sizing operations can be done with minimal downtime using blue-green deployments, read replica promotion, or Aurora's online resize capability.
Audit Your Current Database Utilization
# Get CPU utilization for all RDS instances in last 30 days
aws cloudwatch get-metric-statistics --namespace AWS/RDS --metric-name CPUUtilization --dimensions Name=DBInstanceIdentifier,Value=my-postgres-db --start-time 2026-02-10T00:00:00Z --end-time 2026-03-10T00:00:00Z --period 86400 --statistics Average Maximum --output table
# Check FreeableMemory — how much RAM is unused
aws cloudwatch get-metric-statistics --namespace AWS/RDS --metric-name FreeableMemory --dimensions Name=DBInstanceIdentifier,Value=my-postgres-db --start-time 2026-02-10T00:00:00Z --end-time 2026-03-10T00:00:00Z --period 86400 --statistics Average Minimum --output table
# Key metrics to gather:
# CPUUtilization: if avg < 20%, strong right-size candidate
# FreeableMemory: if consistently > 50% of total, too much RAM
# DatabaseConnections: if well under max_connections, smaller instance fine
# ReadIOPS / WriteIOPS: if well under provisioned, over-provisioned storage
# ReplicaLag: sustained lag means the replicas (not the primary) need more capacity; don't downsize them
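The CloudWatch numbers above can be screened programmatically. A minimal sketch (the function name and candidate flag are ours; the thresholds match the heuristics listed). Feed it the `Datapoints` arrays returned by the `get-metric-statistics` calls:

```python
def summarize_utilization(cpu_datapoints, freeable_datapoints, total_ram_bytes):
    """Return avg/max CPU, the tightest freeable-RAM fraction, and a flag."""
    cpu_avg = sum(dp["Average"] for dp in cpu_datapoints) / len(cpu_datapoints)
    cpu_max = max(dp["Maximum"] for dp in cpu_datapoints)
    # Use the *minimum* freeable memory: the tightest point over the month
    freeable_min = min(dp["Minimum"] for dp in freeable_datapoints)
    freeable_frac = freeable_min / total_ram_bytes
    return {
        "cpu_avg": round(cpu_avg, 1),
        "cpu_max": round(cpu_max, 1),
        "freeable_frac": round(freeable_frac, 2),
        # Conservative flag: low average CPU AND lots of RAM never touched
        "right_size_candidate": cpu_avg < 20 and freeable_frac > 0.5,
    }

# Example: a db.r6g.4xlarge (128GB RAM) coasting at ~12% CPU
cpu = [{"Average": 12.0, "Maximum": 31.0}, {"Average": 11.5, "Maximum": 28.0}]
mem = [{"Minimum": 80e9}, {"Minimum": 78e9}]
print(summarize_utilization(cpu, mem, total_ram_bytes=128e9))
```

The minimum (rather than average) freeable memory is the conservative choice: it captures the worst moment of the month, so a database flagged here was never close to using its RAM.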
-- Query from inside PostgreSQL to understand actual load
-- Run on your RDS instance
-- Top slow queries
SELECT
query,
calls,
total_exec_time / calls AS avg_time_ms,
rows / calls AS avg_rows,
stddev_exec_time AS stddev_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Connection pool usage
SELECT
count(*) as total_connections,
count(*) FILTER (WHERE state = 'active') as active,
count(*) FILTER (WHERE state = 'idle') as idle,
count(*) FILTER (WHERE state = 'idle in transaction') as idle_in_tx
FROM pg_stat_activity;
-- Database sizes
SELECT
datname,
pg_size_pretty(pg_database_size(datname)) as size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
-- Largest tables by size (for true bloat estimates, use the pgstattuple extension)
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as total_size,
pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) as table_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;
Right-Sizing Decision Matrix
Decision framework based on CloudWatch metrics:
CPU avg < 10%, Memory freeable > 60%:
Action: Downsize 2 tiers (e.g., r6g.4xlarge → r6g.xlarge)
Method: RDS Blue/Green Deployment (zero downtime)
Savings: ~75%
CPU avg 10-25%, Memory freeable 40-60%:
Action: Downsize 1 tier
Method: Blue/Green or maintenance window resize
Savings: ~50%
CPU avg 25-50%, Memory freeable 20-40%:
Action: Current size appropriate, consider Graviton upgrade
Method: Upgrade to r7g/r6g (ARM) for 20% cost reduction
Savings: ~20%
CPU avg > 70% or Memory freeable < 10%:
Action: Consider scaling up OR query optimization first
Check: Is this a sustained pattern or a daily peak?
Instance family considerations:
m7g (general): 4GB/vCPU — good for balanced OLTP
r7g (memory): 8GB/vCPU — for large working sets
c7g (compute): 2GB/vCPU — for CPU-intensive query workloads
For PostgreSQL: RAM = cache = performance
Working set fits in RAM = CPU-bound queries fast
Working set exceeds RAM = constant disk I/O = VERY slow
Rule: Freeable memory should be 20-30% of total, not 60-80%
If freeable is 70%: you are over-provisioned on memory
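The matrix above reduces to a small lookup. A sketch (thresholds are the article's; the function name and return strings are ours):

```python
def right_size_action(cpu_avg: float, freeable_pct: float) -> str:
    """Map 30-day CPU average (%) and freeable-memory share (%) to an action."""
    if cpu_avg > 70 or freeable_pct < 10:
        return "scale up or optimize queries (check: sustained vs daily peak)"
    if cpu_avg < 10 and freeable_pct > 60:
        return "downsize 2 tiers via Blue/Green (~75% savings)"
    if cpu_avg < 25 and freeable_pct > 40:
        return "downsize 1 tier (~50% savings)"
    if cpu_avg < 50:
        return "size OK; consider Graviton (r7g/m7g) for ~20% savings"
    return "size OK; monitor"

print(right_size_action(8, 70))   # deep over-provisioning
print(right_size_action(35, 30))  # healthy utilization
```

Note the scale-up check runs first: a hot database should never be downsized, no matter what the memory numbers say.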
RDS Blue/Green Deployment for Zero-Downtime Right-Sizing
# Blue/Green is the safest way to resize production databases
# Creates a staging (green) environment, syncs via logical replication,
# lets you test, then switches over with typically under a minute of downtime
# Create Green environment (new, smaller instance)
aws rds create-blue-green-deployment \
  --blue-green-deployment-name prod-resize \
  --source arn:aws:rds:us-east-1:123456789:db:my-postgres-prod \
  --target-db-instance-class db.r7g.xlarge \
  --target-db-parameter-group-name prod-pg16-params \
  --target-engine-version "16.4"
# db.r7g.xlarge: the smaller Graviton instance we are resizing to
# Monitor sync status
aws rds describe-blue-green-deployments --query 'BlueGreenDeployments[0].{Status:Status,SwitchoverDetails:SwitchoverDetails}'
# Run performance tests against Green endpoint
# Green endpoint: my-postgres-prod-green.cluster-xxxxx.us-east-1.rds.amazonaws.com
# When ready, switchover (takes 30-60 seconds)
aws rds switchover-blue-green-deployment --blue-green-deployment-identifier bgd-xxxxxxxxx --switchover-timeout 300 # Max wait in seconds
# After switchover: the old Blue environment still exists for rollback.
# When confident, delete the deployment object; the old Blue instance is
# kept, so delete it separately once you no longer need a rollback path:
aws rds delete-blue-green-deployment --blue-green-deployment-identifier bgd-xxxxxxxxx
Aurora vs RDS: When Aurora Saves Money
Aurora advantages for cost:
1. Storage autoscaling (no pre-provisioning needed)
RDS: You pre-provision storage + IOPS, pay for max capacity
Aurora: Pay for what you use, scales automatically to 128TB
2. Read replicas are cheaper overall
RDS Read Replica: a full second instance plus its own copy of the storage
Aurora Replica: same hourly rate, but it shares the cluster storage layer (no duplicate storage cost) and doubles as the failover target, so you skip a separate Multi-AZ standby
3. Aurora I/O-Optimized (2023+)
For I/O heavy workloads: flat per-GB price, no per-I/O charge
Traditional Aurora: $0.20 per million I/Os
Aurora I/O-Optimized: higher storage price but no I/O charges
Break-even: when I/O charges exceed 25% of total Aurora cost
4. Global Database — cheaper than manual cross-region replication
5. Aurora Serverless v2 — scale to zero for dev/test
When RDS is cheaper than Aurora:
- Small databases (< 50GB) — Aurora's per-cluster overhead outweighs its storage savings
- Postgres-specific features Aurora doesn't support
- Predictable, steady I/O patterns (Aurora I/O charges add up)
- Single-instance dev/test (Aurora needs a second instance for HA; a lone RDS instance is cheaper)
Aurora vs RDS for 500GB PostgreSQL database, 100M I/Os/month:
RDS db.r6g.2xlarge Multi-AZ: $1,154/month
Aurora PostgreSQL r6g.2xlarge (1 writer + 1 replica): $1,048/month
Aurora I/O-Optimized r6g.2xlarge: $1,125/month (if I/O heavy)
Aurora is ~10% cheaper for typical workloads, more for read-heavy.
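The I/O-Optimized break-even rule above is easy to check for your own bill. A hedged sketch (the function is ours; rates are the illustrative us-east-1 figures quoted earlier):

```python
def io_optimized_worth_it(instance_cost, storage_gb, monthly_ios_millions,
                          std_storage_rate=0.10, io_rate_per_million=0.20):
    """Apply the 25%-of-total-bill heuristic for Aurora I/O-Optimized."""
    storage_cost = storage_gb * std_storage_rate
    io_cost = monthly_ios_millions * io_rate_per_million
    total = instance_cost + storage_cost + io_cost
    # I/O-Optimized generally pays off once I/O exceeds ~25% of the Aurora bill
    return io_cost / total > 0.25, round(io_cost, 2), round(total, 2)

# 500GB database, 100M I/Os/month, ~$800/month of instances
print(io_optimized_worth_it(800, 500, 100))
# Same database doing 2 billion I/Os/month
print(io_optimized_worth_it(800, 500, 2000))
```

At 100M I/Os the I/O charge is only $20 on an ~$870 bill, so standard Aurora wins; at 2B I/Os the $400 I/O charge crosses the 25% line and I/O-Optimized becomes the cheaper plan.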
Aurora Serverless v2 for Variable Workloads
# Aurora Serverless v2 pricing model:
# $0.12 per ACU-hour
# 1 ACU = 2GB RAM + proportional CPU
# Min 0.5 ACU = $0.06/hr = $43/month floor
# Max 128 ACU = $15.36/hr (only at peak burst)
# Use case evaluation:
workloads = {
"e-commerce (Black Friday peaks)": {
"normal_acus": 4, # $0.48/hr = $346/month
"peak_acus": 64, # $7.68/hr (only 2 days/year)
"provisioned_needed": "db.r6g.8xlarge", # $2,700/month
"serverless_cost": "$346/month avg + $370 peak burst = ~$450/month",
"provisioned_cost": "$2,700/month",
"verdict": "Serverless WINS — 83% savings"
},
"24/7 saas app": {
"normal_acus": 8,
"steady_utilization": True,
"serverless_cost": "$0.96/hr * 730 = $701/month",
"provisioned_cost": "db.r6g.2xlarge = $540/month",
"verdict": "Provisioned WINS for steady 24/7 load"
},
"dev/staging database": {
"usage_hours": "9am-7pm weekdays = 50 hrs/week",
"normal_acus": 2,
"serverless_cost": "50 hrs * $0.24 = $12/week = $52/month",
"provisioned_cost": "db.t4g.medium = $60/month (always on)",
"verdict": "Serverless slightly cheaper + scales for load tests"
}
}
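Each verdict above comes down to one comparison: average ACUs times the ACU rate versus the provisioned monthly price. A sketch (the $0.12/ACU-hour rate is the one quoted earlier; a 730-hour month is assumed):

```python
ACU_HOUR = 0.12  # us-east-1 rate quoted above; check your region

def serverless_monthly(avg_acus, hours=730):
    """Average-ACU approximation; real bills follow actual per-second scaling."""
    return avg_acus * ACU_HOUR * hours

def cheaper_option(avg_acus, provisioned_monthly, hours=730):
    sv2 = serverless_monthly(avg_acus, hours)
    return ("serverless" if sv2 < provisioned_monthly else "provisioned",
            round(sv2))

print(cheaper_option(8, 540))            # steady 24/7 SaaS load
print(cheaper_option(2, 60, hours=217))  # dev DB, ~50 hrs/week at 2 ACUs
```

The crossover is intuitive: a steady 8-ACU load costs about $701/month on Serverless v2, more than the $540 provisioned instance, while a part-time dev database that only accrues ~217 ACU-hours a week of the month comes in around $52.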
resource "aws_rds_cluster" "serverless" {
cluster_identifier = "app-db-serverless"
engine = "aurora-postgresql"
engine_mode = "provisioned" # Required for Serverless v2
engine_version = "16.4"
serverlessv2_scaling_configuration {
min_capacity = 0.5 # Minimum 0.5 ACU — scales down to this when idle
max_capacity = 64 # Max ACUs during burst
}
backup_retention_period = 7
preferred_backup_window = "03:00-04:00"
preferred_maintenance_window = "sun:05:00-sun:06:00"
  # Auto-pause (dev/test only, NOT production): setting min_capacity = 0 lets the
  # cluster pause entirely when idle; waking adds a cold-start delay of roughly
  # 15 seconds. Requires a recent engine version and provider with scale-to-zero.
enable_http_endpoint = true # Data API for connection-less queries
master_username = "admin"
manage_master_user_password = true # Secrets Manager auto-rotation
}
resource "aws_rds_cluster_instance" "serverless" {
identifier = "app-db-serverless-1"
cluster_identifier = aws_rds_cluster.serverless.id
instance_class = "db.serverless"
engine = aws_rds_cluster.serverless.engine
engine_version = aws_rds_cluster.serverless.engine_version
}
Storage Optimization
# RDS storage is often the second-largest cost after instance
# gp2 → gp3 migration saves 20% with better baseline performance
# List all RDS instances with storage type and size
aws rds describe-db-instances --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,AllocatedStorage,StorageType]' --output table
# Modify storage type from gp2 to gp3 (online, no restart needed)
aws rds modify-db-instance \
  --db-instance-identifier my-postgres-db \
  --storage-type gp3 \
  --iops 3000 \
  --storage-throughput 125 \
  --apply-immediately
# gp3 includes a 3000 IOPS / 125 MB/s baseline at no extra charge
# For Aurora: storage is managed automatically by the Aurora cluster volume;
# you cannot choose a storage type — it is always Aurora's distributed storage
# Aurora charges: $0.10/GB-month for standard, $0.225/GB for I/O-Optimized
# Check if you have unused allocated storage on RDS
# aws rds describe-db-instances shows AllocatedStorage (what you pay for)
# But actual used storage is in FreeStorageSpace CloudWatch metric
aws cloudwatch get-metric-statistics --namespace AWS/RDS --metric-name FreeStorageSpace --dimensions Name=DBInstanceIdentifier,Value=my-postgres-db --start-time 2026-03-01T00:00:00Z --end-time 2026-03-10T00:00:00Z --period 86400 --statistics Average --output table
# If FreeStorageSpace avg is 80%+ of AllocatedStorage, you over-provisioned.
# Note: RDS cannot shrink allocated storage in place; reclaiming it means
# migrating to a new, smaller instance (e.g., via dump/restore or DMS).
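The check above is simple to script across a fleet. A sketch (function name and threshold are ours; decimal GB is used for simplicity): compare `AllocatedStorage` from `describe-db-instances` with the average `FreeStorageSpace` from CloudWatch.

```python
def storage_waste(allocated_gb, avg_free_bytes, threshold=0.8):
    """Return the free fraction and whether it crosses the waste threshold."""
    free_gb = avg_free_bytes / 1e9  # decimal GB, close enough for a screen
    free_frac = free_gb / allocated_gb
    return round(free_frac, 2), free_frac >= threshold

# 1TB allocated, ~850GB never used
print(storage_waste(1000, 850e9))
```

Flagged instances are candidates for a migration to smaller allocated storage, not an in-place modify, since RDS storage only grows.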
Reserved Instances for Databases
Database Reserved Instance savings (1-year, no upfront):
PostgreSQL db.r7g.2xlarge (8 vCPU, 64GB):
On-demand: $0.960/hr = $700/month
1-yr reserved: $0.576/hr = $420/month (40% savings)
3-yr reserved: $0.384/hr = $280/month (60% savings)
Strategy: Reserve your production databases on 1-year terms
- After 6 months of operation (know your load profile)
- Reserve for the right-sized instance (resize BEFORE reserving)
- Multi-AZ reservations: purchase the RI with the Multi-AZ option so a single
  reservation covers both primary and standby; Aurora RIs apply per instance,
  so reserve the writer and any steady replicas
What NOT to reserve:
- Development databases (use Serverless or small on-demand)
- Staging databases
- Databases you plan to migrate to Aurora Serverless
- Any database where workload is uncertain
A note on resale: unlike EC2, RDS Reserved Instances cannot be sold on the
Reserved Instance Marketplace. If you right-size after reserving, the old
reservation is a sunk cost — so resize first, size conservatively, and prefer
1-year no-upfront terms when the workload might change.
Conclusion
Database cost optimization requires a level of care that compute optimization does not, but the payoff is significant: a typical over-provisioned RDS setup costs 3-4x what it should. Blue/Green Deployments make right-sizing production databases far safer than it used to be, and Aurora Serverless v2 removes the need to predict database load, since you pay only for the capacity you consume.
The formula: audit with CloudWatch, right-size with Blue/Green, move variable workloads to Aurora Serverless v2, buy Reserved Instances for stable production loads, and migrate to gp3 storage across the board. These five steps typically cut database costs by 40-60% with little or no performance impact.
Priya Sharma
Full-Stack Developer and open-source contributor with a passion for performance and developer experience.