your GKE bill keeps climbing and traffic is completely normal: here's why that happens
this catches a lot of teams off guard. costs go up, nothing obviously changed on the product side, and yet the invoice is higher than last month. it's a pretty common pattern with kubernetes on google cloud and it usually isn't one big problem, it tends to be several smaller ones running quietly at the same time.
i've spent a good amount of time reading through how GKE billing actually works and where waste tends to hide. this post is the explanation i wish had existed when i first started trying to understand it.
(kubernetes is supposed to make infrastructure efficient. i know. believe me, i know.)
okay so. the core thing nobody tells you upfront
google bills you for what you've reserved, not necessarily what you're running. a node sitting at 3am with literally zero pods doing anything useful? still billing. full price. the meter never stops.
when i actually broke down our bill it was three things:
compute: this was the big one. nodes we weren't actually using
storage: persistent volumes that devs had provisioned at maximum "just in case" sizes and never touched again
network egress: honestly the sneakiest one. data leaving google's network costs money and it hides in the bill really well
👀 the fee that got us
every single cluster costs $0.10 per hour just to keep the lights on. that's roughly $73/month per cluster, whether you're using it or not. google waives this for one cluster per billing account. it's extremely common for teams to have several forgotten dev or test clusters sitting there. multiply $73 by however many you find. the number is usually uncomfortable.
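to make the "multiply $73" step concrete, here's a tiny sketch of the arithmetic. the 730 hours/month figure (8760 hours / 12) and the function name are my own assumptions for illustration; the $0.10/hour fee and the one waived cluster come from the numbers above.

```python
HOURLY_FEE_USD = 0.10   # per-cluster management fee quoted above
HOURS_PER_MONTH = 730   # assumption: average month, 8760 hours / 12


def monthly_cluster_fees(cluster_count: int, waived_clusters: int = 1) -> float:
    """Monthly management fees in USD after the waived cluster(s)."""
    billable = max(cluster_count - waived_clusters, 0)
    return round(billable * HOURLY_FEE_USD * HOURS_PER_MONTH, 2)


# Six clusters on one billing account: five of them are billable.
print(monthly_cluster_fees(6))  # 365.0
```

this only covers the flat management fee. the nodes inside those forgotten clusters are billed on top of it.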
stuff you can fix this week, no drama
i'm putting the easy wins first because honestly these alone cut a noticeable chunk out of our bill before we even touched anything complicated:
N1 → E2 machine types. E2 runs up to 31% cheaper for standard web workloads with comparable performance for most applications. it's a config change, not a rewrite.
turn on the optimize-utilization autoscaler profile. by default the autoscaler kind of... waits around politely before shutting down empty servers. this setting makes it actually aggressive about packing pods and killing unused nodes fast.
go find your zombie clusters right now. if something has had zero external traffic for a week it's a zombie. teams regularly find forgotten test environments that have been billing for months. deleting them stops the cost immediately.
~10% of GKE clusters are sitting completely idle at any moment, per google's own internal data. someone on your team left a test environment running. probably more than one someone.
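the "zero external traffic for a week" rule is easy to turn into a check. this is a minimal sketch: the `ClusterTraffic` record and its field names are hypothetical, and you'd have to fill them from whatever metrics export you actually have. nothing here calls a real Google Cloud API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterTraffic:
    """Hypothetical record: external request counts per day, most recent last."""
    name: str
    daily_external_requests: List[int]


def find_zombies(clusters: List[ClusterTraffic], idle_days: int = 7) -> List[str]:
    """Return names of clusters with zero external traffic for the last idle_days."""
    zombies = []
    for cluster in clusters:
        recent = cluster.daily_external_requests[-idle_days:]
        # Require a full window so a brand-new cluster isn't flagged early.
        if len(recent) == idle_days and sum(recent) == 0:
            zombies.append(cluster.name)
    return zombies


fleet = [
    ClusterTraffic("prod", [900, 870, 910, 950, 990, 400, 380]),
    ClusterTraffic("demo-march", [0, 0, 0, 0, 0, 0, 0]),
    ClusterTraffic("new-test", [0, 0]),
]
print(find_zombies(fleet))  # ['demo-march']
```

treat the output as a list of things to go ask about, not a delete list. a cluster with no external traffic can still be running internal jobs.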
the over-provisioning thing: this one is genuinely psychological
i don't want to be mean about this but the reason so many pods are over-provisioned is fear. engineers (including me, i've done this) ask for way more CPU and memory than the app needs because crashes are embarrassing and nobody wants to be the person who under-provisioned.
completely understandable. also extremely expensive at scale.
the fix, called rightsizing, is just looking at two weeks of actual usage history and setting Pod Requests based on what the app genuinely uses rather than worst-case guessing. wild concept i know.
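here's roughly what "set requests from real usage" can look like. this is a sketch under my own assumptions: it takes a high percentile of observed CPU usage (nearest-rank method) and adds a bit of headroom. the 95th percentile and 15% headroom are illustrative defaults, not numbers from google or from the source, so pick your own.

```python
from typing import List


def recommend_request(samples_millicores: List[int],
                      percentile: int = 95,
                      headroom_percent: int = 15) -> int:
    """Suggest a CPU request (millicores) from observed usage samples."""
    if not samples_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_millicores)
    # Nearest-rank percentile, using integer ceiling division.
    rank = max((percentile * len(ordered) + 99) // 100, 1)
    observed = ordered[rank - 1]
    # Add headroom and round up to a whole millicore.
    return (observed * (100 + headroom_percent) + 99) // 100


usage = [110, 120, 125, 130, 135, 138, 140, 145, 150, 400]
print(recommend_request(usage))                 # 460
print(recommend_request(usage, percentile=50))  # 156
```

note how the single 400m spike drives the p95 answer with only ten samples. with two weeks of data at a sensible sampling interval, one spike matters far less, which is the whole reason to use real history.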
⚠ this one bites people in production
if a dev sets neither a Request nor a Limit on a pod, kubernetes quietly gives it a classification called "Best Effort." sounds fine. it's not fine. when a node runs low on memory, the system starts evicting pods and Best Effort pods go first. if your checkout flow or anything revenue-critical is running as Best Effort... it dies during your traffic spike. engineers then add more servers to "fix the instability." the actual fix was just setting proper requests and limits from the start.
the three tiers are: Guaranteed (Request and Limit both set and equal, for CPU and memory) → Burstable (something is set but it doesn't meet the Guaranteed bar, e.g. Request lower than Limit) → Best Effort (nothing set, living chaotically). most orgs have way more Best Effort pods in production than anyone on the team realizes.
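if the tiers feel abstract, this sketch classifies a pod the way i understand the kubernetes rules to work. the input shape (a list of dicts with `requests` and `limits`) is a simplified stand-in for a real pod spec, and it compares quantities as plain strings, so "500m" and "0.5" would wrongly count as different. that's a shortcut for illustration, not how the real scheduler does it.

```python
from typing import Dict, List

Container = Dict[str, Dict[str, str]]  # {"requests": {...}, "limits": {...}}


def qos_class(containers: List[Container]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort."""
    anything_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            anything_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


same = {"cpu": "250m", "memory": "128Mi"}
print(qos_class([{}]))                                  # BestEffort
print(qos_class([{"requests": {"cpu": "250m"}}]))       # Burstable
print(qos_class([{"requests": same, "limits": same}]))  # Guaranteed
```

running something like this over every workload in a namespace is a quick way to find out how many Best Effort pods you really have.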
the autoscaler conflict situation: genuinely chaotic when it happens
this one took me a while to understand because it seems like it shouldn't be a problem. like why would two autoscalers fight each other? but they do and it's bad.
HPA (Horizontal Pod Autoscaler) adds more copies of your app when load goes up. VPA (Vertical Pod Autoscaler) gives more power to existing copies when load goes up. if you enable both on the same CPU metric, which is easy to accidentally do, they literally work against each other. HPA spins up pods, VPA is simultaneously resizing them, neither gets to finish what it started, and you're paying for the chaos the whole time.
the answer is the Multidimensional Pod Autoscaler: it divides responsibilities properly. HPA owns CPU scaling (number of pods), VPA owns memory scaling (size of pods). they stop arguing. your bill stops spiking randomly.
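a quick way to spot the fight before it starts is to check whether both autoscalers are acting on the same resource. this is a sketch with hypothetical inputs (plain sets of resource names rather than real HPA/VPA objects). the one real detail i'm leaning on is that a VPA in "Off" update mode only recommends and doesn't resize anything.

```python
from typing import Iterable, Set


def conflicting_resources(hpa_metrics: Iterable[str],
                          vpa_resources: Iterable[str],
                          vpa_update_mode: str = "Auto") -> Set[str]:
    """Resources that both HPA and an active VPA are trying to control."""
    if vpa_update_mode == "Off":
        # Recommendation-only VPA never resizes pods, so nothing to fight over.
        return set()
    return set(hpa_metrics) & set(vpa_resources)


# Both autoscalers acting on CPU: this is the conflict described above.
print(conflicting_resources({"cpu"}, {"cpu", "memory"}))  # {'cpu'}

# Split responsibilities: HPA on CPU, VPA on memory only.
print(conflicting_resources({"cpu"}, {"memory"}))         # set()
```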
the bigger architectural stuff: worth it if you're operating at any real scale
these take more planning but the savings are significant:
Spot VMs for background work: google can reclaim these at any time with only very short notice, which sounds bad, but if you're running batch jobs, report generation, data pipelines (stuff that can restart cleanly) they're up to 91% cheaper than normal nodes. just never run customer-facing services on them. taints and tolerations handle the routing.
Pause Pods: honestly clever. you park a lightweight, low-priority placeholder pod on a node. when a real high-priority pod needs to spin up instantly during a traffic spike, it evicts the pause pod and takes the space. your users get zero cold-start delay. the cluster quietly boots a new node in the background to replace the evicted pause pod.
preStop hooks and SIGTERM handling: when kubernetes wants to remove a node to save money it evicts the pods on it, and your app gets a SIGTERM. if your app ignores that and just drops live connections, you'll have 5xx errors, angry users, and an on-call engineer who will turn off cost-saving automation to make the errors stop. a preStop hook (configured on the pod spec, it runs before the SIGTERM is sent) plus a real SIGTERM handler in the app lets it drain connections gracefully before shutting down. this is what makes safe automated scale-down actually possible.
NodeLocal DNSCache: small one but real. stops pods from firing external DNS queries constantly. less network traffic. less egress cost. adds up across a big cluster.
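the graceful-shutdown item above is the one that needs actual app code, so here's a minimal sketch of the app side in python. the names (`handle_sigterm`, `drain`, `install_handler`) are mine, and a real server would plug this into its own request loop. the idea is just: on SIGTERM, stop taking new work, finish what's in flight, hand back the rest.

```python
import signal
import threading
from collections import deque
from typing import Callable, Deque, List, Tuple

shutting_down = threading.Event()


def handle_sigterm(signum, frame) -> None:
    """Flag shutdown instead of exiting, so in-flight work can finish."""
    shutting_down.set()


def install_handler() -> None:
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def drain(pending: Deque, process: Callable) -> Tuple[List, List]:
    """Process queued items until shutdown is flagged.

    Returns (results, leftovers) so the caller can requeue the leftovers.
    """
    results = []
    while pending and not shutting_down.is_set():
        results.append(process(pending.popleft()))
    return results, list(pending)
```

in a real service you'd call `install_handler()` once at startup, then check `shutting_down` in your accept loop. the pod's termination grace period has to be long enough for the drain to finish, otherwise the process gets killed anyway.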
discounts, and why the order you do things matters
two types worth knowing:
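Sustained Use Discounts are automatic: run a VM on an eligible machine type for more than 25% of the month and google gradually discounts the price. nothing to configure. (check eligibility though. as far as i can tell E2 isn't covered, which matters if you just did the N1 → E2 switch.)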
Committed Use Discounts are contracts: you promise google you'll use a certain amount for 1 or 3 years, they give you a significant price cut. the mistake people make is signing CUDs before rightsizing, which means they're committing to paying for more than they actually need. rightsize first. get an honest baseline. then commit.
💡 the order matters more than people think
rightsize, establish real baseline, buy CUDs at that baseline. buying CUDs first is like signing a lease on an apartment before you know how many rooms you actually need. only commit to your minimum, not your peak. peaks are unpredictable, baselines aren't.
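to see why "commit to the minimum, not the peak" holds up, here's a toy cost model. every number in it is hypothetical: the $1.00 per vCPU-hour rate and the 50% discount are round figures picked so the arithmetic is easy to follow, not real google pricing. committed capacity is paid for every hour whether it's used or not, and anything above the commitment is billed on demand.

```python
from typing import List


def blended_cost(hourly_vcpus: List[int], committed_vcpus: int,
                 on_demand_rate: float, cud_discount: float) -> float:
    """Total cost when committed vCPUs are always paid and overflow is on demand."""
    committed_rate = on_demand_rate * (1 - cud_discount)
    total = 0.0
    for used in hourly_vcpus:
        overflow = max(used - committed_vcpus, 0)
        total += committed_vcpus * committed_rate + overflow * on_demand_rate
    return total


usage = [10, 10, 12, 20]  # hypothetical vCPUs in use, one sample per hour

print(blended_cost(usage, 0, 1.0, 0.5))           # 52.0, no commitment
print(blended_cost(usage, min(usage), 1.0, 0.5))  # 32.0, commit at baseline
print(blended_cost(usage, max(usage), 1.0, 0.5))  # 40.0, commit at peak
```

committing at the peak still beats no commitment in this toy example, but it loses to the baseline commitment because you pay for 20 vCPUs during hours you only needed 10. and if rightsizing later drops your real usage, the peak commitment keeps billing for the full term.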
being honest: doing all of this manually is genuinely a lot
everything above is real and it works. but across a big org with hundreds of workloads and multiple teams constantly deploying things, doing all of this by hand is basically a part-time job that nobody officially has. pod requests drift. someone spins up a cluster for a demo and it just... stays. CUD timing is hard to get right when your baseline keeps moving.
there are tools built specifically for this (Costimizer being one of them) that continuously scan for the stuff described above, flag it, and can execute low-risk fixes automatically once you trust the recommendations. the part that actually matters to engineering teams isn't just "the savings number", it's reclaiming the hours that would otherwise go toward cloud cost archaeology every month instead of building actual product.
anyway if you read this whole thing, genuinely go count your clusters right now. open the billing console. i'll be here.
— research sourced from costimizer.ai/blogs/gke-cost-optimization