Seamless scaling with VPA In-place Pod Resize on GKE

---
title: Seamless scaling with VPA In-place Pod Resize on GKE
published: true
description: Learn how VPA In-place Pod Resize can help seamlessly vertically scale workloads on Google Kubernetes Engine (GKE).
tags: kubernetes, ai, gke, googlecloud
cover_image: https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uqzknnjyuuueceq6xotm.png
# Use a ratio of 100:42 for best results.
# published_at: 2026-05-20 20:24 +0000
---

Right-sizing Kubernetes workloads is a common platform engineering challenge. Set your requests too high, and you burn cloud budget on idle capacity; set your limits too low, and your applications face throttling or dreaded OOMKills.

For years, the [**Vertical Pod Autoscaler (VPA)**](https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler?utm_campaign=CDR_0x5723eddc_default_b464422378&utm_medium=external&utm_source=blog) has been the standard answer to this problem, automatically adjusting CPU and memory requests based on actual usage.

However, this method of scaling came with a significant catch that prevented widespread adoption for critical workloads: applying new resource values required evicting and restarting the Pod. This disruption was often unacceptable for stateful applications, long-running connections, or latency-sensitive services.

## Introducing In-place Pod Resize (IPPR) on GKE

[**In-place Pod Resize (IPPR)**](https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler?utm_campaign=CDR_0x5723eddc_default_b464422378&utm_medium=external&utm_source=blog#inplaceorrecreate_mode) changes the game by allowing Kubernetes to modify resource requests and limits on live, running containers directly through the underlying container runtime, *without* triggering a restart.

By combining the intelligence of VPA with the non-disruptive nature of IPPR, GKE users finally have a viable path to dynamic, seamless, and automated right-sizing.

*Note: As of writing, VPA IPPR is in Preview on GKE. While it is a massive step forward, I recommend evaluating it in staging environments before rolling it out to production workloads.*

## Getting started with IPPR

To use In-place Pod Resize, you need a [GKE cluster](https://docs.cloud.google.com/kubernetes-engine/docs/concepts/choose-cluster-mode?utm_campaign=CDR_0x5723eddc_default_b464422378&utm_medium=external&utm_source=blog) running version **1.34.0-gke.2201000 or later**.

* **GKE Autopilot:** VPA is enabled by default.
* **GKE Standard:** Requires the Vertical Pod Autoscaling feature to be enabled.

### 1. Enable the feature

If you aren't using Autopilot, make sure Vertical Pod Autoscaling is enabled on your cluster. For example, to create a new cluster with the feature turned on:

```shell
gcloud container clusters create CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=us-east1 \
    --release-channel=rapid \
    --enable-vertical-pod-autoscaling
```

### 2. Define your VPA object

Create a `VerticalPodAutoscaler` resource targeting your Deployment or StatefulSet. The crucial element here is setting `spec.updatePolicy.updateMode` to `InPlaceOrRecreate`.

```yaml
apiVersion: "autoscaling.k8s.io/v1"
kind: "VerticalPodAutoscaler"
metadata:
  name: "my-vpa"
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "my-deployment"
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
```

### 3. Watch it scale

Apply the resource to your cluster and monitor your application under load. Instead of watching Pods terminate and get recreated, you can watch the resources change live using `kubectl describe`.

```shell
kubectl describe pod POD_NAME
```

Look for the *AllocatedResources* field or check the events section. You will see the requests change in real time to match the VPA recommendations, while the *Restart Count* remains exactly the same.

**The "Or Recreate" Fallback:** Keep in mind that physics still applies. If VPA recommends a resource size that exceeds the remaining capacity of the Node your Pod is currently running on, an in-place resize is impossible. In this scenario, VPA will fall back to evicting and recreating the Pod so it can be scheduled onto a larger or emptier Node.

## Ready to dive deeper?

While this introduction covers the basics of IPPR, right-sizing is just one part of a robust scaling strategy. Implementing VPA often goes hand in hand with horizontal scaling and cluster autoscaling.

Check out the guide to master scaling on GKE: [Run full-stack workloads at scale on GKE](https://cloud.google.com/kubernetes-engine/docs/tutorials/full-stack-scale?utm_campaign=CDR_0x5723eddc_default_b464422378&utm_medium=external&utm_source=blog).
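
One practical footnote on the version requirement from the getting-started section: before trying `InPlaceOrRecreate`, it can help to confirm that a cluster version string is at or above **1.34.0-gke.2201000**. The snippet below is a minimal sketch of that comparison. The function names `version_at_least` and `check_ippr_version` are hypothetical helpers of my own, not part of `gcloud` or `kubectl`, and the sketch assumes a `sort` that supports version ordering with `-V` (GNU coreutils does).

```shell
#!/usr/bin/env bash
# Minimum GKE version for VPA In-place Pod Resize, as stated in this post.
MIN_VERSION="1.34.0-gke.2201000"

# version_at_least HAVE WANT
# Hypothetical helper (not part of gcloud or kubectl): succeeds when HAVE is
# the same as or newer than WANT. Assumes `sort -V` (version sort) is available.
version_at_least() {
  local have="$1" want="$2"
  local lowest
  # Sort the two versions; if WANT comes out first, HAVE is at least WANT.
  lowest="$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
}

# check_ippr_version VERSION
# Prints a short verdict for a version string such as "1.34.1-gke.1000".
check_ippr_version() {
  if version_at_least "$1" "$MIN_VERSION"; then
    echo "ok: $1 meets the minimum $MIN_VERSION"
  else
    echo "too old: $1 is below the minimum $MIN_VERSION"
    return 1
  fi
}
```

To use it, pass in the version your cluster reports, for example `check_ippr_version "1.34.1-gke.1000"`. I am assuming here that you read the version from `gcloud container clusters describe` or from the Cloud console; check the output format for your own setup before wiring this into a script.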
