---
title: Seamless scaling with VPA In-place Pod Resize on GKE
published: true
description: Learn how VPA In-place Pod Resize can help seamlessly vertically scale workloads on Google Kubernetes Engine (GKE).
tags: kubernetes, ai, gke, googlecloud
cover_image: https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uqzknnjyuuueceq6xotm.png
# published_at: 2026-05-20 20:24 +0000
---
Right-sizing Kubernetes workloads is a common platform engineering challenge. Set your requests too high, and you burn cloud budget on idle capacity; set your limits too low, and your applications face CPU throttling or the dreaded OOMKill.
For years, the [**Vertical Pod Autoscaler (VPA)**](https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler?utm_campaign=CDR_0x5723eddc_default_b464422378&utm_medium=external&utm_source=blog) has been the standard answer to this problem, automatically adjusting CPU and memory requests based on observed usage. However, this method of scaling came with a significant catch that held back its adoption for critical workloads: applying new resource values required evicting and recreating the Pod.
This disruption was often unacceptable for stateful applications, long-running connections, or latency-sensitive services.
## Introducing In-place Pod Resize (IPPR) on GKE
[**In-place Pod Resize (IPPR)**](https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler?utm_campaign=CDR_0x5723eddc_default_b464422378&utm_medium=external&utm_source=blog#inplaceorrecreate_mode) changes the game by allowing Kubernetes to modify resource requests and limits on live, running containers directly through the underlying container runtime, *without* triggering a restart.
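To see IPPR on its own, without VPA in the loop, you can resize a running container yourself through the Pod's `resize` subresource. The function below is a minimal sketch, assuming kubectl v1.32 or later (the release that added `--subresource resize` support to `kubectl patch`); the name `resize_cpu_request` and the arguments in the example line are hypothetical placeholders.

```shell
# Hypothetical helper: change one container's CPU request in place through
# the Pod "resize" subresource. Arguments: POD_NAME CONTAINER_NAME CPU.
resize_cpu_request() {
  local pod="$1" container="$2" cpu="$3"
  local patch
  # Strategic-merge patch: containers are matched by name, so only the
  # named container's CPU request is changed.
  patch=$(printf '{"spec":{"containers":[{"name":"%s","resources":{"requests":{"cpu":"%s"}}}]}}' \
    "$container" "$cpu")
  kubectl patch pod "$pod" --subresource resize --patch "$patch"
}

# Example (placeholders): resize_cpu_request POD_NAME app 800m
```

If the container also sets a CPU limit, keep the new request at or below it; a request above the limit is rejected by validation.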
By combining the intelligence of VPA with the non-disruptive nature of IPPR, GKE users finally have a viable path to dynamic, seamless, and automated right-sizing.
*Note: As of writing, VPA IPPR is in Preview on GKE. While it is a massive step forward, I recommend evaluating it in staging environments before rolling it out to production workloads.*
## Getting started with IPPR
To use In-place Pod Resize, you need a [GKE cluster](https://docs.cloud.google.com/kubernetes-engine/docs/concepts/choose-cluster-mode?utm_campaign=CDR_0x5723eddc_default_b464422378&utm_medium=external&utm_source=blog) running version **1.34.0-gke.2201000 or later**.
* **GKE Autopilot:** VPA is enabled by default.
* **GKE Standard:** Requires the Vertical Pod Autoscaling feature to be enabled.
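Before going further, you can confirm that an existing cluster meets that version floor. This sketch assumes an authenticated `gcloud` and GNU `sort` (for `-V` version ordering); `version_at_least` and `check_ippr_version` are hypothetical helper names. It only checks the control plane version reported as `currentMasterVersion`, so node pools that lag behind still need a separate look.

```shell
# Minimum GKE version for VPA In-place Pod Resize (from the requirement above)
MIN_GKE_VERSION="1.34.0-gke.2201000"

# Succeeds if version $1 is at least version $2, using GNU `sort -V`
# (version-aware ordering) to pick the smaller of the two.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: compare a cluster's control plane version with the
# minimum. Arguments: CLUSTER_NAME LOCATION. Node pool versions are not checked.
check_ippr_version() {
  local version
  version=$(gcloud container clusters describe "$1" \
    --location="$2" --format="value(currentMasterVersion)") || return 1
  if version_at_least "$version" "$MIN_GKE_VERSION"; then
    echo "OK: $1 control plane runs $version"
  else
    echo "Too old: $1 control plane runs $version (need $MIN_GKE_VERSION or later)"
    return 1
  fi
}

# Example (placeholders): check_ippr_version CLUSTER_NAME us-east1
```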
### 1. Enable the feature
If you aren't using Autopilot, ensure your cluster is created or updated with the necessary feature flags:
```shell
# GKE Standard: create a cluster on the Rapid channel with VPA enabled
gcloud container clusters create CLUSTER_NAME \
    --project=PROJECT_ID \
    --location=us-east1 \
    --release-channel=rapid \
    --enable-vertical-pod-autoscaling
```
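The command above creates a new cluster. For an existing Standard cluster, the same feature can be switched on with `gcloud container clusters update`. The wrapper below is a hypothetical convenience function; the cluster still has to meet the version requirement listed earlier.

```shell
# Hypothetical helper: enable VPA on an existing GKE Standard cluster.
# Arguments: CLUSTER_NAME LOCATION (for example, us-east1).
enable_vpa_on_existing_cluster() {
  gcloud container clusters update "$1" \
    --location="$2" \
    --enable-vertical-pod-autoscaling
}

# Example (placeholders): enable_vpa_on_existing_cluster CLUSTER_NAME us-east1
```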
### 2. Define your VPA object
Create a `VerticalPodAutoscaler` resource targeting your Deployment or StatefulSet. The crucial element here is setting `spec.updatePolicy.updateMode` to `InPlaceOrRecreate`.
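```yaml
apiVersion: "autoscaling.k8s.io/v1"
kind: "VerticalPodAutoscaler"
metadata:
  name: "my-vpa"
spec:
  # The workload whose Pods this VPA manages
  targetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "my-deployment"
  updatePolicy:
    # Resize running Pods in place; evict and recreate only when that is not possible
    updateMode: "InPlaceOrRecreate"
```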
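Once the VPA is applied (for example with `kubectl apply -f` on the manifest above), you can check what it recommends before any resize happens. This sketch reads `status.recommendation.containerRecommendations` from the VPA object; `show_vpa_targets` and the file name `vpa.yaml` are hypothetical, and the output stays empty until the recommender has gathered enough usage data.

```shell
# Hypothetical helper: print a VPA's target recommendation per container as
# "container<TAB>cpu<TAB>memory". Prints nothing until the recommender has data.
show_vpa_targets() {
  kubectl get vpa "$1" -o jsonpath='{range .status.recommendation.containerRecommendations[*]}{.containerName}{"\t"}{.target.cpu}{"\t"}{.target.memory}{"\n"}{end}'
}

# Example (placeholders): kubectl apply -f vpa.yaml && show_vpa_targets my-vpa
```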
### 3. Watch it scale
Apply the resource to your cluster and monitor your application under load. Instead of watching Pods terminate and get recreated, you can watch their resources change in place using `kubectl describe`.
```shell
kubectl describe pod POD_NAME
```
Look for the *AllocatedResources* field or check the Events section. As the VPA updater applies its recommendations, you will see the requests change to match them, while the container's *Restart Count* stays exactly the same.
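If you prefer a scriptable check over reading `describe` output, the helpers below pull the same signals with JSONPath. They are a sketch under two assumptions: `status.containerStatuses[].resources` is populated (the kubelet reports it when in-place resize is available), and `pod_resize_snapshot` and `pod_restart_total` are hypothetical names.

```shell
# Print, per container: name, restart count, and the CPU and memory requests
# the kubelet reports as applied (status.containerStatuses[].resources).
pod_resize_snapshot() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.resources.requests.cpu}{"\t"}{.resources.requests.memory}{"\n"}{end}'
}

# Sum the restart counts of all containers in a Pod (0 if none are reported).
pod_restart_total() {
  kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[*].restartCount}' |
    awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s + 0 }'
}

# Example (placeholders):
#   before=$(pod_restart_total POD_NAME)
#   ... generate load and wait for VPA to act ...
#   [ "$(pod_restart_total POD_NAME)" = "$before" ] && echo "no container restarts"
```

If the later call fails with a "not found" error instead, the Pod was most likely replaced rather than resized in place.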
**The "Or Recreate" Fallback:** Keep in mind that physics still applies. If VPA recommends a resource size that exceeds the remaining allocatable capacity of the Node your Pod is currently running on, an in-place resize is impossible. In this scenario, VPA falls back to evicting and recreating the Pod so it can be scheduled onto a larger or emptier Node.
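One way to make that fallback less likely, assuming you know roughly how much allocatable capacity your nodes offer, is to cap VPA's recommendations with `resourcePolicy.containerPolicies[].maxAllowed`. A cap does not guarantee an in-place resize, since a Node's free capacity also depends on the other Pods running there. The helper name `apply_capped_vpa` and the example values `2` CPU and `4Gi` are hypothetical placeholders.

```shell
# Hypothetical helper: apply a VPA in InPlaceOrRecreate mode whose
# recommendations are capped for every container ("*").
# Arguments: VPA_NAME DEPLOYMENT_NAME MAX_CPU MAX_MEMORY.
apply_capped_vpa() {
  kubectl apply -f - <<EOF
apiVersion: "autoscaling.k8s.io/v1"
kind: "VerticalPodAutoscaler"
metadata:
  name: "$1"
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "$2"
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      maxAllowed:
        cpu: "$3"
        memory: "$4"
EOF
}

# Example (placeholders): apply_capped_vpa my-vpa my-deployment 2 4Gi
```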
## Ready to dive deeper?
While this introduction covers the basics of IPPR, right-sizing is just one part of a robust scaling strategy. Implementing VPA often goes hand-in-hand with horizontal scaling and cluster autoscaling. Check out the guide to master scaling on GKE: [Run full-stack workloads at scale on GKE](https://cloud.google.com/kubernetes-engine/docs/tutorials/full-stack-scale?utm_campaign=CDR_0x5723eddc_default_b464422378&utm_medium=external&utm_source=blog).