Understanding Gitaly and Kernel Memory Consumption in Kubernetes on Self-Hosted GitLab

    Camila February 24, 2026

During the early hours of the morning, I started receiving Gitaly alerts: memory spikes that weren't being released automatically after the daily backup.

This article is about a **self-hosted GitLab on EKS** and the behavior of some GitLab components in Kubernetes. If you also run GitLab on Kubernetes, it's worth understanding what's really happening, and why **Cgroup v2** is the definitive solution to this kind of problem.

### **GITALY**

Gitaly is the GitLab component responsible for all Git operations: clone, push, pull, merge, diff, and blame. It isolates repository storage from the web application and communicates with other services via gRPC, optimizing performance and concurrency control.

```
┌─────────────────────┐
│  GitLab Webservice  │
│                     │
└──────────┬──────────┘
           │ gRPC
           ↓
┌─────────────────────┐
│       Gitaly        │
│ - Git operations    │
│ - Repository access │
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│  Persistent Volume  │
│  /home/git/repos    │
└─────────────────────┘
```

### **GITLAB TOOLBOX BACKUP**

The toolbox is the GitLab component used to perform backups in Kubernetes environments (specifically, deployments made with the Helm charts). It's a pod/container that contains the tools and scripts needed to run GitLab backup and restore operations.

When does it interact with Gitaly? During repository backups:

> 1. **Connects to Gitaly via gRPC**
> 2. **Requests a backup of each repository**
> 3. **Receives Git bundles from Gitaly**
> 4. **Processes and compresses the data**
> 5. **Sends everything to object storage (S3, GCS, etc.)**

While the **gitlab-toolbox-backup** cronjob was running in the early morning, I observed high memory usage on the Gitaly pod. This consumption is caused by the default behavior of the **Linux kernel**, which uses RAM as a cache for files read from disk (page cache). In Kubernetes environments, this behavior can create resource allocation problems, since the kernel is shared across all pods on the node.
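To get a feel for how much of a node's RAM is page cache rather than process memory, one option is to look at `/proc/meminfo`. The sketch below is a minimal illustration, assuming the standard `MemTotal`, `Active(file)` and `Inactive(file)` fields; the helper names and the sample values are hypothetical and are not taken from the cluster described here.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: bytes}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        value = int(parts[0])
        # Most fields carry a "kB" suffix (the kernel means KiB here).
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def file_cache_bytes(meminfo: dict) -> int:
    """File-backed page cache on the LRU lists: active + inactive."""
    return meminfo.get("Active(file)", 0) + meminfo.get("Inactive(file)", 0)


# Illustrative sample only; on a real node, read the text from /proc/meminfo.
SAMPLE = """\
MemTotal:       33554432 kB
Active(file):   20971520 kB
Inactive(file):  1048576 kB
"""

info = parse_meminfo(SAMPLE)
gib = 1024 ** 3
print(f"MemTotal:   {info['MemTotal'] / gib:.1f} GiB")
print(f"File cache: {file_cache_bytes(info) / gib:.1f} GiB")
```

On a node, passing the contents of `/proc/meminfo` to `parse_meminfo` gives the node-wide view; per-pod numbers come from the cgroup files instead.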
### **Symptoms:** High memory usage

![High memory usage](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dqcsee4k2a4xunv21yu2.png)

![High memory usage](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/blvkpjpej8kef2z576r6.png)

**Critical implications:**

1. The kernel is **shared** across the entire node
2. Page Cache is global and shared, but each cached page is charged to the cgroup that first read it
3. Cgroups v1 only limits how much each container *can* use
4. **The kernel has no concept of "pod" or "container": if the node has plenty of RAM, the kernel considers memory available even when a pod is about to be OOM-killed.**

```
┌─────────────────────────────────────────────────┐
│                      NODE                       │
│                                                 │
│  ┌──────────────────────────────────────────┐   │
│  │          LINUX KERNEL (single)           │   │
│  │  - Manages ALL node RAM                  │   │
│  │  - Page Cache is SHARED                  │   │
│  │  - Has no concept of "pod"               │   │
│  └──────────────────────────────────────────┘   │
│                                                 │
│  ┌────────────┐  ┌────────────┐  ┌──────────┐   │
│  │ Pod A      │  │ Pod B      │  │ Pod C    │   │
│  │ (Gitaly)   │  │ (Redis)    │  │ (Web)    │   │
│  │            │  │            │  │          │   │
│  │ Sees:      │  │ Sees:      │  │ Sees:    │   │
│  │ Limit:8GB  │  │ Limit:4GB  │  │ Limit:2GB│   │
│  └────────────┘  └────────────┘  └──────────┘   │
│                                                 │
│  Total RAM: 32GB                                │
│  Total Cache: 20GB (visible to ALL)             │
└─────────────────────────────────────────────────┘
```

_Note: Page Cache is the **RAM used by the kernel to cache files from disk**._

### **Backup Flow**

![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ch89rjmyfegi0b2tcio7.png)

### **What happens?**

**During the backup (1h):**

1. Gitaly reads hundreds of Git repositories
2. Kernel caches everything: "I'll keep these .git files in RAM"
3. Backup ends: Gitaly process returns to normal (195MB)
4. Kernel doesn't clean up: Cache stays marked as "active_file" = 35.6GB
5. Kubernetes sees: Pod using 37GB → OOM danger!
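Why does Kubernetes count cache as pod usage at all? The working-set figure that the kubelet gets from cAdvisor is total usage minus `inactive_file` only, so anything sitting in `active_file` still counts against the pod. The sketch below reproduces that arithmetic with the byte values from the cgroup v1 `memory.stat` output reported further down in this article. Treating usage as `cache + rss` is a simplifying assumption (real usage also includes kernel memory and other small counters), and the function name is illustrative.

```python
GIB = 1024 ** 3


def working_set_bytes(usage: int, inactive_file: int) -> int:
    """Working set as cAdvisor derives it: usage minus inactive file cache."""
    return max(usage - inactive_file, 0)


# Values in bytes, from the memory.stat output in this article.
cache = 38829035520        # page cache charged to the pod
rss = 204779520            # Gitaly process memory
inactive_file = 568246272  # the only cache the working set excludes

# Assumption: usage is approximated as cache + rss.
usage = cache + rss
ws = working_set_bytes(usage, inactive_file)

print(f"usage:       {usage / GIB:.1f} GiB")
print(f"working set: {ws / GIB:.1f} GiB")
print(f"process RSS: {rss / GIB:.2f} GiB")
```

Almost all of the cache is in `active_file`, so subtracting `inactive_file` barely changes the total, and the pod appears to need well over a hundred times the memory the Gitaly process actually holds.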
**Why doesn't it clean up automatically?**

The cache is marked as "active" (not "inactive"), so the kernel thinks:

> "These files were recently used"
> "They'll probably be used again soon"
> "I'll keep them in RAM"

But since this is a backup that runs once a day, those files won't be accessed again until tomorrow!

### Possible solutions evaluated:

| Option | Effort | Benefit | Recommendation |
|:------|:--------|:-----------|:--------------|
| _Migrate to cgroup v2_ | High (node reboot) | Definitive fix | Best long-term option |
| _Privileged CronJob to drop cache_ | Low (15min) | Solves the problem | If you need a quick fix |
| _DaemonSet monitor_ | Medium (1h) | Automated | Optional |
| _Increase memory limit_ | Low | Temporary workaround | Emergency only |

As we can see, there are workarounds, but the best long-term option is Cgroup v2. It requires a bit more effort to implement, but the benefits make it stand out.

**Current Cgroup v1 data:**

```
cache:         38829035520   # 36.2 GB
rss:             204779520   # 195 MB
inactive_file:   568246272   # 542 MB
active_file:   38260654080   # 35.6 GB
```

**35.6GB of `active_file`** = actively cached files (page cache)!

Breakdown:

- Gitaly process (RSS): 195 MB
- Active file cache: 35.6 GB (this is where the memory is)
- Inactive file cache: 542 MB
- Total cache: 36.2 GB
- Total pod usage: ~37 GB

### **Cgroup v2**

Cgroup v2 exposes PSI (Pressure Stall Information) per cgroup, which reports when tasks in that cgroup are stalling under memory "pressure":

```
# cgroup v2 exposes:
/sys/fs/cgroup/memory.pressure

# Content:
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
```

When the cgroup comes under pressure, the **kernel automatically** reclaims cache, even pages marked as "active"!

Cgroup v2 is the second generation of the Linux kernel's control groups system, with significant improvements over v1. Our GitLab EKS cluster is currently running Cgroup v1.
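To check which cgroup version a node is running and to read the `memory.pressure` format shown above, a small script is enough. This is a minimal sketch, not a monitoring tool: it assumes the cgroup filesystem is mounted at `/sys/fs/cgroup` and relies on the `cgroup.controllers` file, which exists at the root of the unified (v2) hierarchy. The function names are illustrative.

```python
from pathlib import Path


def is_cgroup_v2(root: str = "/sys/fs/cgroup") -> bool:
    """The unified (v2) hierarchy has a cgroup.controllers file at its root."""
    return (Path(root) / "cgroup.controllers").is_file()


def parse_pressure(text: str) -> dict:
    """Parse PSI text into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *pairs = line.split()
        values = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            # avg10/avg60/avg300 are percentages of time stalled;
            # total is cumulative stall time in microseconds.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# The sample content from this article.
SAMPLE = """\
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
"""

if __name__ == "__main__":
    print("cgroup v2:", is_cgroup_v2())
    print(parse_pressure(SAMPLE)["some"])
```

The `some` line counts time in which at least one task was stalled on memory, and `full` counts time in which all non-idle tasks were stalled at once, so a non-zero `full avg10` is the stronger warning sign.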
**Cgroup v1** has multiple independent hierarchies (memory, cpu, io), which can cause inconsistencies. Cgroup v2 uses a single unified tree:

![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/eu3im3dtnt76jolslsdz.png)

That's it! I had the chance to dig into this topic this week and wanted to share what I learned.

Docs:

- [backup-restore](https://docs.gitlab.com/charts/backup-restore/)
- [Kernel Tuning and Optimization for Kubernetes: A Guide](https://overcast.blog/kernel-tuning-and-optimization-for-kubernetes-a-guide-a3bdc8f7d255)
- [Linux Kernel Version Requirements](https://kubernetes.io/docs/reference/node/kernel-version-requirements/)

    Tags: gitlab, linux, kernel, devops
