How I Spent a Day Trying to Recover a Crashed OpenStack Environment — And What I Learned — DeepSeek Blog | Neura Market

    Oyoh Edmond April 2, 2026

*A real-world incident report for engineers dealing with filesystem corruption on production Linux servers*

---

## The Problem

It started with a simple complaint: our company's OpenStack Horizon portal was unreachable. The browser returned `ERR_CONNECTION_TIMED_OUT`. No warning, no gradual degradation, just gone.

We had two physical HPE ProLiant DL380 Gen10 servers running the environment, accessible only via the HPE iLO 5 remote console. No physical access. No one near the data centre. Just me, a browser, and an iLO HTML5 console.

This is the story of what happened, what we tried, what failed, and what every engineer should know before they find themselves in the same situation.

---

## The Environment

- **Controller Node**: HPE ProLiant DL380 Gen10 (12-core)
- **Compute Node**: HPE ProLiant DL380 Gen10 (10-core)
- **OS**: Ubuntu 22.04 LTS
- **Storage**: LVM on top of hardware RAID (HPE Smart Array P408i-a)
- **Access**: HPE iLO 5 remote console (HTML5)
- **VPN**: FortiClient VPN required to reach the internal network

---

## Step 1: Diagnosing the Problem

The first thing I noticed was that pinging the servers returned `Destination host unreachable`, even over the VPN. This ruled out a simple service crash: something was fundamentally wrong at the OS level.

Opening the iLO console for the controller node revealed that the server was stuck in a **BusyBox initramfs emergency shell** with the following critical errors:

```plaintext
UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY
Failure: File system check of the root filesystem failed
The root filesystem requires a manual fsck
```

**Lesson #1**: Always check the iLO/IPMI console first. The OS may be completely down while the management interface is still accessible.

---

## Step 2: The Filesystem Corruption

The root filesystem was on an LVM logical volume. The initramfs had tried to run an automatic fsck and failed. The errors pointed to:

1. **Superblock corruption**: the filesystem size recorded in the superblock was larger than the actual LVM volume
2. **Journal corruption**: e2fsck could not set superblock flags
3. **Thousands of corrupted inodes**: invalid flags, bad extended attributes, wrong inode sizes

The size mismatch error was particularly telling:

```plaintext
The filesystem size is 288358400 blocks
The physical size of the device is 285474816 blocks
Either the superblock or the partition table is likely to be corrupt
```

**Lesson #2**: A filesystem size larger than the physical device usually means the LVM volume was shrunk without first shrinking the filesystem, or the superblock was corrupted during an unclean shutdown.

---

## Step 3: Recovery Attempts in initramfs

The initramfs environment is extremely limited. Here is what we tried and how each attempt went:

### Activating LVM Volumes

```bash
vgchange -ay
```

✅ This worked and activated all volume groups.

### Running e2fsck with Backup Superblock

```bash
e2fsck -y -b 32768 /dev/mapper/<your-lv-name>
```

⚠️ This started working but kept getting killed by the OOM (Out of Memory) killer, because very little memory was available to processes in the initramfs environment.

### Extending the LVM Volume

```bash
lvm lvextend -l +100%FREE /dev/<vg-name>/<lv-name>
```

✅ This extended the volume with all free extents in the volume group, which made the device at least as large as the filesystem expected.

### Rewriting the Superblock

```bash
mke2fs -S -b 4096 /dev/mapper/<your-lv-name>
```

✅ The superblock was rewritten. e2fsck then started making real progress fixing inodes. Be aware that `mke2fs -S` rewrites only the superblock and group descriptors, and the `mke2fs` man page describes it as an extreme measure with no guarantee that data will be salvageable. Treat it as a last resort and run e2fsck immediately afterwards.

### Creating Swap to Help with OOM

```bash
dd if=/dev/zero of=/swapfile bs=1048576 count=4096
mkswap /swapfile
# swapon /swapfile   (NOT AVAILABLE in initramfs)
```

❌ `swapon` is not available in initramfs. This is a critical limitation. Note also that `/` in initramfs is itself RAM-backed, so a swap file created there consumes memory rather than adding to it. Swap needs to live on a real disk or logical volume.

**Lesson #3**: The initramfs environment is missing many essential tools, including `swapon`, `resize2fs`, `tune2fs`, and `debugfs`, and `lvextend` is only reachable through the `lvm` wrapper. Plan for this limitation before you need it.

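
The size mismatch reported in Step 2 can be turned into a concrete number before extending the volume. The sketch below is illustrative only: it assumes 4 KiB filesystem blocks (consistent with the `-b 4096` and backup superblock 32768 used above), and `blocks_to_mib` is a hypothetical helper, not an existing tool. It prints a targeted `lvextend` command rather than running anything.

```bash
#!/bin/sh
# Convert a count of filesystem blocks to MiB.
# $1 = number of blocks, $2 = block size in bytes.
blocks_to_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

fs_blocks=288358400    # "The filesystem size is ..." from e2fsck
dev_blocks=285474816   # "The physical size of the device is ..."
block_size=4096        # assumption: 4 KiB blocks

# How much larger the filesystem claims to be than the device.
shortfall_mib=$(blocks_to_mib $(( fs_blocks - dev_blocks )) "$block_size")

echo "Filesystem: $(blocks_to_mib "$fs_blocks" "$block_size") MiB"
echo "Device:     $(blocks_to_mib "$dev_blocks" "$block_size") MiB"
echo "Shortfall:  ${shortfall_mib} MiB"

# Print (do not run) an extension sized to the shortfall only.
echo "lvm lvextend -L +${shortfall_mib}M /dev/<vg-name>/<lv-name>"
```

With the numbers from this incident the shortfall works out to 11264 MiB, or 11 GiB. Extending by exactly that amount, instead of `+100%FREE`, would leave the rest of the volume group untouched. Whether that is preferable depends on how much free space the volume group has and what else it is reserved for.
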
---

## Step 4: The OOM Problem

Every time e2fsck got deep into repairing the large volume, the kernel OOM killer terminated it:

```plaintext
Out of memory: Killed process (e2fsck)
```

The server had plenty of RAM, but only a small portion of it appeared to be available to user processes in initramfs. Without swap, e2fsck couldn't complete the repair.

**Lesson #4**: For large filesystems (500 GB+), e2fsck requires significant RAM. Always ensure swap is available before running fsck on large volumes. If you're in initramfs without swap, you need a different approach.

---

## Step 5: Attempting to Boot from Live ISO

We tried to boot Ubuntu 20.04 Live Server from an ISO mounted via iLO Virtual Media. This would have given us a full Ubuntu environment with all tools.

The challenges we encountered:

- iLO Virtual Media URL-based ISO streaming was too slow
- Mounting a local ISO file via the iLO HTML5 console worked better
- The ISO was detected as a Virtual CD-ROM by the kernel
- However, the server's UEFI boot order did not include the virtual CD-ROM
- The virtual CD-ROM did not appear in the UEFI one-time boot menu

**Lesson #5**: Test your iLO Virtual Media boot process BEFORE you need it in an emergency. Know whether your server's UEFI will boot from iLO virtual media and in what order.

---

## Step 6: UEFI Shell to the Rescue (Partially)

We discovered the HPE Embedded UEFI Shell under:

**System Utilities → Embedded Applications → Embedded UEFI Shell**

From there we could launch the GRUB bootloader directly:

```console
fs0:
cd EFI\ubuntu
shimx64.efi
```

This gave us access to the GRUB menu and boot parameter editing. We modified the boot parameters to skip fsck:

```console
linux /vmlinuz-<version>-generic root=/dev/mapper/<lv-name> ro fsck.mode=skip
```

Unfortunately, the filesystem was too corrupted to mount even with fsck skipped.

**Lesson #6**: The HPE Embedded UEFI Shell is a powerful recovery tool. Learn how to use it. It can launch bootloaders directly from the EFI partition without needing a working boot order.

---

## Step 7: The Final Verdict

After extensive repair attempts, the final error was:

```plaintext
EXT4-fs error: inode #2: special inode unallocated
get root inode failed
mount failed
```

**Inode #2 is the root directory inode**, the most critical inode in any ext4 filesystem. With it gone, on top of the damage already described, the kernel could not mount the filesystem at all.

**Lesson #7**: If `inode #2` is destroyed and e2fsck cannot rebuild it, you need either a backup restore or professional data recovery.

---

## What Should Have Been Done Differently

### Before the Incident

1. **Regular backups**: snapshots of the LVM volume or VM-level backups
2. **Monitoring**: disk health monitoring (smartctl) and filesystem error monitoring
3. **Documentation**: record all credentials, architecture diagrams, and recovery procedures
4. **Test recovery**: periodically test that backups can actually be restored
5. **Swap space**: ensure servers have adequate swap configured

### During the Incident

1. **Boot a live environment first**: don't spend hours in initramfs; boot from a live USB, or a live ISO over iLO virtual media, with full tools
2. **Create swap immediately**: before running e2fsck on large volumes, ensure swap is available
3. **Try other backup superblocks**: if 32768 doesn't work, try 98304 or 163840
4. **Document every command**: keep a log of everything you try

### Tools You Need Available

- A bootable Ubuntu Live USB drive (or an ISO ready for iLO virtual media)
- `resize2fs`, `tune2fs`, `debugfs`: not available in initramfs
- `swapon`: not available in initramfs
- Adequate RAM (at least 8 GB free) for e2fsck on large volumes

---

## Key Commands Reference

```bash
# Activate LVM volumes from initramfs
vgchange -ay

# List mapper devices
ls /dev/mapper/

# Find backup superblocks
dumpe2fs /dev/mapper/<device> | grep -i superblock

# Run fsck with backup superblock
e2fsck -y -b 32768 /dev/mapper/<device>

# Extend LVM volume (using lvm wrapper in initramfs)
lvm lvextend -l +100%FREE /dev/<vg-name>/<lv-name>

# Rewrite superblock and group descriptors only
# (last resort; run e2fsck immediately afterwards)
mke2fs -S -b 4096 /dev/mapper/<device>

# Create swap file on a real disk (swapon is not available in initramfs)
dd if=/dev/zero of=/swapfile bs=1048576 count=4096
mkswap /swapfile
swapon /swapfile

# Mount filesystem read-only
mount -o ro /dev/mapper/<device> /mnt

# Chroot into recovered system
chroot /mnt /bin/bash
```

---

## Conclusion

Filesystem corruption at the inode level is one of the most serious failures a Linux system administrator can face. The key takeaways from this incident are:

1. **Backups are not optional**: this entire incident would have been resolved in minutes with a good backup
2. **Know your recovery tools**: understand the limitations of initramfs before you need it
3. **iLO/IPMI is your lifeline**: invest time in learning your server's management interface
4. **Large filesystems need special care**: e2fsck on a 1 TB+ volume needs RAM, swap, and time
5. **Document everything**: credentials, architecture, and recovery procedures must be documented and accessible

If you find yourself in a similar situation, I hope this article saves you some of the hours I spent learning these lessons the hard way.

---

*If this article helped you, please clap and share. If you have questions or have been through a similar experience, leave a comment below.*

*Tags: #Linux #OpenStack #SysAdmin #DevOps #DisasterRecovery #Ubuntu #LVM #Filesystem #HPE #iLO*
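
As a companion to Lesson #4 and the "at least 8GB free" guideline above, the following is a minimal pre-flight sketch for a full live environment (not initramfs, where `awk` or `/proc` may be unavailable). The function names `avail_kib` and `enough_memory` are hypothetical helpers, and the 8 GiB threshold is simply the figure from this article, not a hard requirement of e2fsck.

```bash
#!/bin/sh
# Sum MemAvailable and SwapFree (both reported in KiB) from a
# meminfo-style file. $1 = file to read (defaults to /proc/meminfo).
avail_kib() {
    awk '/^(MemAvailable|SwapFree):/ { sum += $2 } END { printf "%d\n", sum }' \
        "${1:-/proc/meminfo}"
}

# Succeed if at least $1 GiB of RAM plus swap is available.
# $2 = optional meminfo-style file, as above.
enough_memory() {
    need_kib=$(( $1 * 1024 * 1024 ))
    [ "$(avail_kib "${2:-/proc/meminfo}")" -ge "$need_kib" ]
}

# Example: warn before starting a long e2fsck run.
if [ -r /proc/meminfo ]; then
    if enough_memory 8; then
        echo "OK: at least 8 GiB of RAM plus swap is available"
    else
        echo "WARNING: less than 8 GiB available; add swap before running e2fsck"
    fi
fi
```

Running a check like this before `e2fsck` makes the memory situation explicit up front, instead of discovering it when the OOM killer fires partway through a repair.
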
