How a "Simple" QR Code Generator Ate All My RAM: A Tale of 50,000 QR Codes — DeepSeek Blog | Neura Market
    Budi Widhiyanto February 24, 2026

*Sometimes the simplest tasks can become the biggest headaches. Here's how I learned that data size matters more than code complexity.*

---

## The Innocent Beginning

It started with a straightforward request: generate 50,000 unique QR codes for a project.

"How hard could it be?" I thought. Python has excellent libraries for this. A quick script, a PDF output, done by lunch.

I was wrong. Very wrong.

What I didn't anticipate was that my "simple" script would consume every byte of RAM on my machine, freeze my computer, and teach me an important lesson about thinking at scale. Let me walk you through what happened, how I fixed it, and what you can learn from my mistakes.

## The Original Approach: Looks Good on Paper

Here's the approach I initially took: generate all the QR codes first, cache them in memory, then write them to a PDF. It sounds logical, right? Pre-compute everything, then assemble the final output.

```python
import io
from multiprocessing import Pool, cpu_count

from reportlab.lib.utils import ImageReader
from tqdm import tqdm

# generate_unique_ids() and generate_qr_batch() are project helpers not shown here.
# generate_qr_batch() takes a list of IDs and returns a list of (uid, png_bytes) tuples.


def generate_pdf(output_path: str, total: int = 50000):
    ids = generate_unique_ids(total)

    # Pre-generate ALL QR codes in parallel for "speed"
    print(f"Pre-generating {total} QR codes in parallel...")
    num_workers = cpu_count()

    # Split IDs into batches for parallel processing
    batch_size = max(1, total // (num_workers * 4))
    batches = [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

    # Generate QR codes in parallel using multiprocessing
    qr_cache = {}
    with Pool(num_workers) as pool:
        # list() collects EVERY batch result in the parent process at once
        results = list(tqdm(
            pool.imap(generate_qr_batch, batches),
            total=len(batches),
            desc="Generating QR codes"
        ))

    # Store ALL images in memory
    for batch_result in results:
        for uid, img_bytes in batch_result:
            buf = io.BytesIO(img_bytes)
            qr_cache[uid] = ImageReader(buf)

    # NOW create the PDF using cached images
    # ... PDF generation code ...
```

I was proud of this code. Multiprocessing! Parallel execution! Batch processing! All the buzzwords that make you feel like a "real" programmer.

Then I ran it.

## The Disaster Unfolds

The script started running.
Progress bars moved. CPU usage spiked to 100% across all cores. "Excellent," I thought, "parallel processing doing its thing."

Then I noticed my system getting sluggish. Browser tabs stopped responding. My IDE froze. I opened the system monitor and watched in horror as my RAM usage climbed:

- 2 GB...
- 4 GB...
- 8 GB...
- 12 GB...

My laptop has 16 GB of RAM, and the script was devouring all of it. Before I could react, the OOM (out-of-memory) killer struck. Process terminated. No PDF. Just a frozen computer and a lesson learned the hard way.

## Understanding the Problem

After my system recovered, I sat down to analyze what went wrong. Let me break down the math.

Each QR code image:

- Resolution: 400 × 400 pixels
- Format: PNG in memory
- Approximate size: 15-30 KB per image (compressed)
- But in memory as a PIL Image object: ~500 KB to 1 MB (an uncompressed 400 × 400 RGB bitmap alone is 400 × 400 × 3 = 480,000 bytes)

Scale it up:

- 50,000 QR codes × ~500 KB = ~25 GB of RAM

Even with the compressed PNG byte representation, we're looking at:

- 50,000 × 20 KB = ~1 GB just for the image bytes
- Plus the ImageReader objects
- Plus the BytesIO buffers
- Plus Python's memory overhead
- Plus multiprocessing duplicating data across workers

The actual memory consumption was somewhere between 2-4 GB, which was still far more than is acceptable for such a "simple" task.

The fundamental flaw in my approach was this: I was optimizing for speed when I should have been optimizing for resource consumption.

## The Fix: Think Like a Stream, Not a Lake

The solution was embarrassingly simple once I understood the problem. Instead of loading all 50,000 QR codes into memory at once (a "lake" of data), I needed to process them as a stream: one page at a time.

Here's the key insight: a PDF with 50,000 QR codes has about 1,667 pages (30 QR codes per page). I only need to hold 30 QR codes in memory at any given time: the ones for the current page.
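The arithmetic behind both the blow-up and the fix is easy to check in a few lines. The sketch below is a back-of-the-envelope model, not part of the original script: it assumes an uncompressed RGB bitmap (3 bytes per pixel) and ignores PNG compression and Python object overhead, and `bitmap_bytes`, `page_count`, and `iter_pages` are hypothetical helper names.

```python
from typing import Iterator, Sequence

# Assumed values, taken from the numbers above (not from the actual script).
WIDTH = HEIGHT = 400  # pixels per QR image
TOTAL = 50_000        # QR codes to render
PER_PAGE = 30         # QR codes per PDF page


def bitmap_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Raw size of an uncompressed bitmap (3 bytes per pixel for RGB)."""
    return width * height * bytes_per_pixel


def page_count(total: int, per_page: int) -> int:
    """Ceiling division: the last page may be only partially filled."""
    return -(-total // per_page)


def iter_pages(ids: Sequence[str], per_page: int) -> Iterator[Sequence[str]]:
    """Yield one page's worth of IDs at a time, so images can be built per page."""
    for start in range(0, len(ids), per_page):
        yield ids[start:start + per_page]


per_image = bitmap_bytes(WIDTH, HEIGHT)
print(f"One RGB image:           {per_image / 1e3:.0f} KB")
print(f"All {TOTAL} at once:     {TOTAL * per_image / 1e9:.1f} GB")
print(f"One page ({PER_PAGE}) at once: {PER_PAGE * per_image / 1e6:.1f} MB")
print(f"Pages needed:            {page_count(TOTAL, PER_PAGE)}")
```

Under these assumptions it reports 480 KB per image, 24.0 GB for all 50,000 held at once, 14.4 MB for a single page of 30, and 1667 pages (the last one holding 20 codes). Writing the page loop as a generator keeps the "only the current page" rule explicit: a caller never receives more than `PER_PAGE` IDs to render at a time.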
Here's the refactored approach:

```python
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from tqdm import tqdm

# generate_unique_ids(), make_qr_image(), img_to_reader() and PER_PAGE come
# from the rest of the script (not shown); PER_PAGE is 30 in my layout.


def generate_pdf(output_path: str, total: int = 50000):
    ids = generate_unique_ids(total)
    total_pages = (total + PER_PAGE - 1) // PER_PAGE  # ceiling division

    # Create PDF canvas
    c = canvas.Canvas(output_path, pagesize=A4)

    # Process ONE PAGE at a time
    for page_start in tqdm(range(0, total, PER_PAGE), desc="Generating PDF pages"):
        page_ids = ids[page_start : page_start + PER_PAGE]

        # Generate QR codes ONLY for this page
        page_qr_cache = {}
        for uid in page_ids:
            img = make_qr_image(uid)
            page_qr_cache[uid] = img_to_reader(img)

        # Draw this page
        for idx, uid in enumerate(page_ids):
            # ... draw QR code to PDF ...
            c.drawImage(page_qr_cache[uid], qr_x, qr_y, ...)

        c.showPage()

        # CRITICAL: Clear the cache after each page!
        page_qr_cache.clear()

    c.save()
```

The key changes:

1. Generate per-page: Only create QR codes for the 30 items on the current page.
2. Clear after use: Explicitly clear the page cache after each page is written.
3. No multiprocessing overhead: Removed the parallel processing that was duplicating data.

## The Trade-off: Speed vs. Safety

Let's be honest about the trade-offs:

| Metric       | Original (Parallel)       | Optimized (Per-Page)  |
| ------------ | ------------------------- | --------------------- |
| Memory Usage | 2-4 GB                    | 50-100 MB             |
| Speed        | Faster (theoretically)    | Slower                |
| Stability    | Crashes on large datasets | Stable                |
| Scalability  | Limited by RAM            | Limited by disk space |

Yes, the optimized version is slower. Without parallel processing, we're generating QR codes sequentially. For 50,000 codes, the execution time went from "crash before completion" to "about 30-45 minutes of stable execution."

But here's the thing: a slow script that completes is infinitely faster than a fast script that crashes.

I ran the optimized version overnight. When I woke up, both PDF files (100,000 QR codes total) were sitting there, ready to use. My computer was fine. No crashes. No freezing. Just steady, predictable progress.

## Lessons Learned

### 1. Data Size Changes Everything

A script that works perfectly for 100 items might explode at 10,000 items. Always ask yourself: "What happens when this scales 10x? 100x? 1000x?"

In my case, the script probably worked fine during testing with small batches. It was only at production scale that the memory issue became catastrophic.

### 2. Memory Is Not Infinite

This sounds obvious, but it's easy to forget when you're writing code. Every object you create lives somewhere in memory. When you're dealing with images, those objects can be surprisingly large.

```python
# This innocent-looking line...
qr_cache[uid] = ImageReader(buf)
# ...executed 50,000 times becomes a memory bomb
```

### 3. Parallel ≠ Better

Parallel processing is great for CPU-bound tasks where you have enough memory to support multiple workers. But when each worker is creating large objects, parallelism can actually make things worse by multiplying memory usage. Sometimes, a simple sequential loop is the right answer.

### 4. Clear Your References

Python's garbage collector is good, but it's not magic. If you're holding references to large objects in a dictionary or list, that memory can't be freed until those references are gone, whether you clear the container, delete it, or let it go out of scope. The real win is never letting a long-lived container accumulate every image in the first place.

```python
# Drop this page's images before starting the next page
page_qr_cache.clear()
```

### 5. Progress Bars Are Your Friend

When you're running long-running tasks, always add progress bars. The `tqdm` library makes this trivially easy:

```python
from tqdm import tqdm

for page_start in tqdm(range(0, total, PER_PAGE), desc="Generating PDF pages"):
    ...  # your code
```

Not only does this give you feedback on how long the task will take, but it also helps you identify when something is wrong. If the progress bar stalls, you know there's a problem.

## The Bigger Picture: Thinking About Resources

This experience changed how I approach coding problems. Now, before I write any code that deals with data at scale, I ask myself three questions:

1. What's the memory footprint per item?
2. How many items will I process?
3. Can I process items one at a time instead of all at once?

This is especially important in scenarios like:

- Image processing: Images are memory-hungry
- Data pipelines: Processing large CSV/JSON files
- API responses: Paginating through thousands of records
- File operations: Reading/writing large files

The pattern is always the same: stream when you can, batch when you must, and never load everything into memory unless you absolutely have to.

## Practical Tips for Your Own Projects

If you're working on a similar task (generating large numbers of images, processing big datasets, or handling any kind of bulk operation), here are some practical tips:

### Use Generators Instead of Lists

```python
# Bad: Creates a list of 50,000 items in memory
ids = [generate_id() for _ in range(50000)]

# Better: Generates one at a time
def id_generator(count):
    for _ in range(count):
        yield generate_id()
```

### Process in Chunks

```python
import gc

# Instead of processing all at once
for item in huge_list:
    process(item)

# Process in manageable chunks
chunk_size = 100
for i in range(0, len(huge_list), chunk_size):
    chunk = huge_list[i:i + chunk_size]
    for item in chunk:
        process(item)
    # Clean up after each chunk
    gc.collect()  # Force garbage collection if needed
```

### Monitor Your Memory Usage

Add memory monitoring to long-running scripts:

```python
import os

import psutil


def get_memory_usage():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024  # resident set size in MiB


# In your loop
for i, item in enumerate(items):
    process(item)
    if i % 1000 == 0:
        print(f"Processed {i} items, Memory: {get_memory_usage():.1f} MB")
```

### Set Memory Limits

For critical scripts on Unix-like systems, you can set memory limits to prevent runaway consumption:

```python
import resource  # Unix-only standard-library module

# Cap this process's address space at 1 GB while keeping the existing hard limit
# (raising the hard limit would need extra privileges). Allocations past the cap
# then fail inside this process, typically as MemoryError, instead of exhausting
# system RAM.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (1024 * 1024 * 1024, hard))
```

## Conclusion

My "simple" QR code generator turned into a valuable lesson about resource management.
The original code was clever: parallel processing, batch operations, caching. But clever code that doesn't work is worse than simple code that does.

The final version generates 100,000 QR codes across two PDF files. It takes about an hour to run. It uses less than 100 MB of RAM. And most importantly, it completes successfully every single time.

Sometimes the best optimization isn't making your code faster; it's making it actually work.

The next time you're writing code that processes data at scale, remember: think about memory first, speed second. A slow script that completes is infinitely more valuable than a fast script that crashes.

---

TL;DR: I tried to generate 50,000 QR codes by loading them all into memory at once. My computer ran out of RAM and crashed. The fix was simple: generate QR codes one page at a time (30 at a time instead of 50,000). It's slower, but it works. Always consider memory usage when working with data at scale.
