    I Built a Compiler with AI Engineering Over a Weekend. These are 3 Core Strategies for Scalable AI Development

    Yaser Al-Najjar February 24, 2026

You know that feeling when you've been doing something for years, and then someone comes along and says "nah, throw all that away"?

That is exactly how I felt reading Cursor's blog post about [self-driving codebases](https://cursor.com/blog/self-driving-codebases).

Don't get me wrong, I do believe this is impressive. 3M+ lines of code. Approximately 1,000 commits per hour. Thousands of agents working together to build a web browser.

But something about it bugged me. It ignores *everything* we have learned about software engineering.

## Wait, what is wrong with 1,000 commits per hour?

As you know: **Throughput is not progress.** Maybe 10 meaningful commits targeting goals we would like to achieve would be more helpful.

The Cursor approach optimizes for raw output. More agents, more commits, more lines of code. But after years of building software the "right way," here is what I know matters:

* **Agile development**, meaning time-bounded sprints with scoped work, not infinite agent swarms.
* **Meaningful changes** over large volume. 10 thoughtful PRs might beat 1,000 commits.
* **Strong feedback loops** like tests, CI, and code review, rather than just hoping the agents figure it out.
* **Architecture decisions and interface contracts** backed by documented reasoning, not emergent chaos.

So when I set out to build [Sifr](https://github.com/yaseralnajjar/sifr), a compiled programming language that uses Python syntax and compiles to Rust, I decided to do it with AI engineering. But I wanted *disciplined* agents.
The kind that follow a process. The kind that write PRDs, create tickets, review each other's code, and do not merge without passing tests.

And let me tell you, it works. Really well. 😄

## The project: a whole programming language

Before we get into the workflow, let me give you a taste of what we are building. Sifr is a compiled language with:

* Python syntax plus static typing
* Compilation to Rust for native binaries
* A borrow-by-default ownership model
* TypeScript-style union types, type narrowing, and protocols
* Over 45 standard library modules with zero-panic guarantees
* 21 planned phases, with 11 completed and over 80 milestones

This is not a toy. It is a full compiler pipeline (lexer, parser, AST, binder, type checker, HIR, Rust codegen, rustc, and finally binary) with a roadmap stretching from language foundations all the way to a web framework, package manager, and ecosystem.

And it was built almost entirely using AI engineering following the workflow I am about to describe.

**Sponsorship note:** This project was initially sponsored by **CDON**, a leading marketplace in the Nordics (Sweden, Norway, Denmark, Finland).

## The basic workflow: implementing a feature

Let's start small. Here is how a single Task moves through the board, from an idea to merged code.

![Basic Workflow](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/argr1dixpgqr9kfogwez.png)

The Task moves across board columns as it progresses: **Backlog -> Ready -> In Progress -> Review -> Done.** Each step maps to a real action that an AI agent can execute.

1. **Draft the Task.** The agent writes a Task with the current situation, desired situation, and a suggested solution. This is scoped to a small number of changes.
2. **Add to the board.** The agent creates a GitHub issue and adds it to the project board. The Task lands in the **Backlog**.
3. **Refine & prioritize.** The agent assesses effort versus value and moves the highest-priority Tasks to **Ready**.
4. **Work on the Task.** The agent picks up the highest-priority Ready Task, creates a branch, implements the changes, runs tests locally, and creates a PR. This PR uses a template that requires an issue link, bullet-point changes, and deployment considerations. The Task moves to **Review**.
5. **Review the PR.** A *different* agent, preferably a different model, reviews the PR for logic bugs, unnecessary complexity, test coverage, style, and architecture alignment.
6. **Adjust.** The implementing agent addresses review comments.
7. **Merge.** The PR merges, and the Task moves to **Done**. Ship it.

### What does a Task actually look like?

Let me show you a real example from Sifr. Here is **Task #100: Expand Built-in Functions**:

* **Current Situation:** `max(a, b)` and `min(a, b)` with two arguments are not supported (only the list form `max([1, 2])` works).
* **Desired Situation:** All common Python built-in function signatures should work.
* **Suggested Solution:** Update the compiler's lowering phase to handle 2-argument `max`/`min`.
* **Acceptance Criteria:** `max(1, 2)` returns `2`.

That is it. Small, focused, concrete. The agent implemented this in one PR.

> **🚨 Gotcha:** You do not want to become the bottleneck. Make sure that shipping does not require you to be in the middle. That includes manual testing, manual clicking, manual deployment, all of it. If you are the human doing QA on every PR, you have defeated the purpose.

## But what about bigger features?

A single Task is great for "add a `len()` method to strings." But what about "implement a borrow checker"? That is where **Epics** come in.

![Epic Workflow](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o8uo5tvj03rzu87qvn30.png)

Here is the critical insight. **Every Epic starts with a PRDS, a combined PRD and Solution Design document.**

The agent does not just jump in and start coding. It uses a specific tool to write a single structured document covering *both* sides:

* **Product requirements**: problem statement, goals, scope, constraints, acceptance criteria (with Given/When/Then).
* **Solution design**: architecture, data model, API design, error handling, testing strategy, trade-offs.

The PRDS gets added to the board as an Epic, refined, and then comes the part that matters most.

**Step 4 is a human reviewing the PRDS.** This is the human-in-the-loop checkpoint. You are not reviewing 50 PRs. You are reviewing *one document* that shapes all of them.

Once approved, the Epic gets broken down into smaller Tasks, and those Tasks follow the basic workflow above.

And here is a step the blog posts never talk about: **the Epic demo.** Before marking an Epic as Done, you create a working demo that showcases all major features delivered. In Sifr, these live in a `./demos` folder, each named after the Epic. If the demo does not work, the Epic is not done. Simple as that.

### What does an Epic look like?

Here is **Epic: Add `collections.Counter`**. This Epic was about adding the first class-based API to the standard library.

* **Objective:** Users need to count hashable objects easily. Implement `collections.Counter`.
* **Scope:**
  * Define `class Counter` in `lib/sifr/collections.sifr`.
  * Implement methods: `__init__`, `most_common`, `total`, `update`, `keys`, `values`.
  * Add the necessary Rust implementation to support these methods.
* **Solution Design:**
  * **Data Structure:** Wrap a Rust `HashMap` but expose it as a Python class.
  * **API:** Match Python's `Counter` API exactly.
  * **Testing:** Verify counting works, `most_common` returns sorted results, and empty counters behave correctly.
* **Acceptance Criteria:** `from sifr.collections import Counter` works, and `Counter("hello").most_common(1)` returns `[('l', 2)]`.

The agents broke this down into tasks: implement the intrinsics, implement the Sifr class, add tests, and create a demo.
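Because the Epic asks for Python's `Counter` API to be matched exactly, the acceptance criterion can be sanity-checked against the reference behavior in plain Python. The sketch below uses CPython's standard `collections.Counter`, not Sifr's `sifr.collections` module (whose implementation is not shown in this post), and the variable names are only illustrative.

```python
# Reference behavior from CPython's standard library.
# Assumption: Sifr's Counter is meant to produce the same results.
from collections import Counter

counts = Counter("hello")            # counts each character in the string
top_one = counts.most_common(1)      # most frequent element with its count
total_chars = sum(counts.values())   # total number of counted items

empty_top = Counter().most_common(1) # an empty counter yields an empty list

print(top_one)
print(total_chars)
print(empty_top)
```

The three checks line up with the Testing bullet in the Epic: counting works, `most_common` returns the highest count first, and an empty counter does not fail.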
> **🚨 Gotcha:** Without reviewing the PRDS, you cannot guarantee the results. The agent might build the completely wrong thing, beautifully. I have seen it happen.

In Sifr, every Epic has a PRDS document. The borrow-by-default Phase? It started with a PRDS that defined parameter conventions, escape analysis rules, and codegen patterns, *before a single line of code was written.*

## Scaling the workflow: many Epics, many Phases

Okay, so you can ship a feature. You can ship a big feature. But what about building an *entire programming language* with 21 Phases?

This is where things get interesting.

![Phase Workflow](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jan9fh9b1ajn20s9mq7t.png)

### Step 1: Plan multiple Phases top-down

The key word here is **top-down.** You plan the high-level Phases first, then drill into Epics within each Phase. And critically, **avoid parallel Epics.**

In Sifr's roadmap, each Phase has a clear ordering rationale. For example:

* Type System Power comes *before* Standard Library, because the stdlib needs generics and closures for proper type signatures.
* Error Safety comes *before* Stdlib Safety Remediation, because you cannot make intrinsics return `Result` types if the compiler does not enforce error class hierarchies yet.
* Borrow-by-Default comes *before* Stdlib Deepening, so new stdlib functions are written with the final ownership model from day one.

Every ordering decision is documented. Not in someone's head, but in the codebase, in the roadmap, with explicit rationale for why Phase N depends on Phase N-1.

### Step 2: Execute Phase by Phase

Each Epic within a Phase follows the epic workflow: PRD -> solution design -> human review -> Task breakdown -> execute. The agents pick up Tasks, implement, create PRs, get reviewed.

> **🚨 Gotcha:** Don't execute too many Phases at once. I tried. The agents start creating workarounds for dependencies that haven't been implemented yet, and you end up with spaghetti. Sequential execution with clear Phase boundaries is the way.

### Step 3: Review with a different model

This is one of my favorite tricks. After a Phase of execution, I use a **different agent session** (and often a different model) to review the work. The reviewer has fresh context, with no sunk cost bias or "I already wrote this so it must be right" mentality.

The reviewer runs in a feedback loop: review -> fix -> review -> fix -> review. Three iterations is the sweet spot.

### Step 4: Re-planning with the judge

After review cycles, a "judge" (the smartest model you have access to) evaluates whether the plan needs to be steered. Maybe the type system completion phase revealed that the codegen architecture should be restructured first. Maybe a new constraint emerged. The judge decides whether to continue as planned or adjust.

If adjustment is needed, the plan is updated and execution continues. Multiple reviewer agents can also weigh in during this phase.

> **🚨 Gotcha #1:** Parallel work might not be the best idea. There could be unidentified dependencies between Epics. Agents will make workarounds and create sloppy solutions instead of waiting for the right foundation to be in place.
>
> **🚨 Gotcha #2:** It is good to plan for the future, but don't get stuck with too many details about later Phases. The first few Phases will teach you things that change your assumptions about later Phases (you can also update the plan and insert new phases midway). Plan the *current* Phase in detail, and keep future Phases as rough outlines.

### What does a Phase Plan look like?

This is a snippet of how we structure high-level planning. We track all phases, and for each one, we define exactly what capabilities it unlocks:

| # | Phase | Milestones | Status | What it unlocks |
| --- | --- | --- | --- | --- |
| 1 | Language Foundations | 6 (built-ins → codegen_quality) | completed | Single-file programs with classes, error handling, safe indexing, imports |
| 2 | Type System | 6 (narrowing → ...) | completed | Generics, closures, generators, decorators, operator overloading |
| ... | ... | ... | ... | ... |
| 13 | Type System Completion | 6 (stdlib_generic_rewrite → generics_in_stdlib) | pending | Auto-init, user-facing generics, pattern matching, enums, bigint, generic stdlib |

Notice the **"What it unlocks"** column. We don't just list technical tasks; we list **capabilities**. Phase 1 unlocks single-file programs. Phase 2 unlocks generics. This helps the AI (and me) understand the *purpose* of the phase, not just the code.

## The real results

Sifr has completed 11 Phases and over 80 Epics using this workflow. The compiler handles:

* A full type system with generics, protocols, union types, and type narrowing.
* Over 45 stdlib modules.
* Borrow-by-default ownership semantics.
* Error handling with compiler-enforced exhaustiveness checking.

All of this was built with AI engineering following the structured workflow described above. Not thousands of agents racing to commit, but a disciplined process where every feature starts with a plan, gets implemented incrementally, and gets reviewed before merge.

> An impressive result is that the first version of the working compiler for the core language was built over a weekend, **literally on a Saturday & Sunday**!!

You can find the repo here: [Sifr](https://github.com/yaseralnajjar/sifr).

## Try it yourself

If you want to adopt this workflow, here is the TL;DR:

1. **Small Tasks**: Draft -> Board -> Refine -> Work -> Review -> Merge.
2. **Epics**: PRDS -> Board -> Refine -> *Human Review* -> Break Down into Tasks -> Execute -> **Epic Demo** -> Done.
3. **Phases**: Plan top-down -> Execute sequentially -> Review with different model -> Re-plan with judge.
4. **Automate the boring stuff**: Ticket creation, PR templates, review checklists, board management. Make them commands the agent can run.
5. **Don't be the bottleneck**: If shipping requires you in the loop for every PR, you have lost.

The agents are the hands and **the architect is YOU**.

---

*What do you think? Have you tried applying AI engineering on a real project? I would love to hear about your workflow, so drop a comment or reach out!*
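As a footnote to the TL;DR, the Task board flow (Backlog -> Ready -> In Progress -> Review -> Done) can be sketched as a tiny state machine that an agent command could call before moving a card. This is a minimal sketch, not code from the Sifr repo: the names `Column`, `ALLOWED_MOVES`, and `move_task` are hypothetical, and the Review -> In Progress edge is my assumption for modeling the "Adjust" step.

```python
from enum import Enum


class Column(Enum):
    """Board columns as named in the post."""
    BACKLOG = "Backlog"
    READY = "Ready"
    IN_PROGRESS = "In Progress"
    REVIEW = "Review"
    DONE = "Done"


# Hypothetical transition table. Review -> In Progress is an assumption
# that models the "Adjust" step after review comments.
ALLOWED_MOVES = {
    Column.BACKLOG: {Column.READY},
    Column.READY: {Column.IN_PROGRESS},
    Column.IN_PROGRESS: {Column.REVIEW},
    Column.REVIEW: {Column.IN_PROGRESS, Column.DONE},
    Column.DONE: set(),
}


def move_task(current: Column, target: Column, tests_pass: bool = True) -> Column:
    """Return the new column, or raise if the move skips a step."""
    if target not in ALLOWED_MOVES[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    # "Do not merge without passing tests": block the final move otherwise.
    if target is Column.DONE and not tests_pass:
        raise ValueError("cannot move to Done while tests are failing")
    return target


state = Column.BACKLOG
for step in (Column.READY, Column.IN_PROGRESS, Column.REVIEW, Column.DONE):
    state = move_task(state, step)
print(state.value)
```

Encoding the board rules as data keeps the agent from skipping review or merging on red tests, which is the kind of guardrail the workflow relies on.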

Tags: rust, python
