Intent × Shape Classification: Diagnosis & Robustness Strategy

The Problem

A user says: "Write 10 blogs on the topic XYZ."

Expected: 10 blog titles (content mode, checklist/calendar shape)
Got: A project plan to write 10 blogs (action mode, project shape)

The engine classified this as intent=create, shape=project, content_mode=false — interpreting "write 10 blogs" as "here's a plan to produce 10 blogs" (action steps like "Research blog #1 topic", "Draft outline", "Write first draft") instead of "give me the 10 blog titles" (content items like "Why TypeScript Beats JavaScript in 2026").

This is a fundamental ambiguity in natural language that your engine must resolve correctly.

How the Engine Currently Decides

The 6 Orthogonal Dimensions

Your engine has 6 independent classification dimensions. Every prompt is projected onto all 6:

┌─────────────────────────────────────────────────────────┐
│                     USER PROMPT                         │
│            "Write 10 blogs on AI safety"                │
└────────────────────┬────────────────────────────────────┘
                     │
         ┌───────────┼───────────────┐
         ▼           ▼               ▼
    ┌─────────┐ ┌─────────┐   ┌──────────┐
    │ INTENT  │ │ SHAPE   │   │ CONTENT  │
    │ (12)    │ │ (11)    │   │  MODE    │
    │         │ │         │   │ (bool)   │
    └────┬────┘ └────┬────┘   └────┬─────┘
         │           │             │
    ┌────┴────┐ ┌────┴────┐  ┌────┴──────┐
    │COMPLEX- │ │TIME     │  │CUSTOM     │
    │ITY (4)  │ │HORIZON  │  │PROPERTIES │
    │         │ │ (days)  │  │ (dynamic) │
    └─────────┘ └─────────┘  └───────────┘

Dimension	Values	What it controls
Intent	12: build, launch, campaign, pipeline, roadmap, ops, event, create, learn, research, travel, personal	Domain knowledge injection (expert tips, expected phases, terminology)
Shape	11: project, schedule, itinerary, calendar, routine, budget, curriculum, checklist, comparison, tracker, document	Output structure (phase semantics, task detail focus, scheduler mode)
Content Mode	true/false	Whether tasks ARE items (nouns) vs. action steps (verbs)
Complexity	4: simple, medium, complex, massive	Scale (task count, group count, time horizon, token budget)
Time Horizon	integer (days)	Scheduling, estimate calibration
Custom Properties	dynamic list	Additional task dimensions (budget, sprint, etc.)
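Solution 6 below refers to a `Classification` object but never shows its definition. A minimal sketch of the six dimensions as one record type, assuming Python 3.9+ and only the standard library: the `intent`, `output_shape`, and `content_mode` field names follow `validate_classification()` in Solution 6, while `complexity`, `time_horizon_days`, and `custom_properties` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

# Allowed values, copied from the dimension table above.
Intent = Literal[
    "build", "launch", "campaign", "pipeline", "roadmap", "ops",
    "event", "create", "learn", "research", "travel", "personal",
]
Shape = Literal[
    "project", "schedule", "itinerary", "calendar", "routine", "budget",
    "curriculum", "checklist", "comparison", "tracker", "document",
]
Complexity = Literal["simple", "medium", "complex", "massive"]


@dataclass
class Classification:
    """One prompt projected onto all six dimensions (some field names hypothetical)."""
    intent: Intent
    output_shape: Shape
    content_mode: bool          # True: tasks ARE items; False: tasks are action steps
    complexity: Complexity
    time_horizon_days: int      # hypothetical name for the Time Horizon dimension
    custom_properties: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Literal types are not enforced at runtime, so validate the enums here.
        if self.intent not in get_args(Intent):
            raise ValueError(f"unknown intent: {self.intent!r}")
        if self.output_shape not in get_args(Shape):
            raise ValueError(f"unknown shape: {self.output_shape!r}")
        if self.complexity not in get_args(Complexity):
            raise ValueError(f"unknown complexity: {self.complexity!r}")
        if self.time_horizon_days < 0:
            raise ValueError("time_horizon_days must be non-negative")
```

Validating in `__post_init__` means an LLM response with an out-of-vocabulary value fails loudly at construction instead of silently flowing into the scheduler.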

The Decision Flow

                   User Prompt
                       │
            ┌──────────┴──────────┐
            │  Pre-selected?      │
            │  (complexity+intent │
            │   provided by UI)   │
            └──────┬──────┬───────┘
                   │      │
              YES  │      │  NO
                   ▼      ▼
          ┌────────────┐ ┌────────────────┐
          │ FAST PATH  │ │ ROUTER LLM     │
          │            │ │                │
          │ • intent   │ │ Classifies ALL │
          │   = given  │ │ 6 dimensions   │
          │ • complex  │ │ in one call    │
          │   = given  │ │                │
          │ • shape    │ │                │
          │   = regex  │ │                │
          │ • content  │ │                │
          │   = regex  │ │                │
          └────────────┘ └────────────────┘
                   │              │
                   └──────┬───────┘
                          ▼
                  Organization Proposal
                  (grouping options)
                          │
                          ▼
                  Task Skeletons
                  (titles + groups)
                          │
                          ▼
                  Task Enrichment
                  (full details)

Where Misclassification Happens

Path A (Fast Path, pre-selected complexity+intent): Shape and content_mode are inferred via regex patterns in shapes.py. "Write" is covered by the document shape pattern (r"\b(write\s+(a|an|my|the)|draft\s+(a|an|my|the)|...)), but only when it is followed by a/an/my/the. "Write 10 blogs" doesn't match because "10" isn't one of those words, so inference falls through to _INTENT_SHAPE_DEFAULTS. There is no default for intent=create, so the complexity fallback decides: "checklist" for a simple, short prompt, otherwise "project". Either way, the shape comes from a fallback rather than from any matched signal.

For content_mode: "Write 10 blogs" doesn't match any _CONTENT_MODE_PATTERNS entry, because none of the covered categories (food, books, movies, exercises, places, gifts, and so on) includes blogs. So content_mode=false.

Path B (Router LLM): The LLM must simultaneously decide intent, shape, AND content_mode. But the prompt is genuinely ambiguous — the word "write" could mean:

"I want to write these blogs" → action mode, project shape
"Give me 10 blog titles to write about" → content mode, checklist/calendar shape
"Write a series of blog posts" → document shape, each task is a section

Without knowing the user's mental model, the LLM makes a guess — and often defaults to the more "complete" interpretation (a plan to write blogs), which is not what the user wanted.

The Root Cause: Ambiguity in `content_mode × shape`

The critical insight is that content_mode and shape are coupled: which content_mode makes sense depends on the shape, and vice versa. For many prompts they are not truly orthogonal:

"Write 10 blogs on AI safety"

Interpretation 1:  intent=create,  shape=project,   content_mode=false
                   → Phases: Research, Outline, Writing, Editing, Publishing
                   → Tasks: "Research AI safety landscape", "Draft blog #1 outline", ...

Interpretation 2:  intent=create,  shape=checklist,  content_mode=true
                   → Phase: "Blog Topics"
                   → Tasks: "The Hidden Costs of AI Alignment", "Why AI Safety ≠ AI Ethics", ...

Interpretation 3:  intent=create,  shape=calendar,   content_mode=true
                   → Phases: Week 1, Week 2, Week 3, ...
                   → Tasks: titles scheduled across weeks

Interpretation 4:  intent=create,  shape=document,   content_mode=false
                   → Phase: "Sections"
                   → Tasks: each blog as a document section with full content

All 4 interpretations are valid. The engine currently has no mechanism to disambiguate.

Proposed Solutions

Solution 1: User-Facing Shape + Mode Selector (Quick Win)

Problem: Users can pre-select intent and complexity in the UI, but not shape or content_mode. Those are inferred by regex or the Router LLM, and the inference gets them wrong for prompts like "Write 10 blogs".

Fix: Expose shape and content_mode as optional UI controls during generation:

┌─────────────────────────────────────┐
│  What do you want?                  │
│  ┌─────────────────────────────────┐│
│  │ Write 10 blogs on AI safety     ││
│  └─────────────────────────────────┘│
│                                     │
│  Intent: [Create ▼]                │  ← existing
│  Complexity: [Simple ▼]            │  ← existing
│                                     │
│  Output as: [Auto ▼]               │  ← NEW: shape selector
│    • Auto (let AI decide)           │
│    • List of items                  │  → checklist + content_mode=true
│    • Action plan                    │  → project + content_mode=false
│    • Content calendar               │  → calendar + content_mode=true
│    • Document                       │  → document
│    • ...                            │
│                                     │
│  [Generate →]                       │
└─────────────────────────────────────┘

Why this works: The user knows what they want. Let them tell you directly instead of making the AI guess.

Implementation cost: Low — just pass the shape/content_mode through the existing start_generation() parameters and skip shape inference when provided.
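A minimal sketch of the pass-through, assuming start_generation() can hand an optional "Output as" value to a small resolver before running inference. The `OUTPUT_AS_OPTIONS` keys and the `resolve_shape_and_mode()` helper are hypothetical; the (shape, content_mode) pairs come from the selector mock-up above, with Document mapped to content_mode=False to match the Solution 6 override. Inference is passed in as a callable so the sketch stays self-contained.

```python
from typing import Callable, Optional

# Hypothetical UI option keys -> (shape, content_mode), taken from the mock-up.
OUTPUT_AS_OPTIONS: dict[str, tuple[str, bool]] = {
    "list_of_items": ("checklist", True),
    "action_plan": ("project", False),
    "content_calendar": ("calendar", True),
    "document": ("document", False),
}


def resolve_shape_and_mode(
    output_as: Optional[str],
    infer: Callable[[], tuple[str, bool]],
) -> tuple[str, bool]:
    """Use the user's explicit choice if given; otherwise fall back to inference.

    `output_as=None` or "auto" means "let AI decide". `infer` stands in for the
    existing regex/Router inference and is only called on that path.
    """
    if output_as is None or output_as == "auto":
        return infer()
    try:
        return OUTPUT_AS_OPTIONS[output_as]
    except KeyError:
        raise ValueError(f"unknown output_as option: {output_as!r}") from None
```

Because `infer` runs only on the Auto path, an explicit choice skips the regex and Router cost for shape entirely.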

Solution 2: Explicit Classification Confirmation (Medium Effort)

Problem: The organization proposal already pauses for user confirmation — but it only confirms the grouping structure, not the fundamental interpretation (shape + content_mode).

Fix: Add a classification confirmation step before the organization proposal. After the Router classifies, show the user what the engine understood:

{
  "type": "classification_proposal",
  "data": {
    "title": "10 AI Safety Blog Posts",
    "intent": "create",
    "shape": "checklist",
    "content_mode": true,
    "shape_label": "List of items",
    "content_mode_label": "I'll give you the actual blog titles",
    "alternatives": [
      {
        "shape": "project",
        "content_mode": false,
        "label": "Action plan to write 10 blogs",
        "description": "Phases: Research → Outline → Write → Edit → Publish"
      },
      {
        "shape": "calendar",
        "content_mode": true,
        "label": "Blog titles on a publishing schedule",
        "description": "Titles organized by week/month"
      }
    ]
  }
}

The user clicks the interpretation they want → engine proceeds with that classification.

Why this works: Moves the ambiguity resolution to the user instead of making the AI guess. Takes ~2 seconds of user time. The organization proposal already has this pattern — extend it.

Implementation:

Add a classification_proposal SSE event type
Have the Router return 2-3 alternative interpretations (shape × content_mode combos)
Wait for user confirmation (reuse _org_confirmed pattern)
Proceed with confirmed classification
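A minimal sketch of emitting the classification_proposal event and waiting for the user's pick, assuming an asyncio server. `ClassificationGate` and its timeout fallback are hypothetical stand-ins for the existing _org_confirmed pattern, whose actual code isn't shown here; the frame layout follows the Server-Sent Events wire format (an `event:` line, a `data:` line, then a blank line).

```python
import asyncio
import json
from typing import Any, Optional


def sse_frame(event: str, data: dict[str, Any]) -> str:
    """Serialize one Server-Sent Events message with a single-line JSON payload."""
    # json.dumps without indent escapes newlines, so one `data:` line is enough.
    return f"event: {event}\ndata: {json.dumps(data, ensure_ascii=False)}\n\n"


class ClassificationGate:
    """Hypothetical per-session gate: emit a proposal, then wait for the user's pick."""

    def __init__(self) -> None:
        self._confirmed = asyncio.Event()
        self._choice: Optional[dict[str, Any]] = None

    def confirm(self, choice: dict[str, Any]) -> None:
        # Called by the request handler that receives the user's click.
        self._choice = choice
        self._confirmed.set()

    async def wait(self, default: dict[str, Any], timeout: float) -> dict[str, Any]:
        # If the user never answers, proceed with the Router's primary interpretation.
        try:
            await asyncio.wait_for(self._confirmed.wait(), timeout)
        except asyncio.TimeoutError:
            return default
        return self._choice if self._choice is not None else default
```

The timeout is a design choice: a user who ignores the proposal still gets the Router's top-ranked interpretation instead of a stalled generation.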

Solution 3: Improve the Router's Disambiguation (LLM-side fix)

Problem: The Router prompt doesn't adequately teach the LLM about the content_mode × shape interaction. It treats them as independent decisions.

Fix: Add explicit disambiguation rules to ROUTER_SYSTEM:

ROUTER_SYSTEM += """

DISAMBIGUATION RULES (content_mode × shape):

When a prompt could mean EITHER "give me the items" OR "give me a plan to create the items",
use these heuristics:

1. QUANTITY SIGNAL: If the user specifies a number ("10 blogs", "5 recipes", "20 questions"),
   they almost always want that many ITEMS, not a plan to produce them.
   → content_mode=true, shape=checklist (or calendar if scheduling matters)

2. "FOR" SIGNAL: "blogs FOR my website" → action plan (content_mode=false)
   "blogs ON topic X" → content list (content_mode=true)

3. VERB SIGNAL:
   - "Write X" alone → ambiguous (use quantity signal to break tie)
   - "Write AND publish X" → action plan (content_mode=false)
   - "Help me write X" → action plan (content_mode=false)
   - "Give me X" → content list (content_mode=true)
   - "List X" → content list (content_mode=true)

4. PROCESS WORDS: If the prompt mentions process steps (outline, draft, edit, review,
   publish, schedule, coordinate), lean toward content_mode=false.

5. DEFAULT: When still ambiguous after all signals, prefer content_mode=true with
   shape=checklist. Users more often want the output than a meta-plan about producing it.
   It's less annoying to get items when you wanted a plan than to get a plan when you wanted items.
"""

Why this works: The LLM is already doing this classification — it just needs better instructions for the ambiguous cases. The "quantity signal" rule alone would fix your "10 blogs" example.

Implementation cost: Extremely low — just edit ROUTER_SYSTEM in prompts.py.
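Because these rules live only in prose for the LLM, a deterministic keyword mirror of them can serve as a regression oracle: pin a set of prompts, and check that the Router's content_mode agrees with the mirror on the clear-cut ones. This is a minimal sketch; `heuristic_content_mode()` is hypothetical, far cruder than the LLM, and the rule precedence it uses is an assumption, since the prompt text does not rank its rules. The returned labels refer to the numbered rules above.

```python
import re

# Hypothetical keyword mirror of the DISAMBIGUATION RULES, for regression tests.
# Precedence is an assumption: verb > process words > "for" > quantity > default.
_ACTION_VERBS = re.compile(r"\b(help\s+me\s+write|write\s+and\s+publish)\b", re.I)
_ITEM_VERBS = re.compile(r"\b(give\s+me|list)\b", re.I)
_PROCESS_WORDS = re.compile(
    r"\b(outline|draft|edit|review|publish|schedule|coordinate)\b", re.I
)
_FOR_SIGNAL = re.compile(r"\bfor\s+(my|our|the)\b", re.I)
_QUANTITY = re.compile(r"\b\d+\s+[a-z]", re.I)


def heuristic_content_mode(prompt: str) -> tuple[bool, str]:
    """Return (content_mode, rule) as predicted by the rules for `prompt`."""
    if _ACTION_VERBS.search(prompt):
        return False, "verb"        # rule 3: "help me write X", "write and publish X"
    if _ITEM_VERBS.search(prompt):
        return True, "verb"         # rule 3: "give me X", "list X"
    if _PROCESS_WORDS.search(prompt):
        return False, "process"     # rule 4: process steps are mentioned
    if _FOR_SIGNAL.search(prompt):
        return False, "for"         # rule 2: "blogs FOR my website"
    if _QUANTITY.search(prompt):
        return True, "quantity"     # rule 1: "10 blogs", "5 recipes"
    return True, "default"          # rule 5: prefer items when still ambiguous
```

Returning the rule label alongside the decision makes a failing regression case point straight at the rule that should have fired.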

Solution 4: Two-Pass Inference with Confidence Gating (Robust)

Problem: The fast path (pre-selected complexity+intent) uses regex for shape/content_mode — and regex can't handle semantic ambiguity.

Fix: When the fast path's regex inference has low confidence (no pattern matched, fell through to defaults), escalate to a lightweight LLM call specifically for shape+content_mode:

async def infer_shape_with_fallback(intent: str, prompt: str, complexity: str) -> tuple[str, bool]:
    """Infer shape + content_mode. Uses regex first, LLM fallback for ambiguous cases."""

    # Try regex (fast, deterministic). _regex_infer_shape also reports which tier
    # decided the shape: "pattern", "intent_default", or "fallback".
    shape, source = _regex_infer_shape(intent, prompt, complexity)
    content_mode = _regex_infer_content_mode(intent, prompt)

    # Low confidence means no pattern or intent default matched. Testing
    # shape != "project" is not enough: for a simple, short prompt the fallback
    # returns "checklist" ("Write 10 blogs on AI safety"), which is also a guess.
    if source != "fallback":
        return shape, content_mode

    # Ambiguous: ask a fast LLM. fast_llm, SHAPE_DISAMBIGUATOR_SYSTEM and
    # ShapeDecision are assumed to be defined elsewhere in the engine.
    result = await fast_llm.generate_structured(
        system=SHAPE_DISAMBIGUATOR_SYSTEM,
        user=f"Intent: {intent}\nPrompt: {prompt}",
        output_model=ShapeDecision,  # {shape: str, content_mode: bool, reasoning: str}
        max_tokens=200,
        temperature=0.1,
    )
    return result.shape, result.content_mode

Why this works: Keeps the fast path fast for unambiguous cases (wedding → schedule, trip → itinerary), but catches ambiguous cases with a cheap LLM call (~200 tokens, ~0.3s with Haiku).

Solution 5: Expand Content Mode Pattern Coverage

Problem: _CONTENT_MODE_PATTERNS in shapes.py doesn't cover blogs, articles, emails, social posts, or creative writing — which are very common "give me items" prompts.

Fix: Add missing patterns:

# Match these case-insensitively (re.IGNORECASE) so "10 Blogs" also hits.
_CONTENT_MODE_PATTERNS.extend([
    # Blog / article titles
    r"\b(\d+\s+(blog|article|post|essay|piece|column|editorial|op-ed)s?\b)",
    # Social media / content ideas ("stories" is spelled out: "storys?" would miss it)
    r"\b(\d+\s+((tweet|reel|caption|hook|headline|tagline|slogan)s?|stor(y|ies))\b)",
    # Email subjects / templates
    r"\b(\d+\s+(email|subject\s+line|newsletter|drip)s?\b)",
    # Names / titles as creative output ("pitches" needs "es")
    r"\b(\d+\s+((title|name|topic|theme|prompt|idea|concept)s?|pitch(es)?)\b)",
    # Generic "N things": the modifier word is optional, so both "10 tips" and
    # "10 productivity tips" match ("strategies" is spelled out, like "stories")
    r"\b(\d+\s+(\w+\s+)?((idea|tip|trick|hack|way|reason|example|tactic|step|lesson|rule|principle|mistake|myth|fact|stat|quote)s?|strateg(y|ies))\b)",
    # "list of N" or "top N"
    r"\b(list\s+of\s+\d+|top\s+\d+|best\s+\d+|\d+\s+best)\b",
])

Why this works: The "N blogs" pattern is a strong signal for content_mode=true. When someone says "10 blogs", they want titles, not a production plan.

Implementation cost: Trivial — just add regex patterns to shapes.py.
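A quick way to sanity-check new entries is to run them against a few positive and negative prompts before merging. This sketch uses a standalone copy of the corrected patterns, assumes case-insensitive matching, and uses a hypothetical `matches_content_pattern()` helper rather than the engine's own content_mode inference.

```python
import re

# Standalone copy of the Solution 5 patterns, compiled case-insensitively (assumed).
NEW_CONTENT_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\b(\d+\s+(blog|article|post|essay|piece|column|editorial|op-ed)s?\b)",
        r"\b(\d+\s+((tweet|reel|caption|hook|headline|tagline|slogan)s?|stor(y|ies))\b)",
        r"\b(\d+\s+(email|subject\s+line|newsletter|drip)s?\b)",
        r"\b(\d+\s+((title|name|topic|theme|prompt|idea|concept)s?|pitch(es)?)\b)",
        r"\b(\d+\s+(\w+\s+)?((idea|tip|trick|hack|way|reason|example|tactic|step|lesson"
        r"|rule|principle|mistake|myth|fact|stat|quote)s?|strateg(y|ies))\b)",
        r"\b(list\s+of\s+\d+|top\s+\d+|best\s+\d+|\d+\s+best)\b",
    )
]


def matches_content_pattern(prompt: str) -> bool:
    """True if any of the new content-mode patterns fires on `prompt` (hypothetical)."""
    return any(p.search(prompt) for p in NEW_CONTENT_PATTERNS)


for prompt in ("Write 10 blogs on AI safety", "Plan a 3 day trip to Rome"):
    print(prompt, "->", matches_content_pattern(prompt))
```

Negative controls such as "Plan a 3 day trip to Rome" matter as much as the positives: a bare number should not flip content_mode on its own.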

Solution 6: Shape × Intent Compatibility Matrix (Guardrail)

Problem: Some shape × intent × content_mode combinations don't make sense, but the engine doesn't enforce constraints. For example, intent=create + shape=project + content_mode=true is contradictory.

Fix: Add a compatibility matrix that corrects invalid combinations:

# shapes.py

_SHAPE_CONTENT_MODE_OVERRIDES: dict[str, bool | None] = {
    # These shapes ALWAYS force content_mode=False
    "document": False,        # Documents are always action/structure mode
    "comparison": False,      # Comparisons are analysis, not items
    "tracker": False,         # Trackers track items through stages
    "budget": False,          # Budgets are always action (see the shapes table)

    # These shapes ALWAYS imply content_mode=True
    # (none currently, but could add)

    # These shapes respect the inferred content_mode
    "project": None,          # Could go either way
    "checklist": None,        # Flat list of actions OR items
    "calendar": None,         # Content calendar OR schedule of actions
    "routine": None,          # Specific exercises OR "do X" actions
    "itinerary": None,        # Specific places OR travel logistics
    "schedule": None,
    "curriculum": None,
}

_VALID_COMBINATIONS: dict[tuple[str, bool], list[str]] = {
    # (intent, content_mode) → allowed shapes, preferred first.
    # A shape forced to False above must not be listed under content_mode=True,
    # otherwise the two tables contradict each other.
    ("create", True):    ["checklist", "calendar"],
    ("create", False):   ["project", "calendar", "document"],
    ("learn", True):     ["checklist", "curriculum"],
    ("learn", False):    ["curriculum", "project"],
    ("research", True):  ["checklist"],
    ("research", False): ["comparison", "project"],
    # ... etc.
}

def validate_classification(cls: Classification) -> Classification:
    """Fix contradictory shape × intent × content_mode combos (mutates cls in place)."""
    # 1. A shape-level override wins over the inferred content_mode.
    override = _SHAPE_CONTENT_MODE_OVERRIDES.get(cls.output_shape)
    if override is not None:
        cls.content_mode = override

    # 2. If the shape isn't allowed for (intent, content_mode), use the preferred one.
    allowed = _VALID_COMBINATIONS.get((cls.intent, cls.content_mode))
    if allowed and cls.output_shape not in allowed:
        cls.output_shape = allowed[0]

    return cls

Why this works: Even if the LLM or regex makes a mistake, this guardrail catches contradictory combinations and corrects them. Defense in depth.
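The override table and the combination table can drift apart as shapes and intents are added. A minimal sketch of a check that could run at import time or in a unit test, using a trimmed, standalone copy of the tables; `find_contradictions()` is a hypothetical helper, not part of shapes.py.

```python
from typing import Optional

# Trimmed standalone copies of the Solution 6 tables (illustrative subset).
OVERRIDES: dict[str, Optional[bool]] = {
    "document": False, "comparison": False, "tracker": False, "budget": False,
    "project": None, "checklist": None, "calendar": None, "curriculum": None,
}
VALID: dict[tuple[str, bool], list[str]] = {
    ("create", True): ["checklist", "calendar"],
    ("create", False): ["project", "calendar", "document"],
    ("research", True): ["checklist"],
    ("research", False): ["comparison", "project"],
}


def find_contradictions(
    overrides: dict[str, Optional[bool]],
    valid: dict[tuple[str, bool], list[str]],
) -> list[str]:
    """List every allowed shape whose forced content_mode disagrees with its key."""
    problems = []
    for (intent, content_mode), shapes in valid.items():
        if not shapes:
            problems.append(f"({intent}, {content_mode}) allows no shapes")
        for shape in shapes:
            if shape not in overrides:
                problems.append(f"{shape!r} is missing from the overrides table")
                continue
            forced = overrides[shape]
            if forced is not None and forced != content_mode:
                problems.append(
                    f"({intent}, {content_mode}) allows {shape!r}, "
                    f"but {shape!r} forces content_mode={forced}"
                )
    return problems
```

A check like this flags, for example, "document" listed under ("create", content_mode=true), because document forces content_mode=false and validate_classification() could never produce that pair.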

Recommended Implementation Order

Priority	Solution	Effort	Impact	Description
1	#5: Expand content_mode patterns	30 min	High	Catches "N blogs/articles/emails" immediately
2	#3: Improve Router disambiguation	1 hour	High	Better LLM classification for ambiguous prompts
3	#6: Compatibility matrix	2 hours	Medium	Guardrail against contradictory classifications
4	#1: UI shape selector	4 hours	High	User explicitly chooses, bypasses all ambiguity
5	#4: Two-pass inference	3 hours	Medium	LLM fallback for fast-path ambiguity
6	#2: Classification confirmation	6 hours	High	Full user-in-the-loop disambiguation

Start with #5 + #3 — they're cheap and fix the immediate problem. Then add #6 as a safety net. Finally, #1 or #2 for full user control.

Appendix: Current Parameter Reference

All 12 Intents

Intent	Domain	Default Shape	Content Mode Likely?
build	Software/products	project	No
launch	Go-to-market	project	No
campaign	Marketing/outreach	calendar	Sometimes (content ideas)
pipeline	Stages/workflow	project	No
roadmap	Long-term direction	project	No
ops	Processes/compliance	routine	No
event	Coordinated moment	schedule	No
create	Body of work	none (complexity fallback: checklist or project)	Often yes (titles, topics)
learn	Knowledge/skills	curriculum	Sometimes (reading lists)
research	Evaluate/compare	comparison	Sometimes (tool lists)
travel	Trips/journeys	itinerary	Yes (places, activities)
personal	Life goals	routine	Sometimes (habits)

All 11 Shapes

Shape	Phase Semantics	Scheduler	Content Mode Compatible?
project	Work stages	DAG	Both
schedule	Time periods	SLOT	Both
itinerary	Journey days	SLOT	Yes (places)
calendar	Publishing buckets	PASS	Yes (content items)
routine	Cycle segments	SLOT	Yes (exercises)
budget	Spending categories	PASS	No (always action)
curriculum	Learning stages	DAG	Both
checklist	Single flat group	PASS	Both
comparison	Options evaluated	PASS	No (always analysis)
tracker	Status stages	PASS	No (always tracking)
document	Document sections	PASS	No (always structure)

Content Mode Detection

Currently detected by regex patterns in shapes.py:_CONTENT_MODE_PATTERNS. Missing coverage:

Category	Currently Covered	Missing
Meals/food	Yes	-
Books/movies/shows	Yes	-
Gifts/shopping	Yes	-
Travel activities	Yes	-
Exercises/workouts	Yes	-
Interview questions	Yes	-
Names/ideas	Yes	-
Tools/apps	Yes	-
Team activities	Yes	-
Blog/article titles	No	`\d+ blogs/articles/posts`
Email subjects	No	`\d+ emails/subject lines`
Social media content	No	`\d+ tweets/reels/captions`
Creative titles	No	`\d+ titles/topics/themes`
Generic "N things"	No	`\d+ tips/tricks/hacks/ways`
"Top N" / "Best N"	No	`top \d+, best \d+, list of \d+`

How Shape is Inferred (Fast Path)

1. Regex patterns (_PROMPT_SHAPE_PATTERNS)    ← first match wins
   └─ 12 patterns covering 10 shapes: checklist, document, comparison, tracker, calendar,
      budget, routine, schedule, itinerary, curriculum

2. Intent defaults (_INTENT_SHAPE_DEFAULTS)   ← if no regex matched
   └─ learn→curriculum, event→schedule, ops→routine,
      campaign→calendar, research→comparison, travel→itinerary

3. Complexity fallback                        ← if no intent default
   └─ simple + short prompt (< 12 words) → checklist
   └─ else → project (THE CATCH-ALL)

The "project" fallback is the source of most misclassifications. When neither regex nor intent default matches, everything becomes a project plan — even when the user wanted a simple list of items.
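The three tiers can be sketched as one function that also reports which tier decided, which is exactly the signal a confidence gate needs. This is a standalone sketch under stated assumptions: the three patterns are a tiny hypothetical subset of _PROMPT_SHAPE_PATTERNS (the article requirement on "write"/"draft" follows the description of the document pattern, and "wedding"/"trip" follow the examples in Solution 4), the intent defaults are the six listed in step 2, and the 12-word cutoff comes from the "10 blogs" trace.

```python
import re

# Tiny illustrative subset of _PROMPT_SHAPE_PATTERNS (first match wins).
SHAPE_PATTERNS: list[tuple[str, re.Pattern]] = [
    ("document", re.compile(r"\b(write|draft)\s+(a|an|my|the)\b", re.I)),
    ("schedule", re.compile(r"\bwedding\b", re.I)),
    ("itinerary", re.compile(r"\btrip\b", re.I)),
]
# Intent defaults as listed in step 2 above.
INTENT_SHAPE_DEFAULTS = {
    "learn": "curriculum", "event": "schedule", "ops": "routine",
    "campaign": "calendar", "research": "comparison", "travel": "itinerary",
}
SHORT_PROMPT_WORDS = 12  # "words=6 (< 12)" in the "10 blogs" trace


def infer_shape(intent: str, prompt: str, complexity: str) -> tuple[str, str]:
    """Return (shape, source); source is "pattern", "intent_default" or "fallback"."""
    for shape, pattern in SHAPE_PATTERNS:              # 1. regex, first match wins
        if pattern.search(prompt):
            return shape, "pattern"
    if intent in INTENT_SHAPE_DEFAULTS:                # 2. intent default
        return INTENT_SHAPE_DEFAULTS[intent], "intent_default"
    if complexity == "simple" and len(prompt.split()) < SHORT_PROMPT_WORDS:
        return "checklist", "fallback"                 # 3a. simple + short prompt
    return "project", "fallback"                       # 3b. the catch-all
```

Only the "fallback" source is a guess. Gating on it, rather than on shape == "project", also catches the checklist fallback that "Write 10 blogs on AI safety" lands in.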

Appendix: The "10 Blogs" Example Traced Through the Engine

Current Behavior

Input: "Write 10 blogs on AI safety"

Fast Path (complexity=simple, intent=create):
  1. Regex shape check:
     - "write\s+(a|an|my|the)" → doesn't match ("write 10" not "write a")
     - No other pattern matches
  2. Intent default for "create": NOT IN _INTENT_SHAPE_DEFAULTS → no match
  3. Fallback: complexity=simple, words=6 (< 12) → "checklist" ← Actually correct!

  But content_mode check:
  1. _CONTENT_MODE_PATTERNS: no pattern matches "blogs"
  2. Result: content_mode = false ← WRONG

  Final: shape=checklist, content_mode=false
  → Flat list of ACTION steps like "Research AI safety topics", "Draft blog #1 outline"
  → User wanted: "The Hidden Costs of AI Alignment", "Why AI Safety ≠ AI Ethics", ...

Router LLM Path (no pre-selection):
  1. LLM sees "Write 10 blogs on AI safety"
  2. LLM interprets "write" as an action verb → intent=create, content_mode=false
  3. LLM picks shape=project (most "complete" interpretation)
  4. Final: intent=create, shape=project, content_mode=false
  → Phases: Research, Outline, Writing, Editing, Publishing
  → Tasks: "Research AI safety landscape", "Create editorial calendar", ...
  → User wanted: just the 10 titles!

After Fixes (#5 + #3)

Input: "Write 10 blogs on AI safety"

With expanded content_mode patterns (#5):
  Pattern: r"\b(\d+\s+(blog|article|post)s?\b)" → MATCHES "10 blogs"
  → content_mode = true ← CORRECT

With improved Router disambiguation (#3):
  Quantity Signal: "10 blogs" → user wants 10 items, not a plan
  → content_mode = true, shape = checklist ← CORRECT

Final: shape=checklist, content_mode=true
→ Phase: "Blog Topics"
→ Tasks: "The Hidden Costs of AI Alignment", "Why AI Safety ≠ AI Ethics", ...
→ Matches user expectation!
