# Input Data Format Reference

This document describes the data formats used by the Growth Agents system for tracking experiments, hypotheses, and creative variants.

## Overview

The system uses **JSON** as the primary data format. Data is organized hierarchically:

```
Product (Aggregate Root)
├── ProductDefinition (context.json)
└── KernelState (state.json)
    ├── Hypotheses (Dict)
    │   └── CreativeVariants (Array per hypothesis)
    ├── Insights (Dict)
    └── Beliefs (Dict)
```

---

## File Structure

```
products/
├── registry.json           # Master product registry
└── <product_id>/
    ├── context.json        # Product definition & metadata
    └── state.json          # Experiment state (hypotheses, variants, etc.)
```

---

## 1. Product Registry (`products/registry.json`)

Tracks all products and the currently active one.

```json
{
  "products": {
    "prod_acme_wellness": "Acme Wellness App",
    "prod_saas_platform": "SaaS Platform"
  },
  "active_product_id": "prod_acme_wellness",
  "version": "1.0"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `products` | Dict[str, str] | Map of product_id → display name |
| `active_product_id` | string | Currently selected product |
| `version` | string | Registry schema version |

---

## 2. Product Context (`products/<product_id>/context.json`)

Defines the product being marketed.
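As a rough illustration of how the registry and the per-product directory layout fit together, the sketch below resolves the active product from `registry.json` and loads its `context.json`. The function name `load_active_context` and the consistency checks are assumptions made for this example; the real loader lives in the implementation files and may behave differently.

```python
import json
from pathlib import Path


def load_active_context(products_dir: Path) -> dict:
    """Return the parsed context.json of the product marked active in registry.json.

    Hypothetical helper: the name and the checks are illustrative only.
    """
    registry_path = products_dir / "registry.json"
    registry = json.loads(registry_path.read_text(encoding="utf-8"))

    active_id = registry["active_product_id"]
    # Assumption: the active ID should always be a key of the "products" map.
    if active_id not in registry["products"]:
        raise KeyError(f"active product {active_id!r} is not listed in the registry")

    # Per the file structure, each product keeps its files in products/<product_id>/.
    context_path = products_dir / active_id / "context.json"
    context = json.loads(context_path.read_text(encoding="utf-8"))

    # Assumption: the product_id stored in the file matches its directory name.
    if context.get("product_id") != active_id:
        raise ValueError(f"{context_path} does not describe product {active_id!r}")
    return context
```

Calling `load_active_context(Path("products"))` from the directory that contains `products/` would return the context dictionary for whichever product the registry currently marks as active.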
```json
{
  "product_id": "prod_acme_wellness",
  "definition": {
    "name": "Acme Wellness App",
    "tagline": "Your health, simplified",
    "description": "A mobile app that helps adults track their wellness...",
    "target_audience": "Adults 50+ focused on maintaining independence",
    "value_propositions": [
      "Personalized health insights",
      "Easy-to-use interface"
    ],
    "key_benefits": [
      "Stay independent longer",
      "Catch issues early"
    ],
    "brand_voice": "Warm, supportive, empowering",
    "unique_selling_points": [
      "AI-powered analysis",
      "Integration with wearables"
    ],
    "pain_points_addressed": [
      "Fear of health decline",
      "Complexity of health tracking"
    ],
    "price_positioning": "Premium ($9.99/month)",
    "call_to_action_suggestions": [
      "Start Your Free Trial",
      "Get Your Health Insights"
    ]
  },
  "source_urls": ["https://example.com/product-page"],
  "additional_text": "Optional extra context...",
  "enabled_channels": ["facebook", "linkedin"],
  "created_at": "2026-01-09T10:30:00.000000",
  "updated_at": "2026-01-09T15:45:00.000000"
}
```

### Product Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `product_id` | string | Yes | Unique identifier (format: `prod_<sanitized_name>`) |
| `definition.name` | string | Yes | Product display name |
| `definition.tagline` | string | No | Short marketing tagline |
| `definition.description` | string | Yes | Full product description |
| `definition.target_audience` | string | Yes | Target audience description |
| `definition.value_propositions` | string[] | No | List of value props |
| `definition.key_benefits` | string[] | No | List of benefits |
| `definition.brand_voice` | string | No | Brand voice/tone guidance |
| `definition.unique_selling_points` | string[] | No | USPs |
| `definition.pain_points_addressed` | string[] | No | Pain points solved |
| `definition.price_positioning` | string | No | Pricing tier info |
| `definition.call_to_action_suggestions` | string[] | No | Suggested CTAs |
| `source_urls` | string[] | No | Source URLs for context |
| `additional_text` | string | No | Extra context text |
| `enabled_channels` | string[] | No | Active platforms |
| `created_at` | ISO datetime | Yes | Creation timestamp |
| `updated_at` | ISO datetime | Yes | Last update timestamp |

---

## 3. Kernel State (`products/<product_id>/state.json`)

Contains all experiment data: hypotheses, variants, insights, and beliefs.

```json
{
  "schema_version": "1.0",
  "is_halted": false,
  "halt_reason": null,
  "hypotheses": {
    "H-F60P01": { /* Hypothesis object */ },
    "H-F60P02": { /* Hypothesis object */ }
  },
  "insights": {
    "INS-001": { /* Insight object */ }
  },
  "beliefs": {
    "B-001": { /* Belief object */ }
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `schema_version` | string | State schema version ("1.0") |
| `is_halted` | boolean | Whether experiments are paused |
| `halt_reason` | string or null | Reason for halt |
| `hypotheses` | Dict[str, Hypothesis] | All hypotheses keyed by ID |
| `insights` | Dict[str, Insight] | All insights keyed by ID |
| `beliefs` | Dict[str, Belief] | All beliefs keyed by ID |

---

## 4. Hypothesis Structure

Each hypothesis represents a testable marketing claim.
```json
{
  "hypothesis_id": "H-F60P01",
  "statement": "Testing whether emphasizing independence resonates with adults 50+ seeking wellness solutions",
  "independent_variable": "primary_text",
  "dependent_metric": "ctr",
  "audience_scope": "Adults 50+ interested in health and wellness",
  "expected_direction": "increase",
  "confidence_level": "medium",
  "status": "proposed",
  "created_at": "2026-01-09T10:30:00.000000",
  "updated_at": "2026-01-09T15:45:00.000000",
  "expected_magnitude": "20-30% improvement",
  "conclusion": null,
  "abandonment_reason": null,
  "evidence_summary": null,
  "rationale": "Independence is a core value for this demographic",
  "psychological_trigger": "autonomy preservation",
  "risk_factors": "May not resonate with younger audience",
  "success_criteria": "CTR > 2%, CPA < $50",
  "test_duration_suggestion": "14 days",
  "budget_suggestion": "$500 total",
  "creative_brief": "Focus on maintaining active lifestyle",
  "data_quality_flags": [],
  "creative_variants": [
    { /* CreativeVariant object */ },
    { /* CreativeVariant object */ }
  ]
}
```

### Hypothesis Fields

| Field | Type | Required | Valid Values / Notes |
|-------|------|----------|----------------------|
| `hypothesis_id` | string | Yes | Format: `H-<identifier>` |
| `statement` | string | Yes | Max 500 chars |
| `independent_variable` | string | Yes | e.g., "primary_text", "headline", "image" |
| `dependent_metric` | string | Yes | e.g., "ctr", "cpc", "cpa", "conversion_rate" |
| `audience_scope` | string | Yes | Target audience description |
| `expected_direction` | string | Yes | `"increase"`, `"decrease"`, `"change"` |
| `confidence_level` | string | Yes | `"low"`, `"medium"`, `"high"` |
| `status` | string | Yes | `"proposed"`, `"approved"`, `"active"`, `"concluded"`, `"abandoned"` |
| `created_at` | ISO datetime | Yes | Creation timestamp |
| `updated_at` | ISO datetime | Yes | Last update timestamp |
| `expected_magnitude` | string/null | No | e.g., "20-30%" |
| `conclusion` | string/null | No | `"confirmed"`, `"refuted"`, `"inconclusive"` |
| `abandonment_reason` | string/null | No | `"spend_cap"`, `"time_limit"`, `"early_stop"`, `"policy_block"`, `"human_override"` |
| `evidence_summary` | string/null | No | Summary of evidence |
| `rationale` | string/null | No | Why hypothesis was proposed |
| `psychological_trigger` | string/null | No | Psychological principle |
| `risk_factors` | string/null | No | Potential risks |
| `success_criteria` | string/null | No | Success metrics |
| `test_duration_suggestion` | string/null | No | Suggested duration |
| `budget_suggestion` | string/null | No | Suggested budget |
| `creative_brief` | string/null | No | Brief for creatives |
| `data_quality_flags` | string[] | No | Quality concerns |
| `creative_variants` | Variant[] | No | Array of variants |

---

## 5. Creative Variant Structure

Each variant is a specific creative execution tied to a hypothesis.

```json
{
  "variant_id": "V-001",
  "asset_type": "single_image",
  "asset_reference": "pending",
  "description": "Independence-focused messaging for wellness app",
  "created_at": "2026-01-09T11:00:00.000000",
  "platform_id": "linkedin",
  "content_format": "short_form",
  "primary_text": null,
  "headline": "Stay Independent Longer: Understand Your Wellness Now",
  "link_description": null,
  "cta_button": "Learn More",
  "hook": "Your independence matters.",
  "angle": "risk mitigation",
  "rationale": "Direct autonomy-preservation messaging",
  "psychological_angle": "risk mitigation",
  "target_emotion": "relief",
  "differentiation": "Uses explicit autonomy language",
  "image_description": "Active senior adult walking outdoors, checking fitness tracker",
  "image_style": "professional lifestyle photography",
  "image_mood": "confident and empowered"
}
```

### Variant Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `variant_id` | string | Yes | Format: `V-<sequence>` |
| `asset_type` | string | Yes | e.g., "single_image", "video", "carousel" |
| `asset_reference` | string | Yes | Asset URL or "pending" |
| `description` | string | Yes | Brief variant description |
| `created_at` | ISO datetime | Yes | Creation timestamp |
| `platform_id` | string/null | No | `"facebook"`, `"linkedin"`, etc. |
| `content_format` | string/null | No | `"short_form"`, `"long_form"`, `"article"` |
| `primary_text` | string/null | No | Main ad copy (Facebook) |
| `headline` | string/null | No | Ad headline |
| `link_description` | string/null | No | Link description text |
| `cta_button` | string/null | No | CTA button text |
| `hook` | string/null | No | Attention-grabbing opening |
| `angle` | string/null | No | Persuasion angle |
| `rationale` | string/null | No | Why variant was created |
| `psychological_angle` | string/null | No | Psychological lever |
| `target_emotion` | string/null | No | Target emotion |
| `differentiation` | string/null | No | How it differs from others |
| `image_description` | string/null | No | Ideal image description |
| `image_style` | string/null | No | Visual style direction |
| `image_mood` | string/null | No | Image mood/atmosphere |

---

## 6. Metrics Snapshot Structure

Metrics are used for analyzing hypothesis performance. Each snapshot tracks performance data for a specific variant within a hypothesis, enabling A/B comparison analysis.
```json
{
  "hypothesis_id": "H-F60P01",
  "variant_id": "V-001",
  "platform_id": "facebook",
  "period_start": "2026-01-01T00:00:00.000000",
  "period_end": "2026-01-07T23:59:59.000000",
  "impressions": 5000,
  "clicks": 150,
  "conversions": 10,
  "spend": 250.00
}
```

### Metrics Fields

| Field | Type | Required | Constraints |
|-------|------|----------|-------------|
| `hypothesis_id` | string | Yes | Must match existing hypothesis |
| `variant_id` | string/null | No | Variant identifier (e.g., "V-001") for A/B comparison |
| `platform_id` | string/null | No | Platform identifier (e.g., "facebook", "linkedin") |
| `period_start` | ISO datetime | Yes | Start of measurement period |
| `period_end` | ISO datetime | Yes | End of period (must be later than `period_start`) |
| `impressions` | integer | Yes | >= 0 |
| `clicks` | integer | Yes | >= 0, <= impressions |
| `conversions` | integer | Yes | >= 0, <= clicks |
| `spend` | float | Yes | >= 0.0 |

### Variant-Level Metrics for A/B Testing

When ingesting metrics, specify which variant each snapshot belongs to.
This enables:

- Per-variant performance comparison
- Identification of winning variants
- Proper A/B test analysis

Example: metrics for two variants in the same hypothesis:

```json
// Variant A metrics
{
  "hypothesis_id": "H-F60P01",
  "variant_id": "V-001",
  "platform_id": "facebook",
  "impressions": 5000,
  "clicks": 150,
  "conversions": 10,
  "spend": 250.00,
  "period_start": "2026-01-01T00:00:00",
  "period_end": "2026-01-07T23:59:59"
}

// Variant B metrics
{
  "hypothesis_id": "H-F60P01",
  "variant_id": "V-002",
  "platform_id": "facebook",
  "impressions": 5000,
  "clicks": 200,
  "conversions": 15,
  "spend": 250.00,
  "period_start": "2026-01-01T00:00:00",
  "period_end": "2026-01-07T23:59:59"
}
```

### Derived Metrics (Computed)

These are calculated from the base metrics:

| Metric | Formula |
|--------|---------|
| CTR | clicks / impressions * 100 (%) |
| CPC | spend / clicks ($) |
| CPA | spend / conversions ($) |
| Conversion Rate | conversions / clicks * 100 (%) |

### Aggregated Variant Metrics

When analyzing A/B test results, the system aggregates metrics per variant:

```json
{
  "variant_id": "V-001",
  "platform_id": "facebook",
  "impressions": 10000,
  "clicks": 300,
  "conversions": 20,
  "spend": 500.00,
  "ctr": 3.0,
  "cpc": 1.67,
  "cpa": 25.00,
  "conversion_rate": 6.67,
  "snapshot_count": 2
}
```

The analyst uses these aggregated metrics to:

1. Rank variants by performance (conversions, CTR, CPA)
2. Calculate relative performance differences between variants
3. Identify the winning variant for the hypothesis

---

## 7. Referencing Entities

### Product Reference

```
products/<product_id>/
```

Example: `products/prod_acme_wellness/`

### Hypothesis Reference

```
products/<product_id>/state.json → hypotheses.<hypothesis_id>
```

Example: `hypotheses.H-F60P01`

### Variant Reference

```
products/<product_id>/state.json → hypotheses.<hypothesis_id>.creative_variants[<index>]
```

Or by variant_id: find the variant where `variant_id == "V-001"`.

---

## 8. Multi-Entity Support

### Multiple Products

Yes. Each product has its own directory with isolated state:

```
products/
├── prod_acme_wellness/
│   ├── context.json
│   └── state.json
└── prod_saas_platform/
    ├── context.json
    └── state.json
```

### Multiple Hypotheses per Product

Yes. Hypotheses are stored as a dictionary in state.json:

```json
{
  "hypotheses": {
    "H-001": { /* hypothesis 1 */ },
    "H-002": { /* hypothesis 2 */ },
    "H-003": { /* hypothesis 3 */ }
  }
}
```

### Multiple Variants per Hypothesis

Yes. Variants are stored as an array within each hypothesis:

```json
{
  "hypothesis_id": "H-001",
  "creative_variants": [
    { "variant_id": "V-001", "platform_id": "facebook", ... },
    { "variant_id": "V-002", "platform_id": "facebook", ... },
    { "variant_id": "V-003", "platform_id": "linkedin", ... }
  ]
}
```

### Metrics for Multiple Variants

Metrics can be tracked at the variant level using the `variant_id` field. Each metrics snapshot should specify:

- `hypothesis_id` - the parent hypothesis
- `variant_id` - the specific variant (e.g., "V-001", "V-002")
- `platform_id` - the advertising platform (e.g., "facebook", "linkedin")

This enables proper A/B test analysis where you can compare performance across variants.

---

## 9. CSV Export Format

When exporting variants, the system can produce CSV with these columns:

| Column | Description |
|--------|-------------|
| `hypothesis_id` | Parent hypothesis ID |
| `ad_name` / `variant_id` | Variant identifier |
| `asset_type` | Asset format type |
| `platform_id` | Target platform |
| `audience` | Audience scope |
| `hook` | Attention hook |
| `angle` | Persuasion angle |
| `primary_text` / `intro_text` | Main copy (platform-specific) |
| `headline` | Headline text |
| `description` | Description/link description |
| `cta_type` | Call-to-action |
| `content_format` | Format (short_form/long_form) |

---

## 10. JSON Import Example

To import data programmatically, structure it as:

```json
{
  "schema_version": "1.0",
  "is_halted": false,
  "halt_reason": null,
  "hypotheses": {
    "H-IMPORT-001": {
      "hypothesis_id": "H-IMPORT-001",
      "statement": "Testing new messaging approach",
      "independent_variable": "primary_text",
      "dependent_metric": "ctr",
      "audience_scope": "Target audience description",
      "expected_direction": "increase",
      "confidence_level": "medium",
      "status": "proposed",
      "created_at": "2026-01-10T10:00:00.000000",
      "updated_at": "2026-01-10T10:00:00.000000",
      "data_quality_flags": [],
      "creative_variants": []
    }
  },
  "insights": {},
  "beliefs": {}
}
```

---

## 11. Validation Rules

### Required Fields

- Hypothesis: `hypothesis_id`, `statement`, `independent_variable`, `dependent_metric`, `audience_scope`, `expected_direction`, `confidence_level`, `status`, `created_at`, `updated_at`
- Variant: `variant_id`, `asset_type`, `asset_reference`, `description`, `created_at`
- Metrics: `hypothesis_id`, `period_start`, `period_end`, `impressions`, `clicks`, `conversions`, `spend`

### Recommended Fields for A/B Testing

- Metrics: `variant_id` (to enable per-variant comparison), `platform_id` (to filter by platform)

### Enum Validation

All enum fields are validated on load. Invalid values cause deserialization errors.

### Relationship Integrity

- Variants reference parent hypothesis via embedding (no foreign key)
- Metrics reference hypothesis via `hypothesis_id` (must exist)
- Insights reference hypotheses via `evidence_hypothesis_ids[]`

---

## 12. Common Operations

### Add a New Hypothesis

1. Load `state.json`
2. Add new hypothesis object to `hypotheses` dict
3. Save `state.json`

### Add Variants to Hypothesis

1. Load `state.json`
2. Find hypothesis by ID
3. Append to `creative_variants` array
4. Update `updated_at` timestamp
5. Save `state.json`

### Record Metrics

Metrics are logged through the event system, not directly in state.json. Use the CLI or API to record metrics for analysis.
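Before handing a snapshot to the CLI or API, it can help to check it against the constraints in the Metrics Fields table and to preview the derived metrics. The sketch below is one way to do that; the function names `validate_snapshot` and `derived_metrics`, the two-decimal rounding, and the choice to return `None` when a denominator is zero are all assumptions for illustration, not the system's actual behavior.

```python
from datetime import datetime


def validate_snapshot(snapshot: dict) -> list[str]:
    """Return constraint violations; an empty list means the snapshot passes.

    Hypothetical helper that mirrors the documented constraints only.
    """
    errors = []
    start = datetime.fromisoformat(snapshot["period_start"])
    end = datetime.fromisoformat(snapshot["period_end"])
    if end <= start:
        errors.append("period_end must be later than period_start")
    if snapshot["impressions"] < 0:
        errors.append("impressions must be >= 0")
    if not 0 <= snapshot["clicks"] <= snapshot["impressions"]:
        errors.append("clicks must be >= 0 and <= impressions")
    if not 0 <= snapshot["conversions"] <= snapshot["clicks"]:
        errors.append("conversions must be >= 0 and <= clicks")
    if snapshot["spend"] < 0:
        errors.append("spend must be >= 0.0")
    return errors


def derived_metrics(snapshot: dict) -> dict:
    """Compute CTR, CPC, CPA, and conversion rate from the base metrics."""

    def ratio(numerator: float, denominator: float, scale: float = 1.0):
        # Assumption: an undefined ratio is reported as None instead of raising.
        if not denominator:
            return None
        return round(numerator / denominator * scale, 2)

    return {
        "ctr": ratio(snapshot["clicks"], snapshot["impressions"], 100),
        "cpc": ratio(snapshot["spend"], snapshot["clicks"]),
        "cpa": ratio(snapshot["spend"], snapshot["conversions"]),
        "conversion_rate": ratio(snapshot["conversions"], snapshot["clicks"], 100),
    }
```

Applied to the sample snapshot from section 6 (5000 impressions, 150 clicks, 10 conversions, $250.00 spend), `validate_snapshot` reports no violations and `derived_metrics` yields the same CTR, CPC, CPA, and conversion rate as the aggregated example in that section.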
---

## Questions?

For implementation details, see:

- `agents_learning_kernel_types.py` - Data type definitions
- `agents_learning_kernel_state_io.py` - Serialization/deserialization
- `body_services.py` - Service layer for data operations
