# Input Data Format Reference
This document describes the data formats used by the Growth Agents system for tracking experiments, hypotheses, and creative variants.
## Overview
The system uses **JSON** as the primary data format. Data is organized hierarchically:
```
Product (Aggregate Root)
├── ProductDefinition (context.json)
└── KernelState (state.json)
    ├── Hypotheses (Dict)
    │   └── CreativeVariants (Array per hypothesis)
    ├── Insights (Dict)
    └── Beliefs (Dict)
```
---
## File Structure
```
products/
├── registry.json          # Master product registry
└── <product_id>/
    ├── context.json       # Product definition & metadata
    └── state.json         # Experiment state (hypotheses, variants, etc.)
```
---
## 1. Product Registry (`products/registry.json`)
Tracks all products and the currently active one.
```json
{
"products": {
"prod_acme_wellness": "Acme Wellness App",
"prod_saas_platform": "SaaS Platform"
},
"active_product_id": "prod_acme_wellness",
"version": "1.0"
}
```
| Field | Type | Description |
|-------|------|-------------|
| `products` | Dict[str, str] | Map of product_id → display name |
| `active_product_id` | string | Currently selected product |
| `version` | string | Registry schema version |
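
Reading the registry and locating the active product's files can be sketched as follows. This is a minimal illustration that assumes the directory layout shown under File Structure; the helper name `resolve_active_product` and its `root` parameter are hypothetical and not part of the Growth Agents codebase.

```python
import json
from pathlib import Path


def resolve_active_product(root: Path) -> dict:
    """Return the active product's ID, display name, and file paths.

    Hypothetical helper: the real system may resolve products differently.
    `root` is the `products/` directory.
    """
    registry = json.loads((root / "registry.json").read_text(encoding="utf-8"))
    product_id = registry["active_product_id"]
    # Assumption: the active ID must also appear in the `products` map.
    if product_id not in registry["products"]:
        raise KeyError(f"active_product_id {product_id!r} is not in 'products'")
    product_dir = root / product_id
    return {
        "product_id": product_id,
        "display_name": registry["products"][product_id],
        "context_path": product_dir / "context.json",
        "state_path": product_dir / "state.json",
    }
```

Calling `resolve_active_product(Path("products"))` on the registry above would point at `products/prod_acme_wellness/context.json` and `products/prod_acme_wellness/state.json`.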
---
## 2. Product Context (`products/<product_id>/context.json`)
Defines the product being marketed.
```json
{
"product_id": "prod_acme_wellness",
"definition": {
"name": "Acme Wellness App",
"tagline": "Your health, simplified",
"description": "A mobile app that helps adults track their wellness...",
"target_audience": "Adults 50+ focused on maintaining independence",
"value_propositions": [
"Personalized health insights",
"Easy-to-use interface"
],
"key_benefits": [
"Stay independent longer",
"Catch issues early"
],
"brand_voice": "Warm, supportive, empowering",
"unique_selling_points": [
"AI-powered analysis",
"Integration with wearables"
],
"pain_points_addressed": [
"Fear of health decline",
"Complexity of health tracking"
],
"price_positioning": "Premium ($9.99/month)",
"call_to_action_suggestions": [
"Start Your Free Trial",
"Get Your Health Insights"
]
},
"source_urls": ["https://example.com/product-page"],
"additional_text": "Optional extra context...",
"enabled_channels": ["facebook", "linkedin"],
"created_at": "2026-01-09T10:30:00.000000",
"updated_at": "2026-01-09T15:45:00.000000"
}
```
### Product Fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `product_id` | string | Yes | Unique identifier (format: `prod_<sanitized_name>`) |
| `definition.name` | string | Yes | Product display name |
| `definition.tagline` | string | No | Short marketing tagline |
| `definition.description` | string | Yes | Full product description |
| `definition.target_audience` | string | Yes | Target audience description |
| `definition.value_propositions` | string[] | No | List of value props |
| `definition.key_benefits` | string[] | No | List of benefits |
| `definition.brand_voice` | string | No | Brand voice/tone guidance |
| `definition.unique_selling_points` | string[] | No | USPs |
| `definition.pain_points_addressed` | string[] | No | Pain points solved |
| `definition.price_positioning` | string | No | Pricing tier info |
| `definition.call_to_action_suggestions` | string[] | No | Suggested CTAs |
| `source_urls` | string[] | No | Source URLs for context |
| `additional_text` | string | No | Extra context text |
| `enabled_channels` | string[] | No | Active platforms |
| `created_at` | ISO datetime | Yes | Creation timestamp |
| `updated_at` | ISO datetime | Yes | Last update timestamp |
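
The Required column above can be checked before a `context.json` is written. The sketch below walks the dotted field names from the table; `REQUIRED_PRODUCT_FIELDS` and `missing_product_fields` are hypothetical names, and the real loader may report missing fields differently.

```python
# Fields marked "Yes" in the Product Fields table, as dotted paths.
REQUIRED_PRODUCT_FIELDS = (
    "product_id",
    "definition.name",
    "definition.description",
    "definition.target_audience",
    "created_at",
    "updated_at",
)


def missing_product_fields(context: dict) -> list[str]:
    """Return the required dotted paths that are absent from `context`."""
    missing = []
    for dotted in REQUIRED_PRODUCT_FIELDS:
        node = context
        for key in dotted.split("."):
            # Stop at the first level where the path cannot be followed.
            if not isinstance(node, dict) or key not in node:
                missing.append(dotted)
                break
            node = node[key]
    return missing
```

An empty list means every required field is present; the function does not check types or timestamp formats.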
---
## 3. Kernel State (`products/<product_id>/state.json`)
Contains all experiment data: hypotheses, variants, insights, and beliefs.
```json
{
"schema_version": "1.0",
"is_halted": false,
"halt_reason": null,
"hypotheses": {
"H-F60P01": { /* Hypothesis object */ },
"H-F60P02": { /* Hypothesis object */ }
},
"insights": {
"INS-001": { /* Insight object */ }
},
"beliefs": {
"B-001": { /* Belief object */ }
}
}
```
| Field | Type | Description |
|-------|------|-------------|
| `schema_version` | string | State schema version ("1.0") |
| `is_halted` | boolean | Whether experiments are paused |
| `halt_reason` | string or null | Reason for halt |
| `hypotheses` | Dict[str, Hypothesis] | All hypotheses keyed by ID |
| `insights` | Dict[str, Insight] | All insights keyed by ID |
| `beliefs` | Dict[str, Belief] | All beliefs keyed by ID |
---
## 4. Hypothesis Structure
Each hypothesis represents a testable marketing claim.
```json
{
"hypothesis_id": "H-F60P01",
"statement": "Testing whether emphasizing independence resonates with adults 50+ seeking wellness solutions",
"independent_variable": "primary_text",
"dependent_metric": "ctr",
"audience_scope": "Adults 50+ interested in health and wellness",
"expected_direction": "increase",
"confidence_level": "medium",
"status": "proposed",
"created_at": "2026-01-09T10:30:00.000000",
"updated_at": "2026-01-09T15:45:00.000000",
"expected_magnitude": "20-30% improvement",
"conclusion": null,
"abandonment_reason": null,
"evidence_summary": null,
"rationale": "Independence is a core value for this demographic",
"psychological_trigger": "autonomy preservation",
"risk_factors": "May not resonate with younger audience",
"success_criteria": "CTR > 2%, CPA < $50",
"test_duration_suggestion": "14 days",
"budget_suggestion": "$500 total",
"creative_brief": "Focus on maintaining active lifestyle",
"data_quality_flags": [],
"creative_variants": [
{ /* CreativeVariant object */ },
{ /* CreativeVariant object */ }
]
}
```
### Hypothesis Fields
| Field | Type | Required | Valid Values |
|-------|------|----------|--------------|
| `hypothesis_id` | string | Yes | Format: `H-<identifier>` |
| `statement` | string | Yes | Max 500 chars |
| `independent_variable` | string | Yes | e.g., "primary_text", "headline", "image" |
| `dependent_metric` | string | Yes | e.g., "ctr", "cpc", "cpa", "conversion_rate" |
| `audience_scope` | string | Yes | Target audience description |
| `expected_direction` | string | Yes | `"increase"`, `"decrease"`, `"change"` |
| `confidence_level` | string | Yes | `"low"`, `"medium"`, `"high"` |
| `status` | string | Yes | `"proposed"`, `"approved"`, `"active"`, `"concluded"`, `"abandoned"` |
| `created_at` | ISO datetime | Yes | Creation timestamp |
| `updated_at` | ISO datetime | Yes | Last update timestamp |
| `expected_magnitude` | string/null | No | e.g., "20-30%" |
| `conclusion` | string/null | No | `"confirmed"`, `"refuted"`, `"inconclusive"` |
| `abandonment_reason` | string/null | No | `"spend_cap"`, `"time_limit"`, `"early_stop"`, `"policy_block"`, `"human_override"` |
| `evidence_summary` | string/null | No | Summary of evidence |
| `rationale` | string/null | No | Why hypothesis was proposed |
| `psychological_trigger` | string/null | No | Psychological principle |
| `risk_factors` | string/null | No | Potential risks |
| `success_criteria` | string/null | No | Success metrics |
| `test_duration_suggestion` | string/null | No | Suggested duration |
| `budget_suggestion` | string/null | No | Suggested budget |
| `creative_brief` | string/null | No | Brief for creatives |
| `data_quality_flags` | string[] | No | Quality concerns |
| `creative_variants` | Variant[] | No | Array of variants |
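
The Valid Values column translates directly into a lookup table. The following sketch checks the enumerated fields and the 500-character limit on `statement`; `HYPOTHESIS_ENUMS` and `hypothesis_enum_errors` are hypothetical names, and the actual deserializer may raise exceptions rather than collect messages.

```python
# Allowed values copied from the Hypothesis Fields table.
# None is included only for the fields typed string/null.
HYPOTHESIS_ENUMS = {
    "expected_direction": {"increase", "decrease", "change"},
    "confidence_level": {"low", "medium", "high"},
    "status": {"proposed", "approved", "active", "concluded", "abandoned"},
    "conclusion": {"confirmed", "refuted", "inconclusive", None},
    "abandonment_reason": {
        "spend_cap", "time_limit", "early_stop",
        "policy_block", "human_override", None,
    },
}


def hypothesis_enum_errors(hypothesis: dict) -> list[str]:
    """Return one message per field whose value is outside its allowed set."""
    errors = []
    for field, allowed in HYPOTHESIS_ENUMS.items():
        # A missing field reads as None, which only nullable fields accept.
        value = hypothesis.get(field)
        if not isinstance(value, (str, type(None))) or value not in allowed:
            errors.append(f"{field}: invalid value {value!r}")
    if len(hypothesis.get("statement", "")) > 500:
        errors.append("statement: exceeds 500 characters")
    return errors
```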
---
## 5. Creative Variant Structure
Each variant is a specific creative execution tied to a hypothesis.
```json
{
"variant_id": "V-001",
"asset_type": "single_image",
"asset_reference": "pending",
"description": "Independence-focused messaging for wellness app",
"created_at": "2026-01-09T11:00:00.000000",
"platform_id": "linkedin",
"content_format": "short_form",
"primary_text": null,
"headline": "Stay Independent Longer: Understand Your Wellness Now",
"link_description": null,
"cta_button": "Learn More",
"hook": "Your independence matters.",
"angle": "risk mitigation",
"rationale": "Direct autonomy-preservation messaging",
"psychological_angle": "risk mitigation",
"target_emotion": "relief",
"differentiation": "Uses explicit autonomy language",
"image_description": "Active senior adult walking outdoors, checking fitness tracker",
"image_style": "professional lifestyle photography",
"image_mood": "confident and empowered"
}
```
### Variant Fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `variant_id` | string | Yes | Format: `V-<sequence>` |
| `asset_type` | string | Yes | e.g., "single_image", "video", "carousel" |
| `asset_reference` | string | Yes | Asset URL or "pending" |
| `description` | string | Yes | Brief variant description |
| `created_at` | ISO datetime | Yes | Creation timestamp |
| `platform_id` | string/null | No | `"facebook"`, `"linkedin"`, etc. |
| `content_format` | string/null | No | `"short_form"`, `"long_form"`, `"article"` |
| `primary_text` | string/null | No | Main ad copy (Facebook) |
| `headline` | string/null | No | Ad headline |
| `link_description` | string/null | No | Link description text |
| `cta_button` | string/null | No | CTA button text |
| `hook` | string/null | No | Attention-grabbing opening |
| `angle` | string/null | No | Persuasion angle |
| `rationale` | string/null | No | Why variant was created |
| `psychological_angle` | string/null | No | Psychological lever |
| `target_emotion` | string/null | No | Target emotion |
| `differentiation` | string/null | No | How it differs from others |
| `image_description` | string/null | No | Ideal image description |
| `image_style` | string/null | No | Visual style direction |
| `image_mood` | string/null | No | Image mood/atmosphere |
---
## 6. Metrics Snapshot Structure
Metrics are used for analyzing hypothesis performance. Each snapshot tracks performance data for a specific variant within a hypothesis, enabling A/B comparison analysis.
```json
{
"hypothesis_id": "H-F60P01",
"variant_id": "V-001",
"platform_id": "facebook",
"period_start": "2026-01-01T00:00:00.000000",
"period_end": "2026-01-07T23:59:59.000000",
"impressions": 5000,
"clicks": 150,
"conversions": 10,
"spend": 250.00
}
```
### Metrics Fields
| Field | Type | Required | Constraints |
|-------|------|----------|-------------|
| `hypothesis_id` | string | Yes | Must match existing hypothesis |
| `variant_id` | string/null | No | Variant identifier (e.g., "V-001") for A/B comparison |
| `platform_id` | string/null | No | Platform identifier (e.g., "facebook", "linkedin") |
| `period_start` | ISO datetime | Yes | Start of measurement period |
| `period_end` | ISO datetime | Yes | End of period (must be > start) |
| `impressions` | integer | Yes | >= 0 |
| `clicks` | integer | Yes | >= 0, <= impressions |
| `conversions` | integer | Yes | >= 0, <= clicks |
| `spend` | float | Yes | >= 0.0 |
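
The constraints in this table can be expressed as a small validator. This is a sketch under stated assumptions: `metrics_errors` and `known_hypothesis_ids` are hypothetical names, timestamps are parsed with the standard library's `datetime.fromisoformat`, and the real ingestion path may enforce these rules elsewhere.

```python
from datetime import datetime


def metrics_errors(snapshot: dict, known_hypothesis_ids: set[str]) -> list[str]:
    """Check one metrics snapshot against the constraints in the table."""
    errors = []
    if snapshot["hypothesis_id"] not in known_hypothesis_ids:
        errors.append("hypothesis_id does not match an existing hypothesis")

    start = datetime.fromisoformat(snapshot["period_start"])
    end = datetime.fromisoformat(snapshot["period_end"])
    if end <= start:
        errors.append("period_end must be after period_start")

    impressions = snapshot["impressions"]
    clicks = snapshot["clicks"]
    conversions = snapshot["conversions"]
    if impressions < 0:
        errors.append("impressions must be >= 0")
    if not 0 <= clicks <= impressions:
        errors.append("clicks must be >= 0 and <= impressions")
    if not 0 <= conversions <= clicks:
        errors.append("conversions must be >= 0 and <= clicks")
    if snapshot["spend"] < 0:
        errors.append("spend must be >= 0.0")
    return errors
```

The sample snapshot above passes with an empty error list when `"H-F60P01"` is among the known hypothesis IDs.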
### Variant-Level Metrics for A/B Testing
When ingesting metrics, set `variant_id` on each snapshot so the numbers are attributed to a specific variant. This enables:
- Per-variant performance comparison
- Identification of winning variants
- Proper A/B test analysis
Example: Metrics for two variants in the same hypothesis:
```json
// Variant A metrics
{
"hypothesis_id": "H-F60P01",
"variant_id": "V-001",
"platform_id": "facebook",
"impressions": 5000,
"clicks": 150,
"conversions": 10,
"spend": 250.00,
"period_start": "2026-01-01T00:00:00",
"period_end": "2026-01-07T23:59:59"
}
// Variant B metrics
{
"hypothesis_id": "H-F60P01",
"variant_id": "V-002",
"platform_id": "facebook",
"impressions": 5000,
"clicks": 200,
"conversions": 15,
"spend": 250.00,
"period_start": "2026-01-01T00:00:00",
"period_end": "2026-01-07T23:59:59"
}
```
### Derived Metrics (Computed)
These are calculated from the base metrics:
| Metric | Formula |
|--------|---------|
| CTR | clicks / impressions * 100 (%) |
| CPC | spend / clicks ($) |
| CPA | spend / conversions ($) |
| Conversion Rate | conversions / clicks * 100 (%) |
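
The four formulas can be written as one function. The sketch below is illustrative only: `derived_metrics` is a hypothetical name, and both the rounding to two decimals and the use of `None` for a zero denominator are assumptions, since this document does not say how the system handles either.

```python
def derived_metrics(impressions: int, clicks: int, conversions: int, spend: float) -> dict:
    """Compute CTR, CPC, CPA, and conversion rate from base metrics."""

    def ratio(numerator, denominator, scale=1.0):
        # Assumption: an undefined ratio (zero denominator) is reported as None.
        if not denominator:
            return None
        return round(numerator / denominator * scale, 2)

    return {
        "ctr": ratio(clicks, impressions, 100),              # percent
        "cpc": ratio(spend, clicks),                         # dollars per click
        "cpa": ratio(spend, conversions),                    # dollars per conversion
        "conversion_rate": ratio(conversions, clicks, 100),  # percent
    }
```

For 10,000 impressions, 300 clicks, 20 conversions, and $500 of spend, this gives a CTR of 3.0, a CPC of 1.67, a CPA of 25.0, and a conversion rate of 6.67.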
### Aggregated Variant Metrics
When analyzing A/B test results, the system aggregates metrics per variant:
```json
{
"variant_id": "V-001",
"platform_id": "facebook",
"impressions": 10000,
"clicks": 300,
"conversions": 20,
"spend": 500.00,
"ctr": 3.0,
"cpc": 1.67,
"cpa": 25.00,
"conversion_rate": 6.67,
"snapshot_count": 2
}
```
The analyst uses these aggregated metrics to:
1. Rank variants by performance (conversions, CTR, CPA)
2. Calculate relative performance differences between variants
3. Identify the winning variant for the hypothesis
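
The aggregation and ranking steps might look like the sketch below. It groups snapshots by `(variant_id, platform_id)`, sums the base metrics, and ranks by conversions with CTR as the tie-breaker. The function name, the grouping key, and the ranking order are all assumptions; this document does not specify how the analyst weighs conversions, CTR, and CPA.

```python
from collections import defaultdict


def aggregate_by_variant(snapshots: list[dict]) -> list[dict]:
    """Sum snapshots per (variant_id, platform_id) and rank the results."""
    totals = defaultdict(lambda: {
        "impressions": 0, "clicks": 0, "conversions": 0,
        "spend": 0.0, "snapshot_count": 0,
    })
    for snap in snapshots:
        bucket = totals[(snap.get("variant_id"), snap.get("platform_id"))]
        for field in ("impressions", "clicks", "conversions", "spend"):
            bucket[field] += snap[field]
        bucket["snapshot_count"] += 1

    rows = []
    for (variant_id, platform_id), t in totals.items():
        # CTR in percent; None when there were no impressions.
        ctr = round(t["clicks"] / t["impressions"] * 100, 2) if t["impressions"] else None
        rows.append({"variant_id": variant_id, "platform_id": platform_id, **t, "ctr": ctr})

    # Assumed ranking: most conversions first, then highest CTR.
    rows.sort(key=lambda r: (r["conversions"], r["ctr"] or 0.0), reverse=True)
    return rows
```

With the two sample snapshots for `V-001` and `V-002` shown earlier, `V-002` ranks first because it has 15 conversions against 10.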
---
## 7. Referencing Entities
### Product Reference
```
products/<product_id>/
```
Example: `products/prod_acme_wellness/`
### Hypothesis Reference
```
products/<product_id>/state.json → hypotheses.<hypothesis_id>
```
Example: `hypotheses.H-F60P01`
### Variant Reference
```
products/<product_id>/state.json → hypotheses.<hypothesis_id>.creative_variants[<index>]
```
Alternatively, look up the variant by ID: scan `creative_variants` for the entry whose `variant_id` is `"V-001"`.
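
A lookup by `variant_id` can be sketched as a linear scan over the embedded array. `find_variant` is a hypothetical helper operating on an already-loaded `state.json` dictionary.

```python
def find_variant(state: dict, hypothesis_id: str, variant_id: str):
    """Return the matching variant dict, or None if it cannot be found."""
    hypothesis = state.get("hypotheses", {}).get(hypothesis_id)
    if hypothesis is None:
        return None
    # Variants are embedded in the hypothesis, so search its array directly.
    for variant in hypothesis.get("creative_variants", []):
        if variant.get("variant_id") == variant_id:
            return variant
    return None
```

Looking up by ID is less brittle than using an array index, which shifts if variants are reordered.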
---
## 8. Multi-Entity Support
### Multiple Products
Supported. Each product has its own directory with isolated state:
```
products/
├── prod_acme_wellness/
│   ├── context.json
│   └── state.json
└── prod_saas_platform/
    ├── context.json
    └── state.json
```
### Multiple Hypotheses per Product
Supported. Hypotheses are stored as a dictionary in `state.json`, keyed by hypothesis ID:
```json
{
"hypotheses": {
"H-001": { /* hypothesis 1 */ },
"H-002": { /* hypothesis 2 */ },
"H-003": { /* hypothesis 3 */ }
}
}
```
### Multiple Variants per Hypothesis
Supported. Variants are stored as an array within each hypothesis:
```json
{
"hypothesis_id": "H-001",
"creative_variants": [
{ "variant_id": "V-001", "platform_id": "facebook", ... },
{ "variant_id": "V-002", "platform_id": "facebook", ... },
{ "variant_id": "V-003", "platform_id": "linkedin", ... }
]
}
```
### Metrics for Multiple Variants
Metrics can be tracked at the variant level using the `variant_id` field. Each metrics snapshot should specify:
- `hypothesis_id` - the parent hypothesis
- `variant_id` - the specific variant (e.g., "V-001", "V-002")
- `platform_id` - the advertising platform (e.g., "facebook", "linkedin")
This enables proper A/B test analysis where you can compare performance across variants.
---
## 9. CSV Export Format
When exporting variants, the system can produce CSV with these columns:
| Column | Description |
|--------|-------------|
| `hypothesis_id` | Parent hypothesis ID |
| `ad_name` / `variant_id` | Variant identifier |
| `asset_type` | Asset format type |
| `platform_id` | Target platform |
| `audience` | Audience scope |
| `hook` | Attention hook |
| `angle` | Persuasion angle |
| `primary_text` / `intro_text` | Main copy (platform-specific) |
| `headline` | Headline text |
| `description` | Description/link description |
| `cta_type` | Call-to-action |
| `content_format` | Format (short_form/long_form) |
---
## 10. JSON Import Example
To import data programmatically, structure it as:
```json
{
"schema_version": "1.0",
"is_halted": false,
"halt_reason": null,
"hypotheses": {
"H-IMPORT-001": {
"hypothesis_id": "H-IMPORT-001",
"statement": "Testing new messaging approach",
"independent_variable": "primary_text",
"dependent_metric": "ctr",
"audience_scope": "Target audience description",
"expected_direction": "increase",
"confidence_level": "medium",
"status": "proposed",
"created_at": "2026-01-10T10:00:00.000000",
"updated_at": "2026-01-10T10:00:00.000000",
"data_quality_flags": [],
"creative_variants": []
}
},
"insights": {},
"beliefs": {}
}
```
---
## 11. Validation Rules
### Required Fields
- Hypothesis: `hypothesis_id`, `statement`, `independent_variable`, `dependent_metric`, `audience_scope`, `expected_direction`, `confidence_level`, `status`, `created_at`, `updated_at`
- Variant: `variant_id`, `asset_type`, `asset_reference`, `description`, `created_at`
- Metrics: `hypothesis_id`, `period_start`, `period_end`, `impressions`, `clicks`, `conversions`, `spend`
### Recommended Fields for A/B Testing
- Metrics: `variant_id` (to enable per-variant comparison), `platform_id` (to filter by platform)
### Enum Validation
All enum fields are validated on load. Invalid values cause deserialization errors.
### Relationship Integrity
- Variants reference parent hypothesis via embedding (no foreign key)
- Metrics reference hypothesis via `hypothesis_id` (must exist)
- Insights reference hypotheses via `evidence_hypothesis_ids[]`
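
The references inside `state.json` can be checked with a short pass over the loaded dictionary. This sketch covers insight references and, as an added assumption drawn from the examples in this document, that each key in `hypotheses` equals the object's own `hypothesis_id`. The name `integrity_errors` is hypothetical.

```python
def integrity_errors(state: dict) -> list[str]:
    """Return messages for broken references inside a loaded state dict."""
    errors = []
    hypotheses = state.get("hypotheses", {})

    # Assumption: dictionary keys mirror each object's hypothesis_id.
    for key, hypothesis in hypotheses.items():
        if hypothesis.get("hypothesis_id") != key:
            errors.append(f"hypotheses[{key!r}]: key does not match hypothesis_id")

    # Every hypothesis cited by an insight must exist.
    for insight_id, insight in state.get("insights", {}).items():
        for ref in insight.get("evidence_hypothesis_ids", []):
            if ref not in hypotheses:
                errors.append(f"insights[{insight_id!r}]: unknown hypothesis {ref!r}")
    return errors
```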
---
## 12. Common Operations
### Add a New Hypothesis
1. Load `state.json`
2. Add new hypothesis object to `hypotheses` dict
3. Save `state.json`
### Add Variants to Hypothesis
1. Load `state.json`
2. Find hypothesis by ID
3. Append to `creative_variants` array
4. Update `updated_at` timestamp
5. Save `state.json`
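
The five steps above can be sketched as a single function. `add_variant` is a hypothetical helper, not the service-layer API. The duplicate-ID check and the use of naive local time for `updated_at` are assumptions; the timestamp format simply mirrors the examples in this document.

```python
import json
from datetime import datetime
from pathlib import Path


def add_variant(state_path: Path, hypothesis_id: str, variant: dict) -> None:
    """Append `variant` to a hypothesis and persist state.json."""
    # 1. Load state.json
    state = json.loads(state_path.read_text(encoding="utf-8"))
    # 2. Find hypothesis by ID (raises KeyError if it does not exist)
    hypothesis = state["hypotheses"][hypothesis_id]
    variants = hypothesis.setdefault("creative_variants", [])
    # Assumption: variant IDs are unique within a hypothesis.
    if any(v.get("variant_id") == variant["variant_id"] for v in variants):
        raise ValueError(f"variant {variant['variant_id']!r} already exists")
    # 3. Append to the creative_variants array
    variants.append(variant)
    # 4. Update the updated_at timestamp (format matches the examples above)
    hypothesis["updated_at"] = datetime.now().isoformat(timespec="microseconds")
    # 5. Save state.json
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
```

The sketch rewrites the whole file on every call and takes no lock, so it is not safe against concurrent writers.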
### Record Metrics
Metrics are logged through the event system, not directly in `state.json`.
Use the CLI or API to record metrics for analysis.
---
## Questions?
For implementation details, see:
- `agents_learning_kernel_types.py` - Data type definitions
- `agents_learning_kernel_state_io.py` - Serialization/deserialization
- `body_services.py` - Service layer for data operations