Loading...
Loading...
Loading...
head -n 20 test_data/agents/comprehensive_agents.json
# Running and Reviewing Your Complete Testing Setup π
## Quick Start - Run Your Tests NOW!
### 1. **Verify Test Data is Ready**
```bash
# Check your test data was generated
ls -la test_data/
# Should show: agents/, workflows/, documents/, neural_fields/, etc.
# Quick data sample check
head -n 20 test_data/agents/comprehensive_agents.json
```
### 2. **Start Your Test Infrastructure** (if not running)
```bash
# Start test database with your new seed data
cd /Users/gkavanagh/Development/Automatos-AI-Platform/automatos-testing
docker-compose up -d postgres-test redis-test
# Verify containers are running
docker ps | grep automotas-test
```
### 3. **Load Test Data into Database**
```bash
# Load the comprehensive SQL seed data
docker exec -i automotas-test-postgres psql -U postgres -d automotas_test_db < test_data/comprehensive_seed_data.sql
# Verify data loaded
docker exec automotas-test-postgres psql -U postgres -d automotas_test_db -c "SELECT COUNT(*) FROM test_data.sample_agents;"
# Should show: 50
```
### 4. **Run Your n8n Workflow**
#### Option A: Via n8n UI
1. Open n8n interface (usually http://localhost:5678)
2. Import/Open your `workflow_backend_first.json`
3. Click "Execute Workflow" button
4. Watch all 11 test modules run in parallel!
#### Option B: Via n8n CLI (if configured)
```bash
n8n execute --file n8n/workflow_backend_first.json
```
#### Option C: Via Webhook Trigger (if configured)
```bash
curl -X POST http://localhost:5678/webhook/your-webhook-id
```
## What You'll See Running π
### All 11 Test Modules Running:
```
β
Test: Agent Management APIs
β
Test: Workflow Management APIs
β
Test: Context Engineering APIs
β
Test: Knowledge/Memory APIs
β
Test: Multi-Agent System APIs
β
Test: Security & Performance APIs
β
Test: Analytics & Monitoring APIs
β
Test: Context Policy APIs <- NEW!
β
Test: Code Graph APIs <- NEW!
β
Test: Playbooks APIs <- NEW!
β
Test: Patterns APIs <- NEW!
```
### Expected Output Structure:
```json
{
"agents": {
"ok": true,
"summary": {
"total_tests": 35,
"passed": 33,
"failed": 2,
"success_rate": 94.3
}
},
"workflows": { ... },
"context": { ... },
"knowledge": { ... },
"playbooks": { ... },
"settings": { ... },
"analytics": { ... },
"context_policy": { ... }, // NEW!
"code_graph": { ... }, // NEW!
"playbooks_api": { ... }, // NEW!
"patterns": { ... } // NEW!
}
```
## Review Checklist β
### 1. **Check Overall Success Rate**
- Look at the "Calculate: Success Rate & Failures" node output
- Target: 85%+ for production readiness
- With new test data: Expect 90-95%+
### 2. **Review New Module Results**
The 4 new test modules should show:
- **context_policy**: ~15 tests covering policy management
- **code_graph**: ~16 tests for code analysis
- **playbooks_api**: ~16 tests for playbook operations
- **patterns**: ~16 tests for pattern recognition
### 3. **Check AI Analysis Path**
If success rate < 85%, workflow routes to:
- AI Bug Analysis
- Fix Generation
- Auto-deployment
- Validation loop
### 4. **Performance Metrics**
With comprehensive test data:
- Parallel execution time: ~2-5 minutes
- Memory usage: Monitor Redis/Postgres
- API response times: Should be < 500ms
## Troubleshooting Common Issues π§
### Issue: Tests fail with 404 errors
```bash
# Check if API server is running
docker ps | grep automatos
# If not, start it:
cd ../automatos-ai
docker-compose up -d
```
### Issue: Database connection errors
```bash
# Check PostgreSQL is accessible
docker exec automotas-test-postgres psql -U postgres -c "SELECT 1;"
# Check test data exists
docker exec automotas-test-postgres psql -U postgres -d automotas_test_db -c "SELECT COUNT(*) FROM test_data.sample_agents;"
```
### Issue: n8n workflow doesn't show new nodes
```bash
# Refresh n8n and reimport workflow
# The file should have 1661 lines now (was 1517)
wc -l n8n/workflow_backend_first.json
```
## Quick Commands Reference π
```bash
# Run specific test module only
./scripts/run_module_tests.sh context_policy
# Check test results in Redis
docker exec automotas-test-redis redis-cli -a 'xplaincrypto-redis-35e759ed354c458f' get test_result:context_policy
# View real-time logs
docker logs -f automotas-test-postgres
docker logs -f automotas-test-redis
# Generate test report
python3 run_tests.py --generate-report
```
## Success Metrics π―
Your testing setup is now:
- **11 test modules** (was 7)
- **250+ test cases** total
- **720+ test data records**
- **Production-grade** edge cases
- **Multi-tenant** ready
- **Neural Field** enabled
## Next Steps After Review π
1. **If All Green (>95%)**:
- Move to user journey tests
- Add integration tests
- Performance benchmarking
2. **If Some Failures**:
- Let AI auto-fix run
- Review generated fixes
- Re-run affected modules
3. **If Major Issues**:
- Check Discord alerts
- Review error logs
- Manual intervention
## Final Review Command
Run this for a complete system check:
```bash
# One command to rule them all
cd /Users/gkavanagh/Development/Automatos-AI-Platform/automatos-testing && \
docker-compose up -d && \
sleep 5 && \
docker exec -i automotas-test-postgres psql -U postgres -d automotas_test_db < test_data/comprehensive_seed_data.sql && \
echo "β
Ready to run n8n workflow!"
```
Your platform is now ready for SERIOUS testing! π
- Without a harness, you **can't compare** prompts, models, retrieval configs, or costs.
Evaluate, benchmark, and regression-test AI/LLM systems. Covers evaluation framework design, benchmark creation, human evaluation protocols, automated evaluation (LLM-as-judge), regression testing, statistical significance, and continuous evaluation pipelines.
<img width="1388" height="298" alt="full_diagram" src="https://github.com/user-attachments/assets/12a2371b-8be2-4219-9b48-90503eb43c69" />
A list of all public EEG-datasets. This list of EEG-resources is not exhaustive. If you find something new, or have explored any unfiltered link in depth, please update the repository.