# AccessGuard AI - Technical Requirements Document (TRD)
**Version:** 1.0.0
**Date:** March 17, 2026
**Status:** Draft
---
## 1. Executive Summary
AccessGuard AI is an insider threat detection system combining AI-powered behavior analysis with real-time alerts. The system consists of three primary components:
1. **Chrome Extension** - Browser-side monitoring and real-time alerts
2. **Backend Detection Engine** - Sigma rules + ML-based anomaly detection
3. **Web Dashboard** - React-based analytics and investigation interface
All frontend components (extension + dashboard) will implement the **Dark Neumorphic Design System** for consistent, modern UI/UX.
---
## 2. System Architecture
### 2.1 High-Level Architecture
```
┌─────────────────────┐
│  Chrome Extension   │ ──────┐
│  (Browser Monitor)  │       │
└─────────────────────┘       │
                              │ WebSocket/REST
┌─────────────────────┐       │
│   Log Collectors    │ ──────┤
│   (Windows/Linux)   │       │
└─────────────────────┘       │
                              ▼
                    ┌──────────────────┐
                    │   Backend API    │
                    │    (FastAPI)     │
                    └──────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │                   │
            ┌───────▼──────┐     ┌──────▼──────┐
            │ Rules Engine │     │  ML Engine  │
            │   (Sigma)    │     │  (Anomaly)  │
            └───────┬──────┘     └──────┬──────┘
                    │                   │
                    └─────────┬─────────┘
                              │
                    ┌─────────▼─────────┐
                    │    Alert Queue    │
                    │    (RabbitMQ)     │
                    └─────────┬─────────┘
                              │
                    ┌─────────▼─────────┐
                    │    PostgreSQL     │
                    │     InfluxDB      │
                    └─────────┬─────────┘
                              │
                    ┌─────────▼─────────┐
                    │   Web Dashboard   │
                    │      (React)      │
                    └───────────────────┘
```
### 2.2 Technology Stack
**Frontend:**
- React 18+ with TypeScript
- Tailwind CSS (configured with Dark Neumorphic theme)
- Recharts/D3.js for visualizations
- WebSocket client for real-time updates
- Chrome Extension Manifest V3
**Backend:**
- Python 3.11+
- FastAPI (REST API)
- Celery (async task processing)
- RabbitMQ (message queue)
- PostgreSQL (relational data)
- InfluxDB (time-series metrics)
**ML/Detection:**
- Sigma rules (YAML-based detection)
- scikit-learn (behavioral baselining)
- pandas/numpy (data processing)
**Infrastructure:**
- Docker + Docker Compose
- AWS S3 (log archival)
- Nginx (reverse proxy)
- Redis (caching)
---
## 3. Functional Requirements
### 3.1 Chrome Extension
**FR-EXT-001:** Monitor browser downloads in real-time
**FR-EXT-002:** Capture clipboard copy events (with content hashing)
**FR-EXT-003:** Track authentication attempts (login forms)
**FR-EXT-004:** Log browsing activity (URL patterns, not full URLs)
**FR-EXT-005:** Display real-time alerts via browser notifications
**FR-EXT-006:** Block suspicious file downloads (configurable)
**FR-EXT-007:** Warn users before dangerous actions (e.g., bulk downloads)
**FR-EXT-008:** Send telemetry to backend API via REST
**FR-EXT-009:** Implement Dark Neumorphic UI for popup/settings
**FR-EXT-010:** Support offline queuing (sync when online)
### 3.2 Backend Detection Engine
**FR-BE-001:** Ingest logs from multiple sources (Windows Event Log, Syslog, Chrome Extension)
**FR-BE-002:** Normalize logs to common schema
**FR-BE-003:** Execute 40+ Sigma detection rules
**FR-BE-004:** Train ML models on user behavior (4+ weeks baseline)
**FR-BE-005:** Compute anomaly scores for each event
**FR-BE-006:** Aggregate related alerts (deduplication)
**FR-BE-007:** Calculate user risk scores (0-100)
**FR-BE-008:** Generate compliance audit trails
**FR-BE-009:** Send notifications (Slack, email, webhook)
**FR-BE-010:** Archive raw logs to S3 (encrypted, 30-90 day retention)
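As an illustration of FR-BE-002, the sketch below maps two source-specific payloads onto one common shape that mirrors the `events` table in Section 6.1. The raw field names (`userId`, `ts_ms`, `SubjectUserName`, `EventID`, `TimeCreated`) and the function names are hypothetical; the real collectors may emit different keys.

```python
from datetime import datetime, timezone
from typing import Any, Callable

def _from_chrome(raw: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical extension payload: epoch milliseconds in "ts_ms".
    return {
        "user_id": raw["userId"],
        "event_type": raw["type"],
        "timestamp": datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
    }

def _from_windows(raw: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical collector payload: ISO 8601 string in "TimeCreated".
    return {
        "user_id": raw["SubjectUserName"],
        "event_type": f"windows_event_{raw['EventID']}",
        "timestamp": datetime.fromisoformat(raw["TimeCreated"]),
    }

_NORMALIZERS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "chrome": _from_chrome,
    "windows": _from_windows,
}

def normalize(source: str, raw: dict[str, Any]) -> dict[str, Any]:
    """Return an event in the common schema, keeping the raw payload."""
    if source not in _NORMALIZERS:
        raise ValueError(f"unsupported source: {source!r}")
    event = _NORMALIZERS[source](raw)
    event["source"] = source
    event["raw_data"] = raw
    return event
```

Keeping one small function per source means a new collector only adds an entry to `_NORMALIZERS` and leaves the downstream rules untouched.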
### 3.3 Web Dashboard
**FR-DASH-001:** Executive summary page (KPIs, trends, top 5 users)
**FR-DASH-002:** Real-time alert queue (sortable, filterable)
**FR-DASH-003:** User timeline view (complete activity history)
**FR-DASH-004:** Risk ranking page (scored user list)
**FR-DASH-005:** Alert investigation drill-down (context, related events)
**FR-DASH-006:** Compliance reporting (SOX, HIPAA, GDPR templates)
**FR-DASH-007:** Rule management interface (enable/disable, tune thresholds)
**FR-DASH-008:** User search and filtering
**FR-DASH-009:** WebSocket-based real-time updates
**FR-DASH-010:** Implement Dark Neumorphic Design System throughout
**FR-DASH-011:** Role-based access control (Admin, Analyst, Viewer)
**FR-DASH-012:** Export reports (PDF, CSV)
---
## 4. Non-Functional Requirements
### 4.1 Performance
**NFR-PERF-001:** Alert latency < 5 seconds (from event to dashboard)
**NFR-PERF-002:** Dashboard page load < 2 seconds
**NFR-PERF-003:** Support 500 concurrent users
**NFR-PERF-004:** Process 10,000 events/second
**NFR-PERF-005:** Database queries < 500ms (95th percentile)
### 4.2 Reliability
**NFR-REL-001:** System availability 99.5% (production)
**NFR-REL-002:** Zero data loss (message queue persistence)
**NFR-REL-003:** Graceful degradation (if ML engine fails, rules still work)
**NFR-REL-004:** Automatic retry for failed log ingestion
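One common way to meet NFR-REL-004 is retry with exponential backoff. The following is a minimal sketch, not the project's chosen mechanism: the function name, the default of five attempts, the 0.5-second base delay, and the choice of `ConnectionError` as the retryable error are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Delay doubles after each failure: 0.5 s, 1 s, 2 s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so tests can record delays instead of waiting.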
### 4.3 Security
**NFR-SEC-001:** All data encrypted at rest (AES-256)
**NFR-SEC-002:** TLS 1.3 for all network communication
**NFR-SEC-003:** JWT-based authentication (15-minute expiry)
**NFR-SEC-004:** Role-based access control (RBAC)
**NFR-SEC-005:** Audit logging for all admin actions
**NFR-SEC-006:** PII masking in logs (email, SSN, credit cards)
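A rough sketch of NFR-SEC-006 using regular expressions is shown below. The patterns are deliberately simple assumptions (US-style SSNs, 13 to 16 digit card numbers with optional spaces or hyphens) and will both miss some PII and over-match some digit runs; a production masker would need validation such as a Luhn check.

```python
import re

# SSNs are masked before card numbers so each substitution sees the text
# left by the previous one. Placeholders are illustrative.
_PII_PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[CARD]"),
]

def mask_pii(message: str) -> str:
    """Replace email addresses, SSNs, and card-like numbers with placeholders."""
    for pattern, placeholder in _PII_PATTERNS:
        message = pattern.sub(placeholder, message)
    return message
```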
### 4.4 Compliance
**NFR-COMP-001:** GDPR-compliant data retention (configurable)
**NFR-COMP-002:** SOX audit trail (immutable logs)
**NFR-COMP-003:** HIPAA-compliant encryption and access controls
**NFR-COMP-004:** Data deletion API (right to be forgotten)
### 4.5 Usability
**NFR-UX-001:** Dashboard responsive (desktop, tablet)
**NFR-UX-002:** False positive rate < 10% by Month 3
**NFR-UX-003:** Consistent Dark Neumorphic UI across all interfaces
**NFR-UX-004:** Accessibility (WCAG 2.1 AA target)
**NFR-UX-005:** Onboarding tutorial for new users
---
## 5. Design System Implementation
### 5.1 Dark Neumorphic Theme Application
All frontend components must implement the design system from `style.json`:
**Color Palette:**
- Primary Background: `#0f0f1e`
- Secondary Background: `#1a1a2e`
- Accent Colors: Magenta (`#ff006e`), Cyan (`#00d4ff`), Orange (`#ff6b35`)
- Text: White (`#ffffff`) with secondary gray tones
**Component Styling:**
- Cards: Neumorphic shadows with glass effect
- Buttons: Gradient backgrounds with glow effects
- Inputs: Subtle borders with focus glow
- Charts: Dark backgrounds with neon accent colors
**Typography:**
- Primary Font: Inter, Segoe UI, Roboto
- Monospace: Fira Code (for logs/code)
- Font Sizes: 0.75rem - 3rem scale
**Effects:**
- Glow shadows on interactive elements
- Smooth transitions (300ms cubic-bezier)
- Backdrop blur for overlays
- Neon text shadows for emphasis
### 5.2 Component Library
Build reusable React components:
- `<Card>` - Neumorphic card container
- `<Button>` - Primary, secondary, ghost variants
- `<MetricCard>` - KPI display with gradient background
- `<ChartContainer>` - Wrapper for charts with dark theme
- `<Badge>` - Status indicators (magenta, cyan, success)
- `<ProgressRing>` - Circular progress with glow
- `<Input>` - Form inputs with focus effects
- `<Sidebar>` - Navigation sidebar
- `<Navbar>` - Top navigation bar
---
## 6. Data Models
### 6.1 PostgreSQL Schema
**users**
```sql
CREATE TABLE users (
    id                UUID PRIMARY KEY,
    username          VARCHAR(255) UNIQUE,
    email             VARCHAR(255),
    department        VARCHAR(100),
    role              VARCHAR(50),
    risk_score        INTEGER CHECK (risk_score BETWEEN 0 AND 100),
    baseline_computed BOOLEAN,
    created_at        TIMESTAMP,
    updated_at        TIMESTAMP
);
```
**events**
```sql
CREATE TABLE events (
    id              UUID PRIMARY KEY,
    user_id         UUID REFERENCES users (id),
    event_type      VARCHAR(100),
    source          VARCHAR(50),  -- 'windows', 'chrome', 'linux'
    timestamp       TIMESTAMP,
    raw_data        JSONB,
    normalized_data JSONB,
    risk_score      INTEGER,
    created_at      TIMESTAMP
);
CREATE INDEX ON events (user_id, timestamp);
CREATE INDEX ON events (event_type, timestamp);
```
**alerts**
```sql
CREATE TABLE alerts (
    id          UUID PRIMARY KEY,
    user_id     UUID REFERENCES users (id),
    alert_type  VARCHAR(100),
    severity    VARCHAR(20),  -- 'critical', 'high', 'medium', 'low'
    title       VARCHAR(255),
    description TEXT,
    rule_id     VARCHAR(100),
    event_ids   UUID[],       -- related events
    status      VARCHAR(20),  -- 'new', 'investigating', 'resolved', 'false_positive'
    assigned_to UUID REFERENCES users (id),  -- nullable; target table is an assumption
    created_at  TIMESTAMP,
    updated_at  TIMESTAMP,
    resolved_at TIMESTAMP     -- nullable
);
CREATE INDEX ON alerts (status, created_at);
CREATE INDEX ON alerts (user_id, created_at);
```
**detection_rules**
```sql
CREATE TABLE detection_rules (
    id                   UUID PRIMARY KEY,
    name                 VARCHAR(255),
    rule_type            VARCHAR(50),  -- 'sigma', 'ml'
    enabled              BOOLEAN,
    sigma_yaml           TEXT,         -- nullable
    threshold            FLOAT,        -- nullable
    false_positive_count INTEGER,
    true_positive_count  INTEGER,
    created_at           TIMESTAMP,
    updated_at           TIMESTAMP
);
```
### 6.2 InfluxDB Measurements
**user_activity_metrics**
- Fields: login_count, file_access_count, download_count, failed_auth_count
- Tags: user_id, department, time_bucket (hourly)
**system_metrics**
- Fields: events_processed, alerts_generated, processing_latency_ms
- Tags: component (rules_engine, ml_engine, api)
---
## 7. API Specifications
### 7.1 REST API Endpoints
**Authentication:**
```
POST /api/v1/auth/login
POST /api/v1/auth/logout
POST /api/v1/auth/refresh
```
**Alerts:**
```
GET /api/v1/alerts # List alerts (paginated, filtered)
GET /api/v1/alerts/{id} # Get alert details
PATCH /api/v1/alerts/{id} # Update alert status
POST /api/v1/alerts/{id}/comment # Add investigation comment
```
**Users:**
```
GET /api/v1/users # List users with risk scores
GET /api/v1/users/{id} # Get user profile
GET /api/v1/users/{id}/timeline # Get user activity timeline
GET /api/v1/users/{id}/risk # Get risk score breakdown
```
**Events:**
```
POST /api/v1/events # Ingest event (from Chrome extension)
GET /api/v1/events # Query events (admin only)
```
**Rules:**
```
GET /api/v1/rules # List detection rules
PATCH /api/v1/rules/{id} # Update rule (enable/disable, tune)
POST /api/v1/rules/test # Test rule against sample data
```
**Dashboard:**
```
GET /api/v1/dashboard/summary # Executive summary KPIs
GET /api/v1/dashboard/trends # Alert trends (time-series)
```
### 7.2 WebSocket Events
**Client → Server:**
```
subscribe_alerts # Subscribe to real-time alerts
subscribe_user:{id} # Subscribe to user activity updates
```
**Server → Client:**
```
new_alert # New alert created
alert_updated # Alert status changed
user_risk_updated # User risk score changed
```
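The subscription messages above imply a server-side registry that maps topics to connected clients. A minimal in-memory sketch follows; the class name, the topic naming, and the routing of `new_alert` and `alert_updated` to alert subscribers and `user_risk_updated` to per-user subscribers are assumptions, not part of the specification.

```python
from collections import defaultdict

# Assumed routing: these server events go to `subscribe_alerts` clients.
ALERT_EVENTS = {"new_alert", "alert_updated"}

class SubscriptionRegistry:
    def __init__(self) -> None:
        self._subscribers: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, client_id: str, message: str) -> str:
        """Register a client for the topic named by a client message."""
        if message == "subscribe_alerts":
            topic = "alerts"
        elif message.startswith("subscribe_user:") and message != "subscribe_user:":
            topic = "user:" + message.split(":", 1)[1]
        else:
            raise ValueError(f"unsupported subscription: {message!r}")
        self._subscribers[topic].add(client_id)
        return topic

    def recipients(self, event: str, user_id: str) -> set[str]:
        """Return the client IDs that should receive a server event."""
        if event in ALERT_EVENTS:
            topic = "alerts"
        elif event == "user_risk_updated":
            topic = f"user:{user_id}"
        else:
            raise ValueError(f"unknown event: {event!r}")
        return set(self._subscribers.get(topic, set()))
```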
---
## 8. Detection Rules
### 8.1 Tier 1 Rules (Critical - Immediate Response)
**RULE-001: Privilege Escalation**
- Trigger: Non-admin user executes admin command
- Severity: Critical
- Response: Block + immediate alert
**RULE-002: Mass Data Exfiltration**
- Trigger: 10+ sensitive files accessed in 1 hour
- Severity: Critical
- Response: Alert + notify CISO
**RULE-003: Lateral Movement**
- Trigger: Unauthorized access to admin shares
- Severity: Critical
- Response: Alert + block network access
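The RULE-002 trigger (10+ sensitive files in 1 hour) can be evaluated with a per-user sliding window. The sketch below is one possible evaluation strategy, not the Sigma rule itself; the class name is hypothetical, and it assumes events for each user arrive in timestamp order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

class MassAccessDetector:
    def __init__(self, threshold: int = 10, window: timedelta = timedelta(hours=1)) -> None:
        self._threshold = threshold
        self._window = window
        self._accesses: dict[str, deque] = defaultdict(deque)

    def record(self, user_id: str, accessed_at: datetime) -> bool:
        """Record one sensitive-file access; return True if the rule fires."""
        recent = self._accesses[user_id]
        recent.append(accessed_at)
        # Drop accesses that have aged out of the window.
        while recent and accessed_at - recent[0] > self._window:
            recent.popleft()
        return len(recent) >= self._threshold
```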
### 8.2 Tier 2 Rules (High - Daily Review)
**RULE-004: Off-Hours Admin Activity**
- Trigger: Admin activity outside 9am-6pm
- Severity: High
- Response: Alert
**RULE-005: Credential Misuse**
- Trigger: 3+ failed logins followed by a successful login
- Severity: High
- Response: Alert + require MFA
**RULE-006: Suspicious Downloads**
- Trigger: Download of source code, databases, trade secrets
- Severity: High
- Response: Alert + log file hash
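The RULE-005 trigger can be read as "a successful login immediately preceded by at least three consecutive failures". The sketch below implements that reading; the function name is hypothetical, and the source does not state whether the failures must be consecutive or fall within a time window, so both points are assumptions.

```python
from typing import Iterable

def detect_credential_misuse(attempts: Iterable[bool], min_failures: int = 3) -> bool:
    """`attempts` is one user's login outcomes in time order (True = success)."""
    consecutive_failures = 0
    for succeeded in attempts:
        if succeeded:
            if consecutive_failures >= min_failures:
                return True
            consecutive_failures = 0  # a clean success resets the streak
        else:
            consecutive_failures += 1
    return False
```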
### 8.3 Tier 3 Rules (Medium - Weekly Review)
**RULE-007: Role Misuse**
- Trigger: User accesses files outside department
- Severity: Medium
- Response: Log + weekly report
**RULE-008: Unusual Login Pattern**
- Trigger: Login from new location/device
- Severity: Medium
- Response: Alert + require verification
**RULE-009: Hacking Tool Execution**
- Trigger: Mimikatz, psexec, nmap detected
- Severity: Medium
- Response: Alert + quarantine process
**RULE-010: Account Enumeration**
- Trigger: Bulk AD queries by non-admin
- Severity: Medium
- Response: Alert
---
## 9. ML Models
### 9.1 Behavioral Baselining
**Model:** Isolation Forest (anomaly detection)
**Features:**
- Login times (hour of day, day of week)
- File access patterns (count, file types, directories)
- Download frequency and volume
- Failed authentication rate
- Network access patterns
**Training:**
- Minimum 4 weeks of data per user
- Retrain weekly with new data
- Separate models per department
**Output:**
- Anomaly score (0-1, higher = more anomalous)
- Contributes 30% to overall risk score
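The sketch below shows how scikit-learn's `IsolationForest` could produce a score on the scale described above. The training data is synthetic and the three feature columns (login hour, files accessed, downloads) are a hypothetical subset of the listed features. Negating `score_samples` is one possible mapping: scikit-learn returns the negated isolation score, so the negation lies in (0, 1] with higher meaning more anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=0)

# Synthetic baseline for illustration only: [login_hour, files_accessed, downloads]
baseline = np.column_stack([
    rng.normal(10, 1.5, 500),
    rng.normal(20, 5, 500),
    rng.normal(2, 1, 500),
])

model = IsolationForest(n_estimators=100, random_state=0).fit(baseline)

def anomaly_score(model: IsolationForest, features: list[float]) -> float:
    """Return a score in (0, 1]; higher means more anomalous."""
    return float(-model.score_samples(np.asarray([features]))[0])

typical = anomaly_score(model, [10, 20, 2])    # close to the baseline centre
unusual = anomaly_score(model, [3, 400, 150])  # 3 a.m. login with bulk access
```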
### 9.2 Risk Scoring Algorithm
```
risk_score = (
0.40 * sigma_rule_score + # Sigma rule matches
0.30 * ml_anomaly_score + # ML anomaly detection
0.20 * historical_incidents + # Past incidents
0.10 * privilege_level # User privilege level
)
# Result is normalized to a 0-100 scale
```
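A runnable version of the weighted sum might look like the sketch below. It assumes every component has already been scaled to the range 0 to 1 (the source only specifies this for the ML anomaly score) and that normalization means multiplying by 100 and rounding.

```python
WEIGHTS = {
    "sigma_rule_score": 0.40,      # Sigma rule matches
    "ml_anomaly_score": 0.30,      # ML anomaly detection
    "historical_incidents": 0.20,  # Past incidents
    "privilege_level": 0.10,       # User privilege level
}

def compute_risk_score(components: dict[str, float]) -> int:
    """Combine 0-1 component scores into a 0-100 risk score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        total += weight * value
    return round(total * 100)

score = compute_risk_score({
    "sigma_rule_score": 0.8,
    "ml_anomaly_score": 0.5,
    "historical_incidents": 0.2,
    "privilege_level": 1.0,
})
```

For these inputs the weighted sum is 0.32 + 0.15 + 0.04 + 0.10 = 0.61, which gives a risk score of 61.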
---
## 10. Deployment Architecture
### 10.1 Production Environment
**Components:**
- 2x API servers (load balanced)
- 1x PostgreSQL primary + 1x read replica
- 1x InfluxDB
- 1x RabbitMQ cluster (3 nodes)
- 1x Redis (caching)
- 1x Nginx (reverse proxy + SSL termination)
**Scaling:**
- Horizontal: Add API servers behind load balancer
- Vertical: Increase database resources
- Queue: Add RabbitMQ nodes for high throughput
### 10.2 Development Environment
**Docker Compose:**
```yaml
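services:
  api:                 # FastAPI
    build: ./backend   # hypothetical path
  postgres:
    image: postgres
  influxdb:
    image: influxdb
  rabbitmq:
    image: rabbitmq
  redis:
    image: redis
  frontend:            # React dev server
    build: ./frontend  # hypothetical path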
```
---
## 11. Testing Strategy
### 11.1 Unit Tests
- Backend: pytest (80% coverage target)
- Frontend: Jest + React Testing Library (70% coverage)
- Detection rules: Test against known malicious patterns
### 11.2 Integration Tests
- API endpoint tests (all CRUD operations)
- WebSocket connection tests
- Database migration tests
### 11.3 End-to-End Tests
- User login → view alerts → investigate → resolve
- Chrome extension → event ingestion → alert generation
- Rule trigger → notification delivery
### 11.4 Performance Tests
- Load testing: 10,000 events/second
- Stress testing: 1,000 concurrent dashboard users (2x the NFR-PERF-003 target of 500)
- Latency testing: Alert generation < 5 seconds
### 11.5 Security Tests
- Penetration testing (external consultant)
- SQL injection tests
- XSS/CSRF tests
- Authentication bypass attempts
---
## 12. Monitoring & Observability
### 12.1 Metrics
**System Metrics:**
- Events processed per second
- Alert generation rate
- API response times (p50, p95, p99)
- Database query performance
- Queue depth (RabbitMQ)
**Business Metrics:**
- False positive rate
- True positive rate
- Mean time to detect (MTTD)
- Mean time to respond (MTTR)
- User adoption rate (Chrome extension)
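Two of these metrics can be derived directly from the `alerts` table. The sketch below assumes the false positive rate is the share of closed alerts (status `resolved` or `false_positive`) that were marked `false_positive`, and approximates response time as `resolved_at - created_at`; the document does not define either metric formally, so both definitions and the function names are assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean

def false_positive_rate(statuses: list[str]) -> float:
    """Share of closed alerts that were marked as false positives."""
    closed = [s for s in statuses if s in ("resolved", "false_positive")]
    if not closed:
        return 0.0
    return closed.count("false_positive") / len(closed)

def mean_time_to_resolve(alerts: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved_at - created_at); requires at least one alert."""
    durations = [(resolved - created).total_seconds() for created, resolved in alerts]
    return timedelta(seconds=mean(durations))
```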
### 12.2 Logging
**Log Levels:**
- ERROR: System failures, exceptions
- WARN: Degraded performance, retries
- INFO: Normal operations, audit events
- DEBUG: Detailed troubleshooting (dev only)
**Log Aggregation:**
- Centralized logging (ELK stack or CloudWatch)
- Structured JSON logs
- Correlation IDs for request tracing
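Structured JSON logs with correlation IDs can be produced with the standard library alone. In the sketch below, the logger name `accessguard.api`, the field names, and the use of a `ContextVar` to carry the ID are illustrative choices rather than settled design.

```python
import io
import json
import logging
from contextvars import ContextVar
from datetime import datetime, timezone

# Holds the ID of the request currently being handled ("-" outside a request).
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })

def build_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("accessguard.api")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger

# Usage: set the ID once per request, then log as usual.
buffer = io.StringIO()
logger = build_logger(buffer)
correlation_id.set("req-123")
logger.info("alert created")
entry = json.loads(buffer.getvalue())
```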
### 12.3 Alerting
**System Alerts:**
- API server down
- Database connection failures
- Queue backlog > 10,000 messages
- Disk space < 10%
**Business Alerts:**
- False positive rate > 15%
- No events received in 5 minutes
- Critical alert not acknowledged in 10 minutes
---
## 13. Security Considerations
### 13.1 Data Privacy
- PII masking in logs (email, SSN, credit cards)
- Configurable data retention (30-90 days)
- Data deletion API (GDPR compliance)
- Employee transparency (notify users of monitoring)
### 13.2 Access Control
- JWT-based authentication
- Role-based access control (Admin, Analyst, Viewer)
- API rate limiting (100 requests/minute per user)
- Audit logging for all admin actions
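The per-user limit of 100 requests/minute could be enforced with a sliding-window counter. The sketch below keeps state in process memory for clarity; since Redis is already in the stack, a shared store would be the more likely home for these counters, but that is an assumption, as are the class and method names.

```python
from collections import defaultdict, deque

class SlidingWindowRateLimiter:
    def __init__(self, limit: int = 100, window_seconds: float = 60.0) -> None:
        self._limit = limit
        self._window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        """Return True and record the request if the user is under the limit."""
        hits = self._hits[user_id]
        # Forget requests that are at least one full window old.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the limiter deterministic under test.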
### 13.3 Encryption
- TLS 1.3 for all network traffic
- AES-256 for data at rest (S3, database)
- Encrypted backups
- Secure key management (AWS KMS or HashiCorp Vault)
---
## 14. Compliance Requirements
### 14.1 GDPR
- Data retention policies
- Right to be forgotten (data deletion API)
- Data portability (export user data)
- Consent management (opt-in for monitoring)
### 14.2 SOX
- Immutable audit trails
- Access control logging
- Change management tracking
- Quarterly compliance reports
### 14.3 HIPAA
- Encrypted data storage and transmission
- Access control and authentication
- Audit logging
- Business associate agreements (if applicable)
---
## 15. Success Criteria
### 15.1 Technical Metrics
- ✓ Alert latency < 5 seconds
- ✓ False positive rate < 10% by Month 3
- ✓ 95%+ Chrome extension adoption
- ✓ 99.5%+ system availability
- ✓ 0 false negatives on test scenarios
### 15.2 Business Metrics
- ✓ Insider threat detected within 10 minutes
- ✓ Investigation time reduced by 80%
- ✓ Compliance audit findings drop 50%
- ✓ Prevention of 1-2 insider incidents per year
### 15.3 User Satisfaction
- ✓ IT security team rating: 7/10 or higher
- ✓ <5 support tickets per month
- ✓ Monthly tuning sessions with IT team
---
## 16. Risks & Mitigation
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|-----------|
| False positive explosion | High | Medium | Weekly rule tuning, start with high-confidence rules only |
| Small team burnout | High | Medium | Ruthless prioritization, consider contractors |
| Privacy violations | Critical | Low | Early legal review, clear retention policies |
| Log ingestion bottleneck | High | Medium | Use message queue, load test early, plan for Kafka |
| Integration complexity | Medium | High | Start simple (direct logs), build connectors incrementally |
| ML model drift | Medium | Medium | Retrain weekly, monitor anomaly score distribution |
| Chrome extension adoption | High | Medium | Gradual rollout, user training, executive sponsorship |
---
## 17. Dependencies
### 17.1 External Dependencies
- Legal team approval (data collection, retention)
- IT security team buy-in
- Executive sponsorship
- Budget approval ($500-700k)
### 17.2 Technical Dependencies
- Windows Event Log access
- Active Directory integration
- Chrome browser deployment (managed)
- AWS account (on-premises deployment is out of scope for v1.0; see Section 19)
---
## 18. Assumptions
1. Organization has <500 users
2. Windows-based infrastructure (primary)
3. Chrome is the standard browser
4. IT team available for weekly tuning
5. Legal approval for employee monitoring
6. Budget available for 3-person team + infrastructure
---
## 19. Out of Scope
The following are explicitly out of scope for v1.0:
- Mobile device monitoring (iOS, Android)
- Email content analysis
- Network packet inspection
- Endpoint DLP (data loss prevention)
- Integration with SIEM (Splunk, QRadar) - planned for v2.0
- Multi-tenancy support
- On-premises deployment (cloud-only for v1.0)
---
## 20. Glossary
- **Sigma Rules:** Open-source detection rule format (YAML-based)
- **Behavioral Baselining:** ML technique to learn "normal" user behavior
- **Risk Score:** 0-100 metric indicating user threat level
- **Neumorphic Design:** UI style combining flat design with subtle shadows
- **Lateral Movement:** Attacker moving between systems after initial compromise
- **Privilege Escalation:** Gaining higher access than authorized
- **Data Exfiltration:** Unauthorized data transfer outside organization
---
## 21. Approval
| Role | Name | Signature | Date |
|------|------|-----------|------|
| Project Sponsor | [Name] | _________ | _____ |
| Technical Lead | [Name] | _________ | _____ |
| Security Lead | [Name] | _________ | _____ |
| Legal Counsel | [Name] | _________ | _____ |
---
**Document Control:**
- Version: 1.0.0
- Last Updated: March 17, 2026
- Next Review: April 17, 2026
- Owner: [Technical Lead Name]