# Software Requirements Specification (SRS)
Document version: 0.2
Last updated: 2025-10-28
Author: (Drafted by developer / to be reviewed by project owner)
**Constitution Alignment**: This SRS adheres to the [Yoga Pose Tracking Constitution](yoga_pose_tracking/.specify/memory/constitution.md) v1.0.0. All requirements comply with the seven core principles: Real-Time Performance, Privacy & Data Protection, Hardcoded Biomechanics Logic, ML Model Transparency, User Feedback Clarity, Analytics & Progress Tracking, and Simplicity & Incremental Delivery.
## 1. Purpose
This SRS describes requirements for a Yoga Pose Tracking application that uses a camera and machine learning to detect human skeleton keypoints, runs 2-minute sessions for simple yoga asanas, provides real-time textual and audio feedback about pose accuracy, and logs data to improve the model.
Primary stakeholders:
- End users (yoga practitioners)
- Product owner
- ML engineers
- Mobile/Desktop app developers
## 2. Scope
The system will:
- Offer curated sessions of simple, easy-to-detect yoga poses (each session: 2 minutes per pose)
- Use camera input to detect human skeleton nodes (keypoints) in real time
- Use a combination of ML pose estimation and explicit (hardcoded) biomechanics rules to evaluate pose correctness
- Provide real-time feedback via text (sidebar/in-app) and sound (TTS/beeps) indicating correctness percentage and actionable improvements
- Log anonymized pose data and feedback to a training pipeline to improve the ML model over time
- Provide live biometric metrics during practice (heart rate, estimated calories) and a detailed post-session analytics dashboard showing session summary and progress over time.
Out of scope for v0.1:
- Full multi-pose flows (sequences with transitions) — initial focus is one pose per session
- Multi-user simultaneous tracking in the same feed (only single-person supported)
## 3. Definitions, acronyms, abbreviations
- Pose estimation: predicting locations of human body keypoints (skeleton nodes) from images/video.
- Keypoint: A single joint/body landmark (e.g., left shoulder, right knee).
- ML model: the trained pose estimation and/or pose classification/regression model.
- TTS: Text-to-speech engine.
## 4. High-level description
User flow (happy path):
1. User opens the app.
2. User selects a simple pose session (2-minute duration) from a curated list.
3. App requests camera permission and guides the user into position with on-screen cues.
4. App shows clear instructions in a left/right sidebar and a central camera preview.
5. When the user presses Start, the system runs pose estimation and starts the session either immediately or after a short calibration period (3–5 seconds).
6. During the 2-minute session, the system computes a real-time correctness score (0–100%) and displays textual instructions and plays short audio cues when corrections are needed.
7. At session end, a summary and optional upload/consent prompt appears for contributing the session data to model improvement.
## 5. Example curated simple poses (v0.1)
Selection criteria: static or near-static poses, primarily standing or simple bending, clearly separable using keypoint angles.
- Mountain Pose (Tadasana) — standing neutral (primary checks: spine alignment, shoulder-roll, straight legs/knees)
- Forward Fold (Uttanasana) — forward hip hinge (primary checks: hip angle, spine curve)
- Chair Pose (Utkatasana) — knee/hip flexion and back angle (simple knee/hip angle thresholds)
- Tree Pose (Vrikshasana) (optional) — single-leg balance with recognizable leg/hip configuration (only if single-person stable detection is possible)
Notes: the system must be configurable to enable/disable particular poses based on detection reliability in the target environment.
## 6. Functional requirements
FR-1: Pose session management
- FR-1.1: The system shall let the user select a pose session with fixed duration of 2 minutes.
- FR-1.2: The system shall show a countdown timer and pause/resume controls.
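Taken together, FR-1.1 and FR-1.2 imply a countdown that accumulates running time and freezes while paused. The Python sketch below illustrates this under the assumption of a monotonic clock. The `SessionTimer` class and its method names are hypothetical and do not come from any existing codebase.

```python
import time
from typing import Callable, Optional


class SessionTimer:
    """Countdown for a fixed-length pose session with pause/resume.

    `clock` is injectable so tests need not wait in real time; it defaults
    to time.monotonic, which is unaffected by wall-clock adjustments.
    """

    def __init__(self, duration_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.duration_s = duration_s
        self._clock = clock
        self._elapsed = 0.0                       # time banked before the current run
        self._started_at: Optional[float] = None  # None while paused or not started

    def start(self) -> None:
        if self._started_at is None:
            self._started_at = self._clock()

    resume = start  # resuming is the same operation as starting

    def pause(self) -> None:
        if self._started_at is not None:
            self._elapsed += self._clock() - self._started_at
            self._started_at = None

    def elapsed(self) -> float:
        running = 0.0 if self._started_at is None else self._clock() - self._started_at
        return min(self._elapsed + running, self.duration_s)

    def remaining(self) -> float:
        return self.duration_s - self.elapsed()

    def finished(self) -> bool:
        return self.remaining() <= 0.0
```

Because `clock` is injectable, the timer stays deterministic in unit tests. The UI would poll `remaining()` each frame to drive the countdown display.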
FR-2: Camera & pose detection
- FR-2.1: The system shall capture video from the front-facing camera and run skeleton keypoint detection at a minimum of 15 FPS (target 30 FPS where hardware allows).
- FR-2.2: The system shall output a set of keypoints per frame with confidence scores.
FR-3: Pose correctness evaluation
- FR-3.1: For each supported pose, the system shall compute explicit joint-angle measurements and compare them to pose-specific thresholds (hardcoded rules) to compute a correctness score in real time.
- FR-3.2: The system shall combine the ML pose estimation confidence with hardcoded rule results into a single correctness metric (0–100%).
FR-4: Real-time feedback
- FR-4.1: The system shall display the current correctness score and a top-3 list of actionable textual corrections in the sidebar, written as clear, short sentences.
- FR-4.2: The system shall announce corrections via audio (short TTS prompts or sound cues) when the correctness drops below configured thresholds.
- FR-4.3: The system shall provide a spoken summary at the end of the session and list the most frequent corrections.
FR-5: Data logging and model update
- FR-5.1: The system shall log anonymized keypoint time series, computed scores, and textual corrections for each session (if user consent is given).
- FR-5.2: The system shall upload session data to a model training pipeline (batch or online) for periodic retraining.
FR-6: Accessibility & clarity
- FR-6.1: All sidebar text instructions must be concise, use large readable fonts, and use a high-contrast UI.
- FR-6.2: Audio prompts must be toggleable and support adjustable volume.
FR-7: Live metrics and analytics
- FR-7.1: The system shall display live biometric metrics (e.g., heart rate, estimated calories burned) in-session if available from device sensors or connected wearables.
- FR-7.2: The system shall provide a post-session analytics dashboard that includes per-session summary (mean/max heart rate, calories, average correctness score, time-in-target zones) and trends across sessions (weekly/monthly progress).
- FR-7.3: The system shall allow users to export or delete analytics and health-related data under privacy controls.
## 7. Non-functional requirements
- NFR-1: Latency — end-to-end processing (camera frame to displayed score) should be under 200 ms on supported devices.
- NFR-2: Accuracy — target per-pose detection accuracy (F1 or mean correctness) should be established in evaluation; initial acceptance: correctness > 75% for curated poses under good lighting.
- NFR-3: Reliability — the app should handle temporary keypoint dropouts gracefully by smoothing and not producing spurious corrections.
- NFR-4: Security & Privacy — personal video must not be uploaded without explicit user consent; uploads must be anonymized and stored securely.
- NFR-5: Platform — target platforms: desktop (Linux/Windows), Android, iOS (later), with camera API support.
## 8. System interfaces
8.1 Camera API
- Input: live camera frames at configured resolution (default 640x480 or 1280x720 where available).
8.2 ML Pose Estimation
- Input: frames
- Output: keypoints list, per-keypoint confidence
- Implementation note: recommended options include MediaPipe Pose, OpenPose, or a lightweight mobile model (MoveNet/BlazePose)
8.3 Text-to-Speech (TTS)
- Input: short instruction strings
- Output: audio played to user; must be low-latency
8.4 Remote training API (server)
- Endpoint to POST anonymized session artifacts (keypoints, aggregated metrics) with metadata (device, model-version) and consent flag.
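One way to enforce the consent flag is for the client to refuse to build the upload request at all when consent is missing. Below is a minimal standard-library sketch of that approach. The SRS does not define the endpoint's schema or authentication scheme, so the JSON field layout (`metadata`, `artifacts`) and the bearer-token header are assumptions.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


class ConsentRequiredError(Exception):
    """Raised when an upload is attempted without the user's opt-in."""


def build_upload_request(endpoint: str, artifacts: Dict[str, Any],
                         device: str, model_version: str, consent: bool,
                         token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for anonymized session artifacts."""
    if not consent:
        # Enforce consent client-side, before anything leaves the device.
        raise ConsentRequiredError("user has not opted in to data upload")
    if not endpoint.startswith("https://"):
        raise ValueError("uploads must use an HTTPS endpoint")
    body = json.dumps({
        "metadata": {"device": device, "model_version": model_version},
        "consent": consent,
        "artifacts": artifacts,  # numeric keypoints and aggregated metrics only, no video
    }).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"  # assumed auth scheme
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
```

Once connectivity and batching are handled, the caller would send the request with `urllib.request.urlopen(req, timeout=10)`. Keeping request construction separate from sending makes the consent check testable without a network.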
8.5 Biometric & Health Interfaces
- Input sources: connected heart-rate sensors (Bluetooth LE HR straps), wearable platform integrations (HealthKit on iOS, Google Fit on Android), and phone sensors where applicable.
- Output: numeric stream of HR (beats per minute), optionally oxygen/other metrics if device provides, and derived metrics such as calories burned per session.
Implementation note: integrate with platform APIs (HealthKit/Google Fit) and provide optional pairing flow for Bluetooth HR straps. Fall back gracefully when no biometric input is available (show "No heart rate source connected").
## 9. Pose correctness rules (example: Mountain Pose)
Contract: The system computes joint angles from keypoints and compares them with pose-specific thresholds. For three keypoints A-B-C, the joint angle is the angle at B between the vectors BA and BC, computed with vector math (dot product).
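Written out with BA = A − B and BC = C − B, the angle at B is θ = arccos((BA · BC) / (|BA| |BC|)). The minimal Python sketch below computes it for 2D keypoints in image coordinates. The function name `joint_angle` is hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x, y) keypoint position in image coordinates


def joint_angle(a: Point, b: Point, c: Point) -> Optional[float]:
    """Return the angle ABC in degrees (0-180), or None if A or C coincides with B."""
    bax, bay = a[0] - b[0], a[1] - b[1]
    bcx, bcy = c[0] - b[0], c[1] - b[1]
    norm = math.hypot(bax, bay) * math.hypot(bcx, bcy)
    if norm == 0.0:
        return None  # degenerate input: the angle is undefined
    cos_theta = (bax * bcx + bay * bcy) / norm
    # Clamp to [-1, 1]: float rounding can push the ratio just outside acos's domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

For example, `joint_angle(hip, knee, ankle)` returns about 180° for a straight leg. These are 2D image-plane angles, so they are only reliable when the user faces the camera. That matches the single-person, front-facing assumption in Section 17.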
Example: Mountain Pose (Tadasana) checks
- Shoulder alignment: shoulder-to-hip line angle difference < 8 degrees (shoulders aligned vertically over hips)
- Arms: arms resting along sides — elbow angle ~180° ± 15°
- Knee/leg: knees should be straight, with a knee angle (hip-knee-ankle) > 170°
- Head: head neutral — ear-shoulder vertical alignment within 10°
Scoring: Each check returns a subscore (0–1) weighted and summed into 0–100%. ML keypoint confidence multiplies the final score to downweight low-confidence frames.
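The scoring rule can be sketched as a weighted mean of the subscores, scaled to 0–100 and multiplied by keypoint confidence. The SRS does not say how a measured deviation becomes a subscore. `band_subscore` therefore assumes a simple shape: full credit inside the tolerance, then linear decay to zero over a hypothetical `falloff_deg`. Both function names and the default falloff are illustrative.

```python
from typing import Dict


def band_subscore(deviation_deg: float, tolerance_deg: float,
                  falloff_deg: float = 10.0) -> float:
    """1.0 inside the tolerance, then linear decay to 0.0 over `falloff_deg` (assumed shape)."""
    excess = abs(deviation_deg) - tolerance_deg
    if excess <= 0.0:
        return 1.0
    return max(0.0, 1.0 - excess / falloff_deg)


def pose_score(subscores: Dict[str, float], weights: Dict[str, float],
               confidence: float) -> float:
    """Weighted mean of 0-1 subscores, scaled to 0-100, times a 0-1 confidence."""
    total_weight = sum(weights[name] for name in subscores)
    if total_weight == 0.0:
        return 0.0
    weighted = sum(subscores[name] * weights[name] for name in subscores)
    return 100.0 * (weighted / total_weight) * max(0.0, min(1.0, confidence))
```

For the knee check, the deviation would be `180 - knee_angle` with a 10° tolerance, which reproduces the > 170° threshold. Which confidence value to pass is also a design assumption; one option is the mean confidence of the keypoints a check uses.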
Real-time feedback rules (example):
- If shoulder alignment off by >12° for >=1s, show text: "Relax and lower your shoulders — pull them down and back"; play short audio prompt.
- If knees bent >10° when they should be straight for >=1s, show text: "Straighten your knees slightly".
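Both rules share a persistence requirement: the deviation must hold for at least 1 s before a cue fires. The minimal sketch below implements that gate, assuming per-frame timestamps in seconds. `PersistentCondition` is a hypothetical helper that fires once per episode, so the same cue is not repeated every frame.

```python
from typing import Optional


class PersistentCondition:
    """Fires once when a condition has held continuously for `hold_s` seconds.

    Re-arms after the condition clears, so a cue is not repeated every frame.
    """

    def __init__(self, hold_s: float = 1.0) -> None:
        self.hold_s = hold_s
        self._since: Optional[float] = None  # timestamp when the condition became true
        self._fired = False

    def update(self, active: bool, t: float) -> bool:
        """Feed one frame; return True only on the frame the cue should be issued."""
        if not active:
            self._since = None
            self._fired = False
            return False
        if self._since is None:
            self._since = t
        if not self._fired and t - self._since >= self.hold_s:
            self._fired = True
            return True
        return False
```

On each frame, the shoulder rule would call `shoulder_rule.update(deviation_deg > 12.0, t)`. When it returns `True`, the app shows the shoulder text and plays the audio prompt.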
## 10. ML model behavior and update
10.1 On-device inference
- Use a fast pose-estimation model for real-time keypoints. Keep model size small for mobile.
10.2 Model confidence & smoothing
- Smooth keypoints across a sliding window (e.g., 0.5–1s) to reduce jitter. Maintain per-keypoint exponential smoothing and ignore isolated spikes.
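A per-keypoint version of this smoothing might look like the sketch below. It assumes normalized (0–1) image coordinates and uses a simple one-frame confirmation rule for spikes: a far-away sample is held back and accepted only if the next sample lands near it. The class name and default thresholds are hypothetical. In practice, `alpha` would be tuned so the effective averaging span matches the 0.5–1 s window at the running frame rate.

```python
import math
from typing import Optional, Tuple

XY = Tuple[float, float]


class KeypointSmoother:
    """Exponential smoothing for one keypoint with isolated-spike rejection.

    alpha: weight of the newest sample (higher = more responsive, more jitter).
    max_jump: displacement (same units as x/y) beyond which a sample is suspect.
    min_conf: samples below this confidence are ignored.
    """

    def __init__(self, alpha: float = 0.3, max_jump: float = 0.1,
                 min_conf: float = 0.5) -> None:
        self.alpha = alpha
        self.max_jump = max_jump
        self.min_conf = min_conf
        self.value: Optional[XY] = None
        self._pending: Optional[XY] = None  # suspected spike awaiting confirmation

    def update(self, xy: XY, conf: float) -> Optional[XY]:
        if conf < self.min_conf:
            return self.value  # hold the last estimate through low-confidence frames
        if self.value is None:
            self.value = xy
            return self.value
        if math.dist(xy, self.value) > self.max_jump:
            if self._pending is not None and math.dist(xy, self._pending) <= self.max_jump:
                # Two consecutive far samples agree: real movement, so jump to it.
                self.value, self._pending = xy, None
            else:
                self._pending = xy  # possibly an isolated spike; wait one frame
            return self.value
        self._pending = None
        a = self.alpha
        self.value = (a * xy[0] + (1 - a) * self.value[0],
                      a * xy[1] + (1 - a) * self.value[1])
        return self.value
```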
10.3 Data collection & training pipeline
- Collected items: timestamped keypoint arrays, device metadata (no PII), per-frame correctness scores, final session annotations (user-accepted corrections).
- Privacy: require explicit opt-in before upload. Allow users to view/delete their uploaded sessions.
- Retraining: pipeline accepts batch uploads, curates labeled examples (auto-labeling using rules + human verification), and retrains pose classification/regression models.
10.4 Online learning (optional)
- For safety, v0.1 uses batch retraining. Online/edge learning can be explored later with strict safeguards.
## 11. Data storage and formats
- Store per-session JSON: {session_id, device_hash, model_version, pose_type, timestamps[], keypoints[][x,y,conf], frame_scores[], aggregated_metrics{mean,median,std}, consent}
- Files compressed and encrypted in transit (HTTPS, authenticated endpoint)
When biometrics are available, include: {heart_rate_timeseries[], hr_summary{mean,max,min,time_in_zones}, calories_estimate}. Treat biometric data as sensitive: store only with explicit opt-in and provide clear retention/deletion options.
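The Python sketch below shows one way to assemble this record. It assumes the device identifier is pseudonymized with a keyed hash (HMAC-SHA-256 with a hypothetical per-install secret) rather than a plain hash, because a plain hash can be brute-forced when identifiers have little entropy. The function name and parameter order are illustrative. The optional biometric fields would be added as extra keys only when the user has opted in.

```python
import hashlib
import hmac
import statistics
import uuid
from typing import Any, Dict, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, conf)


def build_session_record(device_id: str, install_secret: bytes,
                         model_version: str, pose_type: str,
                         timestamps: Sequence[float],
                         keypoints: Sequence[Sequence[Keypoint]],
                         frame_scores: Sequence[float],
                         consent: bool) -> Dict[str, Any]:
    """Assemble the per-session record; all per-frame sequences must align."""
    if not (len(timestamps) == len(keypoints) == len(frame_scores)):
        raise ValueError("timestamps, keypoints and frame_scores need one entry per frame")
    if not frame_scores:
        raise ValueError("session has no frames")
    # Keyed hash: the raw device identifier never leaves the device, and the
    # digest cannot be recomputed without the (hypothetical) per-install secret.
    device_hash = hmac.new(install_secret, device_id.encode("utf-8"),
                           hashlib.sha256).hexdigest()
    return {
        "session_id": str(uuid.uuid4()),
        "device_hash": device_hash,
        "model_version": model_version,
        "pose_type": pose_type,
        "timestamps": list(timestamps),
        "keypoints": [[list(kp) for kp in frame] for frame in keypoints],
        "frame_scores": list(frame_scores),
        "aggregated_metrics": {
            "mean": statistics.fmean(frame_scores),
            "median": statistics.median(frame_scores),
            # pstdev is defined for a single frame, whereas stdev would raise.
            "std": statistics.pstdev(frame_scores),
        },
        "consent": consent,
    }
```

Serializing with `json.dumps(record)` yields the per-session JSON. Compression and encryption in transit remain the transport layer's job (HTTPS), as stated above.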
## 12. Error handling and edge cases
- Multiple people in frame: app should try to track the largest person by bounding box; if ambiguous, prompt user to step forward and center.
- Occlusion/missing keypoints: continue with the available keypoints at reduced confidence, and hold back correction prompts until a deviation persists across consecutive frames, so that transient occlusions do not cause false alarms.
- Poor lighting / low resolution: show a clear UI message: "Lighting too low — please move to a brighter place".
- Camera strongly off-axis (user not facing the camera): suggest repositioning to keep the full body visible.
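The sketch below picks the largest person and falls back to a prompt when the choice is unclear. It assumes the detector reports one bounding box per person. Multi-person models such as MoveNet MultiPose do this; single-person pipelines typically do not. The `ambiguity_ratio` of 0.8 is a hypothetical tuning value, not a specified requirement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Detection:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def select_primary_person(detections: Sequence[Detection],
                          ambiguity_ratio: float = 0.8
                          ) -> Tuple[str, Optional[Detection]]:
    """Return ("ok", largest), ("ambiguous", None) or ("no_person", None).

    The choice is ambiguous when the runner-up box is at least
    `ambiguity_ratio` times the area of the largest one.
    """
    if not detections:
        return "no_person", None
    ranked = sorted(detections, key=lambda d: d.area, reverse=True)
    if len(ranked) > 1 and ranked[1].area >= ambiguity_ratio * ranked[0].area:
        return "ambiguous", None
    return "ok", ranked[0]
```

An `"ambiguous"` result would trigger the step-forward-and-center prompt, or pause the session as described in the Section 20 mitigation.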
Additional edge case: inconsistent biometric stream (e.g., intermittent BLE disconnect). Mitigation: buffer HR samples, mark intervals of missing data in analytics, and notify user to re-pair their device.
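The sketch below summarizes a heart-rate series that contains gaps. It assumes samples arrive as `(timestamp, bpm)` pairs, with `None` marking a missing reading. The zone boundaries are hypothetical inputs, because this SRS does not define heart-rate zones. Calorie estimation is left out because no formula is specified.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def summarize_heart_rate(samples: Sequence[Tuple[float, Optional[float]]],
                         zones: Sequence[Tuple[str, float, float]]) -> Dict[str, object]:
    """Summarize a heart-rate series in which None marks a missing sample.

    samples: (timestamp_s, bpm or None), sorted by time.
    zones: (name, lower_bpm, upper_bpm), lower inclusive, upper exclusive.
    Each valid sample is credited to its zone until the next sample arrives.
    """
    valid = [bpm for _, bpm in samples if bpm is not None]
    missing: List[Tuple[float, float]] = []
    time_in_zones = {name: 0.0 for name, _, _ in zones}
    gap_start: Optional[float] = None
    for i, (t, bpm) in enumerate(samples):
        if bpm is None:
            if gap_start is None:
                gap_start = t  # first missing sample of a gap
            continue
        if gap_start is not None:
            missing.append((gap_start, t))  # gap ends at the next valid sample
            gap_start = None
        if i + 1 < len(samples):
            dt = samples[i + 1][0] - t
            for name, lo, hi in zones:
                if lo <= bpm < hi:
                    time_in_zones[name] += dt
    if gap_start is not None:
        missing.append((gap_start, samples[-1][0]))  # gap runs to the end
    return {
        "mean": sum(valid) / len(valid) if valid else None,
        "max": max(valid, default=None),
        "min": min(valid, default=None),
        "time_in_zones": time_in_zones,
        "missing_intervals": missing,
    }
```

`json.dumps` writes `None` as `null`. Exporting the raw samples alongside this summary therefore produces the null-for-missing behavior that TC-5 expects.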
## 13. Accessibility and localization
- All textual instructions and audio prompts should be localizable. Provide at least English for v0.1.
- Provide adjustable font sizes and toggleable audio prompts.
## 14. Acceptance criteria and tests
AC-1: Session lifecycle
- Starting a 2-minute pose session begins the timer and runs detection continuously.
AC-2: Pose detection
- Given a user in Mountain Pose under good lighting, system computes a correctness score > 75% within 5 seconds of stabilization.
AC-3: Real-time correction
- If shoulder alignment deviates by >12° for >=1s, the system displays the shoulder correction text and generates an audio cue within 500 ms of detection.
AC-4: Data logging
- With consent, session JSON is created and stored locally and uploaded successfully to the training endpoint.
AC-5: Analytics & biometrics
- If a heart-rate source is connected, the system displays live BPM in-session and records HR timeseries; the post-session dashboard summarizes mean/max HR and estimated calories.
- The post-session dashboard must render trends (last 7 sessions, 30 days) for correctness score and calories burned.
Minimal test cases (unit/integration):
- TC-1: Synthetic keypoint input for Mountain Pose (angles within thresholds) -> score > 85%.
- TC-2: Synthetic keypoints with bent knees -> system suggests "Straighten your knees slightly".
- TC-3: Low-confidence keypoints -> system reduces score and delays correction prompts.
- TC-4: Synthetic HR timeseries + keypoints -> dashboard computes mean/max HR and a calorie estimate consistent with the configured calorie-estimation formula.
- TC-5: Intermittent HR input (gaps) -> dashboard marks missing intervals and does not crash; exported JSON includes nulls for missing timestamps.
## 15. Traceability matrix
| Requirement | Acceptance Criteria | Constitution Principle |
|-------------|---------------------|------------------------|
| FR-1 (session management) | AC-1 | VII. Simplicity & Incremental Delivery |
| FR-2 (camera & detection) | AC-2 | I. Real-Time Performance First |
| FR-3 (correctness evaluation) | AC-2, TC-1 | III. Hardcoded Biomechanics Logic |
| FR-4 (real-time feedback) | AC-3, TC-2 | V. User Feedback Clarity & Accessibility |
| FR-5 (data logging) | AC-4 | II. Privacy & Data Protection, IV. ML Model Transparency |
| FR-6 (accessibility & clarity) | None defined yet | V. User Feedback Clarity & Accessibility |
| FR-7 (live metrics & analytics) | AC-5, TC-4, TC-5 | VI. Analytics & Progress Tracking |
**Constitution Compliance Notes**:
- All FR requirements map to at least one Core Principle from the Constitution
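- Non-negotiable principle III has explicit test coverage (TC-1 through TC-3); principles I and II are covered by requirements and acceptance criteria (NFR-1 and AC-3; FR-5, NFR-4 and AC-4)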
- Privacy principle (II) enforced in FR-5 with consent and anonymization requirements
- Performance principle (I) enforced via NFR-1 (< 200 ms latency budget)
## 16. Security & privacy considerations
- Obtain explicit consent before storing/transmitting video or pose data.
- Transmit only anonymized numeric keypoints; do not store raw video unless user explicitly opts in.
- Provide an in-app privacy settings page to view and delete uploaded sessions.
## 17. Assumptions
- Single-person front-facing camera view for v0.1.
- Reasonable lighting and full-body visibility.
## 18. Future enhancements
- Sequence-based multi-pose sessions and transition detection.
- Multi-user/group classes.
- More advanced model personalization and online learning with federated options.
## 19. Small contract (inputs, outputs, error modes, success criteria)
- Inputs: camera frames (RGB), user-selected pose type, start/pause/resume commands, optional consent flag, and optional biometric streams (heart-rate time series) from connected devices or platform health APIs.
- Outputs: per-frame keypoints and confidences, real-time correctness score (0–100), textual correction messages, short audio cues, session JSON for logging, and per-session biometric summaries and analytics dashboards (charts, trend numbers).
- Error modes: multiple people, low-confidence keypoints, occlusion, camera permission denied. In these cases the app should show clear guidance to the user and avoid producing misleading corrections.
- Success: User receives timely, accurate, actionable feedback and session data can be used to improve model quality.
## 20. Edge cases (top 5) and mitigations
1. Partial body visible — mitigation: require reframe prompt and degrade score gracefully.
2. Sudden spikes/jitter in keypoints — mitigation: smoothing window + require persistence (e.g., >0.8s) before prompting.
3. Multiple people — mitigation: track largest person; if two large detections, pause and ask user to ensure single person.
4. Low light — mitigation: display "Increase lighting" and reduce false-positive corrections until lighting improves.
5. User holds an object (phone) occluding body — mitigation: warn and request removal.
## 21. Deliverables
- `SRS.md` (this document)
- Minimal test harness for synthetic keypoint evaluation (to be created separately)
- UI mockups for sidebar and camera preview (future)
## 22. Next steps
1. Review SRS and Constitution with product owner and decide final curated poses for v0.1.
2. Implement unit tests for angle computation and scoring (TC-1, TC-2, TC-4, TC-5) per Constitution Principle III (Test-First Development).
3. Create minimal prototype using existing pose-estimation libs (MediaPipe/MoveNet) and the hardcoded checks.
4. Establish performance baseline on target devices to validate NFR-1 (< 200 ms latency) per Constitution Principle I.
5. Implement privacy consent flows and data anonymization per Constitution Principle II before any data collection.
## 23. Constitution Compliance Checklist
This SRS has been validated against the [Yoga Pose Tracking Constitution](yoga_pose_tracking/.specify/memory/constitution.md) v1.0.0:
- [x] **I. Real-Time Performance**: NFR-1 specifies < 200 ms latency; FR-2 specifies 15-30 FPS
- [x] **II. Privacy & Data Protection**: FR-5 requires explicit consent; FR-7.3 provides data export/deletion
- [x] **III. Hardcoded Biomechanics**: Section 9 defines explicit angle thresholds; TC-1/TC-2 test synthetic inputs
- [x] **IV. ML Model Transparency**: Section 10 specifies on-device inference, confidence weighting, and batch retraining
- [x] **V. User Feedback Clarity**: FR-4, FR-6 specify sidebar instructions, high-contrast UI, toggleable audio
- [x] **VI. Analytics & Progress Tracking**: FR-7 specifies live metrics and post-session dashboard with trends
- [x] **VII. Simplicity & Incremental Delivery**: Scope section defines v0.1 boundaries; out-of-scope items listed
All non-negotiable principles (I, II, III) have explicit requirements, acceptance criteria, and test cases.
---
Please review and tell me any pose choices you'd like to add/remove or any additional constraints (platform, privacy rules, or integration endpoints) and I will iterate the SRS.