Repository: `WajeehAlamoudi/Smart_AI_Camera`
# SMART-CAM (Smart AI Camera) — Full Documentation
Docs generated on: **2026-03-11**
---
## 1) Project Overview
**SMART-CAM** is an edge-first smart surveillance pipeline designed to run **fully locally** on devices like a **Raspberry Pi** (Pi 5 recommended) and optionally accelerate inference using a **Hailo NPU** through Picamera2’s Hailo integration.
The system does five major things in real time:
1. **Capture frames** from a camera source (Pi CSI camera, USB webcam, RTSP, video file, etc.).
2. **Detect people** using a Hailo-compiled **YOLOv8** object detection model (`yolov8m.hef`).
3. **Generate Re-Identification embeddings** using a Hailo-compiled **RepVGG ReID** model (`repvgg_a0_person_reid_512.hef`).
4. **Track identities across frames** using a hybrid method:
   - **Spatial consistency** (IoU with Kalman-predicted boxes)
   - **Appearance matching** (cosine similarity on embeddings)
5. **Persist identities and snapshots** in an **SQLite database**, and optionally update the best snapshot when a **face is detected** (Haar Cascade “bouncer”).
It supports two main run modes:
- **Web dashboard mode** using **FastAPI** (`app.py`) with MJPEG streaming + database APIs.
- **Local desktop window mode** using OpenCV display (`main.py`) for direct monitor output.
---
## 2) Repository Structure (Key Files)
> Note: this list covers the key files only; the repository may contain additional files that are not shown here.
- `README.md` — high-level description + install instructions
- `requirements.txt` — minimal Python deps (`fastapi`, `uvicorn`, `opencv-python`, `numpy==1.26.4`)
- `app.py` — FastAPI server + background AI loop + REST APIs
- `main.py` — “local monitor” loop (imshow) pipeline
### Packages / Modules
- `camera/stream.py` — `SmartCamera` (robust camera input abstraction)
- `ai/detector.py` — `SmartDetector` (Hailo YOLO detector)
- `ai/embedder.py` — `SmartEmbedder` (Hailo ReID embedding extractor)
- `ai/tracker.py` — `SmartTracker` (hybrid tracker: Kalman + cosine + IoU)
- `ai/face_detector.py` — `SmartFaceDetector` (CPU Haar cascade face check)
- `ai/events.py` — `handle_new_person()` (snapshot + DB insert)
- `database/db_manager.py` — `DatabaseManager` (SQLite + embeddings persistence)
- `core/logger.py` — log formatting + file/console logging helpers
- `templates/index.html` — dashboard UI (HTML/CSS/JS)
### Models / Assets
- `models/coco.txt` — COCO labels for detection classes
- AI models:
  - `models/yolov8m.hef`
  - `models/repvgg_a0_person_reid_512.hef`
  - `models/haarcascade_frontalface_default.xml`
---
## 3) How the Whole Pipeline Works (Conceptual)
Think of SMART-CAM like an **assembly line** running on every frame:
### Stage A — Frame acquisition (`SmartCamera`)
- A background thread continuously captures frames and stores the newest one.
- The rest of the pipeline always pulls the “latest frame” when ready.
### Stage B — Object detection (`SmartDetector`)
- Runs YOLOv8 (compiled to Hailo `.hef`) to produce bounding boxes.
- Output becomes a list of detection dictionaries (each has at least `bbox`, plus metadata like `score`, etc.).
### Stage C — ReID embedding (`SmartEmbedder`)
- For each detected person box:
  - Crop the person region from the frame.
  - Resize to model input.
  - Run embedding model on Hailo.
  - Flatten and **L2-normalize** the vector (critical for cosine similarity).
  - Attach it to the detection dict as `det["embedding"]`.
### Stage D — Tracking & identity assignment (`SmartTracker`)
- Maintains a **gallery** of known IDs (loaded from SQLite on startup).
- For each new detection, it tries to match it to an existing ID using:
  - IoU with Kalman predicted box (spatial continuity)
  - Cosine similarity with stored embedding (appearance continuity)
- If match succeeds → reuse existing ID and update memory.
- If match fails → create a new ID, save snapshot + embedding to DB.
### Stage E — “Smart face capture”
- For known IDs, the tracker checks:
  - If the crop contains a **frontal face** (Haar cascade)
  - OR if the person box got significantly larger than the best snapshot
- Then it overwrites snapshot with a higher-quality crop and updates `last_seen`.
### Stage F — Output
- `SmartTracker.draw()` draws bounding boxes + IDs.
- In `app.py` the annotated frames are encoded as JPEG and streamed to browser via MJPEG.
---
## 4) Detailed Module Documentation
## 4.1 `camera/stream.py` — SmartCamera
### Class: `SmartCamera`
A resilient camera abstraction that supports different input types:
- `"auto"`: try PiCamera first, then fall back to USB camera index 0
- `"picamera"`: force Raspberry Pi camera pipeline
- integer index `0,1,...` for USB cameras
- `"/dev/video0"` device path
- URLs like `rtsp://`, `http(s)://`, `udp://`
- video files (`.mp4`, `.avi`, ...)
#### Key constructor parameters
- `source`: input source selector
- `resolution`: desired capture resolution (e.g., `(1280,720)`)
- `target_fps`: software throttling target
- `reconnect_delay`: how long to wait between reconnect attempts
- `jpeg_quality`: output quality for MJPEG usage
- `flip_method`: optional transforms (`"h"`, `"v"`, `"cw"`, `"ccw"`, `"180"`)
#### Core idea
`SmartCamera` runs capture on its own thread and exposes:
- `get_frame()` → returns latest frame (numpy array) safely
- `stop()` → shuts down threads/capture cleanly
This design prevents the AI loop from blocking camera capture.
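The "latest frame" pattern described above can be sketched as follows. This is a minimal illustration and not the repository's `SmartCamera` code: the class name `LatestFrameBuffer` and the `read_frame` callable are hypothetical stand-ins for the real capture backend.

```python
import threading
import time


class LatestFrameBuffer:
    """Keeps only the newest frame produced by a capture callable (hypothetical)."""

    def __init__(self, read_frame, target_fps=30):
        self._read_frame = read_frame      # assumed: returns a frame or None
        self._interval = 1.0 / target_fps  # software throttle between reads
        self._lock = threading.Lock()
        self._frame = None
        self._running = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._running.set()
        self._thread.start()
        return self

    def _loop(self):
        while self._running.is_set():
            frame = self._read_frame()
            if frame is not None:
                with self._lock:
                    self._frame = frame  # overwrite: older frames are dropped
            time.sleep(self._interval)

    def get_frame(self):
        with self._lock:
            return self._frame

    def stop(self):
        self._running.clear()
        self._thread.join(timeout=1.0)


# Usage with a fake source that "captures" increasing integers.
counter = iter(range(1_000_000))
buf = LatestFrameBuffer(lambda: next(counter), target_fps=200).start()
time.sleep(0.2)
latest = buf.get_frame()
buf.stop()
```

Because the consumer only ever reads the most recent value, a slow AI loop skips frames instead of building up a backlog.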
---
## 4.2 `ai/detector.py` SmartDetector (YOLO on Hailo)
### Class: `SmartDetector`
Responsible for **people detection** (default: person-only).
#### Parameters
- `confidence_threshold` (default `0.8`)
- `only_person` (default `True`): if true, it filters to COCO class `0` (person)
- `model_name` (default `yolov8m.hef`)
- `labels_name` (default `coco.txt`)
#### How it loads models
- It builds paths relative to repository root:
- `models/<model_name>`
- `models/<labels_name>`
- Uses: `from picamera2.devices import Hailo`
- Calls `Hailo(self.model_path)` and enters context (`__enter__()`).
- Reads model input shape via `get_input_shape()`.
#### Detection extraction logic (`_extract_detections`)
The Hailo output is assumed to be in the Picamera2 Hailo example format:
- `hailo_output[class_id]` is a list of detections
- each detection: `[:4] = [y0, x0, y1, x1]` in normalized coords; `[4] = score`
Then it:
- applies threshold filtering
- optionally filters class id (person-only)
- converts normalized coords to pixel coords safely
#### Output format
Returns list of dicts, typically like:
```python
{
"bbox": [x0, y0, x1, y1],
"score": float,
"class_id": int,
"label": str
}
```
(The exact set of keys depends on the implementation in `ai/detector.py`; the downstream stages rely on `bbox`.)
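As a rough sketch of the extraction logic described in this section, the function below converts a per-class list of normalized `[y0, x0, y1, x1, score]` rows into pixel-space dicts. The function name `extract_detections`, its signature, and the fake input are assumptions for illustration; the real `_extract_detections` may differ in details such as rounding.

```python
def extract_detections(hailo_output, frame_w, frame_h, labels,
                       threshold=0.8, only_person=True):
    """Convert per-class output rows into pixel-space detection dicts."""
    results = []
    for class_id, class_dets in enumerate(hailo_output):
        if only_person and class_id != 0:  # COCO class 0 is "person"
            continue
        for det in class_dets:
            y0, x0, y1, x1 = det[:4]
            score = float(det[4])
            if score < threshold:
                continue
            # Scale to pixels and clamp so boxes never leave the frame.
            bbox = [
                max(0, min(frame_w - 1, int(round(x0 * frame_w)))),
                max(0, min(frame_h - 1, int(round(y0 * frame_h)))),
                max(0, min(frame_w - 1, int(round(x1 * frame_w)))),
                max(0, min(frame_h - 1, int(round(y1 * frame_h)))),
            ]
            results.append({"bbox": bbox, "score": score,
                            "class_id": class_id, "label": labels[class_id]})
    return results


# Fake output: two rows for class 0 (one below threshold), one row for class 1.
fake_output = [
    [[0.10, 0.20, 0.90, 0.60, 0.95], [0.0, 0.0, 0.5, 0.5, 0.30]],
    [[0.10, 0.10, 0.20, 0.20, 0.99]],
]
dets = extract_detections(fake_output, 1280, 720, ["person", "bicycle"])
```

Only the first row survives: the second is below the threshold and the third belongs to a non-person class.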
---
## 4.3 `ai/embedder.py` — SmartEmbedder (ReID embeddings on Hailo)
### Class: `SmartEmbedder`
Computes a “visual fingerprint” vector for each person detection.
#### Parameters
- `model_name` (default `repvgg_a0_person_reid_512.hef`)
#### Steps inside `process_detections(frame, detections)`
For each detection:
1. Read `det["bbox"]`
2. Clamp bbox to frame bounds (avoid crashes)
3. Crop person region
4. Resize to model input size `(model_w, model_h)`
5. Run NPU inference: `raw_results = self.hailo.run(infer_crop)`
6. Flatten results: `embedding = np.array(raw_results).flatten()`
7. Normalize: `embedding = embedding / ||embedding||`
8. Attach: `det["embedding"] = embedding`
**Why L2 normalize?**
Because once every vector has unit length, cosine similarity reduces to a plain dot product, so scores are comparable across frames regardless of the raw magnitude of the model output.
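Steps 2, 3, 6, and 7 can be illustrated without the NPU. In this sketch the helper names `clamp_bbox` and `l2_normalize`, and the `eps` guard against a zero vector, are assumptions and not taken from the repository; the `raw` array stands in for the Hailo output.

```python
import numpy as np


def clamp_bbox(bbox, frame_w, frame_h):
    """Clip a [x0, y0, x1, y1] box to the frame so slicing cannot go out of range."""
    x0, y0, x1, y1 = bbox
    x0, x1 = max(0, x0), min(frame_w, x1)
    y0, y1 = max(0, y0), min(frame_h, y1)
    return x0, y0, x1, y1


def l2_normalize(vec, eps=1e-12):
    """Flatten and scale to unit length; eps avoids division by zero (assumed guard)."""
    vec = np.asarray(vec, dtype=np.float32).flatten()
    return vec / max(float(np.linalg.norm(vec)), eps)


frame = np.zeros((720, 1280, 3), dtype=np.uint8)
x0, y0, x1, y1 = clamp_bbox([-20, 600, 300, 900], 1280, 720)
crop = frame[y0:y1, x0:x1]       # box partly outside the frame is clipped
raw = np.array([[3.0, 4.0]])     # stand-in for the NPU output
embedding = l2_normalize(raw)    # unit-length vector
```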
---
## 4.4 `ai/face_detector.py` — SmartFaceDetector (Haar cascade bouncer)
### Class: `SmartFaceDetector`
A CPU-based face presence checker used as a *quality gate* for updating snapshots.
#### Model dependency
`models/haarcascade_frontalface_default.xml` must exist.
#### Method: `has_face(frame, person_bbox) -> bool`
1. Crops the person region using bbox
2. Rejects too-small crops
3. Converts to grayscale
4. Runs:
- `detectMultiScale(scaleFactor=1.1, minNeighbors=4, minSize=(60,60))`
5. Returns `True` if at least one face detected
This does **not** do recognition; it’s only “is there a frontal face right now?”.
---
## 4.5 `database/db_manager.py` — DatabaseManager (SQLite memory)
### Class: `DatabaseManager`
Stores:
- embeddings (as BLOB)
- snapshot file paths
- timestamps (`last_seen`)
- name field (default `'Unknown'`)
#### Table schema (`people`)
- `id INTEGER PRIMARY KEY AUTOINCREMENT`
- `name TEXT DEFAULT 'Unknown'`
- `embedding BLOB NOT NULL`
- `snapshot_path TEXT`
- `last_seen TIMESTAMP`
#### Method: `add_new_person(embedding, snapshot_path=None) -> int`
- converts numpy embedding to bytes
- inserts row
- returns new `id`
#### Method: `update_snapshot(person_id, embedding, snapshot_path, new_frame, bbox)`
- crops bbox from `new_frame`
- overwrites existing snapshot image on disk
- updates `last_seen` in DB
(Note: in the code reviewed, the `embedding` parameter is accepted but not written back to the DB.)
#### Method: `get_all_embeddings() -> dict[int, np.ndarray]`
Loads every embedding:
- `SELECT id, embedding FROM people`
- converts each BLOB using `np.frombuffer(..., dtype=np.float32)`
This provides the initial “gallery” for the tracker at startup.
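The BLOB round trip can be sketched with an in-memory database. This is a minimal illustration using the schema listed above; the free functions here take a `conn` argument for brevity, whereas the real `DatabaseManager` methods may be organized differently.

```python
import sqlite3

import numpy as np

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE people (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT DEFAULT 'Unknown',
    embedding BLOB NOT NULL,
    snapshot_path TEXT,
    last_seen TIMESTAMP)""")


def add_new_person(conn, embedding, snapshot_path=None):
    # Cast to float32 first so the reader's dtype matches the stored bytes.
    blob = np.asarray(embedding, dtype=np.float32).tobytes()
    cur = conn.execute(
        "INSERT INTO people (embedding, snapshot_path, last_seen) "
        "VALUES (?, ?, CURRENT_TIMESTAMP)", (blob, snapshot_path))
    conn.commit()
    return cur.lastrowid


def get_all_embeddings(conn):
    rows = conn.execute("SELECT id, embedding FROM people").fetchall()
    return {pid: np.frombuffer(blob, dtype=np.float32) for pid, blob in rows}


pid = add_new_person(conn, np.array([0.6, 0.8]), "data/snapshots/example.jpg")
gallery = get_all_embeddings(conn)
```

Storing a `float64` array and reading it back as `float32` would silently produce a vector of the wrong length and values, which is why the writer casts explicitly.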
---
## 4.6 `ai/events.py` — Events: saving new identities
### Function: `handle_new_person(db, person_id, embedding, frame, bbox) -> str`
Triggered **only once** when a new identity is created.
Steps:
1. Clamp bbox and crop the detection from the frame
2. Save crop to `data/snapshots/person_<id>_<timestamp>.jpg`
3. Insert the embedding + snapshot_path into SQLite via `db.add_new_person()`
4. Returns the saved filepath (used by the tracker later)
---
## 4.7 `ai/tracker.py` — SmartTracker (Hybrid IoU + Cosine + Kalman)
### Class: `SmartTracker`
#### Purpose
Assign stable IDs to detections over time, even with occlusions and crossing paths, by combining:
- **Kalman filter predictions** (motion model)
- **IoU** (spatial overlap)
- **Cosine similarity** (appearance match)
#### Constructor
```python
SmartTracker(
db_manager,
face_detector,
similarity_threshold=0.65,
iou_threshold=0.40,
alpha=0.2,
max_lost_frames=300
)
```
- Loads database embeddings: `db.get_all_embeddings()`
- Builds `self.gallery`:
```python
{
id: {
"embedding": np.array,
"kf": None,
"last_seen": int,
"best_area": float,
"snapshot_path": str|None
}
}
```
- `next_id` starts at max existing + 1
- `colors` assigns consistent per-ID colors for drawing
---
### Tracking algorithm, step-by-step (`update(detections, raw_frame)`)
#### Phase 0 — bookkeeping
- increments `frame_count`
- creates `used_gallery_ids` to prevent assigning two detections to the same ID in one frame
#### Phase 1 — Kalman predict
For every known person with a Kalman filter:
- `kf.predict()`
This estimates where the bbox should be *before* matching.
#### Phase 2 — Matching cascade (per detection)
For each incoming detection:
- If `embedding` is missing → mark `Unknown`
- Else compute a best match among all gallery IDs not used yet:
  - `iou_score` between incoming bbox and **predicted** bbox (from KF)
  - `cos_score` between incoming embedding and stored embedding
- Decision rule:
  - If IoU passes `iou_threshold`, spatial overlap can dominate and force the match
  - Else choose the highest cosine score
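One plausible reading of this decision rule is sketched below. It assumes the IoU and cosine scores have already been computed per gallery ID, and that a cosine-only match must still clear `similarity_threshold`; the function name `pick_match` and the exact tie-breaking are hypothetical, so the real tracker may combine the scores differently.

```python
def pick_match(scores, used_ids, iou_threshold=0.40, similarity_threshold=0.65):
    """scores maps gallery id -> (iou_score, cos_score) for one detection."""
    free = {gid: s for gid, s in scores.items() if gid not in used_ids}
    # Spatial gate: a strong overlap with a predicted box wins outright.
    spatial = {gid: s for gid, s in free.items() if s[0] >= iou_threshold}
    if spatial:
        return max(spatial, key=lambda gid: spatial[gid][0])
    # Otherwise fall back to appearance and require a confident cosine score.
    if free:
        best = max(free, key=lambda gid: free[gid][1])
        if free[best][1] >= similarity_threshold:
            return best
    return None  # no match: the caller would create a new ID


scores = {1: (0.55, 0.40), 2: (0.05, 0.90), 3: (0.00, 0.30)}
first = pick_match(scores, used_ids=set())    # ID 1 passes the IoU gate
second = pick_match(scores, used_ids={1})     # ID 2 wins on cosine similarity
third = pick_match(scores, used_ids={1, 2})   # ID 3 is too dissimilar
```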
#### Phase 3 — Assignment
If best score >= `similarity_threshold`:
- Assign existing `id`
- Update:
1. Kalman filter with measurement `z` based on incoming bbox geometry
2. Embedding memory with EMA:
```python
# alpha weights the newest observation; np is NumPy
new_emb = alpha * incoming + (1 - alpha) * old
new_emb = new_emb / np.linalg.norm(new_emb)
```
3. Update `last_seen`
Then snapshot upgrade logic:
- compute bbox area
- run face check: `face_detector.has_face(raw_frame, incoming_box)`
- if face present **or** area improved significantly:
- update best_area
- if snapshot_path exists, overwrite snapshot on disk & update `last_seen` in DB
If not matched:
- Create new ID:
- call `handle_new_person()` (saves snapshot + inserts DB record)
- initialize Kalman filter for this new bbox
- create gallery entry with snapshot_path from event
#### Phase 4 — Housekeeping (forget lost tracks)
Removes any ID not seen for more than `max_lost_frames`.
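A minimal sketch of this pruning step, assuming each gallery entry stores `last_seen` as a frame index (as the gallery layout above suggests). The function name `prune_gallery` is hypothetical.

```python
def prune_gallery(gallery, frame_count, max_lost_frames=300):
    """Drop entries not seen for more than max_lost_frames frames."""
    stale = [gid for gid, entry in gallery.items()
             if frame_count - entry["last_seen"] > max_lost_frames]
    for gid in stale:
        del gallery[gid]  # removes only the in-memory entry
    return stale


gallery = {1: {"last_seen": 100}, 2: {"last_seen": 950}}
removed = prune_gallery(gallery, frame_count=1000)
```

Collecting the stale IDs first avoids mutating the dict while iterating over it.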
---
### Kalman Filter Design (`_init_kalman`)
State: 7D, Measurement: 4D
It uses the common bbox tracker trick with measurement as:
- center x, center y, scale (area), ratio (w/h)
Includes tuning:
- `R` measurement noise (trust in detector)
- `P` initial covariance
- `Q` process noise (how much motion can deviate)
Also includes NaN/invalid protection in `_kf_to_bbox()` by forcing:
- `s = max(1.0, s)`
- `r = max(0.1, r)`
and using `sqrt(abs(s*r))`.
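The conversion between a corner-format box and the 4D measurement can be sketched as below. The guards mirror the ones listed above; the function names `bbox_to_z` and `z_to_bbox` are assumptions, and the sketch expects a box with non-zero height.

```python
import math


def bbox_to_z(bbox):
    """[x0, y0, x1, y1] -> [center x, center y, area, aspect ratio w/h]."""
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0, y1 - y0
    return [x0 + w / 2.0, y0 + h / 2.0, w * h, w / float(h)]


def z_to_bbox(z):
    """Inverse mapping, with guards against degenerate filter states."""
    cx, cy, s, r = z
    s = max(1.0, s)            # area can never drop below one pixel
    r = max(0.1, r)            # ratio stays positive
    w = math.sqrt(abs(s * r))  # since s = w*h and r = w/h, s*r = w^2
    h = s / w
    return [cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0]


z = bbox_to_z([100, 50, 300, 450])  # a 200 x 400 box
box = z_to_bbox(z)
```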
---
### Drawing (`draw(frame, detections)`)
- draws rectangle + filled label background + `ID: <id>`
- uses deterministic random colors per integer ID
- gray if ID is `"Unknown"`
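One simple way to get a stable color per ID is to seed a random generator with the ID itself. This is only a sketch of the idea: the function name `color_for_id` and the 64 to 255 channel range are assumptions, and the repository may derive its colors differently.

```python
import random


def color_for_id(track_id):
    """Return a BGR tuple that is always the same for a given integer ID."""
    if not isinstance(track_id, int):
        return (128, 128, 128)       # gray for "Unknown"
    rng = random.Random(track_id)    # seeding with the ID makes it repeatable
    return tuple(rng.randint(64, 255) for _ in range(3))


known = color_for_id(7)
unknown = color_for_id("Unknown")
```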
---
## 4.8 `app.py` — FastAPI Web Server + MJPEG stream + DB API
### Global runtime state
- `current_frame`: most recent JPEG bytes
- `current_stats`: `{"detected":0,"tracked":0,"unknown":0}`
- thread lock to protect frame buffer
### Lifespan (`lifespan(app)`)
On startup:
1. Create DB: `DatabaseManager()`
2. Create camera: `SmartCamera(source="auto", resolution=(1280,720), target_fps=30, flip_method="ccw")`
3. Load detector: `SmartDetector(model_name="yolov8m.hef")`
4. Load embedder: `SmartEmbedder(model_name="repvgg_a0_person_reid_512.hef")`
5. Load face detector: `SmartFaceDetector()`
6. Load tracker: `SmartTracker(db_manager=db, face_detector=face_detector)`
7. Start background thread: `threading.Thread(target=ai_loop, daemon=True).start()`
On shutdown:
- stop camera
- close face detector
### AI loop (`ai_loop()`)
Runs forever:
1. `frame = camera.get_frame()`
2. `detections = detector.detect(frame)`
3. `detections = embedder.process_detections(frame, detections)`
4. `tracked_objects = tracker.update(detections, frame)`
5. Update stats:
- known = IDs that are `int`
- unknown = rest
6. Draw + encode JPEG into `current_frame`
### HTTP routes
- `GET /` → renders `templates/index.html`
- `GET /video_feed` → MJPEG stream (`multipart/x-mixed-replace`)
- `GET /api/stats` → live counts
- `GET /api/records` → paginated/filterable/sortable list of DB people sightings
- `GET /api/record/{person_id}` → all sightings for a single person ID
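The MJPEG stream behind `/video_feed` is a sequence of JPEG images separated by a multipart boundary. The generator below sketches that framing in plain Python; the boundary name `frame`, the function name `mjpeg_chunks`, and the `max_frames` parameter (added so the example terminates) are assumptions.

```python
import time


def mjpeg_chunks(get_jpeg, max_frames=None):
    """Yield multipart chunks, one per JPEG, for an x-mixed-replace response."""
    sent = 0
    while max_frames is None or sent < max_frames:
        jpeg = get_jpeg()
        if jpeg is None:
            time.sleep(0.01)  # no frame yet: wait briefly instead of spinning
            continue
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpeg + b"\r\n")
        sent += 1


# Usage with fake JPEG bytes in place of the shared current_frame buffer.
fake_jpeg = b"\xff\xd8fake\xff\xd9"
chunks = list(mjpeg_chunks(lambda: fake_jpeg, max_frames=2))
```

In a FastAPI route, a generator like this would typically be wrapped in a `StreamingResponse` with the media type `multipart/x-mixed-replace; boundary=frame`.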
---
## 4.9 `main.py` — Local “monitor mode” pipeline
This is the same AI pipeline but displayed using OpenCV:
- It sets:
- `DISPLAY=":0"`
- `QT_QPA_PLATFORM="xcb"`
so the window opens on the Pi’s HDMI output.
Loop:
1. `camera.get_frame()`
2. detector → embedder → tracker
3. overlay real-time FPS computation
4. `cv2.imshow(...)`
5. press `q` to quit
Clean shutdown closes camera, detector, embedder, face detector, and windows.
---
## 5) Algorithms Explained
## 5.1 YOLO Object Detection (on Hailo)
YOLO is a single-shot detector that predicts bounding boxes + classes in one forward pass.
In SMART-CAM:
- It’s used primarily to detect `person` class.
- The model is a Hailo-compiled `.hef` file.
## 5.2 Person Re-Identification (ReID)
ReID creates a vector embedding such that:
- same person across frames → vectors are close (high cosine similarity)
- different people → vectors are far
SMART-CAM normalizes vectors to make cosine similarity stable.
## 5.3 Cosine Similarity
For embeddings `a` and `b`:
```
cos(a,b) = (a·b) / (||a|| ||b||)
```
After normalization, `||a|| = ||b|| = 1`, so cosine becomes just a dot product.
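A short numeric check of that statement, using two small vectors chosen for illustration:

```python
import numpy as np


def cosine_similarity(a, b):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])
full = cosine_similarity(a, b)   # (12 + 12) / (5 * 5)

# After L2 normalization the same value falls out of a bare dot product.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
dot = float(a_hat @ b_hat)
```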
## 5.4 IoU (Intersection over Union)
IoU measures overlap between two boxes:
```
IoU = area(intersection) / area(union)
```
SMART-CAM uses it to strongly favor spatial continuity when the predicted box still overlaps the new detection.
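For boxes in `[x0, y0, x1, y1]` format, the formula can be written as the following sketch (the function name `iou` is illustrative):

```python
def iou(box_a, box_b):
    """Intersection over Union for two [x0, y0, x1, y1] boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


overlap = iou([0, 0, 10, 10], [5, 0, 15, 10])     # half of each box overlaps
disjoint = iou([0, 0, 10, 10], [20, 20, 30, 30])  # no overlap at all
```

In the first case the intersection is 50 and the union is 150, so the IoU is one third, which is below the default `iou_threshold` of `0.40`.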
## 5.5 Kalman Filter (Constant Velocity)
A Kalman filter is a Bayesian estimator that predicts object motion.
Here it smooths the bounding box center/size over time and helps maintain stable tracking when detections jitter.
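The predict and update steps of a constant-velocity filter can be written in a few lines of NumPy. This sketch assumes the 7D state is ordered `[cx, cy, s, r, vx, vy, vs]` (the layout popularized by SORT-style trackers) and uses placeholder noise values; the repository's actual state ordering, tuning, and filter library are not confirmed here.

```python
import numpy as np

# Transition: position-like terms advance by their velocity each frame.
F = np.eye(7)
F[0, 4] = F[1, 5] = F[2, 6] = 1.0
# Measurement: the detector observes only [cx, cy, s, r].
H = np.eye(4, 7)


def predict(x, P, Q):
    return F @ x, F @ P @ F.T + Q


def update(x, P, z, R):
    y = z - H @ x                   # innovation
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    return x + K @ y, (np.eye(7) - K @ H) @ P


x = np.array([200.0, 250.0, 80000.0, 0.5, 5.0, -2.0, 0.0])
P = np.eye(7) * 10.0   # placeholder initial covariance
Q = np.eye(7) * 0.01   # placeholder process noise
R = np.eye(4)          # placeholder measurement noise

x_pred, P_pred = predict(x, P, Q)
z = np.array([207.0, 247.0, 80000.0, 0.5])
x_new, P_new = update(x_pred, P_pred, z, R)
```

The prediction moves the center by the stored velocity, and the update then pulls it part of the way toward the measured center, which is the smoothing effect described above.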
## 5.6 Hybrid Matching Cascade (Why it works)
- **IoU** handles “same position” continuity even if appearance changes slightly.
- **Cosine similarity** handles identity continuity even if objects cross or boxes shift.
- **Kalman** predicts where to look next, stabilizing IoU and handling momentary gaps.
---
## 6) Running the Project
### Install dependencies
```bash
pip install -r requirements.txt
```
### Add models to `models/`
You must add:
- `models/yolov8m.hef`
- `models/repvgg_a0_person_reid_512.hef`
- `models/haarcascade_frontalface_default.xml`
### Web dashboard mode
```bash
python app.py
```
Then open:
- `http://<PI_IP>:5000`
### Local monitor mode
```bash
python main.py
```
---
## 7) Data Outputs
- SQLite DB: `data/database.sqlite`
- Snapshots: `data/snapshots/*.jpg`
Each new person creates:
- 1 DB row with embedding + snapshot_path + last_seen
- 1 image crop file saved on disk
Snapshots may be overwritten later with a better image (face found or larger bbox).
---
## 8) Notes / Limitations
1. **Model files are required** and are not committed in the repo by default.
2. The Hailo/Picamera2 integration requires Raspberry Pi + correct system packages.
3. `DatabaseManager.get_all_embeddings()` assumes embeddings are stored as `float32` bytes.
4. The tracker drops an identity from its in-memory gallery once it has not been seen for more than `max_lost_frames` frames.
5. Web streaming uses MJPEG which is simple and fast, but not as bandwidth-efficient as modern codecs.
---