# SMART-CAM (Smart AI Camera) — Full Documentation

Repository: `WajeehAlamoudi/Smart_AI_Camera`

Docs generated on: **2026-03-11**

---

## 1) Project Overview

**SMART-CAM** is an edge-first smart surveillance pipeline designed to run **fully locally** on devices like a **Raspberry Pi** (Pi 5 recommended) and optionally accelerate inference using a **Hailo NPU** through Picamera2’s Hailo integration.

The system does five major things in real time:

1. **Capture frames** from a camera source (Pi CSI camera, USB webcam, RTSP, video file, etc.).
2. **Detect people** using a Hailo-compiled **YOLOv8** object detection model (`yolov8m.hef`).
3. **Generate Re-Identification embeddings** using a Hailo-compiled **RepVGG ReID** model (`repvgg_a0_person_reid_512.hef`).
4. **Track identities across frames** using a hybrid method:
   - **Spatial consistency** (IoU with Kalman-predicted boxes)
   - **Appearance matching** (cosine similarity on embeddings)
5. **Persist identities and snapshots** in an **SQLite database**, and optionally update the best snapshot when a **face is detected** (Haar Cascade “bouncer”).

It supports two main run modes:

- **Web dashboard mode** using **FastAPI** (`app.py`) with MJPEG streaming + database APIs.
- **Local desktop window mode** using OpenCV display (`main.py`) for direct monitor output.

---

## 2) Repository Structure (Key Files)

> Note: Only the most important files were fetched for this document; files that the repository search did not return may be missing from the lists below.

- `README.md` — high-level description + install instructions
- `requirements.txt` — minimal Python deps (`fastapi`, `uvicorn`, `opencv-python`, `numpy==1.26.4`)
- `app.py` — FastAPI server + background AI loop + REST APIs
- `main.py` — “local monitor” loop (imshow) pipeline

### Packages / Modules

- `camera/stream.py` — `SmartCamera` (robust camera input abstraction)
- `ai/detector.py` — `SmartDetector` (Hailo YOLO detector)
- `ai/embedder.py` — `SmartEmbedder` (Hailo ReID embedding extractor)
- `ai/tracker.py` — `SmartTracker` (hybrid tracker: Kalman + cosine + IoU)
- `ai/face_detector.py` — `SmartFaceDetector` (CPU Haar cascade face check)
- `ai/events.py` — `handle_new_person()` (snapshot + DB insert)
- `database/db_manager.py` — `DatabaseManager` (SQLite + embeddings persistence)
- `core/logger.py` — log formatting + file/console logging helpers
- `templates/index.html` — dashboard UI (HTML/CSS/JS)

### Models / Assets

- `models/coco.txt` — COCO labels for detection classes
- AI models:
  - `models/yolov8m.hef`
  - `models/repvgg_a0_person_reid_512.hef`
  - `models/haarcascade_frontalface_default.xml`

---

## 3) How the Whole Pipeline Works (Conceptual)

Think of SMART-CAM like an **assembly line** running on every frame:

### Stage A — Frame acquisition (`SmartCamera`)

- A background thread continuously captures frames and stores the newest one.
- The rest of the pipeline always pulls the “latest frame” when ready.

### Stage B — Object detection (`SmartDetector`)

- Runs YOLOv8 (compiled to Hailo `.hef`) to produce bounding boxes.
- Output becomes a list of detection dictionaries (each has at least `bbox`, plus metadata like `score`, etc.).

### Stage C — ReID embedding (`SmartEmbedder`)

- For each detected person box:
  - Crop the person region from the frame.
  - Resize to model input.
  - Run embedding model on Hailo.
  - Flatten and **L2-normalize** the vector (critical for cosine similarity).
  - Attach it to the detection dict as `det["embedding"]`.
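
The crop and normalization steps of Stage C can be sketched in isolation. This is a minimal illustration, not the repository's code: `clamp_bbox` and `l2_normalize` are hypothetical helper names, the resize and Hailo inference steps are left out, and the two-element vector only stands in for a real model output.

```python
import numpy as np


def clamp_bbox(bbox, frame_w, frame_h):
    """Clamp [x0, y0, x1, y1] to the frame; return None if nothing is left."""
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    x0, x1 = max(0, x0), min(frame_w, x1)
    y0, y1 = max(0, y0), min(frame_h, y1)
    if x1 <= x0 or y1 <= y0:
        return None
    return x0, y0, x1, y1


def l2_normalize(vec, eps=1e-12):
    """Flatten a raw model output and scale it to unit length."""
    vec = np.asarray(vec, dtype=np.float32).flatten()
    norm = np.linalg.norm(vec)
    return vec / max(norm, eps)  # eps guards against an all-zero vector


# Stand-in frame (720p, BGR) and a box that partly leaves the frame.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
box = clamp_bbox([-20, 100, 300, 900], frame_w=1280, frame_h=720)
crop = frame[box[1]:box[3], box[0]:box[2]]

# Stand-in for the flattened ReID output.
embedding = l2_normalize([3.0, 4.0])
```

Clamping before slicing matters because NumPy treats negative indices as offsets from the end of the array, so an unclamped box would silently crop the wrong region instead of raising an error.
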

### Stage D — Tracking & identity assignment (`SmartTracker`)

- Maintains a **gallery** of known IDs (loaded from SQLite on startup).
- For each new detection, it tries to match it to an existing ID using:
  - IoU with Kalman predicted box (spatial continuity)
  - Cosine similarity with stored embedding (appearance continuity)
- If match succeeds → reuse existing ID and update memory.
- If match fails → create a new ID, save snapshot + embedding to DB.

### Stage E — “Smart face capture”

- For known IDs, the tracker checks:
  - If the crop contains a **frontal face** (Haar cascade)
  - OR if the person box got significantly larger than the best snapshot
- Then it overwrites snapshot with a higher-quality crop and updates `last_seen`.

### Stage F — Output

- `SmartTracker.draw()` draws bounding boxes + IDs.
- In `app.py` the annotated frames are encoded as JPEG and streamed to browser via MJPEG.

---

## 4) Detailed Module Documentation

## 4.1 `camera/stream.py` — SmartCamera

### Class: `SmartCamera`

A resilient camera abstraction that supports different input types:

- `"auto"`: try PiCamera first, then fall back to USB camera index 0
- `"picamera"`: force Raspberry Pi camera pipeline
- integer index `0,1,...` for USB cameras
- `"/dev/video0"` device path
- URLs like `rtsp://`, `http(s)://`, `udp://`
- video files (`.mp4`, `.avi`, ...)

#### Key constructor parameters

- `source`: input source selector
- `resolution`: desired capture resolution (e.g., `(1280,720)`)
- `target_fps`: software throttling target
- `reconnect_delay`: how long to wait between reconnect attempts
- `jpeg_quality`: output quality for MJPEG usage
- `flip_method`: optional transforms (`"h"`, `"v"`, `"cw"`, `"ccw"`, `"180"`)

#### Core idea

`SmartCamera` runs capture on its own thread and exposes:

- `get_frame()` → returns latest frame (numpy array) safely
- `stop()` → shuts down threads/capture cleanly

This design prevents the AI loop from blocking camera capture.
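
The "capture on its own thread, hand out the latest frame" idea can be illustrated with the standard library alone. The sketch below is an assumption about the general pattern, not the actual `SmartCamera` implementation: `LatestFrameBuffer` and its `read_frame` callable are hypothetical, and a counter stands in for real frames.

```python
import threading
import time


class LatestFrameBuffer:
    """Keeps only the newest frame produced by a background capture thread."""

    def __init__(self, read_frame, interval=0.01):
        self._read_frame = read_frame  # hypothetical callable: returns a frame or None
        self._interval = interval      # crude software throttle between reads
        self._lock = threading.Lock()
        self._latest = None
        self._running = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._running.set()
        self._thread.start()
        return self

    def _loop(self):
        while self._running.is_set():
            frame = self._read_frame()
            if frame is not None:
                with self._lock:
                    self._latest = frame  # older frames are simply dropped
            time.sleep(self._interval)

    def get_frame(self):
        with self._lock:
            return self._latest

    def stop(self):
        self._running.clear()
        self._thread.join(timeout=1.0)


# Usage: a counter plays the role of the camera.
counter = iter(range(1_000_000))
buffer = LatestFrameBuffer(lambda: next(counter)).start()
time.sleep(0.1)
latest = buffer.get_frame()
buffer.stop()
```

Because the consumer only ever reads the newest value, a slow inference step causes dropped frames rather than a growing backlog, which is usually the preferred trade-off for live video.
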

---

## 4.2 `ai/detector.py` — SmartDetector (YOLO on Hailo)

### Class: `SmartDetector`

Responsible for **people detection** (default: person-only).

#### Parameters

- `confidence_threshold` (default `0.8`)
- `only_person` (default `True`): if true, it filters to COCO class `0` (person)
- `model_name` (default `yolov8m.hef`)
- `labels_name` (default `coco.txt`)

#### How it loads models

- It builds paths relative to repository root:
  - `models/<model_name>`
  - `models/<labels_name>`
- Uses: `from picamera2.devices import Hailo`
- Calls `Hailo(self.model_path)` and enters context (`__enter__()`).
- Reads model input shape via `get_input_shape()`.

#### Detection extraction logic (`_extract_detections`)

The Hailo output is assumed to be in the Picamera2 Hailo example format:

- `hailo_output[class_id]` is a list of detections
- each detection: `[:4] = [y0, x0, y1, x1]` in normalized coords; `[4] = score`

Then it:

- applies threshold filtering
- optionally filters class id (person-only)
- converts normalized coords to pixel coords safely

#### Output format

Returns list of dicts, typically like:

```python
{
    "bbox": [x0, y0, x1, y1],
    "score": float,
    "class_id": int,
    "label": str
}
```

(Exact keys depend on the rest of the file, but `bbox` is definitely used downstream.)

---

## 4.3 `ai/embedder.py` — SmartEmbedder (ReID embeddings on Hailo)

### Class: `SmartEmbedder`

Computes a “visual fingerprint” vector for each person detection.

#### Parameters

- `model_name` (default `repvgg_a0_person_reid_512.hef`)

#### Steps inside `process_detections(frame, detections)`

For each detection:

1. Read `det["bbox"]`
2. Clamp bbox to frame bounds (avoid crashes)
3. Crop person region
4. Resize to model input size `(model_w, model_h)`
5. Run NPU inference: `raw_results = self.hailo.run(infer_crop)`
6. Flatten results: `embedding = np.array(raw_results).flatten()`
7. Normalize: `embedding = embedding / ||embedding||`
8. Attach: `det["embedding"] = embedding`

**Why L2 normalize?** With unit-length vectors, cosine similarity reduces to a plain dot product, so scores depend only on the direction of the embedding and stay comparable across frames.

---

## 4.4 `ai/face_detector.py` — SmartFaceDetector (Haar cascade bouncer)

### Class: `SmartFaceDetector`

A CPU-based face presence checker used as a *quality gate* for updating snapshots.

#### Model dependency

`models/haarcascade_frontalface_default.xml` must exist.

#### Method: `has_face(frame, person_bbox) -> bool`

1. Crops the person region using bbox
2. Rejects too-small crops
3. Converts to grayscale
4. Runs:
   - `detectMultiScale(scaleFactor=1.1, minNeighbors=4, minSize=(60,60))`
5. Returns `True` if at least one face detected

This does **not** do recognition; it’s only “is there a frontal face right now?”.

---

## 4.5 `database/db_manager.py` — DatabaseManager (SQLite memory)

### Class: `DatabaseManager`

Stores:

- embeddings (as BLOB)
- snapshot file paths
- timestamps (`last_seen`)
- name field (default `'Unknown'`)

#### Table schema (`people`)

- `id INTEGER PRIMARY KEY AUTOINCREMENT`
- `name TEXT DEFAULT 'Unknown'`
- `embedding BLOB NOT NULL`
- `snapshot_path TEXT`
- `last_seen TIMESTAMP`

#### Method: `add_new_person(embedding, snapshot_path=None) -> int`

- converts numpy embedding to bytes
- inserts row
- returns new `id`

#### Method: `update_snapshot(person_id, embedding, snapshot_path, new_frame, bbox)`

- crops bbox from `new_frame`
- overwrites existing snapshot image on disk
- updates `last_seen` in DB

(Note: embedding parameter is currently not used to update embedding in DB in the shown code.)

#### Method: `get_all_embeddings() -> dict[int, np.ndarray]`

Loads every embedding:

- `SELECT id, embedding FROM people`
- converts each BLOB using `np.frombuffer(..., dtype=np.float32)`

This provides the initial “gallery” for the tracker at startup.
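
The BLOB round trip described above can be tried against an in-memory database. This is a sketch under assumptions: the table mirrors the schema listed for `people`, but the free functions, the `CURRENT_TIMESTAMP` default for `last_seen`, and the example snapshot path are illustrative and not taken from `db_manager.py`.

```python
import sqlite3

import numpy as np

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE people (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT DEFAULT 'Unknown',
        embedding BLOB NOT NULL,
        snapshot_path TEXT,
        last_seen TIMESTAMP
    )"""
)


def add_new_person(conn, embedding, snapshot_path=None):
    """Store the embedding as raw float32 bytes and return the new row id."""
    blob = np.asarray(embedding, dtype=np.float32).tobytes()
    cur = conn.execute(
        "INSERT INTO people (embedding, snapshot_path, last_seen) "
        "VALUES (?, ?, CURRENT_TIMESTAMP)",
        (blob, snapshot_path),
    )
    conn.commit()
    return cur.lastrowid


def get_all_embeddings(conn):
    """Rebuild {id: vector}; np.frombuffer returns read-only float32 views."""
    rows = conn.execute("SELECT id, embedding FROM people").fetchall()
    return {pid: np.frombuffer(blob, dtype=np.float32) for pid, blob in rows}


original = np.array([0.6, 0.8], dtype=np.float32)
new_id = add_new_person(conn, original, "data/snapshots/example.jpg")  # illustrative path
gallery = get_all_embeddings(conn)
```

The dtype has to match on both sides: bytes written as `float64` and read back as `float32` would yield a vector of twice the length with meaningless values, and no error would be raised.
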

---

## 4.6 `ai/events.py` — Events: saving new identities

### Function: `handle_new_person(db, person_id, embedding, frame, bbox) -> str`

Triggered **only once** when a new identity is created.

Steps:

1. Clamp bbox and crop the detection from the frame
2. Save crop to `data/snapshots/person_<id>_<timestamp>.jpg`
3. Insert the embedding + snapshot_path into SQLite via `db.add_new_person()`
4. Returns the saved filepath (used by the tracker later)

---

## 4.7 `ai/tracker.py` — SmartTracker (Hybrid IoU + Cosine + Kalman)

### Class: `SmartTracker`

#### Purpose

Assign stable IDs to detections over time, even with occlusions and crossing paths, by combining:

- **Kalman filter predictions** (motion model)
- **IoU** (spatial overlap)
- **Cosine similarity** (appearance match)

#### Constructor

```python
SmartTracker(
    db_manager,
    face_detector,
    similarity_threshold=0.65,
    iou_threshold=0.40,
    alpha=0.2,
    max_lost_frames=300
)
```

- Loads database embeddings: `db.get_all_embeddings()`
- Builds `self.gallery`:

  ```python
  {
      id: {
          "embedding": np.array,
          "kf": None,
          "last_seen": int,
          "best_area": float,
          "snapshot_path": str|None
      }
  }
  ```

- `next_id` starts at max existing + 1
- `colors` assigns consistent per-ID colors for drawing

---

### Tracking algorithm, step-by-step (`update(detections, raw_frame)`)

#### Phase 0 — bookkeeping

- increments `frame_count`
- creates `used_gallery_ids` to prevent assigning two detections to the same ID in one frame

#### Phase 1 — Kalman predict

For every known person with a Kalman filter:

- `kf.predict()`

This estimates where the bbox should be *before* matching.
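
To make the predict step concrete, here is a minimal constant-velocity update of the state mean only. It is a sketch under a stated assumption: the 7D state is laid out as `[cx, cy, s, r, vx, vy, vs]`, the convention used by SORT-style trackers, which this document does not confirm for `SmartTracker`. The covariance update that a real Kalman filter also performs is omitted, and the `"state"` key is a simplification of the `"kf"` entry.

```python
def predict_state(state):
    """Advance centre and scale by their velocities; the aspect ratio stays fixed."""
    cx, cy, s, r, vx, vy, vs = state
    return [cx + vx, cy + vy, s + vs, r, vx, vy, vs]


# Simplified gallery: entries loaded from the database have no filter yet.
gallery = {
    1: {"state": [100.0, 200.0, 5000.0, 0.5, 2.0, -1.0, 10.0]},
    2: {"state": None},
}

for entry in gallery.values():
    if entry["state"] is not None:  # mirrors "every known person with a Kalman filter"
        entry["state"] = predict_state(entry["state"])
```

Entries without a filter are skipped, so identities restored from SQLite can only be matched by appearance until their first detection initializes a filter.
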

#### Phase 2 — Matching cascade (per detection)

For each incoming detection:

- If `embedding` missing → mark `Unknown`
- Else compute a best match among all gallery IDs not used yet:
  - `iou_score` between incoming bbox and **predicted** bbox (from KF)
  - `cos_score` between incoming embedding and stored embedding
- Decision rule:
  - If IoU passes `iou_threshold`, it can dominate and force match
  - Else choose highest cosine score

#### Phase 3 — Assignment

If best score >= `similarity_threshold`:

- Assign existing `id`
- Update:
  1. Kalman filter with measurement `z` based on incoming bbox geometry
  2. Embedding memory with EMA:

     ```python
     new_emb = alpha * incoming + (1-alpha) * old
     new_emb = new_emb / norm(new_emb)
     ```

  3. Update `last_seen`

Then snapshot upgrade logic:

- compute bbox area
- run face check: `face_detector.has_face(raw_frame, incoming_box)`
- if face present **or** area improved significantly:
  - update best_area
  - if snapshot_path exists, overwrite snapshot on disk & update `last_seen` in DB

If not matched:

- Create new ID:
  - call `handle_new_person()` (saves snapshot + inserts DB record)
  - initialize Kalman filter for this new bbox
  - create gallery entry with snapshot_path from event

#### Phase 4 — Housekeeping (forget lost tracks)

Removes any ID not seen for more than `max_lost_frames`.

---

### Kalman Filter Design (`_init_kalman`)

State: 7D, Measurement: 4D

It uses the common bbox tracker trick with measurement as:

- center x, center y, scale (area), ratio (w/h)

Includes tuning:

- `R` measurement noise (trust in detector)
- `P` initial covariance
- `Q` process noise (how much motion can deviate)

Also includes NaN/invalid protection in `_kf_to_bbox()` by forcing:

- `s = max(1.0, s)`
- `r = max(0.1, r)`

and using `sqrt(abs(s*r))`.
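
The measurement layout and the guards described for `_kf_to_bbox()` can be sketched as a pair of conversions. The function names `bbox_to_z` and `z_to_bbox` are hypothetical; only the `[cx, cy, s, r]` layout, the two `max(...)` clamps and the `sqrt(abs(s*r))` width come from the description above. Recovering the height as `s / w` is an assumption that follows from `s = w * h`.

```python
import math


def bbox_to_z(bbox):
    """[x0, y0, x1, y1] -> [cx, cy, s, r] with s = area and r = w / h (assumes h > 0)."""
    x0, y0, x1, y1 = bbox
    w, h = x1 - x0, y1 - y0
    return [x0 + w / 2.0, y0 + h / 2.0, w * h, w / h]


def z_to_bbox(z):
    """[cx, cy, s, r] -> [x0, y0, x1, y1], clamping s and r so the result stays finite."""
    cx, cy, s, r = z
    s = max(1.0, s)   # a filter can drift to zero or negative area
    r = max(0.1, r)   # and to a non-positive aspect ratio
    w = math.sqrt(abs(s * r))  # since s * r = (w * h) * (w / h) = w ** 2
    h = s / w
    return [cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0]


z = bbox_to_z([100.0, 50.0, 200.0, 250.0])
restored = z_to_bbox(z)

# A state that drifted into invalid territory still yields a usable box.
degenerate = z_to_bbox([10.0, 10.0, -5.0, -1.0])
```

Without the clamps, a negative `s * r` would make `math.sqrt` raise `ValueError`, and a zero width would cause a division by zero when the height is recovered.
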

---

### Drawing (`draw(frame, detections)`)

- draws rectangle + filled label background + `ID: <id>`
- uses deterministic random colors per integer ID
- gray if ID is `"Unknown"`

---

## 4.8 `app.py` — FastAPI Web Server + MJPEG stream + DB API

### Global runtime state

- `current_frame`: most recent JPEG bytes
- `current_stats`: `{"detected":0,"tracked":0,"unknown":0}`
- thread lock to protect frame buffer

### Lifespan (`lifespan(app)`)

On startup:

1. Create DB: `DatabaseManager()`
2. Create camera: `SmartCamera(source="auto", resolution=(1280,720), target_fps=30, flip_method="ccw")`
3. Load detector: `SmartDetector(model_name="yolov8m.hef")`
4. Load embedder: `SmartEmbedder(model_name="repvgg_a0_person_reid_512.hef")`
5. Load face detector: `SmartFaceDetector()`
6. Load tracker: `SmartTracker(db_manager=db, face_detector=face_detector)`
7. Start background thread: `threading.Thread(target=ai_loop, daemon=True).start()`

On shutdown:

- stop camera
- close face detector

### AI loop (`ai_loop()`)

Runs forever:

1. `frame = camera.get_frame()`
2. `detections = detector.detect(frame)`
3. `detections = embedder.process_detections(frame, detections)`
4. `tracked_objects = tracker.update(detections, frame)`
5. Update stats:
   - known = IDs that are `int`
   - unknown = rest
6. Draw + encode JPEG into `current_frame`

### HTTP routes

- `GET /` → renders `templates/index.html`
- `GET /video_feed` → MJPEG stream (`multipart/x-mixed-replace`)
- `GET /api/stats` → live counts
- `GET /api/records` → paginated/filterable/sortable list of DB people sightings
- `GET /api/record/{person_id}` → all sightings for a single person ID

---

## 4.9 `main.py` — Local “monitor mode” pipeline

This is the same AI pipeline but displayed using OpenCV:

- It sets:
  - `DISPLAY=":0"`
  - `QT_QPA_PLATFORM="xcb"`

  so the window opens on the Pi’s HDMI output.

Loop:

1. `camera.get_frame()`
2. detector → embedder → tracker
3. overlay real-time FPS computation
4. `cv2.imshow(...)`
5. press `q` to quit

Clean shutdown closes camera, detector, embedder, face detector, and windows.

---

## 5) Algorithms Explained

## 5.1 YOLO Object Detection (on Hailo)

YOLO is a single-shot detector that predicts bounding boxes + classes in one forward pass.

In SMART-CAM:

- It’s used primarily to detect `person` class.
- The model is a Hailo-compiled `.hef` file.

## 5.2 Person Re-Identification (ReID)

ReID creates a vector embedding such that:

- same person across frames → vectors are close (high cosine similarity)
- different people → vectors are far

SMART-CAM normalizes vectors to make cosine similarity stable.

## 5.3 Cosine Similarity

For embeddings `a` and `b`:

```
cos(a,b) = (a·b) / (||a|| ||b||)
```

After normalization, `||a|| = ||b|| = 1`, so cosine becomes just a dot product.

## 5.4 IoU (Intersection over Union)

IoU measures overlap between two boxes:

```
IoU = area(intersection) / area(union)
```

SMART-CAM uses it to strongly favor spatial continuity when the predicted box still overlaps the new detection.

## 5.5 Kalman Filter (Constant Velocity)

A Kalman filter is a Bayesian estimator that predicts object motion. Here it smooths the bounding box center/size over time and helps maintain stable tracking when detections jitter.

## 5.6 Hybrid Matching Cascade (Why it works)

- **IoU** handles “same position” continuity even if appearance changes slightly.
- **Cosine similarity** handles identity continuity even if objects cross or boxes shift.
- **Kalman** predicts where to look next, stabilizing IoU and handling momentary gaps.
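
The two formulas and the cascade can be combined into a small pure-Python sketch. The thresholds match the `SmartTracker` defaults listed earlier, but the way IoU "dominates" is an assumption made here for illustration: a candidate whose IoU passes the threshold is given a score above any possible cosine value. The `match` function and the `"predicted_bbox"` key are hypothetical, and embeddings are assumed to be unit length.

```python
def iou(a, b):
    """Intersection over union of two [x0, y0, x1, y1] boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(a, b):
    """Cosine similarity for unit-length vectors, i.e. a plain dot product."""
    return sum(x * y for x, y in zip(a, b))


def match(det, gallery, iou_threshold=0.40, similarity_threshold=0.65):
    """Return the best gallery id for a detection, or None if nothing qualifies."""
    best_id, best_score = None, 0.0
    for pid, entry in gallery.items():
        box = entry.get("predicted_bbox")
        overlap = iou(det["bbox"], box) if box is not None else 0.0
        if overlap >= iou_threshold:
            score = 1.0 + overlap  # assumed rule: spatial match outranks appearance
        else:
            score = cosine(det["embedding"], entry["embedding"])
        if score > best_score:
            best_id, best_score = pid, score
    return best_id if best_score >= similarity_threshold else None


gallery = {
    1: {"predicted_bbox": [0, 0, 100, 100], "embedding": [1.0, 0.0]},
    2: {"predicted_bbox": None, "embedding": [0.0, 1.0]},
}

# Overlaps id 1 strongly, even though its appearance resembles id 2.
by_position = match({"bbox": [10, 10, 110, 110], "embedding": [0.0, 1.0]}, gallery)
# No overlap with any prediction, so appearance decides.
by_appearance = match({"bbox": [500, 500, 600, 600], "embedding": [0.6, 0.8]}, gallery)
# Neither overlap nor similar appearance: treated as a new person.
no_match = match({"bbox": [500, 500, 600, 600], "embedding": [-1.0, 0.0]}, gallery)
```

The first case shows the trade-off of letting IoU dominate: it keeps IDs stable when appearance changes briefly, but it can also hold on to the wrong ID when two people swap positions between frames.
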

---

## 6) Running the Project

### Install dependencies

```bash
pip install -r requirements.txt
```

### Add models to `models/`

You must add:

- `models/yolov8m.hef`
- `models/repvgg_a0_person_reid_512.hef`
- `models/haarcascade_frontalface_default.xml`

### Web dashboard mode

```bash
python app.py
```

Then open:

- `http://<PI_IP>:5000`

### Local monitor mode

```bash
python main.py
```

---

## 7) Data Outputs

- SQLite DB: `data/database.sqlite`
- Snapshots: `data/snapshots/*.jpg`

Each new person creates:

- 1 DB row with embedding + snapshot_path + last_seen
- 1 image crop file saved on disk

Snapshots may be overwritten later with a better image (face found or larger bbox).

---

## 8) Notes / Limitations

1. **Model files are required** and are not committed in the repo by default.
2. The Hailo/Picamera2 integration requires Raspberry Pi + correct system packages.
3. `DatabaseManager.get_all_embeddings()` assumes embeddings are stored as `float32` bytes.
4. The tracker removes identities after `max_lost_frames` since last seen.
5. Web streaming uses MJPEG which is simple and fast, but not as bandwidth-efficient as modern codecs.

---
