Model Creation & Classification Flow
This high-level flow shows how raw captured hand landmarks become a trained model and how that model runs in the browser to classify live input.
Overview
The application combines a hand landmark detector (MediaPipe Tasks) with a custom TensorFlow.js classification model trained on locally collected samples. It focuses on handshape-based alphabet recognition (finger spelling) and phrase exploration using mapped reference media.
Experimental: Models are prototype quality and not a full or authoritative representation of any standardized sign language.
Processing Pipeline
1. Video Capture
Accesses your camera using getUserMedia(). Frames never leave your device.
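A minimal sketch of this step, assuming a `<video>` element with id `camera` (a placeholder, not necessarily the app's markup):

```ts
// Request the front camera and attach the stream to a local <video> element.
// Frames stay in the browser; nothing is sent anywhere.
async function startCamera(): Promise<HTMLVideoElement> {
  const video = document.getElementById("camera") as HTMLVideoElement;
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "user", width: 640, height: 480 },
    audio: false,
  });
  video.srcObject = stream;
  await video.play();
  return video;
}
```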
2. Hand Landmarks
MediaPipe produces 3D landmark coordinates (normalized) for each detected hand.
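A sketch of how this step can be wired up with the `@mediapipe/tasks-vision` package; the WASM and model asset paths are placeholders, not the app's actual paths:

```ts
import { FilesetResolver, HandLandmarker } from "@mediapipe/tasks-vision";

// One-time setup of the MediaPipe hand landmarker in video mode.
const vision = await FilesetResolver.forVisionTasks("/wasm");
const handLandmarker = await HandLandmarker.createFromOptions(vision, {
  baseOptions: { modelAssetPath: "/models/hand_landmarker.task" },
  runningMode: "VIDEO",
  numHands: 1,
});

// Per frame: returns normalized 3D landmarks (21 points per detected hand).
const result = handLandmarker.detectForVideo(videoElement, performance.now());
const landmarks = result.landmarks[0]; // [{ x, y, z }, ...] or undefined if no hand
```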
3. Feature Normalization
Coordinates re-rooted at the wrist, scaled, flattened into a feature vector.
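One plausible implementation of the normalization described above (the app's exact scaling may differ):

```ts
type Landmark = { x: number; y: number; z: number };

// Re-root every point at the wrist (landmark 0), scale by the largest
// wrist-relative distance, then flatten to [x0, y0, z0, x1, y1, z1, ...].
function toFeatureVector(landmarks: Landmark[]): number[] {
  const wrist = landmarks[0];
  const rooted = landmarks.map(p => ({
    x: p.x - wrist.x,
    y: p.y - wrist.y,
    z: p.z - wrist.z,
  }));
  const scale = Math.max(...rooted.map(p => Math.hypot(p.x, p.y, p.z)), 1e-6);
  return rooted.flatMap(p => [p.x / scale, p.y / scale, p.z / scale]); // 21 * 3 = 63 values
}
```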
4. Model Inference
Custom TF.js model outputs per-letter probabilities each frame.
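A sketch of per-frame inference with TensorFlow.js; the model URL and letter set are illustrative assumptions, not the app's actual classes:

```ts
import * as tf from "@tensorflow/tfjs";

// Illustrative label set; the real model's classes may differ.
const LETTERS = "ABCDEFGHIKLMNOPQRSTUVWXY".split("");
const model = await tf.loadLayersModel("/models/alphabet/model.json");

// Run the classifier on one feature vector and return the top letter.
function classify(features: number[]): { letter: string; confidence: number } {
  return tf.tidy(() => {
    const input = tf.tensor2d([features]); // shape [1, featureLength]
    const probs = (model.predict(input) as tf.Tensor).dataSync();
    let best = 0;
    for (let i = 1; i < probs.length; i++) if (probs[i] > probs[best]) best = i;
    return { letter: LETTERS[best], confidence: probs[best] };
  });
}
```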
5. Temporal Hold
Detected letter must stay stable above threshold for a hold duration.
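A sketch of one way to implement the hold rule; the threshold and duration values are illustrative, not the app's tuned settings:

```ts
const THRESHOLD = 0.8; // minimum per-frame confidence (assumed value)
const HOLD_MS = 800;   // how long the letter must stay stable (assumed value)

let candidate: string | null = null;
let candidateSince = 0;

// Call once per frame; returns the letter only once it has been held long enough.
function updateHold(letter: string, confidence: number, now: number): string | null {
  if (confidence < THRESHOLD || letter !== candidate) {
    candidate = confidence >= THRESHOLD ? letter : null;
    candidateSince = now;
    return null;
  }
  if (now - candidateSince >= HOLD_MS) {
    candidateSince = now; // restart the hold so the letter is not re-accepted every frame
    return letter;        // accepted
  }
  return null;
}
```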
6. Acceptance & Output
Accepted letters appended to transcript, optional speech synthesis triggered.
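A small sketch of the acceptance step using the Web Speech API; the `transcript` element id is a placeholder:

```ts
// Append an accepted letter to the on-screen transcript and optionally speak it.
function acceptLetter(letter: string, speak = false): void {
  const transcript = document.getElementById("transcript");
  if (transcript) transcript.textContent += letter;
  if (speak && "speechSynthesis" in window) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(letter));
  }
}
```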
Model Details
Alphabet Classifier
Trained using locally captured landmark snapshots. Each sample is a 63- to 126-dimensional vector, depending on preprocessing. The model architecture (e.g., dense layers with dropout) is tuned for fast inference in the browser.
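For illustration, a dense-plus-dropout architecture of the kind described might look like this in TF.js; the layer sizes and dropout rates are assumptions, not the trained model's exact configuration:

```ts
import * as tf from "@tensorflow/tfjs";

// Small MLP classifier over a flattened landmark feature vector.
function buildClassifier(inputDim: number, numClasses: number): tf.Sequential {
  const model = tf.sequential();
  model.add(tf.layers.dense({ inputShape: [inputDim], units: 128, activation: "relu" }));
  model.add(tf.layers.dropout({ rate: 0.3 }));
  model.add(tf.layers.dense({ units: 64, activation: "relu" }));
  model.add(tf.layers.dropout({ rate: 0.3 }));
  model.add(tf.layers.dense({ units: numClasses, activation: "softmax" }));
  model.compile({
    optimizer: "adam",
    loss: "categoricalCrossentropy",
    metrics: ["accuracy"],
  });
  return model;
}
```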
Filipino Experimental Variant
The Filipino model card references a parallel dataset exploring localized sign variants. Sample coverage is currently limited; feedback helps identify misclassifications and guide dataset balancing.
Inference Performance
- Runs fully client-side (no network latency).
- WebGL acceleration where available; CPU fallback supported (see the backend sketch after this list).
- Adaptive frame sampling prevents UI blocking on slower devices.
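A sketch of the backend selection described above, preferring WebGL and falling back to the CPU backend:

```ts
import * as tf from "@tensorflow/tfjs";

// Try the WebGL backend first; fall back to plain CPU if it cannot be initialized.
async function initBackend(): Promise<string> {
  const webglOk = await tf.setBackend("webgl").catch(() => false);
  if (!webglOk) {
    await tf.setBackend("cpu");
  }
  await tf.ready();
  return tf.getBackend(); // "webgl" or "cpu"
}
```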
Privacy & Data
The app does not upload your camera frames or detected landmarks. All inference happens locally in the browser. No persistent personal data is stored from public pages.
- No cloud inference API calls.
- No tracking cookies beyond default session (if authenticated areas are used).
- Model files are static assets served over HTTPS.
Limitations
- Focused on single-hand finger spelling; two-hand and motion signs not fully supported yet.
- Lighting variation and extreme camera angles reduce accuracy.
- Not a substitute for formal instruction or standardized certification.
- Localized (Filipino) experimental model has sparse training samples.
Planned Enhancements
- Motion / dynamic gesture model for select phrases.
- Improved per-letter confidence visualization and error correction suggestions.
- Offline caching (Service Worker) for faster cold starts.
- Per-session adaptive thresholding to personalize sensitivity.
Give Feedback
Spot a misclassified letter or want to contribute samples? Open an issue or send feedback via a future integrated form. Your input helps expand coverage and fairness.