Model Creation & Classification Flow
This high-level flow shows how hand landmarks captured from the camera become training samples for a custom model, and how that trained model runs in the browser to classify live input.
Overview
The application combines a hand landmark detector (MediaPipe Tasks) with a custom TensorFlow.js classification model trained on locally collected samples. It focuses on handshape-based alphabet recognition (fingerspelling) and phrase exploration using mapped reference media.
Experimental: Models are prototype quality and not a full or authoritative representation of any standardized sign language.
Processing Pipeline
1. Video Capture
Accesses your camera using getUserMedia(). Frames never leave your device.
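A minimal capture sketch; the element id and resolution constraints here are illustrative choices, not the app's actual settings:

```ts
// Request the camera and attach the stream to a <video> element.
async function startCamera(): Promise<HTMLVideoElement> {
  const video = document.getElementById("camera") as HTMLVideoElement;
  video.srcObject = await navigator.mediaDevices.getUserMedia({
    video: { width: 640, height: 480 }, // modest resolution keeps inference cheap
    audio: false,
  });
  await video.play(); // frames stay in the browser; nothing is uploaded
  return video;
}
```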
2. Hand Landmarks
MediaPipe produces 3D landmark coordinates (normalized) for each detected hand.
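A detection sketch using the @mediapipe/tasks-vision package; the WASM CDN path and model asset path are placeholders, and `video` is the element from the capture step:

```ts
import { FilesetResolver, HandLandmarker } from "@mediapipe/tasks-vision";

// Load the WASM runtime and the hand landmark model once at startup.
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm" // placeholder path
);
const landmarker = await HandLandmarker.createFromOptions(vision, {
  baseOptions: { modelAssetPath: "/models/hand_landmarker.task" }, // placeholder path
  runningMode: "VIDEO",
  numHands: 1,
});

// In the render loop: one array of 21 normalized {x, y, z} points per detected hand.
const result = landmarker.detectForVideo(video, performance.now());
```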
3. Feature Normalization
Coordinates are re-rooted at the wrist, scale-normalized, and flattened into a feature vector.
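One plausible implementation of this step, assuming 21 landmarks per hand and max-distance scaling (the exact scaling rule used by the app may differ):

```ts
interface Point3D { x: number; y: number; z: number; }

// Re-root at the wrist (landmark 0 in MediaPipe's hand topology), normalize by
// the largest wrist distance so hand size and distance cancel out, then flatten.
function toFeatureVector(landmarks: Point3D[]): number[] {
  const wrist = landmarks[0];
  const rooted = landmarks.map(p => ({
    x: p.x - wrist.x,
    y: p.y - wrist.y,
    z: p.z - wrist.z,
  }));
  const scale = Math.max(...rooted.map(p => Math.hypot(p.x, p.y, p.z))) || 1;
  // 21 points x 3 coordinates = a 63-dimension feature vector for one hand.
  return rooted.flatMap(p => [p.x / scale, p.y / scale, p.z / scale]);
}
```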
4. Model Inference
Custom TF.js model outputs per-letter probabilities each frame.
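An inference sketch; the model URL and label set are placeholders, not the shipped assets:

```ts
import * as tf from "@tensorflow/tfjs";

const LABELS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".split(""); // placeholder label set
const model = await tf.loadLayersModel("/models/alphabet/model.json"); // placeholder URL

// Run one frame's feature vector through the model and pick the top class.
function classify(features: number[]): { letter: string; prob: number } {
  const input = tf.tensor2d([features]); // shape [1, featureLength]
  const output = model.predict(input) as tf.Tensor;
  const probs = output.dataSync();
  input.dispose();
  output.dispose(); // free tensor memory each frame
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return { letter: LABELS[best], prob: probs[best] };
}
```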
5. Temporal Hold
The detected letter must stay stable above a confidence threshold for a hold duration.
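A hold-to-accept sketch; the threshold and hold time are illustrative values, not the app's tuned settings:

```ts
const CONFIDENCE_THRESHOLD = 0.85; // illustrative
const HOLD_MS = 600;               // illustrative

let candidate: string | null = null;
let heldSince = 0;

// Call once per classified frame; returns a letter only after it has stayed
// stable above the threshold for the full hold duration.
function updateHold(letter: string, prob: number, now: number): string | null {
  if (prob < CONFIDENCE_THRESHOLD || letter !== candidate) {
    // Restart the timer whenever the prediction changes or confidence dips.
    candidate = prob >= CONFIDENCE_THRESHOLD ? letter : null;
    heldSince = now;
    return null;
  }
  if (now - heldSince >= HOLD_MS) {
    candidate = null; // reset so the same letter is not re-accepted every frame
    return letter;
  }
  return null;
}
```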
6. Acceptance & Output
Accepted letters are appended to the transcript, and optional speech synthesis is triggered.
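An output sketch using the standard Web Speech API (the transcript structure is an assumption):

```ts
const transcript: string[] = [];

// Append an accepted letter and optionally speak it aloud.
function acceptLetter(letter: string, speak: boolean): void {
  transcript.push(letter);
  if (speak && "speechSynthesis" in window) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(letter));
  }
}
```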
Model Details
Alphabet Classifier
Trained using locally captured landmark snapshots. Each sample is a 63- to 126-dimension vector depending on preprocessing (MediaPipe yields 21 landmarks × 3 coordinates per hand). The model architecture (e.g., dense layers with dropout) is tuned for fast inference in the browser.
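An illustrative architecture in this spirit; the layer sizes, dropout rates, and class count are assumptions, not the shipped model:

```ts
import * as tf from "@tensorflow/tfjs";

// Small dense network with dropout, sized for fast in-browser inference.
function buildClassifier(inputDim: number, numClasses: number): tf.LayersModel {
  const model = tf.sequential();
  model.add(tf.layers.dense({ inputShape: [inputDim], units: 128, activation: "relu" }));
  model.add(tf.layers.dropout({ rate: 0.3 }));
  model.add(tf.layers.dense({ units: 64, activation: "relu" }));
  model.add(tf.layers.dropout({ rate: 0.3 }));
  model.add(tf.layers.dense({ units: numClasses, activation: "softmax" }));
  model.compile({
    optimizer: "adam",
    loss: "categoricalCrossentropy",
    metrics: ["accuracy"],
  });
  return model;
}
```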
Filipino Experimental Variant
The Filipino model card references a parallel dataset exploring localized sign variants. Sample coverage is currently limited; feedback helps identify misclassifications and guide dataset balancing.
Inference Performance
- Runs fully client-side (no network latency).
- WebGL acceleration where available; CPU fallback supported (see the sketch after this list).
- Adaptive frame sampling prevents UI blocking on slower devices.
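A backend-selection sketch using the TensorFlow.js API; the app's actual startup logic may differ:

```ts
import * as tf from "@tensorflow/tfjs";

// Prefer the WebGL backend; fall back to CPU where WebGL is unavailable.
async function initBackend(): Promise<string> {
  if (!(await tf.setBackend("webgl"))) {
    await tf.setBackend("cpu");
  }
  await tf.ready();
  return tf.getBackend();
}
```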
Privacy & Data
The app does not upload your camera frames or detected landmarks; all inference happens locally inside the browser. No persistent personal data is stored from public pages.
- No cloud inference API calls.
- No tracking cookies beyond the default session cookie (if authenticated areas are used).
- Model files are static assets served over HTTPS.
Limitations
- Focused on single-hand fingerspelling; two-handed and motion-based signs are not yet fully supported.
- Lighting variation and extreme camera angles reduce accuracy.
- Not a substitute for formal instruction or standardized certification.
- Localized (Filipino) experimental model has sparse training samples.
Planned Enhancements
- Motion / dynamic gesture model for select phrases.
- Improved per-letter confidence visualization and error correction suggestions.
- Offline caching (Service Worker) for faster cold starts.
- Per-session adaptive thresholding to personalize sensitivity.
Give Feedback
Spot a misclassified letter, or want to contribute samples? Open an issue or send feedback via the integrated form planned for a future release. Your input helps expand coverage and fairness.