Model Creation & Classification Flow
This high-level flow shows how raw captured hand landmarks become a trained model and how that model runs in the browser to classify live input.
Overview
The application combines a hand landmark detector (MediaPipe Tasks) with a custom TensorFlow.js classification model trained on locally collected samples. It focuses on handshape-based alphabet recognition (finger spelling) and phrase exploration using mapped reference media.
Experimental: Models are prototype quality and not a full or authoritative representation of any standardized sign language.
Processing Pipeline
1. Video Capture
Accesses your camera using getUserMedia(). Frames never leave your device.
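A minimal sketch of this step, assuming a `<video>` element with id `camera` (a placeholder, not necessarily the app's markup):

```ts
// Request the front camera and attach the stream to a local <video> element.
// Frames stay in the browser; nothing is sent anywhere.
async function startCamera(): Promise<HTMLVideoElement> {
  const video = document.getElementById("camera") as HTMLVideoElement;
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "user", width: 640, height: 480 },
    audio: false,
  });
  video.srcObject = stream;
  await video.play();
  return video;
}
```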
2. Hand Landmarks
MediaPipe produces 3D landmark coordinates (normalized) for each detected hand.
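A sketch of how this step can be wired up with the `@mediapipe/tasks-vision` package; the WASM and model asset paths are placeholders, not the app's actual paths:

```ts
import { FilesetResolver, HandLandmarker } from "@mediapipe/tasks-vision";

// One-time setup of the MediaPipe hand landmarker in video mode.
const vision = await FilesetResolver.forVisionTasks("/wasm");
const handLandmarker = await HandLandmarker.createFromOptions(vision, {
  baseOptions: { modelAssetPath: "/models/hand_landmarker.task" },
  runningMode: "VIDEO",
  numHands: 1,
});

// Per frame: returns normalized 3D landmarks (21 points per detected hand).
const result = handLandmarker.detectForVideo(videoElement, performance.now());
const landmarks = result.landmarks[0]; // [{ x, y, z }, ...] or undefined if no hand
```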
3. Feature Normalization
Coordinates re-rooted at the wrist, scaled, flattened into a feature vector.
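One plausible implementation of the normalization described above (the app's exact scaling may differ):

```ts
type Landmark = { x: number; y: number; z: number };

// Re-root every point at the wrist (landmark 0), scale by the largest
// wrist-relative distance, then flatten to [x0, y0, z0, x1, y1, z1, ...].
function toFeatureVector(landmarks: Landmark[]): number[] {
  const wrist = landmarks[0];
  const rooted = landmarks.map(p => ({
    x: p.x - wrist.x,
    y: p.y - wrist.y,
    z: p.z - wrist.z,
  }));
  const scale = Math.max(...rooted.map(p => Math.hypot(p.x, p.y, p.z)), 1e-6);
  return rooted.flatMap(p => [p.x / scale, p.y / scale, p.z / scale]); // 21 * 3 = 63 values
}
```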
4. Model Inference
Custom TF.js model outputs per-letter probabilities each frame.
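A sketch of per-frame inference with TensorFlow.js; the model URL and letter set are illustrative assumptions, not the app's actual classes:

```ts
import * as tf from "@tensorflow/tfjs";

// Illustrative label set; the real model's classes may differ.
const LETTERS = "ABCDEFGHIKLMNOPQRSTUVWXY".split("");
const model = await tf.loadLayersModel("/models/alphabet/model.json");

// Run the classifier on one feature vector and return the top letter.
function classify(features: number[]): { letter: string; confidence: number } {
  return tf.tidy(() => {
    const input = tf.tensor2d([features]); // shape [1, featureLength]
    const probs = (model.predict(input) as tf.Tensor).dataSync();
    let best = 0;
    for (let i = 1; i < probs.length; i++) if (probs[i] > probs[best]) best = i;
    return { letter: LETTERS[best], confidence: probs[best] };
  });
}
```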
5. Temporal Hold
Detected letter must stay stable above threshold for a hold duration.
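A sketch of one way to implement the hold rule; the threshold and duration values are illustrative, not the app's tuned settings:

```ts
const THRESHOLD = 0.8; // minimum per-frame confidence (assumed value)
const HOLD_MS = 800;   // how long the letter must stay stable (assumed value)

let candidate: string | null = null;
let candidateSince = 0;

// Call once per frame; returns the letter only once it has been held long enough.
function updateHold(letter: string, confidence: number, now: number): string | null {
  if (confidence < THRESHOLD || letter !== candidate) {
    candidate = confidence >= THRESHOLD ? letter : null;
    candidateSince = now;
    return null;
  }
  if (now - candidateSince >= HOLD_MS) {
    candidateSince = now; // restart the hold so the letter is not re-accepted every frame
    return letter;        // accepted
  }
  return null;
}
```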
6. Acceptance & Output
Accepted letters appended to transcript, optional speech synthesis triggered.
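A small sketch of the acceptance step using the Web Speech API; the `transcript` element id is a placeholder:

```ts
// Append an accepted letter to the on-screen transcript and optionally speak it.
function acceptLetter(letter: string, speak = false): void {
  const transcript = document.getElementById("transcript");
  if (transcript) transcript.textContent += letter;
  if (speak && "speechSynthesis" in window) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(letter));
  }
}
```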
Model Details
Alphabet Classifier
Trained using locally captured landmark snapshots. Each sample is a 63- to 126-dimensional vector, depending on preprocessing. The model architecture (e.g., dense layers with dropout) is tuned for fast inference in the browser.
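For illustration, a dense-plus-dropout architecture of the kind described might look like this in TF.js; the layer sizes and dropout rates are assumptions, not the trained model's exact configuration:

```ts
import * as tf from "@tensorflow/tfjs";

// Small MLP classifier over a flattened landmark feature vector.
function buildClassifier(inputDim: number, numClasses: number): tf.Sequential {
  const model = tf.sequential();
  model.add(tf.layers.dense({ inputShape: [inputDim], units: 128, activation: "relu" }));
  model.add(tf.layers.dropout({ rate: 0.3 }));
  model.add(tf.layers.dense({ units: 64, activation: "relu" }));
  model.add(tf.layers.dropout({ rate: 0.3 }));
  model.add(tf.layers.dense({ units: numClasses, activation: "softmax" }));
  model.compile({
    optimizer: "adam",
    loss: "categoricalCrossentropy",
    metrics: ["accuracy"],
  });
  return model;
}
```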
Filipino Experimental Variant
The Filipino model card references a parallel dataset exploring localized sign variants. Sample coverage is currently limited; feedback helps identify misclassifications and guide dataset balancing.
Inference Performance
- Runs fully client-side (no network latency).
- WebGL acceleration where available; CPU fallback supported (see the backend sketch after this list).
- Adaptive frame sampling prevents UI blocking on slower devices.
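A sketch of the backend selection described above, preferring WebGL and falling back to the CPU backend:

```ts
import * as tf from "@tensorflow/tfjs";

// Try the WebGL backend first; fall back to plain CPU if it cannot be initialized.
async function initBackend(): Promise<string> {
  const webglOk = await tf.setBackend("webgl").catch(() => false);
  if (!webglOk) {
    await tf.setBackend("cpu");
  }
  await tf.ready();
  return tf.getBackend(); // "webgl" or "cpu"
}
```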
Privacy & Data
The app does not upload your camera frames or detected landmarks. All inference happens locally in the browser. No persistent personal data is stored from public pages.
- No cloud inference API calls.
- No tracking cookies beyond default session (if authenticated areas are used).
- Model files are static assets served over HTTPS.
Limitations
- Focused on single-hand finger spelling; two-hand and motion signs not fully supported yet.
- Lighting variation and extreme camera angles reduce accuracy.
- Not a substitute for formal instruction or standardized certification.
- Localized (Filipino) experimental model has sparse training samples.
Planned Enhancements
- Motion / dynamic gesture model for select phrases.
- Improved per-letter confidence visualization and error correction suggestions.
- Offline caching (Service Worker) for faster cold starts.
- Per-session adaptive thresholding to personalize sensitivity.
Give Feedback
Spot a misclassified letter or want to contribute samples? Open an issue or send feedback via a future integrated form. Your input helps expand coverage and fairness.