VoxSight
Overview
VoxSight is a Chrome extension that turns web browsing into a voice-driven experience. Speak a natural-language command such as "Click the search button" or "Describe this page", and VoxSight interprets your intent, analyzes the page visually, and carries out the matching actions.
How it works:
1. Open the VoxSight side panel (Alt+V)
2. Hold the mic button or press Space to speak
3. VoxSight captures a screenshot and sends it, together with your command, to Gemini's multimodal vision model, which decides which actions to take
4. Actions are executed directly on the page with visual highlighting
5. Results are verified with a follow-up screenshot
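The steps above amount to a capture, plan, act, verify loop. The TypeScript sketch below is illustrative only: `PlannedAction`, `Deps`, and `runCommand` are hypothetical names, not VoxSight's actual code, and the dependencies are injected so the flow can be read (and tested) without a browser.

```typescript
// Hypothetical shapes; VoxSight's real types are not published in this listing.
interface PlannedAction {
  kind: "click" | "type" | "scroll" | "describe";
  x?: number;
  y?: number;
  text?: string;
}

// Injected dependencies. In an extension, captureScreenshot might wrap
// chrome.tabs.captureVisibleTab and planActions might call the backend;
// both are assumptions here.
interface Deps {
  captureScreenshot(): Promise<string>;
  planActions(command: string, screenshot: string): Promise<PlannedAction[]>;
  execute(action: PlannedAction): Promise<void>;
  verify(command: string, before: string, after: string): Promise<boolean>;
}

async function runCommand(command: string, deps: Deps): Promise<boolean> {
  // Step 3: capture the page and ask the model what to do.
  const before = await deps.captureScreenshot();
  const actions = await deps.planActions(command, before);

  // Step 4: run each planned action in order.
  for (const action of actions) {
    await deps.execute(action);
  }

  // Step 5: take a follow-up screenshot and check the result.
  const after = await deps.captureScreenshot();
  return deps.verify(command, before, after);
}
```

Keeping the browser and model calls behind an interface means the ordering of the loop can be checked with simple fakes.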
Key Features:
- Voice commands in Chinese and English with automatic language detection
- Works on any website -- no site-specific setup needed
- High-risk action confirmation for safety (submit, pay, delete)
- Visual highlight overlay showing exactly where actions will occur
- Continuous conversation with multi-turn context
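As a rough illustration of the high-risk confirmation feature, an extension could check the label of the target element before acting. The sketch below is an assumption, not VoxSight's implementation: the keyword list is taken only from the examples in this listing (submit, pay, delete), and `ActionRequest` and `needsConfirmation` are hypothetical names.

```typescript
// Hypothetical keyword pattern based on the listing's examples.
// \b keeps "pay" from matching inside unrelated words.
const HIGH_RISK_PATTERN = /\b(submit|pay|delete)\b/i;

interface ActionRequest {
  kind: "click" | "type" | "scroll";
  targetLabel: string; // visible label of the element the action targets
}

function needsConfirmation(action: ActionRequest): boolean {
  // Scrolling does not change page state, so it never needs confirmation.
  if (action.kind === "scroll") return false;
  return HIGH_RISK_PATTERN.test(action.targetLabel);
}
```

A real implementation would likely need a broader signal than label text, since a destructive button is not always labeled with one of these words.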
Accessibility:
- WCAG 2.1 AA compliant
- High contrast mode for low vision users
- Adjustable font sizes (normal / large / extra-large)
- Full keyboard navigation (Space to speak, Escape to cancel, Alt+D to describe page)
- Bilingual support (Chinese / English / Auto-detect)
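The Auto-detect option could work in several ways; the listing does not say which. One simple heuristic, shown here purely as an assumption, compares the number of Chinese characters in a transcript with the number of Latin-script words. `detectLanguage` is a hypothetical helper.

```typescript
type Lang = "zh" | "en";

function detectLanguage(transcript: string): Lang {
  // Count characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
  const hanCount = (transcript.match(/[\u4e00-\u9fff]/g) || []).length;
  // Count runs of Latin letters as words.
  const latinWords = (transcript.match(/[A-Za-z]+/g) || []).length;
  // Ties and empty input fall back to English.
  return hanCount > latinWords ? "zh" : "en";
}
```

Comparing characters against words, rather than characters against characters, keeps a mixed command such as "点击 search 按钮" classified as Chinese.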
Technical Details:
- Built with Chrome Manifest V3
- Powered by Gemini Live API with bidirectional streaming
- Screenshot-based analysis that works on any website
- Backend hosted on Google Cloud Run with WebSocket streaming
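The wire format between the extension and the Cloud Run backend is not documented in this listing. As a sketch under that caveat, a WebSocket connection of this kind often carries small JSON envelopes tagged by type. Every message shape below, along with `encode` and `decode`, is hypothetical.

```typescript
// Hypothetical envelopes; the actual protocol is not described in the listing.
type ClientMessage =
  | { type: "command"; text: string; screenshot: string } // screenshot as base64
  | { type: "cancel" };

type ServerMessage =
  | { type: "action"; kind: string; x: number; y: number }
  | { type: "speech"; text: string }
  | { type: "done" };

function encode(msg: ClientMessage): string {
  return JSON.stringify(msg);
}

// Parse and validate an incoming frame; returns null for anything malformed.
function decode(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, kind, x, y, text } = data as Record<string, unknown>;
  switch (type) {
    case "action":
      if (typeof kind === "string" && typeof x === "number" && typeof y === "number") {
        return { type: "action", kind, x, y };
      }
      return null;
    case "speech":
      return typeof text === "string" ? { type: "speech", text } : null;
    case "done":
      return { type: "done" };
    default:
      return null;
  }
}
```

With a standard browser `WebSocket`, the client would pass `encode(...)` to `send` and run `decode` on each incoming message before acting on it.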
Privacy:
- No browsing history collected
- No passwords or personal data stored
- Screenshots are processed in memory only, never saved to disk
- Voice recognition runs locally in your browser via Web Speech API