BridgeVoice
Privacy-first voice dictation for builders. On-device Whisper transcription, universal text injection, and zero cloud dependency.
BridgeVoice is a privacy-first desktop voice dictation app. It transcribes your speech on-device using Whisper AI, then injects the text directly into whatever app you're focused on — your editor, terminal, browser, or anything else. Audio never leaves your machine.
Key Features
- On-device transcription — Whisper AI runs locally. Your audio stays on your machine.
- Universal text injection — Transcribed text is pasted into the currently focused app. Works with every desktop application.
- Sub-500ms latency — From the moment you stop speaking to text appearing on screen.
- Offline support — No internet connection required for local transcription.
- Cloud transcription — Optional Groq-powered transcription for 99+ languages (requires BridgeMind Pro).
- Push-to-Talk and Toggle modes — Hold a key to record, or press once to start and again to stop.
- Custom dictionary — Automatic text replacements for technical terms (e.g., "web hook" → "WebHook").
- Transcription history — Every transcription is saved locally with timestamps, word counts, and duration.
Installation
macOS
Download BridgeVoice from bridgemind.ai. The app is code-signed and notarized by Apple.
- Apple Silicon (ARM64) — Metal GPU acceleration for fast transcription
- Intel (x86_64) — CPU transcription
Windows
Download the NSIS installer from bridgemind.ai. Code-signed with Azure Trusted Signing.
Linux (Experimental)
Available as AppImage and .deb packages.
Getting Started
- Download and install BridgeVoice from bridgemind.ai.
- Grant microphone permissions when prompted.
- Download a Whisper model — Open Settings and choose a model size. Start with "Base" for a balance of speed and accuracy.
- Set your hotkey — Configure a Push-to-Talk key in Settings (e.g., Right Option).
- Start dictating — Hold your hotkey, speak, and release. Text appears in your focused app.
Transcription Modes
Local (Whisper)
On-device transcription powered by whisper.cpp. Audio is processed entirely on your machine.
Model Sizes
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Tiny | 75 MB | Fastest | Basic | Quick notes, commands |
| Base | 142 MB | Fast | Good | General dictation |
| Small | 466 MB | Moderate | Better | Longer dictation |
| Medium | 1.5 GB | Slower | Great | Detailed transcription |
| Large-v3 | 3.1 GB | Slowest | Best | Maximum accuracy |
| Distil-Large-v3 | ~1.5 GB | Fast | Great | Best speed-to-accuracy ratio |
On Apple Silicon Macs, BridgeVoice uses Metal GPU acceleration for roughly 10x faster transcription compared to CPU-only.
Anti-Hallucination
BridgeVoice includes tuning to prevent Whisper from generating false text during silence:
- Silence detection before processing
- Entropy threshold filtering
- Non-speech token suppression
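The first of these checks, silence detection, can be sketched as a simple energy gate. This is a minimal illustration, not BridgeVoice's actual implementation: the function names and the -45 dBFS threshold are assumptions, and samples are assumed to be floats in the range [-1, 1].

```typescript
// Hypothetical helpers; names and the default threshold are illustrative assumptions.

/** Root-mean-square level of a buffer in dBFS (0 dBFS = full scale). */
function rmsDbfs(samples: Float32Array): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms === 0 ? -Infinity : 20 * Math.log10(rms);
}

/** True when the buffer is quiet enough that transcription can be skipped. */
function isSilent(samples: Float32Array, thresholdDbfs: number = -45): boolean {
  return rmsDbfs(samples) < thresholdDbfs;
}
```

Skipping silent buffers before they reach the model is cheap, and it removes the input on which Whisper is most likely to invent text.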
Cloud (Groq)
Optional cloud transcription via the BridgeMind API. Requires a BridgeMind Pro subscription.
- 99+ languages with automatic language detection
- Audio encoded as WAV (16kHz mono) and uploaded securely
- Powered by Groq's Whisper Large-v3-Turbo
Switch between Local and Cloud modes in the BridgeVoice dashboard.
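To illustrate the WAV encoding step, here is a minimal sketch that wraps float samples in a 44-byte RIFF/WAVE header. The source only states "16kHz mono"; the 16-bit PCM sample format and the function name `encodeWav16kMono` are assumptions made for this example.

```typescript
// Hypothetical encoder: 16-bit PCM is an assumption, the docs only specify 16 kHz mono.
function encodeWav16kMono(samples: Float32Array, sampleRate: number = 16000): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Convert floats in [-1, 1] to little-endian signed 16-bit integers.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(clamped * 32767), true);
  }
  return new Uint8Array(buffer);
}
```

At 16 kHz mono with 16-bit samples, one second of speech is about 32 KB, which keeps uploads small for typical dictation bursts.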
Recording Modes
Push-to-Talk
Hold your configured hotkey to record. Release to stop recording and trigger transcription. This is the default mode and works well for short dictation bursts.
Toggle Recording
Press your hotkey once to start recording, press again to stop. Better for longer dictation sessions where holding a key is uncomfortable.
Configure your preferred mode and hotkey in Settings → Recording.
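The difference between the two modes comes down to how hotkey events map to the recording state. The sketch below models that as a pure function; the type and function names are hypothetical, and it assumes OS key auto-repeat events have already been filtered out.

```typescript
// Hypothetical model of the two modes; names are illustrative assumptions.
type RecordingMode = "push-to-talk" | "toggle";
type HotkeyEvent = "down" | "up";

/** Returns the recording state that follows a hotkey event. */
function nextRecordingState(
  mode: RecordingMode,
  isRecording: boolean,
  event: HotkeyEvent
): boolean {
  if (mode === "push-to-talk") {
    // Record exactly while the key is held.
    return event === "down";
  }
  // Toggle: each press flips the state, and releases are ignored.
  return event === "down" ? !isRecording : isRecording;
}
```

In either mode, a transition from recording to not recording is the point at which transcription would be triggered.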
Text Injection
After transcription, BridgeVoice injects the text into your currently focused application using a clipboard-and-paste method:
- Transcribed text is copied to the system clipboard
- A keyboard shortcut (Cmd+V on macOS, Ctrl+V on Windows) is simulated
- Text appears in the focused application
This approach works universally across all desktop apps — editors, terminals, browsers, chat apps, and more. Full Unicode support is included.
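The clipboard-and-paste sequence can be sketched against an abstract backend. Everything here is hypothetical: `InjectionBackend`, `injectText`, and `RecordingBackend` are illustrative names, and the real OS calls (CGEvent on macOS, SendInput on Windows) are replaced with an in-memory stand-in so the ordering is easy to see.

```typescript
// Hypothetical abstraction over the OS calls; not BridgeVoice's real API.
type Platform = "macos" | "windows";
type PasteModifier = "Cmd" | "Ctrl";

interface InjectionBackend {
  writeClipboard(text: string): void;
  sendPasteShortcut(modifier: PasteModifier): void;
}

function pasteModifier(platform: Platform): PasteModifier {
  return platform === "macos" ? "Cmd" : "Ctrl";
}

function injectText(text: string, platform: Platform, backend: InjectionBackend): void {
  if (text.length === 0) return; // nothing to paste
  backend.writeClipboard(text); // step 1: copy the transcription
  backend.sendPasteShortcut(pasteModifier(platform)); // step 2: simulate paste
}

// In-memory backend that records calls, used here only to show the ordering.
class RecordingBackend implements InjectionBackend {
  calls: string[] = [];
  writeClipboard(text: string): void {
    this.calls.push(`clipboard:${text}`);
  }
  sendPasteShortcut(modifier: PasteModifier): void {
    this.calls.push(`paste:${modifier}+V`);
  }
}
```

Because the text travels through the clipboard as a whole string, non-ASCII characters need no per-key handling, which is consistent with the Unicode support described above.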
Widget
BridgeVoice includes a compact always-on-top widget that floats over your other windows:
| State | Appearance |
|---|---|
| Idle | Small pill with BridgeVoice logo |
| Listening | Expanded with real-time audio visualization (7 frequency bands) |
| Processing | Loading indicator while transcription runs |
Double-click the widget to toggle recording. Drag it anywhere on screen.
Custom Dictionary
Create automatic text replacements for terms that Whisper frequently gets wrong:
| Spoken | Replaced With |
|---|---|
| "web hook" | "WebHook" |
| "next js" | "Next.js" |
| "typescript" | "TypeScript" |
| "bridge mind" | "BridgeMind" |
Add entries in Settings → Dictionary. You can also quick-add from the transcription history.
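As a rough illustration of how such replacements could be applied, the sketch below does case-insensitive, whole-phrase substitution. The matching rules (word boundaries, flexible whitespace, longest phrase first) and the names `DictionaryEntry` and `applyDictionary` are assumptions for this example, not documented BridgeVoice behavior.

```typescript
// Hypothetical replacement pass; matching rules are illustrative assumptions.
type DictionaryEntry = { spoken: string; replacement: string };

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function applyDictionary(text: string, entries: DictionaryEntry[]): string {
  // Apply longer phrases first so they win over shorter overlapping ones.
  const ordered = [...entries].sort((a, b) => b.spoken.length - a.spoken.length);
  let result = text;
  for (const entry of ordered) {
    const words = entry.spoken.trim().split(/\s+/).map(escapeRegExp);
    // \b keeps "typescript" from matching inside "typescripts".
    const pattern = new RegExp(`\\b${words.join("\\s+")}\\b`, "gi");
    // A function replacer avoids special handling of "$" in the replacement.
    result = result.replace(pattern, () => entry.replacement);
  }
  return result;
}

const exampleEntries: DictionaryEntry[] = [
  { spoken: "web hook", replacement: "WebHook" },
  { spoken: "next js", replacement: "Next.js" },
  { spoken: "typescript", replacement: "TypeScript" },
  { spoken: "bridge mind", replacement: "BridgeMind" },
];
```

With these entries, `applyDictionary("add a web hook in next js", exampleEntries)` yields `"add a WebHook in Next.js"`.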
Transcription History
Every transcription is saved locally with metadata:
- Text — Full transcription content
- Timestamp — When the transcription occurred
- Word count — Number of words transcribed
- Duration — How long the recording lasted
- Source — Local (Whisper) or Cloud (Groq)
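One way to picture a history record is as a small typed object built right after transcription. The field names, the ISO 8601 timestamp format, and the whitespace-based word count below are assumptions for illustration; the actual storage schema is not documented here.

```typescript
// Hypothetical record shape; field names and formats are assumptions.
type TranscriptionSource = "local" | "cloud";

interface HistoryEntry {
  text: string;
  timestamp: string; // ISO 8601, UTC
  wordCount: number;
  durationSeconds: number;
  source: TranscriptionSource;
}

/** Counts whitespace-separated words; an empty string has zero words. */
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function makeHistoryEntry(
  text: string,
  durationSeconds: number,
  source: TranscriptionSource,
  recordedAt: Date
): HistoryEntry {
  return {
    text,
    timestamp: recordedAt.toISOString(),
    wordCount: countWords(text),
    durationSeconds,
    source,
  };
}
```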
Statistics
The dashboard tracks your usage over time:
- Total words transcribed
- Total speaking time
- Session count
- Words per minute average
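These figures can all be derived from per-transcription word counts and durations. The sketch below shows one plausible calculation; it assumes each saved transcription counts as one session and that the average is total words divided by total speaking minutes, neither of which is confirmed by the source.

```typescript
// Hypothetical aggregation; the definition of "session" is an assumption.
interface SessionSummary {
  wordCount: number;
  durationSeconds: number;
}

interface UsageStats {
  totalWords: number;
  totalSpeakingSeconds: number;
  sessionCount: number;
  averageWordsPerMinute: number;
}

function computeUsageStats(sessions: SessionSummary[]): UsageStats {
  let totalWords = 0;
  let totalSpeakingSeconds = 0;
  for (const session of sessions) {
    totalWords += session.wordCount;
    totalSpeakingSeconds += session.durationSeconds;
  }
  const minutes = totalSpeakingSeconds / 60;
  return {
    totalWords,
    totalSpeakingSeconds,
    sessionCount: sessions.length,
    // Guard against division by zero when nothing has been recorded yet.
    averageWordsPerMinute: minutes > 0 ? totalWords / minutes : 0,
  };
}
```

For example, 120 words in 60 seconds plus 30 words in 30 seconds gives 150 words over 1.5 minutes, an average of 100 words per minute.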
Authentication
Sign in with your BridgeMind account for Pro features:
- Click Sign In in the BridgeVoice dashboard
- Your browser opens to the BridgeMind login page
- Authenticate with Google OAuth
- BridgeVoice receives the callback via deep link (bridgevoice://auth/callback)
- Tokens are encrypted locally with AES-GCM
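Handling the deep-link callback amounts to validating the URL and pulling out its query parameters. The sketch below uses the standard `URL` class; the parameter names `code` and `state` are assumptions based on a typical OAuth authorization-code flow, and `parseAuthCallback` is a hypothetical helper.

```typescript
// Hypothetical parser; the "code" and "state" parameter names are assumptions.
interface AuthCallback {
  code: string;
  state: string | null;
}

function parseAuthCallback(deepLink: string): AuthCallback | null {
  let url: URL;
  try {
    url = new URL(deepLink);
  } catch {
    return null; // not a parseable URL
  }
  if (url.protocol !== "bridgevoice:") return null;

  // URL parsers differ on whether "auth" in a custom scheme is the host or
  // part of the path, so normalize both forms to "auth/callback".
  const route = (url.hostname + url.pathname).replace(/^\/+/, "");
  if (route !== "auth/callback") return null;

  const code = url.searchParams.get("code");
  if (code === null || code === "") return null;
  return { code, state: url.searchParams.get("state") };
}
```

Rejecting anything other than the exact scheme and route matters because any application on the machine can attempt to open a `bridgevoice://` link.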
Subscription Tiers
| Feature | Free | Pro |
|---|---|---|
| On-device transcription | Yes | Yes |
| All Whisper model sizes | Yes | Yes |
| Push-to-Talk / Toggle | Yes | Yes |
| Custom dictionary | Yes | Yes |
| Transcription history | Yes | Yes |
| Cloud transcription (Groq) | — | Yes |
| 99+ language support | — | Yes |
| AI text polish (coming soon) | — | Yes |
| Cross-device sync (coming soon) | — | Yes |
Architecture
BridgeVoice is built with Tauri 2.0:
bridgevoice/
├── src-tauri/ # Rust backend
│ ├── src/
│ │ ├── main.rs # App entry, plugin setup
│ │ ├── audio/capture.rs # Persistent audio stream (cpal)
│ │ ├── transcription/
│ │ │ ├── whisper.rs # Local Whisper (whisper-rs + Metal)
│ │ │ ├── groq.rs # Cloud API client
│ │ │ └── models.rs # Model download manager
│ │ ├── injection/
│ │ │ ├── macos.rs # Clipboard + CGEvent
│ │ │ └── windows.rs # Clipboard + SendInput
│ │ ├── auth/ # OAuth, encrypted token storage
│ │ └── commands/ # Tauri command handlers
├── src/ # React frontend
│ ├── App.tsx # Widget pill component
│ ├── Dashboard.tsx # Main dashboard
│ ├── components/ # UI components
│ ├── store/ # Zustand state
│ └── api/ # HTTP client
├── package.json
└── vite.config.ts
Persistent Audio Stream
BridgeVoice initializes a single audio stream at startup that runs continuously. When you're not recording, the stream idles with near-zero overhead. When you press your hotkey, recording begins instantly (under 10ms) with no audio glitch or pop — because the stream is already active.
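The idea can be modeled as a callback that always runs but only keeps samples while a flag is set. This is a conceptual sketch in TypeScript, not the Rust/cpal code BridgeVoice actually uses; `PersistentCapture` and its methods are hypothetical names.

```typescript
// Conceptual model only; the real capture path is in Rust (cpal).
class PersistentCapture {
  private recording = false;
  private chunks: Float32Array[] = [];

  /** Called by the audio backend for every frame, whether recording or not. */
  onAudioFrame(frame: Float32Array): void {
    if (!this.recording) return; // idle: drop the frame with almost no work
    this.chunks.push(frame.slice()); // copy, since backends often reuse buffers
  }

  /** Starting is just a flag flip, because the stream is already open. */
  start(): void {
    this.chunks = [];
    this.recording = true;
  }

  /** Stops recording and returns everything captured since start(). */
  stop(): Float32Array {
    this.recording = false;
    const total = this.chunks.reduce((sum, chunk) => sum + chunk.length, 0);
    const out = new Float32Array(total);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    this.chunks = [];
    return out;
  }
}
```

The design trades a small constant cost while idle for avoiding device initialization on every hotkey press, which is where startup delay and audible pops typically come from.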
Platform Specifics
| Feature | macOS | Windows | Linux |
|---|---|---|---|
| GPU Acceleration | Metal (Apple Silicon) | — | — |
| Text Injection | pbcopy + CGEvent Cmd+V | Clipboard + SendInput Ctrl+V | xdotool |
| Audio Input | Core Audio via cpal | WASAPI via cpal | ALSA via cpal |
| Code Signing | Developer ID + Notarization | Azure Trusted Signing | — |
System Requirements
| Platform | Minimum |
|---|---|
| macOS | macOS 11 (Big Sur) or later |
| Windows | Windows 10 or later |
| Linux | Ubuntu 20.04+ or equivalent (experimental) |
| RAM | 4 GB minimum (8 GB recommended for Large models) |
| Disk | 200 MB + model size (75 MB – 3.1 GB) |
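The disk requirement above is simple arithmetic over the model sizes listed earlier. The sketch below works it through; it assumes 1 GB = 1000 MB, treats the approximate Distil-Large-v3 size as 1500 MB, and uses hypothetical names throughout.

```typescript
// Sizes are taken from the Model Sizes table; GB values converted at
// 1 GB = 1000 MB, and Distil-Large-v3 is approximate (assumptions).
const MODEL_SIZE_MB = {
  "Tiny": 75,
  "Base": 142,
  "Small": 466,
  "Medium": 1500,
  "Large-v3": 3100,
  "Distil-Large-v3": 1500,
} as const;

type ModelName = keyof typeof MODEL_SIZE_MB;

const APP_SIZE_MB = 200;

/** Disk space needed for the app plus every downloaded model, in MB. */
function requiredDiskMb(models: ModelName[]): number {
  let total = APP_SIZE_MB;
  for (const model of models) {
    total += MODEL_SIZE_MB[model];
  }
  return total;
}
```

With only the recommended Base model installed this comes to 342 MB; keeping both Tiny and Large-v3 on disk comes to 3375 MB.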