BridgeVoice

Privacy-first voice dictation for builders. On-device Whisper transcription, universal text injection, and zero cloud dependency.

BridgeVoice is a privacy-first desktop voice dictation app. It transcribes your speech on-device using Whisper AI, then injects the text directly into whatever app you're focused on — your editor, terminal, browser, or anything else. Audio never leaves your machine.

Key Features

  • On-device transcription — Whisper AI runs locally. Your audio stays on your machine.
  • Universal text injection — Transcribed text is pasted into the currently focused app. Works with every desktop application.
  • Sub-500ms latency — From the moment you stop speaking to text appearing on screen.
  • Offline support — No internet connection required for local transcription.
  • Cloud transcription — Optional Groq-powered transcription for 99+ languages (requires BridgeMind Pro).
  • Push-to-Talk and Toggle modes — Hold a key to record, or press once to start and again to stop.
  • Custom dictionary — Automatic text replacements for technical terms (e.g., "web hook" → "WebHook").
  • Transcription history — Every transcription is saved locally with timestamps, word counts, and duration.

Installation

macOS

Download BridgeVoice from bridgemind.ai. The app is code-signed and notarized by Apple.

  • Apple Silicon (ARM64) — Metal GPU acceleration for fast transcription
  • Intel (x86_64) — CPU transcription

Windows

Download the NSIS installer from bridgemind.ai. The installer is code-signed with Azure Trusted Signing.

Linux (Experimental)

BridgeVoice is available for Linux as AppImage and .deb packages.

Getting Started

  1. Download and install BridgeVoice from bridgemind.ai.
  2. Grant microphone permissions when prompted.
  3. Download a Whisper model — Open Settings and choose a model size. Start with "Base" for a balance of speed and accuracy.
  4. Set your hotkey — Configure a Push-to-Talk key in Settings (e.g., Right Option).
  5. Start dictating — Hold your hotkey, speak, and release. Text appears in your focused app.

Transcription Modes

Local (Whisper)

On-device transcription powered by whisper.cpp. Audio is processed entirely on your machine.

Model Sizes

| Model | Size | Speed | Accuracy | Best For |
| --- | --- | --- | --- | --- |
| Tiny | 75 MB | Fastest | Basic | Quick notes, commands |
| Base | 142 MB | Fast | Good | General dictation |
| Small | 466 MB | Moderate | Better | Longer dictation |
| Medium | 1.5 GB | Slower | Great | Detailed transcription |
| Large-v3 | 3.1 GB | Slowest | Best | Maximum accuracy |
| Distil-Large-v3 | ~1.5 GB | Fast | Great | Best speed-to-accuracy ratio |

On Apple Silicon Macs, BridgeVoice uses Metal GPU acceleration, which makes transcription roughly 10x faster than CPU-only processing.

Anti-Hallucination

BridgeVoice applies several safeguards to keep Whisper from generating false text (hallucinating) during silence:

  • Silence detection before processing
  • Entropy threshold filtering
  • Non-speech token suppression
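The first safeguard, silence detection, can be illustrated with a simple energy gate. This is a minimal sketch and not BridgeVoice's actual implementation: the function names and the RMS threshold of 0.01 are assumptions chosen for illustration, and samples are assumed to be normalized to the range [-1, 1].

```typescript
// Hypothetical threshold: clips with an RMS level below this count as silence.
const SILENCE_RMS_THRESHOLD = 0.01;

// Root-mean-square level of PCM samples normalized to [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((sum, s) => sum + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Returns true when the clip is loud enough to be worth transcribing.
function hasSpeechEnergy(
  samples: Float32Array,
  threshold: number = SILENCE_RMS_THRESHOLD
): boolean {
  return rmsLevel(samples) >= threshold;
}
```

Skipping the model entirely for silent clips is the cheapest of the three safeguards, because Whisper never sees the audio that would otherwise trigger a hallucination.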

Cloud (Groq)

Optional cloud transcription via the BridgeMind API. Requires a BridgeMind Pro subscription.

  • 99+ languages with automatic language detection
  • Audio encoded as WAV (16kHz mono) and uploaded securely
  • Powered by Groq's Whisper Large-v3-Turbo

Switch between Local and Cloud modes in the BridgeVoice dashboard.
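To make the cloud upload format concrete, the sketch below packs samples into a 16 kHz mono WAV file. The sample rate and channel count come from the description above. The 16-bit PCM depth and the `encodeWav` name are assumptions, since the bit depth is not stated.

```typescript
const SAMPLE_RATE_HZ = 16000; // stated: 16kHz
const CHANNELS = 1;           // stated: mono
const BITS_PER_SAMPLE = 16;   // assumption: bit depth is not documented

// Encodes samples normalized to [-1, 1] as a PCM WAV file (44-byte header + data).
function encodeWav(samples: Float32Array): Uint8Array {
  const bytesPerSample = BITS_PER_SAMPLE / 8;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF header; all multi-byte fields are little-endian.
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);           // size of the fmt chunk
  view.setUint16(20, 1, true);            // audio format 1 = PCM
  view.setUint16(22, CHANNELS, true);
  view.setUint32(24, SAMPLE_RATE_HZ, true);
  view.setUint32(28, SAMPLE_RATE_HZ * CHANNELS * bytesPerSample, true); // byte rate
  view.setUint16(32, CHANNELS * bytesPerSample, true);                  // block align
  view.setUint16(34, BITS_PER_SAMPLE, true);
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample, then scale it to a signed 16-bit integer.
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 32767), true);
  });

  return new Uint8Array(buffer);
}
```

At these settings one second of audio is 32,000 bytes plus the header, which keeps uploads small for short dictation bursts.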

Recording Modes

Push-to-Talk

Hold your configured hotkey to record. Release to stop recording and trigger transcription. This is the default mode and works well for short dictation bursts.

Toggle Recording

Press your hotkey once to start recording and again to stop. This mode suits longer dictation sessions where holding a key is uncomfortable.

Configure your preferred mode and hotkey in Settings → Recording.
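The difference between the two modes comes down to how hotkey events map to the recording state. The following is a minimal sketch of that mapping. The type and function names are hypothetical, and it assumes that auto-repeated key-down events from a held key are filtered out before they reach this function.

```typescript
type RecordingMode = "push-to-talk" | "toggle";
type HotkeyEvent = "down" | "up";

// Returns whether recording should be active after the given hotkey event.
function nextRecordingState(
  mode: RecordingMode,
  recording: boolean,
  event: HotkeyEvent
): boolean {
  if (mode === "push-to-talk") {
    // Record exactly while the key is held: down starts, up stops.
    return event === "down";
  }
  // Toggle: each key press flips the state, and key release is ignored.
  return event === "down" ? !recording : recording;
}
```

In Push-to-Talk the transition from recording to not recording (key release) is what triggers transcription. In Toggle mode the same transition happens on the second key press.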

Text Injection

After transcription, BridgeVoice injects the text into your currently focused application using a clipboard-and-paste method:

  1. Transcribed text is copied to the system clipboard
  2. A keyboard shortcut (Cmd+V on macOS, Ctrl+V on Windows) is simulated
  3. Text appears in the focused application

This approach works universally across all desktop apps — editors, terminals, browsers, chat apps, and more. Full Unicode support is included.
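The clipboard-and-paste sequence can be sketched as two calls against a small abstraction over the operating system. The `InjectionBackend` interface and every name below are hypothetical and exist only to show the order of operations. They are not BridgeVoice's real API, and the real app performs these steps in its Rust backend.

```typescript
type Platform = "macos" | "windows";
type PasteModifier = "Cmd" | "Ctrl";

// Hypothetical abstraction over the OS-level clipboard and key simulation.
interface InjectionBackend {
  setClipboard(text: string): void;
  pressShortcut(modifier: PasteModifier, key: string): void;
}

// Cmd+V on macOS, Ctrl+V on Windows.
function pasteModifier(platform: Platform): PasteModifier {
  return platform === "macos" ? "Cmd" : "Ctrl";
}

function injectText(text: string, platform: Platform, backend: InjectionBackend): void {
  backend.setClipboard(text);                          // step 1: copy to clipboard
  backend.pressShortcut(pasteModifier(platform), "V"); // step 2: simulate paste
  // Step 3 happens in the focused app, which handles the paste itself.
}

// In-memory backend that records calls, useful for checking the sequence.
class RecordingBackend implements InjectionBackend {
  calls: string[] = [];
  setClipboard(text: string): void {
    this.calls.push(`clipboard:${text}`);
  }
  pressShortcut(modifier: PasteModifier, key: string): void {
    this.calls.push(`keys:${modifier}+${key}`);
  }
}
```

Because the text travels through the clipboard as a whole string rather than as individual key presses, non-ASCII characters arrive intact, which is one way to read the Unicode support noted above.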

Widget

BridgeVoice includes a compact always-on-top widget that floats over your other windows:

| State | Appearance |
| --- | --- |
| Idle | Small pill with BridgeVoice logo |
| Listening | Expanded with real-time audio visualization (7 frequency bands) |
| Processing | Loading indicator while transcription runs |

Double-click the widget to toggle recording. Drag it anywhere on screen.

Custom Dictionary

Create automatic text replacements for terms that Whisper frequently gets wrong:

| Spoken | Replaced With |
| --- | --- |
| "web hook" | "WebHook" |
| "next js" | "Next.js" |
| "typescript" | "TypeScript" |
| "bridge mind" | "BridgeMind" |

Add entries in Settings → Dictionary. You can also quick-add from the transcription history.
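A dictionary like this can be applied as a post-processing pass over the transcribed text. The sketch below is one possible approach and not necessarily how BridgeVoice does it: it assumes matching is case-insensitive and limited to whole words, and the `DictionaryEntry` type and function names are illustrative.

```typescript
type DictionaryEntry = { spoken: string; replacement: string };

// Entries taken from the example table.
const exampleEntries: DictionaryEntry[] = [
  { spoken: "web hook", replacement: "WebHook" },
  { spoken: "next js", replacement: "Next.js" },
  { spoken: "typescript", replacement: "TypeScript" },
  { spoken: "bridge mind", replacement: "BridgeMind" },
];

// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function applyDictionary(text: string, entries: DictionaryEntry[]): string {
  // Longest phrases first, so a short entry cannot pre-empt a longer one.
  const ordered = entries.slice().sort((a, b) => b.spoken.length - a.spoken.length);
  let result = text;
  for (const { spoken, replacement } of ordered) {
    const pattern = new RegExp(`\\b${escapeRegExp(spoken)}\\b`, "gi");
    // A replacer function keeps "$" in the replacement from being interpreted.
    result = result.replace(pattern, () => replacement);
  }
  return result;
}
```

The word-boundary anchors keep an entry such as "typescript" from rewriting part of a longer word.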

Transcription History

Every transcription is saved locally with metadata:

  • Text — Full transcription content
  • Timestamp — When the transcription occurred
  • Word count — Number of words transcribed
  • Duration — How long the recording lasted
  • Source — Local (Whisper) or Cloud (Groq)
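The metadata fields above map naturally onto a record type. The sketch below is illustrative only: the field names, the ISO 8601 timestamp format, and the whitespace-based word count are assumptions, since the storage format is not specified.

```typescript
type TranscriptionSource = "local" | "cloud"; // Local (Whisper) or Cloud (Groq)

interface HistoryEntry {
  text: string;            // full transcription content
  timestamp: string;       // assumed ISO 8601 string
  wordCount: number;
  durationSeconds: number; // length of the recording
  source: TranscriptionSource;
}

// Counts whitespace-separated words; empty or blank text counts as zero.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function makeHistoryEntry(
  text: string,
  durationSeconds: number,
  source: TranscriptionSource,
  recordedAt: Date
): HistoryEntry {
  return {
    text,
    timestamp: recordedAt.toISOString(),
    wordCount: countWords(text),
    durationSeconds,
    source,
  };
}
```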

Statistics

The dashboard tracks your usage over time:

  • Total words transcribed
  • Total speaking time
  • Session count
  • Words per minute average
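These four figures can all be derived from per-session word counts and durations. The sketch below shows one way to compute them. It assumes the average is total words divided by total speaking time, rather than a mean of per-session rates, and all names are hypothetical.

```typescript
interface SessionSummary {
  wordCount: number;
  durationSeconds: number;
}

interface UsageStats {
  totalWords: number;
  totalSpeakingSeconds: number;
  sessionCount: number;
  averageWordsPerMinute: number;
}

function summarizeUsage(sessions: SessionSummary[]): UsageStats {
  const totalWords = sessions.reduce((sum, s) => sum + s.wordCount, 0);
  const totalSpeakingSeconds = sessions.reduce((sum, s) => sum + s.durationSeconds, 0);
  // Guard against division by zero when there is no recorded speech yet.
  const averageWordsPerMinute =
    totalSpeakingSeconds > 0 ? totalWords / (totalSpeakingSeconds / 60) : 0;
  return {
    totalWords,
    totalSpeakingSeconds,
    sessionCount: sessions.length,
    averageWordsPerMinute,
  };
}
```

For example, two one-minute sessions of 120 and 60 words give 180 words over 2 minutes, or 90 words per minute.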

Authentication

Sign in with your BridgeMind account for Pro features:

  1. Click Sign In in the BridgeVoice dashboard
  2. Your browser opens to the BridgeMind login page
  3. Authenticate with Google OAuth
  4. BridgeVoice receives the callback via deep link (bridgevoice://auth/callback)
  5. Tokens are encrypted locally with AES-GCM
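Step 4 relies on the app recognizing its own callback address and reading the values attached to it. The following sketch shows that check in isolation. The callback address is the one given above, while the query parameter names in the usage example (`code`, `state`) and the function name are assumptions, since the actual payload is not documented here.

```typescript
const AUTH_CALLBACK_BASE = "bridgevoice://auth/callback";

// Returns the decoded query parameters, or null if the link is not a
// well-formed auth callback.
function parseAuthCallback(link: string): { [key: string]: string } | null {
  const queryStart = link.indexOf("?");
  const base = queryStart === -1 ? link : link.slice(0, queryStart);
  if (base !== AUTH_CALLBACK_BASE) return null; // some other deep link

  const params: { [key: string]: string } = {};
  const query = queryStart === -1 ? "" : link.slice(queryStart + 1);
  try {
    for (const pair of query.split("&")) {
      if (pair === "") continue;
      const eq = pair.indexOf("=");
      const key = eq === -1 ? pair : pair.slice(0, eq);
      const value = eq === -1 ? "" : pair.slice(eq + 1);
      params[decodeURIComponent(key)] = decodeURIComponent(value);
    }
  } catch {
    return null; // malformed percent-encoding
  }
  return params;
}
```

A call such as `parseAuthCallback("bridgevoice://auth/callback?code=abc123")` would yield an object with a `code` property, while links to any other path are rejected.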

Subscription Tiers

| Feature | Free | Pro |
| --- | --- | --- |
| On-device transcription | Yes | Yes |
| All Whisper model sizes | Yes | Yes |
| Push-to-Talk / Toggle | Yes | Yes |
| Custom dictionary | Yes | Yes |
| Transcription history | Yes | Yes |
| Cloud transcription (Groq) | No | Yes |
| 99+ language support | No | Yes |
| AI text polish (coming soon) | No | Yes |
| Cross-device sync (coming soon) | No | Yes |
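Reading the table as "cloud-related features are Pro-only, everything else is in both tiers", a feature check could look like the sketch below. The feature keys are hypothetical identifiers invented for this example, and treating "coming soon" features as unavailable to everyone until release is an assumption.

```typescript
type Tier = "free" | "pro";

// Hypothetical feature keys; only the grouping is taken from the table.
const PRO_ONLY_FEATURES = [
  "cloud-transcription",
  "multi-language",
  "ai-text-polish",
  "cross-device-sync",
];

// Listed as "coming soon", so assumed not yet available in any tier.
const UNRELEASED_FEATURES = ["ai-text-polish", "cross-device-sync"];

function isFeatureAvailable(tier: Tier, feature: string): boolean {
  if (UNRELEASED_FEATURES.indexOf(feature) !== -1) return false;
  if (PRO_ONLY_FEATURES.indexOf(feature) !== -1) return tier === "pro";
  return true; // all remaining features are included in both tiers
}
```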

Architecture

BridgeVoice is built with Tauri 2.0, with a Rust backend and a React frontend:

bridgevoice/
├── src-tauri/                # Rust backend
│   ├── src/
│   │   ├── main.rs           # App entry, plugin setup
│   │   ├── audio/capture.rs  # Persistent audio stream (cpal)
│   │   ├── transcription/
│   │   │   ├── whisper.rs    # Local Whisper (whisper-rs + Metal)
│   │   │   ├── groq.rs       # Cloud API client
│   │   │   └── models.rs     # Model download manager
│   │   ├── injection/
│   │   │   ├── macos.rs      # Clipboard + CGEvent
│   │   │   └── windows.rs    # Clipboard + SendInput
│   │   ├── auth/             # OAuth, encrypted token storage
│   │   └── commands/         # Tauri command handlers
├── src/                      # React frontend
│   ├── App.tsx               # Widget pill component
│   ├── Dashboard.tsx         # Main dashboard
│   ├── components/           # UI components
│   ├── store/                # Zustand state
│   └── api/                  # HTTP client
├── package.json
└── vite.config.ts

Persistent Audio Stream

BridgeVoice opens a single audio stream at startup and keeps it running. While you are not recording, the stream idles with near-zero overhead. Because the stream is already active when you press your hotkey, recording begins in under 10 ms with no audio glitch or pop.
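The idea of an always-open stream with a recording gate can be sketched as follows. This is a conceptual illustration in TypeScript, not the Rust and cpal code the app uses, and the `GatedCapture` class and its method names are hypothetical. The audio callback fires for every buffer, and starting a recording only flips a flag instead of opening a device.

```typescript
class GatedCapture {
  private recording = false;
  private chunks: Float32Array[] = [];

  // Called by the audio backend for every buffer, recording or not.
  onAudioChunk(chunk: Float32Array): void {
    if (!this.recording) return;     // idle: drop the buffer immediately
    this.chunks.push(chunk.slice()); // copy, since backends may reuse buffers
  }

  // Starting is just a flag flip, so there is no device start-up delay.
  start(): void {
    this.chunks = [];
    this.recording = true;
  }

  // Stops recording and returns everything captured since start().
  stop(): Float32Array {
    this.recording = false;
    const total = this.chunks.reduce((n, c) => n + c.length, 0);
    const samples = new Float32Array(total);
    let offset = 0;
    for (const chunk of this.chunks) {
      samples.set(chunk, offset);
      offset += chunk.length;
    }
    this.chunks = [];
    return samples;
  }
}
```

The trade-off of this design is that the microphone stays open for the lifetime of the app, in exchange for removing the start-up latency of opening it on each key press.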

Platform Specifics

| Feature | macOS | Windows | Linux |
| --- | --- | --- | --- |
| GPU Acceleration | Metal (Apple Silicon) | | |
| Text Injection | pbcopy + CGEvent Cmd+V | Clipboard + SendInput Ctrl+V | xdotool |
| Audio Input | Core Audio via cpal | WASAPI via cpal | ALSA via cpal |
| Code Signing | Developer ID + Notarization | Azure Trusted Signing | |

System Requirements

| Platform | Minimum |
| --- | --- |
| macOS | macOS 11 (Big Sur) or later |
| Windows | Windows 10 or later |
| Linux | Ubuntu 20.04+ or equivalent (experimental) |
| RAM | 4 GB minimum (8 GB recommended for Large models) |
| Disk | 200 MB + model size (75 MB – 3.1 GB) |
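The disk requirement is a simple sum of the app size and the models you download. The sketch below works it out from the sizes listed in the model table. It assumes decimal units (1 GB = 1000 MB), treats the approximate Distil-Large-v3 size as 1500 MB, and uses hypothetical names throughout.

```typescript
const APP_SIZE_MB = 200; // base requirement from the table

// Model sizes in MB, converted from the model table (1 GB taken as 1000 MB).
const MODEL_SIZE_MB: { [model: string]: number } = {
  "Tiny": 75,
  "Base": 142,
  "Small": 466,
  "Medium": 1500,
  "Large-v3": 3100,
  "Distil-Large-v3": 1500, // listed as approximately 1.5 GB
};

// Disk space needed for the app plus the given downloaded models.
function requiredDiskMb(models: string[]): number {
  let total = APP_SIZE_MB;
  for (const model of models) {
    const size = MODEL_SIZE_MB[model];
    if (size === undefined) throw new Error(`Unknown model: ${model}`);
    total += size;
  }
  return total;
}
```

With only the recommended Base model installed, this comes to 200 + 142 = 342 MB.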
