A local-only web application for analyzing speaking practice recordings. It transcribes speech while preserving fillers (um, uh, like...) and detects pauses, generating a detailed report suitable for AI-assisted feedback.
- 🎤 Browser Recording or File Upload (mp3, wav, m4a, webm)
- 🗣️ Local STT using faster-whisper (preserves fillers, repetitions)
- ⏸️ Pause Detection using Silero VAD (configurable threshold 0.4-1.2s)
- 📋 Timeline Transcript with `[PAUSE X.XXXs]` markers
- ⚙️ Whisper Model Selection (Tiny, Base, Small)
- 📝 One-click Copy for easy sharing
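
Under the hood, transcription and pause detection can be combined roughly as below. This is a minimal sketch using the `faster-whisper` and `silero-vad` Python packages, not the app's exact code; parameter values are illustrative:

```python
from faster_whisper import WhisperModel
from silero_vad import load_silero_vad, read_audio, get_speech_timestamps

# Transcribe locally; "base" balances speed and accuracy (Tiny/Base/Small are selectable)
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("sample.wav", word_timestamps=True)

# Detect speech regions; gaps between consecutive regions are candidate pauses
vad = load_silero_vad()
audio = read_audio("sample.wav")
speech = get_speech_timestamps(audio, vad, return_seconds=True)

threshold = 0.8  # illustrative; the app exposes 0.4-1.2s
pauses = [
    (prev["end"], cur["start"])
    for prev, cur in zip(speech, speech[1:])
    if cur["start"] - prev["end"] >= threshold
]
```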
- Python 3.9+ (3.9, 3.10, 3.11, or 3.12)

  ```bash
  # If using pyenv
  pyenv install 3.11  # or 3.9, 3.10, 3.12
  pyenv shell 3.11
  ```

- Node.js (18+ recommended)

  ```bash
  brew install node  # or use nvm
  ```

- FFmpeg

  ```bash
  brew install ffmpeg
  ```
⚠️ Prerequisites Required: Make sure you've installed Python 3.9+, Node.js, and FFmpeg (see Prerequisites above) before running these commands.
Fresh clone? Run this one command after installing prerequisites:
```bash
git clone https://github.com/evanshlee/speaking-practice.git
cd speaking-practice
./setup-and-run.sh
```

This will:
- Create Python virtual environment
- Install all backend dependencies
- Install all frontend dependencies
- Start both backend and frontend servers
Then open http://localhost:5173 in your browser! 🚀
Dependencies already installed? Just run:
```bash
./start.sh
```

This starts both servers instantly.
If you prefer to install dependencies manually instead of using setup-and-run.sh:
```bash
git clone https://github.com/evanshlee/speaking-practice.git
cd speaking-practice

# Create virtual environment with Python 3.9+
python3 -m venv venv

# Activate and install dependencies
source venv/bin/activate
pip install -r server/requirements.txt

cd client
npm install
cd ..
```

After manual installation, you can use `./start.sh` for a quick start, or run the servers manually:
Terminal 1 - Backend:
```bash
source venv/bin/activate
cd server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

Terminal 2 - Frontend:

```bash
cd client
npm run dev
```

Then open http://localhost:5173 in your browser.
- Select Record or Upload mode
- (Optional) Choose Whisper model size:
- Tiny: Fastest, lower accuracy
- Base: Balanced (recommended)
- Small: Best accuracy, slower
- (Optional) Adjust pause threshold (0.4-1.2s)
- Record/upload your speaking sample
- Click Transcribe
- Wait for processing (varies by audio length and model)
- Click Copy and paste into your preferred AI assistant for feedback
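
Prefer scripting to clicking? The backend can also be called directly. The route and field names below are assumptions for illustration; check `server/main.py` for the actual API:

```python
import requests

# Hypothetical endpoint and field names -- verify against server/main.py
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/transcribe",
        files={"file": f},
        data={"model_size": "base", "pause_threshold": "0.8"},
    )
print(resp.json())
```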
The generated report contains:
- A) SUMMARY: Date, duration (speech/silence), word count, WPM
- B) TIMELINE: Timestamped transcript with `[PAUSE X.XXXs]` markers
Example:
```
=== A) SUMMARY ===
Date: 2026-01-30
Duration: 62.5s (Speech: 55.2s, Silence: 7.3s)
Words: 142 (Approx. 154 WPM)

=== B) TIMELINE ===
[00:00.000] So, um, I think the main point here is...
[00:05.234] [PAUSE 1.523s]
[00:06.757] And basically, you know, we need to consider...
```
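
The timeline and WPM figure can be reproduced roughly like this. A simplified sketch reusing `segments`, `speech`, and `pauses` from the earlier snippet, not the app's actual code; note that WPM here is computed over speech time, consistent with the sample above:

```python
segments = list(segments)  # faster-whisper yields segments lazily; materialize once

def fmt(t: float) -> str:
    # Format seconds as [MM:SS.mmm] to match the report timestamps
    m, s = divmod(t, 60)
    return f"[{int(m):02d}:{s:06.3f}]"

# Interleave transcript lines and pause markers, then sort by start time
events = [(seg.start, f"{fmt(seg.start)} {seg.text.strip()}") for seg in segments]
events += [(start, f"{fmt(start)} [PAUSE {end - start:.3f}s]") for start, end in pauses]

speech_time = sum(s["end"] - s["start"] for s in speech)
words = sum(len(seg.text.split()) for seg in segments)
print(f"Words: {words} (Approx. {words / speech_time * 60:.0f} WPM)")
for _, line in sorted(events):
    print(line)
```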
| Problem | Solution |
|---|---|
| `ffmpeg not found` | Install with `brew install ffmpeg` |
| Python version errors | Use Python 3.11 or 3.12 with pyenv |
| CORS errors | Ensure backend is running on port 8000 |
| Slow first run | Whisper model downloads on first use (~150MB for base) |
| Port already in use | Kill existing processes or change port |
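
On the CORS row: if you move either server to a different port, the backend's allowed origins must be updated to match. Assuming the backend uses FastAPI's standard `CORSMiddleware` (not confirmed from the source), the relevant configuration looks like:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],  # must match the frontend's origin
    allow_methods=["*"],
    allow_headers=["*"],
)
```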
- Frontend: Vite + React
- Backend: Python FastAPI
- STT: faster-whisper (local Whisper implementation)
- VAD: Silero VAD
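
As a rough sketch of how these pieces connect (the route, parameter names, and temp-file handling are illustrative assumptions, consistent with the hypothetical client call above):

```python
import tempfile
from fastapi import FastAPI, UploadFile
from faster_whisper import WhisperModel

app = FastAPI()

@app.post("/transcribe")  # illustrative route name
async def transcribe(file: UploadFile, model_size: str = "base"):
    # Persist the upload so faster-whisper (via FFmpeg) can decode any format
    with tempfile.NamedTemporaryFile(suffix=file.filename) as tmp:
        tmp.write(await file.read())
        tmp.flush()
        model = WhisperModel(model_size, device="cpu", compute_type="int8")
        segments, info = model.transcribe(tmp.name)
        return {"segments": [
            {"start": s.start, "end": s.end, "text": s.text} for s in segments
        ]}
```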
This project is licensed under the MIT License - see the LICENSE file for details.
Local-only processing. No data is sent to external servers.