A local-only web application for analyzing speaking practice recordings. It transcribes speech while preserving fillers (um, uh, like...) and detects pauses, generating a detailed report suitable for AI-assisted feedback.
- 🎤 Browser Recording or File Upload (mp3, wav, m4a, webm)
- 🗣️ Local STT using faster-whisper (preserves fillers, repetitions)
- ⏸️ Pause Detection using Silero VAD (configurable threshold 0.4-1.2s)
- 📋 Timeline Transcript with `[PAUSE X.XXXs]` markers
- ⚙️ Whisper Model Selection (Tiny, Base, Small)
- 📝 One-click Copy for easy sharing
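
Under the hood, transcription and pause detection can be combined roughly as below. This is a minimal sketch using the `faster-whisper` and `silero-vad` Python packages, not the app's exact code; parameter values are illustrative:

```python
from faster_whisper import WhisperModel
from silero_vad import load_silero_vad, read_audio, get_speech_timestamps

# Transcribe locally; "base" balances speed and accuracy (Tiny/Base/Small are selectable)
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("sample.wav", word_timestamps=True)

# Detect speech regions; gaps between consecutive regions are candidate pauses
vad = load_silero_vad()
audio = read_audio("sample.wav")
speech = get_speech_timestamps(audio, vad, return_seconds=True)

threshold = 0.8  # illustrative; the app exposes 0.4-1.2s
pauses = [
    (prev["end"], cur["start"])
    for prev, cur in zip(speech, speech[1:])
    if cur["start"] - prev["end"] >= threshold
]
```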
- Python 3.9+ (3.9, 3.10, 3.11, or 3.12)

  ```bash
  # If using pyenv
  pyenv install 3.11  # or 3.9, 3.10, 3.12
  pyenv shell 3.11
  ```

- Node.js (18+ recommended)

  ```bash
  brew install node  # or use nvm
  ```

- FFmpeg

  ```bash
  brew install ffmpeg
  ```
⚠️ Prerequisites Required: Make sure you've installed Python 3.9+, Node.js, and FFmpeg (see Prerequisites above) before running these commands.
Fresh clone? Run this one command after installing prerequisites:
```bash
git clone https://github.com/evanshlee/speaking-practice.git
cd speaking-practice
./setup-and-run.sh
```

This will:
- Create Python virtual environment
- Install all backend dependencies
- Install all frontend dependencies
- Start both backend and frontend servers
Then open http://localhost:5173 in your browser! 🚀
Dependencies already installed? Just run:
```bash
./start.sh
```

This starts both servers instantly.
If you prefer to install dependencies manually instead of using setup-and-run.sh:
```bash
git clone https://github.com/evanshlee/speaking-practice.git
cd speaking-practice

# Create virtual environment with Python 3.9+
python3 -m venv venv

# Activate and install dependencies
source venv/bin/activate
pip install -r server/requirements.txt

cd client
npm install
cd ..
```

After manual installation, you can use `./start.sh` for a quick start, or run the servers manually:
Terminal 1 - Backend:
```bash
source venv/bin/activate
cd server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

Terminal 2 - Frontend:

```bash
cd client
npm run dev
```

Then open http://localhost:5173 in your browser.
- Select Record or Upload mode
- (Optional) Choose Whisper model size:
- Tiny: Fastest, lower accuracy
- Base: Balanced (recommended)
- Small: Best accuracy, slower
- (Optional) Adjust pause threshold (0.4-1.2s)
- Record/upload your speaking sample
- Click Transcribe
- Wait for processing (varies by audio length and model)
- Click Copy and paste into your preferred AI assistant for feedback
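
Prefer scripting to clicking? The backend can also be called directly. The route and field names below are assumptions for illustration; check `server/main.py` for the actual API:

```python
import requests

# Hypothetical endpoint and field names -- verify against server/main.py
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/transcribe",
        files={"file": f},
        data={"model_size": "base", "pause_threshold": "0.8"},
    )
print(resp.json())
```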
The generated report contains:
- A) SUMMARY: Date, duration (speech/silence), word count, WPM
- B) TIMELINE: Timestamped transcript with `[PAUSE X.XXXs]` markers
Example:
```
=== A) SUMMARY ===
Date: 2026-01-30
Duration: 62.5s (Speech: 55.2s, Silence: 7.3s)
Words: 142 (Approx. 154 WPM)

=== B) TIMELINE ===
[00:00.000] So, um, I think the main point here is...
[00:05.234] [PAUSE 1.523s]
[00:06.757] And basically, you know, we need to consider...
```
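
The timeline and WPM figure can be reproduced roughly like this. A simplified sketch reusing `segments`, `speech`, and `pauses` from the earlier snippet, not the app's actual code; note that WPM here is computed over speech time, consistent with the sample above:

```python
segments = list(segments)  # faster-whisper yields segments lazily; materialize once

def fmt(t: float) -> str:
    # Format seconds as [MM:SS.mmm] to match the report timestamps
    m, s = divmod(t, 60)
    return f"[{int(m):02d}:{s:06.3f}]"

# Interleave transcript lines and pause markers, then sort by start time
events = [(seg.start, f"{fmt(seg.start)} {seg.text.strip()}") for seg in segments]
events += [(start, f"{fmt(start)} [PAUSE {end - start:.3f}s]") for start, end in pauses]

speech_time = sum(s["end"] - s["start"] for s in speech)
words = sum(len(seg.text.split()) for seg in segments)
print(f"Words: {words} (Approx. {words / speech_time * 60:.0f} WPM)")
for _, line in sorted(events):
    print(line)
```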
| Problem | Solution |
|---|---|
| `ffmpeg not found` | Install with `brew install ffmpeg` |
| Python version errors | Use Python 3.11 or 3.12 with pyenv |
| CORS errors | Ensure backend is running on port 8000 |
| Slow first run | Whisper model downloads on first use (~150MB for base) |
| Port already in use | Kill existing processes or change port |
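
On the CORS row: if you move either server to a different port, the backend's allowed origins must be updated to match. Assuming the backend uses FastAPI's standard `CORSMiddleware` (not confirmed from the source), the relevant configuration looks like:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],  # must match the frontend's origin
    allow_methods=["*"],
    allow_headers=["*"],
)
```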
- Frontend: Vite + React
- Backend: Python FastAPI
- STT: faster-whisper (local Whisper implementation)
- VAD: Silero VAD
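
As a rough sketch of how these pieces connect (the route, parameter names, and temp-file handling are illustrative assumptions, consistent with the hypothetical client call above):

```python
import tempfile
from fastapi import FastAPI, UploadFile
from faster_whisper import WhisperModel

app = FastAPI()

@app.post("/transcribe")  # illustrative route name
async def transcribe(file: UploadFile, model_size: str = "base"):
    # Persist the upload so faster-whisper (via FFmpeg) can decode any format
    with tempfile.NamedTemporaryFile(suffix=file.filename) as tmp:
        tmp.write(await file.read())
        tmp.flush()
        model = WhisperModel(model_size, device="cpu", compute_type="int8")
        segments, info = model.transcribe(tmp.name)
        return {"segments": [
            {"start": s.start, "end": s.end, "text": s.text} for s in segments
        ]}
```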
This project is licensed under the MIT License - see the LICENSE file for details.
Local-only processing. No data is sent to external servers.