Simple subtitle maker for Radeon GPUs on WSL.
A simple One-Time-Code toolchain for automatically generating movie subtitles. This project leverages OpenAI Whisper for transcription and Google Gemini for high-quality, concurrent translation, specifically optimized for AMD GPUs running on WSL with ROCm 7.2.0.
📖 Curious about the journey? Read DEVLOG.md to see how we navigated the "Wild West" of AMD ROCm on WSL, failed with CTranslate2, and succeeded with native PyTorch.
- Audio Extraction: Extracts optimized audio (16 kHz, mono) from video files using a portable FFmpeg build.
- Transcription: Uses `openai-whisper` (accelerated by ROCm/PyTorch) to generate timestamped SRT files.
- Cleaning: Automatically removes SDH tags (e.g., `[Music]`, `(Applause)`), HTML tags, and common ASR hallucinations (see the sketch after this list).
- Translation: Translates subtitles into your target language (default: Chinese) using the Google Gemini API with multi-threaded concurrency for speed.
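The cleaning step is essentially pattern matching. Here is a minimal sketch of the idea; the regexes and hallucination list below are illustrative assumptions, not the project's actual rules:

```python
import re

# Illustrative patterns only; the project's real rules may differ.
SDH_TAG = re.compile(r"\[[^\]]*\]|\([^)]*\)")   # [Music], (Applause)
HTML_TAG = re.compile(r"<[^>]+>")                # <i>, </font>, ...
HALLUCINATIONS = {"thanks for watching!", "subscribe to my channel!"}

def clean_line(line: str) -> str:
    line = SDH_TAG.sub("", line)
    line = HTML_TAG.sub("", line)
    line = line.strip()
    # Drop lines that are nothing but a known ASR hallucination.
    return "" if line.lower() in HALLUCINATIONS else line

print(clean_line("[Music] <i>Hello there</i>"))  # -> "Hello there"
```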
- OS: Linux (Tested on WSL with ROCm 7.2.0)
- Hardware: AMD GPU (Tested on Radeon RX 9070 XT, gfx1201)
- Python: 3.12+
- API Key: A Google Gemini API Key
Clone the repository and set up a virtual environment:

```bash
git clone https://github.com/yiz-liu/Simple-Radeon-Subs.git
cd Simple-Radeon-Subs
python3 -m venv venv
source venv/bin/activate
```

Do not install `torch` from PyPI.
You must install the pre-built wheels that match your ROCm version (e.g., ROCm 7.2.0). See Install Radeon software for WSL with ROCm and Install PyTorch for ROCm:
```bash
# Example command (adjust filenames as needed)
pip install torch-2.9.1+rocm7.2.0.lw.git7e1940d4-cp312-cp312-linux_x86_64.whl
pip install triton-3.5.1+rocm7.2.0.gita272dfa8-cp312-cp312-linux_x86_64.whl
```

Verify the installation:

```bash
python3 -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"
# Should print: CUDA Available: True
```

Install the remaining Python dependencies:

```bash
pip install -r requirements.txt
```

If you already have FFmpeg installed, just go with it. To avoid system conflicts (and as a control freak), I chose a static build of FFmpeg:

```bash
mkdir -p tools
# Download static build (example URL, check for latest)
wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz -O tools/ffmpeg.tar.xz
tar -xJf tools/ffmpeg.tar.xz -C tools/
mv tools/ffmpeg-*-amd64-static tools/ffmpeg
rm tools/ffmpeg.tar.xz
```
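For reference, the extraction step comes down to a single FFmpeg call producing 16 kHz mono WAV. A minimal sketch, assuming the static binary lives at `tools/ffmpeg/ffmpeg` (the function name is illustrative, not the project's actual API):

```python
import subprocess
from pathlib import Path

FFMPEG = Path("tools/ffmpeg/ffmpeg")  # portable static build

def extract_audio(video: str, wav: str) -> None:
    """Extract 16 kHz mono audio, the format Whisper expects."""
    subprocess.run(
        [str(FFMPEG), "-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", wav],
        check=True,
    )

extract_audio("/path/to/video.mp4", "video.wav")
```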
Create a `.env` file in the project root:

```bash
cp .env.template .env
# Edit .env and add your GEMINI_API_KEY
```
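At runtime the key just needs to be present in the environment; something along these lines, assuming `python-dotenv` (this snippet is a sketch, not the project's exact loading code):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory
api_key = os.environ["GEMINI_API_KEY"]
```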
Run the full pipeline (Extract -> Transcribe -> Clean -> Translate) with `run.py`:

```bash
# Basic usage (defaults to Chinese translation)
python run.py /path/to/video.mp4
# Specify output directory and target language
python run.py /path/to/movie.mkv --output-dir ./subs --lang "French"
# Keep temporary files (wav, raw srt) for debugging
python run.py /path/to/video.mp4 --keep-temp
```

Arguments:

- `input`: Path to the input video or audio file.
- `-o, --output-dir`: Directory to save the final `.srt` file.
- `--lang`: Target language (default: "Chinese").
- `--src-lang`: Source language of the audio (e.g., 'en', 'zh'). Auto-detects if omitted.
- `--model`: Whisper model to use (default: `large-v3-turbo`).
- `--keep-temp`: Don't delete intermediate files (`.wav`, `.cleaned.srt`).
- Ensure `tools/ffmpeg/ffmpeg` exists and is executable.
- The script looks for FFmpeg in `tools/ffmpeg/` relative to the project root (see the check below).
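A quick way to check what the script would see; the lookup path mirrors the layout above, and this snippet is just a diagnostic, not project code:

```python
import os
from pathlib import Path

ffmpeg = Path.cwd() / "tools" / "ffmpeg" / "ffmpeg"
print(ffmpeg, "| exists:", ffmpeg.exists(), "| executable:", os.access(ffmpeg, os.X_OK))
```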
- Why standard Whisper? We attempted to use `CTranslate2` for faster inference, but compilation failed due to deep incompatibilities with ROCm 7.2.0 (Thrust/hipBLAS API changes).
- Solution: We reverted to the official `openai-whisper` package using the native PyTorch ROCm wheels, which works perfectly (see the sketch below).
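Under ROCm the GPU is exposed through PyTorch's CUDA interface, so the stock `openai-whisper` API works unchanged. A minimal sketch of the transcription step (the segment printing is illustrative; the project writes SRT instead):

```python
import torch
import whisper  # pip install openai-whisper

device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm presents as "cuda"
model = whisper.load_model("large-v3-turbo", device=device)
result = model.transcribe("video.wav")  # language auto-detected if not specified
for seg in result["segments"]:
    print(f"{seg['start']:7.2f} --> {seg['end']:7.2f}  {seg['text'].strip()}")
```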
- If the translated line count doesn't match the source, the script emits a warning or fills the gaps (sketched below). We use strict prompting ("No Bullshit", "Strict Alignment") to minimize this, but Gemini can occasionally get creative.
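The alignment safeguard itself is simple bookkeeping; a sketch of the idea (the function name and exact behavior are illustrative assumptions):

```python
def align(sources: list[str], translations: list[str]) -> list[str]:
    """Pad or truncate translations so every source line has a counterpart."""
    if len(translations) != len(sources):
        print(f"WARNING: {len(sources)} source lines but {len(translations)} translations")
    fixed = translations[: len(sources)]
    fixed += [""] * (len(sources) - len(fixed))  # fill gaps with empty lines
    return fixed
```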