An intelligent voice-controlled DAW controller that emulates the PreSonus FaderPort 16, combining AI-powered speech recognition with MIDI control for seamless music production workflows.
Dual Mode Operation
- Music Theory Chat Mode - AI-powered music theory discussion and assistance
- DAW Control Mode - Voice commands for transport, tracks, and faders
Voice Control
- Natural language processing for intuitive commands
- Google Speech Recognition with PocketSphinx offline fallback
- iOS-compatible audio recording via Gradio interface
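The online-first, offline-fallback behaviour listed above can be sketched as a small helper. This is a minimal illustration, not the code in app.py: `transcribe_with_fallback`, `RecognitionError`, and the two stub recognizers are hypothetical names. In the SpeechRecognition library the corresponding calls would be `Recognizer.recognize_google` and `Recognizer.recognize_sphinx`, wrapped so that their `RequestError` and `UnknownValueError` exceptions are translated into the error type used here.

```python
from typing import Callable, Optional, Sequence


class RecognitionError(Exception):
    """Hypothetical error raised when a recognizer cannot produce a transcript."""


def transcribe_with_fallback(
    audio: bytes, recognizers: Sequence[Callable[[bytes], str]]
) -> Optional[str]:
    """Try each recognizer in order and return the first non-empty transcript."""
    for recognize in recognizers:
        try:
            text = recognize(audio)
        except RecognitionError:
            continue  # e.g. no network for the online engine; try the next one
        if text and text.strip():
            return text.strip().lower()
    return None  # every engine failed or returned nothing


# Stand-ins for the real engines (assumptions for illustration only).
def online_stub(audio: bytes) -> str:
    raise RecognitionError("network unreachable")


def offline_stub(audio: bytes) -> str:
    return " Play "


result = transcribe_with_fallback(b"\x00\x01", [online_stub, offline_stub])
```

Passing the engines as a list keeps the ordering explicit and makes the fallback path easy to exercise without a microphone or network.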
16-Track MIDI Control
- Virtual faders with real-time MIDI output
- Individual track mute/solo controls
- Transport controls (play, stop, record)
- Session state persistence
Cross-Platform Interface
- Web-based Gradio UI accessible from PC and iPad
- Real-time visual feedback for all controls
- Responsive design for mobile devices
Prerequisites:
- Python 3.10 or higher
- Windows 11 (for LoopMIDI support)
- DAW software (Ableton Live, Logic Pro, etc.)
- Microphone access
- Clone the repository:
git clone https://github.com/yourusername/AudioCommandController.git
cd AudioCommandController
- Create and activate virtual environment:
# Windows
python -m venv venv
venv\Scripts\activate
# macOS/Linux
python -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
# Windows-specific PyAudio installation
pip install pipwin
pipwin install pyaudio
- Configure environment (optional):
cp .env.example .env
# Edit .env with your preferences
Start the application:
python app.py
Access the interface at:
- Local: http://localhost:7860
- Network: http://[your-ip]:7860 (for iPad/mobile access)
Transport Controls:
- "play" - Start playback
- "stop" - Stop playback
- "record" - Enable recording
Track Controls:
- "solo track 3" - Solo track 3, unsolo all others
- "mute track 5" - Mute track 5
- "unmute track 2" - Unmute track 2
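The "solo track 3" behaviour (solo one track, unsolo all others) amounts to an exclusive toggle across the 16 tracks. The sketch below shows one way to model it; the `Track` class and the function names are hypothetical and are not taken from app.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    """Hypothetical per-track state; the real application may store more."""
    muted: bool = False
    soloed: bool = False


def solo_exclusive(tracks: List[Track], number: int) -> None:
    """Solo the 1-based track `number` and clear solo on every other track."""
    if not 1 <= number <= len(tracks):
        raise ValueError(f"track {number} is out of range 1..{len(tracks)}")
    for index, track in enumerate(tracks, start=1):
        track.soloed = index == number


def set_mute(tracks: List[Track], number: int, muted: bool) -> None:
    """Mute or unmute a single 1-based track without touching the others."""
    if not 1 <= number <= len(tracks):
        raise ValueError(f"track {number} is out of range 1..{len(tracks)}")
    tracks[number - 1].muted = muted


tracks = [Track() for _ in range(16)]
solo_exclusive(tracks, 3)
solo_exclusive(tracks, 5)   # replaces the solo on track 3
set_mute(tracks, 5, True)
set_mute(tracks, 2, False)
```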
Fader Controls:
- "set fader 1 to 75" - Set track 1 volume to 75%
- "fader 3 to 50" - Set track 3 volume to 50%
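The command phrases above are regular enough to be matched with a few patterns. The following is a simplified sketch and not the application's NLP engine: `parse_command` and `percent_to_midi` are hypothetical helpers, the regular expressions cover only the exact phrasings listed, and the mapping of a percentage onto a 7-bit MIDI value (0 to 127) is an assumption about how fader levels are encoded.

```python
import re
from typing import Optional

NUM_TRACKS = 16
TRANSPORT = {"play", "stop", "record"}
TRACK_RE = re.compile(r"^(solo|mute|unmute) track (\d+)$")
FADER_RE = re.compile(r"^(?:set )?fader (\d+) to (\d+)%?$")


def parse_command(text: str) -> Optional[dict]:
    """Turn a recognized phrase into a command dict, or None if unrecognized."""
    phrase = " ".join(text.lower().split())  # normalize case and whitespace
    if phrase in TRANSPORT:
        return {"action": phrase}
    match = TRACK_RE.match(phrase)
    if match:
        track = int(match.group(2))
        if 1 <= track <= NUM_TRACKS:
            return {"action": match.group(1), "track": track}
        return None
    match = FADER_RE.match(phrase)
    if match:
        track, percent = int(match.group(1)), int(match.group(2))
        if 1 <= track <= NUM_TRACKS and 0 <= percent <= 100:
            return {"action": "fader", "track": track, "percent": percent}
    return None


def percent_to_midi(percent: int) -> int:
    """Scale 0..100 percent onto the 7-bit MIDI range 0..127 (assumed encoding)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(percent * 127 / 100)
```

A speech engine may return numbers as words ("track three"), which these patterns would not match; handling that would need an extra normalization step.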
Switch to "Music Theory Chat" mode and ask questions like:
- "What's a good chord progression for jazz?"
- "Explain the circle of fifths"
- "How do I modulate from C major to A minor?"
- "What scales work over a G7 chord?"
See SETUP.md for detailed hardware configuration instructions including:
- LoopMIDI virtual MIDI port setup
- FaderPort 16 HUI mode configuration
- DAW MIDI routing (Ableton Live, Logic Pro, etc.)
- Windows Firewall and network access
- Microphone permissions
┌─────────────────────┐
│     Voice Input     │
│    (Microphone)     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Speech Recognition  │
│  Google API/Sphinx  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐       ┌─────────────────────┐
│     NLP Engine      │───────│    Music Theory     │
│   (Transformers)    │       │       Chatbot       │
└──────────┬──────────┘       └─────────────────────┘
           │
           ▼
┌─────────────────────┐
│    DAW Commands     │
│       Parser        │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐       ┌─────────────────────┐
│     MIDI Output     │───────│      LoopMIDI       │
│       (Mido)        │       │    Virtual Port     │
└─────────────────────┘       └──────────┬──────────┘
                                         │
                                         ▼
                              ┌─────────────────────┐
                              │    DAW Software     │
                              │   (Ableton/Logic)   │
                              └─────────────────────┘
By default, the application tries to connect to these MIDI ports:
- "LoopMIDI Port 1"
- "LoopMIDI Port"
- "loopMIDI Port 1"
Configure your preferred port in .env:
MIDI_PORT_NAME=LoopMIDI Port 1
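The port lookup described above (a configured name first, then the default candidates) might look like the sketch below. `choose_port` is a hypothetical helper, and exact, case-sensitive name matching is an assumption. With mido, the list of available names would typically come from `mido.get_output_names()` and the chosen name would be passed to `mido.open_output()`.

```python
from typing import Iterable, Optional

# Default candidates, in the order listed in this README.
DEFAULT_PORT_NAMES = ("LoopMIDI Port 1", "LoopMIDI Port", "loopMIDI Port 1")


def choose_port(available: Iterable[str], preferred: Optional[str] = None) -> Optional[str]:
    """Return the first candidate present in `available`, or None if none match."""
    available = list(available)
    candidates = ([preferred] if preferred else []) + list(DEFAULT_PORT_NAMES)
    for name in candidates:
        if name in available:  # assumption: exact, case-sensitive match
            return name
    return None


# Example with a made-up list of system ports.
ports = ["Microsoft GS Wavetable Synth 0", "loopMIDI Port 1"]
selected = choose_port(ports)
```

Returning `None` instead of raising lets the caller decide whether to run without MIDI output or to stop with a clear error message.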
Default audio configuration:
AUDIO_SAMPLE_RATE=44100
AUDIO_BUFFER_SIZE=512
RECORD_DURATION=5
For iPad/mobile access:
SERVER_PORT=7860
SERVER_HOST=0.0.0.0
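The settings above could be read into a typed object along the following lines. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical, and the fallback values simply mirror the ones shown in this README, which may differ from what app.py actually uses. If python-dotenv is installed, calling `load_dotenv()` before `load_settings()` would copy the contents of .env into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the configuration keys shown in this README."""
    midi_port_name: str
    audio_sample_rate: int
    audio_buffer_size: int
    record_duration: int
    server_host: str
    server_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to README values."""
    return Settings(
        midi_port_name=env.get("MIDI_PORT_NAME", "LoopMIDI Port 1"),
        audio_sample_rate=int(env.get("AUDIO_SAMPLE_RATE", "44100")),
        audio_buffer_size=int(env.get("AUDIO_BUFFER_SIZE", "512")),
        record_duration=int(env.get("RECORD_DURATION", "5")),
        server_host=env.get("SERVER_HOST", "0.0.0.0"),
        server_port=int(env.get("SERVER_PORT", "7860")),
    )


defaults = load_settings({})
custom = load_settings({"AUDIO_BUFFER_SIZE": "256", "SERVER_PORT": "8080"})
```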
MIDI connection issues:
- Verify LoopMIDI is running and port name matches
- Check DAW MIDI preferences
- Restart the application and DAW
Microphone or voice recognition issues:
- Check Windows microphone permissions (Settings → Privacy → Microphone)
- Test microphone in Windows Sound settings
- Try different browsers (Chrome recommended for best compatibility)
- Ensure PocketSphinx is installed for offline recognition
iPad/mobile connection issues:
- Verify devices are on the same WiFi network
- Check Windows Firewall allows port 7860
- Use direct IP address instead of hostname
- Clear browser cache
Speech API or AI model issues:
- Check internet connection (required for Google Speech API)
- Verify Hugging Face model download completed
- Try smaller transformer models for low-memory systems
- Check transformers library compatibility
Windows:
pip install pipwin
pipwin install pyaudio
macOS:
brew install portaudio
pip install pyaudio
Linux:
sudo apt-get install portaudio19-dev python3-pyaudio
pip install pyaudio
AudioCommandController/
├── app.py                # Main application
├── requirements.txt      # Python dependencies
├── .env.example          # Environment template
├── .gitignore            # Git ignore rules
├── README.md             # This file
├── SETUP.md              # Detailed hardware setup
├── faderport_data.json   # Session state (auto-generated)
└── chat_history.json     # Chat logs (auto-generated)
- gradio - Web UI framework
- transformers - AI models for chatbot
- torch - PyTorch for ML models
- speechrecognition - Voice recognition
- pyaudio - Audio I/O
- mido - MIDI communication
- python-rtmidi - MIDI backend
- pandas - Data management
- python-dotenv - Environment configuration
- pocketsphinx - Offline speech recognition
- numpy - Numerical operations
All track settings, fader positions, and mute/solo states are automatically saved to faderport_data.json and restored on application restart.
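The save-and-restore cycle for session state can be illustrated with the standard `json` module. The layout of faderport_data.json is not documented here, so the structure below (a `tracks` list with `fader`, `muted`, and `soloed` keys) is an assumption, as are the function names. Writing to a temporary file and then renaming it is a design choice in this sketch to avoid leaving a half-written file if the application exits mid-save.

```python
import json
import os
import tempfile
from pathlib import Path


def default_state(num_tracks: int = 16) -> dict:
    """Assumed layout: one dict per track with fader, mute, and solo state."""
    return {
        "tracks": [
            {"fader": 0, "muted": False, "soloed": False} for _ in range(num_tracks)
        ]
    }


def save_state(state: dict, path: Path) -> None:
    """Write state as JSON via a temporary file, then replace the target."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)


def load_state(path: Path) -> dict:
    """Load saved state, falling back to defaults if the file is missing or corrupt."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return default_state()
```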
Music theory conversations are logged to chat_history.json for review and learning.
The application tries multiple transformer models in order of preference:
- facebook/blenderbot-400M-distill
- microsoft/DialoGPT-medium
- microsoft/DialoGPT-small
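Trying models in order of preference could be structured as shown below. This is a sketch only: `load_first_available` is a hypothetical helper and the loader is injected as a parameter, so nothing here calls the transformers library directly. A real loader would wrap the appropriate Hugging Face `from_pretrained` calls for each model; `fake_load` is a stand-in that simulates the largest model failing.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Preference order as listed in this README.
MODEL_PREFERENCE = (
    "facebook/blenderbot-400M-distill",
    "microsoft/DialoGPT-medium",
    "microsoft/DialoGPT-small",
)


def load_first_available(
    load: Callable[[str], Any], names: Sequence[str] = MODEL_PREFERENCE
) -> Tuple[str, Any]:
    """Return (name, model) for the first model that loads without error."""
    errors: Dict[str, Exception] = {}
    for name in names:
        try:
            return name, load(name)
        except Exception as exc:  # download failure, out of memory, etc.
            errors[name] = exc
    raise RuntimeError(f"No chat model could be loaded: {errors}")


def fake_load(name: str) -> str:
    """Stand-in loader: pretends the first-choice model does not fit in memory."""
    if "blenderbot" in name:
        raise MemoryError("not enough RAM")
    return f"<model {name}>"


chosen_name, chosen_model = load_first_available(fake_load)
```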
Enhanced Gradio audio settings ensure compatibility with iOS Safari and Chrome browsers for iPad control.
- Close unused browser tabs to free memory for AI models
- Use smaller transformer models on systems with limited RAM
- Disable chat history logging for faster performance
- Reduce audio buffer size for lower latency
Contributions are welcome! Please feel free to submit issues and pull requests.
This project is open source and available under the MIT License.
- PreSonus for FaderPort 16 hardware inspiration
- Gradio team for the excellent web UI framework
- Hugging Face for transformer models
- Speech Recognition library contributors
For issues, questions, or feature requests:
- Open an issue on GitHub
- Check SETUP.md for detailed configuration help
- Review troubleshooting section above
- Initial release
- Dual-mode voice control (Chat + DAW)
- 16-track MIDI control
- iOS-compatible web interface
- Session persistence
- Multi-model AI fallbacks
Happy Music Making!