MXESCO-DOCKER

MXESCO-DOCKER is a modular FastAPI application for processing audio files. It transcribes audio, extracts phonemes, generates metadata, and stores the results in a MongoDB database. It uses OpenAI's Whisper and Hugging Face's Wav2Vec2 for speech and phoneme recognition.

Features

Audio Transcription: Extracts text from audio files with word-level timestamps.
Phonemization: Converts audio data into phonemes with detailed character offsets.
Metadata Generation: Includes information about the transcriber model, phonemizer, and timestamps.
Data Storage: Stores processed data and raw audio in MongoDB using GridFS.
REST API: Exposes endpoints for uploading and processing audio files.
Containerization: Dockerized for ease of deployment.
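
To make the feature list more concrete, the sketch below shows one way a processed record could be shaped. Every class and field name here (WordTimestamp, PhonemeSpan, ProcessedAudio, the model labels) is an assumption made for illustration and may not match the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class WordTimestamp:
    word: str
    start: float  # seconds from the beginning of the audio
    end: float


@dataclass
class PhonemeSpan:
    phoneme: str
    start_offset: int  # character offset into the phoneme string
    end_offset: int


@dataclass
class ProcessedAudio:
    # Hypothetical record layout, not the project's real schema.
    text: str
    words: list[WordTimestamp]
    phonemes: list[PhonemeSpan]
    transcriber: str  # label of the transcription model
    phonemizer: str   # label of the phoneme model
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_document(self) -> dict:
        """Return a plain dict suitable for JSON or MongoDB."""
        return asdict(self)


# Illustrative values only.
record = ProcessedAudio(
    text="hola mundo",
    words=[WordTimestamp("hola", 0.0, 0.4), WordTimestamp("mundo", 0.5, 1.0)],
    phonemes=[PhonemeSpan("o", 0, 1)],
    transcriber="whisper",
    phonemizer="wav2vec2",
)
```

Calling record.to_document() flattens the nested dataclasses into dictionaries and lists, which is the form a MongoDB insert would expect.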

Project Structure

MXESCO-DOCKER/
├── app/
│   ├── routes/
│   │   ├── __init__.py
│   │   ├── audio_routes.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── audio_processing.py
│   │   ├── corpus_app.py
│   │   ├── database.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── phonemization.py
│   │   ├── timestamps.py
│   │   ├── transcription.py
│   ├── main.py
├── docker-compose.yml
├── Dockerfile
├── LICENSE
├── README.md
├── requirements.txt

Key Files

main.py: Entry point for the FastAPI application.
audio_routes.py: Defines API endpoints for processing audio files.
audio_processing.py: Handles transcription, phonemization, and metadata generation.
database.py: Saves metadata and audio files to MongoDB.
corpus_app.py: Processes word and phoneme data for enriched metadata.
utils/: Utility functions for timestamps, transcription, and phonemization.
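
As a rough illustration of the storage role described for database.py, the sketch below stores raw audio in GridFS and a metadata document in a regular collection. It relies on pymongo's gridfs.GridFS and its put method; the function names, the field names, and the audio_metadata collection name are assumptions, not the project's actual code.

```python
from datetime import datetime, timezone


def build_metadata(filename: str, file_id, transcript: str) -> dict:
    """Assemble a metadata document that points at the stored audio."""
    return {
        "filename": filename,
        "audio_file_id": file_id,  # GridFS id of the raw audio
        "transcript": transcript,
        "stored_at": datetime.now(timezone.utc).isoformat(),
    }


def save_audio(db, filename: str, audio_bytes: bytes, transcript: str):
    """Store raw audio in GridFS and its metadata in a collection.

    `db` is expected to be a pymongo Database. The collection name
    "audio_metadata" is hypothetical.
    """
    # gridfs ships with pymongo; imported lazily so build_metadata
    # can be used without pymongo installed.
    import gridfs

    fs = gridfs.GridFS(db)
    file_id = fs.put(audio_bytes, filename=filename)
    db["audio_metadata"].insert_one(build_metadata(filename, file_id, transcript))
    return file_id
```

Keeping the audio in GridFS and the metadata in an ordinary collection lets the metadata be queried without loading the audio bytes.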

Installation

Prerequisites

Docker and Docker Compose installed
Python 3.9+ (for running locally without Docker)

Steps

Clone the repository:

git clone <repository_url>
cd mxesco-docker

Build and run the Docker containers:
```
docker compose up --build
```
The API will be available at http://localhost:8000.

Running Locally Without Docker

Install dependencies:
```
pip install -r requirements.txt
```
Start the FastAPI server:
```
uvicorn app.main:app --reload
```

Usage

API Endpoints

Process Audio File

Endpoint: /api/process-audio/
Method: POST
Description: Uploads an audio file for processing.
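
For orientation, a route like this could be declared in FastAPI roughly as follows. This is a minimal sketch, not the contents of audio_routes.py: the extension allow-list, the is_supported_audio helper, the create_app factory, and the 415 response are all assumptions. File uploads in FastAPI also require the python-multipart package.

```python
from pathlib import Path

# Assumed set of accepted extensions; the project may accept others.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Check the upload's extension against the assumed allow-list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def create_app():
    """Build a FastAPI app exposing POST /api/process-audio/."""
    # Imported here so the helper above can be used without FastAPI installed.
    from fastapi import FastAPI, File, HTTPException, UploadFile

    app = FastAPI()

    @app.post("/api/process-audio/")
    async def process_audio(file: UploadFile = File(...)):
        if not is_supported_audio(file.filename or ""):
            raise HTTPException(status_code=415, detail="Unsupported audio format.")
        audio_bytes = await file.read()
        # Transcription, phonemization, and storage would run here on audio_bytes.
        return {
            "status": "success",
            "message": "Audio processed and saved successfully.",
        }

    return app
```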

Example Request:

curl -X POST "http://127.0.0.1:8000/api/process-audio/" \
     -F "file=@example_audio.mp3"

Response:

{
    "status": "success",
    "message": "Audio processed and saved successfully."
}
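
The same request can be made from Python. The sketch below uses only the standard library and sends the file under the form field "file", as in the curl example. The helper names build_multipart and upload_audio are illustrative and not part of the project.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_audio(
    path: str, url: str = "http://127.0.0.1:8000/api/process-audio/"
) -> dict:
    """POST the file to the endpoint and return the parsed JSON response."""
    audio = Path(path)
    body, content_type = build_multipart("file", audio.name, audio.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, upload_audio("example_audio.mp3") should return the JSON body shown above as a dictionary.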

Interactive API Documentation

Swagger UI: http://127.0.0.1:8000/docs
ReDoc: http://127.0.0.1:8000/redoc

Technologies Used

FastAPI: Web framework for building APIs.
PyTorch: For handling audio data and Wav2Vec2 model inference.
Whisper: OpenAI's speech-to-text library.
MongoDB & GridFS: For data persistence.
Docker: For containerized deployment.
Pydub: For audio file manipulation.
Phonemizer: For generating phonemes from text.

Configuration

Docker

docker-compose.yml:
- Defines two services:
  1. app: The FastAPI application.
  2. mongo: MongoDB database.
- Exposes port 8000 for the application and port 27017 for MongoDB.

Environment Variables

To customize settings, modify the environment variables in the docker-compose.yml file or create a .env file.
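
One common way for the application to pick up such settings is to read them from the environment with fallbacks. The sketch below assumes two hypothetical variables, MONGO_URI and MONGO_DB, and a default host named after the mongo service in docker-compose.yml. The project's actual variable names may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mongo_uri: str
    mongo_db: str


def load_settings(env=None) -> Settings:
    """Read settings from the environment, falling back to defaults.

    MONGO_URI and MONGO_DB are hypothetical variable names.
    """
    env = os.environ if env is None else env
    return Settings(
        # "mongo" is the assumed hostname of the MongoDB compose service.
        mongo_uri=env.get("MONGO_URI", "mongodb://mongo:27017"),
        mongo_db=env.get("MONGO_DB", "mxesco"),
    )
```

Passing a plain dictionary instead of os.environ, for example load_settings({"MONGO_DB": "test"}), makes the function easy to exercise in tests.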

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! To contribute:

Fork the repository.
Create a feature branch.
Commit your changes.
Submit a pull request.

Acknowledgements

OpenAI for Whisper
Hugging Face for Wav2Vec2
MongoDB for efficient data handling
Maestría en Ciencia de Datos, Universidad de Sonora (GitHub Repository)
