Automatic Speech Recognition with Gemma
2 min read · Sep 2, 2025
I've created a complete ASR (Automatic Speech Recognition) demo that runs with Docker Compose and uses the following architecture:
Architecture Overview
Three microservices:
- Ollama service: runs the Gemma 2 2B model (Ollama tag gemma2:2b) for text enhancement
- ASR service: FastAPI backend that uses Whisper for transcription
- Web UI: interactive frontend served by Nginx
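To make the hop between the ASR service and the Ollama service concrete, here is a minimal Python sketch of the enhancement call using only the standard library. It targets Ollama's /api/generate endpoint with "stream" set to false, so the reply is a single JSON object whose "response" field holds the generated text. The Compose hostname "ollama", the prompt wording, and the helper names are assumptions for illustration, not the demo's actual code.

```python
import json
import urllib.request

# Assumption: inside the Compose network the Ollama container is reachable under
# the hypothetical service name "ollama" on Ollama's default port 11434.
OLLAMA_URL = "http://ollama:11434/api/generate"
MODEL = "gemma2:2b"  # Ollama tag for Gemma 2 2B


def build_enhance_request(raw_text: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request to Ollama's /api/generate endpoint."""
    # Hypothetical prompt; the demo's real prompt is not shown in the article.
    prompt = (
        "Fix punctuation, capitalization and obvious transcription errors "
        "in the following text. Return only the corrected text.\n\n" + raw_text
    )
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_enhance_response(body: bytes) -> str:
    """With stream=False, Ollama returns one JSON object; the text is in 'response'."""
    return json.loads(body)["response"].strip()


def enhance(raw_text: str, timeout: float = 60.0) -> str:
    """Send the request and return the enhanced text (requires a running Ollama)."""
    with urllib.request.urlopen(build_enhance_request(raw_text), timeout=timeout) as resp:
        return parse_enhance_response(resp.read())
```

Keeping request construction separate from the network call lets the payload be checked without a running Ollama container.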
Key Features
Audio Input:
- Browser-based recording with the microphone
- File upload with drag and drop (MP3, WAV, M4A, OGG)
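A sketch of a server-side check for the four upload formats listed above, assuming validation by file extension. The demo may instead inspect MIME types or simply let Whisper's ffmpeg-based audio loading reject unreadable files; the names ALLOWED_EXTENSIONS and is_supported_audio are hypothetical.

```python
from pathlib import PurePath

# The four formats named in the feature list; extension-based checking is an assumption.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the uploaded file name ends in one of the accepted audio extensions."""
    # PurePath.suffix is the last extension only, so "a.wav.zip" yields ".zip".
    return PurePath(filename).suffix.lower() in ALLOWED_EXTENSIONS
```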
Processing Pipeline:
- Whisper (tiny model) for fast speech-to-text
- Gemma 2 2B via Ollama for text enhancement and correction
- Processing time tracking
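The processing pipeline can be sketched in Python with the two stages passed in as functions, so each stage is timed separately with time.perf_counter. The names run_pipeline and PipelineResult are hypothetical. With the openai-whisper package installed, the transcriber would typically wrap a model loaded once via whisper.load_model("tiny"), as noted in the comment; that wiring is an assumption about the demo, not its confirmed code.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    raw_text: str
    enhanced_text: str
    transcribe_seconds: float
    enhance_seconds: float

    @property
    def total_seconds(self) -> float:
        return self.transcribe_seconds + self.enhance_seconds


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    enhance: Callable[[str], str],
) -> PipelineResult:
    """Transcribe, then enhance, timing each stage with a monotonic clock."""
    # Example transcriber with openai-whisper (load the model once at startup):
    #   model = whisper.load_model("tiny")
    #   transcribe = lambda path: model.transcribe(path)["text"].strip()
    start = time.perf_counter()
    raw = transcribe(audio_path)
    mid = time.perf_counter()
    enhanced = enhance(raw)
    end = time.perf_counter()
    return PipelineResult(raw, enhanced, mid - start, end - mid)
```

Passing the stages in as callables keeps the timing logic testable with stand-in functions, without Whisper or Ollama present.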
User Experience:
- Real-time recording with a timer
- Health status monitoring
- Side-by-side comparison of raw and enhanced text
- Responsive, modern UI
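For the raw-versus-enhanced comparison, a word-level diff makes the model's corrections easy to highlight. This sketch uses the standard library's difflib.SequenceMatcher; how the demo's frontend actually renders the comparison is not shown in the article, so treat word_diff as a hypothetical helper.

```python
import difflib
from typing import List, Tuple


def word_diff(raw: str, enhanced: str) -> List[Tuple[str, str, str]]:
    """Return (tag, raw_words, enhanced_words) for every changed span between two texts."""
    a, b = raw.split(), enhanced.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / delete / insert spans
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

For example, word_diff("i went to the store yesterday", "I went to the store yesterday.") returns [("replace", "i", "I"), ("replace", "yesterday", "yesterday.")].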
Quick Setup
- Create the project structure:
mkdir asr-demo && cd asr-demo