Automatic Speech Recognition with Gemma

2 min read · Sep 2, 2025

I’ve built a complete Automatic Speech Recognition (ASR) demo that runs under Docker Compose. This post outlines its architecture, key features, and setup.

🏗️ Architecture Overview

3 Microservices:

  1. Ollama Service: runs the Gemma 2:2B model for text enhancement
  2. ASR Service: FastAPI backend with Whisper for transcription
  3. Web UI: Nginx-served interactive frontend
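
To make the three-service layout concrete, here is a minimal sketch of how the Compose definition could be structured. It builds the structure in Python and prints it as JSON. Because JSON is valid YAML, the output should be usable as a Compose file, though that is worth checking against your Compose version. The service names, build directory, host ports, environment variable, and volume names are all assumptions for illustration; only the `ollama/ollama` and `nginx:alpine` images and Ollama's default port 11434 are standard.

```python
import json


def build_compose() -> dict:
    """Return a Compose-style definition of the three services.

    All service names, paths, and host ports are illustrative assumptions,
    not the configuration used in the original demo.
    """
    return {
        "services": {
            "ollama": {
                "image": "ollama/ollama",
                "ports": ["11434:11434"],  # Ollama's default API port
                "volumes": ["ollama_data:/root/.ollama"],  # persist pulled models
            },
            "asr-service": {
                "build": "./asr-service",  # hypothetical directory with the FastAPI app
                "ports": ["8000:8000"],  # assumed port
                # Hypothetical variable telling the backend where Ollama lives
                "environment": {"OLLAMA_URL": "http://ollama:11434"},
                "depends_on": ["ollama"],
            },
            "web-ui": {
                "image": "nginx:alpine",
                "ports": ["8080:80"],  # assumed host port
                # Hypothetical directory holding the static frontend
                "volumes": ["./web-ui:/usr/share/nginx/html:ro"],
                "depends_on": ["asr-service"],
            },
        },
        "volumes": {"ollama_data": {}},
    }


if __name__ == "__main__":
    print(json.dumps(build_compose(), indent=2))
```

The `depends_on` entries only control start order: the web UI starts after the ASR service, which starts after Ollama.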

🚀 Key Features

Audio Input:

  • ✅ Browser-based recording with microphone
  • ✅ File upload with drag & drop (MP3, WAV, M4A, OGG)
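
On the backend, an upload could be checked against the four listed formats before it reaches the transcription step. The helper below is a hypothetical sketch (the function and constant names are mine). It only looks at the file extension, so it is a first filter and does not validate the audio itself.

```python
from pathlib import Path

# Extensions matching the formats listed above (MP3, WAV, M4A, OGG)
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the filename ends in one of the supported audio extensions."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_audio("talk.MP3")` is accepted because the comparison is case-insensitive, while `is_supported_audio("notes.txt")` is rejected.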

Processing Pipeline:

  • ✅ Whisper (tiny model) for fast speech-to-text
  • ✅ Ollama Gemma 2:2B for text enhancement and correction
  • ✅ Processing time tracking
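
The pipeline above can be sketched roughly as follows: transcribe with Whisper's `tiny` model, send the raw text to Ollama's `/api/generate` endpoint, and time the whole run. This is a sketch under assumptions, not the demo's actual code. The host URL, the prompt wording, and the function names are mine. It also assumes the `openai-whisper` package and ffmpeg are installed and that the `gemma2:2b` model has been pulled into Ollama.

```python
import json
import time
import urllib.request
from typing import Callable

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed host; 11434 is Ollama's default port
MODEL = "gemma2:2b"  # Ollama tag for the Gemma 2 2B model


def transcribe_with_whisper(audio_path: str) -> str:
    """Transcribe an audio file with the Whisper 'tiny' model."""
    import whisper  # imported lazily so the rest of the sketch runs without it

    # A long-running service would load the model once at startup instead.
    model = whisper.load_model("tiny")
    return model.transcribe(audio_path)["text"]


def build_prompt(raw_text: str) -> str:
    """Build the enhancement prompt (hypothetical wording)."""
    return (
        "Correct the grammar and punctuation of this speech transcript. "
        "Return only the corrected text.\n\n" + raw_text
    )


def enhance_with_ollama(raw_text: str) -> str:
    """Ask the model behind Ollama to clean up the raw transcript."""
    payload = json.dumps(
        {"model": MODEL, "prompt": build_prompt(raw_text), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"].strip()


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str] = transcribe_with_whisper,
    enhance: Callable[[str], str] = enhance_with_ollama,
) -> dict:
    """Run transcription and enhancement, recording the total elapsed time."""
    start = time.perf_counter()
    raw_text = transcribe(audio_path).strip()
    enhanced_text = enhance(raw_text)
    return {
        "raw_text": raw_text,
        "enhanced_text": enhanced_text,
        "processing_time_s": round(time.perf_counter() - start, 3),
    }
```

Passing the two steps in as callables keeps the timing logic separate from the models, so the pipeline can be exercised with stand-in functions when Whisper or Ollama is not available.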

User Experience:

  • ✅ Real-time recording with timer
  • ✅ Health status monitoring
  • ✅ Side-by-side comparison of raw vs enhanced text
  • ✅ Responsive modern UI
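
Health status monitoring could be as simple as probing each service over HTTP and reporting which ones respond. The sketch below assumes the endpoint URLs: the ASR service `/health` path and the host ports 8000 and 8080 are hypothetical. Ollama's root URL on port 11434 answers with a plain status message when the server is up.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Hypothetical endpoints; adjust to the ports and paths your services expose.
SERVICES = {
    "ollama": "http://localhost:11434/",
    "asr-service": "http://localhost:8000/health",
    "web-ui": "http://localhost:8080/",
}


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL
        return False


def check_all(
    services: Dict[str, str] = SERVICES,
    probe: Callable[[str], bool] = is_healthy,
) -> Dict[str, bool]:
    """Probe every service and map its name to a healthy/unhealthy flag."""
    return {name: probe(url) for name, url in services.items()}
```

A frontend could poll an endpoint backed by `check_all()` every few seconds and show a status badge per service.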

Quick Setup

  1. Create project structure:
mkdir asr-demo && cd asr-demo

Written by Dhiraj Patra

AI Strategy, Generative AI, AI & ML Consulting, Product Development, Startup Advisory, Data Architecture, Data Analytics, Executive Mentorship, Value Creation
