Show HN: Sub-tools – AI-powered subtitle generation using WhisperX and Gemini

github.com

dohyeondk

5 hours ago


dohyeondk 5 hours ago

I built sub-tools to solve a problem I had: creating accurate, multilingual subtitles for video content without spending hours on manual transcription or paying for expensive services.

I started with a pure-LLM solution, letting Gemini generate the SRT directly from the audio file. It was slow and inaccurate, so I added a few tweaks: splitting the audio into smaller chunks, validating the generated SRT, and retrying whenever it was invalid. That was good enough until I switched to the current approach.
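To illustrate the validate-and-retry tweak, here is a minimal sketch. It is not the sub-tools code: `is_valid_srt`, `generate_with_retry`, and the `generate` callable (a stand-in for whatever asks the model for an SRT) are hypothetical names, and the validity rules (sequential indices, well-formed timings, start before end, at least one text line) are just one reasonable definition of "valid".

```python
import re
from typing import Callable

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:03,000".
_TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert timestamp parts to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def is_valid_srt(text: str) -> bool:
    """Return True if every cue has a sequential index, a sane timing line, and text."""
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if not blocks:
        return False
    for expected_index, block in enumerate(blocks, start=1):
        lines = block.strip().splitlines()
        if len(lines) < 3:  # index, timing, and at least one line of text
            return False
        if lines[0].strip() != str(expected_index):
            return False
        match = _TIMING.match(lines[1].strip())
        if match is None:
            return False
        parts = match.groups()
        if _to_ms(*parts[:4]) >= _to_ms(*parts[4:]):  # start must precede end
            return False
    return True


def generate_with_retry(generate: Callable[[], str], max_attempts: int = 3) -> str:
    """Call the (hypothetical) generator until it returns a valid SRT."""
    for _ in range(max_attempts):
        candidate = generate()
        if is_valid_srt(candidate):
            return candidate
    raise ValueError(f"no valid SRT after {max_attempts} attempts")
```

In practice `generate` would wrap the model call for one audio chunk, so a bad response costs only a retry of that chunk rather than the whole file.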

v0.8.0 now uses a three-stage AI pipeline:

1. WhisperX for word-level aligned transcription

2. Google Gemini for proofreading and error correction

3. Gemini again for context-aware translation
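As a rough illustration of why word-level timestamps help, the sketch below groups timed words into SRT cues. It is not taken from sub-tools: `words_to_srt` and `format_timestamp` are hypothetical names, the `max_chars` and `max_gap` thresholds are arbitrary, and the input shape (dicts with `word`, `start`, and `end` in seconds) is an assumption modeled on aligned transcription output. Real output may contain words without timestamps, which this sketch does not handle.

```python
from typing import Dict, List, Union

Word = Dict[str, Union[str, float]]


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_chars: int = 42, max_gap: float = 0.8) -> str:
    """Group timed words into cues, breaking on long pauses or overlong lines."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current:
            # Length of the cue text if this word were appended.
            text_len = len(" ".join(str(w["word"]) for w in current)) + 1 + len(str(word["word"]))
            gap = float(word["start"]) - float(current[-1]["end"])
            if text_len > max_chars or gap > max_gap:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_timestamp(float(cue[0]["start"]))
        end = format_timestamp(float(cue[-1]["end"]))
        text = " ".join(str(w["word"]) for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

Because each cue's start and end come from its first and last word, the proofreading and translation stages only need to change the text and can leave the timing alone.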

I'm satisfied with the result. I'd love for you to try it out and hear what you think.