Elevate

High-accuracy audio/video transcription and subtitle generation powered by ElevenLabs Scribe.

Elevate wraps the ElevenLabs Speech-to-Text API into a battle-tested CLI pipeline that handles everything from 30-second trailers to 3-hour movies. Drop in a file or a YouTube URL, get production-ready subtitles.

Why Elevate

State-of-the-art accuracy — ElevenLabs Scribe v2 delivers the lowest word error rate across 90+ languages, outperforming Whisper, Deepgram, and AssemblyAI on most benchmarks.
CJK-aware subtitle pipeline — purpose-built for Chinese, Japanese, and Korean. Sentence splitting respects CJK punctuation, line breaking uses character-width logic, and reading speed targets are tuned per script (CJK CPS vs Latin CPS).
Speaker diarization — up to 32 speakers, automatically labeled in the transcript.
Audio event tagging — [laughter], [applause], [music] and other non-speech sounds are captured with accurate timestamps.
URL transcription — transcribe YouTube, TikTok, or any hosted video/audio URL directly. ElevenLabs downloads the media server-side; nothing is saved locally.
Chunked processing — long files are automatically split, transcribed in parallel, and merged back with correct timestamps. Crash recovery via state files means you never re-upload a completed chunk.
API key rotation — add multiple ElevenLabs keys, each tracked with per-key usage stats. When one key hits its quota, the next one picks up automatically.
SOCKS5 proxy — native SOCKS5 support for regions where ElevenLabs is not directly reachable.
FFmpeg progress — real-time percentage display during audio extraction from video files.
Intelligent duration clamping — word-level timestamp correction prevents subtitles from displaying too long (common STT artifact), reducing >7s subtitle occurrences by ~50%.
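
The chunk merge mentioned above comes down to shifting each chunk's word timestamps by that chunk's start offset in the original file. The following is a minimal Go sketch of that idea; the `Word` and `Chunk` types and the `MergeChunks` function are hypothetical names chosen for illustration, not Elevate's actual internals.

```go
package main

import "sort"

// Word is a hypothetical word-level result. The field names are assumptions,
// not the ElevenLabs response schema.
type Word struct {
	Text  string
	Start float64 // seconds, relative to the start of its chunk
	End   float64 // seconds, relative to the start of its chunk
}

// Chunk pairs a chunk's position in the original file with its transcript.
type Chunk struct {
	Offset float64 // seconds from the start of the original file
	Words  []Word
}

// MergeChunks orders chunks by offset (parallel uploads may finish in any
// order), shifts every word by its chunk offset, and concatenates the result
// so that all timestamps refer to the original file.
func MergeChunks(chunks []Chunk) []Word {
	ordered := make([]Chunk, len(chunks))
	copy(ordered, chunks)
	sort.Slice(ordered, func(i, j int) bool {
		return ordered[i].Offset < ordered[j].Offset
	})

	var merged []Word
	for _, c := range ordered {
		for _, w := range c.Words {
			merged = append(merged, Word{
				Text:  w.Text,
				Start: w.Start + c.Offset,
				End:   w.End + c.Offset,
			})
		}
	}
	return merged
}
```

Sorting a copy keeps the caller's slice untouched, which matters if the same chunk list is also written to a state file for crash recovery.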

Quick Start

Prerequisites

Go 1.21+ (to build from source)
FFmpeg (for video files)
An ElevenLabs API key — sign up free (4.5 hours STT/month, no credit card)

Install

git clone <repo-url> && cd elevate
go build -o elevate .

Add your API key

./elevate keys add sk-your-elevenlabs-key-here

You can add multiple keys for automatic rotation:

./elevate keys add sk-key-one
./elevate keys add sk-key-two
./elevate keys import keys.txt   # one key per line

Transcribe

# Local video file (auto-extracts audio, splits if >8min, generates SRT)
./elevate transcribe movie.mkv

# YouTube URL (zero download — ElevenLabs fetches it server-side)
./elevate transcribe --url "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Batch process a directory
./elevate batch /path/to/videos/

# Specify language for better accuracy
./elevate transcribe --language zh movie_mandarin.mkv

# Choose a specific audio stream (0-based)
./elevate transcribe --stream 1 movie_with_multiple_audio.mkv

Output

For movie.mkv, Elevate produces:

| File | Content |
| --- | --- |
| `movie.srt` | Production-ready subtitles |
| `movie.transcript.json` | Raw API response with word-level timestamps |
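
For readers unfamiliar with the SRT format, each cue is a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line. The sketch below shows how one cue could be rendered in Go; `srtTimestamp` and `formatCue` are hypothetical helper names, not functions taken from Elevate's source.

```go
package main

import (
	"fmt"
	"time"
)

// srtTimestamp renders a duration as HH:MM:SS,mmm, the timestamp form used
// in SRT files (note the comma before the milliseconds).
func srtTimestamp(d time.Duration) string {
	ms := d.Milliseconds()
	h := ms / 3600000
	m := ms % 3600000 / 60000
	s := ms % 60000 / 1000
	return fmt.Sprintf("%02d:%02d:%02d,%03d", h, m, s, ms%1000)
}

// formatCue renders one numbered SRT cue, including the trailing blank line
// that separates it from the next cue.
func formatCue(index int, start, end time.Duration, text string) string {
	return fmt.Sprintf("%d\n%s --> %s\n%s\n\n",
		index, srtTimestamp(start), srtTimestamp(end), text)
}
```

For example, `srtTimestamp(3723*time.Second + 45*time.Millisecond)` returns `01:02:03,045`.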

Configuration

On first run, Elevate creates ~/.config/elevate/config.toml with sensible defaults:

[api]
model = "scribe_v2"
language = ""              # empty = auto-detect
diarize = true
tag_audio_events = true
timestamps_granularity = "word"

[proxy]
url = ""                   # e.g. "socks5://127.0.0.1:2080"

[subtitle]
min_duration = 0.8
max_duration = 7.0
cjk_cps = 9.0             # characters per second (CJK)
latin_cps = 21.0           # characters per second (Latin)
cjk_chars_per_line = 18
latin_chars_per_line = 42
clamp_factor = 2.5         # word duration clamping multiplier
max_word_duration = 3.0    # absolute max word duration (seconds)

[processing]
split_threshold_min = 8    # split files longer than N minutes
max_concurrent_uploads = 4
max_retries = 3

[output]
save_transcript_json = true
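
The `clamp_factor` and `max_word_duration` settings suggest a two-part limit on word length. The README does not say what the multiplier is applied to, so the Go sketch below assumes it scales the median word duration of the transcript; that choice, along with the `TimedWord` type and the `ClampWordDurations` function, is an illustrative assumption rather than Elevate's actual algorithm.

```go
package main

import (
	"math"
	"sort"
)

// TimedWord is a hypothetical word with start and end times in seconds.
type TimedWord struct {
	Text       string
	Start, End float64
}

// median returns the median of xs without modifying it, or 0 for empty input.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

// ClampWordDurations returns a copy of words in which any word lasting longer
// than min(maxWordDuration, clampFactor * median duration) has its End pulled
// earlier. Using the median as the baseline is an assumption of this sketch.
func ClampWordDurations(words []TimedWord, clampFactor, maxWordDuration float64) []TimedWord {
	durations := make([]float64, len(words))
	for i, w := range words {
		durations[i] = w.End - w.Start
	}
	limit := math.Min(maxWordDuration, clampFactor*median(durations))

	out := make([]TimedWord, len(words))
	for i, w := range words {
		if w.End-w.Start > limit {
			w.End = w.Start + limit
		}
		out[i] = w
	}
	return out
}
```

With the defaults above (`clamp_factor = 2.5`, `max_word_duration = 3.0`), a word that the API reports as lasting 5 seconds among words that typically last 0.5 seconds would be shortened to 1.25 seconds under this assumption.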

Key Management

elevate keys list           # show all keys with usage stats
elevate keys add <key>      # add and verify a key
elevate keys remove <key>   # remove a key
elevate keys import <file>  # bulk import from file

Keys are stored in ~/.config/elevate/keys.json with per-key usage tracking (request count, total audio seconds, last used timestamp). Keys rotate automatically — when one hits its quota, the next active key takes over.
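
The rotation behavior described above can be pictured as a small manager that hands out the current key and skips any key marked as exhausted. The Go sketch below is one way to express that; the `Key` struct, the `Exhausted` flag, and the `Rotator` type are assumptions made for illustration and do not reflect the real `keys.json` schema. It tracks only request count and audio seconds, leaving out the last-used timestamp for brevity.

```go
package main

import (
	"errors"
	"sync"
)

// Key holds a secret plus a subset of the usage stats named in the README.
// The layout is hypothetical.
type Key struct {
	Secret       string
	Requests     int
	AudioSeconds float64
	Exhausted    bool
}

// ErrNoActiveKey is returned when every key has been marked exhausted.
var ErrNoActiveKey = errors.New("no active API key left")

// Rotator hands out keys and is safe for use by concurrent uploads.
type Rotator struct {
	mu      sync.Mutex
	keys    []*Key
	current int
}

// NewRotator builds a Rotator from secrets in priority order.
func NewRotator(secrets ...string) *Rotator {
	r := &Rotator{}
	for _, s := range secrets {
		r.keys = append(r.keys, &Key{Secret: s})
	}
	return r
}

// Active returns the current key, advancing past exhausted keys.
func (r *Rotator) Active() (*Key, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for i := 0; i < len(r.keys); i++ {
		idx := (r.current + i) % len(r.keys)
		if !r.keys[idx].Exhausted {
			r.current = idx
			return r.keys[idx], nil
		}
	}
	return nil, ErrNoActiveKey
}

// RecordUsage updates a key's stats after a successful request.
func (r *Rotator) RecordUsage(k *Key, audioSeconds float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	k.Requests++
	k.AudioSeconds += audioSeconds
}

// MarkExhausted flags a key (for example after a quota error) so that
// Active skips it from then on.
func (r *Rotator) MarkExhausted(k *Key) {
	r.mu.Lock()
	defer r.mu.Unlock()
	k.Exhausted = true
}
```

A caller would request `Active()` before each upload, call `MarkExhausted` when the API reports a quota error, and retry with the next key.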

Architecture

cmd/             CLI commands (cobra)
internal/
  api/           ElevenLabs HTTP client, retry logic, error classification
  config/        TOML config with auto-creation
  engine/        Orchestrator: probe → extract → split → upload → merge → generate
  keys/          Multi-key manager with round-robin rotation and usage tracking
  media/         FFmpeg wrapper: probe, extract, split, transcode, progress
  proxy/         SOCKS5 dialer integration
  subtitle/      Pipeline: word splitting → duration clamping → sentence merging → SRT
  util/          CJK detection, time formatting
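
As a rough illustration of how CJK detection in `util/` could feed the subtitle pipeline, the Go sketch below classifies text by script and derives a display duration from the per-script characters-per-second targets. The function names, the "more than half of the letters" rule, and the way CPS is applied are assumptions of this sketch, not a description of Elevate's code.

```go
package main

import (
	"math"
	"unicode"
	"unicode/utf8"
)

// isCJK reports whether r belongs to the Han, Hiragana, Katakana, or Hangul
// scripts.
func isCJK(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul)
}

// isMostlyCJK reports whether more than half of the letters in s are CJK.
// Punctuation, digits, and spaces are ignored.
func isMostlyCJK(s string) bool {
	var cjk, letters int
	for _, r := range s {
		if !unicode.IsLetter(r) {
			continue
		}
		letters++
		if isCJK(r) {
			cjk++
		}
	}
	return letters > 0 && cjk*2 > letters
}

// displayDuration estimates how long a subtitle should stay on screen:
// character count divided by the CPS target for its script, clamped to
// the range [minDur, maxDur] in seconds.
func displayDuration(text string, cjkCPS, latinCPS, minDur, maxDur float64) float64 {
	cps := latinCPS
	if isMostlyCJK(text) {
		cps = cjkCPS
	}
	d := float64(utf8.RuneCountInString(text)) / cps
	return math.Max(minDur, math.Min(maxDur, d))
}
```

Counting runes rather than bytes matters here, since a single CJK character occupies three bytes in UTF-8.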

Tech Stack

| Component | Technology |
| --- | --- |
| Language | Go |
| STT API | ElevenLabs Scribe v1/v2 |
| CLI | Cobra |
| Config | TOML |
| Media | FFmpeg/ffprobe |
| Proxy | `golang.org/x/net/proxy` (SOCKS5) |

Known Limitations

ElevenLabs merged token bug — Scribe occasionally merges sentence-ending punctuation with the next word (e.g., ？Harry。). Affects ~10 tokens per 2-hour film, primarily with English names in CJK speech. Tracked upstream at elevenlabs-python#607.
Non-deterministic results — the STT model may return slightly different transcripts for the same audio across API calls. Support for the seed parameter is planned to make results reproducible.
URL mode skips chunking — --url sends the full URL to ElevenLabs; local chunking does not apply. Files up to 10 hours / 3 GB are supported by the API.
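
To make the merged token bug concrete, the Go sketch below shows one possible post-processing workaround: sentence-ending punctuation found at the start of a token is moved onto the previous token. The README does not say that Elevate applies such a fix, so treat `RepairMergedTokens` and the chosen punctuation set as hypothetical. The sketch works on text only and leaves timestamps untouched.

```go
package main

import "strings"

// sentenceEnders lists the sentence-ending punctuation this sketch looks
// for, in full-width (CJK) and ASCII forms. The set is an assumption.
const sentenceEnders = "。？！.?!"

// splitLeadingPunct separates sentence-ending punctuation at the start of a
// token from the rest of the token.
func splitLeadingPunct(token string) (punct, rest string) {
	rest = strings.TrimLeft(token, sentenceEnders)
	return token[:len(token)-len(rest)], rest
}

// RepairMergedTokens moves leading sentence-ending punctuation onto the
// previous token, so that a token such as "？Harry。" becomes "Harry。" and
// the "？" closes the preceding token instead.
func RepairMergedTokens(tokens []string) []string {
	out := make([]string, 0, len(tokens))
	for _, t := range tokens {
		punct, rest := splitLeadingPunct(t)
		if punct != "" && len(out) > 0 {
			out[len(out)-1] += punct
			if rest != "" {
				out = append(out, rest)
			}
			continue
		}
		out = append(out, t)
	}
	return out
}
```

Given the tokens `是谁`, `？Harry。`, `来了`, this returns `是谁？`, `Harry。`, `来了`.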

License

GPL3
