Ajatt-Tools/elevate

Elevate

High-accuracy audio/video transcription and subtitle generation powered by ElevenLabs Scribe.

Elevate wraps the ElevenLabs Speech-to-Text API into a battle-tested CLI pipeline that handles everything from 30-second trailers to 3-hour movies. Drop in a file or a YouTube URL, get production-ready subtitles.

Why Elevate

  • State-of-the-art accuracy: ElevenLabs Scribe v2 delivers the lowest word error rate across 90+ languages, outperforming Whisper, Deepgram, and AssemblyAI on most benchmarks.
  • CJK-aware subtitle pipeline: purpose-built for Chinese, Japanese, and Korean. Sentence splitting respects CJK punctuation, line breaking uses character-width logic, and reading speed targets are tuned per script (CJK CPS vs Latin CPS).
  • Speaker diarization: up to 32 speakers, automatically labeled in the transcript.
  • Audio event tagging: [laughter], [applause], [music] and other non-speech sounds are captured with accurate timestamps.
  • URL transcription: transcribe YouTube, TikTok, or any hosted video/audio URL directly. ElevenLabs downloads the media server-side; nothing is saved locally.
  • Chunked processing: long files are automatically split, transcribed in parallel, and merged back with correct timestamps. Crash recovery via state files means you never re-upload a completed chunk.
  • API key rotation: add multiple ElevenLabs keys, each tracked with per-key usage stats. When one key hits its quota, the next one picks up automatically.
  • SOCKS5 proxy: native SOCKS5 support for regions where ElevenLabs is not directly reachable.
  • FFmpeg progress: real-time percentage display during audio extraction from video files.
  • Intelligent duration clamping: word-level timestamp correction prevents subtitles from displaying too long (a common STT artifact), reducing >7s subtitle occurrences by ~50%.
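
To illustrate the "merged back with correct timestamps" step of chunked processing, here is a minimal sketch. The Word and Chunk types and the MergeChunks function are hypothetical names chosen for this example, not Elevate's actual internals; the sketch assumes each chunk's word timestamps are relative to the chunk start and that chunks arrive in ascending order.

```go
package main

import "fmt"

// Word is a hypothetical word-level timestamp, in seconds relative to its chunk.
type Word struct {
	Text  string
	Start float64
	End   float64
}

// Chunk pairs a chunk's start offset in the original file with its words.
type Chunk struct {
	Offset float64 // seconds from the start of the original file
	Words  []Word
}

// MergeChunks shifts every word by its chunk's offset and concatenates the
// chunks in the order given (assumed ascending by Offset).
func MergeChunks(chunks []Chunk) []Word {
	var merged []Word
	for _, c := range chunks {
		for _, w := range c.Words {
			merged = append(merged, Word{
				Text:  w.Text,
				Start: w.Start + c.Offset,
				End:   w.End + c.Offset,
			})
		}
	}
	return merged
}

func main() {
	chunks := []Chunk{
		{Offset: 0, Words: []Word{{"Hello", 0.5, 1.0}}},
		{Offset: 480, Words: []Word{{"again", 1.5, 2.0}}},
	}
	for _, w := range MergeChunks(chunks) {
		fmt.Printf("%-6s %7.2f %7.2f\n", w.Text, w.Start, w.End)
	}
}
```

Because the offset is applied per chunk, chunks can be transcribed in any order and in parallel; only the final concatenation needs them sorted.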

Quick Start

Prerequisites

  • Go 1.21+ (to build from source)
  • FFmpeg (for video files)
  • An ElevenLabs API key: sign up free (4.5 hours of STT per month, no credit card required)

Install

git clone <repo-url> && cd elevate
go build -o elevate .

Add your API key

./elevate keys add sk-your-elevenlabs-key-here

You can add multiple keys for automatic rotation:

./elevate keys add sk-key-one
./elevate keys add sk-key-two
./elevate keys import keys.txt   # one key per line

Transcribe

# Local video file (auto-extracts audio, splits if >8min, generates SRT)
./elevate transcribe movie.mkv

# YouTube URL (no local download; ElevenLabs fetches it server-side)
./elevate transcribe --url "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Batch process a directory
./elevate batch /path/to/videos/

# Specify language for better accuracy
./elevate transcribe --language zh movie_mandarin.mkv

# Choose a specific audio stream (0-based)
./elevate transcribe --stream 1 movie_with_multiple_audio.mkv
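
The "splits if >8min" behavior can be pictured with a small planning function. This is only a sketch: the README states the split threshold but not the chunk length, so using a fixed chunk length (here equal to the threshold) is an assumption, and Span and planChunks are hypothetical names.

```go
package main

import "fmt"

// Span is a hypothetical [Start, End) range in seconds within the source file.
type Span struct{ Start, End float64 }

// planChunks returns a single span for media at or under the threshold,
// otherwise consecutive spans of at most chunkSec seconds each.
// ASSUMPTION: fixed-length chunks; the real splitter may cut differently.
func planChunks(durationSec, thresholdSec, chunkSec float64) []Span {
	if durationSec <= thresholdSec || chunkSec <= 0 {
		return []Span{{0, durationSec}}
	}
	var spans []Span
	for start := 0.0; start < durationSec; start += chunkSec {
		end := start + chunkSec
		if end > durationSec {
			end = durationSec
		}
		spans = append(spans, Span{start, end})
	}
	return spans
}

func main() {
	// A 20-minute file with an 8-minute threshold and 8-minute chunks.
	for i, s := range planChunks(1200, 480, 480) {
		fmt.Printf("chunk %d: %.0fs to %.0fs\n", i, s.Start, s.End)
	}
}
```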

Output

For movie.mkv, Elevate produces:

File                    Content
movie.srt               Production-ready subtitles
movie.transcript.json   Raw API response with word-level timestamps
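
For readers unfamiliar with the SRT format, the sketch below shows how a single cue is laid out: an index line, a "start --> end" line with HH:MM:SS,mmm timestamps, the text, and a blank separator line. The function names srtTimestamp and srtCue are hypothetical and not taken from Elevate's source.

```go
package main

import (
	"fmt"
	"math"
)

// srtTimestamp renders seconds as HH:MM:SS,mmm, the SRT timestamp format.
func srtTimestamp(seconds float64) string {
	ms := int64(math.Round(seconds * 1000))
	if ms < 0 {
		ms = 0
	}
	h := ms / 3600000
	m := ms % 3600000 / 60000
	s := ms % 60000 / 1000
	return fmt.Sprintf("%02d:%02d:%02d,%03d", h, m, s, ms%1000)
}

// srtCue renders one numbered cue followed by the blank line that separates cues.
func srtCue(index int, start, end float64, text string) string {
	return fmt.Sprintf("%d\n%s --> %s\n%s\n\n",
		index, srtTimestamp(start), srtTimestamp(end), text)
}

func main() {
	fmt.Print(srtCue(1, 0.8, 3.25, "Hello, world."))
	fmt.Print(srtCue(2, 3661.5, 3664, "[laughter]"))
}
```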

Configuration

On first run, Elevate creates ~/.config/elevate/config.toml with sensible defaults:

[api]
model = "scribe_v2"
language = ""              # empty = auto-detect
diarize = true
tag_audio_events = true
timestamps_granularity = "word"

[proxy]
url = ""                   # e.g. "socks5://127.0.0.1:2080"

[subtitle]
min_duration = 0.8
max_duration = 7.0
cjk_cps = 9.0             # characters per second (CJK)
latin_cps = 21.0           # characters per second (Latin)
cjk_chars_per_line = 18
latin_chars_per_line = 42
clamp_factor = 2.5         # word duration clamping multiplier
max_word_duration = 3.0    # absolute max word duration (seconds)

[processing]
split_threshold_min = 8    # split files longer than N minutes
max_concurrent_uploads = 4
max_retries = 3

[output]
save_transcript_json = true
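
How clamp_factor and max_word_duration might interact is sketched below. The README does not say what clamp_factor multiplies, so this example assumes, purely for illustration, that the multiplier applies to an expected duration derived from the word's character count and the configured CPS. Word and clampWord are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// Word is a hypothetical word-level timestamp in seconds.
type Word struct {
	Text       string
	Start, End float64
}

// clampWord shortens a word whose duration exceeds the allowed limit.
// ASSUMPTION: limit = min(maxWordDuration, clampFactor * runes / cps).
// The actual rule used by Elevate is not documented in this README.
func clampWord(w Word, cps, clampFactor, maxWordDuration float64) Word {
	expected := float64(utf8.RuneCountInString(w.Text)) / cps
	limit := math.Min(maxWordDuration, clampFactor*expected)
	if w.End-w.Start > limit {
		w.End = w.Start + limit
	}
	return w
}

func main() {
	// A 5-letter word reported as lasting 6 seconds, a typical STT artifact
	// when the word is followed by a long silence.
	w := clampWord(Word{"hello", 10, 16}, 21.0, 2.5, 3.0)
	fmt.Printf("%s: %.3f to %.3f\n", w.Text, w.Start, w.End)
}
```

Clamping at the word level before sentences are assembled keeps a trailing silence from stretching the whole subtitle past max_duration.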

Key Management

elevate keys list           # show all keys with usage stats
elevate keys add <key>      # add and verify a key
elevate keys remove <key>   # remove a key
elevate keys import <file>  # bulk import from file

Keys are stored in ~/.config/elevate/keys.json with per-key usage tracking (request count, total audio seconds, last used timestamp). Keys rotate automatically: when one hits its quota, the next active key takes over.
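
The failover behavior described above can be sketched as follows. This is an illustrative in-memory model, not the code in internal/keys: the Key, Rotator, and ErrNoActiveKey names are hypothetical, persistence to keys.json is omitted, and how a quota error is detected is left to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// Key is a hypothetical record mirroring some of the usage fields listed above.
type Key struct {
	Value        string
	Requests     int
	AudioSeconds float64
	Exhausted    bool
}

// ErrNoActiveKey is returned when every key has hit its quota.
var ErrNoActiveKey = errors.New("no active API key left")

// Rotator hands out the current key and moves on when it is exhausted.
type Rotator struct {
	keys []Key
	cur  int
}

// NewRotator builds a rotator from raw key strings.
func NewRotator(values ...string) *Rotator {
	r := &Rotator{}
	for _, v := range values {
		r.keys = append(r.keys, Key{Value: v})
	}
	return r
}

// Current returns the first non-exhausted key, starting at the cursor.
func (r *Rotator) Current() (*Key, error) {
	for i := 0; i < len(r.keys); i++ {
		idx := (r.cur + i) % len(r.keys)
		if !r.keys[idx].Exhausted {
			r.cur = idx
			return &r.keys[idx], nil
		}
	}
	return nil, ErrNoActiveKey
}

// Record adds one request and its audio length to the current key's stats.
func (r *Rotator) Record(audioSeconds float64) {
	if len(r.keys) == 0 {
		return
	}
	r.keys[r.cur].Requests++
	r.keys[r.cur].AudioSeconds += audioSeconds
}

// MarkExhausted flags the current key so the next Current call skips it.
func (r *Rotator) MarkExhausted() {
	if len(r.keys) > 0 {
		r.keys[r.cur].Exhausted = true
	}
}

func main() {
	r := NewRotator("sk-key-one", "sk-key-two")
	k, _ := r.Current()
	fmt.Println("using", k.Value)
	r.Record(480)
	r.MarkExhausted() // e.g. after the API reports the quota is used up
	k, _ = r.Current()
	fmt.Println("using", k.Value)
}
```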

Architecture

cmd/             CLI commands (cobra)
internal/
  api/           ElevenLabs HTTP client, retry logic, error classification
  config/        TOML config with auto-creation
  engine/        Orchestrator: probe → extract → split → upload → merge → generate
  keys/          Multi-key manager with round-robin rotation and usage tracking
  media/         FFmpeg wrapper: probe, extract, split, transcode, progress
  proxy/         SOCKS5 dialer integration
  subtitle/      Pipeline: word splitting → duration clamping → sentence merging → SRT
  util/          CJK detection, time formatting
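
As an illustration of the CJK detection and per-script reading speed that util/ and subtitle/ deal with, the sketch below classifies runes with Go's standard unicode tables and estimates a reading time from the two CPS settings. The names isCJK and readingDuration are hypothetical, and treating every non-CJK, non-space rune as "Latin" is a simplifying assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// isCJK reports whether r belongs to a Chinese, Japanese, or Korean script.
func isCJK(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul)
}

// readingDuration estimates the seconds needed to read text, counting CJK
// runes at cjkCPS and all other non-space runes at latinCPS.
// ASSUMPTION: punctuation (including CJK punctuation such as "。") is counted
// at the Latin rate because it is not part of the scripts checked by isCJK.
func readingDuration(text string, cjkCPS, latinCPS float64) float64 {
	var cjk, other int
	for _, r := range text {
		switch {
		case unicode.IsSpace(r):
			// spaces are not counted
		case isCJK(r):
			cjk++
		default:
			other++
		}
	}
	return float64(cjk)/cjkCPS + float64(other)/latinCPS
}

func main() {
	fmt.Printf("%.2fs\n", readingDuration("你好世界", 9.0, 21.0))
	fmt.Printf("%.2fs\n", readingDuration("Hello world", 9.0, 21.0))
}
```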

Tech Stack

Component   Technology
Language    Go
STT API     ElevenLabs Scribe v1/v2
CLI         Cobra
Config      TOML
Media       FFmpeg/ffprobe
Proxy       golang.org/x/net/proxy (SOCKS5)

Known Limitations

  • ElevenLabs merged token bug: Scribe occasionally merges sentence-ending punctuation with the next word (e.g., ?Harry。). Affects ~10 tokens per 2-hour film, primarily with English names in CJK speech. Tracked upstream at elevenlabs-python#607.
  • Non-deterministic results: the STT model may return slightly different transcripts for the same audio across API calls. Support for the seed parameter is planned to improve reproducibility.
  • URL mode skips chunking: --url sends the full URL to ElevenLabs; local chunking does not apply. Files up to 10 hours / 3 GB are supported by the API.
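
For anyone post-processing movie.transcript.json themselves, one possible way to cope with the merged token bug is to detach leading sentence-ending punctuation from a token. This is a hypothetical workaround sketch, not a description of what Elevate does; splitLeadingPunct and the sentenceEnders set are names and choices made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// sentenceEnders lists the punctuation treated as sentence-ending here.
// ASSUMPTION: this set is illustrative; extend it for your languages.
const sentenceEnders = "。？！?!"

// splitLeadingPunct separates sentence-ending punctuation that was merged
// onto the front of a token from the word that follows it.
func splitLeadingPunct(token string) (punct, rest string) {
	rest = strings.TrimLeft(token, sentenceEnders)
	return token[:len(token)-len(rest)], rest
}

func main() {
	for _, tok := range []string{"?Harry。", "？Harry", "Harry"} {
		punct, rest := splitLeadingPunct(tok)
		fmt.Printf("%q -> punct=%q rest=%q\n", tok, punct, rest)
	}
}
```

The detached punctuation would then be appended to the previous token so that sentence splitting sees the boundary in the right place.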

License

GPL3
