Brian de Heus 7b200056bb type type type		2025-10-10 01:50:43 +09:00

PowerSubs

AI-powered subtitle translation tool that preserves formatting and uses context for accurate translations.

Features

Context-Aware Translation: Uses surrounding subtitle lines to improve translation accuracy
Format Preservation: Maintains all SRT formatting including tags (<i>, <b>, <font>), special characters, and line breaks
Multi-threaded Processing: Translates lines in parallel using a configurable number of worker threads
Flexible Model Support: Works with any OpenAI-compatible API (OpenAI, OpenRouter, Anthropic, local models, etc.)

Installation

Requires Python 3.10 or higher.

# Install dependencies with uv
uv sync

Configuration

PowerSubs works with any OpenAI SDK-compatible API endpoint. Create a .env file in the project root:

LLM_API_URL=https://api.openai.com/v1
LLM_API_KEY=your_api_key_here
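As a rough illustration of how these two variables might be consumed, the sketch below checks that both are present before any request is made. The function name `load_llm_settings` is hypothetical and not taken from the PowerSubs source; it assumes the `.env` file has already been loaded into the environment (for example with python-dotenv's `load_dotenv()`).

```python
import os


def load_llm_settings(env=os.environ):
    """Return (api_url, api_key), raising if either variable is missing.

    Hypothetical helper: assumes the .env values are already in `env`.
    """
    missing = [name for name in ("LLM_API_URL", "LLM_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    # Strip a trailing slash so ".../v1" and ".../v1/" are treated the same.
    return env["LLM_API_URL"].rstrip("/"), env["LLM_API_KEY"]
```

Failing early with a clear message is usually friendlier than letting the first API call fail with an authentication error.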

Compatible providers:

OpenAI: LLM_API_URL=https://api.openai.com/v1
OpenRouter: LLM_API_URL=https://openrouter.ai/api/v1
Anthropic (via OpenAI SDK): LLM_API_URL=https://api.anthropic.com/v1
Local models: Point to your local OpenAI-compatible API endpoint (e.g., Ollama, vLLM, llama.cpp)

The API must support structured outputs with JSON schema (OpenAI's response_format parameter).
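To make the requirement concrete, here is a minimal sketch of what a JSON-schema `response_format` payload can look like, plus a standard-library check of the reply. The schema with a single `translation` field and the helper `parse_translation` are assumptions for illustration; the schema PowerSubs actually sends (built from its Pydantic models) may differ.

```python
import json

# Hypothetical schema: one translated string per request.
TRANSLATION_SCHEMA = {
    "type": "object",
    "properties": {"translation": {"type": "string"}},
    "required": ["translation"],
    "additionalProperties": False,
}

# Shape of the response_format argument for JSON-schema structured outputs.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "translation",
        "strict": True,
        "schema": TRANSLATION_SCHEMA,
    },
}


def parse_translation(raw: str) -> str:
    """Parse the model's JSON reply and return the translated text."""
    data = json.loads(raw)  # raises ValueError on invalid JSON
    if (
        not isinstance(data, dict)
        or set(data) != {"translation"}
        or not isinstance(data["translation"], str)
    ):
        raise ValueError("Response does not match the expected schema")
    return data["translation"]
```

A provider that ignores `response_format` may return free-form text, in which case a check like `parse_translation` would reject the reply.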

Usage

python powersubs.py \
  --input examples/input.srt \
  --output output.srt \
  --language ja \
  --model google/gemini-2.5-flash \
  --context_length 10 \
  --threads 8

Arguments

| Argument | Required | Default | Description |
|---|---|---|---|
| `--input` | Yes | - | Path to input SRT file |
| `--output` | Yes | - | Path to output SRT file |
| `--language` | Yes | - | Target language (ISO 639-1 code, e.g., `ja`, `es`, `fr`) |
| `--model` | No | `google/gemini-2.5-flash` | Model to use for translation |
| `--context_length` | No | `10` | Number of surrounding lines to use as context |
| `--threads` | No | `8` | Number of concurrent translation threads |
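For reference, the arguments listed above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the table, not a copy of `lib/args.py`; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="powersubs.py")
    parser.add_argument("--input", required=True, help="Path to input SRT file")
    parser.add_argument("--output", required=True, help="Path to output SRT file")
    parser.add_argument("--language", required=True, help="Target language (ISO 639-1)")
    parser.add_argument("--model", default="google/gemini-2.5-flash")
    parser.add_argument("--context_length", type=int, default=10)
    parser.add_argument("--threads", type=int, default=8)
    return parser
```

Calling `build_parser().parse_args()` without the three required flags exits with a usage message, which matches the Required column.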

Examples

# Translate English subtitles to Japanese
python powersubs.py \
  --input movie.srt \
  --output movie_ja.srt \
  --language ja

# Translate to Spanish with more context
python powersubs.py \
  --input show.srt \
  --output show_es.srt \
  --language es \
  --context_length 15 \
  --threads 16

How It Works

Parse: Reads and parses the input SRT file
Context Extraction: For each subtitle line, extracts N surrounding lines for context
Translation: Sends each line to the LLM with context and strict formatting rules
Validation: Checks each LLM response against a Pydantic model so that malformed output is rejected
Compose: Writes translated subtitles to output file while preserving timing and formatting
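The context extraction and concurrent translation steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: it assumes "N surrounding lines" means up to N lines before and N lines after, and `translate` is a hypothetical callable standing in for the LLM request.

```python
from concurrent.futures import ThreadPoolExecutor


def context_window(lines, index, n):
    """Return up to n lines before and up to n lines after lines[index]."""
    before = lines[max(0, index - n):index]
    after = lines[index + 1:index + 1 + n]
    return before, after


def translate_all(lines, translate, n=10, threads=8):
    """Translate every line in parallel, keeping the original order.

    `translate(line, before, after)` is a placeholder for the LLM call.
    """
    def job(index):
        before, after = context_window(lines, index, n)
        return translate(lines[index], before, after)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, regardless of completion order.
        return list(pool.map(job, range(len(lines))))
```

Because `pool.map` preserves input order, the translated text can be paired with the original timings by position when the output file is composed.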

Translation Hints

You can provide custom translation hints for each language by creating a file at hints/{language}.md, where {language} is the ISO 639-1 code passed to --language.

For example, to add hints for Japanese translations, create hints/ja.md:

- Use informal language for casual conversations
- Translate "sensei" as "先生" not "teacher"
- Keep character names in English
- Use polite forms for formal dialogue

These hints will be automatically included in the translation prompt to guide the model's output.
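One plausible way to implement this lookup is shown below. The helper names `load_hints` and `build_prompt`, and the "Translation hints:" prompt wording, are hypothetical; the actual prompt text used by PowerSubs is not documented here.

```python
from pathlib import Path


def load_hints(language, hints_dir="hints"):
    """Return the contents of hints/{language}.md, or "" if no file exists."""
    path = Path(hints_dir) / f"{language}.md"
    if not path.is_file():
        return ""
    return path.read_text(encoding="utf-8").strip()


def build_prompt(base_prompt, language, hints_dir="hints"):
    """Append the language hints to the prompt when a hint file is present."""
    hints = load_hints(language, hints_dir)
    if not hints:
        return base_prompt
    return f"{base_prompt}\n\nTranslation hints:\n{hints}"
```

Returning an empty string for a missing file keeps hints optional, so languages without a hint file use the base prompt unchanged.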

Translation Quality

PowerSubs uses context-aware translation to improve quality:

Context Window: Each line is translated with awareness of surrounding dialogue
Format Rules: Explicit instructions ensure markup tags and special characters are preserved
Structured Output: JSON schema validation prevents malformed responses
Custom Hints: Optional language-specific translation guidance via hint files
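If you want to verify format preservation yourself, a simple post-check can compare the markup in the source and translated lines. The sketch below is not part of PowerSubs as far as this README states; `tags_preserved` is a hypothetical helper that only checks tag counts and line-break counts, not tag order or nesting.

```python
import re
from collections import Counter

# Matches opening and closing tags such as <i>, </b>, <font color="red">.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def tags_preserved(original, translated):
    """True if both strings carry the same tags and the same number of line breaks."""
    same_tags = Counter(TAG_RE.findall(original)) == Counter(TAG_RE.findall(translated))
    same_breaks = original.count("\n") == translated.count("\n")
    return same_tags and same_breaks
```

A line that fails this check could be retried or flagged for manual review.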

Development

# Install dependencies (including development dependencies)
uv sync

# Format code
uv run black .
uv run isort .

Project Structure

powersubs/
├── powersubs.py          # Main script
├── lib/
│   ├── args.py          # CLI argument parsing
│   └── llm.py           # LLM interaction with structured output
├── examples/
│   └── input.srt        # Example SRT file
├── hints/               # Optional translation hints by language
│   └── {language}.md    # e.g., ja.md, es.md, fr.md
└── pyproject.toml       # Project configuration

Dependencies

openai (≥2.2.0): API client for LLM interactions
pydantic (≥2.12.0): Data validation and structured output
python-dotenv (≥1.1.1): Environment variable management
srt (≥3.5.3): SRT subtitle parsing and composition

License

MIT