PowerSubs

AI-powered subtitle translation tool that preserves formatting and uses context for accurate translations.

Features

  • Context-Aware Translation: Uses surrounding subtitle lines to improve translation accuracy
  • Format Preservation: Maintains all SRT formatting, including tags (<i>, <b>, <font>), special characters, and line breaks
  • Multi-threaded Processing: Fast translation using concurrent workers
  • Flexible Model Support: Works with any OpenAI-compatible API (OpenAI, Anthropic, OpenRouter, or local servers such as Ollama, vLLM, and llama.cpp)

Installation

Requires Python 3.10 or higher.

# Install dependencies with uv
uv sync

Configuration

PowerSubs works with any OpenAI SDK-compatible API endpoint. Create a .env file in the project root:

LLM_API_URL=https://api.openai.com/v1
LLM_API_KEY=your_api_key_here

Compatible providers:

  • OpenAI: LLM_API_URL=https://api.openai.com/v1
  • OpenRouter: LLM_API_URL=https://openrouter.ai/api/v1
  • Anthropic (via OpenAI SDK): LLM_API_URL=https://api.anthropic.com/v1
  • Local models: Point to your local OpenAI-compatible API endpoint (e.g., Ollama, vLLM, llama.cpp)

The API must support structured outputs with JSON schema (OpenAI's response_format parameter).
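The configuration above could be loaded roughly as follows. This is a minimal sketch that assumes python-dotenv's `load_dotenv()` and the `openai` package's `OpenAI(base_url=..., api_key=...)` constructor. The names `LLMConfig`, `load_llm_config`, and `make_client` are hypothetical and are not taken from PowerSubs' `lib/llm.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical container for the two settings read from .env."""

    api_url: str
    api_key: str


def load_llm_config(env_file: str = ".env") -> LLMConfig:
    """Read LLM_API_URL and LLM_API_KEY, loading .env first if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv

        # By default, load_dotenv does not override variables that are already set.
        load_dotenv(env_file)
    except ImportError:
        pass  # Fall back to the process environment only.

    missing = [name for name in ("LLM_API_URL", "LLM_API_KEY") if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return LLMConfig(api_url=os.environ["LLM_API_URL"], api_key=os.environ["LLM_API_KEY"])


def make_client(config: LLMConfig):
    """Create an OpenAI SDK client pointed at any OpenAI-compatible endpoint."""
    from openai import OpenAI  # imported lazily so config loading works without the SDK

    return OpenAI(base_url=config.api_url, api_key=config.api_key)
```

Switching providers then only requires changing `LLM_API_URL` in `.env`. The client code stays the same.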

Usage

python powersubs.py \
  --input examples/input.srt \
  --output output.srt \
  --language ja \
  --model google/gemini-2.5-flash \
  --context_length 10 \
  --threads 8

Arguments

Argument          Required  Default                  Description
--input           Yes       -                        Path to input SRT file
--output          Yes       -                        Path to output SRT file
--language        Yes       -                        Target language (ISO 639-1 code, e.g., ja, es, fr)
--model           No        google/gemini-2.5-flash  Model to use for translation
--context_length  No        10                       Number of surrounding lines to use as context
--threads         No        8                        Number of concurrent translation threads
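The table above maps directly onto a standard `argparse` parser. The sketch below is hypothetical and only mirrors the documented flags and defaults. The actual `lib/args.py` may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags; lib/args.py may differ."""
    parser = argparse.ArgumentParser(description="Translate SRT subtitles with an LLM.")
    parser.add_argument("--input", required=True, help="Path to input SRT file")
    parser.add_argument("--output", required=True, help="Path to output SRT file")
    parser.add_argument(
        "--language", required=True, help="Target language (ISO 639-1 code, e.g. ja, es, fr)"
    )
    parser.add_argument(
        "--model", default="google/gemini-2.5-flash", help="Model to use for translation"
    )
    # type=int converts the string from the command line into an integer.
    parser.add_argument(
        "--context_length", type=int, default=10, help="Number of surrounding lines to use as context"
    )
    parser.add_argument(
        "--threads", type=int, default=8, help="Number of concurrent translation threads"
    )
    return parser
```

For example, `build_parser().parse_args(["--input", "movie.srt", "--output", "movie_ja.srt", "--language", "ja"])` fills in the model, context length, and thread count from the defaults. Omitting a required flag makes `argparse` exit with a usage error.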

Example

# Translate English subtitles to Japanese
python powersubs.py \
  --input movie.srt \
  --output movie_ja.srt \
  --language ja

# Translate to Spanish with a larger context window and more threads
python powersubs.py \
  --input show.srt \
  --output show_es.srt \
  --language es \
  --context_length 15 \
  --threads 16

How It Works

  1. Parse: Reads and parses the input SRT file
  2. Context Extraction: For each subtitle line, extracts N surrounding lines for context
  3. Translation: Sends each line to the LLM with context and strict formatting rules
  4. Validation: Uses Pydantic models to ensure structured responses
  5. Compose: Writes translated subtitles to output file while preserving timing and formatting
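Steps 2 and 3 can be sketched as follows. This sketch assumes that "N surrounding lines" means up to N lines on each side of the current line, because the README does not say how N is split.

* `context_window`, `translate_all`, and the `translate` callable are hypothetical names.
* A real `translate` would send the line, its context, and the formatting rules to the LLM.
* Parsing and composing (steps 1 and 5) would be handled by the `srt` package's `srt.parse` and `srt.compose`, which are omitted here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def context_window(lines: Sequence[str], index: int, n: int) -> List[str]:
    """Return up to n lines before and n lines after lines[index] (assumed semantics)."""
    start = max(0, index - n)
    end = min(len(lines), index + n + 1)
    # Exclude the line being translated; it is passed to the model separately.
    return [line for i, line in enumerate(lines[start:end], start) if i != index]


def translate_all(
    lines: Sequence[str],
    translate: Callable[[str, List[str]], str],
    context_length: int = 10,
    threads: int = 8,
) -> List[str]:
    """Translate each line with its context on a thread pool, keeping input order."""

    def job(index: int) -> str:
        return translate(lines[index], context_window(lines, index, context_length))

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in submission order, so each translation
        # stays aligned with its original subtitle index and timing.
        return list(pool.map(job, range(len(lines))))
```

Threads suit this workload because each job spends most of its time waiting on a network request. Python's GIL is therefore not the bottleneck.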

Translation Hints

You can provide custom translation hints for each language by creating a file at hints/{language}.md, where {language} is the ISO 639-1 language code.

For example, to add hints for Japanese translations, create hints/ja.md:

- Use informal language for casual conversations
- Translate "sensei" as "先生" not "teacher"
- Keep character names in English
- Use polite forms for formal dialogue

These hints will be automatically included in the translation prompt to guide the model's output.
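One way a hint file could be folded into the prompt is sketched below. The prompt wording and the names `load_hints` and `build_system_prompt` are hypothetical, since the README does not show PowerSubs' actual prompt. The sketch only illustrates the documented lookup of `hints/{language}.md`.

```python
from pathlib import Path
from typing import Optional


def load_hints(language: str, hints_dir: Path = Path("hints")) -> Optional[str]:
    """Return the contents of hints/{language}.md, or None if the file is missing or empty."""
    path = hints_dir / f"{language}.md"
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8").strip() or None


def build_system_prompt(language: str, hints_dir: Path = Path("hints")) -> str:
    """Assemble a hypothetical system prompt, appending hints only when present."""
    parts = [
        f"Translate the subtitle line into the language with ISO 639-1 code '{language}'.",
        "Preserve all markup tags (<i>, <b>, <font>), special characters, and line breaks.",
    ]
    hints = load_hints(language, hints_dir)
    if hints:
        parts.append("Translation hints:\n" + hints)
    return "\n\n".join(parts)
```

A missing hint file is treated as "no hints" rather than an error, so hints stay optional per language.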

Translation Quality

PowerSubs uses context-aware translation to improve quality:

  • Context Window: Each line is translated with awareness of surrounding dialogue
  • Format Rules: Explicit instructions ensure markup tags and special characters are preserved
  • Structured Output: JSON schema validation prevents malformed responses
  • Custom Hints: Optional language-specific translation guidance via hint files
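The structured-output and validation points above can be illustrated with a small Pydantic v2 model. This sketch assumes the model replies with a JSON object containing a single `translation` field. That shape, and the names `TranslationResult`, `response_format`, and `try_parse_reply`, are hypothetical. The payload follows the Chat Completions `json_schema` format of `response_format`.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class TranslationResult(BaseModel):
    """Hypothetical response shape: a single translated subtitle line."""

    # extra="forbid" makes the JSON schema emit additionalProperties: false,
    # which OpenAI's strict structured-output mode requires.
    model_config = ConfigDict(extra="forbid")

    translation: str


def response_format() -> dict:
    """Build a response_format payload from the Pydantic model's JSON schema."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "TranslationResult",
            "schema": TranslationResult.model_json_schema(),
            "strict": True,
        },
    }


def try_parse_reply(raw_json: str) -> Optional[TranslationResult]:
    """Validate the model's raw JSON reply; return None if it is malformed."""
    try:
        return TranslationResult.model_validate_json(raw_json)
    except ValidationError:
        return None
```

Returning `None` gives the caller a clear signal to retry the request instead of writing a broken line into the output file.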

Development

# Install dependencies, including development tools
uv sync

# Format code
uv run black .
uv run isort .

Project Structure

powersubs/
├── powersubs.py         # Main script
├── lib/
│   ├── args.py          # CLI argument parsing
│   └── llm.py           # LLM interaction with structured output
├── examples/
│   └── input.srt        # Example SRT file
├── hints/               # Optional translation hints by language
│   └── {language}.md    # e.g., ja.md, es.md, fr.md
└── pyproject.toml       # Project configuration

Dependencies

  • openai (≥2.2.0): API client for LLM interactions
  • pydantic (≥2.12.0): Data validation and structured output
  • python-dotenv (≥1.1.1): Environment variable management
  • srt (≥3.5.3): SRT subtitle parsing and composition

License

MIT