AI-Powered Localization & Audio Narration

Much of the web is built for a narrow audience: English speakers who can read a screen. This guide demonstrates how to reach a broader audience using AI translation and voice synthesis across 11 languages that cover over 75% of the world's population.

The techniques described here open content to non-English speakers, to people with visual impairments or reading difficulties, and to anyone who prefers listening over reading.

What you'll need

Vercel account (free tier)
Anthropic API key for Claude Opus 4.5
ElevenLabs API key for voice synthesis ($5/mo starter plan or pay-as-you-go)
Node.js for running build scripts

Claude Opus 4.5 provides high-quality, nuanced translations. ElevenLabs offers $5-22/mo subscription plans or pay-as-you-go pricing at $0.30 per 1,000 characters. Subsequent updates only regenerate changed content, keeping ongoing costs minimal.
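The pay-as-you-go figure is easy to sanity-check before a first full run: characters per language × number of languages ÷ 1,000 × rate. The sketch below uses the $0.30 rate quoted above and assumes, as a simplification, that each translation is about as long as the English source. estimateNarrationCost is a hypothetical helper, not part of the author's scripts.

```javascript
// Rough first-run narration cost at a pay-as-you-go rate per 1,000 characters.
// Assumption: each language version is about as long as the English source;
// real translations run longer or shorter, so treat the result as a ballpark.
function estimateNarrationCost(sourceChars, languageCount, ratePer1kChars = 0.3) {
  const totalChars = sourceChars * languageCount;
  return (totalChars / 1000) * ratePer1kChars;
}

// Example: a 10,000-character page narrated in all 11 languages.
const firstRun = estimateNarrationCost(10_000, 11);
console.log(`$${firstRun.toFixed(2)}`); // prints $33.00
```

Because hash-based change detection regenerates only edited content, later runs cost a fraction of this first-run figure.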

How it works

Content is translated at build time using Claude Opus 4.5, stored in Vercel KV for fast global delivery, and optionally synthesized into audio using ElevenLabs. Hash-based change detection ensures you pay to regenerate only what has changed.
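Hash-based change detection fits in a few lines: hash each piece of content, compare it with the hash saved after the last run, and regenerate only what differs. The guide keeps its hashes in content-hashes.json but doesn't show that file's layout, so the flat id-to-hash map, the whitespace normalization, and the contentHash/changedIds names below are all assumptions.

```javascript
const { createHash } = require("node:crypto");

// Hash normalized text so whitespace-only edits don't trigger regeneration.
// Assumption: these normalization rules are illustrative, not the author's.
function contentHash(text) {
  const normalized = text.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Return the ids whose text is new or changed since the last run.
// `current` maps ids (e.g. data-l10n-id values) to text;
// `storedHashes` maps the same ids to hashes saved after the previous run.
function changedIds(current, storedHashes) {
  return Object.entries(current)
    .filter(([id, text]) => storedHashes[id] !== contentHash(text))
    .map(([id]) => id);
}

const stored = {
  "page-1": contentHash("Much of the web is built for a narrow audience."),
  "page-2": contentHash("This will be translated and read aloud."),
};
const current = {
  "page-1": "Much of the web   is built for a narrow audience.", // whitespace only
  "page-2": "This will be translated, cached, and read aloud.", // edited
  "page-3": "A brand-new paragraph.", // new
};
console.log(changedIds(current, stored)); // [ 'page-2', 'page-3' ]
```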

Claude Opus 4.5 (nuanced, context-aware translation to 11 languages) → Vercel KV (global edge caching for instant delivery) → ElevenLabs (voice synthesis in 11 languages)

User Experience

A flag icon in the header opens a language selector listing all 11 languages. Languages with audio narration available display a 🔊 indicator.

Language Selector

Audio narration is available in eleven languages: English, Spanish, Chinese, Hindi, Arabic, French, Portuguese, Russian, Indonesian, Japanese, and Korean. For these languages, a "Narrate this page" button appears that plays synthesized audio while highlighting paragraphs in sequence. Playback can be paused and resumed via the button or Option+P (Alt+P on Windows).
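The behavior above rests on two small pieces, sketched here without assuming anything about the author's speech.js beyond what this section describes: a keyboard check for Option+P / Alt+P and a queue that tracks which narrated paragraph is current. One detail worth knowing: on a Mac, Option+P produces the character "π", so the check matches event.code (the physical key) rather than event.key. The isNarrationToggle and NarrationQueue names are hypothetical.

```javascript
// Hypothetical keyboard check for the narration shortcut.
// macOS turns Option+P into the character "π" (event.key), so match the
// physical key via event.code; Alt+P on Windows matches the same way.
function isNarrationToggle(event) {
  return event.altKey && !event.ctrlKey && !event.metaKey && event.code === "KeyP";
}

// Hypothetical queue tracking which narrated paragraph is current, so the
// player knows what to highlight and which clip to load next.
class NarrationQueue {
  constructor(paragraphIds) {
    this.ids = paragraphIds;
    this.index = 0;
    this.paused = true;
  }

  // Id of the paragraph being narrated, or null once the queue is finished.
  get current() {
    return this.index < this.ids.length ? this.ids[this.index] : null;
  }

  // Flip between paused and playing; returns the new paused state.
  toggle() {
    this.paused = !this.paused;
    return this.paused;
  }

  // Call when the current clip's audio fires "ended"; returns the next id or null.
  advance() {
    if (this.index < this.ids.length) this.index += 1;
    return this.current;
  }
}

const queue = new NarrationQueue(["p0", "p1", "p2"]);
queue.toggle(); // start playing
console.log(queue.current); // p0
console.log(queue.advance()); // p1
console.log(isNarrationToggle({ altKey: true, ctrlKey: false, metaKey: false, code: "KeyP", key: "π" })); // true
```

In a browser, a keydown listener would call isNarrationToggle(e) and pause or resume the current Audio element, while each clip's "ended" event would call advance() and move the highlight to the next paragraph.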

Developer Experience: Visual Workflow

The development environment provides immediate visual feedback. Modified content is flagged automatically, and the interface displays the exact commands needed to regenerate translations and audio.

The Workflow

Content changes are detected, flagged, and resolved through a three-step process.

1. Edit content and watch for MODIFIED badges. Every paragraph whose text has changed since the last generation run is flagged with a MODIFIED badge.

2. Click a button to copy the matching command:

PRE-TRANSLATE SITE 🌍: node pre-translate.js (requires vercel dev running on port 3000)

GENERATE NARRATION ⚡: node generate-narration.js localization/index.html --all-langs

3. Run the commands in a terminal. Once regeneration completes, the badges disappear.
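The copy buttons amount to mapping flagged pages onto the two commands in this workflow. The hypothetical regenerationCommands helper below shows one way to do that; narrowing pre-translation with --page when only one page changed is an assumption, though the flag itself is documented in this guide.

```javascript
// Hypothetical helper: turn pages flagged as MODIFIED into the commands a
// dev toolbar would offer to copy. Pre-translation needs `vercel dev` running
// on port 3000; narration paths follow the "<page>/index.html" pattern.
function regenerationCommands(modifiedPages) {
  if (modifiedPages.length === 0) return [];
  const commands = [
    // Narrow to one page with --page when possible; otherwise run over all
    // pages and let the hash check skip unchanged ones.
    modifiedPages.length === 1
      ? `node pre-translate.js --page=${modifiedPages[0]}`
      : "node pre-translate.js",
  ];
  for (const page of modifiedPages) {
    commands.push(`node generate-narration.js ${page}/index.html --all-langs`);
  }
  return commands;
}

console.log(regenerationCommands(["localization"]));
// In a browser, each string could be passed to navigator.clipboard.writeText(...).
```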

Data Attributes

Implementation requires adding two data attributes to HTML elements that should be translated or narrated:

<!-- For translation only -->
<p data-l10n-id="page-1">This paragraph will be translated.</p>

<!-- For translation AND narration -->
<p data-narration="0" data-l10n-id="page-2">This will be translated and read aloud.</p>
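On the client, these attributes are all a script needs to find its targets. The sketch below is a hypothetical stand-in for the DOM-update part of localization-manager.js: it replaces the text of every data-l10n-id element and orders narrated elements by their data-narration value. It accepts any objects with dataset and textContent, so it runs outside a browser. Reading data-narration as the clip index (so "0" plays p0.mp3) is an assumption based on the audio file layout.

```javascript
// Swap in translated text for each element carrying data-l10n-id.
// In a browser, `elements` would be document.querySelectorAll("[data-l10n-id]").
// Note the camelCase: data-l10n-id is exposed as dataset.l10nId.
function applyTranslations(elements, translations) {
  let applied = 0;
  for (const el of elements) {
    const text = translations[el.dataset.l10nId];
    if (typeof text === "string") {
      el.textContent = text; // textContent, not innerHTML, so translations can't inject markup
      applied += 1;
    }
  }
  return applied;
}

// Narrated elements in playback order, using the data-narration index.
function narrationOrder(elements) {
  return Array.from(elements)
    .filter((el) => el.dataset.narration !== undefined)
    .sort((a, b) => Number(a.dataset.narration) - Number(b.dataset.narration))
    .map((el) => el.dataset.l10nId);
}

const els = [
  { dataset: { l10nId: "page-1" }, textContent: "This paragraph will be translated." },
  { dataset: { l10nId: "page-2", narration: "0" }, textContent: "This will be translated and read aloud." },
];
applyTranslations(els, {
  "page-1": "Este párrafo se traducirá.",
  "page-2": "Esto se traducirá y se leerá en voz alta.",
});
console.log(els[0].textContent); // Este párrafo se traducirá.
console.log(narrationOrder(els)); // [ 'page-2' ]
```

Setting textContent is the safe default, at the cost of flattening any inline elements (links, emphasis) inside the translated paragraph.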

Translation System: Claude Opus 4.5 + Vercel KV

The translation pipeline runs at build time, not runtime. A Node.js script extracts all translatable content, sends it to Anthropic's Claude Opus 4.5 model through the project's translation API (which is why vercel dev must be running), and stores the results in Vercel KV.
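For reference, a single translation call through Anthropic's Messages API looks roughly like this. The endpoint, headers, and response shape (content[0].text) follow Anthropic's public API; the claude-opus-4-5 model alias, the prompt wording, and the max_tokens value are assumptions. In this guide the call sits behind the translate-api.js endpoint, which also writes the result to Vercel KV; that caching step is omitted here.

```javascript
// Build (but don't send) a Claude Messages API request for one translation.
// targetLang is best passed as a language name ("Spanish"), not a code.
function buildTranslationRequest(text, targetLang, apiKey) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model: "claude-opus-4-5", // assumed alias; check Anthropic's models list
        max_tokens: 4096, // assumed ceiling for one paragraph's translation
        system: "You are a professional translator. Return only the translated text, preserving meaning and tone.",
        messages: [{ role: "user", content: `Translate into ${targetLang}:\n\n${text}` }],
      }),
    },
  };
}

// Send it (Node 18+ has a global fetch); the reply text is in content[0].text.
async function translate(text, targetLang, apiKey) {
  const { url, init } = buildTranslationRequest(text, targetLang, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Anthropic API error: ${res.status}`);
  const data = await res.json();
  return data.content[0].text;
}
```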

Running the Script

# From project root, start local dev server (required for API access)
vercel dev

# In another terminal (also from project root), run translation
node pre-translate.js

# Translate a specific page only
node pre-translate.js --page=localization

# Translate to a specific language only
node pre-translate.js --lang=es

# Combine filters for one page, one language
node pre-translate.js --page=localization --lang=fr

# NOTE: If content is unchanged, the script will skip translation.
# To force regeneration (e.g., after API failures or for testing),
# delete the "_translationHash" line for that page in content-hashes.json
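The --page and --lang filters can be handled with Node's built-in util.parseArgs, which accepts both --lang=fr and --lang fr. This is a hypothetical sketch of how pre-translate.js might resolve its targets, not the author's code. The language codes beyond the six folders shown in this guide's audio tree (pt, ru, id, ja, ko) and the choice to skip English as a translation target are assumptions.

```javascript
const { parseArgs } = require("node:util");

// Assumed language codes: en, es, zh, hi, ar, fr appear in this guide's audio
// folders; pt, ru, id, ja, ko are guesses using ISO 639-1 codes.
const ALL_LANGS = ["en", "es", "zh", "hi", "ar", "fr", "pt", "ru", "id", "ja", "ko"];

// Hypothetical target resolution for --page=<name> and --lang=<code>.
function resolveTargets(argv, allPages) {
  const { values } = parseArgs({
    args: argv,
    options: {
      page: { type: "string" },
      lang: { type: "string" },
    },
  });
  const pages = values.page ? allPages.filter((p) => p === values.page) : allPages;
  // English is the source language, so by default it is not a translation target.
  const langs = values.lang ? [values.lang] : ALL_LANGS.filter((l) => l !== "en");
  return { pages, langs };
}

console.log(resolveTargets(["--page=localization", "--lang=fr"], ["index", "localization"]));
// { pages: [ 'localization' ], langs: [ 'fr' ] }
```

In the script itself, argv would be process.argv.slice(2), and the page list would come from scanning the project folders.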

Narration System: ElevenLabs Voice Synthesis

Audio narration uses ElevenLabs' text-to-speech API. For English, a cloned voice provides consistency. For other languages, ElevenLabs' multilingual voices handle the synthesis. Audio files are saved to a structured folder hierarchy.
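A minimal sketch of one synthesis call, assuming the voice-per-language split described above. The endpoint (/v1/text-to-speech/{voice_id}), the xi-api-key header, and the eleven_multilingual_v2 model ID come from ElevenLabs' public API; the voice IDs are placeholders, and the buildSpeechRequest and synthesizeToFile names are hypothetical.

```javascript
const { writeFile, mkdir } = require("node:fs/promises");
const { dirname } = require("node:path");

// Placeholder voice IDs: a cloned voice for English, a stock multilingual
// voice for everything else. Replace with real IDs from your account.
const VOICES = {
  en: "YOUR_CLONED_VOICE_ID",
  default: "YOUR_MULTILINGUAL_VOICE_ID",
};

// Build (but don't send) a text-to-speech request for one paragraph.
function buildSpeechRequest(text, lang, apiKey) {
  const voiceId = VOICES[lang] ?? VOICES.default;
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "content-type": "application/json",
        accept: "audio/mpeg",
      },
      body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
    },
  };
}

// Send the request (Node 18+ has a global fetch) and save the returned MP3.
async function synthesizeToFile(text, lang, apiKey, outFile) {
  const { url, init } = buildSpeechRequest(text, lang, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`ElevenLabs API error: ${res.status}`);
  await mkdir(dirname(outFile), { recursive: true });
  await writeFile(outFile, Buffer.from(await res.arrayBuffer()));
}
```

For example, writing the first Spanish clip of a page would target audio/es/page-name/p0.mp3, matching the folder layout in this section.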

Audio File Structure

/audio/
├── en/
│   └── page-name/
│       ├── p0.mp3    # First narrated element
│       ├── p1.mp3    # Second narrated element
│       └── p2.mp3    # ...and so on
├── es/
│   └── page-name/
│       └── ...
├── zh/
├── hi/
├── ar/
├── fr/
└── ...            # remaining narration languages

Running the Script

# All commands run from project root

# Generate English narration
node generate-narration.js page-name/index.html

# Generate for a specific language
node generate-narration.js page-name/index.html --lang es

# Generate for all 11 narration-enabled languages
node generate-narration.js page-name/index.html --all-langs

# Resume interrupted generation (skips existing files)
node generate-narration.js page-name/index.html --all-langs --resume
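The --resume behavior boils down to checking which MP3s already exist before calling the API. The hypothetical pendingClips helper below builds paths following the audio/<lang>/<page>/pN.mp3 layout shown above and skips any clip already on disk. Treating zero-byte files as missing (in case a run died mid-write) is an added assumption, and the existence check is injectable so the logic can be tested without touching disk.

```javascript
const fs = require("node:fs");

// Default existence check: a zero-byte file (e.g. left by a run that died
// mid-write) counts as missing so it gets regenerated.
function fileHasContent(file) {
  try {
    return fs.statSync(file).size > 0;
  } catch {
    return false;
  }
}

// Hypothetical --resume logic: list the clips that still need synthesizing.
function pendingClips(page, paragraphCount, langs, exists = fileHasContent) {
  const pending = [];
  for (const lang of langs) {
    for (let i = 0; i < paragraphCount; i++) {
      const file = `audio/${lang}/${page}/p${i}.mp3`;
      if (!exists(file)) pending.push(file);
    }
  }
  return pending;
}

// Example with an in-memory "disk": Spanish p0 and p1 already exist.
const done = new Set(["audio/es/page-name/p0.mp3", "audio/es/page-name/p1.mp3"]);
console.log(pendingClips("page-name", 3, ["es", "fr"], (f) => done.has(f)));
// Only es/p2 and the three French clips remain.
```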

Code & Implementation

The complete implementation is available as open-source code in GitHub Gists. The following scripts form the core of the system.

Key Scripts

pre-translate.js

The translation generation script. Extracts content, calls Claude Opus 4.5, and stores results in Vercel KV.

translate-api.js

The Vercel serverless API endpoint. Handles translation requests via Claude Opus 4.5 and caches results in Vercel KV.

generate-narration.js

The audio generation script. Synthesizes speech via ElevenLabs and saves MP3 files.

speech.js

The client-side narration player. Manages playback, paragraph highlighting, and keyboard shortcuts.

localization-manager.js

The client-side translation handler. Manages language switching, caching, and DOM updates.

content-manager.js

The dev workflow tools. Handles change detection, MODIFIED badges, and generation buttons.

Caveats & Limitations

This approach involves tradeoffs that warrant consideration:

Cost: Anthropic bills Claude Opus 4.5 usage per token, and ElevenLabs offers subscriptions ($5-22/mo) or pay-as-you-go ($0.30/1K characters). Hash-based caching limits later runs to changed content, so the first full run across all 11 languages is the most expensive.
Translation quality: AI translation has limitations. Idioms, cultural references, and domain-specific terminology may be mistranslated. For critical content (legal, medical), human translation remains the standard.
Voice consistency: Non-English narration uses different voice models, so the "speaker" sounds different across languages.
API dependencies: The system relies on external APIs that may change pricing or deprecate features over time.

This system is designed for content-focused sites with relatively static text. Real-time chat, user-generated content, and highly dynamic interfaces require different architectures.

✶✶✶✶

About the Author

Burton Rast is a designer, a photographer, and a public speaker who loves to make things.
