AI-Powered Localization & Audio Narration

Much of the web is built for a narrow audience: English speakers who can read a screen. This guide demonstrates how to reach a broader audience using AI translation and voice synthesis across 11 languages that cover over 75% of the world's population.

The techniques described here open content to non-English speakers, people with visual impairments or reading difficulties, and anyone who prefers listening to reading.

What you'll need

Translation uses Claude Opus 4.5, which produces high-quality, nuanced translations. Narration uses ElevenLabs, available through subscription plans ($5-22/mo) or pay-as-you-go pricing at $0.30 per 1,000 characters. After the initial run, updates regenerate only changed content, keeping ongoing costs minimal.
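To get a feel for the pay-as-you-go figure, the arithmetic can be sketched as below. This is a rough estimate only: the function name is hypothetical, and it assumes every language has about the same character count as the source text, which is only approximately true after translation.

```javascript
// Pay-as-you-go rate quoted in this guide: $0.30 per 1,000 characters,
// expressed in cents to keep the arithmetic in whole numbers.
const CENTS_PER_1K_CHARS = 30;

// Hypothetical helper: estimated narration cost in US dollars.
// charCount is the number of narrated characters per language.
function estimateNarrationCost(charCount, languageCount = 1) {
  const cents = Math.round((charCount * languageCount * CENTS_PER_1K_CHARS) / 1000);
  return cents / 100;
}

console.log(estimateNarrationCost(4000));     // one language
console.log(estimateNarrationCost(4000, 11)); // all 11 languages
```

Because unchanged content is skipped on later runs, this estimate applies mainly to the first full generation.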

How it works

Content is translated at build time using Claude Opus 4.5, cached on Vercel's edge network, and optionally synthesized into audio using ElevenLabs. Hash-based change detection ensures you only pay to regenerate what's changed.

Claude Opus 4.5

Nuanced, context-aware translation to 11 languages

→

Vercel KV

Global edge caching for instant delivery

→

ElevenLabs

Voice synthesis in 11 languages

User Experience

A flag icon in the header provides access to a language selector supporting 11 languages that cover over 75% of the world's population. Languages with audio narration available display a 🔊 indicator.

Language Selector

Audio narration is available in eleven languages: English, Spanish, Chinese, Hindi, Arabic, French, Portuguese, Russian, Indonesian, Japanese, and Korean. For these languages, a "Narrate this page" button appears that plays synthesized audio while highlighting paragraphs in sequence. Playback can be paused and resumed via the button or Option+P (Alt+P on Windows).
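The pause/resume shortcut could be wired up roughly as follows. This is a minimal sketch, not the site's actual code: isNarrationToggle and createToggle are hypothetical names, and the audio path is a placeholder. It checks event.code rather than event.key because on macOS with a US layout, Option+P changes event.key to "π" while event.code stays "KeyP".

```javascript
// Returns true when a keydown event matches Option+P / Alt+P.
// event.code identifies the physical key, independent of what it types.
function isNarrationToggle(event) {
  return Boolean(
    event.altKey && !event.ctrlKey && !event.metaKey && event.code === 'KeyP'
  );
}

// Hypothetical wrapper: `audio` is any object with play() and pause(),
// such as an HTMLAudioElement. Returns a function that flips the state.
function createToggle(audio) {
  let playing = false;
  return function toggle() {
    playing = !playing;
    if (playing) {
      audio.play();
    } else {
      audio.pause();
    }
    return playing;
  };
}

// Browser wiring; skipped under Node, which has no `document`.
if (typeof document !== 'undefined') {
  const toggle = createToggle(new Audio('/audio/en/page-name/p0.mp3'));
  document.addEventListener('keydown', (event) => {
    if (isNarrationToggle(event)) {
      event.preventDefault(); // keep "π" out of focused text fields
      toggle();
    }
  });
}
```

The same toggle function can back the "Narrate this page" button, so the button and the shortcut always share one playback state.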

Developer Experience: Visual Workflow

The development environment provides immediate visual feedback. Modified content is flagged automatically, and the interface displays the exact commands needed to regenerate translations and audio.

The Workflow

Content changes are detected, flagged, and resolved through a three-step process.

1 Edit content, see MODIFIED badges
MODIFIED

Much of the web is built for a narrow audience: English speakers who can read a screen.

MODIFIED

This guide demonstrates how to expand that reach using AI translation and voice synthesis.

The techniques described here enable access for non-English speakers, people with visual impairments, or those who prefer listening.

2 Click buttons to copy commands
PRE-TRANSLATE SITE 🌐
node pre-translate.js (requires vercel dev @ 3000)
GENERATE NARRATION ⚡
node generate-narration.js localization/index.html --all-langs
3 Run in terminal, badges disappear

Much of the web is built for a narrow audience: English speakers who can read a screen.

This guide demonstrates how to expand that reach using AI translation and voice synthesis.

The techniques described here enable access for non-English speakers, people with visual impairments, or those who prefer listening.

Data Attributes

Implementation requires adding two data attributes to HTML elements that should be translated or narrated:

<!-- For translation only -->
<p data-l10n-id="page-1">This paragraph will be translated.</p>

<!-- For translation AND narration -->
<p data-narration="0" data-l10n-id="page-2">This will be translated and read aloud.</p>
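As an illustration of how a build script might collect these elements, here is a simplified extractor. It is a sketch under stated assumptions, not the guide's actual extraction code: extractTranslatable is a hypothetical name, and regex matching only suits flat, well-formed markup like the snippet above. A real HTML parser would be the safer choice for nested or irregular markup.

```javascript
// Finds elements carrying data-l10n-id and notes whether each one also
// carries data-narration. Assumes double-quoted attributes and no nested
// elements of the same tag inside a match.
function extractTranslatable(html) {
  const elementPattern =
    /<(\w+)\b([^>]*\bdata-l10n-id="([^"]+)"[^>]*)>([\s\S]*?)<\/\1>/g;
  const items = [];
  for (const match of html.matchAll(elementPattern)) {
    const [, , attributes, id, text] = match;
    const narration = /\bdata-narration="(\d+)"/.exec(attributes);
    items.push({
      id,
      text: text.trim(),
      // null means "translate only"; a number means "also narrate".
      narrationIndex: narration ? Number(narration[1]) : null,
    });
  }
  return items;
}

const sample = `
<!-- For translation only -->
<p data-l10n-id="page-1">This paragraph will be translated.</p>
<!-- For translation AND narration -->
<p data-narration="0" data-l10n-id="page-2">This will be translated and read aloud.</p>
`;

console.log(extractTranslatable(sample));
```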

Translation System: Claude Opus 4.5 + Vercel KV

The translation pipeline runs at build time, not runtime. A Node.js script extracts all translatable content, sends it to Anthropic's Claude Opus 4.5 model, and stores results in Vercel KV. The system supports 11 languages covering over 75% of the world's population.

Running the Script

# From project root, start local dev server (required for API access)
vercel dev

# In another terminal (also from project root), run translation
node pre-translate.js

# Translate a specific page only
node pre-translate.js --page=localization

# Translate to a specific language only
node pre-translate.js --lang=es

# Combine filters for one page, one language
node pre-translate.js --page=localization --lang=fr

# NOTE: If content is unchanged, the script will skip translation.
# To force regeneration (e.g., after API failures or for testing),
# delete the "_translationHash" line for that page in content-hashes.json
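The skip-if-unchanged behavior described in the note above can be sketched like this. Only the "_translationHash" key name comes from this guide; the SHA-256 algorithm, the per-page object layout, and the function names are assumptions made for illustration.

```javascript
const crypto = require('node:crypto');

// SHA-256 digest of a page's translatable text, as a hex string.
// (Assumed algorithm; the guide only says "hash-based".)
function hashContent(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

// `hashes` mirrors an assumed content-hashes.json layout:
//   { "<page>": { "_translationHash": "<hex digest>" } }
// A missing entry or a different digest means the page needs translating.
function needsTranslation(hashes, page, text) {
  const stored = hashes[page] && hashes[page]._translationHash;
  return stored !== hashContent(text);
}

// Call after a successful translation run so the next run can skip the page.
function recordTranslation(hashes, page, text) {
  hashes[page] = { ...hashes[page], _translationHash: hashContent(text) };
  return hashes;
}

const hashes = {};
console.log(needsTranslation(hashes, 'localization', 'Hello')); // no stored hash yet
recordTranslation(hashes, 'localization', 'Hello');
console.log(needsTranslation(hashes, 'localization', 'Hello')); // unchanged content
```

Under this layout, deleting the "_translationHash" entry makes the stored value undefined, so the comparison fails and the page is translated again, which matches the forced-regeneration tip above.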

Narration System: ElevenLabs Voice Synthesis

Audio narration uses ElevenLabs' text-to-speech API. For English, a cloned voice provides consistency. For other languages, ElevenLabs' multilingual voices handle the synthesis. Audio files are saved to a structured folder hierarchy.

Audio File Structure

/audio/
├── en/
│   └── page-name/
│       ├── p0.mp3    # First narrated element
│       ├── p1.mp3    # Second narrated element
│       └── p2.mp3    # ...and so on
├── es/
│   └── page-name/
│       └── ...
├── zh/
├── hi/
├── ar/
└── fr/
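A small helper can map a language, page, and element index onto this layout. The sketch below assumes the number in pN.mp3 matches the element's data-narration value, which the tree above suggests but does not state outright; audioPath is a hypothetical name.

```javascript
const path = require('node:path');

// Builds the URL path for one narrated element, following the
// /audio/<lang>/<page-name>/p<index>.mp3 layout shown above.
// path.posix keeps forward slashes on every operating system.
function audioPath(lang, pageName, narrationIndex) {
  return path.posix.join('/audio', lang, pageName, `p${narrationIndex}.mp3`);
}

console.log(audioPath('en', 'page-name', 0));
console.log(audioPath('es', 'page-name', 2));
```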

Running the Script

# All commands run from project root

# Generate English narration
node generate-narration.js page-name/index.html

# Generate for a specific language
node generate-narration.js page-name/index.html --lang es

# Generate for all 11 narration-enabled languages
node generate-narration.js page-name/index.html --all-langs

# Resume interrupted generation (skips existing files)
node generate-narration.js page-name/index.html --all-langs --resume
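The idea behind --resume, skipping files that already exist, can be sketched as a simple filter. This is an illustration of the technique rather than the script's actual logic; pendingFiles is a hypothetical name, and the demo writes to a temporary directory instead of a real /audio/ folder.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Given the full list of planned output files, keep only those that are
// not yet on disk. Everything already present is treated as done.
function pendingFiles(plannedPaths) {
  return plannedPaths.filter((file) => !fs.existsSync(file));
}

// Demo: pretend p0.mp3 was generated before the run was interrupted.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'narration-'));
const planned = ['p0.mp3', 'p1.mp3', 'p2.mp3'].map((name) => path.join(dir, name));
fs.writeFileSync(planned[0], '');

console.log(pendingFiles(planned).map((file) => path.basename(file)));
// prints [ 'p1.mp3', 'p2.mp3' ]
```

One design caveat: an existence check also skips a file that was only partially written when the run was interrupted, so a stricter version might also require a non-zero file size.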

Code & Implementation

The complete implementation is available as open-source code. The following scripts form the core of the system.

Key Scripts

Caveats & Limitations

This approach involves tradeoffs that warrant consideration:

  • Cost: Translation with Claude Opus 4.5 and narration with ElevenLabs are both paid services. ElevenLabs offers subscriptions ($5-22/mo) or pay-as-you-go ($0.30/1K characters). Hash-based change detection limits regeneration to changed content.
  • Translation quality: AI translation has limitations. Idioms, cultural references, and domain-specific terminology may be mistranslated. For critical content (legal, medical), human translation remains the standard.
  • Voice consistency: Non-English narration uses different voice models, so the "speaker" sounds different across languages.
  • API dependencies: The system relies on external APIs that may change pricing or deprecate features over time.

This system is designed for content-focused sites with relatively static text. Real-time chat, user-generated content, and highly dynamic interfaces require different architectures.

✶✶✶✶

About the Author

Burton Rast is a designer, a photographer, and a public speaker who loves to make things.