
Voiceover

Translate your subtitles into any language and generate an AI voice track for them. Four steps from a transcribed recording to a fully localized version.

Voiceover in TinyRec is a localization pipeline: it reads your existing subtitle cues, translates them into a target language, and generates an AI voice track that ships alongside (or in place of) your original narration. Same recording, new language — without re-recording, re-scripting, or hiring a voice actor.

Before you start: generate subtitles

Voiceover is driven by your subtitle cues. Each cue’s text becomes the source for translation and synthesis, and the cue’s start/end timing becomes the segment boundary on the timeline.

If you haven’t transcribed yet, do that first: see Subtitles for the Whisper download and Generate workflow. Once your timeline has a row of subtitle cues, come back here.

Step 1 — Pick a TTS provider

Open the Voice tab in the editor’s settings panel. The first row is the provider picker.

Voiceover provider picker showing Gemini, OpenAI, ElevenLabs, Replicate and Grok options

Each provider has its own strengths:

  • Gemini (Google) — natural, conversational. Strong on English, decent on a few other languages.
  • OpenAI — six built-in voices, all polished. The fastest path to “ship-quality” narration.
  • Grok — punchy, less generic-corporate than the others.
  • ElevenLabs — the gold standard for character voices. Hundreds of voices, including custom-trained ones from your own ElevenLabs account.
  • Replicate — point at any voice model on Replicate; great for experimenting with cutting-edge open-source models.

Pick the one whose voice catalogue best fits the tone you want. You can switch later — TinyRec keeps the keys for each provider configured independently.

Step 2 — Enter your API key

Each provider needs an API key. Paste yours into the field that appears under the provider you picked.

API key input field with paste, save, and verify buttons

A few notes:

  • Keys are stored locally on your Mac (in the editor’s secure preferences). They never go through TinyRec’s servers — only the provider you picked sees them.
  • You only enter the key once per provider. Switch back to that provider next session and the key is still there.
  • You create each key in the provider’s own dashboard: Google AI Studio for Gemini, platform.openai.com for OpenAI, elevenlabs.io for ElevenLabs, replicate.com for Replicate, and the xAI Console (console.x.ai) for Grok.

Hit Verify to confirm the key is valid before continuing. If verification fails, double-check the key and that your account has TTS permissions enabled.

Step 3 — Pick a model, voice, and language

With the provider configured, pick the specific model. The same provider often offers several — a smaller/faster one and a larger/higher-quality one.

Voice model picker with model name, voice selector, and language dropdown

Three settings sit together:

  • Model — the underlying TTS model from this provider. Smaller models render faster and cost less per request; larger models sound noticeably better. The default is usually the right pick.
  • Voice — the voice character. Most providers expose a click-to-preview list — warm, energetic, clinical, dramatic, etc. Pick the one that matches the tone of your video.
  • Language — the target language for translation and synthesis. TinyRec will translate your subtitle cues into this language using the model, then synthesize the voice. Pick Vietnamese, Spanish, Japanese, French — whatever your audience speaks.

If your subtitles are already in the target language (e.g. you transcribed Vietnamese audio and want a Vietnamese voiceover), TinyRec skips the translation step and just runs the synthesis.

Step 4 — Generate the voiceover

Click Generate. TinyRec walks every subtitle cue, translates it into the target language, sends it to the provider with your chosen voice + model, and drops the resulting audio onto the timeline as voiceover segments — one per subtitle cue, timed to match.

Generate Voice Over button with progress indicator showing per-segment translation and synthesis

You’ll see progress for each cue as it runs (translation first, then voice synthesis). When generation finishes, the Voiceover row on the timeline fills with segments that line up with your subtitles.
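Conceptually, the Generate pass is a loop over your cues. The Python sketch below is a hypothetical model of that loop, not TinyRec’s actual code: `translate` and `synthesize` are placeholder callables standing in for whatever provider requests the app makes, and `Cue` and `VoiceoverSegment` are illustrative types. It captures the two behaviors described above: every segment inherits its cue’s start and end times, and translation is skipped when the subtitles are already in the target language.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # subtitle text in the source language


@dataclass
class VoiceoverSegment:
    start: float  # copied from the cue
    end: float    # copied from the cue
    text: str     # text that was sent to the TTS provider
    audio: bytes  # audio returned by the provider


# Hypothetical provider hooks, not a real TinyRec or provider API.
TranslateFn = Callable[[str, str, str], str]     # (text, source_lang, target_lang) -> text
SynthesizeFn = Callable[[str, str, str], bytes]  # (text, voice, model) -> audio bytes


def generate_voiceover(cues: List[Cue], source_lang: str, target_lang: str,
                       voice: str, model: str,
                       translate: TranslateFn,
                       synthesize: SynthesizeFn) -> List[VoiceoverSegment]:
    """Translate (if needed) and synthesize one segment per cue, reusing cue timing."""
    needs_translation = source_lang.lower() != target_lang.lower()
    segments: List[VoiceoverSegment] = []
    for cue in cues:
        # Translation first; skipped when the subtitles already match the target language.
        text = translate(cue.text, source_lang, target_lang) if needs_translation else cue.text
        # Then synthesis with the chosen voice and model.
        audio = synthesize(text, voice, model)
        # The segment keeps the cue's start/end so it lines up under the subtitle.
        segments.append(VoiceoverSegment(cue.start, cue.end, text, audio))
    return segments
```

Passing the provider calls in as parameters is a choice made for the sketch: it keeps the loop independent of any one provider’s SDK and lets you exercise it offline with stub functions.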

See it on the timeline

Each generated voiceover lands as its own segment on the Voiceover row. Scroll the timeline and you’ll see them sitting underneath the matching subtitle cues, with the same start and end times.

From here you can:

  • Preview — click play; the voiceover mixes against the project’s existing audio at the level set on the Audio tab.
  • Retime — drag a segment’s edges to nudge its timing if a translated line runs longer or shorter than the original.
  • Re-generate one segment — click a single voiceover segment and hit Re-generate in its panel. Useful when one specific line came out wrong but the rest are fine.
  • Edit text — every segment’s translated text is editable. Tweak the wording and hit Re-generate; the segment updates in place.
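Retiming matters most when a translation runs long. As a rough, hypothetical illustration (not a TinyRec feature), the sketch below flags segments whose synthesized clip is longer than the cue slot it has to fit in, assuming you know each clip’s duration in seconds. The `Segment` type and the `tolerance` value are illustrative; the reported overrun is how far the end edge would need to move for the whole clip to fit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float           # cue start, seconds
    end: float             # cue end, seconds
    audio_duration: float  # length of the synthesized clip, seconds


def find_overruns(segments: List[Segment], tolerance: float = 0.05) -> List[Tuple[int, float]]:
    """Return (index, seconds_over) for each segment whose clip is longer than its slot.

    Overruns at or below `tolerance` seconds are treated as negligible (hypothetical threshold).
    """
    overruns = []
    for i, seg in enumerate(segments):
        # Positive when the clip is longer than the cue's start-to-end window.
        over = seg.audio_duration - (seg.end - seg.start)
        if over > tolerance:
            overruns.append((i, round(over, 3)))
    return overruns
```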

Mixing with the original mic track

You’ll usually want to mute or duck the original microphone when shipping a localized version, so viewers hear the AI voice instead of your original narration. Open the Audio tab and:

  • Pull the Microphone slider to 0 to silence your original voice entirely.
  • Or leave it at a low level (10–20%) if you want a faint trace of the original underneath the voiceover.

The voiceover slider on the same tab controls the AI voice’s loudness independently.
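The two sliders behave like independent gain multipliers applied before the tracks are summed. The sketch below assumes the sliders map linearly to gain (0% silent, 100% unchanged), which may not match TinyRec’s actual curve, and uses plain lists of float samples in the range -1 to 1. It also converts a slider percentage to decibels: under that linear assumption, the suggested 10–20% microphone level sits roughly 20 to 14 dB below full level.

```python
import math
from typing import List


def percent_to_db(percent: float) -> float:
    """Convert a linear slider percentage to decibels relative to full level."""
    if percent <= 0:
        return float("-inf")  # 0% is silence
    return 20 * math.log10(percent / 100)


def mix(mic: List[float], voiceover: List[float],
        mic_pct: float, voiceover_pct: float) -> List[float]:
    """Apply each track's gain, sum sample by sample, and clip to [-1.0, 1.0].

    Assumes both buffers share a sample rate; zip() stops at the shorter buffer.
    """
    g_mic = mic_pct / 100
    g_vo = voiceover_pct / 100
    return [max(-1.0, min(1.0, g_mic * m + g_vo * v)) for m, v in zip(mic, voiceover)]
```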

Word-level timing for animated subtitles

If you’ve used the Word highlight subtitle animation, the subtitles’ per-word timestamps need to be re-aligned to the new audio. TinyRec handles this automatically: generated voiceovers come back with word-level timing data baked in, so the word-highlight animation hits the right word at the right moment in the new language.
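Once word-level timing exists, highlighting reduces to finding the word whose time range contains the playhead. A minimal sketch, assuming the timing data is a list of `(word, start, end)` tuples in seconds, sorted by start time; the format TinyRec actually stores is not documented here, and `active_word` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

WordTiming = Tuple[str, float, float]  # (word, start_s, end_s), sorted by start


def active_word(words: List[WordTiming], t: float) -> Optional[int]:
    """Return the index of the word being spoken at time t, or None during a pause."""
    starts = [start for _, start, _ in words]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i >= 0 and t < words[i][2]:
        return i
    return None
```

Gaps between words return `None`, so the highlight simply clears during pauses. For real-time playback you would build the `starts` list once per cue rather than on every frame.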

What’s next

  • Subtitles — the source of every voiceover. The cleaner the transcript, the cleaner the translation and synthesis.
  • Audio editing — mix the voiceover against the original mic, system audio, and background music.
  • Multi-scene script flow — for projects where you want one voiceover per scene rather than one per cue.