Google's Veo 3 AI Video Model Struggles with Unwanted Subtitles

Google's Veo 3 AI video model introduces dialogue and sound generation but struggles with unwanted subtitles that users find difficult and costly to remove.

A New Leap in AI-Generated Video

Google launched Veo 3, its latest generative video AI model, at the end of May. This model builds upon its predecessor by enabling users to generate sounds and dialogue, producing hyperrealistic eight-second clips that have been quickly adopted for ads, ASMR content, imagined trailers, and humorous street interviews. Darren Aronofsky, an Academy Award–nominated director, even used Veo 3 to create a short film titled Ancestra. Demis Hassabis, CEO of Google DeepMind, compared this advancement to emerging from the silent era of video generation.

The Subtitle Glitch

Despite the excitement, Veo 3 has encountered a significant issue: it often generates nonsensical, garbled subtitles in clips containing dialogue—even when users explicitly request no subtitles or captions. Removing these unwanted subtitles is challenging and costly. Users have had to regenerate clips multiple times, use third-party tools to remove subtitles, or crop videos to eliminate the text.

Ongoing Attempts to Fix the Problem

Josh Woodward, VP of Google Labs and Gemini, announced in early June that Google had developed fixes to reduce the gibberish subtitles. More than a month later, however, users were still reporting subtitle problems on Google Labs' Discord channel, which shows how hard it can be to correct unwanted behavior in a complex AI model.

Access and Cost of Veo 3

Veo 3 is available to paying subscribers of Google's subscription tiers, which start at $249.99 a month. To create an eight-second clip, users enter a text prompt describing the desired scene into one of Google's AI tools, such as Flow or Gemini. Each generation consumes at least 20 AI credits, and additional credits can be purchased at $25 per 2,500.
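To put those figures in per-clip terms, here is a minimal arithmetic sketch. It uses only the numbers quoted above (20 credits minimum per generation, $25 per 2,500 credits) and assumes every credit is bought at the top-up rate. It ignores any credits bundled with the monthly subscription, which the article does not specify. The function names are illustrative, not part of any Google tool.

```python
# Pricing figures as quoted in the article; treat them as a snapshot.
CREDITS_PER_GENERATION = 20   # minimum credits consumed per eight-second clip
TOP_UP_PRICE_USD = 25.0       # price of one credit top-up
TOP_UP_CREDITS = 2500         # credits included in one top-up


def cost_per_generation(credits=CREDITS_PER_GENERATION):
    """Return the minimum dollar cost of one generation at top-up prices."""
    price_per_credit = TOP_UP_PRICE_USD / TOP_UP_CREDITS
    return credits * price_per_credit


def generations_per_top_up(credits=CREDITS_PER_GENERATION):
    """Return how many whole generations a single top-up can pay for."""
    return TOP_UP_CREDITS // credits


print(f"${cost_per_generation():.2f} per generation")        # $0.20 per generation
print(f"{generations_per_top_up()} generations per top-up")  # 125 generations per top-up
```

At these rates a single attempt is cheap, so the financial pain users describe comes from the number of regenerations rather than the price of any one clip.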

User Frustration and Financial Impact

Mona Weiss, an advertising creative director, says that regenerating scenes to get rid of random subtitles quickly becomes expensive. She estimates that up to 40% of her scenes with dialogue come out with unusable gibberish subtitles. Google offered her a refund for the cost of Veo 3 itself but not for the wasted credits, and Weiss declined the offer so that she could keep access to the model.
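A rough way to see how that failure rate translates into spending is sketched below. It assumes that each generation fails independently with a fixed probability, which is a simplifying assumption and not something the article or Google states. Under it, the number of attempts needed for one usable clip follows a geometric distribution. The 40% figure is Weiss's upper estimate, and the per-attempt cost is derived from the article's pricing of 20 credits at $25 per 2,500 credits.

```python
def expected_cost_per_usable_clip(cost_per_attempt, failure_rate):
    """Expected spend to obtain one usable clip.

    Assumes each attempt fails independently with probability `failure_rate`,
    so the mean number of attempts is 1 / (1 - failure_rate).
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    expected_attempts = 1 / (1 - failure_rate)
    return cost_per_attempt * expected_attempts


# 20 credits per attempt at $25 per 2,500 credits (figures from the article).
COST_PER_ATTEMPT_USD = 20 * 25 / 2500
# Weiss's upper estimate for dialogue scenes with unusable subtitles.
FAILURE_RATE = 0.40

total = expected_cost_per_usable_clip(COST_PER_ATTEMPT_USD, FAILURE_RATE)
wasted = total - COST_PER_ATTEMPT_USD

print(f"Expected cost per usable clip: ${total:.2f}")
print(f"Of which spent on discarded attempts: ${wasted:.2f}")
```

Under these assumptions, a 40% failure rate raises the expected cost of each usable dialogue clip by about two thirds, and the waste grows with every scene in a project.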

Why Are Subtitles Persistent?

The root cause likely lies in Veo 3’s training data, which probably includes YouTube videos, vlogs, gaming clips, and TikTok edits—many containing embedded subtitles as part of the video frames rather than separate text layers. According to Shuo Niu, an assistant professor researching AI and video platforms, the model learns to include subtitles because it tries to mimic human-created videos, many of which contain them.

Challenges in Removing Subtitles

To fix the problem at its source, Google would need to review every frame of its training videos, remove or relabel those with embedded captions, and then retrain Veo 3, a process that could take weeks. Prompting offers no reliable shortcut: AI researcher Tuhin Chakrabarty explains that negative prompts such as "No subtitles" tend to be less effective at guiding AI models than positive instructions.

Google's Response and Industry Perspectives

A Google spokesperson says the company is continuing to improve video creation quality and encourages users to retry their prompts and provide feedback. Documentary maker Katerina Cizek criticizes Google for launching Veo 3 prematurely, suggesting the company prioritized being first with lip-synced audio generation over fixing the subtitle issue.