Google’s Veo 3 Now Generates AI Audio

Category: Technology | Published: 2025-07-17

From Silent Clips to Fully Sounded Scenes

Announced at Google I/O 2025, the company’s annual developer conference, Veo 3 marks a significant leap in AI video generation by breaking the sound barrier. Unlike earlier models that produced silent clips requiring manual audio dubbing, Veo 3 natively generates both video and sound in response to user prompts. That includes environmental ambience, footsteps, character dialogue, and background music, all tightly synced with the generated visuals.

_“For the first time, we’re emerging from the silent era of video generation,”_ said Demis Hassabis, CEO of Google DeepMind. _“You can give Veo 3 a prompt describing characters and an environment, and suggest dialogue with a description of how you want it to sound.”_

This is a clear departure from the silent outputs of Veo 2, which could render realistic 1080p clips but had no built-in audio functionality. Veo 3’s ability to generate both media types simultaneously is underpinned by multimodal training, which allows it to understand visual scenes and translate them into contextually accurate sound.