
Google DeepMind Unveils ‘Motion Prompting’ for Precise Video Generation Control at CVPR 2025

Google DeepMind and partners have developed Motion Prompting, a novel method allowing detailed control of video generation via motion trajectories, showcased at CVPR 2025.

Introducing Motion Prompting for Video Control

Researchers from Google DeepMind, the University of Michigan, and Brown University have presented an innovative approach called “Motion Prompting” at CVPR 2025. This method enables precise control over video generation by using specific motion trajectories rather than relying solely on text prompts.

What Are Motion Prompts?

Motion prompts are sparse or dense representations of movement that capture the trajectories of points over time. This flexible format can describe anything from subtle hair movement to complex camera motion. The team trained a ControlNet adapter on top of the Lumiere video diffusion model, using a dataset of 2.2 million videos annotated with motion tracks extracted by the BootsTAP algorithm.
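To make the idea of point-track conditioning concrete, here is a minimal sketch of how a motion prompt could be represented and turned into a per-frame conditioning signal. The class name, array shapes, and the single-pixel rasterization scheme are illustrative assumptions; the paper's exact encoding for the ControlNet adapter is not reproduced here.

```python
# Illustrative sketch: a motion prompt as per-point (x, y) trajectories over
# time, plus a visibility flag, rasterized into per-frame conditioning maps.
# Shapes and encoding are assumptions, not the paper's exact representation.
from dataclasses import dataclass
import numpy as np

@dataclass
class MotionPrompt:
    tracks: np.ndarray   # (num_tracks, num_frames, 2) pixel coordinates
    visible: np.ndarray  # (num_tracks, num_frames) boolean visibility

    def rasterize(self, height: int, width: int) -> np.ndarray:
        """Paint each visible track point into a per-frame conditioning map."""
        num_tracks, num_frames, _ = self.tracks.shape
        cond = np.zeros((num_frames, height, width), dtype=np.float32)
        for t in range(num_tracks):
            for f in range(num_frames):
                if self.visible[t, f]:
                    x, y = self.tracks[t, f]
                    xi, yi = int(round(x)), int(round(y))
                    if 0 <= xi < width and 0 <= yi < height:
                        cond[f, yi, xi] = 1.0
        return cond

# A sparse prompt: two points tracked over 16 frames of a 256x256 video.
prompt = MotionPrompt(
    tracks=np.random.rand(2, 16, 2) * [256, 256],
    visible=np.ones((2, 16), dtype=bool),
)
conditioning = prompt.rasterize(256, 256)  # shape (16, 256, 256)
```

The same structure covers both extremes the authors describe: a handful of tracks for a subtle local motion, or a near-dense grid of tracks for full-scene camera movement.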

Motion Prompt Expansion: From Simple Inputs to Detailed Motion

To simplify user interaction, the researchers developed "motion prompt expansion," which converts high-level inputs such as mouse drags into the detailed motion trajectories the model expects. Users can interact naturally with images, for example dragging a parrot's head to make it turn or playing with a person's hair, and the model generates realistic video sequences in response. The system even exhibits emergent physical behaviors, such as sand scattering realistically when pushed.
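The sketch below illustrates one simple way a single drag gesture could be expanded into a small bundle of trajectories. The linear interpolation and the fixed pixel neighbourhood are assumptions made for illustration; the paper's expansion strategies are more sophisticated.

```python
# Simplified sketch of "motion prompt expansion": one mouse drag becomes
# trajectories for a patch of nearby points. Linear interpolation over a
# fixed neighbourhood is an assumption for illustration only.
import numpy as np

def expand_drag(start_xy, end_xy, num_frames=16, radius=8, points_per_axis=5):
    """Convert one drag gesture into trajectories for a patch of nearby points."""
    start = np.asarray(start_xy, dtype=np.float32)
    delta = np.asarray(end_xy, dtype=np.float32) - start
    offsets = np.linspace(-radius, radius, points_per_axis)
    grid = np.stack(np.meshgrid(offsets, offsets), axis=-1).reshape(-1, 2)
    alphas = np.linspace(0.0, 1.0, num_frames)[None, :, None]  # (1, F, 1)
    anchors = (start + grid)[:, None, :]                       # (P, 1, 2)
    return anchors + alphas * delta                            # (P, F, 2)

tracks = expand_drag(start_xy=(120, 90), end_xy=(160, 90))
print(tracks.shape)  # (25, 16, 2): 25 points moved smoothly over 16 frames
```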

Diverse Applications of Motion Prompting

  • Object and Camera Control: The system interprets mouse movements as manipulations of geometric primitives, giving precise control over objects and camera angles, from rotating a cat’s head to orbiting a scene with the help of depth estimation (see the sketch after this list).
  • Motion Transfer: Motion from a source video can be applied to different subjects in static images, such as transferring human head movements onto a macaque, effectively puppeteering the animal.
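As a rough illustration of the camera-orbit use case, the sketch below shows how a monocular depth map could turn an orbit into per-pixel trajectories: unproject pixels to 3D, rotate around the scene, and reproject. The pinhole intrinsics and the orbit parametrisation are illustrative assumptions, not the paper's exact procedure.

```python
# Rough sketch: derive point trajectories for a camera orbit from a depth map.
# Intrinsics and the y-axis orbit below are assumptions for illustration.
import numpy as np

def orbit_tracks(depth, fx, fy, cx, cy, num_frames=16, max_angle_deg=20.0):
    """Per-pixel trajectories induced by orbiting the camera about the y-axis."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Unproject pixels to camera-space 3D points with the pinhole model.
    z = depth
    x3 = (xs - cx) / fx * z
    y3 = (ys - cy) / fy * z
    pts = np.stack([x3, y3, z], axis=-1).reshape(-1, 3)          # (N, 3)

    tracks = []
    for angle in np.linspace(0.0, np.deg2rad(max_angle_deg), num_frames):
        c, s = np.cos(angle), np.sin(angle)
        rot = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])       # y-axis rotation
        p = pts @ rot.T
        u = p[:, 0] / p[:, 2] * fx + cx
        v = p[:, 1] / p[:, 2] * fy + cy
        tracks.append(np.stack([u, v], axis=-1))
    return np.stack(tracks, axis=1)                              # (N, F, 2)

depth = np.full((64, 64), 2.0)                                   # toy flat depth
tracks = orbit_tracks(depth, fx=64, fy=64, cx=32, cy=32)
print(tracks.shape)  # (4096, 16, 2)
```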

Performance and Evaluation

Extensive quantitative evaluations and human studies revealed that the Motion Prompting model outperforms recent techniques like Image Conductor and DragAnything in image quality and motion accuracy metrics. Participants preferred videos generated by this model for better motion adherence, realism, and visual quality.
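One intuitive way motion adherence can be scored is to re-track the generated video and compare the achieved trajectories against the requested ones. The end-point-error metric below is an assumption for illustration; the paper reports its own suite of image-quality and motion-accuracy metrics.

```python
# Illustrative motion-adherence score: mean end-point error (in pixels)
# between requested and achieved point tracks. Metric choice is an assumption.
import numpy as np

def mean_endpoint_error(requested: np.ndarray, achieved: np.ndarray) -> float:
    """Average Euclidean distance between corresponding track points."""
    assert requested.shape == achieved.shape  # both (num_tracks, num_frames, 2)
    return float(np.linalg.norm(requested - achieved, axis=-1).mean())

requested = np.zeros((4, 16, 2))
achieved = requested + np.random.normal(scale=1.5, size=requested.shape)
print(f"mean EPE: {mean_endpoint_error(requested, achieved):.2f} px")
```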

Current Limitations and Future Prospects

Some limitations include occasional unnatural object stretching when parts are incorrectly locked to the background. However, these artifacts provide insights into the model’s understanding of physical dynamics. This research marks a significant advancement toward interactive and controllable generative video models, potentially setting new standards in AI-driven video production.

Additional Information

For more details, see the paper and project page. Follow the researchers on Twitter and join relevant communities to stay updated.
