Sora Under Fire: Did OpenAI Train Its Video AI on Netflix, TikTok and Game Footage?
OpenAI’s new video generator Sora is winning attention for near-Hollywood-quality clips created from simple prompts, and drawing scrutiny over the footage those clips may have been learned from.
What researchers are seeing
Independent tests and demonstrations suggest Sora can reproduce elements that closely resemble copyrighted material. Reporters and researchers say outputs can look suspiciously like trailers for Netflix shows such as Wednesday, or like studio intros from DreamWorks. Some examples even appear to echo TikTok clips with visible watermarks and recognizable video game logos.
How that might have happened
Training modern video models typically requires enormous amounts of footage. Researchers report that scraping videos from platforms like YouTube and TikTok has long been part of model development. Off-the-shelf tools and scripts let developers collect millions of clips in bulk, effectively turning these platforms into rich data sources for machine learning.
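To make the mechanics concrete, here is a minimal sketch of what bulk collection can look like, using the open-source yt-dlp downloader. The playlist URL and output paths are placeholders, and nothing here is claimed about any particular company’s pipeline; the point is only that a single source URL can expand into thousands of clips.

```python
# Minimal sketch of bulk video collection with the open-source yt-dlp library.
# A single playlist or channel URL expands to every video it contains, which
# is how a short script can pull thousands of clips into a training corpus.
from yt_dlp import YoutubeDL

# Placeholder seed list; a real crawl would read thousands of these from a file.
SOURCES = [
    "https://www.youtube.com/playlist?list=EXAMPLE_PLAYLIST_ID",
]

options = {
    "format": "mp4",                     # prefer a single mp4 stream per video
    "outtmpl": "corpus/%(id)s.%(ext)s",  # save each clip under its video id
    "ignoreerrors": True,                # skip private/removed videos, keep going
    "quiet": True,
}

with YoutubeDL(options) as downloader:
    downloader.download(SOURCES)
```

Running a script like this at scale is generally what platform terms of service forbid, which is exactly the enforcement gap described below.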
Platforms often forbid this kind of scraping in their terms of service, but enforcement is uneven. Companies such as Nvidia and Runway ML have reportedly relied on large scraped datasets when building their models, and OpenAI says it trains on “publicly available and licensed data.” Yet critics argue the outputs suggest otherwise.
Legal and ethical questions
The central question is whether training on copyrighted content constitutes infringement or falls under fair use. Legal experts, creators, and ethicists are divided. Some frame it as analogous to learning from a library book: the model ingests examples to learn patterns. Others argue it amounts to unauthorized use of creative labor, particularly when outputs echo distinctive, commercial works.
A group of YouTube creators has already filed a lawsuit against OpenAI alleging misuse of millions of hours of transcribed audio. If courts find for the creators, the ruling could narrow the scope of what developers can use for training large models.
The stakes for creators and industry
On one hand, tools like Sora could democratize content creation, giving indie creators access to production values that once required big budgets. On the other hand, if models can reproduce the look and feel of established franchises without compensation or permission, professional animators, filmmakers, and editors may see their work devalued.
Ethicists such as Margaret Mitchell stress that the debate isn’t only legal but also moral: creators deserve a say in how their work is used. If consent and licensing are ignored, the social costs could be significant.
What might happen next
Ongoing litigation and policy decisions will shape whether the industry tightens rules around training data or continues to rely on broadly scraped content. If courts and regulators clamp down, companies may need to obtain clearer licenses and build cleaner datasets. If not, the next wave of synthetic movies, ads, and games could feel eerily familiar yet belong to no one in particular.
For creators and viewers alike, the outcome will determine how much of culture remains tied to its original makers and how much is reshaped by models trained on that material.