Slop implies capability
If the current paradigm can produce convincing, satisfying slop, it can also be useful
I’ve been seeing a lot of people react negatively to OpenAI’s new Sora 2 video model. It’s understandable not to want more short-form video content in the world, but too many people are taking from this that AI has no useful capabilities and is just a wasteful fad. I have basically the opposite take: the ability to generate stupid but engaging slop is a wild new capability, and a sign that the current deep learning paradigm is shockingly powerful. If AI can produce engaging slop, it can do a ton of wildly useful things too. Slop implies capability.
This is my favorite Sora video, and my favorite AI video in general:
Watch it again, paying attention to how the horse’s hooves move: the motion is coherent and makes sense given what the horse is doing.
This is obviously slop. It’s a meaningless short video that won’t benefit me at all beyond activating my dopamine for a moment. But it’s also a sign of pretty remarkable ability.
Sora wasn’t told the laws of physics, or how horses move, or what skateboards are. It inferred all of this from scratch by ingesting gigantic amounts of video data. By getting more and more data, and having more and more internal “knobs” it could adjust in response to that data, it eventually inferred enough useful heuristics to create a convincing video of a horse riding a skateboard.
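To make the “knobs” idea concrete, here’s a deliberately tiny sketch of the loop at the heart of all deep learning: guess, measure the error, nudge every knob in the direction that shrinks it, repeat. This is nothing like Sora’s actual code (the toy data and the two parameters are made up for illustration), but it’s the same basic process, scaled up to billions of knobs and oceans of video.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1000)   # stand-in "training data"
y = 3.0 * x + 0.5                   # the hidden pattern the model must infer

w, b = 0.0, 0.0                     # two adjustable "knobs" (Sora has billions)
lr = 0.1                            # how hard to nudge per step

for step in range(500):
    pred = w * x + b                # the model's current guess
    err = pred - y                  # how wrong is it?
    w -= lr * 2 * np.mean(err * x)  # nudge each knob along the gradient
    b -= lr * 2 * np.mean(err)      # of the mean squared error

print(w, b)  # ends up near the hidden pattern: ~3.0 and ~0.5
```

Nobody tells the loop that the answer is “3.0 and 0.5”; the knobs settle there because that’s what makes the predictions stop being wrong. This is an especially great new video explanation of how this works: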
There are a few things happening in AI right now that make objective assessments of its capability difficult. Among other things:
People are constantly overhyping what it can currently do, leading to general skepticism when reality doesn’t match the claims.
The data center buildout has been massive, and headlines about the cost and impacts have led many to believe that AI needs to be a world-historic technology right this second to justify it.
It’s very easy to react to any new improvement in AI by immediately identifying all the ways it doesn’t measure up to the buildout or the hype. There are also a lot of in-the-moment status games to win. But try to think back to yourself five years ago. Would you have been able to predict that AI video would be capable of this in 2025? Here was the state of the art in AI video research at the time:
Now we can make videos like this:
Sora and chatbots both depend on the transformer architecture, the pattern-recognition and heuristic-building tool that’s been the backbone of most AI progress over the last decade. Transformers take in huge amounts of training data and adjust huge numbers of internal parameters to pick up on ever more subtle heuristics and patterns.

That ability to detect complex, nuanced patterns is on display in the horse video. There is no way you could personally code clearly written rules into a machine to make it so generally knowledgeable about the subtleties of video and motion that you could just type “Make a horse jump a ramp” or “Make Spongebob get a ticket” and expect useful output. This is why deep learning is so useful: it’s a way to get a machine to build its own soup of internal heuristics for handling complex situations. That soup of heuristics both succeeds and fails in surprising ways. We don’t design the heuristics ourselves, and we don’t even know what they are or how they work, in much the same way we don’t know how a lot of the heuristics in our own brains work. And yet we can now build machines whose heuristics are subtle enough to infer how a horse would look skateboarding over a ramp, to the point that the output is photorealistic.
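For the curious, the core computation inside a transformer, attention, can be sketched in a few lines. This is an illustrative numpy toy, not any real model’s code: the sizes and random weights here are made up, and real systems stack many such layers with learned weights and multiple heads. But it shows the basic move: every token (a word, or a patch of a video frame) gets updated by a weighted blend of all the others, and training adjusts the parameters that decide what attends to what.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Each token queries every other token and takes a relevance-weighted
    blend of them; Wq/Wk/Wv are the learned "knobs" from earlier."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how relevant is token j to token i?
    return softmax(scores) @ V               # blend tokens by relevance

rng = np.random.default_rng(0)
d = 16
tokens = rng.normal(size=(8, d))             # 8 tokens: words, video patches, ...
Wq, Wk, Wv = [rng.normal(size=(d, d)) for _ in range(3)]
out = self_attention(tokens, Wq, Wk, Wv)
print(out.shape)                             # (8, 16): each token now carries context
```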
It seems obvious that if deep learning can pick up on the patterns required to make the horse video, it can probably be used in a lot of other, more important places in society. If it can independently figure out a rough approximation of the laws of physics, to the point that you can be tricked into thinking videos of complex motion are real, surely it can help with weather forecasting, tackle a huge number of specific problems in climate, or even pose new emergent risks. Sora 2 makes it especially hard for me to understand how anyone can say that deep learning models are “just glorified auto-complete.”1 They seem to develop internal heuristics that correlate pretty strongly with how the world actually works. If “glorified autocomplete” systems can produce photorealistic videos of anyone in any describable situation, the term stops being useful as a way of predicting what models can and can’t do.
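To be fair to the phrase, “autocomplete” does describe the outer loop of a language model pretty literally: predict a distribution over the next token, append a sample, repeat. Here’s that loop as schematic Python (the `model` function is a hypothetical stand-in for billions of learned parameters, not any real API). The point is that this dismissive-sounding loop says nothing about what the parameters inside it can or can’t represent.

```python
import random

def generate(model, prompt_tokens, n_steps):
    """Schematic "autocomplete": all the interesting behavior lives
    inside `model`, a stand-in for a trained network."""
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        probs = model(tokens)  # P(next token | everything so far)
        tokens.append(random.choices(range(len(probs)), weights=probs)[0])
    return tokens
```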
ChatGPT has only been with us for three years, the time from the release of Windows 95 to Windows 98, and the jump in AI capabilities over those three years has seemed much more significant to me than the jump in usefulness from Windows 95 to 98. An AI app beating out TikTok and Instagram as the most-downloaded short-form video app (even for a moment) would’ve been unthinkable… a few weeks ago? Now I’m seeing Sora videos flood Instagram Reels. AI beating the slop machines at their own game, even temporarily, seems like another win for the “AI is shockingly capable, and the capabilities are improving fast, so this really, really doesn’t seem like a fad” camp. I’m worried that a lot of people are too distracted by the stupidity of the outputs to notice how strange a world we’re living in now.
I’m aware that this criticism applies specifically to chatbots, but they rely on the same underlying architecture.
Thought-provoking article. Thanks!
One thing to add: imo there is a big difference between rough approximations (e.g. of the laws of physics) and exact solutions, and I’d say it’s a bit simplified to claim that if one is given, the other will inevitably follow.
It’s not that AI can’t be useful for more specific cases; I just wanted to highlight that imo your argument isn’t fully logically sound.
Example:
1. Draw a crab: a very large number of solutions will give a satisfying result
2. Solve this equation: Exactly one solution
The latter is way harder to solve. I see lots of AI slop in the first category.
Again, it doesn't mean that AI can't do more specific things, but imo one doesn't inevitably follow from the other.
Thanks again for these thought-provoking articles, would be interesting to hear your take on this.