I’ve been thinking a lot about the whole “data wall” thing with LLMs lately. It’s the idea that LLMs can’t or won’t improve because we’ve exhausted all the possible training data. I don’t buy it. The best models do appear to be plateauing, but it’s not for lack of training data.
The real issue? We’re asking them a ton of questions that have clear-cut, correct answers. Think about it - if you ask an LLM what 2 + 2 is, and it says 4, how can you improve on that? You can’t.
It boils down to one idea: You can’t improve upon truth.
So here’s what’s happening:
It’s not a data wall we’re hitting - it’s that squeezing out extra gains is hard when the models already answer these questions correctly so often.
Now, here’s where things get interesting. Like many others, I think the next big leap is going to come from agentic and action-based improvements. We need to set up better evals for that kind of work, though.
Take the LMSYS leaderboard, for example. It’s cool and all, but it’s mostly judging whether a human prefers one response over another. That’s a start, but we need to go further.
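To make that concrete: Arena-style leaderboards boil pairwise human votes down into an Elo-style rating. Here’s a toy sketch of that update rule - the K-factor and the starting ratings are illustrative choices, not LMSYS’s actual parameters:

```python
# Toy Elo update, the kind of rating behind preference leaderboards.
# K=32 and the 1000-point starting ratings are illustrative, not
# LMSYS's actual parameters.

def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """Shift two ratings toward the outcome of one human vote."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    r_winner += k * (1.0 - expected_win)  # winner gains what it "missed"
    r_loser -= k * (1.0 - expected_win)   # loser pays the same amount
    return r_winner, r_loser

model_a, model_b = 1000.0, 1000.0
model_a, model_b = elo_update(model_a, model_b)  # a human preferred A
print(model_a, model_b)  # 1016.0, 984.0
```

Notice what’s missing: nothing in that update knows whether a response was a usable plan. It only knows which of two answers a human happened to like more.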
In my work as an AI engineer, and in most complex AI projects I see shared on X, there’s a notion of a “planner.” What we really need is a planning-based eval - something that scores the plan itself, not just the final answer. Here’s a rough sketch of what I’m thinking:
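To be clear, this is just a sketch to make the idea concrete, not an existing benchmark - the task format, the tool names, and the scoring weights are all made up for illustration:

```python
# A minimal sketch of a planning-based eval. Everything here (the task
# format, the tools, the 0.4/0.4/0.2 weights) is hypothetical.

from dataclasses import dataclass


@dataclass
class PlanningTask:
    goal: str                 # what the model is asked to accomplish
    allowed_tools: set[str]   # tools a valid plan may call
    required_steps: set[str]  # milestones the plan has to hit


def score_plan(task: PlanningTask, plan: list[str]) -> float:
    """Score a plan on structure instead of a single right/wrong answer.

    validity   - every step calls a tool the task actually allows
    coverage   - fraction of required milestones the plan includes
    efficiency - penalty for padding the plan with extra steps
    """
    if not plan:
        return 0.0
    tools_used = {step.split(":")[0] for step in plan}
    validity = 1.0 if tools_used <= task.allowed_tools else 0.0
    coverage = len(task.required_steps & set(plan)) / len(task.required_steps)
    efficiency = min(1.0, len(task.required_steps) / len(plan))
    return 0.4 * validity + 0.4 * coverage + 0.2 * efficiency


# Toy run: a travel-booking task and one candidate plan from some model.
task = PlanningTask(
    goal="book a flight and a hotel for a 3-day trip",
    allowed_tools={"search", "book", "email"},
    required_steps={"search: flights", "book: flight", "book: hotel"},
)
plan = ["search: flights", "book: flight", "search: hotels", "book: hotel"]
print(f"plan score: {score_plan(task, plan):.2f}")  # 0.95
```

The design choice that matters is that the plan gets scored on several axes - did it stick to the tools it actually has, did it hit the required milestones, did it avoid padding - instead of being matched against one correct string.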
This kind of eval would give us a much better idea of how these models can actually help in real-world applications. It’s not just about spitting out facts anymore - it’s about using that knowledge to create actionable plans.
So yeah, I don’t think LLM development is hitting a wall. We’re just at the point where we need to change how we’re measuring progress. And that’s pretty exciting if you ask me. The future of AI is all about action and planning, not just regurgitating facts. I can’t wait to see where this goes, and I’m super thankful I get to be here to experience it!
- Joseph
Sign up for my email list to know when I post more content like this. I also post my thoughts on Twitter/X.