Introduction: The Illusion of AI Progress
Over the past few years, artificial intelligence has made extraordinary progress.
Large language models can:
- Write essays
- Generate code
- Solve complex reasoning problems
Computer vision systems can:
- Recognize objects
- Interpret scenes
- Analyze images with high accuracy
From the outside, it appears that the “intelligence problem” has largely been solved.
So why aren’t robots everywhere?
Why can’t a humanoid robot walk into your kitchen, cook a meal, clean the dishes, and adapt to your home?
The answer is deceptively simple:
Because robots don’t just need intelligence—they need experience.
And experience, in the context of robotics, is fundamentally a data problem.
1. The Difference Between Digital AI and Physical AI
1.1 The Abundance of Digital Data
Digital AI thrives on abundance.
- Billions of web pages
- Trillions of words
- Massive datasets
This abundance enables rapid training and iteration.
1.2 The Scarcity of Physical Data
Robotics operates in a fundamentally different regime.
There is no “internet of physical interactions” that robots can learn from at scale.
Every data point requires:
- A real-world action
- A physical environment
- Time and energy
This makes data:
- Expensive
- Slow to collect
- Difficult to standardize
2. Why Robotics Data Is Hard
2.1 The Cost of a Single Data Point
In software:
- Generating data is cheap
- Simulation is often sufficient
In robotics:
- Each interaction involves hardware
- Failures can cause damage
- Experiments take time
A single grasp attempt may take seconds or minutes.
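Scaling this to the billions of examples that large models train on would require decades or more of nonstop operation from a single robot, or large fleets running for months to years.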
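A back-of-the-envelope calculation makes the scale concrete. The 10 seconds per grasp and the fleet of 1,000 robots are illustrative assumptions, not measured figures. The sketch is also optimistic because it ignores resets, downtime, and breakage.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s; leap years ignored


def robot_years(num_attempts: int, seconds_per_attempt: float, fleet_size: int = 1) -> float:
    """Wall-clock years needed for a fleet to collect `num_attempts` grasps.

    Optimistic: assumes every robot runs nonstop with no resets, downtime, or breakage.
    """
    total_seconds = num_attempts * seconds_per_attempt
    return total_seconds / fleet_size / SECONDS_PER_YEAR


# Assumed: one billion grasps at 10 seconds each.
one_robot = robot_years(1_000_000_000, 10)
thousand_robots = robot_years(1_000_000_000, 10, fleet_size=1_000)
print(f"1 robot:      {one_robot:.1f} years")
print(f"1,000 robots: {thousand_robots * 365:.0f} days")
```

Under these assumptions, a single robot would need roughly 317 years of continuous operation to reach a billion grasps. Even a thousand robots would need close to four months.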
2.2 The Long Tail of the Real World
The physical world is messy.
A robot must handle:
- Different object shapes
- Changing lighting conditions
- Unexpected obstacles
- Human interference
Unlike digital environments, the real world has an effectively unbounded long tail of edge cases.
2.3 Lack of Standardization
In language modeling, the data is standardized: everything reduces to text.
In robotics:
- Sensors vary
- Environments differ
- Tasks are not uniform
This makes it difficult to build universal datasets.
3. Simulation vs. Reality
3.1 The Promise of Simulation
Simulation offers:
- Scalability
- Speed
- Safety
Robots can train in virtual environments, often in parallel and faster than real time, without risking hardware or people.
3.2 The Reality Gap
However, simulation has a critical limitation:
The sim-to-real gap.
- Simulated physics (friction, contact, deformation) rarely matches reality exactly
- Simulated sensor noise differs from real sensor noise
- Real-world unpredictability is hard to model
A robot that performs well in simulation may fail in reality.
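A toy calculation shows how a small physics mismatch turns into task error. Suppose a robot pushes an object across a table. The object slides a distance d = v² / (2μg) before kinetic friction μ stops it. If the push speed is calibrated against the simulator's friction value and the real table's friction differs, the real distance is scaled by μ_sim / μ_real. The friction values below are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def push_speed_for_target(target_m: float, mu: float) -> float:
    """Initial speed that makes an object slide `target_m` metres,
    assuming constant kinetic friction: d = v^2 / (2 * mu * g)."""
    return math.sqrt(2 * mu * G * target_m)


def slide_distance(speed: float, mu: float) -> float:
    """Distance an object slides from `speed` before friction stops it."""
    return speed ** 2 / (2 * mu * G)


MU_SIM = 0.5   # friction coefficient assumed by the simulator (hypothetical)
MU_REAL = 0.4  # friction coefficient of the real table (hypothetical)
target = 0.30  # metres

# The "policy" is calibrated entirely in simulation...
v = push_speed_for_target(target, MU_SIM)
# ...and then executed on the real surface.
sim_result = slide_distance(v, MU_SIM)
real_result = slide_distance(v, MU_REAL)
print(f"sim: {sim_result:.3f} m, real: {real_result:.3f} m")
```

Here the real friction is 20% lower than the simulated value. That turns a 30 cm push into a 37.5 cm push, a 25% overshoot, even though the policy is perfect in simulation.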
4. Data as the New Competitive Moat
4.1 Lessons from Autonomous Driving
In autonomous driving, the companies that have made the most progress focused heavily on:
- Data collection
- Real-world testing
- Continuous learning
The same principle applies to robotics.
4.2 The Flywheel Effect
Data creates a feedback loop:
- More robots deployed
- More data collected
- Better models trained
- Improved performance
- More deployment
This creates a self-reinforcing advantage.
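The loop above can be sketched as a toy model. The dynamics are assumptions chosen only for illustration. Performance improves with the logarithm of total data, which gives diminishing returns, and the fleet grows in proportion to performance. Real deployment economics are far messier.

```python
import math


def simulate_flywheel(rounds: int, initial_robots: int = 10,
                      episodes_per_robot: int = 1_000) -> list[tuple[int, int, float]]:
    """Toy data flywheel. Every dynamic here is an illustrative assumption."""
    robots, total_data = initial_robots, 0
    history = []
    for _ in range(rounds):
        total_data += robots * episodes_per_robot            # more robots -> more data
        performance = 1 - 1 / (1 + math.log10(total_data))   # more data -> better model, capped below 1
        history.append((robots, total_data, performance))
        robots = int(robots * (1 + performance))             # better performance -> more deployment
    return history


for robots, data, perf in simulate_flywheel(5):
    print(f"robots={robots:>4}  total_data={data:>9,}  performance={perf:.3f}")
```

In this model, each improvement in performance is modest, but it compounds through fleet growth. That compounding is the self-reinforcing advantage the flywheel describes.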
4.3 Why Early Deployment Matters
Companies that deploy early—even with imperfect systems—gain:
- Real-world data
- Faster iteration cycles
- Competitive advantage
Waiting for perfection can be a losing strategy.
5. The Role of Humanoid Robots in Data Collection
5.1 Why General-Purpose Bodies Matter
Humanoid robots can:
- Perform diverse tasks
- Operate in varied environments
- Collect broad datasets
This makes them valuable as data collection platforms.
5.2 Learning from Human Demonstration
One promising approach is:
- Observing humans
- Imitating actions
- Refining through practice
This combines:
- Human knowledge
- Machine scalability
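The imitation step can be illustrated with behavior cloning in its simplest form: fitting a policy that maps observed states to the actions a human took. This sketch assumes a one-dimensional state, a linear policy, and made-up demonstration data. Practical systems typically use neural networks over camera and joint-state inputs, and the "refining through practice" step is not shown.

```python
def fit_linear_policy(states: list[float], actions: list[float]) -> tuple[float, float]:
    """Least-squares fit of action = w * state + b from human demonstrations."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    b = mean_a - w * mean_s
    return w, b


# Hypothetical demonstrations: gripper distance from a cup (m) -> speed the human used (m/s).
demo_states = [0.50, 0.40, 0.30, 0.20, 0.10]
demo_actions = [0.25, 0.20, 0.15, 0.10, 0.05]

w, b = fit_linear_policy(demo_states, demo_actions)
# The robot now imitates the demonstrator at a state it never saw demonstrated.
print(f"imitated speed at 0.35 m: {w * 0.35 + b:.3f} m/s")
```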

6. The Missing Infrastructure
6.1 Data Pipelines for the Physical World
Robotics needs:
- Standardized data formats
- Shared datasets
- Scalable collection systems
This infrastructure is still in its early stages.
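Here is one illustration of what a standardized format might contain: a minimal episode schema with a lossless JSON round trip. Every field name is hypothetical and not taken from any existing dataset standard. The point is that agreeing on units, joint ordering, and success labels up front is part of what would let datasets from different robots be combined.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Step:
    timestamp_s: float                  # seconds since episode start
    joint_positions: list[float]        # radians, one entry per joint, fixed order
    action: list[float]                 # commanded joint targets, same order as above
    camera_frame: Optional[str] = None  # URI of an image stored outside the record


@dataclass
class Episode:
    robot_model: str
    task: str
    success: bool
    steps: list[Step] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested Step dataclasses to plain dicts.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "Episode":
        raw = json.loads(text)
        raw["steps"] = [Step(**step) for step in raw["steps"]]
        return cls(**raw)


episode = Episode("arm-v1", "pick_cup", True,
                  [Step(0.0, [0.0, 1.2], [0.1, 1.1]), Step(0.1, [0.1, 1.1], [0.2, 1.0])])
restored = Episode.from_json(episode.to_json())
print(restored == episode)
```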
6.2 Hardware-Software Integration
Unlike software, robotics requires tight integration between:
- Sensors
- Actuators
- AI models
This complexity slows down progress.
7. Economic Implications
7.1 Capital Intensity
Robotics companies require:
- Hardware investment
- Physical testing environments
- Long development cycles
This makes them more capital-intensive than pure software companies.
7.2 Barriers to Entry
The data problem creates high barriers:
- High hurdles for new entrants
- An advantage for well-funded players
- Greater reliance on partnerships
8. Emerging Solutions
8.1 Self-Supervised Learning
Robots can learn from:
- Their own actions
- Trial and error
- Minimal human labeling
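Here is a minimal sketch of trial-and-error learning with self-generated labels. It uses an epsilon-greedy bandit, a standard technique chosen here for illustration and not one the text prescribes. The robot repeatedly picks one of several grasp strategies, checks for itself whether the grasp succeeded, and updates its estimate of each strategy's success rate. The success probabilities stand in for the real world and are hypothetical.

```python
import random


def learn_by_trial_and_error(success_probs: list[float], trials: int = 2_000,
                             epsilon: float = 0.1, seed: int = 0) -> tuple[list[float], list[int]]:
    """Epsilon-greedy trial and error over grasp strategies.

    `success_probs` simulates the world (hypothetical values); the robot never sees them.
    Each outcome is labeled by the robot's own success check, not by a human.
    """
    rng = random.Random(seed)
    estimates = [1.0] * len(success_probs)  # optimistic start so every strategy gets tried
    counts = [0] * len(success_probs)
    for _ in range(trials):
        if rng.random() < epsilon:
            choice = rng.randrange(len(success_probs))                       # explore
        else:
            choice = max(range(len(estimates)), key=estimates.__getitem__)  # exploit
        success = rng.random() < success_probs[choice]  # e.g. "is the object still in the gripper?"
        counts[choice] += 1
        estimates[choice] += (float(success) - estimates[choice]) / counts[choice]  # running mean
    return estimates, counts


estimates, counts = learn_by_trial_and_error([0.2, 0.5, 0.8])
print("estimated success rates:", [round(e, 2) for e in estimates])
print("attempts per strategy:  ", counts)
```

No human labels any attempt. The success signal comes from the robot's own check, which is the sense in which this kind of learning needs minimal human labeling.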
8.2 Shared Learning Systems
Future robots may:
- Share experiences
- Learn collectively
- Update models globally
This accelerates learning across fleets.
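One concrete way a fleet could "update models globally" is federated averaging. Each robot trains locally, and only the model weights are combined, weighted by how much experience each robot contributed. This is a named stand-in technique, not necessarily what any particular company uses. The sketch assumes every robot runs the same model architecture, with its weights flattened into a list.

```python
def federated_average(updates: list[tuple[list[float], int]]) -> list[float]:
    """Combine per-robot model weights into one global model (FedAvg-style).

    Each update is (weights, num_local_examples); robots with more experience
    get proportionally more influence on the global model.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Hypothetical fleet: three robots, each trained locally on its own episodes.
fleet_updates = [
    ([0.10, 0.50], 100),  # robot A: 100 episodes
    ([0.30, 0.70], 300),  # robot B: 300 episodes
    ([0.20, 0.60], 100),  # robot C: 100 episodes
]
global_weights = federated_average(fleet_updates)
print(global_weights)
```

Robot B contributes 60% of the episodes, so the global weights land closer to its values: about [0.24, 0.64], up to floating-point rounding.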
8.3 Hybrid Approaches
Combining:
- Simulation
- Real-world data
- Human guidance
offers the most practical path forward.
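A hybrid pipeline has to decide how much of each training batch comes from each source. This sketch splits a batch by fixed ratios, using largest-remainder rounding so the counts always add up to the batch size. The 7:2:1 ratio is an arbitrary assumption. In practice the mix would be tuned, and it might shift toward real-world data as more of it is collected.

```python
import math


def mix_batch(batch_size: int, ratios: dict[str, float]) -> dict[str, int]:
    """Split one training batch across data sources in proportion to `ratios`,
    using largest-remainder rounding so counts always sum to `batch_size`."""
    total = sum(ratios.values())
    exact = {src: batch_size * r / total for src, r in ratios.items()}
    counts = {src: math.floor(x) for src, x in exact.items()}
    leftover = batch_size - sum(counts.values())
    # Hand the remaining slots to the sources with the largest fractional parts.
    for src in sorted(exact, key=lambda s: exact[s] - counts[s], reverse=True)[:leftover]:
        counts[src] += 1
    return counts


# Hypothetical mix: mostly cheap simulation, anchored by real-world data and human guidance.
mix = mix_batch(256, {"simulation": 7, "real_world": 2, "human_demos": 1})
print(mix)
```

For a batch of 256, this yields 179 simulated samples, 51 real-world samples, and 26 demonstration samples.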
9. The Strategic Insight: Data > Models
The key insight is simple but profound:
Better models are not enough without better data.
In robotics:
- Data defines capability
- Experience defines intelligence
- Deployment defines success
10. Rethinking Progress in Robotics
Progress should not be measured by:
- Benchmarks
- Demos
- Prototype performance
But by:
- Real-world reliability
- Adaptability
- Scale of deployment
Conclusion: The Slow Path to Real Intelligence
The future of humanoid robots will not be determined by breakthroughs in algorithms alone.
It will be shaped by:
- Data collection
- Real-world experience
- Iterative learning
In this sense, robotics is less like software—and more like raising a child.
It learns slowly.
It makes mistakes.
It improves through experience.
And that is precisely why progress feels slower—but may ultimately be more profound.