What Shape Is Physical AI?

Seeing this title, many readers can probably already guess what I want to say in this piece. Yes, it’s a question that has been debated for quite a while: does “Embodied AI” necessarily have to be humanoid?

The catalyst, of course, was the predictable surge of bizarre “robots” at CES, fueled by the Physical AI wave. But the question has also been circling in my mind for some time, and I have developed some preliminary views.

Before I lay out my perspectives, I need to make some "disclaimers":

Regardless of how concepts are defined, I remain extremely bullish on "Physical AI." In fact, I was likely one of the first in the market to be optimistic about embodied intelligence and "robots."

My views are merely the observations and reflections of an independent individual; they are not supported by extensive, solid data or empirical evidence. They are neither representative nor rigorous.

The capital market has its own laws of operation, and the core variables people care about differ at each stage. At this moment, my views certainly have nothing to do with core variables like "Tesla chain orders."

Now, I can share my "quick views":

Intuitively, a robot that begins to “look like a human,” or the “ChatGPT moment” of robotics (yes, the expression really is that imprecise), is at least five years away. After some rough calculation and further thought, I have revised that answer to at least ten years.

In terms of “imagination,” Physical AI or embodied intelligence must be “humanoid”; otherwise its imagined market value might shrink to less than one-tenth. In the development of the human world to this point, the only things worth going “all-in” on are “creating humans” or “creating gods.” Few would loudly proclaim “we only need tools,” even though, in reality, tools are exactly what we want.

More and more people are clearly seeing the limitations of LLMs. "Phenomenal intelligence" based on data compression or knowledge compression simply cannot reach the kind of "intelligence" we humans perceive, even though we can't quite define what human intelligence is.

Of course, even if it is "humanoid," I firmly believe that such intelligence belongs to an entirely different dimension than human intelligence. Physical AI will have its own answers. However, for now, the biggest obstacle seems to be the severe lack of data—I once roughly calculated that it’s short by at least ten orders of magnitude.
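For what it’s worth, a “ten orders of magnitude” figure of this kind can be reproduced with back-of-envelope arithmetic. Every number below is my own illustrative assumption (the scale of LLM text corpora, a guess at how much embodied “experience” an equivalent Physical AI would need, and the size of today’s public robot datasets), not a figure from any actual measurement:

```python
import math

# Back-of-envelope sketch of the embodied-data gap. All numbers are
# illustrative assumptions, chosen only to show the shape of the estimate.

# Rough scale of a modern LLM training corpus, in tokens (assumption).
llm_corpus_tokens = 1e13

# Assumption: a Physical AI coping with continuous, high-dimensional
# sensorimotor streams needs far more raw "experience units" than an LLM
# needs text tokens -- say a factor of 1e6 on top of the corpus scale.
needed_experience_units = llm_corpus_tokens * 1e6   # ~1e19

# Assumption: today's public teleoperation/demonstration datasets amount
# to roughly 1e4 hours, at roughly 1e5 usable units per hour.
available_units = 1e4 * 1e5                          # ~1e9

gap_orders = math.log10(needed_experience_units / available_units)
print(f"data gap: about {gap_orders:.0f} orders of magnitude")
```

Different assumptions move the gap up or down by several orders; the only point is that plausible inputs produce a gap of roughly this size.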

The physical world we inhabit is prepared for humans, or rather, modified by humans to suit ourselves. If we truly create another kind of “human,” it should take a completely different form, just as cars replaced horse-drawn carriages, electricity replaced muscle power, and the internet replaced traditional communications. And, to head off pedantry: “replacing” never means 100% elimination.

To be more specific: many objects in our daily lives are said to follow ergonomic principles, and their value and form exist to be “used by humans.” But it does not follow that, because humans have such dexterous hands, future Physical AI needs “dexterous hands” too. These may all be transitional forms.

This leads to the final question regarding complexity and the speed of progress. If complexity is too high and progress is too slow, transitional forms will persist longer, and perhaps eventually, "existence becomes its own justification."

I am increasingly inclined to believe that complexity is significantly higher than we expected, and it is likely that as we solve more problems, more challenges will emerge. This aligns with the basic facts of modern technological progress over the past few centuries.

So, if transitional forms are viable in the long term, I have some "imprecise" inferences:

We are still very, very far from "humanoid robots" suitable for multiple scenarios. The realistic choice is "machines" for specific scenarios.

As a Chinese person, I can see clearly that the highly developed domestic manufacturing industry has already compressed “pure manufacturing costs” (excluding materials, energy, etc.) to the limit, leaving very little room for “robots” to add value there. This was also the biggest point of skepticism when “embodied intelligence” was first discussed.

Much of the imaginative space lies in life scenarios. However, the complexity of life scenarios is extremely high, requiring the handling of many corner cases. In fact, the demand for "intelligence" is very high. Even for phenomenal intelligence, we still lack data by at least ten orders of magnitude. Some data can be solved through simulation and synthetic data, but most cannot. If it could, we would probably have reached L5 autonomous driving by now.

There are two significant differences between “robots” in life scenarios and smart-driving cars. First, driving is essentially a relatively closed scenario with a well-defined objective function; every problem encountered has an “optimal solution,” and there is no obvious mathematical or logical obstacle on that path. Second, every car sold since the early days has had a cost-bearer: consumers need cars, bear all the costs, and supply manufacturers with abundant data through their daily driving. That flywheel has been spinning for a long time and is now at a moment of accelerating breakthrough. What is the equivalent for “robots” in life scenarios?

What is the transitional form? I have some ideas but don't dare to discuss them too much in public. Perhaps I can only hint at it: under the same standard of technical reliability, is it easier and cheaper to implant very inexpensive automatic control modules into various doors, or is it easier and cheaper to teach a "robot" how to handle various "door handles"?
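The door question lends itself to the same kind of toy arithmetic. All costs below are placeholders I made up; the structure of the comparison, a fixed per-door retrofit cost versus a large one-time cost for general manipulation at equal reliability, is the only point:

```python
# Toy comparison: retrofit every door with a cheap actuator, or teach one
# "robot" to handle arbitrary door handles? All figures are made-up
# placeholders for illustration only.

num_doors = 20                  # doors in a hypothetical building
actuator_unit_cost = 50.0       # assumed cost per simple door-opening module
retrofit_total = num_doors * actuator_unit_cost          # scales with doors

dexterous_hand_cost = 10_000.0    # assumed hardware cost for one robot
training_data_cost = 100_000.0    # assumed cost to reach equal reliability
robot_total = dexterous_hand_cost + training_data_cost   # one-time cost

cheaper_option = "retrofit doors" if retrofit_total < robot_total else "general robot"
print(cheaper_option, retrofit_total, robot_total)
```

Of course the robot amortizes across every task, not just doors, which is exactly why the comparison only settles the narrow question of doors at today’s assumed costs.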

I particularly like a sentence that has been echoing in my mind since the second half of last year: "This world, ultimately, is physical." I also loved the feeling I had twenty years ago when I first opened an Economics textbook: "Isn't this just 'General Physics'?"

Of course, perhaps adding the second half would make it more complete: "The kind that lacks 'predictability'."

But it is precisely this “a posteriori” nature that may offer more value to human society, because this world, after all, belongs to “humans.” We cannot escape that “invisible hand.”
