Finally, the moment has arrived: I no longer need to write while distracted by thoughts of presentation. A whole series of tools has upgraded from Gemini 2.5 to Gemini 3, and Nano Banana Pro has nearly solved the problems of text rendering and the authenticity of visual elements.
Over the past decade, I've used a multitude of powerful text editors and note-taking tools, from the early days of Notion to Obsidian over the last five years. I even attempted to build several editors myself, yet just as I thought I was nearing completion, I ultimately returned to Google Docs. In truth, it could be any text editor; even vi would suffice. The reason for returning to Google Docs is simple: if the future digital world is model-driven, there isn't actually much that humans need to record for themselves beyond text, voice, photos, and videos.
The priority is no longer how to record, but how to preserve. Google Docs syncs with Google Drive at all times and occasionally allows for Gemini-driven generation and refinement; there are no redundant steps.
Other text editors require setting up Google Drive synchronization and ensuring a consistent client experience across all PCs and phones. Everything should tend toward simplicity, not complexity.
Voice should be a more efficient tool than text; the input efficiency of dictated memos on a phone is certainly higher. However, a text editor serves as excellent "working memory": you can review the preceding text at any time, stop to think, revise and reorganize your words, or continue. Voice carries a heavy temporal attribute; even with real-time transcription, efficiency drops drastically once you need to reorganize or edit, sometimes falling below that of text. Yet voice is an excellent prompter. Many of my inspirations come from the outdoors, when I'm not facing a screen: on buses, on subways, looking at the world through a camera, exercising, or just staring out the window...
Text is typically System 2, a slow thinking process. Voice is typically "attention," composed of keywords; I'm not sure whether it counts as System 1 or System 2. Perhaps both.
The above preamble was written without any psychological pressure. The continuous progress of models and the expanding boundaries make me increasingly believe in two things: 1. Symbols do not necessarily represent intelligence, but they are the most important carriers of it; 2. Today's AI is taking massive strides toward becoming an unprecedented "executor." I am not yet certain if this will lead to the intelligence or AGI everyone expects, but this incredibly solid step will produce an impact far greater and more profound than any previous industrial revolution.
At the very least, the question humans should seriously consider now is not whether they will be replaced by AI, but how we can effectively divide labor with this new species. Since I can currently write text without inhibition, in the next three or four hours of foreseeable peace, I will record my past practices, experiences, and thoughts in text. Whether it is systematic doesn't matter—my "assistant" will help me organize it. Whether the layout looks good doesn't matter either—it will handle that too, even producing clear and concise graphics.
So, where to begin?
Let’s start with "White Space." Over the past three years, AI has continuously improved my efficiency and expanded my boundaries. This has become an obvious reality, but it is simultaneously doing the same for every serious user. From a competitive perspective, technological progress is fair. Therefore, its value to me has never been about what I produce, because no single output is strictly necessary. My work is merely helping the world consume energy faster, accelerating "entropy increase."
However, if I didn't engage in this "ineffective yet efficient" labor, I might lose many opportunities for thought experiments, lose the practical cognitive iterations of "symbolism," and lose the chance to increasingly feel the clear dividing line between thinking and execution.
Therefore, "white space" to me means vacating large amounts of time for "doing nothing," or pushing the boundaries of AI execution rather than executing in place of AI. To me, white space means using my eyes to see more and my brain to wander more: what I see will be scattered, and the wandering will be sparsely distributed.
Today's AI's greatest value to us is its ability to use mathematical models and technical means, which we needn't worry about, to assign coordinates to these discrete, sparse points. This helps both the model process them and humans observe them. This is essentially the true role of embedding: although the Chinese translation suggests "insertion," it really means anchoring, or setting coordinates. Model pre-training is the process of assigning increasingly precise coordinates to more and more information. Once all information is anchored, what we see is a very high-dimensional, sparse space, filled with white space.
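As a minimal sketch of what "anchoring" means: an embedding simply assigns each piece of information a coordinate, after which closeness of meaning becomes a measurable distance. The vectors below are invented for illustration; real models use thousands of dimensions and learn these coordinates during pre-training.

```python
import math

# A toy "embedding" table: each word is anchored at a coordinate in a
# 3-dimensional space. (These vectors are made up for illustration;
# a real model learns them during pre-training.)
EMBEDDINGS = {
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.1],
    "stone": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Similarity of two coordinates: close to 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Once anchored, "closeness of meaning" is just geometry:
# cat sits much nearer to dog than to stone in this space.
cat_dog = cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_stone = cosine(EMBEDDINGS["cat"], EMBEDDINGS["stone"])
```

Everything else (search, clustering, the model's own attention) is built on comparing such coordinates; the space itself is mostly empty, which is the white space the text describes.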
This is essentially the fundamental belief of symbolism: if we can assign precise enough coordinates to enough information, it should be intelligence itself. That should have been enough, but once anchored, AI is free; its white space is frictionless, so it "hallucinates" and speaks without inhibition, which frightens people.
Consequently, we have post-training—a form of human-corrected discipline: some phrasing doesn't fit human grammar and must be changed; some values are incorrect and cannot be spoken...
Humans cannot stand white space; we cannot bear the unshielded and unmasked. We must erect walls, build bridges, and use so-called "Chain of Thought" to make AI appear "intelligent."
It’s quite absurd, yet we all like it. Including myself, we want AI to be a qualified, efficient "beast of burden"—a great executor. Thus, we brought out the long-awaited word: Agent. The English meaning is accurate—proxy or executor—but our profound Chinese language had to make it more grand: adding a physical concept to intelligence, calling it a "Smart Body" (智能体).
In reality, we don't want these "Smart Bodies" to possess actual "intelligence." Isn't that an irony everyone, myself included, revels in?
The irony is merely in the naming; it doesn't really matter. At least under such "discipline," it can return to a normal track. Proof of model progress becomes more understandable: constantly refreshing benchmarks, working longer hours at lower costs, and completing more tasks.
So the question arises: Will AI replace humans?
Put bluntly, the answer is definitely yes. Although human warmth is important, people's acceptance of AI customer service and AI streamers seems higher than we anticipated. Companies, especially tech firms, are seeing large numbers of entry-level technical and middle-management roles vanish. The issue of unemployment is being discussed more and more. Of course, there are counter-arguments: every industrial revolution eliminated many jobs but created many others.
I agree with both sides. But to me, the problem is simply time. The replacement of weavers by steam-driven looms lasted decades, while creating many machine-operator roles. The impact of electricity and internal combustion engines also lasted a long time and created many employment scenarios. The internet revolution shortened job displacement to a decade; while it created e-commerce, logistics, ride-hailing, and live-streaming, the global division of labor it drove significantly impacted traditional manufacturing.
Yet the Fourth Industrial Revolution we are experiencing is evolving in six-month cycles. The mass replacement of traditional customer service, junior programmers, and junior designers is already happening. Many in these roles relied on three or four years of professional education to adapt; expecting them to find and adapt to new roles in less than six months is difficult, even a low-probability event.
In a game of time, many people may not possess the ability to convert quickly.
When I attend industry conferences and hear almost everyone discussing how much labor their products can replace, I don't feel happy. I know it's almost inevitable, yet it is tragic.
I don't want to dwell on this for too long. Ultimately, doing one's best to manage one's own corner of the sky is already an exhaustive effort for many. I believe humanity will find a solution; history has proven that humans can adapt to any change.
However, our generation (born in the 70s and 80s) is both lucky and unlucky.
Our luck lies in this: if you grew up earnestly following the "standard," doing standardized work over and over, improving your scores on standardized tests (whatever the prestige of the university), and then working according to standardized patterns, then as long as your luck wasn't terrible, the results were generally passable, even if the variance was large.
Our misfortune, however, is that those very compliant standards have become the best reference samples for today's AI. Our hard-working pasts have become specific milestones. When decades of exhaustive effort are turned into snippets of binary code, we all become members of a long line waiting for judgment.
There’s no need to peddle anxiety, no need to trick people into thinking "it's over if you don't learn AI," and no need to criticize the educational system. It’s just that the good times of the past few decades gave us a groundless confidence in linear extrapolation, and that the human-designed academic ladders and career plans are starting to "backfire."
It is we who gave up interesting souls and critical thinking; it is we who have been constantly pursuing the "law of large numbers."
However, the higher institution that claimed to cultivate "free and useless souls" has changed.
I've always found neural network training fascinating. Whatever the optimizer, underneath it is still gradient descent. In human society, gradient descent is essentially following the path of least resistance. The accumulation of every locally easiest step, however, might land miles away from the optimal path. That's okay, though: gradient descent offers the strongest certainty, and maximal certainty is the easiest way to gain trust within a social division of labor.
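The point about the path of least resistance can be seen in a toy sketch (the loss landscape and constants below are invented for illustration): the locally steepest downhill step always converges somewhere with great certainty, but starting on the wrong side of a hill, it settles into the nearby shallow valley and never finds the deeper one.

```python
def f(x):
    """An invented double-well landscape: two valleys, one deeper."""
    return (x * x - 1) ** 2 + 0.3 * x

def grad_f(x):
    """Analytic gradient of f."""
    return 4 * x * (x * x - 1) + 0.3

def gradient_descent(grad, x0, lr=0.02, steps=500):
    """Repeatedly take the locally steepest downhill step."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Starting at x0 = 0.9, the path of least resistance ends in the
# nearby shallow valley (near x ~ 0.96), even though a deeper valley
# (near x ~ -1.04) exists on the other side of the hill.
shallow = gradient_descent(grad_f, 0.9)
deep = gradient_descent(grad_f, -1.2)
```

Each run is perfectly certain and reproducible, which is exactly the trade the text describes: maximal certainty, with no guarantee of the best destination.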
Thus, humans and models have formed a closed loop on this point. AI is perfectly suited for "involution" (internal competition), isn't it?
This question, it seems, can only be addressed up to this point.
This is another example of AI moving further away from "human intelligence," and the quotes matter: what is moving away from us is the thing we merely thought was intelligence.
I return to a question I thought I had an answer to: education. Or realistically, what should children learn? Which major should they choose? I used to be firm on math and computer science. Yet, I don't know when I became less certain. Although I still believe these are the most important foundational skills for the future, when we discuss "choosing a major," we are usually considering a very realistic problem: employment.
AI is invading human jobs industry by industry, no matter how beautifully we phrase it.
Unfortunately, I didn't finish what I intended to within that three-to-four-hour window of uninterrupted time. Perhaps I never really knew what I wanted to finish.
Instead, in the three days following that window, I completed much other "work."
My rhythm has increasingly become: either finish in one go or let it end in an uninspired mess.
Looking back at these words three days later, I no longer have that specific emotion, and thus I’ve lost the possibility of extending that train of thought in a short time.
I seemed to want to talk about education. I seemed to think I had an answer.
But perhaps I don't.
Maybe it's just because I can't find a "perfect answer" that convinces myself.
Just as everyone desires a perfect lens: razor-sharp focus, creamy bokeh, a soft and painterly style, incredibly sharp yet possessing perfectly natural transitions—every detail rendered clearly where needed, and every artistic touch feeling unforced. In a word: every pixel must be objectively and subjectively perfect.
Yes, such a lens does not exist.
But AI image generation will certainly be able to do it in the near future.
I imagined many ways this article might end, but none of those visions were even remotely close to the current state. Because this is just a picture.
I quite like this one. In the past, I would have thought about how to capture a slightly better state. Now, imperfection is fine. All our results will eventually be meaningless, but we might remember every moment we pressed the shutter.