Sep 3, 2025

After the Release of ChatGPT's "Assistant" Feature: OpenAI Is One Piece of Hardware Away from an Agent

OpenAI has rolled out a new ChatGPT assistant feature named "Tasks."

Simply put, it lets users schedule timed tasks for the model, which runs them at the set time and returns the results. A task could be an event reminder, for example, or a daily news summary.

The example above shows a task to search for the latest news and return the results. To test it immediately, I set the execution time to one minute after the current time.

A minute later, it indeed returned the latest news results.
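To make the idea of a timed task concrete, here is a minimal sketch of the same pattern using Python's standard `sched` module. It is only an illustration of "run something at a set time and hand back the result", not how OpenAI implements Tasks. The names `fetch_latest_news`, `schedule_task`, and `run_demo` are hypothetical, and the news lookup is a placeholder that returns a fixed string.

```python
import sched
import time
from datetime import datetime, timedelta


def fetch_latest_news():
    # Hypothetical placeholder: a real task would call a search or news service.
    return "placeholder news summary"


def schedule_task(scheduler, run_at, task, results):
    # Convert the wall-clock target into a delay from now (never negative).
    delay = max(0.0, (run_at - datetime.now()).total_seconds())
    # When the delay elapses, run the task and store whatever it returns.
    scheduler.enter(delay, 1, lambda: results.append(task()))


def run_demo(delay_seconds=60):
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    results = []
    run_at = datetime.now() + timedelta(seconds=delay_seconds)
    schedule_task(scheduler, run_at, fetch_latest_news, results)
    scheduler.run()  # Blocks until every scheduled task has run.
    return results
```

Calling `run_demo()` mirrors my test: the task is set for one minute from now, and the call returns a list holding the task's result once that minute has passed.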

This is probably the most practical way to use the feature today. In theory, tasks could be more complex: for instance, connecting to a personal OneDrive or Google Drive, running calculations on the latest documents, and then returning the results.

With this idea in mind, I tried a small experiment. Since ChatGPT is already connected to OneDrive, I gave it this task: write a randomly generated poem to OneDrive every day.

Unfortunately, although GPT started the task on time, it could not complete the file-writing operation. The reason is simple: it could not obtain write permissions.

As far as the task's execution itself was concerned, however, nothing went wrong: it started on schedule and failed only at the write step.

What about the "Apps" tool feature OpenAI added recently?

I connected it to iTerm2, hoping it could open the terminal and run "ls." Instead, I received a "reminder" every half hour.

I tried connecting to Cursor as well, but it was just one reminder after another.

Indeed, it possesses no capabilities beyond the application itself; it can only provide repeated reminders.

OpenAI can certainly continue its quest for the "sea of stars" by building more complex models or, as it is doing now, constantly adding features to evolve ChatGPT into a super app.

However, the AI era does not need a super app. An app that cannot transcend its own permissions is merely a generative assistant tool. An agent that can only offer advice but cannot directly execute tasks cannot be called a true Agent.

Perhaps the gap between OpenAI and a true Agent is simply a piece of hardware.

Something truly driven by GPT, rather than just an API call behind a ChatGPT icon.

For voice interaction, ChatGPT already has the Advanced Voice Mode.

For video chat, the current ChatGPT app allows for Q&A sessions with the camera on.

Reasoning models? o1's performance is already excellent.

Education? Interaction is possible via the camera, and it can assist with homework hints while providing companionship.

Perhaps OpenAI’s true ambition lies in humanoid robots with General Intelligence.

Yet watching OpenAI cram one feature after another into its app, each limited by the app's form factor, clearly wanting more while keeping up a facade of aloofness, I feel a bit sad for it.