Cowork's Breakthrough: Integration and Batch Processing


Following yesterday's attempts, I wondered if it would be possible to use Claude's Cowork or Gemini's Antigravity for batch processing, and then build a website to view the evaluation and scoring results for each photo.

First, let's look at Cowork.

To batch process 342 images in a directory, Cowork assessed the quantity and opted for parallel processing.

It finished in about ten minutes.

Processing Completed
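I can't see how Cowork split the work internally, so the following is only a minimal sketch of what "parallel processing over a directory" can look like in Python. The function `evaluate_image` is a hypothetical placeholder of mine: a real run would send each photo to a model, while this stub only records the file size and leaves the score empty.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(directory):
    """Return the image files in a directory, sorted for a stable order."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def evaluate_image(path):
    """Hypothetical placeholder for the per-photo evaluation step."""
    # Assumption: a real implementation would call a model here.
    return {"file": path.name, "bytes": path.stat().st_size, "score": None}


def evaluate_directory(directory, max_workers=8):
    """Evaluate every image in the directory using a pool of worker threads."""
    images = list_images(directory)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input list.
        return list(pool.map(evaluate_image, images))
```

Threads suit this kind of job because the slow part is normally waiting on a remote model, not local computation.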

To be honest, I found the evaluation results somewhat unreasonable: they show a clear bias toward certain equipment. Is newer always better? Is a larger sensor always better? Obviously not, but that's not the topic of this post.

The website was also built quickly. The evaluations are evidently rather sloppy, which makes me suspect the model didn't actually process the images one by one.

Automatically Generated Evaluation Website
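The site Cowork generated is its own work and I haven't inspected how it was built. As a rough illustration of the idea, a results page can be as simple as one static HTML file rendered from the scoring data. The field names `file`, `score`, and `comment` below are my assumptions, not Cowork's actual output format.

```python
from html import escape


def render_gallery(results, title="Photo evaluations"):
    """Render evaluation results as a single static HTML page.

    Each result is assumed to be a dict with 'file', 'score' and 'comment'.
    """
    cards = []
    # Highest-scoring photos come first.
    for item in sorted(results, key=lambda r: r["score"], reverse=True):
        src = escape(item["file"], quote=True)
        comment = escape(item["comment"])
        score = f"{item['score']:.1f}"
        cards.append(
            f'<figure><img src="{src}" alt="" width="320">'
            f"<figcaption><strong>{score}</strong> {comment}</figcaption></figure>"
        )
    head = f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>'
    body = "<body>\n" + "\n".join(cards) + "\n</body>"
    return f"<!doctype html>\n<html>{head}\n{body}</html>\n"
```

Writing the returned string to an `index.html` next to the photos is enough to browse the scores locally; the escaping keeps model-written comments from breaking the markup.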

Regardless, it's complete, and details can be updated incrementally.

For instance, they can be updated with Gemini's Antigravity.

Updating with Gemini

Gemini Processing

Clearly, this one worked diligently. However, 342 images might have been too many; it only updated about forty or so.

Still, compared to the past, this is a massive leap forward.

(I have since optimized some processes to allow "Antigravity" to perform complete processing, but that's not the theme of this post; I'll demonstrate it later if there's a chance.)
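As a general illustration, and not the optimized process mentioned above, one common way to keep a long batch from stalling partway is to work in small chunks and record finished files in a checkpoint, so the next run picks up where the last one stopped. Everything here is hypothetical: the checkpoint file, the `handle` callback that stands in for the per-photo evaluation, and the chunk size of 40.

```python
import json
from pathlib import Path


def load_done(checkpoint):
    """Load previously finished results, or an empty dict on the first run."""
    path = Path(checkpoint)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def process_in_chunks(files, checkpoint, handle, chunk_size=40):
    """Process only files missing from the checkpoint, saving after each chunk."""
    done = load_done(checkpoint)
    pending = [name for name in files if name not in done]
    for start in range(0, len(pending), chunk_size):
        for name in pending[start:start + chunk_size]:
            # handle() is the caller-supplied evaluation step (an assumption).
            done[name] = handle(name)
        # Persist progress so an interrupted run loses at most one chunk.
        Path(checkpoint).write_text(
            json.dumps(done, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return done
```

Calling `process_in_chunks` repeatedly with the same file list is safe: files already in the checkpoint are skipped, so a run that stops after forty photos can simply be started again.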

If it were just these features and the capabilities shown last time, Cowork wouldn't have surprised me much. The following two examples fall within what I had predicted early on, but they show that thinking of something is one thing and actually implementing it is quite another.

For my quarterly in-person (or online) exchange sessions, I prepare a set of materials myself, and AI has helped me a lot. In a previous article on Cowork, I demonstrated how to export a PDF into a series of images. Today I added a requirement: can it produce a .doc document by automatically selecting images, writing the text, and laying both out in the document?

Yes, it did it, all in one go.

Cowork Automated Document Generation Flow

Document Generation Details

Furthermore, it was rich in both text and images, effectively filtering the content in my directory.

Resulting Illustrated Document

This level of completion and fluency is something that even GPT-based Copilot hasn't achieved yet.

Of course, I no longer use Word docs, so it would be even better if it could write directly to Google Docs.

However, since Google Docs is usually opened via a browser, I used the following connector.

Google Doc Connector

Then followed a series of human-mimicking operations: opening the browser directly, entering Google Workspace, creating a new doc, and typing content.

Mimicking Human Browser Operations

Typing in Google Doc

However, it seems Cowork only knows how to "Insert Image" and not how to copy an image from a folder and paste it into the document. Thus, the "illustrated" part failed here.

That concludes the demonstration. As I've always said, none of this is particularly surprising. However, imagining or seeing it is one thing; achieving it is another.

In terms of practical application, Cowork has achieved another major breakthrough. It can work for longer periods, so batch processing brings it closer to what an "assistant" is really supposed to be. By using MCP and Skills, it can use and connect more tools, delivering a "90% finished product" and leaving the final edits and decisions to humans.

Yet if you ask me whether it's a disruptive product, I still have some doubts. Cowork relies on two foundations: first, the user needs very clear goals; second, Cowork itself must be able to call various tools at the OS level with ease, such as command-line tools, browsers, and different document types. However, now and for the foreseeable future, all sorts of unexpected situations will still arise when connecting these tools. Our desktops are not naturally an "ideal" operating environment for Cowork. Every new task starts from scratch, and only the users themselves can provide enough "fault tolerance."

One viewpoint of mine remains unchanged: humans are the key to AI implementation, whether as a "booster" or a "barrier."
