The Best AI App Powered by Gemini 2.0 Is Here, and It's Completely Free


Over the past two years, my most important work platforms have been Obsidian and the Excalidraw for Obsidian plugin. Although AI features can already be integrated into the plugin, the solutions remain quite complex.

It's not that I haven't considered writing a custom plugin for Obsidian better suited to my needs, but with my current volume of data, Obsidian has basically reached its performance limit.

The main issue is that Obsidian is file-system-based. While its closed-loop system allows for custom plugins, changing the backend is very difficult.

Half a year ago, I turned my attention to two open-source alternatives: tldraw and Blocksuite. However, while attempting the rewrite, I found tldraw's backend functionality relatively weak: it has immense potential, but the development workload is huge. Blocksuite's document structure is complex and, because it is driven by the commercial version Affine, the code changes drastically with every update.

I have always believed that as model capabilities continue to improve, many complex development architectures can be optimized. Current "pain points" might soon be resolved due to enhanced model power. Consequently, I spent over six months in a state of internal conflict: wondering if the solutions I was considering were worth the time to implement or if I should wait for others to do it.

Then, at the end of the year, "bliss" arrived: tldraw computer.

In the AI era, all we need is a blank Canvas. This view is gaining more recognition, and more and more products are moving in this direction.

However, there is no doubt that it was only after the launch of Gemini 2.0 that this "possibility" truly became a reality:

  1. We need a model that can truly handle various modalities and complete diverse tasks;
  2. We need to build simple, beautiful, and high-performance interactions on top of such a model backend.

Credit for the first goes to Google DeepMind, and credit for the second goes to the tldraw team.

Here is a screenshot of a simple example:

image

Story Generation --> Image Generation --> Audio Narration --> Audio Transcript.
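To make the flow above concrete, here is a minimal sketch that reads the arrows literally as a linear chain, where each step consumes the previous step's output. It is not how tldraw computer is implemented, and it calls no real model: the function names (`generate_story`, `generate_image`, `narrate`, `transcribe`) and the `run_chain` helper are hypothetical stand-ins that only tag their input, so the data flow is visible.

```python
from typing import Callable, List

# A stage takes the previous stage's output and returns its own.
Stage = Callable[[str], str]


# Hypothetical stubs: each one would be a model call in a real tool.
def generate_story(prompt: str) -> str:
    return f"story({prompt})"


def generate_image(story: str) -> str:
    return f"image({story})"


def narrate(source: str) -> str:
    return f"audio({source})"


def transcribe(audio: str) -> str:
    return f"transcript({audio})"


# The four steps, in the order given in the text.
STAGES: List[Stage] = [generate_story, generate_image, narrate, transcribe]


def run_chain(stages: List[Stage], seed: str) -> List[str]:
    """Feed each stage the previous output and record every intermediate result."""
    value = seed
    trace: List[str] = []
    for stage in stages:
        value = stage(value)
        trace.append(value)
    return trace


if __name__ == "__main__":
    for step in run_chain(STAGES, "a cat"):
        print(step)
```

On a canvas, each entry in `STAGES` would be a component and each hand-off an arrow. The appeal of the tool is that none of this plumbing has to be written by hand.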

Starting with ComfyUI last year, I have used many low-code AI tools, but none matches tldraw computer at being both powerful and simple.

  1. As mentioned above, it supports all modalities;
  2. The model requires no configuration: just use it;
  3. No parameters need to be adjusted for any component;
  4. Component types are very simple, with only five: Prompts, Text, Image, Audio, and Flow Control.

image

It can even perfectly handle loops and branching logic.

It also offers an equivalent of Claude 3.5's strongest feature: Artifacts.

image

On the right side of the image above is a web-based slide deck (PPT) that was generated directly, with Gemini 2.0 writing the code.

Next, I'm going to try having one model play chess against another.

So far, this is the AI tool that best fits my own use case.

I also believe many exciting updates are coming:

  1. Open Source: tldraw itself is open-source. I believe that once this experimental tool matures, open-sourcing the code is highly probable.
  2. Support for interaction with embeds: tldraw supports a range of embeds. Integrating the data flow between them would be "perfect."

image

  3. Voice or chat interface interaction: If we could skip manually drawing and dragging components, and instead generate complex workflows directly through dialogue, that would be "god-tier."

None of the above should be hard to build.

Thanks to Google DeepMind for Gemini 2.0: not just for the model's capability, but for opening up the APIs for all modalities and making them free.
