Why Does Meta's Open-Sourcing of the 405B Model Matter?

Just as the leaks suggested, Meta has released Llama-3.1, with the greatest expectation naturally surrounding the 405B model, which reaches the level of GPT-4o.

Of course, the 8B and 70B models, with significantly improved capabilities, were released at the same time. Information on model usage and performance evaluations should keep rolling in over the coming days. Within less than 12 hours of the models being released for download, each model saw over 5,000 downloads.


I also briefly discussed the scores disclosed by Meta in yesterday's article: Llama-3.1? 405B parameters? Open source but the best model?

Since I haven't had time to run my own evaluation yet, it would not be appropriate to make subjective judgments about its capabilities; I plan to publish another article after deploying the 405B model for inference sometime next week. However, after reading Mark Zuckerberg's open letter today ("Open Source AI Is The Path Forward"), I feel it is necessary to seriously discuss one question: Why is Meta's release of Llama-3.1 today so important?

First, Zuckerberg answered this question in his open letter, using the Linux open-source ecosystem as an example. He believes that:

  1. In the field of large models, open-source models will become the industry standard and the most important foundation;
  2. Through open source, developers can meet specific needs, protect data, reduce costs, and avoid reliance on closed vendors;
  3. For the world at large, open-source models will help stimulate innovation, drive the widespread application of AI technology, and promote transparent and fair technological development.

Open Source AI Is the Path Forward

https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/

Regardless of individual evaluations, Meta is increasingly becoming one of the most important forces in the field of large models. A key reason is founder Zuckerberg's intense personal attention: he has even moved to the front lines to oversee progress himself. Zuckerberg's open letter is well-written, with a grand vision, and is well worth a careful read.

Although this is naturally driven primarily by commercial "calculations," the open-sourcing of the LLaMA-3.1-405B model still holds significant meaning:

  1. This is the first "widely recognized" open-source model (open weights) to reach the level of GPT-4o/Claude 3.5. In just over a year, the best open-source models have closed the gap with the best closed-source models. In hindsight, today may well come to be defined as the "true beginning of the new AI era" (access to closed-source models is inherently unequal, whereas open-weight models have no barriers to access);
  2. LLaMA-3.1 was trained on 15T tokens, which essentially means all public text information on the internet has been "compressed" into the model. If GPT-4(o) is an "encyclopedia" with reasoning capabilities, then the release of LLaMA-3.1-405B means everyone can own a "private encyclopedia" with reasoning capabilities. Its impact on various industries and society will far exceed that of a single closed-source model (an enhanced version of "a single spark can start a prairie fire");
  3. The greatest significance of the 405B model is not direct inference, but its use in distilling efficient "small models." Objectively speaking, the 405B model has high hardware requirements: even an INT4 quantized version needs nearly 200GB of VRAM for inference. While I have always believed Apple Silicon chips and systems are excellent inference hardware, and a Mac Studio with an M2 Ultra chip (192GB of unified memory) should be able to run the INT4 version on a single machine, the inference speed will certainly not be impressive (5 tokens per second would be a good result). For many, direct inference with the 405B model is unrealistic, but we can combine the 405B model with high-quality vertical-domain data and use knowledge distillation to obtain highly capable "small models." This may be the right way to unlock enterprise (B2B) deployment;
  4. A model that everyone can own will inevitably drive rapid progress in "inference optimization," continuously lowering computing costs. The 405B model gives all developers and enterprises a brand-new option: privately deploying a GPT-4-level model. Although inference speeds may be slow for now, as mentioned above, I believe "the real experts are among the people": a large number of "inference optimization" methods will surely emerge in a short time, significantly increasing inference speed, rapidly reducing computing costs, and accelerating this round of AI penetration (though for most people, this is not necessarily a pleasant thing);
  5. If OpenAI was indeed holding back before, they must now present something real—this is a good thing;
  6. For most model companies, this is also a good thing: the path of LLaMA-3.1 is quite replicable. Although the business model of making money through the model itself is basically dead, there will surely be many other paths;
  7. I like to draw an analogy with the development of photography: although professional cameras are more capable than ever, the most widely used camera has long been the smartphone. Similarly, the rapid development of models may ultimately give rise to new forms of "hardware," although so far we have seen only prototypes, not mature products (e.g., Friend: another useful but potentially highly controversial AI hardware);
  8. Perhaps, based on 405B, many applications and business processes are worth redesigning;
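The "nearly 200GB of VRAM" figure in point 3 can be sanity-checked with back-of-the-envelope arithmetic: the memory needed just to hold the weights is the parameter count times the bits per parameter. A minimal sketch (the helper function and the set of quantization levels are illustrative; the KV cache and activations add further overhead on top of this):

```python
# Back-of-the-envelope weight-memory estimate for a dense LLM.
# Weights only: the KV cache and activations need additional memory.
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory required to hold the weights, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N = 405e9  # Llama-3.1-405B parameter count
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_gb(N, bits):.0f} GB")
# Even at INT4, the weights alone come to roughly 200GB,
# consistent with the "nearly 200GB of VRAM" figure above.
```

This also makes clear why FP16 serving of the 405B model is out of reach for a single machine: at 16 bits per parameter, the weights alone exceed 800GB.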
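The knowledge-distillation idea in point 3 can also be made concrete. One classic formulation (Hinton et al.'s soft-label distillation) trains the small model to match the large model's temperature-softened output distribution via KL divergence. The sketch below is an illustrative, framework-free version of that objective, not Meta's actual training recipe:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution;
    a higher temperature produces softer (flatter) targets."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions:
    zero when the student exactly matches the teacher, positive otherwise."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice this soft-label term is usually mixed with the ordinary hard-label cross-entropy, and for LLMs the "teacher signal" is often simply synthetic text generated by the 405B model on vertical-domain prompts rather than raw logits.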