What is open-source AI?
Open-source AI refers to AI models released under open-source licenses, but it's not quite that simple.
OpenAI doesn't create open AI models: its various GPT and DALL·E models are all proprietary, or closed source. And what about Meta's Llama? No matter how many times Mark Zuckerberg says it, Llama isn't open source, although it is open, unlike OpenAI's models.
Generally speaking, there are three main types of AI models:
- Proprietary
- Open source
- Open
These categories apply to both large language models (LLMs) and text-to-image models. Things are still being shaped, and the Open Source Initiative is currently developing a rigorous definition of what is required for an AI model to truly be considered open source, but let's look at the current situation.
What is open source?
Before examining open-source AI models, let's take a look back at what open source actually means. It's not just a random trendy term: the Open Source Initiative (OSI) maintains a definition that fully describes its philosophy and fundamental requirements. It's released under the Creative Commons Attribution 4.0 International License, but here's the main idea.
Open source doesn't just mean you can freely download or access the source code. The code must be available for anyone to use and modify in any way they want and for any purpose. Open-source licenses can't restrict any "fields of endeavor," which is where many supposedly open-source AI models fall short.
OSI maintains a list of approved licenses; some of the major ones are the Apache 2.0 License, the MIT License, and the GNU General Public License.
What is a proprietary AI model?
Proprietary AI models are some of the most popular and powerful models available. These models are developed and operated by private companies, and their source code, training strategies, model weights, and even details like the number of parameters are often kept secret. The only way to access a proprietary model is through some official service such as a chatbot, API, or tool built using the API.
Let's take OpenAI's GPT-4o as an example. We don't know what data it was trained on or how many parameters it has. The only way to access it is through ChatGPT, OpenAI's API, or an application that uses GPT-4o, such as Perplexity or Zapier Chatbots.
And of course, OpenAI charges for access to GPT-4o. If you want to use it (and it's one of the best AI models available), you can pay $20/month for ChatGPT Plus, or pay for API access, either by subscribing to another service or by building something yourself. You can't just download GPT-4o and run it on your own server.
The same is true for all other proprietary AI models, including:
- GPT-4o mini and DALL·E 3 from OpenAI
- Claude 3 and Claude 3.5 from Anthropic
- Gemini and Imagen 3 from Google
- Command R and R+ from Cohere
- Midjourney
What is open-source AI?
Open-source AI refers to AI models released under an open-source license, but it's not quite that simple. Researchers have found that many models claiming to be open source actually aren't. This practice is called "open-washing," and it complicates things significantly… even for those writing about AI models.
The chart shows the degree of "openness" of several AI models.
Currently, the Open Source Initiative is working to develop a definition of open-source artificial intelligence (AI) because existing licenses don't truly cover all the technical aspects of the current generation of AI models. To truly meet the requirements and philosophy of open-source software, not only the model's source code needs to be freely available, but also the training data, training code, parameters, and much more. The software needs to be shared under an open-source license, while things like training data and how-to descriptions need to be shared under Creative Commons licenses—or similar open licenses.
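One way to make that checklist concrete is as a small data structure. The sketch below is illustrative only: the `ModelRelease` fields and the `missing_for_open_source` helper are hypothetical names, and the checks mirror the requirements listed above rather than the OSI's definition, which is still in progress.

```python
from dataclasses import dataclass


@dataclass
class ModelRelease:
    """Hypothetical record of what a model developer has published."""
    name: str
    source_code_open: bool    # model code under an open-source license
    training_code_open: bool  # training code under an open-source license
    weights_open: bool        # parameters freely available
    training_data_open: bool  # data under Creative Commons or a similar open license


def missing_for_open_source(release: ModelRelease) -> list[str]:
    """List the components still missing from the ideal described above."""
    checks = {
        "source code": release.source_code_open,
        "training code": release.training_code_open,
        "parameters": release.weights_open,
        "training data": release.training_data_open,
    }
    return [component for component, is_open in checks.items() if not is_open]


# Made-up example: code and weights are published, training code and data are not.
example = ModelRelease("example-model", True, False, True, False)
print(missing_for_open_source(example))
```

An empty list would mean the release covers every component named in the paragraph above; anything else shows where a model that is marketed as open source still falls short.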
Furthermore, it's difficult to overstate the level of freedom offered by open-source licenses. The strictest licenses essentially require you to publicly release everything you build with them—and give credit to the original developers. That's it!
What is an open AI model?
Open models fill the gap between proprietary, closed AI models and the ideal of truly open-source AI models. (Until the OSI provides its definition, the closest model to that ideal is OLMo 7B).
Simply put, open AI models are offered for free, up to a point. Typically, you can download them from Hugging Face and other model platforms and run them on your own device after agreeing to the license terms. You can usually retrain them with your own data to create your own models, and then build your own chatbots and applications on top of them. In most cases, you can dig into elements like the model weights and system architecture to understand how they work (to the best of your ability).
Open licenses may still allow widespread use, but they add limitations that an open-source license wouldn't. For example, Llama 3's license allows commercial use for up to 700 million monthly users and blocks certain uses. You or I could build something with it, but Apple and Google couldn't. Similarly, Gemma 2's prohibited use policy, among other things, forbids "facilitating or encouraging users to commit any kind of crime." Understandably, Google doesn't want to see unsavory chatbots "powered by Google Gemma" all over the news.
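To see how such restrictions differ from an open-source license, it can help to model them as data. The following is a simplified sketch, not a legal reading of any license: `LicenseTerms`, `violations`, `has_extra_restrictions`, and the example use strings are hypothetical, and the only details taken from the text are the 700 million monthly user cap and the Gemma 2 prohibition on facilitating crime.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical, heavily simplified view of a model license."""
    name: str
    max_monthly_users: Optional[int] = None   # None means no cap on users
    prohibited_uses: frozenset = frozenset()  # uses the license blocks


def violations(terms: LicenseTerms, monthly_users: int, use: str) -> list[str]:
    """Return the reasons a planned deployment falls outside the terms."""
    problems = []
    if terms.max_monthly_users is not None and monthly_users > terms.max_monthly_users:
        problems.append("too many monthly users")
    if use in terms.prohibited_uses:
        problems.append("prohibited use")
    return problems


def has_extra_restrictions(terms: LicenseTerms) -> bool:
    """True when the terms add limits that an open-source license would not."""
    return terms.max_monthly_users is not None or bool(terms.prohibited_uses)


# Simplified stand-ins for the licenses mentioned above (assumptions, not the real terms).
llama_like = LicenseTerms("Llama 3 (simplified)", max_monthly_users=700_000_000)
gemma_like = LicenseTerms("Gemma 2 (simplified)",
                          prohibited_uses=frozenset({"facilitating crime"}))
apache_like = LicenseTerms("Apache 2.0 (simplified)")

print(violations(llama_like, 50_000, "customer support"))
print(violations(llama_like, 1_000_000_000, "customer support"))
print(has_extra_restrictions(apache_like))
```

A small product passes the user-cap check while a platform with a billion monthly users does not, which is the "you or I could, Apple and Google couldn't" distinction in code form.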
These limitations, while understandable, contradict the open-source philosophy, so you can understand why things have become so controversial. Many researchers are exploring ways to categorize the different models based on their level of openness to make things clearer. If any of these become popular, we will certainly let you know.
The best open and open-source AI models
Below is a list of the open and open-source models worth knowing about today. Exactly where each one sits on the scale from open to open source will remain up for debate until we have a better definition.
| AI model | Developer | Model type | License | Parameters | Notes |
|---|---|---|---|---|---|
| Llama 3.1 | Meta | LLM | Custom | 8B, 70B, 405B | Restrictions on usage and number of users. |
| Gemma 2 | Google | LLM | Custom | 2B, 9B, 27B | Usage restrictions. |
| Phi-3 | Microsoft | LLM | MIT | 3.8B, 7B, 14B | |
| Mixtral 8x7B | Mistral | LLM | Apache 2.0 | 8x7B | |
| Mistral 7B | Mistral | LLM | Apache 2.0 | 7B | |
| DBRX | Databricks/Mosaic | LLM | Custom | Equivalent to 36B | Mixture-of-experts model, so the parameter count is complicated. |
| OLMo 7B | Allen Institute for AI | LLM | Apache 2.0 | 7B | The closest thing to a truly open-source AI model you can find. |
| FLUX.1 [schnell] | Black Forest Labs | Image generator | Apache 2.0 | N/A | |
| FLUX.1 [dev] | Black Forest Labs | Image generator | Custom | N/A | Non-commercial use only. |
| Stable Diffusion | Stability AI | Image generator | Custom | N/A | Previous versions of Stable Diffusion, including 1.5, 2.1, and SDXL, are all available under open licenses. |
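If you want to shortlist models by license, the LLM rows of the table are easy to filter in code. This is a minimal sketch: the `LLMS` list simply copies names, developers, and licenses from the table, `with_osi_license` is a hypothetical helper, and `OSI_APPROVED` contains only the two OSI-approved licenses that appear in the table, not the OSI's full list.

```python
# (model, developer, license) copied from the LLM rows of the table above.
LLMS = [
    ("Llama 3.1", "Meta", "Custom"),
    ("Gemma 2", "Google", "Custom"),
    ("Phi-3", "Microsoft", "MIT"),
    ("Mixtral 8x7B", "Mistral", "Apache 2.0"),
    ("Mistral 7B", "Mistral", "Apache 2.0"),
    ("DBRX", "Databricks/Mosaic", "Custom"),
    ("OLMo 7B", "Allen Institute for AI", "Apache 2.0"),
]

# Assumption: only the OSI-approved licenses that show up in the table.
OSI_APPROVED = {"MIT", "Apache 2.0"}


def with_osi_license(models):
    """Return the names of models released under an OSI-approved license."""
    return [name for name, _developer, license_name in models
            if license_name in OSI_APPROVED]


print(with_osi_license(LLMS))
```

Keep in mind that an OSI-approved license only covers what was actually released, so a match here doesn't by itself make a model fully open source in the sense discussed earlier.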
Should you use open or open-source AI models?
While there aren't as many top-tier open-source AI models as one would like, the best open models are surprisingly competitive with proprietary alternatives. For example, Llama 3.1 405B and FLUX.1 can compete head-to-head with GPT-4o and DALL·E 3. If you have the technical skills to use an open or open-source model, you can achieve similar performance at a much lower cost and with greater freedom.