The US has a super cheap AI chatbot to compete with DeepSeek
Researchers have trained an AI model capable of 'reasoning' for less than $50 in cloud computing costs. The model, called s1, performs similarly to state-of-the-art reasoning models such as OpenAI's o1 and DeepSeek's R1 on tests measuring mathematical and programming abilities.
Training AI chatbots is not cheap.
The s1 model is now available on GitHub, along with the data and source code used in the training process. The team says they started with an existing base model and then refined it through a process called 'distillation', which extracts the reasoning capabilities of another AI model by training on its answers. Specifically, s1 is distilled from Google's Gemini 2.0 Flash Thinking Experimental reasoning model.
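As a rough illustration of the distillation step described above, the sketch below collects a teacher model's reasoning and answers into training examples for a student model. It is a minimal sketch, not the s1 team's code: `toy_teacher`, `DistillExample`, `build_distillation_set` and the `<think>` delimiters are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DistillExample:
    prompt: str   # the question shown to the student model
    target: str   # the teacher's reasoning plus answer, which the student imitates


def build_distillation_set(
    questions: List[str],
    teacher: Callable[[str], Tuple[str, str]],
) -> List[DistillExample]:
    """Ask the teacher each question and keep its reasoning and answer as the target."""
    examples = []
    for question in questions:
        reasoning, answer = teacher(question)
        # The <think> delimiters are an assumption made for this sketch.
        target = f"<think>{reasoning}</think>\n{answer}"
        examples.append(DistillExample(prompt=question, target=target))
    return examples


def toy_teacher(question: str) -> Tuple[str, str]:
    """Hypothetical stand-in for a call to a real reasoning model."""
    return (f"Working through: {question}", "42")


examples = build_distillation_set(["What is 6 * 7?"], toy_teacher)
```

In a real pipeline the teacher would be a call to a hosted reasoning model, and the resulting examples would feed a fine-tuning run on the base model.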
This distillation method is similar to the one that researchers at the University of California, Berkeley, used last month to develop an AI reasoning model for about $450. Distillation has also been linked to the development of DeepSeek's R1 model, which is currently making waves. The approach has sparked interest in the research community because it shows that researchers with limited budgets can still innovate in the field of AI.
However, this development also raises questions about the commercialization of AI models, especially when a model that cost millions of dollars to develop can be replicated at low cost. OpenAI previously accused DeepSeek of improperly collecting data from its API for use in model distillation.
However, distillation techniques are proving effective.
The s1 team is working on ways to improve reasoning performance and to extend the time the model spends 'thinking' before it gives an answer. Longer thinking time is one of the breakthroughs in OpenAI's o1 model that other AI labs are trying to replicate.
The research paper argues that reasoning models can be fine-tuned on a relatively small dataset through a process called supervised fine-tuning (SFT), in which the AI model is trained to mimic the behaviors demonstrated in the dataset. Notably, SFT is typically cheaper than the large-scale reinforcement learning approach that DeepSeek used for its R1 model.
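To make the idea of 'training to mimic' more concrete, here is a minimal sketch of the kind of objective commonly used in supervised fine-tuning: the average negative log-likelihood of the tokens the model is supposed to imitate. The function name `sft_loss`, the probabilities and the mask are invented for illustration and do not come from the s1 paper.

```python
import math
from typing import List


def sft_loss(target_token_probs: List[float], is_answer_token: List[bool]) -> float:
    """Mean negative log-likelihood over the tokens the model should imitate.

    target_token_probs[i] is the probability the model assigned to the correct
    token at position i; is_answer_token[i] marks positions that belong to the
    demonstrated response rather than to the question.
    """
    losses = [
        -math.log(p)
        for p, keep in zip(target_token_probs, is_answer_token)
        if keep
    ]
    if not losses:
        raise ValueError("no target tokens to train on")
    return sum(losses) / len(losses)


# Made-up numbers: two question tokens (ignored) and two response tokens.
probs = [0.9, 0.8, 0.5, 0.25]
mask = [False, False, True, True]
loss = sft_loss(probs, mask)
```

Training lowers this value by raising the probability the model gives to each demonstrated token, which is why a small set of high-quality demonstrations can go a long way.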
Google currently offers free access to Gemini 2.0 Flash Thinking Experimental, albeit with daily usage limits. However, Google's terms prohibit reverse engineering its models to develop competing services, so it will be interesting to see how Google responds to s1.
Distillation will help budget-constrained researchers create super-cheap AI reasoning models.
To train s1, the researchers said, they created a dataset of 1,000 carefully curated questions, along with answers and the 'thinking' process behind each answer from the Gemini 2.0 model. Training took just under 30 minutes using 16 Nvidia H100 GPUs, after which s1 achieved strong performance on certain AI benchmarks.
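The figures quoted above (16 GPUs, just under 30 minutes) put an upper bound of about 8 GPU-hours on the training run. The short calculation below shows the arithmetic; the rental price of $2.50 per GPU-hour is an assumption made for this example, not a figure from the article.

```python
def gpu_hours(num_gpus: int, minutes: float) -> float:
    """Total GPU time consumed, in GPU-hours."""
    return num_gpus * minutes / 60.0


def rental_cost(num_gpus: int, minutes: float, usd_per_gpu_hour: float) -> float:
    """Cloud rental cost for the run at a given hourly price per GPU."""
    return gpu_hours(num_gpus, minutes) * usd_per_gpu_hour


hours = gpu_hours(16, 30)           # 16 GPUs for 30 minutes
cost = rental_cost(16, 30, 2.50)    # assumed price, not from the article
print(f"{hours} GPU-hours, about ${cost:.2f}")
```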
Niklas Muennighoff, a Stanford University researcher involved in the project, said he could rent the necessary compute for about $20. To improve s1's accuracy, the team used a simple trick: adding the word 'wait' to the model's reasoning, which prompted it to keep thinking and helped it arrive at more accurate answers.
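The 'wait' trick can be pictured as a small loop around the model: whenever the model tries to stop reasoning, the stop marker is dropped, the word 'Wait' is appended, and generation resumes. The sketch below is an illustration under stated assumptions rather than the team's implementation: `toy_generate` is a hypothetical stand-in for a real model call, and the `</think>` marker and `think_with_wait` helper are invented for this example.

```python
from typing import Callable

END_OF_THINKING = "</think>"  # hypothetical marker for "done reasoning"


def think_with_wait(
    prompt: str,
    generate: Callable[[str], str],
    extra_rounds: int = 1,
) -> str:
    """Extend the reasoning trace by appending 'Wait' each time the model stops."""
    trace = generate(prompt)
    for _ in range(extra_rounds):
        # Remove the stop marker so the reasoning can continue.
        if trace.endswith(END_OF_THINKING):
            trace = trace[: -len(END_OF_THINKING)]
        trace += " Wait,"
        # Feed the extended trace back so the model picks up where it left off.
        trace += generate(prompt + trace)
    return trace


def toy_generate(context: str) -> str:
    """Hypothetical model: makes a mistake first, then corrects it after 'Wait'."""
    if "Wait," in context:
        return " rechecking: 2 + 2 = 4." + END_OF_THINKING
    return "2 + 2 = 5." + END_OF_THINKING


result = think_with_wait("What is 2 + 2? ", toy_generate)
```

With the toy model, the first pass contains an error and the extra round gives the model a chance to revisit it, which mirrors the double-checking behavior described above.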
Meta, Google, and Microsoft are expected to invest hundreds of billions of dollars in AI infrastructure in 2025, with a portion of that going toward training next-generation AI models. While distillation has proven to be an effective way to replicate an AI model's capabilities at low cost, it does not produce new AI models that are significantly better than existing ones.