Cerebras Launches World's Fastest AI Inference Technology, 20x Faster Than NVIDIA
Cerebras Systems has officially announced Cerebras Inference, billed as the world's fastest AI inference solution.
Cerebras Inference delivers up to 1,800 tokens per second on the Llama 3.1 8B (8 billion parameter) model and 450 tokens per second on Llama 3.1 70B, up to 20 times faster than the NVIDIA GPU-based inference solutions available in today's hyperscale clouds, including Microsoft Azure.
Beyond its raw performance, the new inference solution is also remarkably inexpensive, costing a fraction of what popular GPU cloud platforms charge. For example, customers can get one million tokens for just 10 cents, a 100x price-performance advantage for AI workloads.
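To put those numbers in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted above; the 500-token response length is an arbitrary example, and the GPU baseline is simply derived from the 20x claim.

```python
# Back-of-the-envelope math using the figures quoted in this article.
CEREBRAS_TOK_PER_SEC = 1_800                    # Llama 3.1 8B on Cerebras Inference
GPU_TOK_PER_SEC = CEREBRAS_TOK_PER_SEC / 20     # implied by the "20x faster" claim
PRICE_PER_MILLION_TOKENS = 0.10                 # USD, Llama 3.1 8B pricing

RESPONSE_TOKENS = 500                           # arbitrary example response length

print(f"Cerebras:  {RESPONSE_TOKENS / CEREBRAS_TOK_PER_SEC:.2f} s per response")
print(f"GPU cloud: {RESPONSE_TOKENS / GPU_TOK_PER_SEC:.2f} s per response")
print(f"Cost:      ${RESPONSE_TOKENS / 1e6 * PRICE_PER_MILLION_TOKENS:.6f} per response")
```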
Cerebras' 16-bit precision and 20x faster inference speed will enable developers to build next-generation high-performance AI applications without compromising on speed or cost. This breakthrough price/performance is made possible by the Cerebras CS-3 system and its Wafer Scale Engine 3 (WSE-3) AI processor. The CS-3 delivers 7,000x more memory bandwidth than the NVIDIA H100, addressing the memory bandwidth bottleneck of generative AI.
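The memory bandwidth claim is worth unpacking: at 16-bit precision, generating each token requires streaming essentially all model weights through the compute units, so the bandwidth needed scales as parameter count x 2 bytes x tokens per second. The sketch below applies this simplified single-stream model to the article's 70B figures; the ~3.35 TB/s H100 number is its published HBM bandwidth, an outside figure not taken from this article, and the estimate ignores batching and KV-cache traffic.

```python
# Rough estimate of the memory bandwidth needed to hit the quoted 70B throughput.
# Simplifying assumption: every generated token reads all weights once at 16-bit.
PARAMS = 70e9                    # Llama 3.1 70B parameter count
BYTES_PER_PARAM = 2              # 16-bit precision, as the article notes
TOK_PER_SEC = 450                # Cerebras Inference throughput for 70B

required_bw = PARAMS * BYTES_PER_PARAM * TOK_PER_SEC   # bytes per second
H100_HBM_BW = 3.35e12            # ~3.35 TB/s published H100 HBM bandwidth (outside figure)

print(f"Required bandwidth: {required_bw / 1e12:.0f} TB/s")    # ~63 TB/s
print(f"Single H100 HBM:    {H100_HBM_BW / 1e12:.2f} TB/s")
print(f"Shortfall:          {required_bw / H100_HBM_BW:.0f}x a single H100")
```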
Cerebras Inference is currently available in the following three tiers:
- The Free Tier offers free API access and generous usage limits to anyone who signs up.
- The Developer Tier is designed for flexible, serverless deployments, giving users API endpoints at a fraction of the cost of existing alternatives, with the Llama 3.1 8B and 70B models priced at just 10 cents and 60 cents per million tokens, respectively (see the API sketch after this list).
- The Enterprise Tier offers fine-tuned models, custom service level agreements, and dedicated support. Ideal for sustained workloads, it is available through a Cerebras-managed private cloud or on-premises.
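For a sense of what the Developer Tier looks like in practice, below is a minimal sketch of calling such an endpoint, assuming it exposes an OpenAI-compatible chat completions API. The base URL, environment variable name, and model identifier are illustrative assumptions, not details confirmed by this article.

```python
# Minimal sketch of calling an OpenAI-compatible inference endpoint.
# Base URL, env var, and model name below are assumptions, not from the article.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # hypothetical environment variable
)

response = client.chat.completions.create(
    model="llama3.1-8b",                     # assumed model identifier
    messages=[{"role": "user", "content": "Summarize wafer-scale inference in one sentence."}],
)

print(response.choices[0].message.content)
```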
With record performance, competitive pricing, and open API access, Cerebras Inference sets a new standard for open LLM development and deployment. As the only solution capable of delivering both high-speed training and inference, Cerebras opens up entirely new possibilities for AI.
With AI trends evolving rapidly and NVIDIA currently holding a dominant position in the market, the emergence of companies like Cerebras and Groq signals a potential shift in the industry's dynamics. As demand for faster, more cost-effective AI inference grows, solutions like Cerebras Inference are well-positioned to challenge NVIDIA's dominance, especially in the inference space.