Cerebras Launches World's Fastest AI Inference Technology, Up to 20x Faster Than NVIDIA

Cerebras Systems has officially announced Cerebras Inference, which the company bills as the world's fastest AI inference solution.

Cerebras Inference delivers up to 1,800 tokens per second on the Llama 3.1 8B (8 billion parameter) model and 450 tokens per second on Llama 3.1 70B, nearly 20 times faster than the NVIDIA GPU-based AI inference solutions available in today's hyperscale clouds, including Microsoft Azure.

Beyond raw performance, the new service is also inexpensive, costing a fraction of what popular GPU cloud platforms charge. Customers pay just 10 cents per million tokens, which Cerebras says amounts to a 100x price/performance advantage for AI workloads.
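
Taken together, those throughput and price figures imply very fast, very cheap completions. As a rough back-of-the-envelope check using only the numbers quoted above (a sketch, not an official benchmark):

```python
# Back-of-the-envelope check using only the figures quoted above; actual
# latency and billing will vary.
TOKENS_PER_SEC_8B = 1_800    # Llama 3.1 8B throughput per the announcement
TOKENS_PER_SEC_70B = 450     # Llama 3.1 70B throughput per the announcement
USD_PER_MILLION_TOKENS_8B = 0.10

response_tokens = 1_000      # a typical long-form answer

print(f"8B:  {response_tokens / TOKENS_PER_SEC_8B:.2f} s, "
      f"${response_tokens * USD_PER_MILLION_TOKENS_8B / 1e6:.5f}")
print(f"70B: {response_tokens / TOKENS_PER_SEC_70B:.2f} s")
# 8B:  0.56 s, $0.00010
# 70B: 2.22 s
```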

Cerebras' 16-bit precision and 20x faster inference speed enable developers to build next-generation high-performance AI applications without compromising on speed or cost. This price/performance breakthrough is made possible by the Cerebras CS-3 system and its Wafer Scale Engine 3 (WSE-3) AI processor. The CS-3 delivers 7,000x more memory bandwidth than the NVIDIA H100, addressing the core technical challenge of generative AI: memory bandwidth.
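
To see why bandwidth is the bottleneck, note that generating each token requires streaming every model weight through the processor once. A rough sketch of that ceiling, assuming 16-bit weights and the commonly cited ~3.35 TB/s figure for H100 HBM bandwidth (the WSE-3 figure here is simply inferred from the 7,000x claim above):

```python
# Rough decode-speed ceiling: generating one token streams every weight
# through the processor once, so tokens/sec <= bandwidth / model size.
params = 70e9                 # Llama 3.1 70B
bytes_per_param = 2           # 16-bit precision, as stated above
model_bytes = params * bytes_per_param   # 140 GB of weights

h100_bw = 3.35e12             # ~3.35 TB/s HBM, the commonly cited H100 spec
wse3_bw = h100_bw * 7_000     # inferred from the article's 7,000x claim

print(f"H100 ceiling:  {h100_bw / model_bytes:.0f} tokens/s")
print(f"WSE-3 ceiling: {wse3_bw / model_bytes:,.0f} tokens/s")
# H100 ceiling:  24 tokens/s
# WSE-3 ceiling: 167,500 tokens/s
```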


Cerebras Inference is currently available in three tiers:

  1. The Free Tier offers free API access and generous usage limits to anyone who signs up.
  2. The Developer Tier is designed for flexible, serverless deployments, providing API endpoints at a fraction of the cost of existing alternatives, with the Llama 3.1 8B and 70B models priced at just 10 cents and 60 cents per million tokens respectively (a sketch of such an API call appears after this list).
  3. The Enterprise Tier offers fine-tuned models, custom service level agreements, and dedicated support. Ideal for sustained workloads, it is available through a Cerebras-managed private cloud or on-premises.
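
For developers wondering what a Developer Tier call might look like in practice, here is a minimal sketch of a chat-completion request. The announcement does not document the API shape, so the endpoint URL, model identifier, and environment variable below are illustrative assumptions (an OpenAI-style interface is assumed; consult Cerebras' documentation for the real details):

```python
import os
import requests

# Hypothetical endpoint and model name, not confirmed by the announcement.
API_URL = "https://api.cerebras.ai/v1/chat/completions"
API_KEY = os.environ["CEREBRAS_API_KEY"]  # assumed to be issued at sign-up

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3.1-8b",  # 10 cents per million tokens per the pricing above
        "messages": [
            {"role": "user", "content": "Explain wafer-scale chips in one sentence."}
        ],
        "max_tokens": 200,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```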

With record performance, competitive pricing, and open API access, Cerebras Inference sets a new standard for open LLM development and deployment. Positioned by the company as the only solution capable of delivering both high-speed training and inference, Cerebras opens up entirely new possibilities for AI.

With AI trends evolving rapidly and NVIDIA currently holding a dominant position in the market, the emergence of companies like Cerebras and Groq signals a potential shift in the dynamics of the entire industry. As demand for faster and more cost-effective AI inference grows, solutions like Cerebras Inference are well positioned to challenge NVIDIA's dominance, especially in the inference space.
