Google Introduces New LLM Inference Method: Faster, Stronger, and Cheaper

Google Research has introduced speculative cascades, a technique that helps large language models (LLMs) run faster and more cost-effectively while maintaining high-quality output.

Since OpenAI launched ChatGPT in late 2022, built on its GPT-3.5 family of models, large language models (LLMs) have taken the world by storm. They are widely used in many fields, from coding to search. However, generating a response (a process called inference) is slow and computationally expensive. As more people use LLMs, making inference faster and cheaper without sacrificing quality has become critical for developers.

 

Two existing techniques aim to address this problem: cascades and speculative decoding.

  • Cascades: a small, fast model handles the request first and hands it off to a larger model only if needed. This saves computation, but the large model has to wait for the small model to decide whether it is confident enough. When the small model is uncertain, response time increases, and answer quality can fluctuate.
  • Speculative decoding: a small model acts as a 'drafter', proposing several tokens ahead, and the larger model then verifies them in parallel. This method prioritizes speed but is strict: as soon as one draft token fails to match the large model's output, that token and everything after it in the draft are discarded, even if the draft as a whole is a perfectly good answer. When this happens early, the speed advantage is lost, and the computational cost does not fall as expected.
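To make the contrast concrete, here is a minimal Python sketch of both techniques. It is an illustration under stated assumptions, not Google's implementation: the "models" are toy functions that return a fixed next word and a made-up confidence, the names `make_toy_model`, `cascade`, and `speculative_step` are hypothetical, and the speculative step uses a simplified greedy check in which tokens before the first mismatch are kept.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a model maps a token prefix to (next_token, confidence).
Model = Callable[[List[str]], Tuple[str, float]]


def make_toy_model(text: str, confidence: float) -> Model:
    """Toy stand-in for an LLM: always continues with the words of `text`."""
    words = text.split()

    def model(prefix: List[str]) -> Tuple[str, float]:
        i = len(prefix)
        return (words[i] if i < len(words) else "<eos>", confidence)

    return model


def cascade(prompt: List[str], small: Model, large: Model,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Cascade: use the small model's answer unless its confidence is low."""
    token, confidence = small(prompt)
    if confidence >= threshold:
        return token, "small"
    # The large model only starts once the small model has finished.
    return large(prompt)[0], "large"


def speculative_step(prefix: List[str], small: Model, large: Model,
                     k: int = 4) -> List[str]:
    """One round of simplified greedy speculative decoding."""
    # 1. The small model drafts k tokens, one after another.
    draft: List[str] = []
    for _ in range(k):
        draft.append(small(prefix + draft)[0])

    # 2. The large model checks every position (a real system does this
    #    in a single parallel pass; the loop here is only for clarity).
    accepted: List[str] = []
    for token in draft:
        target = large(prefix + accepted)[0]
        if token != target:
            accepted.append(target)  # keep the large model's token instead
            return accepted          # and drop the rest of the draft
        accepted.append(token)
    return accepted


small = make_toy_model("the cat sat in the mat", confidence=0.9)
unsure = make_toy_model("the cat sat in the mat", confidence=0.5)
large = make_toy_model("the cat sat on the mat", confidence=1.0)

print(cascade([], small, large))    # small model is confident enough
print(cascade([], unsure, large))   # falls back to the large model
print(speculative_step([], small, large, k=6))
```

In the last call the draft diverges at the fourth word, so the two remaining draft words are thrown away even though they would have matched the large model's continuation.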

 

Clearly, both approaches have their limitations. To address them, Google Research has developed a hybrid approach called speculative cascades. At its core is a flexible deferral rule that decides, token by token, whether to accept the small model's output or defer to the larger model. This avoids the sequential 'waiting' bottleneck of cascades and the strict token-matching rejection of speculative decoding.

In other words, even if the small model's output does not match what the large model would have produced, the system can still accept it as long as it is a reasonable answer.
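The idea can be sketched in a few lines of Python. This is a toy illustration, not the method as published: the table-based models, the function names, and the specific rule (accept a draft token if it is among the large model's k most likely tokens) are all assumptions chosen to show how a pluggable deferral rule changes what gets accepted.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a model maps a token prefix to next-token probabilities.
Dist = Dict[str, float]
Model = Callable[[List[str]], Dist]
DeferralRule = Callable[[str, Dist, Dist], bool]


def table_model(rows: List[Dist]) -> Model:
    """Toy stand-in for an LLM: looks up a fixed distribution per position."""
    def model(prefix: List[str]) -> Dist:
        i = len(prefix)
        return rows[i] if i < len(rows) else {"<eos>": 1.0}
    return model


def top_k_rule(draft_token: str, small_dist: Dist, large_dist: Dist,
               k: int = 2) -> bool:
    """Return True to defer to the large model, False to accept the draft.

    Assumed rule: accept if the draft token is one of the large model's
    k most likely tokens. With k=1 this becomes strict greedy matching.
    """
    top = sorted(large_dist, key=large_dist.get, reverse=True)[:k]
    return draft_token not in top


def speculative_cascade_step(prefix: List[str], small: Model, large: Model,
                             defer: DeferralRule, block: int = 4) -> List[str]:
    """Draft a block with the small model, then apply the deferral rule."""
    draft: List[str] = []
    small_dists: List[Dist] = []
    for _ in range(block):
        dist = small(prefix + draft)
        draft.append(max(dist, key=dist.get))
        small_dists.append(dist)

    out: List[str] = []
    for token, s_dist in zip(draft, small_dists):
        l_dist = large(prefix + out)
        if defer(token, s_dist, l_dist):
            out.append(max(l_dist, key=l_dist.get))  # use the large model's token
            break                                    # and end this block
        out.append(token)
    return out


small = table_model([{"the": 0.9, "a": 0.1}, {"cat": 0.6, "dog": 0.4},
                     {"slept": 0.7, "sat": 0.3}, {"quietly": 0.8, "down": 0.2}])
large = table_model([{"the": 0.8, "a": 0.2}, {"dog": 0.5, "cat": 0.4, "bird": 0.1},
                     {"sat": 0.6, "lay": 0.3, "slept": 0.1}, {"down": 0.9, "up": 0.1}])

flexible = speculative_cascade_step([], small, large, top_k_rule)
strict = speculative_cascade_step(
    [], small, large, lambda t, s, l: top_k_rule(t, s, l, k=1))
print(flexible)  # ['the', 'cat', 'sat']
print(strict)    # ['the', 'dog']
```

With the flexible rule, "cat" is accepted even though the large model's first choice is "dog", so more of the draft survives. With the strict rule the block ends at the second token.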


Google Research tested this method on models such as Gemma and T5 across a variety of language tasks, including text summarization, reasoning, and coding. The results showed that speculative cascades offered a better trade-off between cost and quality than either cascades or speculative decoding alone. In many cases, they reached correct answers faster than speculative decoding.

For now, this is still lab research. But if it is deployed successfully in real-world products, users could see LLMs that are faster, more capable, and significantly cheaper to run.
