ChatGPT's Advanced Voice Mode Gets a Major Update

OpenAI introduced Advanced Voice Mode last year alongside the launch of GPT-4o. It is powered by natively multimodal models such as GPT-4o and can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, roughly the speed of a human in a typical conversation. It can also generate more natural-sounding audio, pick up non-verbal cues such as how fast you speak, and respond to emotion.

Earlier this year, OpenAI released a small update to Advanced Voice Mode that reduced pauses and improved pronunciation. Today, OpenAI is rolling out a more significant upgrade that makes the mode sound even more natural and human-like. Responses now have subtler intonation, more realistic rhythm, including pauses and emphasis, and more accurate expression of certain emotions such as empathy and sarcasm.

This update also introduces support for live translation. ChatGPT users can now use Advanced Voice Mode to translate between languages: simply ask ChatGPT to start translating, and it will keep translating throughout the conversation until told to stop. For many users, this feature could effectively replace dedicated voice translation apps.
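The ChatGPT app's voice mode itself is not scriptable, but developers who want similar interpreter-style behavior can approximate it through OpenAI's audio-capable chat API. The following is a minimal sketch, assuming the official openai Python SDK and the gpt-4o-audio-preview model, both of which are separate from the consumer feature described in this article; the target language and prompt wording are purely illustrative.

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Ask the audio-capable model to act as a live interpreter, mirroring the
# "keep translating until told to stop" behavior described above.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",          # assumed audio-capable model name
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "system",
            "content": "You are a live interpreter. Translate everything the "
                       "user says into Spanish until asked to stop.",
        },
        {"role": "user", "content": "Hello, where is the nearest train station?"},
    ],
)

message = completion.choices[0].message
print(message.audio.transcript)            # text transcript of the spoken reply

# The spoken reply is returned base64-encoded; decode and save it as WAV audio.
with open("translation.wav", "wb") as f:
    f.write(base64.b64decode(message.audio.data))
```

In practice, a translation loop would feed each new utterance back into the same conversation so the model keeps interpreting until explicitly told to stop.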

Currently, the updated Advanced Voice Mode is only available to paid ChatGPT users. OpenAI also notes some known limitations with this latest update, outlined below:

This update may occasionally result in slight degradation of audio quality, such as unexpected changes in tone and pitch, which are particularly noticeable with some voice options. Rare instances of hallucinations in voice mode also persist, sometimes resulting in unwanted sounds such as commercials, jumbled speech, or background music.
While some minor limitations remain, the steady stream of improvements points to a future where the line between conversing with a human and conversing with an AI becomes increasingly blurred.
