OpenAI’s new advanced voice mode for ChatGPT is starting to roll out to a small number of ChatGPT Plus subscribers. The feature, which the company showcased at its GPT-4o launch event in May, drew criticism for a voice that sounded similar to Hollywood actress Scarlett Johansson and was later delayed over safety concerns.
During the OpenAI event, the new voice mode made a cameo, showing capabilities beyond ChatGPT’s current voice mode. OpenAI employees interrupted the chatbot and asked it to tell a story in different ways, and it took the interruptions in stride, adjusting its responses on the fly.
The advanced mode was set to release in alpha in June, but OpenAI delayed the rollout by one month in order to “reach our bar to launch.” As part of that delay, the company said it was “improving the model’s ability to detect and refuse certain content.”
OpenAI Faced Heat For Its Safety Policies
OpenAI spokesperson Taya Christianson said the company tested the voice model’s capabilities with more than 100 external red teamers. OpenAI has recently faced considerable scrutiny over its safety policies, so the pause was arguably a prudent move. The company has also “added new filters that will recognize and block certain requests to generate music or other copyrighted audio,” Christianson said.
During OpenAI’s event, one of the biggest criticisms of the new mode was how much the onstage voice, called “Sky,” sounded like Scarlett Johansson, who voiced an AI in the movie Her. The new mode will use only four preset voices created with the help of voice actors. “We’ve made it so that ChatGPT cannot impersonate other people’s voices, both individuals and public figures, and will block outputs that differ from one of these preset voices,” the company said.
OpenAI plans to bring the new mode to all ChatGPT Plus users in the fall, according to Christianson.
GPT-4o Is Among OpenAI’s Most Effective Models
GPT-4o is one of the most effective AI models the company has produced. It is small and offers low latency, the time it takes to return a response. OpenAI says the model will support text, image, audio, and video, in addition to the text and vision capabilities currently available through the API.
"The model has a context window of 128K tokens, supports up to 16K output tokens per request, and has knowledge up to October 2023. Thanks to the improved tokenizer shared with GPT-4o, handling non-English text is now, even more, cost-effective," it added in the blog post.
OpenAI said that, in line with its Preparedness Framework, it used both automated and human evaluations to assess safety. To surface potential issues, it also had the model evaluated by 70 external experts from a range of disciplines.