OpenAI starts rolling out its Her-like voice mode for ChatGPT
Key Takeaways
- OpenAI has begun rolling out an advanced voice mode for ChatGPT, starting with a limited group of users.
- The new voice mode aims to make interactions with ChatGPT feel more natural and lifelike, similar to talking to a friend.
- This update follows extensive testing and feedback to improve safety and performance.
- The advanced voice mode could pose a challenge to existing digital assistants like Alexa and Siri.
- OpenAI plans to monitor the rollout closely and release a detailed report on its findings.
The Evolution of ChatGPT's Voice Mode
Early Versions and Limitations
ChatGPT previously offered a less sophisticated voice mode that relied on transcribing spoken questions into text before responding. The extra conversion step made the experience slow and often unnatural, and users found it hard to hold a smooth conversation with the AI.
Introduction of Advanced Voice Mode
OpenAI stunned users when it showed off an updated voice mode for the most advanced version of ChatGPT earlier this year. The new voice mode uses OpenAI's cutting-edge AI model to process and understand audio inputs directly, making interactions more seamless and efficient. The advanced voice mode offers more than audio answers: the model is fully conversational and can listen to multiple voices at once.
User Reactions and Feedback
The ease of conversing with ChatGPT's advanced voice mode could encourage users to engage with the tool more often. This new feature might even pose a challenge to other virtual assistants. Users have been excited about the lifelike quality of the voice, which sounds far from the robotic voices of other digital assistants. The advanced voice mode can respond in real time, adjust to interruptions, and even understand emotions and non-verbal cues.
The advanced voice mode will be limited to ChatGPT's four preset voices – Juniper, Breeze, Cove, and Ember – made in collaboration with professional voice actors.
Technical Aspects of ChatGPT's Advanced Voice Mode
AI Model and Processing
Unlike earlier versions, the new voice features use OpenAI's latest AI model to process and understand audio inputs directly, so ChatGPT can handle spoken questions without converting them to text first. The result is a more seamless and efficient voice interaction.
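For readers curious about the mechanics, here is a minimal sketch of the older transcribe-then-respond pipeline, built on the publicly documented Audio and Chat endpoints of the OpenAI Python SDK. The file names, model choices, and voice are illustrative assumptions; the advanced voice mode itself processes audio natively inside the model and is not reproduced by chaining these calls.

```python
# Illustrative sketch of the legacy three-step voice pipeline
# (speech-to-text -> chat model -> text-to-speech).
# Assumes the OpenAI Python SDK v1.x and an OPENAI_API_KEY in the environment;
# the file names, model names, and voice are example choices, not the
# advanced voice mode itself.
from openai import OpenAI

client = OpenAI()

# 1. Transcribe the spoken question into text.
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Run the transcribed text through the chat model.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = completion.choices[0].message.content

# 3. Convert the text answer back into speech and save it.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=answer,
)
speech.stream_to_file("answer.mp3")
```

Each hop in that chain adds latency, which is why the earlier voice mode felt slow and stilted; the advanced voice mode removes the text round trip entirely by letting the model listen and speak directly.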
Real-Time Response Capabilities
ChatGPT's advanced voice mode responds in real time, making conversations feel more natural. It can adjust to interruptions, giggle when a user makes a joke, and even judge a speaker's emotional state from their tone of voice. This real-time capability is a significant improvement over previous versions.
Safety and Testing Measures
OpenAI has implemented rigorous safety and testing measures to ensure the new voice mode is reliable and secure. The advanced voice mode is currently being rolled out to a small number of ChatGPT Plus users, with plans to expand access to all Plus users in the fall.
Comparing ChatGPT's Voice Mode to Other Digital Assistants

Voice Quality and Realism
Far from the kind of robotic voice that people have come to associate with digital assistants like Alexa or Siri, ChatGPT's advanced voice mode sounds remarkably lifelike. It responds in real time, can adjust to being interrupted, can giggle when a user makes a joke, and can judge a speaker's emotional state based on their tone of voice. (During the initial demo, it also sounded suspiciously like Scarlett Johansson.)
Interaction Experience
ChatGPT already has a less sophisticated voice mode. But the rollout of a more advanced voice mode could mark a major turning point for OpenAI, transforming what was already a significant AI chatbot into something more akin to a virtual personal assistant that users can hold natural, spoken conversations with, much as they would chat with a friend. The ease of conversing with ChatGPT's advanced voice mode could encourage users to engage with the tool more often, and pose a challenge to established digital assistants such as Alexa and Siri.
Market Implications
Unlike previous versions of ChatGPT, which relied on transcribing spoken questions into text before responding, the new voice features use OpenAI's cutting-edge AI model to process and understand audio inputs directly. According to an official video posted on Instagram, this enables a more seamless and efficient voice interaction experience without the need for intermediate text conversion.
Earlier versions of ChatGPT could listen to spoken questions and respond with audio, but only by transcribing the question into text, running it through the AI algorithm, and then reading the text response out loud. Because the new voice features are built on OpenAI's latest AI model, which processes audio without converting it to text first, the bot can listen to multiple voices at once, determine a person's tone of voice, and respond differently based on what it hears. That leap is a large part of why the advanced voice mode could challenge established digital assistants such as Alexa and Siri.
User Experience with ChatGPT's Advanced Voice Mode
Ease of Use
The new voice mode is designed to be easy to use. Unlike older versions, which had to convert spoken questions into text first, the advanced voice mode understands and processes audio directly, making conversations with ChatGPT feel more natural and smooth.
Engagement and Interaction
Some ChatGPT Plus users can now access the advanced voice mode, but others may have to wait months. The voice sounds remarkably lifelike, far from the robotic voices of other digital assistants, and it can even giggle at jokes and pick up on emotions from a speaker's tone. This makes conversations more engaging and fun.
Potential Challenges
While the advanced voice mode is impressive, it is not yet perfect. For example, it is not yet optimized for use with in-car Bluetooth or speakerphone. OpenAI is slowly enrolling users in the alpha phase to ensure quality and fix any issues that come up.
The rollout of this advanced feature is phased, giving a first wave of users access to GPT-4o's hyperrealistic audio responses.
Future Prospects for ChatGPT's Voice Technology
Planned Updates and Improvements
OpenAI has big plans for the future of ChatGPT's voice technology. The company aims to make voice interactions even more natural and seamless. With the introduction of GPT-4o, users can expect smoother handling of interruptions and better management of group conversations. The team is also working on filtering out background noise and adapting to different tones, making the experience more user-friendly.
Expansion to Wider User Base
Currently, the advanced voice mode is available to a limited group of users. However, OpenAI plans to roll it out to a broader audience soon. This gradual release allows the company to closely monitor its usage and make necessary adjustments. People in the alpha group will receive alerts in the ChatGPT app and emails with instructions on how to use the new features.
Long-Term Vision and Goals
OpenAI's long-term vision includes transforming ChatGPT into a virtual personal assistant that users can engage with naturally, much like talking to a friend. This could significantly increase user engagement and make ChatGPT a more integral part of daily life. The company is also exploring the potential of integrating this advanced voice mode with other AI tools, further expanding its capabilities.
The future of conversational AI and voice assistants looks promising, with the global voice-based smart speaker market potentially reaching $30 billion by 2024. This indicates a growing demand for more advanced and natural voice interactions.
Ethical and Social Implications of Human-Like AI Voices

Trust and Reliability
The rise of human-like AI voices raises important questions about trust and reliability. As AI voices become more realistic, it becomes harder to tell them apart from real human voices, which opens the door to misuse such as scams or deepfake audio. It's crucial that these technologies are used ethically to avoid harming people.
Impact on Human Interaction
Human-like AI voices can change how we interact with technology and each other. People might start forming relationships with AI bots that respond to their emotions. This could affect real-life relationships and how we communicate with one another. It's important to think about these changes and how they might impact society.
Regulatory Considerations
As AI voice technology advances, there is a growing need for regulations to ensure ethical use. Policies must be put in place to prevent misuse and protect privacy. This includes blocking attempts to create voices of real people without their consent. Working with experts and the public can help shape these regulations to enhance human experiences while addressing complex technological challenges.
As AI continues to evolve, our collective responsibility is to uphold ethical standards and ensure that these innovations enhance, rather than harm, our digital lives.
OpenAI's Broader Initiatives
OpenAI is making significant strides in the field of artificial intelligence. The company is focused on creating tools that enhance creativity and productivity. Recently, OpenAI showcased an updated voice mode for ChatGPT, which has generated excitement among users. This new feature is part of a broader effort to develop AI that can understand and respond to human emotions, shaped with input from representatives of 45 languages and 29 regions.
Here are some key initiatives OpenAI is pursuing:
- Research Grants: OpenAI has established a program to support research in AI, encouraging collaboration within the industry.
- Safety Measures: The company is committed to ensuring that its AI tools are safe for public use, implementing filters to prevent copyright issues.
- Voice Technology: The advanced voice mode is just one example of how OpenAI is expanding its offerings to compete with other tech giants.
OpenAI believes that its research will eventually lead to artificial general intelligence, which can solve complex human-level problems.
As OpenAI continues to innovate, it is also addressing potential challenges, such as legal issues related to AI-generated content. The company is aware of the scrutiny it faces and is taking steps to navigate these waters carefully. Overall, OpenAI's initiatives reflect a commitment to advancing AI technology while considering ethical implications and user safety.
Conclusion
OpenAI's rollout of the advanced voice mode for ChatGPT marks a significant step forward in the world of AI technology. This new feature, which allows users to have natural, spoken conversations with the chatbot, brings us closer to the sci-fi dream of having a virtual assistant that feels almost human. Despite some initial backlash and delays, the phased introduction of this voice mode shows OpenAI's commitment to innovation and user experience. As more people get access to this feature, it will be interesting to see how it changes the way we interact with AI and what new possibilities it will unlock.
Frequently Asked Questions
What is ChatGPT's advanced voice mode?
ChatGPT's advanced voice mode is a new feature that lets users talk to ChatGPT using their voice and hear lifelike responses in real time.
Who can access the alpha version of the advanced voice mode?
The alpha version is available to a small group of ChatGPT Plus users who will receive an alert in the app and an email with instructions.
Why was the launch of the advanced voice mode delayed?
The launch was delayed due to backlash over the voice sounding too similar to actress Scarlett Johansson, which led to further testing and improvements.
How does the advanced voice mode differ from previous versions?
Unlike older versions that converted spoken words to text, the advanced voice mode processes audio inputs directly, making interactions smoother and more natural.
What safety measures were taken before the release?
OpenAI conducted extensive testing with over 100 external experts in 45 languages to identify potential safety risks and improvements.
How does ChatGPT's voice mode compare to other digital assistants?
ChatGPT's advanced voice mode offers more lifelike and real-time responses, making interactions feel more natural compared to other digital assistants like Alexa or Siri.