Everyone’s favorite chatbot can now see, hear, and speak. On Monday, OpenAI announced new multimodal capabilities for ChatGPT. Users can now have voice conversations or share images with ChatGPT in real time.
Audio and multimodal capabilities have become the next frontier in the fierce generative AI competition. Meta recently launched AudioCraft for generating music with artificial intelligence, and Google Bard and Microsoft Bing have rolled out multimodal features in their chat experiences. Just last week, Amazon unveiled a revamped version of Alexa powered by its own LLM (large language model), and even Apple is experimenting with AI-generated voice through its Personal Voice feature.
Voice features will be available on iOS and Android. As with Alexa or Siri, you can tap to talk to ChatGPT, and it will respond aloud in one of five voice options. Unlike current voice assistants, ChatGPT is powered by a more advanced LLM, so you’ll hear the same kind of conversational and creative responses that OpenAI’s GPT-4 and GPT-3.5 can produce in text. One example OpenAI shared in the announcement is generating a bedtime story from a voice prompt, so exhausted parents at the end of a long day can outsource their creativity to ChatGPT.
Multimodal recognition has been anticipated for some time, and it is now rolling out in a user-friendly way in ChatGPT. When GPT-4 was released last March, OpenAI demonstrated its ability to understand and interpret images and handwritten text. Now it will be part of everyday ChatGPT use. Users can upload an image of something and ask ChatGPT about it, such as identifying a cloud or putting together a meal plan from a photo of the contents of their fridge. The multimodal feature will be available on all platforms.
As with any development in generative AI, there are serious ethics and privacy issues to consider. To mitigate the risk of audio deepfakes, OpenAI says it only uses its voice recognition technology for the specific use case of voice chat. In addition, the voice options were created with voice actors the company says it “worked directly” with. That said, the announcement didn’t mention whether users’ voices can be used to train the model when they opt in to voice chat. Regarding ChatGPT’s multimodal capabilities, OpenAI says it has “taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy.” But the true test against nefarious uses won’t come until the features are released into the wild.
Voice chat and images will roll out to ChatGPT Plus and Enterprise users within the next two weeks, and to all users “shortly thereafter.”