ChatGPT, developed by OpenAI, is introducing new capabilities that let you interact through voice and images, offering a more intuitive interface and more ways to integrate ChatGPT into your daily life. In a recent announcement on its website, OpenAI previewed these new features, highlighting both the benefits they bring and the challenges they present in the growing AI market.
ChatGPT: voice interaction
With the new voice functionality, users can hold spoken conversations with ChatGPT. This makes the assistant usable on the move and broadens what the chatbot can do. For example, a user could ask ChatGPT to tell a children's story while travelling.
Or, if a debate on a specific topic comes up during dinner with friends, users can ask the bot for information to help settle it constructively.
ChatGPT's voice technology uses an advanced text-to-speech model. Developed in collaboration with professional voice actors, this model can generate human-like audio from text and short voice samples, making interaction with ChatGPT more natural and intuitive. In addition, Whisper, an open-source speech recognition system developed by OpenAI, transcribes spoken words into text with high precision, so the chatbot can understand and respond to user requests effectively.
ChatGPT: visual interaction
Similarly, ChatGPT can now analyze one or more images, helping users solve problems, plan meals or interpret complex graphs. For example, a user could submit a photo of the contents of their refrigerator. The chatbot can then identify the foods present, suggest recipes based on those ingredients and provide step-by-step preparation instructions.
If the user needs to focus on a particular element in the image, ChatGPT's mobile app includes a drawing tool for highlighting specific areas. This makes the request, and therefore the analysis, more precise.
Image understanding is powered by the multimodal GPT-3.5 and GPT-4 models. These models apply their language reasoning skills to a wide range of images, such as photographs, screenshots and documents containing both text and images. This lets ChatGPT interpret visual context in detail.
When and for whom it will be available
Over the next two weeks, OpenAI will roll out voice and image capabilities in ChatGPT to Plus and Enterprise subscribers.
Voice interaction will be available on iOS and Android, but not on the web version, which is the one most people use.
Image interaction, by contrast, will be available on all platforms: Android, iOS and web.
Source | OpenAI