ChatGPT is now an assistant that can see, hear and speak

25 September 2023

ChatGPT, developed by OpenAI, is introducing new capabilities that let you interact through voice and images, offering a more intuitive interface and more ways to integrate ChatGPT into your daily life. In a recent announcement on its website, OpenAI revealed these new features in advance. It also highlighted the benefits they bring and the challenges they present in the growing AI market.

ChatGPT: voice interaction

With the new voice functionality, users can hold spoken, back-and-forth conversations with ChatGPT. This makes the assistant usable even on the move, broadening what the chatbot can do. For example, a user could ask ChatGPT to tell a children's story while on the go, making the trip more enjoyable.

A story created by the chatbot

Or, during a dinner with friends, a debate on a specific topic could emerge; in this case, users can use the bot to obtain accurate information and resolve the debate constructively.

ChatGPT's voice technology uses an advanced text-to-speech model. Developed in collaboration with professional voice actors, this model can generate human-like audio from text and short voice samples, making interaction with ChatGPT more natural and intuitive. In addition, thanks to Whisper, an open-source speech recognition system developed by OpenAI, spoken words are transcribed into text with great precision, allowing the chatbot to understand and respond effectively to user requests.

ChatGPT: visual interaction

Similarly, the AI model can now analyze one or more images, allowing users to solve problems, plan meals or analyze complex graphs. For example, a user could submit a photo of the contents of their refrigerator. The chatbot can then analyze the foods present and suggest recipes based on these ingredients, also providing step-by-step instructions for preparation.

Furthermore, if the user needs to focus on a particular element in the image, ChatGPT's mobile app includes a drawing tool which allows you to highlight specific areas of the image, making communication and analysis even more precise and personalized.

Image understanding is powered by the GPT-3.5 and GPT-4 multimodal models. These advanced models apply their language skills to a wide range of images, such as photographs, screenshots and documents that contain both text and images, allowing ChatGPT to understand and interpret the visual context accurately and in detail.

It is worth mentioning that OpenAI has recently integrated not only Canva but also DALL-E 3, its generative image model, into ChatGPT.