ChatGPT has changed a lot since its introduction late last year, especially if you pay for it. GPT-4 is a powerful upgrade over GPT-3.5, and plugins connect you to third-party services to make ChatGPT even more useful. Now the chatbot is changing again, expanding beyond text-based conversations to support both visual and audio content.
OpenAI detailed these changes to ChatGPT in an announcement on Monday. As long as you’re subscribed to Plus, you’ll be able to both speak directly to ChatGPT and share images in your chats, which has the potential to make the bot considerably more useful. You can ask ChatGPT questions about specific elements of an image, using a drawing tool to focus the AI’s attention exactly where you want it. Plus, you can have actual conversations with ChatGPT: It not only understands your voice, but now has a voice of its own.
Sharing images with ChatGPT
The company showed off the following example in its introductory video: You might be struggling to figure out how to lower the seat on your bike. You can fire up the ChatGPT app, snap a pic of your bike, and ask ChatGPT for help. It’ll give you an overview of the solution, but may ask for another picture for more context. You can then take another picture (maybe a close-up of the seat itself), then draw a circle around the lock on the seat to focus ChatGPT’s attention there.
According to OpenAI’s example video, ChatGPT will be able to distinguish between different parts of the bike: You might ask if a particular part is the lever of the bike, and ChatGPT could respond that it’s actually a bolt that needs an Allen wrench to loosen. More impressive still, you can share photos of your bike’s manual and your toolbox and ask whether you have the right tool for the job. ChatGPT will analyze your images, then confirm or deny: If you have the right tool, it’ll tell you where it is in your toolbox.
Finally, the user in the example video thanked ChatGPT for its help, once again confirming my suspicion that AI companies think we need to be nice to our future robot overlords.
The possibilities here go beyond the above example. You can take a picture of the contents of your refrigerator and ask ChatGPT to help you plan dinner, or send an image of a building and ask for the history of its construction. This feature is available on all ChatGPT platforms for Plus and Enterprise users, and is rolling out over the next two weeks.
Have a conversation with ChatGPT
Of course, ChatGPT now handles not only visual elements, but auditory ones as well. In a second example video, a user speaks to ChatGPT, asking it to tell them a story about the “super-duper sunflower hedgehog named Larry,” a reference to the company’s video introducing DALL·E 3 to the world. They want ChatGPT to start by telling them a little about Larry. ChatGPT, of course, produces an intro for Larry, before the user cuts in to ask what his house was like. This goes on, back and forth, as if ChatGPT were an actual storyteller improvising on the fly.
What’s cool about this feature is how natural it feels compared to text-based chats. You can tap the microphone button to interrupt ChatGPT at any time, whether to ask for more context about something the bot is explaining or to redirect the conversation entirely. Voice is only available in the ChatGPT app on iOS and Android, and is also rolling out over the next two weeks to Plus and Enterprise users. You’ll find the option to opt in to the feature in Settings > New Features. Then, tap the headphones button in the top right and choose which of the five voices you like best. (OpenAI actually worked with voice actors on this feature.)
Of course, as with all AI features, these options won’t be perfect, and will be susceptible to the same hallucination risks we’ve seen in the past. OpenAI is well aware of that fact: The company had red teamers test the image model for risks in areas like extremism and scientific proficiency, and it built the feature on its experience with Be My Eyes, an app that helps blind and low-vision users “see” through their phones’ cameras. The model is also limited in its ability to return results about people.
Long story short: Don’t rely on these features for any high-risk or serious situations. Adjusting your bike seat? Sure. Changing a tire on a car? Hard pass.