September 29, 2023

ChatGPT Can Now Speak, Listen and Analyze Images

OpenAI has announced that ChatGPT can now “see, hear and speak”: it can understand spoken words, respond with a synthetic voice and process images.

The update to the chatbot – OpenAI’s largest since the introduction of GPT-4 – allows users to opt in to voice conversations in ChatGPT’s mobile app and choose from five different synthetic voices for the bot to respond with. Users can also share images with ChatGPT and highlight areas they want it to focus on or analyze (think: “What type of clouds are these?”).

The changes will be rolled out to paying users within the next two weeks, OpenAI said. While voice functionality will be limited to the iOS and Android apps, the image capabilities will be available on all platforms.

AI Competition

This feature update comes amid an intensifying race in artificial intelligence among leaders such as OpenAI, Microsoft, Google and Anthropic. Technology giants are vying to get consumers to use generative AI in their daily lives, launching not only new chatbot apps but also new features, particularly this summer. Google has announced a series of updates to its Bard chatbot, and Microsoft has added visual search to Bing.

Earlier this year, Microsoft’s additional $10 billion investment in OpenAI became the largest AI investment of the year, according to PitchBook. In April, the startup reportedly completed a $300 million share sale at a valuation of between $27 billion and $29 billion, backed by investors such as Sequoia Capital and Andreessen Horowitz.

Concerns about Synthetic Voices

Experts have raised concerns about AI-generated synthetic voices, which in this case may offer users a more natural experience but may also enable more convincing deepfakes. Cyber threat actors and researchers have already begun exploring how deepfakes can be used to penetrate cybersecurity systems.

OpenAI acknowledged these concerns in its announcement on Monday, saying the synthetic voices were “created with voice actors we have worked with directly,” rather than collected from strangers.

The announcement gave little information about how OpenAI would use consumers’ voice inputs, or how the company would secure that data if it were used. The company’s terms of service state that consumers own their inputs “to the extent permitted by applicable law.”

OpenAI pointed to its own guidelines on voice interactions, which state that the company does not retain audio clips and does not use the clips themselves to improve its models. However, the guidelines also say that transcriptions are treated as inputs and may be used to improve the company’s large language models.
