TechRadar
Eric Hal Schwartz

Gemini just got a new highly-requested feature that trumps ChatGPT

  • Google’s Gemini AI assistant now supports audio file uploads.
  • The AI will transcribe, summarize, and extract key information from recordings.
  • The feature turns up to 10 minutes of audio from voice memos, meetings, lectures, or interviews into searchable documents.

Google Gemini has just learned how to listen and make sense of what it hears. You can now upload audio files to the AI assistant on the web or through the mobile apps and get transcriptions, summaries, and key details.

For anyone who’s ever let a voice memo rot in their phone or dreaded the task of rewatching a meeting recording, this update could be the AI equivalent of hiring a personal note-taker.

That said, it can only handle 10 minutes of audio at a time, so no long meetings just yet. You can upload audio files directly by selecting audio from the usual file upload options. What makes this different from the earlier Gemini Live voice features is that you aren't just speaking to the AI in real time.

Gemini Live is useful for casual commands, but this is more about getting the AI to process data as it does with other file formats. Notably, audio file uploading has apparently been the most requested feature from users, according to Google’s VP of Gemini, Josh Woodward.

AI audio

I tested it by uploading a couple of sketches from old comedy albums and a phone conversation with a friend. The AI successfully transcribed all the words said in each case, with a couple of small name-related errors. It was also good at pulling out key elements and items for a to-do list.

The demand for audio and Google's response hint at how AI tools are evolving to match how we save information in audio logs and voice memos. Turning that into something searchable has usually meant using external transcription software. Gemini’s new feature collapses that process into a single step.

What makes the addition feel particularly timely is the way it dovetails with other recent Gemini improvements. Google has already integrated Gemini into other apps, begun testing a card-based visual interface, and significantly expanded Gemini’s personalization options. The ability to process audio continues that trend.

The audio option isn't unique to Gemini among AI assistants, but it can at least match some of what ChatGPT can do with OpenAI's Whisper transcription model. In fact, in my testing, I preferred Google's offering.

Anthropic’s Claude also handles audio in some developer tools, and Perplexity can extract data from YouTube videos. But Gemini’s execution is more focused on everyday use cases.

And the output isn’t just a dumb transcription. You can ask Gemini to simplify the language, extract speaker-specific comments, generate questions based on the content, or create a study guide from a classroom discussion. Of course, the 10-minute limit restricts how far it can become part of everyday life. Free-tier users also face daily usage limits.

Google hasn’t released a formal pricing breakdown for high-volume audio processing, but it's part of the regular Gemini quota, so anyone planning to feed it a dozen hours of legal depositions should pace themselves.
