
Google has launched a new Batch Mode for its Gemini API, designed to handle large-scale, non-urgent AI tasks at a much lower cost. The new asynchronous endpoint is aimed at developers and enterprises that need to process large jobs without real-time responses.
With Batch Mode, you can offload heavy jobs to the cloud and receive results within 24 hours, at half the cost of standard synchronous calls. The move clearly targets high-throughput use cases like bulk content generation, model benchmarking, and data labelling.
According to Google, Batch Mode offers three main benefits: lower costs, since you pay half the price of regular API calls; higher throughput, since looser rate limits allow more queries to be processed; and simpler operations, since developers no longer have to manage queuing or retry logic themselves.
The workflow is straightforward: you package multiple requests into a single file, submit the job, and collect the results after processing. The system is particularly suited for workloads where data is prepared in advance and response time is not critical.
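As a rough illustration, here is a minimal sketch of that workflow using the google-genai Python SDK's batch interface. The prompts, file names, polling interval, and model choice are assumptions made for the example, not anything Google prescribes.

```python
# A minimal sketch of the Batch Mode workflow with the google-genai SDK.
# Prompts, file names, and the model used here are illustrative assumptions.
import json
import time

from google import genai

client = genai.Client()  # reads the API key from the environment

# 1. Package multiple requests into a single JSONL file.
prompts = [
    "Summarise the plot of Hamlet in two sentences.",
    "Translate 'good morning' into French.",
]
with open("batch_requests.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(json.dumps({
            "key": f"request-{i}",
            "request": {"contents": [{"parts": [{"text": prompt}]}]},
        }) + "\n")

# 2. Upload the file and submit the batch job.
uploaded = client.files.upload(
    file="batch_requests.jsonl",
    config={"display_name": "batch-requests", "mime_type": "jsonl"},
)
job = client.batches.create(
    model="gemini-2.5-flash",          # illustrative model choice
    src=uploaded.name,
    config={"display_name": "example-batch-job"},
)

# 3. Poll until the job finishes (results arrive within 24 hours).
done_states = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}
while job.state.name not in done_states:
    time.sleep(60)
    job = client.batches.get(name=job.name)

# 4. Collect the results file once processing succeeds.
if job.state.name == "JOB_STATE_SUCCEEDED":
    results = client.files.download(file=job.dest.file_name)
    print(results.decode("utf-8"))
```

The operational simplification Google describes shows up in step 3: because the job is asynchronous, the client only polls for status rather than holding connections open or maintaining its own queuing and retry machinery.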
Several companies are already using Batch Mode. Reforged Labs, for example, uses Gemini 2.5 Pro to analyse large volumes of video ads, and Batch Mode has helped it cut costs and scale operations. Vals AI, meanwhile, runs extensive model evaluations for industries like finance and healthcare; the new mode lets it submit large batches of evaluation queries without hitting rate limits.
That’s not all. Google also recently made its Veo 3 text-to-video model available in public preview via Vertex AI. Previously limited to select users, the model is now open to all Google Cloud customers and partners.