Key Highlights
- Google introduces Flex and Priority as new inference tiers for the Gemini API
- Flex tier provides a 50% cost reduction for workloads that can tolerate latency
- Priority tier charges a 75–100% premium for mission-critical, real-time operations
- Batch API maintains 50% discount with processing windows up to 24 hours
- Caching tier charges vary by token volume and retention period
On April 2, Google updated the pricing structure of its Gemini API, which now offers five service tiers: Standard, Flex, Priority, Batch, and Caching. The change lets developers tune each workload for cost, latency, or reliability.
The new Flex tier targets non-urgent background operations that can tolerate delayed responses. By using underutilized compute resources during off-peak periods, it costs 50% less than standard rates. Response latency ranges from 1 to 15 minutes, with no guaranteed delivery time. Suitable applications include customer relationship management updates, computational research tasks, and autonomous agent workflows.
What distinguishes Flex from the existing Batch API is its synchronous endpoint. Developers no longer need to manage input and output files or poll for job status, which simplifies integration while keeping the same 50% discount.
The Priority tier sits at the opposite end of the spectrum. Priced at 75% to 100% above standard rates, it delivers response times ranging from milliseconds to a few seconds and is intended for time-sensitive business operations.
Google lists live customer service chatbots, real-time fraud prevention systems, and automated content moderation workflows as suitable Priority tier applications. When Priority usage exceeds the allocated capacity, the system routes the excess requests to the Standard tier instead of rejecting them.
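The overflow behavior described above can be illustrated with a small sketch. It only models the counting: the function name and the capacity figure are hypothetical, and the article does not say how Google measures or allocates Priority capacity.

```python
def route_priority_requests(request_count: int, priority_capacity: int) -> dict:
    """Split requests between Priority and Standard when capacity is exceeded.

    Both arguments are illustrative; the real capacity limits are not
    given in the article.
    """
    if request_count < 0 or priority_capacity < 0:
        raise ValueError("counts must be non-negative")
    served_priority = min(request_count, priority_capacity)
    # Excess requests fall back to Standard rather than being rejected.
    return {
        "priority": served_priority,
        "standard": request_count - served_priority,
    }


if __name__ == "__main__":
    print(route_priority_requests(120, 100))
```

In this model no request is dropped: the two counts always add up to the number of requests submitted.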
Complete Tier Structure Overview
The previously launched Batch API continues to operate with a 50% discount from standard pricing, accommodating processing delays of up to 24 hours. This option serves developers running substantial offline computational tasks where immediate results aren’t essential.
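To make the relative prices concrete, here is a minimal sketch that applies the multipliers quoted in this article to a workload's Standard-tier cost. The dollar amount in the example is made up, and Caching is left out because it is priced on a different basis.

```python
# Price multipliers relative to Standard, as (low, high) bounds.
# Flex and Batch are 50% off; Priority is 75% to 100% above Standard.
TIER_MULTIPLIERS = {
    "standard": (1.00, 1.00),
    "flex": (0.50, 0.50),
    "batch": (0.50, 0.50),
    "priority": (1.75, 2.00),
}


def cost_range(standard_cost: float, tier: str) -> tuple:
    """Return the (low, high) cost of a workload on the given tier."""
    low, high = TIER_MULTIPLIERS[tier]
    return (standard_cost * low, standard_cost * high)


if __name__ == "__main__":
    # Hypothetical workload that would cost $100.00 on the Standard tier.
    for tier in TIER_MULTIPLIERS:
        low, high = cost_range(100.0, tier)
        print(f"{tier:<9} ${low:.2f} to ${high:.2f}")
```

For the same workload, Priority costs between 3.5 and 4 times as much as Flex or Batch.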
The Caching tier employs a pricing model calculated by token quantity and storage duration. Google identifies optimal use cases as conversational AI with extensive system prompts, recurring analysis of large-scale video content, or database queries across expansive document repositories.
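A pricing model based on token quantity and storage duration can be sketched as a simple product. The rate below is a placeholder, not a published price, and the sketch covers only the storage component; any per-request charge for reading cached tokens is not modeled.

```python
def cache_storage_cost(
    cached_tokens: int,
    hours_stored: float,
    rate_per_million_token_hours: float,
) -> float:
    """Estimate storage cost as tokens (in millions) x hours x rate.

    The rate is a hypothetical input; the article does not quote one.
    """
    if cached_tokens < 0 or hours_stored < 0 or rate_per_million_token_hours < 0:
        raise ValueError("inputs must be non-negative")
    return (cached_tokens / 1_000_000) * hours_stored * rate_per_million_token_hours


if __name__ == "__main__":
    # Example: a 2M-token document set cached for 3 hours at a made-up rate.
    print(cache_storage_cost(2_000_000, 3, 1.0))
```

Under this model, doubling either the cached token count or the retention period doubles the storage cost.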
Both Flex and Priority are selected through the same service_tier parameter in API calls. Developers can switch between performance levels with a single configuration change, and the API response confirms which tier processed each request.
Flex is available to all paid users on the GenerateContent and Interactions API endpoints. Priority is restricted to Tier 2 and Tier 3 paid accounts using those same endpoints.
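The single-parameter switch might look like the sketch below. Only the parameter name service_tier comes from the article. The tier values, the simplified request body, and the response field are assumptions for illustration and should be checked against the official API reference.

```python
from typing import Optional

# Tier values are assumed; the article names the tiers but not the
# exact strings the API accepts.
KNOWN_TIERS = {"standard", "flex", "priority"}


def build_request(prompt: str, service_tier: str = "standard") -> dict:
    """Build a simplified, hypothetical request body with a service_tier."""
    if service_tier not in KNOWN_TIERS:
        raise ValueError(f"unknown service tier: {service_tier!r}")
    return {"prompt": prompt, "service_tier": service_tier}


def served_tier(response: dict) -> Optional[str]:
    """Read which tier handled the request (field name is an assumption)."""
    return response.get("service_tier")


if __name__ == "__main__":
    background = build_request("Summarize yesterday's CRM updates", "flex")
    interactive = build_request("Is this transaction suspicious?", "priority")
    print(background["service_tier"], interactive["service_tier"])
    # A Priority request that overflowed could report the Standard tier.
    print(served_tier({"text": "...", "service_tier": "standard"}))
```

Checking the tier reported in the response is what lets an application notice when a Priority request was actually served by the Standard tier.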
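The access rules above reduce to a small lookup. The account-level labels used here ("free", "tier1", "tier2", "tier3") are hypothetical names chosen for the sketch; the article states only that Flex is open to all paid users and Priority to Tier 2 and Tier 3 paid accounts.

```python
# Hypothetical labels for account levels; not official identifiers.
PAID_LEVELS = {"tier1", "tier2", "tier3"}
PRIORITY_LEVELS = {"tier2", "tier3"}


def can_use(service_tier: str, account_level: str) -> bool:
    """Return True if the account level may select the given tier."""
    if service_tier == "flex":
        return account_level in PAID_LEVELS
    if service_tier == "priority":
        return account_level in PRIORITY_LEVELS
    raise ValueError(f"no access rule known for tier: {service_tier!r}")


if __name__ == "__main__":
    for level in ("free", "tier1", "tier2", "tier3"):
        print(level, can_use("flex", level), can_use("priority", level))
```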
Developer Benefits
The consolidated interface is the most significant change in this update. Previously, handling both background processing and interactive workloads required separate architectures for synchronous and asynchronous operations. Both workload types can now run through the same synchronous endpoints.
Google framed the update as part of its broader strategy to support AI agent development, which often requires handling low-priority background tasks alongside time-critical interactive operations.
The pricing update was officially announced by Gemini API product manager Lucia Loher alongside engineering lead Hussein Hassan Harrirou on April 2, 2026.





