Gemini 3.5 Flash

Gemini 3.5 Flash is Google's stable Flash-tier model in the Gemini 3 family, aimed at agentic coding, multi-step workflows, long-context tasks, and multimodal understanding. It is available through the Gemini API and Google product surfaces.

Why it matters

Gemini 3.5 Flash widens the usual Flash-model decision from speed and cost alone to include agentic capability. Before using it in coding agents, enterprise workflows, or model API comparisons, readers need the exact model ID, supported modalities, token limits, tool support, pricing, and availability.

Source-backed summary

Google AI for Developers lists Gemini 3.5 Flash as a stable Gemini 3 model with the model code gemini-3.5-flash. It takes text, image, video, audio, and PDF inputs, returns text output, and has a 1,048,576-token input limit and a 65,536-token output limit. The listed capabilities are thinking, structured outputs, function calling, file search, URL context, code execution, search grounding, Maps grounding, caching, Batch API, Flex inference, and Priority inference. Google DeepMind describes it as the next iteration in the Gemini 3 Flash line, based on Gemini 3 Flash, and suited for agents, coding, long-horizon tasks, and multimodal work. Community sampling on May 20, 2026 found three themes: excitement about Antigravity speed and agentic coding, skepticism about benchmark-to-real-work transfer, and concern that the higher Flash-tier price may erase the usual Flash cost advantage.

Primary use cases
  • Run fast iterative coding agents and sub-agent workflows.
  • Process long multimodal context with text, image, video, audio, or PDF input.
  • Use Gemini API tool support such as function calling, structured outputs, search grounding, file search, URL context, and code execution.
  • Compare Flash-tier pricing against Gemini 3 Flash, Gemini 3.1 Pro, Claude, GPT, and other coding-capable models.

What Google documents

Google documents Gemini 3.5 Flash as a stable model in the Gemini 3 family with the model code gemini-3.5-flash. It accepts text, image, video, audio, and PDF inputs and returns text output. The developer docs list a 1,048,576-token input limit, a 65,536-token output limit, a January 2025 knowledge cutoff, and support for thinking, function calling, structured outputs, search grounding, Maps grounding, URL context, file search, code execution, context caching, Batch API, Flex inference, and Priority inference.

  • Use it when the task needs fast agentic coding loops, multi-step workflows, multimodal inputs, or long-context reasoning.
  • Do not treat it as an image-generation, audio-generation, Live API, or computer-use model; the only documented output is text.
  • For production API work, use the stable model string gemini-3.5-flash rather than the older gemini-3-flash-preview string.
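
The documented limits above can be turned into a pre-flight check before a request is sent. The following is a minimal sketch, not part of any Google SDK: RequestPlan and check_plan are hypothetical names, and the token counts are assumed to come from whatever token-counting method the caller already uses. Only the constants are taken from the documented figures.

```python
from dataclasses import dataclass

# Constants copied from the documented limits quoted in this section.
MODEL_ID = "gemini-3.5-flash"
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536
INPUT_MODALITIES = {"text", "image", "video", "audio", "pdf"}
OUTPUT_MODALITIES = {"text"}


@dataclass
class RequestPlan:
    """Hypothetical description of a planned request (not an SDK type)."""
    input_tokens: int            # assumed to be counted by the caller
    max_output_tokens: int       # output budget the caller intends to request
    input_modalities: set[str]
    output_modality: str = "text"


def check_plan(plan: RequestPlan) -> list[str]:
    """Return human-readable problems; an empty list means the plan fits."""
    problems = []
    if plan.input_tokens > MAX_INPUT_TOKENS:
        problems.append(f"input exceeds {MAX_INPUT_TOKENS} tokens")
    if plan.max_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(f"output budget exceeds {MAX_OUTPUT_TOKENS} tokens")
    unsupported = sorted(plan.input_modalities - INPUT_MODALITIES)
    if unsupported:
        problems.append(f"unsupported input modalities: {unsupported}")
    if plan.output_modality not in OUTPUT_MODALITIES:
        problems.append(f"unsupported output modality: {plan.output_modality}")
    return problems
```

A plan with 200,000 input tokens, an 8,192-token output budget, and text plus PDF input returns an empty list; a plan that asks for image output returns a problem describing the unsupported output modality.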

Agent and coding fit

Google positions Gemini 3.5 Flash around sub-agent deployment, complex coding cycles, multi-step workflows, and long-horizon tasks. DeepMind benchmark tables emphasize agentic terminal coding, SWE-Bench Pro, MCP workflows, tool use, OSWorld-style UI control, financial analysis, multimodal chart reasoning, and long-context evaluation. Those results are useful for model screening, but teams should still run their own task-specific evals before replacing an existing production coding or agent model.
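
One way to act on the advice to run task-specific evals is a small pass-rate comparison between the current production model and the candidate. The sketch below is illustrative only: EvalCase, pass_rate, and compare are hypothetical names, and each model is assumed to be wrapped in a plain function that takes a prompt and returns text, so no particular SDK is implied.

```python
from typing import Callable, NamedTuple


class EvalCase(NamedTuple):
    """One task from the team's own workload (hypothetical structure)."""
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def pass_rate(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = 0
    for case in cases:
        try:
            output = model(case.prompt)
        except Exception:
            continue  # a failed call counts as a failed case
        if case.check(output):
            passed += 1
    return passed / len(cases)


def compare(candidate: Callable[[str], str],
            baseline: Callable[[str], str],
            cases: list[EvalCase]) -> float:
    """Candidate pass rate minus baseline pass rate on the same cases."""
    return pass_rate(candidate, cases) - pass_rate(baseline, cases)
```

A positive result from compare means the candidate passed more of the team's own cases than the baseline did. It says nothing about latency or cost, which need separate measurement.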

Cost and availability caveats

The Gemini API pricing page lists standard paid pricing at $1.50 per 1M input tokens and $9.00 per 1M output tokens, with lower Batch and Flex prices and higher Priority prices. Community threads focus heavily on whether that price still feels like a Flash-tier bargain. Use official pricing for factual cost fields, and use Reddit or forum discussion only as evidence of user concern and comparison demand.
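
The standard prices quoted above translate into a simple per-request estimate. This is a minimal sketch that assumes only the standard paid tier: Batch, Flex, and Priority prices and any caching discounts are not modeled because their figures are not given here, and estimate_cost_usd is a hypothetical helper name.

```python
from decimal import Decimal

# Standard paid prices quoted in this section, in USD per 1M tokens.
INPUT_USD_PER_MILLION = Decimal("1.50")
OUTPUT_USD_PER_MILLION = Decimal("9.00")
MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the standard-tier cost of one request in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / MILLION
```

For example, a request with 200,000 input tokens and 10,000 output tokens comes to $0.30 plus $0.09, or $0.39. Output tokens cost six times as much as input tokens at these prices, so verbose output has a large effect on the total.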

What Reddit and X discussion adds

X launch posts from Google, Google DeepMind, GitHub, and Google employees reinforce the official framing: Gemini 3.5 Flash is aimed at agentic coding, long-horizon work, tool use, cache efficiency, and broad availability. Reddit discussion is more mixed. Antigravity users report unusually fast output, more detailed artifacts, and a model that feels closer to a Pro-class coding assistant, while other threads warn that benchmark strength may not transfer to every real task and that teams should run their own evals before swapping it into production.

  • Positive signal: Antigravity and coding-tool users notice speed, structured output, and stronger tool-workflow fit.
  • Skeptical signal: Reddit users question benchmark-heavy marketing and report task-specific underperformance in some saved evals.
  • Cost signal: Reddit users repeatedly compare the official $1.50/$9.00 standard price against older Flash models, Gemini Pro-class pricing, GPT, Claude, and Flash-Lite expectations.

Gemini 3.5 Flash FAQ

What is the model ID for Gemini 3.5 Flash?

The stable Gemini API model ID is gemini-3.5-flash. Google lists gemini-3-flash-preview separately as a preview-version string, so production API users who want the new 3.5 Flash model should verify that they are calling gemini-3.5-flash.
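
A guard in the request-building code can catch an accidental preview string before it reaches production. The sketch below only builds a request and does not send it. The base URL, the v1beta version segment, and the request body shape are assumptions based on the public Gemini REST generateContent format and should be checked against the current API reference; build_generate_request is a hypothetical helper name.

```python
import json

STABLE_MODEL_ID = "gemini-3.5-flash"
# Assumption: public Gemini API base URL and version; verify before use.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(prompt: str,
                           model_id: str = STABLE_MODEL_ID) -> tuple[str, str]:
    """Return (url, json_body) for a text-only request; nothing is sent."""
    if model_id != STABLE_MODEL_ID:
        # Rejects preview strings such as gemini-3-flash-preview.
        raise ValueError(f"expected {STABLE_MODEL_ID!r}, got {model_id!r}")
    url = f"{BASE_URL}/models/{model_id}:generateContent"
    # Assumed body shape: a list of contents, each with a list of parts.
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(body)
```

Authentication is left out on purpose; the caller would attach an API key in whatever way the current documentation specifies.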

Is Gemini 3.5 Flash good for coding agents?

Gemini 3.5 Flash is explicitly positioned for agentic coding, sub-agent deployment, multi-step workflows, and long-horizon tasks. It supports tool-oriented API features such as function calling, structured outputs, file search, URL context, search grounding, and code execution. For production coding agents, run task-specific evals because benchmark strength does not automatically prove reliability in a particular repo or harness.

Does Gemini 3.5 Flash generate images, audio, or live voice output?

No. The Gemini 3.5 Flash developer page lists text, image, video, audio, and PDF as inputs, but only text as output. It also lists image generation, audio generation, Live API, and computer use as not supported. Use separate Gemini media or Live API models when the output needs to be an image, audio, or real-time voice.

Why are users debating Gemini 3.5 Flash pricing?

Users are debating pricing because the official Gemini API page lists standard paid pricing at $1.50 per 1M input tokens and $9.00 per 1M output tokens, which is higher than many older Flash-tier expectations. That does not make the model overpriced by itself; the right comparison depends on output quality, latency, cache use, Batch or Flex pricing, and whether it replaces a more expensive Pro-class model in the workflow.

What are early users saying about Gemini 3.5 Flash?

Early user discussion is split. X launch posts and some Antigravity users emphasize speed, agentic coding, tool use, and more detailed coding artifacts. Reddit threads add caution: several users want real-task evals before trusting benchmark claims, and some report that the higher token price or token-hungry behavior can weaken the cost advantage. Treat these as early community signals, not settled benchmark facts.