Google's Gemini 3.5 Pro can now read 2 million tokens at once
Key takeaways
- Gemini 3.5 Pro is reaching general availability with a 2-million-token context window, the largest in any production frontier model
- That is enough to hold an entire codebase, a long novel, or a year of meeting transcripts in a single prompt
- A new Deep Think reasoning mode ships with it, but sits behind the $250-a-month Ultra tier
- The race has shifted from raw capability to context length and who can afford the best features
Google is moving Gemini 3.5 Pro into general availability, and the headline number is the context window: 2 million tokens. That is double what Gemini 3.5 Flash offers and, for now, the largest window in any production frontier model you can actually pay to use.
What 2 million tokens actually means
A token is roughly three-quarters of a word. Two million of them is a lot of reading. To make it concrete: a full novel runs around 100,000 tokens. The entire Harry Potter series is about 1.1 million words, which works out to roughly 1.4 million tokens at that ratio. A year of a busy team's meeting transcripts sits near 1.5 million tokens.
So a 2-million-token window means you can drop an entire codebase, a stack of contracts, or a year of notes into a single prompt and ask questions across all of it at once. No splitting documents, no losing the thread halfway through. For anyone doing research, legal review or large-scale code work, that is the feature that matters more than another point on a benchmark.
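For a rough sense of whether a pile of documents would fit, the three-quarters-of-a-word rule of thumb can be turned into a quick estimate. The sketch below is a back-of-the-envelope calculation only: the function names are made up for illustration, the 0.75 ratio is the approximation quoted above rather than the output of any real tokenizer, and actual counts vary by language and content (code usually tokenizes less efficiently than prose).

```python
# Back-of-the-envelope token estimate, assuming ~0.75 words per token.
# This is an approximation, not a real tokenizer; names are hypothetical.
WORDS_PER_TOKEN = 0.75
CONTEXT_WINDOW_TOKENS = 2_000_000


def estimate_tokens(word_count: int) -> int:
    """Approximate the token count of English text from its word count."""
    return round(word_count / WORDS_PER_TOKEN)


def fits_in_window(word_counts, window=CONTEXT_WINDOW_TOKENS):
    """Return (estimated total tokens, whether that total fits in the window)."""
    total = sum(estimate_tokens(words) for words in word_counts)
    return total, total <= window


if __name__ == "__main__":
    # Hypothetical word counts for three documents.
    documents = [75_000, 400_000, 900_000]
    total, ok = fits_in_window(documents)
    print(f"Estimated tokens: {total:,} (fits: {ok})")
```

Anything that lands close to the limit deserves a real token count from the provider's own tooling, since the prompt and the model's reply also take up room in the same window.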
The catch is the price
The launch also ships a Deep Think reasoning mode, where the model works through harder problems step by step before answering. It is gated behind the Ultra subscription at $250 a month. That is the part worth watching.
The AI race has quietly changed shape. Eighteen months ago the fight was about raw capability: whose model was smartest. Now the smart-enough models are everywhere, and the competition has moved to two things: how much the model can hold in its head at once, and how much the best features cost. Frontier capability is increasingly something you rent at a premium, not a baseline everyone gets.
None of this happens in a vacuum either. The same week, access economics across the industry looked shaky, with one rival model pulled offline for days over export controls. The pitch of cheap, unlimited AI is running into the reality of what serving these models actually costs. A 2-million-token window is genuinely useful. It is also expensive to run, and someone has to pay for it.