MIT Startup Claims to Have Cracked a Core Mathematical Bottleneck Holding Back LLMs
Key takeaways
- Startup Subquadratic claims to have solved the quadratic scaling problem in transformer attention mechanisms
- Standard attention mechanisms scale quadratically with sequence length, making long contexts extremely compute-intensive
- The company came out of stealth in May 2026 with the announcement
- The claims have not yet been independently verified
Miami-based AI startup Subquadratic came out of stealth last month with a bold claim: it has solved a fundamental mathematical bottleneck that has been limiting how efficiently large language models can process long sequences of text. If the claim holds up, it could meaningfully reduce the compute cost of running and training AI models.
The bottleneck in question relates to the attention mechanism at the heart of transformer-based models. Standard attention compares every token with every other token, so its cost scales quadratically with sequence length: doubling the context window roughly quadruples the compute the attention step requires. Subquadratic, as the name suggests, says it has found an approach that avoids this penalty without sacrificing model quality.
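To make the quadratic penalty concrete, here is a minimal sketch of standard single-head scaled dot-product attention in plain Python that counts how many query-key scores it computes. It illustrates the textbook mechanism only and says nothing about Subquadratic's approach, which the company has not been independently shown to work. The function names and the toy input data are assumptions made up for this example.

```python
import math


def naive_attention(queries, keys, values):
    """Textbook single-head scaled dot-product attention.

    Returns (outputs, score_count), where score_count is the number of
    query-key dot products that were computed.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    score_count = 0
    for q in queries:
        # One score per (query, key) pair: this nested loop is the n * n term.
        scores = []
        for k in keys:
            scores.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
            score_count += 1
        # Numerically stable softmax over this query's scores.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted sum of the value vectors.
        width = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return outputs, score_count


def toy_sequence(n, d=4):
    """Deterministic toy vectors (hypothetical data, for illustration only)."""
    return [[math.sin(i + j) for j in range(d)] for i in range(n)]


if __name__ == "__main__":
    for n in (8, 16, 32):
        x = toy_sequence(n)
        _, count = naive_attention(x, x, x)
        print(f"sequence length {n}: {count} attention scores")
```

Each doubling of the sequence length (8, 16, 32) multiplies the number of scores by four (64, 256, 1024). Real implementations batch this work into matrix multiplications on GPUs, but the number of pairwise scores grows the same way.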
This is exactly the kind of claim that deserves healthy scepticism. The AI field has seen plenty of 'we solved attention' announcements that turned out to be limited to specific use cases or achieved their efficiency gains by quietly trading off capability. The key questions are whether the improvement generalises across model sizes, whether it holds at the scale frontier labs actually train at, and whether independent researchers can reproduce the results.
That said, if Subquadratic's approach is even partially as effective as claimed, the implications are significant. Longer context windows at lower cost would benefit everything from document analysis to code generation, and could push the economics of AI deployment in a more accessible direction.