
Thinking Machines Lab Cracks LLM Nondeterminism with Breakthrough Batch-Invariant Kernels

AI startup Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, has announced a breakthrough in large language model (LLM) inference: a way to make model outputs fully reproducible. In its blog post “Defeating Nondeterminism in LLM Inference”, the company explains that existing systems struggle with reproducibility because their inference kernels are not batch-invariant.

Batch invariance means that a model’s output for a given prompt is bit-for-bit identical no matter how many other requests are batched alongside it. Current inference kernels fail at this: the order in which operations like matrix multiplication, attention, and normalization accumulate their partial sums changes with batch size, and because floating-point addition is not associative, those reorderings introduce tiny numerical shifts that compound into diverging results over long generations.
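
To make the root cause concrete, here is a small illustrative Python snippet (mine, not from the blog post): it sums the same float32 values in two different orders, the kind of reordering a change in batch size can trigger inside a matmul or normalization kernel, and the two results disagree in the low bits.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# Order 1: naive left-to-right accumulation.
sequential = np.float32(0.0)
for v in x:
    sequential += v

# Order 2: a blocked, tree-style reduction, the kind a differently
# parallelized kernel might pick when the batch shape changes.
blocked = x.reshape(100, 100).sum(axis=1).sum()

print(sequential == blocked)                    # usually False
print(abs(float(sequential) - float(blocked)))  # small, but not zero
```

Each result is a valid rounding of the true sum; they simply round differently depending on accumulation order, which is exactly the kind of discrepancy that later gets amplified token by token.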

To address this, the startup developed batch-invariant kernels for the critical operations: RMSNorm, matmul, and attention. Testing on Qwen-3-8B, they found that under default settings 1,000 runs of the same prompt produced 80 unique completions, whereas with the new kernels every run returned an identical output, achieving full reproducibility.
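
As a rough illustration of the idea (a hypothetical sketch, not Thinking Machines’ published kernels), the PyTorch snippet below implements an RMSNorm whose per-row sum of squares is accumulated over fixed-size chunks in the same order for every batch size, so a request’s row yields the same bits whether it is processed alone or alongside other requests.

```python
import torch

def batch_invariant_rmsnorm(x: torch.Tensor, weight: torch.Tensor,
                            eps: float = 1e-6, chunk: int = 256) -> torch.Tensor:
    """Hypothetical sketch: accumulate each row's sum of squares over
    fixed-size chunks of the hidden dimension, in an order that never
    depends on how many rows are in the batch."""
    hidden = x.shape[-1]
    acc = torch.zeros(x.shape[:-1], dtype=torch.float32, device=x.device)
    for start in range(0, hidden, chunk):
        piece = x[..., start:start + chunk].to(torch.float32)
        acc = acc + (piece * piece).sum(dim=-1)
    inv_rms = torch.rsqrt(acc / hidden + eps)
    return (x.to(torch.float32) * inv_rms.unsqueeze(-1) * weight).to(x.dtype)

torch.manual_seed(0)
w = torch.ones(1024)
row = torch.randn(1, 1024)                       # a single request
batch = torch.cat([row, torch.randn(31, 1024)])  # the same request inside a batch

alone = batch_invariant_rmsnorm(row, w)
in_batch = batch_invariant_rmsnorm(batch, w)[:1]
print(torch.equal(alone, in_batch))              # expected: True (bitwise identical)
```

The real kernels apply the same principle at much larger scale, fixing the reduction strategy for matmul and attention up front rather than letting it vary with batch shape, which is also where the performance cost comes from.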

Though the batch-invariant kernels run slightly slower, Thinking Machines argues the trade-off is crucial for research, safety, and debugging, and that it should redefine how future inference engines are built.