
Thinking Machines Lab Cracks LLM Nondeterminism with Breakthrough Batch-Invariant Kernels

AI startup Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, has announced a breakthrough in large language model (LLM) inference: a way to make model outputs fully reproducible. In its blog post “Defeating Nondeterminism in LLM Inference”, the company explains that existing systems struggle with reproducibility because their inference kernels are not batch-invariant.

Batch invariance means that a model’s output for a given prompt is bit-for-bit identical no matter how many other requests are batched alongside it. Current inference kernels fail at this: the order in which operations like matrix multiplication, attention, and normalization accumulate their partial sums changes with batch size, and because floating-point addition is not associative, those reorderings introduce tiny numerical shifts that compound into diverging results over long generations.
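
To make the root cause concrete, here is a small illustrative Python snippet (mine, not from the blog post): it sums the same float32 values in two different orders, the kind of reordering a change in batch size can trigger inside a matmul or normalization kernel, and the two results disagree in the low bits.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# Order 1: naive left-to-right accumulation.
sequential = np.float32(0.0)
for v in x:
    sequential += v

# Order 2: a blocked, tree-style reduction, the kind a differently
# parallelized kernel might pick when the batch shape changes.
blocked = x.reshape(100, 100).sum(axis=1).sum()

print(sequential == blocked)                    # usually False
print(abs(float(sequential) - float(blocked)))  # small, but not zero
```

Each result is a valid rounding of the true sum; they simply round differently depending on accumulation order, which is exactly the kind of discrepancy that later gets amplified token by token.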

To address this, the startup developed batch-invariant kernels for the critical operations: RMSNorm, matmul, and attention. Testing on Qwen-3-8B, they found that under default settings 1,000 runs of the same prompt produced 80 unique completions, whereas with the new kernels every run returned an identical output, achieving full reproducibility.
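
As a rough illustration of the idea (a hypothetical sketch, not Thinking Machines’ published kernels), the PyTorch snippet below implements an RMSNorm whose per-row sum of squares is accumulated over fixed-size chunks in the same order for every batch size, so a request’s row yields the same bits whether it is processed alone or alongside other requests.

```python
import torch

def batch_invariant_rmsnorm(x: torch.Tensor, weight: torch.Tensor,
                            eps: float = 1e-6, chunk: int = 256) -> torch.Tensor:
    """Hypothetical sketch: accumulate each row's sum of squares over
    fixed-size chunks of the hidden dimension, in an order that never
    depends on how many rows are in the batch."""
    hidden = x.shape[-1]
    acc = torch.zeros(x.shape[:-1], dtype=torch.float32, device=x.device)
    for start in range(0, hidden, chunk):
        piece = x[..., start:start + chunk].to(torch.float32)
        acc = acc + (piece * piece).sum(dim=-1)
    inv_rms = torch.rsqrt(acc / hidden + eps)
    return (x.to(torch.float32) * inv_rms.unsqueeze(-1) * weight).to(x.dtype)

torch.manual_seed(0)
w = torch.ones(1024)
row = torch.randn(1, 1024)                       # a single request
batch = torch.cat([row, torch.randn(31, 1024)])  # the same request inside a batch

alone = batch_invariant_rmsnorm(row, w)
in_batch = batch_invariant_rmsnorm(batch, w)[:1]
print(torch.equal(alone, in_batch))              # expected: True (bitwise identical)
```

The real kernels apply the same principle at much larger scale, fixing the reduction strategy for matmul and attention up front rather than letting it vary with batch shape, which is also where the performance cost comes from.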

Though the batch-invariant kernels run slightly slower, Thinking Machines argues the trade-off is crucial for research, safety, and debugging, and that it should redefine how future inference engines are built.