DeepSeek, a leading Chinese AI lab backed by the hedge fund High-Flyer, has launched its “Open Source Week” with the release of FlashMLA, an efficient MLA (Multi-head Latent Attention) decoding kernel optimized for NVIDIA Hopper GPUs. Designed to handle variable-length sequences, FlashMLA is already in production use and is available for exploration on GitHub.
The FlashMLA kernel supports BF16 precision and features a paged KV cache with a block size of 64. On the H800 GPU, it delivers up to 3,000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations. Inspired by FlashAttention 2 and 3 and NVIDIA’s CUTLASS library, FlashMLA improves computational efficiency for AI inference workloads and could impact industries such as cryptocurrency trading and high-performance computing.
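To put the two quoted ceilings in context: whether a kernel is memory-bound or compute-bound depends on its arithmetic intensity (FLOPs performed per byte moved). As a rough illustration, not taken from DeepSeek’s materials, the standard roofline model applied to the quoted H800 figures looks like this (the function and variable names below are ours, for the sketch only):

```python
# Roofline sketch using the H800 figures quoted above (illustrative only).
PEAK_BW_GBPS = 3000        # memory-bandwidth ceiling reported for FlashMLA (GB/s)
PEAK_COMPUTE_TFLOPS = 580  # compute ceiling reported for FlashMLA (TFLOPS)

def attainable_tflops(arithmetic_intensity: float) -> float:
    """Attainable throughput (TFLOPS) for a kernel with the given
    arithmetic intensity (FLOPs per byte moved), per the roofline model."""
    # GB/s * FLOP/byte = GFLOP/s; divide by 1000 to get TFLOPS.
    memory_bound_limit = PEAK_BW_GBPS * arithmetic_intensity / 1000
    return min(PEAK_COMPUTE_TFLOPS, memory_bound_limit)

# Ridge point: intensity above which the kernel becomes compute-bound.
ridge = PEAK_COMPUTE_TFLOPS * 1000 / PEAK_BW_GBPS  # ~193 FLOPs per byte
```

Decode-time attention sits well below that ridge point (each cached key/value byte is touched for only a few FLOPs), which is why decoding kernels like FlashMLA are judged primarily by the bandwidth number rather than the TFLOPS number.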
DeepSeek has also announced plans to open-source five new repositories in the coming week. “We’re a tiny team at DeepSeek exploring AGI (Artificial General Intelligence). Starting next week, we’ll be sharing our progress with full transparency,” the company shared on X.
Currently, DeepSeek hosts 14 open-source models and repositories on Hugging Face, including its latest DeepSeek-R1 and DeepSeek-V3 models, known for delivering state-of-the-art performance at low cost.