DeepSeek, a leading Chinese AI lab backed by the hedge fund High-Flyer, has launched its “Open Source Week” with the release of FlashMLA, an efficient MLA (Multi-head Latent Attention) decoding kernel optimized for NVIDIA Hopper GPUs. Designed to handle variable-length sequences, FlashMLA is already in production use and is available for exploration on GitHub.
The FlashMLA kernel supports BF16 precision and features a paged KV cache with a block size of 64. On the H800 GPU, it delivers up to 3,000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations. Inspired by FlashAttention 2 and 3 and NVIDIA’s CUTLASS library, FlashMLA improves computational efficiency for AI inference workloads and could impact industries such as cryptocurrency trading and high-performance computing.
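To put the two quoted ceilings in context: whether a kernel is memory-bound or compute-bound depends on its arithmetic intensity (FLOPs performed per byte moved). As a rough illustration, not taken from DeepSeek’s materials, the standard roofline model applied to the quoted H800 figures looks like this (the function and variable names below are ours, for the sketch only):

```python
# Roofline sketch using the H800 figures quoted above (illustrative only).
PEAK_BW_GBPS = 3000        # memory-bandwidth ceiling reported for FlashMLA (GB/s)
PEAK_COMPUTE_TFLOPS = 580  # compute ceiling reported for FlashMLA (TFLOPS)

def attainable_tflops(arithmetic_intensity: float) -> float:
    """Attainable throughput (TFLOPS) for a kernel with the given
    arithmetic intensity (FLOPs per byte moved), per the roofline model."""
    # GB/s * FLOP/byte = GFLOP/s; divide by 1000 to get TFLOPS.
    memory_bound_limit = PEAK_BW_GBPS * arithmetic_intensity / 1000
    return min(PEAK_COMPUTE_TFLOPS, memory_bound_limit)

# Ridge point: intensity above which the kernel becomes compute-bound.
ridge = PEAK_COMPUTE_TFLOPS * 1000 / PEAK_BW_GBPS  # ~193 FLOPs per byte
```

Decode-time attention sits well below that ridge point (each cached key/value byte is touched for only a few FLOPs), which is why decoding kernels like FlashMLA are judged primarily by the bandwidth number rather than the TFLOPS number.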
DeepSeek has also announced plans to open-source five new repositories in the coming week. “We’re a tiny team at DeepSeek exploring AGI (Artificial General Intelligence). Starting next week, we’ll be sharing our progress with full transparency,” the company shared on X.
Currently, DeepSeek hosts 14 open-source models and repositories on Hugging Face, including its latest DeepSeek-R1 and DeepSeek-V3 models, known for delivering state-of-the-art performance at low cost.