FlashMLA is DeepSeek's library of high-performance attention kernels for Multi-head Latent Attention (MLA), used to power the DeepSeek-V3 and DeepSeek-V3.2-Exp models. The repository provides specialized implementations of both sparse and dense attention, covering the prefill and decoding stages. The kernels target modern GPU architectures and are tuned for substantial speedups on compute-bound workloads.
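The difference between dense and sparse attention can be illustrated with a single decoding step. FlashMLA itself consists of optimized GPU kernels, so the pure-Python sketch below is not its implementation or API. It only contrasts a query attending to every cached position with a query attending to a selected subset. The function names and the `indices` argument are hypothetical, and the sketch does not model how DeepSeek-V3.2-Exp decides which positions to keep.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_decode_attention(q, keys, values):
    """Hypothetical dense step: one query attends to every cached key/value."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim_v = len(values[0])
    # Output is the attention-weighted sum of the value vectors.
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim_v)]


def sparse_decode_attention(q, keys, values, indices):
    """Hypothetical sparse step: same math, restricted to selected positions.

    `indices` stands in for whatever selection mechanism a real model uses.
    """
    return dense_decode_attention(
        q, [keys[i] for i in indices], [values[i] for i in indices]
    )


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(dense_decode_attention(q, keys, values))
    print(sparse_decode_attention(q, keys, values, [0, 2]))
```

When `indices` covers every position, the sparse step reduces to the dense one. Restricting it to a few positions cuts the work per query, which is the motivation for a dedicated sparse kernel.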