- [2026-06-10] The FFPA technical report is available as a PDF: "FFPA: Efficient Flash Prefill Attention for Large Head Dimensions via Split-D".
xlite-dev
Repositories
- ffpa-attn Public
🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.
- .github Public
- sglang Public Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
- deepseek-v4-for-copilot Public Forked from Vizards/deepseek-v4-for-copilot
Pick DeepSeek V4 from the Copilot Chat model picker — and keep everything else Copilot already gives you.
- cache-dit Public Forked from vipshop/cache-dit
A PyTorch-native inference engine with cache, parallelism, quantization for Diffusion Transformers.
- LeetCUDA Public
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
- diffusers Public Forked from huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
- svdquant-kernels Public Forked from ultism/svdquant-kernels
Cross-architecture CUDA kernels for SVDQuant (W4A4 with low-rank correction)
- flash-attention Public Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
- cutlass Public Forked from NVIDIA/cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
