Blog
March 10, 2025
Sorting-Free GPU Kernels for LLM Sampling
February 11, 2025
FlashInfer is accepted to MLSys 2025, congratulations!
January 25, 2025
Two papers, Palu and Fiddler, are accepted to ICLR 2025! Come find us in Singapore!
December 16, 2024
FlashInfer 0.2 - Efficient and Customizable Kernels for LLM Inference Serving
February 23, 2024
Punica and Atom are accepted to MLSys 2024, congratulations!
February 02, 2024
Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding
February 02, 2024
Accelerating Self-Attentions for LLM Serving with FlashInfer
September 11, 2023
Potentials of Multitenancy Fine-Tuned LLM Serving
May 13, 2023
Dissecting Batching Effects in GPT Inference
May 01, 2023
Bringing Hardware Accelerated Language Models to Consumer Devices