Blog
March 10, 2025
Sorting-Free GPU Kernels for LLM Sampling
February 11, 2025
FlashInfer is accepted to MLSys 2025, congratulations!
January 25, 2025
Two papers, Palu and Fiddler, are accepted to ICLR 2025! Come find us in Singapore!
December 16, 2024
FlashInfer 0.2 - Efficient and Customizable Kernels for LLM Inference Serving
February 23, 2024
Punica and Atom are accepted to MLSys 2024, congratulations!
February 02, 2024
Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding
February 02, 2024
Accelerating Self-Attentions for LLM Serving with FlashInfer
September 11, 2023
Potentials of Multitenancy Fine-Tuned LLM Serving
May 13, 2023
Dissecting Batching Effects in GPT Inference
May 01, 2023
Bringing Hardware Accelerated Language Models to Consumer Devices