Blog
February 23, 2024
Punica and Atom are accepted to MLSys 2024, congratulations!
February 02, 2024
Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding
Read more »
February 02, 2024
Accelerating Self-Attentions for LLM Serving with FlashInfer
Read more »
September 11, 2023
Potentials of Multitenancy Fine-Tuned LLM Serving
Read more »
May 13, 2023
Dissecting Batching Effects in GPT Inference
Read more »
May 01, 2023
Bringing Hardware Accelerated Language Models to Consumer Devices
Read more »