Research Projects

Fiddler

CPU-GPU Orchestration for Fast Inference of MoE Models

Punica

Serving multiple LoRA-finetuned LLMs as one

Atom

Low-bit Quantization for Efficient and Accurate LLM Serving

FlashInfer

Kernel Library for LLM Serving

SparseTIR

Compiler for Sparsity in Deep Learning

Dynamic Tensor Rematerialization

Checkpointing deep learning models as a dynamic analysis

Reticle

Low-level Intermediate Representation (IR) for Programming Modern FPGAs

Glenside

Hardware-software partition exploration with e-graphs

Parameter Box, Parameter Hub and Parameter Link

Parameter Server for Efficient Distributed Deep Neural Network Training in Clusters, Datacenters, and the Public Cloud

Relay High-Level Intermediate Representation (IR)

High-level IR for optimizing machine learning models

VTA Deep Learning Accelerator

Hardware/Software Deep Learning Acceleration Stack

Sequential Model Specialization

Fast Video Classification via Adaptive Cascading of Deep Models

TVM Stack

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

XGBoost

A Scalable Tree Boosting System