SAMPL is an interdisciplinary machine learning research group exploring problems that span multiple layers of the system stack, including deep learning frameworks, specialized hardware for training and inference, new intermediate representations, differentiable programming, and various applications. We are part of the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Our group is a collaboration between researchers from Sampa, Syslab, PLSE, EFESLab, and CMU Catalyst.
CPU-GPU Orchestration for Fast Inference of MoE Models
Serving Multiple LoRA-Finetuned LLMs as One
Low-bit Quantization for Efficient and Accurate LLM Serving
Kernel Library for LLM Serving
Compiler for Sparsity in Deep Learning
Checkpointing Deep Learning Models as a Dynamic Analysis
Low-level Intermediate Representation (IR) for Programming Modern FPGAs
Hardware-Software Partition Exploration with E-Graphs
Parameter Server for Efficient Distributed Deep Neural Network Training on Clusters, Datacenters, and the Public Cloud
High-Level IR for Optimizing Machine Learning Models
Hardware/Software Deep Learning Acceleration Stack
Fast Video Classification via Adaptive Cascading of Deep Models
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
XGBoost: A Scalable Tree Boosting System