Weihao Cui

News

Feb 24, 2026	One paper accepted to SIGMOD 2026.
Jan 31, 2026	One paper accepted to EuroSys 2026.
Jan 17, 2026	PD-Multiplexing has been accepted by ASPLOS 2026.
Dec 18, 2025	Honored to be selected for the CCF Doctoral Dissertation Incentive Program 2025.
Dec 10, 2025	Two papers accepted to NSDI 2026.
Nov 08, 2025	One paper accepted to HPCA 2026.
Oct 15, 2025	Serving as the Web Chair for ICPP 2026. Submission details are available in the Call for Papers.
Sep 28, 2025	PD-Multiplexing has been merged into SGLang.

Selected publications

arXiv

Efficient Function-as-a-Service for Large Language Models with TIDAL

Weihao Cui, Ziyi Xu, Han Zhao, Quan Chen, Zijun Li, Bingsheng He, and Minyi Guo

arXiv preprint arXiv:2503.06421, 2025

ASPLOS ’26

Towards High-Goodput LLM Serving with Prefill-decode Multiplexing

Yukang Chen*, Weihao Cui*, Han Zhao*, Ziyi Xu, Xiaoze Fan, Xusheng Chen, Yangjie Zhou, Shixuan Sun, Bingsheng He, and Quan Chen

In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

NSDI ’26

Flare: Anomaly Diagnostics for Divergent LLM Training in GPU Clusters of Thousand-Plus Scale

Weihao Cui, Ji Zhang, Han Zhao, Chao Liu, Wenhao Zhang, Jian Sha, Bingsheng He, Minyi Guo, and Quan Chen

In Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026

OSDI ’23

Optimizing Dynamic Neural Networks with Brainstorm

Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, and 4 more authors

In 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

ATC ’22

DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs

Weihao Cui, Han Zhao, Quan Chen, Hao Wei, Zirui Li, Deze Zeng, Chao Li, and Minyi Guo

In 2022 USENIX Annual Technical Conference, 2022

SC ’21

Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction

Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, Zhuo Song, Tao Ma, Yong Yang, Chao Li, and 1 more author

In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021