Weihao Cui | 崔炜皞

Xtra Computing Group, National University of Singapore

I am currently a postdoctoral research fellow working with Prof. Bingsheng He at the National University of Singapore. I also work closely with Prof. Minyi Guo, Prof. Quan Chen, and Dr. Han Zhao.

I obtained my Ph.D. from the Department of Computer Science and Engineering (CSE), Shanghai Jiao Tong University, China, where I was supervised by Prof. Quan Chen and worked on AI systems and cloud computing.

Feel free to contact me via weihao DOT tsui AT gmail DOT com.

News

Feb 24, 2026 One paper accepted to SIGMOD 2026.
Jan 31, 2026 One paper accepted to EuroSys 2026.
Jan 17, 2026 PD-Multiplexing accepted to ASPLOS 2026.
Dec 18, 2025 Honored to be selected for the CCF Doctoral Dissertation Incentive Program 2025.
Dec 10, 2025 Two papers accepted to NSDI 2026.
Nov 08, 2025 One paper accepted to HPCA 2026.
Oct 15, 2025 Serving as the Web Chair for ICPP 2026. Submission details are available in the Call for Papers.
Sep 28, 2025 PD-Multiplexing has been merged into SGLang.

Selected publications

  1. arXiv
    Efficient Function-as-a-Service for Large Language Models with TIDAL
    Weihao Cui, Ziyi Xu, Han Zhao, Quan Chen, Zijun Li, Bingsheng He, and Minyi Guo
    arXiv preprint arXiv:2503.06421, 2025
  2. ASPLOS ’26
    Towards High-Goodput LLM Serving with Prefill-decode Multiplexing
    Yukang Chen*, Weihao Cui*, Han Zhao*, Ziyi Xu, Xiaoze Fan, Xusheng Chen, Yangjie Zhou, Shixuan Sun, Bingsheng He, and Quan Chen
    In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026
  3. NSDI ’26
    Flare: Anomaly Diagnostics for Divergent LLM Training in GPU Clusters of Thousand-Plus Scale
    Weihao Cui, Ji Zhang, Han Zhao, Chao Liu, Wenhao Zhang, Jian Sha, Bingsheng He, Minyi Guo, and Quan Chen
    In Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026
  4. OSDI ’23
    Optimizing Dynamic Neural Networks with Brainstorm
    Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, and 4 more authors
    In 17th USENIX Symposium on Operating Systems Design and Implementation, 2023
  5. ATC ’22
    DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs
    Weihao Cui, Han Zhao, Quan Chen, Hao Wei, Zirui Li, Deze Zeng, Chao Li, and Minyi Guo
    In 2022 USENIX Annual Technical Conference, 2022
  6. SC ’21
    Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction
    Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, Zhuo Song, Tao Ma, Yong Yang, Chao Li, and 1 more author
    In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021