Publications

PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds …

Minghao Yan, Bo Peng, Benjamin Coleman, Ziqi Chen, Zhouhang Xie, Shuo Chen, Zhankui He, Noveen Sachdeva, Isabella Ye, Weili Wang, Chi Wang, Ed H. Chi, Fernando Pereira, Wang-Cheng Kang, Derek Zhiyuan Cheng, Beidou Wang

TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs

Speculative decoding (SD) has proven effective for accelerating LLM inference by quickly generating draft tokens and verifying them in …

Minjae Lee, Wonjun Kang, Byeongkeun Ahn, Christian Classen, Kevin Galim, Seunghyuk Oh, Minghao Yan, Hyung Il Koo, Kangwook Lee

What Limits Agentic Systems Efficiency?

Large Language Models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated strong reasoning capabilities. To further enhance …

Song Bian, Minghao Yan, Anand Jayarajan, Gennady Pekhimenko, Shivaram Venkataraman

Diamond: Harnessing GPU Resources for Scientific Deep Learning

Modern research computing cyberinfrastructure, such as ACCESS-CI and NAIRR Pilot, offers GPU resources across geographically …

Haotian Xie, Rohan Marwaha, Minu Mathew, Song Bian, Gengcong Yang, Minghao Yan, Yadu Babuji, Owen Price, Yinzhi Wang, Volodymyr Kindratenko, Shivaram Venkataraman, Kyle Chard, Ian T. Foster, Zhao Zhang

PLoRA: Efficient LoRA Hyperparameter Tuning for Large Models

Low-rank Adaptation (LoRA) has gained popularity as a fine-tuning approach for Large Language Models (LLMs) due to its low resource …

Minghao Yan, Zhuang Wang, Zhen Jia, Shivaram Venkataraman, Yida Wang

Scaling Inference-Efficient Language Models

Scaling laws are powerful tools to predict the performance of large language models. However, current scaling laws fall short of …

Song Bian, Minghao Yan, Shivaram Venkataraman

Humanity’s Last Exam

Scale AI

Decoding Speculative Decoding

Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality. …

Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman

PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices