Minghao Yan
Publications
Scaling Inference-Efficient Language Models
Scaling laws are powerful tools to predict the performance of large language models. However, current scaling laws fall short of …
Song Bian*, Minghao Yan, Shivaram Venkataraman
Humanity’s Last Exam
Scale AI
Decoding Speculative Decoding
Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality. …
Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman
PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices
As neural networks (NN) are deployed across diverse sectors, their energy demand correspondingly grows. While several prior works have …
Minghao Yan, Hongyi Wang, Shivaram Venkataraman
Distributed SLIDE: Enabling Training Large Neural Networks on Low Bandwidth and Simple CPU-Clusters via Model Parallelism and Sparsity
More than 70% of cloud computing is paid for but sits idle. A large fraction of these idle compute are cheap CPUs with few cores that …
Minghao Yan, Nicholas Meisburger, Tharun Medini, Anshumali Shrivastava
PairConnect: A Compute-Efficient MLP Alternative to Attention
Zhaozhuo Xu, Minghao Yan, Junyan Zhang, Anshumali Shrivastava
Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO)
Gaurav Gupta*, Minghao Yan, Benjamin Coleman, Bryce Kille, R. A. Leo Elworth, Tharun Medini, Todd Treangen, Anshumali Shrivastava