Minghao Yan
Publications
Decoding Speculative Decoding
Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without modifying its outcome. …
Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman
PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices
As neural networks (NN) are deployed across diverse sectors, their energy demand correspondingly grows. While several prior works have …
Minghao Yan, Hongyi Wang, Shivaram Venkataraman
Distributed SLIDE: Enabling Training Large Neural Networks on Low Bandwidth and Simple CPU-Clusters via Model Parallelism and Sparsity
More than 70% of cloud computing is paid for but sits idle. A large fraction of these idle compute are cheap CPUs with few cores that …
Minghao Yan, Nicholas Meisburger, Tharun Medini, Anshumali Shrivastava
PairConnect: A Compute-Efficient MLP Alternative to Attention
Zhaozhuo Xu, Minghao Yan, Junyan Zhang, Anshumali Shrivastava
Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO)
Gaurav Gupta*, Minghao Yan, Benjamin Coleman, Bryce Kille, R. A. Leo Elworth, Tharun Medini, Todd Treangen, Anshumali Shrivastava