User profiles for Woosuk Kwon

Woosuk Kwon

PhD student, UC Berkeley
Verified email at berkeley.edu
Cited by 1125

Efficient memory management for large language model serving with PagedAttention

W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng… - Proceedings of the 29th …, 2023 - dl.acm.org
High throughput serving of large language models (LLMs) requires batching sufficiently
many requests at a time. However, existing systems struggle because the key-value cache (KV …

Feedback stabilization of linear systems with delayed control

W Kwon, A Pearson - IEEE Transactions on Automatic Control, 1980 - ieeexplore.ieee.org
Feedback controls based on the receding horizon method have proven to be a useful and
easy tool in stabilizing linear ordinary differential systems. In this paper the receding horizon …

A fast post-training pruning framework for transformers

W Kwon, S Kim, MW Mahoney… - Advances in …, 2022 - proceedings.neurips.cc
Pruning is an effective way to reduce the huge inference cost of Transformer models. However,
prior work on pruning Transformers requires retraining the models. This can add high …

Graphene: Strong yet lightweight row hammer protection

Y Park, W Kwon, E Lee, TJ Ham… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Row Hammer is a serious security threat to modern computing systems using DRAM as main
memory. It causes charge loss in DRAM cells adjacent to a frequently activated aggressor …

Nimble: Lightweight and parallel GPU task scheduling for deep learning

W Kwon, GI Yu, E Jeong… - Advances in Neural …, 2020 - proceedings.neurips.cc
Deep learning (DL) frameworks take advantage of GPUs to improve the speed of DL inference
and training. Ideally, DL frameworks should be able to fully utilize the computation power …

SkyPilot: An intercloud broker for sky computing

…, M Luo, WL Chiang, R Bhardwaj, W Kwon… - … USENIX Symposium on …, 2023 - usenix.org
To comply with the increasing number of government regulations about data placement and
processing, and to protect themselves against major cloud outages, many users want the …

Learned token pruning for transformers

…, S Shen, D Thorsley, A Gholami, W Kwon… - Proceedings of the 28th …, 2022 - dl.acm.org
Efficient deployment of transformer models in practice is challenging due to their inference
cost including memory footprint, latency, and power consumption, which scales quadratically …

The ATSC link-layer protocol (ALP): Design and efficiency evaluation

W Kwon, J Hwang, HK Yang, S Hwang… - IEEE Transactions …, 2016 - ieeexplore.ieee.org
In this paper, a novel data link layer protocol adopted in the ATSC 3.0 terrestrial broadcast
standard is described. The data link layer is a protocol layer between the physical layer and …

Risk factors for early septic failure after two-stage exchange total knee arthroplasty for treatment of periprosthetic joint infection

…, KK Park, BW Cho, JY Park, I Kim, HM Kwon - Journal of Orthopaedics …, 2024 - Springer
Background The cause of early septic failure after two-stage exchange revision total knee
arthroplasty (TKA) for chronic periprosthetic joint infection (PJI) and the factors affecting it are …

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

X Liu, C Daniel, L Hu, W Kwon, Z Li, X Mo… - arXiv preprint arXiv …, 2024 - arxiv.org
Reducing the inference latency of large language models (LLMs) is crucial, and speculative
decoding (SD) stands out as one of the most effective techniques. Rather than letting the …