User profiles for Woosuk Kwon
Woosuk Kwon, PhD student, UC Berkeley. Verified email at berkeley.edu. Cited by 1125
Efficient memory management for large language model serving with PagedAttention
High throughput serving of large language models (LLMs) requires batching sufficiently
many requests at a time. However, existing systems struggle because the key-value cache (KV …
Feedback stabilization of linear systems with delayed control
W Kwon, A Pearson - IEEE Transactions on Automatic Control, 1980 - ieeexplore.ieee.org
Feedback controls based on the receding horizon method have proven to be a useful and
easy tool in stabilizing linear ordinary differential systems. In this paper the receding horizon …
A fast post-training pruning framework for transformers
Pruning is an effective way to reduce the huge inference cost of Transformer models. However,
prior work on pruning Transformers requires retraining the models. This can add high …
Graphene: Strong yet lightweight row hammer protection
Row Hammer is a serious security threat to modern computing systems using DRAM as main
memory. It causes charge loss in DRAM cells adjacent to a frequently activated aggressor …
Nimble: Lightweight and parallel GPU task scheduling for deep learning
Deep learning (DL) frameworks take advantage of GPUs to improve the speed of DL inference
and training. Ideally, DL frameworks should be able to fully utilize the computation power …
SkyPilot: An intercloud broker for sky computing
To comply with the increasing number of government regulations about data placement and
processing, and to protect themselves against major cloud outages, many users want the …
Learned token pruning for transformers
Efficient deployment of transformer models in practice is challenging due to their inference
cost including memory footprint, latency, and power consumption, which scales quadratically …
The ATSC link-layer protocol (ALP): Design and efficiency evaluation
W Kwon, J Hwang, HK Yang, S Hwang… - IEEE Transactions …, 2016 - ieeexplore.ieee.org
In this paper, a novel data link layer protocol adopted in the ATSC 3.0 terrestrial broadcast
standard is described. The data link layer is a protocol layer between the physical layer and …
Risk factors for early septic failure after two-stage exchange total knee arthroplasty for treatment of periprosthetic joint infection
…, KK Park, BW Cho, JY Park, I Kim, HM Kwon - Journal of Orthopaedics …, 2024 - Springer
Background The cause of early septic failure after two-stage exchange revision total knee
arthroplasty (TKA) for chronic periprosthetic joint infection (PJI) and the factors affecting it are …
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput
Reducing the inference latency of large language models (LLMs) is crucial, and speculative
decoding (SD) stands out as one of the most effective techniques. Rather than letting the …