51st ISCA 2024: Buenos Aires, Argentina
- 51st ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2024, Buenos Aires, Argentina, June 29 - July 3, 2024. IEEE 2024, ISBN 979-8-3503-2658-1
- Ishita Chaturvedi, Bhargav Reddy Godala, Yucan Wu, Ziyang Xu, Konstantinos Iliakis, Panagiotis-Eleftherios Eleftherakis, Sotirios Xydis, Dimitrios Soudris, Tyler Sorensen, Simone Campanoni, Tor M. Aamodt, David I. August: GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction. 1-16
- Yunzhe Liu, Xinyu Li, Tingting Zhang, Tianyi Liu, Qi Guo, Fuxin Zhang, Jian Wang: AVM-BTB: Adaptive and Virtualized Multi-level Branch Target Buffer. 17-31
- Anubhav Bhatla, Navneet, Biswabandan Panda: The Maya Cache: A Storage-efficient and Secure Fully-associative Last-level Cache. 32-44
- Ruibing Song, Chunshu Wu, Chuan Liu, Ang Li, Michael C. Huang, Tong Geng: DS-GL: Advancing Graph Learning via Harnessing Nature's Power within Scalable Dynamical Systems. 45-57
- Hao-Wei Chiang, Chin-Fu Nien, Hsiang-Yun Cheng, Kuei-Po Huang: ReAIM: A ReRAM-based Adaptive Ising Machine for Solving Combinatorial Optimization Problems. 58-72
- Cansu Demirkiran, Guowei Yang, Darius Bunandar, Ajay Joshi: Mirage: An RNS-Based Photonic Accelerator for DNN Training. 73-87
- Rahul Bera, Adithya Ranganathan, Joydeep Rakshit, Sujit Mahto, Anant V. Nori, Jayesh Gaur, Ataberk Olgun, Konstantinos Kanellopoulos, Mohammad Sadrosadati, Sreenivas Subramoney, Onur Mutlu: Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution. 88-102
- Peiyi Li, Ji Liu, Alvin Gonzales, Zain H. Saleem, Huiyang Zhou, Paul D. Hovland: QuTracer: Mitigating Quantum Gate and Measurement Errors by Tracing Subsets of Qubits. 103-117
- Pratyush Patel, Esha Choukse, Chaojie Zhang, Aashaka Shah, Íñigo Goiri, Saeed Maleki, Ricardo Bianchini: Splitwise: Efficient Generative LLM Inference Using Phase Splitting. 118-132
- Michele Marazzi, Tristan Sachsenweger, Flavien Solt, Peng Zeng, Kubo Takashi, Maksym Yarema, Kaveh Razavi: HiFi-DRAM: Enabling High-fidelity DRAM Research by Uncovering Sense Amplifiers with IC Imaging. 133-149
- Qijing Huang, Po-An Tsai, Joel S. Emer, Angshuman Parashar: Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms. 150-166
- Xiaofeng Hou, Tongqiao Xu, Chao Li, Cheng Xu, Jiacheng Liu, Yang Hu, Jieru Zhao, Jingwen Leng, Kwang-Ting Cheng, Minyi Guo: A Tale of Two Domains: Exploring Efficient Architecture Design for Truly Autonomous Things. 167-181
- Weihang Li, Andrés Goens, Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin: Determining the Minimum Number of Virtual Networks for Different Coherence Protocols. 182-197
- Jianming Tong, Anirudh Itagi, Prasanth Chatarasi, Tushar Krishna: FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching. 198-214
- Shuangliang Chen, Saptadeep Pal, Rakesh Kumar: Waferscale Network Switches. 215-229
- Guillem López-Paradís, Isaac M. Hair, Sid Kannan, Roman Rabbat, Parker Murray, Alex Lopes, Rory Zahedi, Winston Zuo, Jonathan Balkind: The Case For Data Centre Hyperloops. 230-244
- Si Ung Noh, Junguk Hong, Chaemin Lim, Seongyeon Park, Jeehyun Kim, Hanjun Kim, Youngsok Kim, Jinho Lee: PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices. 245-260
- Junyu Zhou, Yuhao Liu, Yunong Shi, Ali Javadi-Abhari, Gushu Li: Bosehedral: Compiler Optimization for Bosonic Quantum Computing. 261-276
- Yuwei Jin, Zirui Li, Fei Hua, Tianyi Hao, Huiyang Zhou, Yipeng Huang, Eddy Z. Zhang: Tetris: A Compilation Framework for VQA Applications in Quantum Computing. 277-292
- Hanrui Wang, Pengyu Liu, Daniel Bochen Tan, Yilian Liu, Jiaqi Gu, David Z. Pan, Jason Cong, Umut A. Acar, Song Han: Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays. 293-309
- Alireza Seif, Haoran Liao, Vinay Tripathi, Kevin Krsulich, Moein Malekakhlagh, Mirko Amico, Petar Jurcevic, Ali Javadi-Abhari: Suppressing Correlated Noise in Quantum Computers via Context-Aware Compiling. 310-324
- Daniel Bochen Tan, Murphy Yuezhen Niu, Craig Gidney: A SAT Scalpel for Lattice Surgery: Representation and Synthesis of Subroutines for Surface-Code Fault-Tolerant Quantum Computing. 325-339
- Yunjae Lee, Hyeseong Kim, Minsoo Rhu: PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models. 340-353
- Daehyeon Baek, Soojin Hwang, Jaehyuk Huh: pSyncPIM: Partially Synchronous Execution of Sparse Matrix Operations for All-Bank PIM Architectures. 354-367
- Yitu Wang, Shiyu Li, Qilin Zheng, Linghao Song, Zongwang Li, Andrew Chang, Hai Li, Yiran Chen: NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing. 368-381
- Haifeng Liu, Long Zheng, Yu Huang, Jingyi Zhou, Chaoqiang Liu, Runze Wang, Xiaofei Liao, Hai Jin, Jingling Xue: Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing. 382-395
- Zhiheng Yue, Huizheng Wang, Jiahao Fang, Jinyi Deng, Guangyang Lu, Fengbin Tu, Ruiqi Guo, Yuxuan Li, Yubin Qin, Yang Wang, Chao Li, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin: Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture. 396-409
- Yujeong Choi, Jiin Kim, Minsoo Rhu: ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models. 410-423
- Liao Chen, Shutian Luo, Chenyu Lin, Zizhao Mo, Huanle Xu, Kejiang Ye, ChengZhong Xu: Derm: SLA-aware Resource Management for Highly Dynamic Microservices. 424-436
- Jovan Stojkovic, Pulkit A. Misra, Íñigo Goiri, Sam Whitlock, Esha Choukse, Mayukh Das, Chetan Bansal, Jason Lee, Zoey Sun, Haoran Qiu, Reed Zimmermann, Savyasachi Samal, Brijesh Warrier, Ashish Raniwala, Ricardo Bianchini: SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud. 437-451
- Jaylen Wang, Daniel S. Berger, Fiodar Kazhamiaka, Celine Irvene, Chaojie Zhang, Esha Choukse, Kali Frost, Rodrigo Fonseca, Brijesh Warrier, Chetan Bansal, Jonathan Stern, Ricardo Bianchini, Akshitha Sriraman: Designing Cloud Servers for Lower Carbon. 452-470
- Jovan Stojkovic, Nikoleta Iliakopoulou, Tianyin Xu, Hubertus Franke, Josep Torrellas: EcoFaaS: Rethinking the Design of Serverless Environments for Energy Efficiency. 471-486
- Joseph Rogers, Taha Soliman, Magnus Jahre: AIO: An Abstraction for Performance Analysis Across Diverse Accelerator Architectures. 487-500
- Joonho Whangbo, Edwin Lim, Chengyi Lux Zhang, Kevin Anderson, Abraham Gonzalez, Raghav Gupta, Nivedha Krishnakumar, Sagar Karandikar, Borivoje Nikolic, Yakun Sophia Shao, Krste Asanovic: FireAxe: Partitioned FPGA-Accelerated Simulation of Large-Scale RTL Designs. 501-515
- Nikos Karystinos, Odysseas Chatzopoulos, George-Marios Fragkoulis, George Papadimitriou, Dimitris Gizopoulos, Sudhanva Gurumurthi: Harpocrates: Breaking the Silence of CPU Faults through Hardware-in-the-Loop Program Generation. 516-531
- Nathan Zhang, Rubens Lacouture, Gina Sohn, Paul Mure, Qizheng Zhang, Fredrik Kjolstad, Kunle Olukotun: The Dataflow Abstract Machine Simulator Framework. 532-547
- Mohammad Bakhshalipour, Phillip B. Gibbons: Tartan: Microarchitecting a Robotic Processor. 548-565
- Deval Shah, Tor M. Aamodt: Collision Prediction for Robotics Accelerators. 566-581
- Seunghee Han, Seungjae Moon, Teokkyu Suh, Jaehoon Heo, Joo-Young Kim: BLESS: Bandwidth and Locality Enhanced SMEM Seeding Acceleration for DNA Sequencing. 582-596
- Julian Pavon, Iván Vargas Valdivieso, Carlos Rojas, César Hernández, Mehmet Aslan, Roger Figueras, Yichao Yuan, Joël Lindegger, Mohammed Alser, Francesc Moll, Santiago Marco-Sola, Oguz Ergin, Nishil Talati, Onur Mutlu, Osman S. Unsal, Mateo Valero, Adrián Cristal: QUETZAL: Vector Acceleration Framework for Modern Genome Sequence Analysis Algorithms. 597-612
- Jinghan Huang, Jiaqi Lou, Srikar Vanavasam, Xinhao Kong, Houxiang Ji, Ipoom Jeong, Danyang Zhuo, Eun Kyung Lee, Nam Sung Kim: HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing. 613-627
- Boyu Tian, Yiwei Li, Li Jiang, Shuangyu Cai, Mingyu Gao: NDPBridge: Enabling Cross-Bank Coordination in Near-DRAM-Bank Processing Architectures. 628-643
- Yilong Zhao, Mingyu Gao, Fangxin Liu, Yiwei Hu, Zongwu Wang, Han Lin, Jin Li, He Xian, Hanlin Dong, Tao Yang, Naifeng Jing, Xiaoyao Liang, Li Jiang: UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space. 644-659
- Nika Mansouri-Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu: MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing. 660-677
- Hüsrev Cilasun, Salonik Resch, Zamshed I. Chowdhury, Masoud Zabihi, Yang Lv, Brandon Zink, Jianping Wang, Sachin S. Sapatnekar, Ulya R. Karpuzcu: On Error Correction for Nonvolatile Processing-In-Memory. 678-692
- Md Hafizul Islam Chowdhuryy, Hao Zheng, Fan Yao: MetaLeak: Uncovering Side Channels in Secure Processor Architectures Exploiting Metadata. 693-707
- Erhu Feng, Dahu Feng, Dong Du, Yubin Xia, Haibo Chen: sNPU: Trusted Execution Environments on Integrated NPUs. 708-723
- Xin Wang, Jagadish Kotra, Alex Jones, Wenjie Xiong, Xun Jian: Counter-light Memory Encryption. 724-738
- Tae Hoon Kim, David Rudo, Kaiyang Zhao, Zirui Neil Zhao, Dimitrios Skarlatos: Perspective: A Principled Framework for Pliable and Secure Speculation in Operating Systems. 739-755
- Rashmi S. Agrawal, Anantha P. Chandrakasan, Ajay Joshi: HEAP: A Fully Homomorphic Encryption Accelerator with Parallelized Bootstrapping. 756-769
- Dai Cheol Jung, Max Ruttenberg, Paul Gao, Scott Davidson, Daniel Petrisko, Kangli Li, Aditya K. Kamath, Lin Cheng, Shaolin Xie, Peitian Pan, Zhongyuan Zhao, Zichao Yue, Bandhav Veluri, Sripathi Muralitharan, Adrian Sampson, Andrew Lumsdaine, Zhiru Zhang, Christopher Batten, Mark Oskin, Dustin Richmond, Michael Bedford Taylor: Scalable, Programmable and Dense: The HammerBlade Open-Source RISC-V Manycore. 770-784
- Apostolos Kokolis, Antonis Psistakis, Benjamin Reidys, Jian Huang, Josep Torrellas: HADES: Hardware-Assisted Distributed Transactions in the Age of Fast Networks and SmartNICs. 785-800
- Martin Cochet, Karthik Swaminathan, Erik Jens Loscalzo, Joseph Zuckerman, Maico Cassel dos Santos, Davide Giri, Alper Buyuktosunoglu, Tianyu Jia, David Brooks, Gu-Yeon Wei, Kenneth L. Shepard, Luca P. Carloni, Pradip Bose: BlitzCoin: Fully Decentralized Hardware Power Management for Accelerator-Rich SoCs. 801-817
- Samuel Hsia, Alicia Golden, Bilge Acun, Newsha Ardalani, Zachary DeVito, Gu-Yeon Wei, David Brooks, Carole-Jean Wu: MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems. 818-833
- Yuan Feng, Seonjin Na, Hyesoon Kim, Hyeran Jeon: Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs. 834-847
- Yifan Yuan, Ren Wang, Narayan Ranganathan, Nikhil Rao, Sanjay Kumar, Philip Lantz, Vivekananthan Sanjeepan, Jorge Cabrera, Atul Kwatra, Rajesh Sankaran, Ipoom Jeong, Nam Sung Kim: Intel Accelerators Ecosystem: An SoC-Oriented Perspective : Industry Product. 848-862
- Yuan Li, Jianbin Zhu, Yao Fu, Yu Lei, Toshio Nagata, Ryan Braidwood, Haohuan Fu, Juepeng Zheng, Wayne Luk, Hongxiang Fan: Circular Reconfigurable Parallel Processor for Edge Computing : Industrial Product ✶. 863-875
- Alan Smith, Gabriel H. Loh, Michael J. Schulte, Mike Ignatowski, Samuel Naffziger, Mike Mantor, Nathan Kalyanasundharam, Vamsi Alla, Nicholas Malaya, Joseph L. Greathouse, Eric Chapman, Raja Swaminathan: Realizing the AMD Exascale Heterogeneous Processor Vision : Industry Product. 876-889
- Hanjoon Kim, Younggeun Choi, Junyoung Park, Byeongwook Bae, Hyunmin Jeong, Sang Min Lee, Jeseung Yeon, Minho Kim, Changjae Park, Boncheol Gu, Changman Lee, Jaeick Bae, SungGyeong Bae, Yojung Cha, Wooyoung Choe, Jonguk Choi, Juho Ha, Hyuck Han, Namoh Hwang, Seokha Hwang, Kiseok Jang, Haechan Je, Hojin Jeon, Jaewoo Jeon, Hyunjun Jeong, Yeonsu Jung, Dongok Kang, Hyewon Kim, Minjae Kim, Muhwan Kim, Sewon Kim, Suhyung Kim, Won Kim, Yong Kim, Youngsik Kim, Younki Ku, Jeong Ki Lee, Juyun Lee, Kyungjae Lee, Seokho Lee, Minwoo Noh, Hyuntaek Oh, Gyunghee Park, Sanguk Park, Jimin Seo, Jungyoung Seong, June Paik, Nuno P. Lopes, Sungjoo Yoo: TCP: A Tensor Contraction Processor for AI Workloads Industrial Product. 890-902
- Weihao Kong, Yifan Hao, Qi Guo, Yongwei Zhao, Xinkai Song, Xiaqing Li, Mo Zou, Zidong Du, Rui Zhang, Chang Liu, Yuanbo Wen, Pengwei Jin, Xing Hu, Wei Li, Zhiwei Xu, Tianshi Chen: Cambricon-D: Full-Network Differential Acceleration for Diffusion Models. 903-914
- Xiurui Pan, Yuda An, Shengwen Liang, Bo Mao, Mingzhe Zhang, Qiao Li, Myoungsoo Jung, Jie Zhang: Flagger: Cooperative Acceleration for Large-Scale Cross-Silo Federated Learning Aggregation. 915-930
- Yifan Yang, Joel S. Emer, Daniel Sánchez: Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications. 931-945
- Kaustubh Shivdikar, Nicolas Bohm Agostini, Malith Jayaweera, Gilbert Jonatan, José L. Abellán, Ajay Joshi, John Kim, David R. Kaeli: NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator. 946-960
- Jianping Zeng, Tong Zhang, Changhee Jung: Compiler-Directed Whole-System Persistence. 961-977
- Mojtaba Abaie Shoushtary, José-María Arnau, Jordi Tubella Murgadas, Antonio González: Memento: An Adaptive, Compiler-Assisted Register File Cache for GPUs. 978-990
- Fuyu Wang, Minghua Shen, Yufei Ding, Nong Xiao: Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators. 991-1004
- Youpeng Zhao, Di Wu, Jun Wang: ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching. 1005-1017
- Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang: Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference. 1018-1031
- Yubin Qin, Yang Wang, Zhiren Zhao, Xiaolong Yang, Yang Zhou, Shaojun Wei, Yang Hu, Shouyi Yin: MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition. 1032-1047
- Jungi Lee, Wonbeom Lee, Jaewoong Sim: Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization. 1048-1062
- Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair: Heterogeneous Acceleration Pipeline for Recommendation System Training. 1063-1079
- Hengrui Zhang, August Ning, Rohan Baskar Prabhakar, David Wentzlaff: LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference. 1080-1096
- Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn: DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands. 1097-1111
- Aditya K. Kamath, Simon Peter: (MC)2: Lazy MemCopy at the Memory Controller. 1112-1128
- Gagandeep Panwar, Muhammad Laghari, Esha Choukse, Xun Jian: DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory. 1129-1143
- Yesin Ryu, Yoojin Kim, Giyong Jung, Jung Ho Ahn, Jungrae Kim: Native DRAM Cache: Re-architecting DRAM as a Large-Scale Cache for Data Centers. 1144-1156
- Aamer Jaleel, Gururaj Saileshwar, Stephen W. Keckler, Moinuddin K. Qureshi: PrIDE: Achieving Secure Rowhammer Mitigation with Low-Cost In-DRAM Trackers. 1157-1172
- Quang Duong, Akanksha Jain, Calvin Lin: A New Formulation of Neural Data Prefetching. 1173-1187
- Surim Oh, Mingsheng Xu, Tanvir Ahmed Khan, Baris Kasikci, Heiner Litz: UDP: Utility-Driven Fetch Directed Instruction Prefetching. 1188-1201
- Sam Ainsworth, Lev Mukhanov: Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher. 1202-1216
- Aniket Anand Deshmukh, Lingzhe Chester Cai, Yale N. Patt: Alternate Path Fetch. 1217-1229
- Sawan Singh, Arthur Perais, Alexandra Jimborean, Alberto Ros: Alternate Path μ-op Cache Prefetching. 1230-1245
- Yoonsung Kim, Changhun Oh, Jinwoo Hwang, Wonung Kim, Seongryong Oh, Yubin Lee, Hardik Sharma, Amir Yazdanbakhsh, Jongse Park: DACAPO: Accelerating Continuous Learning in Autonomous Systems for Video Analytics. 1246-1261
- Yu Feng, Tianrui Ma, Yuhao Zhu, Xuan Zhang: BlissCam: Boosting Eye Tracking Efficiency with Learned In-Sensor Sparse Sampling. 1262-1277
- Meng Han, Liang Wang, Limin Xiao, Hao Zhang, Tianhao Cai, Jiale Xu, Yibo Wu, Chenhao Zhang, Xiangrong Xu: BitNN: A Bit-Serial Accelerator for K-Nearest Neighbor Search in Point Clouds. 1278-1292
- Yu Feng, Zihan Liu, Jingwen Leng, Minyi Guo, Yuhao Zhu: Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations. 1293-1308
- Sandeepa Bhuyan, Ziyu Ying, Mahmut T. Kandemir, Mahanth Gowda, Chita R. Das: GameStreamSR: Enabling Neural-Augmented Game Streaming on Commodity Mobile Platforms. 1309-1322