William J. Dally
Person information
- affiliation: Stanford University, USA
- affiliation: NVIDIA
- award (2010): Eckert-Mauchly Award
2020 – today
- 2024
- [j78]Yoshinori Nishi, John W. Poulton, Walker J. Turner, Xi Chen, Sanquan Song, Brian Zimmer, Stephen G. Tell, Nikola Nedovic, John M. Wilson, William J. Dally, C. Thomas Gray:
A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS. IEEE J. Solid State Circuits 59(4): 1146-1157 (2024) - [c177]Walker J. Turner, John W. Poulton, Yoshinori Nishi, Xi Chen, Brian Zimmer, Sanquan Song, John M. Wilson, William J. Dally, C. Thomas Gray:
Leveraging Micro-Bump Pitch Scaling to Accelerate Interposer Link Bandwidths for Future High-Performance Compute Applications. CICC 2024: 1-7 - 2023
- [j77]Yoshinori Nishi, John W. Poulton, Walker J. Turner, Xi Chen, Sanquan Song, Brian Zimmer, Stephen G. Tell, Nikola Nedovic, John M. Wilson, William J. Dally, C. Thomas Gray:
A 0.297-pJ/Bit 50.4-Gb/s/Wire Inverter-Based Short-Reach Simultaneous Bi-Directional Transceiver for Die-to-Die Interface in 5-nm CMOS. IEEE J. Solid State Circuits 58(4): 1062-1073 (2023) - [j76]Ben Keller, Rangharajan Venkatesan, Steve Dai, Stephen G. Tell, Brian Zimmer, Charbel Sakr, William J. Dally, C. Thomas Gray, Brucek Khailany:
A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm. IEEE J. Solid State Circuits 58(4): 1129-1141 (2023) - [j75]Tuofei Chen, Lei Gu, William J. Dally, Juan Rivas-Davila, John D. Fox:
A Novel High-Efficiency Three-Phase Multilevel PV Inverter With Reduced DC-Link Capacitance. IEEE Trans. Ind. Electron. 70(5): 4751-4761 (2023) - [c176]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. ASPLOS (2) 2023: 325-339 - [c175]Bill Dally:
Hardware for Deep Learning. HCS 2023: 1-58 - [c174]Yoshinori Nishi, John W. Poulton, Xi Chen, Sanquan Song, Brian Zimmer, Walker J. Turner, Stephen G. Tell, Nikola Nedovic, John M. Wilson, William J. Dally, C. Thomas Gray:
A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS. VLSI Technology and Circuits 2023: 1-2 - [i25]Chenzhuo Zhu, Alexander C. Rucker, Yawen Wang, William J. Dally:
SatIn: Hardware for Boolean Satisfiability Inference. CoRR abs/2303.02588 (2023) - [i24]Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally:
Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network. CoRR abs/2306.09552 (2023) - [i23]Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Ross Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Brucek Khailany, Kishor Kunal, Xiaowei Li, Hao Liu, Stuart F. Oberman, Sujeet Omar, Sreedhar Pratty, Jonathan Raiman, Ambar Sarkar, Zhengjiang Shao, Hanfei Sun, Pratik P. Suthar, Varun Tej, Kaizhe Xu, Haoxing Ren:
ChipNeMo: Domain-Adapted LLMs for Chip Design. CoRR abs/2311.00176 (2023) - 2022
- [j74]William J. Dally:
On the model of computation: point. Commun. ACM 65(9): 30-32 (2022) - [j73]Jiawei Zhao, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, Mustafa Fayez Ali, Ming-Yu Liu, Brucek Khailany, William J. Dally, Anima Anandkumar:
LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update. IEEE Trans. Computers 71(12): 3179-3190 (2022) - [c173]Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany:
Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training. ICML 2022: 19123-19138 - [c172]Peter M. Kogge, William J. Dally:
Frontier vs the Exascale Report: Why so long? and Are We Really There Yet? PMBS@SC 2022: 26-35 - [c171]Ben Keller, Rangharajan Venkatesan, Steve Dai, Stephen G. Tell, Brian Zimmer, William J. Dally, C. Thomas Gray, Brucek Khailany:
A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm. VLSI Technology and Circuits 2022: 16-17 - [c170]Yoshinori Nishi, John W. Poulton, Xi Chen, Sanquan Song, Brian Zimmer, Walker J. Turner, Stephen G. Tell, Nikola Nedovic, John M. Wilson, William J. Dally, C. Thomas Gray:
A 0.297-pJ/bit 50.4-Gb/s/wire Inverter-Based Short-Reach Simultaneous Bidirectional Transceiver for Die-to-Die Interface in 5nm CMOS. VLSI Technology and Circuits 2022: 154-155 - [d1]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. Zenodo, 2022 - [i22]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
BaM: A Case for Enabling Fine-grain High Throughput GPU-Orchestrated Access to Storage. CoRR abs/2203.04910 (2022) - [i21]Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany:
Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training. CoRR abs/2206.06501 (2022) - 2021
- [j72]Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Brucek Khailany, Stephen W. Keckler:
Simba: scaling deep-learning inference with chiplet-based architecture. Commun. ACM 64(6): 107-116 (2021) - [j71]William J. Dally, Stephen W. Keckler, David Blair Kirk:
Evolution of the Graphics Processing Unit (GPU). IEEE Micro 41(6): 42-51 (2021) - [j70]William J. Dally:
OP-VENT: A Low-Cost, Easily Assembled, Open-Source Medical Ventilator. GetMobile Mob. Comput. Commun. 25(4): 12-18 (2021) - [c169]Steve Dai, Rangharajan Venkatesan, Mark Ren, Brian Zimmer, William J. Dally, Brucek Khailany:
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference. MLSys 2021 - [c168]Guy E. Blelloch, William J. Dally, Margaret Martonosi, Uzi Vishkin, Katherine A. Yelick:
SPAA'21 Panel Paper: Architecture-Friendly Algorithms versus Algorithm-Friendly Architectures. SPAA 2021: 1-7 - [i20]Steve Dai, Rangharajan Venkatesan, Haoxing Ren, Brian Zimmer, William J. Dally, Brucek Khailany:
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference. CoRR abs/2102.04503 (2021) - [i19]Huizi Mao, Sibo Zhu, Song Han, William J. Dally:
PatchNet - Short-range Template Matching for Efficient Video Processing. CoRR abs/2103.07371 (2021) - [i18]Jiawei Zhao, Steve Dai, Rangharajan Venkatesan, Ming-Yu Liu, Brucek Khailany, Bill Dally, Anima Anandkumar:
Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update. CoRR abs/2106.13914 (2021) - 2020
- [j69]William J. Dally, Yatish Turakhia, Song Han:
Domain-specific hardware accelerators. Commun. ACM 63(7): 48-57 (2020) - [j68]Brian Zimmer, Rangharajan Venkatesan, Yakun Sophia Shao, Jason Clemons, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Stephen W. Keckler, Brucek Khailany:
A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm. IEEE J. Solid State Circuits 55(4): 920-932 (2020) - [j67]Brucek Khailany, Haoxing Ren, Steve Dai, Saad Godil, Ben Keller, Robert Kirby, Alicia Klinefelter, Rangharajan Venkatesan, Yanqing Zhang, Bryan Catanzaro, William J. Dally:
Accelerating Chip Design With Machine Learning. IEEE Micro 40(6): 23-32 (2020) - [j66]Milad Mohammadi, Song Han, Ehsan Atoofian, Amirali Baniasadi, Tor M. Aamodt, William J. Dally:
Energy Efficient On-Demand Dynamic Branch Prediction Models. IEEE Trans. Computers 69(3): 453-465 (2020) - [c167]Jongho Kim, Youngsuk Park, John D. Fox, Stephen P. Boyd, William J. Dally:
Optimal Operation of a Plug-in Hybrid Vehicle with Battery Thermal and Degradation Model. ACC 2020: 3083-3090 - [c166]Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally:
SpArch: Efficient Architecture for Sparse Matrix Multiplication. HPCA 2020: 261-274 - [i17]Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally:
SpArch: Efficient Architecture for Sparse Matrix Multiplication. CoRR abs/2002.08947 (2020)
2010 – 2019
- 2019
- [j65]John W. Poulton, John M. Wilson, Walker J. Turner, Brian Zimmer, Xi Chen, Sudhir S. Kudva, Sanquan Song, Stephen G. Tell, Nikola Nedovic, Wenxu Zhao, Sunil R. Sudhakaran, C. Thomas Gray, William J. Dally:
A 1.17-pJ/b, 25-Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication Using a Process- and Temperature-Adaptive Voltage Regulator. IEEE J. Solid State Circuits 54(1): 43-54 (2019) - [j64]Yatish Turakhia, Gill Bejerano, William J. Dally:
Darwin: A Genomics Coprocessor. IEEE Micro 39(3): 29-37 (2019) - [c165]Matthew Fojtik, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Stephen G. Tell, Brian Zimmer, Tezaswi Raja, Kevin Zhou, William J. Dally, Brucek Khailany:
A Fine-Grained GALS SoC with Pausible Adaptive Clocking in 16 nm FinFET. ASYNC 2019: 27-35 - [c164]Sanquan Song, John Poulton, Xi Chen, Brian Zimmer, Stephen G. Tell, Walker J. Turner, Sudhir S. Kudva, Nikola Nedovic, John M. Wilson, C. Thomas Gray, William J. Dally:
A 2-to-20 GHz Multi-Phase Clock Generator with Phase Interpolators Using Injection-Locked Oscillation Buffers for High-Speed IOs in 16nm FinFET. CICC 2019: 1-4 - [c163]Angad S. Rekhi, Brian Zimmer, Nikola Nedovic, Ningxi Liu, Rangharajan Venkatesan, Miaorong Wang, Brucek Khailany, William J. Dally, C. Thomas Gray:
Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference. DAC 2019: 81 - [c162]Rangharajan Venkatesan, Yakun Sophia Shao, Brian Zimmer, Jason Clemons, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Stephen W. Keckler, Brucek Khailany:
A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology. Hot Chips Symposium 2019: 1-24 - [c161]Yatish Turakhia, Sneha D. Goenka, Gill Bejerano, William J. Dally:
Darwin-WGA: A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup. HPCA 2019: 359-372 - [c160]Rangharajan Venkatesan, Yakun Sophia Shao, Miaorong Wang, Jason Clemons, Steve Dai, Matthew Fojtik, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Yanqing Zhang, Brian Zimmer, William J. Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany:
MAGNet: A Modular Accelerator Generator for Neural Networks. ICCAD 2019: 1-8 - [c159]Huizi Mao, Xiaodong Yang, Bill Dally:
A Delay Metric for Video Object Detection: What Average Precision Fails to Tell. ICCV 2019: 573-582 - [c158]Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Brucek Khailany, Stephen W. Keckler:
Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture. MICRO 2019: 14-27 - [c157]Huizi Mao, Taeyoung Kong, Bill Dally:
CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video. SysML 2019 - [c156]Yatish Turakhia, Gill Bejerano, William J. Dally:
Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly. USENIX ATC 2019 - [c155]Brian Zimmer, Rangharajan Venkatesan, Yakun Sophia Shao, Jason Clemons, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Stephen W. Keckler, Brucek Khailany:
A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm. VLSI Circuits 2019: 300- - [i16]Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Eric S. Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros G. Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim M. Hazelwood, Furong Huang, Martin Jaggi, Kevin G. Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konecný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Gordon Murray, Dimitris S. Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Randall Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric P. Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar:
SysML: The New Frontier of Machine Learning Systems. CoRR abs/1904.03257 (2019) - [i15]Huizi Mao, Xiaodong Yang, William J. Dally:
A Delay Metric for Video Object Detection: What Average Precision Fails to Tell. CoRR abs/1908.06368 (2019) - 2018
- [j63]Jason A. Platt, Nicholas Moehle, John D. Fox, William J. Dally:
Optimal Operation of a Plug-In Hybrid Vehicle. IEEE Trans. Veh. Technol. 67(11): 10366-10377 (2018) - [c154]Yatish Turakhia, Gill Bejerano, William J. Dally:
Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly. ASPLOS 2018: 199-213 - [c153]Walker J. Turner, John W. Poulton, John M. Wilson, Xi Chen, Stephen G. Tell, Matthew Fojtik, Thomas H. Greer, Brian Zimmer, Sanquan Song, Nikola Nedovic, Sudhir S. Kudva, Sunil R. Sudhakaran, Rizwan Bashirullah, Wenxu Zhao, William J. Dally, C. Thomas Gray:
Ground-referenced signaling for intra-chip and short-reach chip-to-chip interconnects. CICC 2018: 1-8 - [c152]Song Han, William J. Dally:
Bandwidth-efficient deep learning. DAC 2018: 147:1-147:6 - [c151]Yujun Lin, Song Han, Huizi Mao, Yu Wang, Bill Dally:
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. ICLR (Poster) 2018 - [c150]Xingyu Liu, Jeff Pool, Song Han, William J. Dally:
Efficient Sparse-Winograd Convolutional Neural Networks. ICLR (Poster) 2018 - [c149]John M. Wilson, Walker J. Turner, John W. Poulton, Brian Zimmer, Xi Chen, Sudhir S. Kudva, Sanquan Song, Stephen G. Tell, Nikola Nedovic, Wenxu Zhao, Sunil R. Sudhakaran, C. Thomas Gray, William J. Dally:
A 1.17pJ/b 25Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication in 16nm CMOS using a process- and temperature-adaptive voltage regulator. ISSCC 2018: 276-278 - [c148]William J. Dally, C. Thomas Gray, John Poulton, Brucek Khailany, John M. Wilson, Larry R. Dennison:
Hardware-Enabled Artificial Intelligence. VLSI Circuits 2018: 3-6 - [i14]Xingyu Liu, Jeff Pool, Song Han, William J. Dally:
Efficient Sparse-Winograd Convolutional Neural Networks. CoRR abs/1802.06367 (2018) - [i13]Huizi Mao, Taeyoung Kong, William J. Dally:
CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video. CoRR abs/1810.00434 (2018) - 2017
- [j62]Babak Falsafi, Bill Dally, Desh Singh, Derek Chiou, Joshua J. Yi, Resit Sendag:
FPGAs versus GPUs in Data centers. IEEE Micro 37(1): 60-72 (2017) - [j61]Milad Mohammadi, Tor M. Aamodt, William J. Dally:
CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance. ACM Trans. Archit. Code Optim. 14(4): 39:1-39:26 (2017) - [c147]Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally:
Exploring the Granularity of Sparsity in Convolutional Neural Networks. CVPR Workshops 2017: 1927-1934 - [c146]Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William (Bill) J. Dally:
ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA. FPGA 2017: 75-84 - [c145]Niladrish Chatterjee, Mike O'Connor, Donghyuk Lee, Daniel R. Johnson, Stephen W. Keckler, Minsoo Rhu, William J. Dally:
Architecting an Energy-Efficient DRAM System for GPUs. HPCA 2017: 73-84 - [c144]Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally:
DSD: Dense-Sparse-Dense Training for Deep Neural Networks. ICLR (Poster) 2017 - [c143]Xingyu Liu, Song Han, Huizi Mao, William J. Dally:
Efficient Sparse-Winograd Convolutional Neural Networks. ICLR (Workshop) 2017 - [c142]Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally:
Trained Ternary Quantization. ICLR (Poster) 2017 - [c141]Bill Dally:
Efficient methods and hardware for deep learning. TIML@ISCA 2017: 2 - [c140]Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel S. Emer, Stephen W. Keckler, William J. Dally:
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. ISCA 2017: 27-40 - [c139]Mike O'Connor, Niladrish Chatterjee, Donghyuk Lee, John M. Wilson, Aditya Agrawal, Stephen W. Keckler, William J. Dally:
Fine-grained DRAM: energy-efficient DRAM for extreme bandwidth systems. MICRO 2017: 41-54 - [i12]Yatish Turakhia, Subhasis Das, Tor M. Aamodt, William J. Dally:
HoLiSwap: Reducing Wire Energy in L1 Caches. CoRR abs/1701.03878 (2017) - [i11]Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally:
Exploring the Regularity of Sparse Structure in Convolutional Neural Networks. CoRR abs/1705.08922 (2017) - [i10]Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus T. Alley, Neil Thakur, Song Han, William J. Dally, John M. Pauly, Lei Xing:
Deep Generative Adversarial Networks for Compressed Sensing Automates MRI. CoRR abs/1706.00051 (2017) - [i9]Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel S. Emer, Stephen W. Keckler, William J. Dally:
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. CoRR abs/1708.04485 (2017) - [i8]Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally:
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. CoRR abs/1712.01887 (2017) - 2016
- [j60]Mahmut E. Sinangil, John W. Poulton, Matthew R. Fojtik, Thomas H. Greer, Stephen G. Tell, Andreas J. Gotterba, Jesse Wang, Jason Golbus, Brian Zimmer, William J. Dally, C. Thomas Gray:
A 28 nm 2 Mbit 6 T SRAM With Highly Configurable Low-Voltage Write-Ability Assist Implementation and Capacitor-Based Sense-Amplifier Input Offset Compensation. IEEE J. Solid State Circuits 51(2): 557-567 (2016) - [j59]Subhasis Das, Tor M. Aamodt, William J. Dally:
Reuse Distance-Based Probabilistic Cache Replacement. ACM Trans. Archit. Code Optim. 12(4): 33:1-33:22 (2016) - [c138]Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, Bill Dally:
Deep compression and EIE: Efficient inference engine on compressed deep neural network. Hot Chips Symposium 2016: 1-6 - [c137]Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally:
EIE: Efficient Inference Engine on Compressed Deep Neural Network. ISCA 2016: 243-254 - [c136]John M. Wilson, Matthew R. Fojtik, John W. Poulton, Xi Chen, Stephen G. Tell, Thomas H. Greer, C. Thomas Gray, William J. Dally:
8.6 A 6.5-to-23.3fJ/b/mm balanced charge-recycling bus in 16nm FinFET CMOS at 1.7-to-2.6Gb/s/wire with clock forwarding and low-crosstalk contraflow wiring. ISSCC 2016: 156-157 - [c135]Song Han, Huizi Mao, William J. Dally:
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. ICLR 2016 - [i7]Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally:
EIE: Efficient Inference Engine on Compressed Deep Neural Network. CoRR abs/1602.01528 (2016) - [i6]Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, Kurt Keutzer:
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. CoRR abs/1602.07360 (2016) - [i5]Milad Mohammadi, Tor M. Aamodt, William J. Dally:
CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution. CoRR abs/1606.01607 (2016) - [i4]Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Shijian Tang, Erich Elsen, Bryan Catanzaro, John Tran, William J. Dally:
DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow. CoRR abs/1607.04381 (2016) - [i3]Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally:
ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA. CoRR abs/1612.00694 (2016) - [i2]Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally:
Trained Ternary Quantization. CoRR abs/1612.01064 (2016) - 2015
- [j58]Milad Mohammadi, Song Han, Tor M. Aamodt, William J. Dally:
On-Demand Dynamic Branch Prediction. IEEE Comput. Archit. Lett. 14(1): 50-53 (2015) - [j57]R. Curtis Harting, William J. Dally:
On-Chip Active Messages for Speed, Scalability, and Efficiency. IEEE Trans. Parallel Distributed Syst. 26(2): 507-515 (2015) - [c134]Subhasis Das, Tor M. Aamodt, William J. Dally:
SLIP: reducing wire energy in the memory hierarchy. ISCA 2015: 349-361 - [c133]Song Han, Jeff Pool, John Tran, William J. Dally:
Learning both Weights and Connections for Efficient Neural Network. NIPS 2015: 1135-1143 - [c132]Nan Jiang, Larry R. Dennison, William J. Dally:
Network endpoint congestion control for fine-grained communication. SC 2015: 35:1-35:12 - [i1]Song Han, Jeff Pool, John Tran, William J. Dally:
Learning both Weights and Connections for Efficient Neural Networks. CoRR abs/1506.02626 (2015) - 2014
- [c131]William J. Dally, James D. Balfour:
Author retrospective for design tradeoffs for tiled CMP on-chip networks. ICS 25th Anniversary 2014: 77-79 - [c130]Oreste Villa, Daniel R. Johnson, Mike O'Connor, Evgeny Bolotin, David W. Nellans, Justin Luitjens, Nikolai Sakharnykh, Peng Wang, Paulius Micikevicius, Anthony Scudiero, Stephen W. Keckler, William J. Dally:
Scaling the Power Wall: A Path to Exascale. SC 2014: 830-841 - 2013
- [j56]John W. Poulton, William J. Dally, Xi Chen, John G. Eyles, Thomas H. Greer, Stephen G. Tell, John M. Wilson, C. Thomas Gray:
A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications. IEEE J. Solid State Circuits 48(12): 3206-3218 (2013) - [j55]George Michelogiannakis, William J. Dally:
Elastic Buffer Flow Control for On-Chip Networks. IEEE Trans. Computers 62(2): 295-309 (2013) - [c129]William J. Dally, Chris Malachowsky, Stephen W. Keckler:
21st century digital design tools. DAC 2013: 94:1-94:6 - [c128]Nan Jiang, Daniel U. Becker, George Michelogiannakis, James D. Balfour, Brian Towles, David E. Shaw, John Kim, William J. Dally:
A detailed and flexible cycle-accurate Network-on-Chip simulator. ISPASS 2013: 86-96 - [c127]John W. Poulton, William J. Dally, Xi Chen, John G. Eyles, Thomas H. Greer, Stephen G. Tell, C. Thomas Gray:
A 0.54pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications. ISSCC 2013: 404-405 - [c126]George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally:
Channel reservation protocol for over-subscribed channels and destinations. SC 2013: 52:1-52:12 - 2012
- [j54]Mark Gebhart, Daniel R. Johnson, David Tarjan, Stephen W. Keckler, William J. Dally, Erik Lindholm, Kevin Skadron:
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors. ACM Trans. Comput. Syst. 30(2): 8:1-8:38 (2012) - [c125]Nan Jiang, Daniel U. Becker, George Michelogiannakis, William J. Dally:
Network congestion avoidance through Speculative Reservation. HPCA 2012: 443-454 - [c124]Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally:
Adaptive Backpressure: Efficient buffer management for on-chip networks. ICCD 2012: 419-426 - [c123]Mark Gebhart, Stephen W. Keckler, Brucek Khailany, Ronny Krashinsky, William J. Dally:
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor. MICRO 2012: 96-106 - 2011
- [j53]George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally:
Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks. IEEE Comput. Archit. Lett. 10(2): 33-36 (2011) - [j52]Stephen W. Keckler, William J. Dally, Brucek Khailany, Michael Garland, David Glasco:
GPUs and the Future of Parallel Computing. IEEE Micro 31(5): 7-17 (2011) - [j51]George Michelogiannakis, Daniel Becker, William J. Dally:
Evaluating Elastic Buffer and Wormhole Flow Control. IEEE Trans. Computers 60(6): 896-903 (2011) - [c122]R. Curtis Harting, Vishal Parikh, William J. Dally:
The utility of fast active messages on many-core chips: Efficient supercomputing project. Hot Chips Symposium 2011: 1 - [c121]Yves Robert, William J. Dally, Jack J. Dongarra, Satoshi Matsuoka, Robert Schreiber, Horst D. Simon, Uzi Vishkin:
Panel Statement. IPDPS 2011: 505 - [c120]Bill Dally:
Power, Programmability, and Granularity: The Challenges of ExaScale Computing. IPDPS 2011: 878 - [c119]Mark Gebhart, Daniel R. Johnson, David Tarjan, Stephen W. Keckler, William J. Dally, Erik Lindholm, Kevin Skadron:
Energy-efficient mechanisms for managing thread context in throughput processors. ISCA 2011: 235-246 - [c118]Bill Dally:
Power, programmability, and granularity: The challenges of ExaScale computing. ITC 2011: 12 - [c117]George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally:
Packet chaining: efficient single-cycle allocation for on-chip networks. MICRO 2011: 83-94 - [c116]Mark Gebhart, Stephen W. Keckler, William J. Dally:
A compile-time managed multi-level register file hierarchy. MICRO 2011: 465-476 - 2010
- [j50]John Nickolls, William J. Dally:
The GPU Computing Era. IEEE Micro 30(2): 56-69 (2010) - [c115]William J. Dally, Stephen G. Tell:
The Even/Odd Synchronizer: A Fast, All-Digital, Periodic Synchronizer. ASYNC 2010: 75-84 - [c114]JongSoo Park, James D. Balfour, William J. Dally:
Fine-grain dynamic instruction placement for L0 scratch-pad memory. CASES 2010: 137-146 - [c113]David Black-Schaffer, William J. Dally:
Block-Parallel Programming for Real-Time Embedded Applications. ICPP 2010: 297-306 - [c112]William J. Dally:
Throughput computing. ICS 2010: 2 - [c111]William J. Dally:
Moving the needle, computer architecture research in academe and industry. ISCA 2010: 1 - [c110]George Michelogiannakis, Daniel Sánchez, William J. Dally, Christos Kozyrakis:
Evaluating Bufferless Flow Control for On-chip Networks. NOCS 2010: 9-16 - [c109]JongSoo Park, William J. Dally:
Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures. SPAA 2010: 1-10
2000 – 2009
- 2009
- [j49]James D. Balfour, R. Curtis Harting, William J. Dally:
Operand Registers and Explicit Operand Forwarding. IEEE Comput. Archit. Lett. 8(2): 60-63 (2009) - [j48]John Kim, William J. Dally, Steve Scott, Dennis Abts:
Cost-Efficient Dragonfly Topology for Large-Scale Systems. IEEE Micro 29(1): 33-40 (2009) - [c108]George Michelogiannakis, James D. Balfour, William J. Dally:
Elastic-buffer flow control for on-chip networks. HPCA 2009: 151-162 - [c107]Nan Jiang, John Kim, William J. Dally:
Indirect adaptive routing on large scale interconnection networks. ISCA 2009: 220-231 - [c106]Daniel U. Becker, William J. Dally:
Allocator implementations for network-on-chip routers. SC 2009 - [c105]George Michelogiannakis, William J. Dally:
Router designs for elastic buffer on-chip networks. SC 2009 - [p4]Mattan Erez, William J. Dally:
Stream Processors. Multicore Processors and Systems 2009: 231-270 - 2008
- [j47]James D. Balfour, William J. Dally, David Black-Schaffer, Vishal Parikh, JongSoo Park:
An Energy-Efficient Processor Architecture for Embedded Systems. IEEE Comput. Archit. Lett. 7(1): 29-32 (2008) - [j46]David Black-Schaffer, James D. Balfour, William J. Dally, Vishal Parikh, JongSoo Park:
Hierarchical Instruction Register Organization. IEEE Comput. Archit. Lett. 7(2): 41-44 (2008) - [j45]William J. Dally, James D. Balfour, David Black-Schaffer, James Chen, R. Curtis Harting, Vishal Parikh, JongSoo Park, David Sheffield:
Efficient Embedded Computing. Computer 41(7): 27-32 (2008) - [j44]Brucek Khailany, Ted Williams, Jim Lin, Eileen Peters Long, Mark Rygh, DeForest Tovey, William J. Dally:
A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing. IEEE J. Solid State Circuits 43(1): 202-213 (2008) - [c104]Manman Ren, Ji Young Park, Mike Houston, Alex Aiken, William J. Dally:
A tuning framework for software-managed memory hierarchies. PACT 2008: 280-291 - [c103]Abhishek Das, William J. Dally:
Stream Scheduling: A Framework to Manage Bulk Operations in Memory Hierarchies. Euro-Par 2008: 337-349 - [c102]John Kim, William J. Dally, Steve Scott, Dennis Abts:
Technology-Driven, Highly-Scalable Dragonfly Topology. ISCA 2008: 77-88 - [c101]Mike Houston, Ji Young Park, Manman Ren, Timothy J. Knight, Kayvon Fatahalian, Alex Aiken, William J. Dally, Pat Hanrahan:
A portable runtime interface for multi-level memory hierarchies. PPoPP 2008: 143-152 - 2007
- [j43]John Kim, James D. Balfour, William J. Dally:
Flattened Butterfly Topology for On-Chip Networks. IEEE Comput. Archit. Lett. 6(2): 37-40 (2007) - [j42]John Poulton, Robert Palmer, Andrew M. Fuller, Trey Greer, John G. Eyles, William J. Dally, Mark Horowitz:
A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS. IEEE J. Solid State Circuits 42(12): 2745-2757 (2007) - [j41]John D. Owens, William J. Dally, Ron Ho, Doddaballapur Narasimha-Murthy Jayasimha, Stephen W. Keckler, Li-Shiuan Peh:
Research Challenges for On-Chip Interconnection Networks. IEEE Micro 27(5): 96-108 (2007) - [c100]Jayanth Gummaraju, Mattan Erez, Joel Coburn, Mendel Rosenblum, William J. Dally:
Architectural Support for the Stream Execution Model on General-Purpose Processors. PACT 2007: 3-12 - [c99]Abhishek Das, William J. Dally:
Stream Scheduling: A Framework to Manage Bulk Operations in a Memory Hierarchy. PACT 2007: 405 - [c98]JongSoo Park, Sung-Boem Park, James D. Balfour, David Black-Schaffer, Christos Kozyrakis, William J. Dally:
Register pointer architecture for efficient embedded processors. DATE 2007: 600-605 - [c97]William J. Dally:
Interconnect-Centric Computing. HPCA 2007: 1 - [c96]Mattan Erez, Jung Ho Ahn, Jayanth Gummaraju, Mendel Rosenblum, William J. Dally:
Executing irregular scientific applications on stream architectures. ICS 2007: 93-104 - [c95]Jung Ho Ahn, Mattan Erez, William J. Dally:
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors. ICS 2007: 126-137 - [c94]John Kim, William J. Dally, Dennis Abts:
Flattened butterfly: a cost-efficient topology for high-radix networks. ISCA 2007: 126-137 - [c93]Shekhar Borkar, William J. Dally:
Future of on-chip interconnection architectures. ISLPED 2007: 122 - [c92]Brucek Khailany, Ted Williams, Jim Lin, Eileen Long, Mark Rygh, DeForest Tovey, William J. Dally:
A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing. ISSCC 2007: 272-602 - [c91]Robert Palmer, John Poulton, William J. Dally, John G. Eyles, Andrew M. Fuller, Trey Greer, Mark Horowitz, Mark Kellam, F. Quan, F. Zarkeshvari:
A 14mW 6.25Gb/s Transceiver in 90nm CMOS for Serial Chip-to-Chip Communications. ISSCC 2007: 440-614 - [c90]John Kim, James D. Balfour, William J. Dally:
Flattened Butterfly Topology for On-Chip Networks. MICRO 2007: 172-182 - [c89]William J. Dally:
Enabling Technology for On-Chip Interconnection Networks. NOCS 2007: 3 - [c88]Timothy J. Knight, Ji Young Park, Manman Ren, Mike Houston, Mattan Erez, Kayvon Fatahalian, Alex Aiken, William J. Dally, Pat Hanrahan:
Compilation for explicitly managed memory hierarchies. PPoPP 2007: 226-236 - 2006
- [j40]Amit K. Gupta, William J. Dally:
Topology optimization of interconnection networks. IEEE Comput. Archit. Lett. 5(1): 10-13 (2006) - [j39]Jung Ho Ahn, William J. Dally:
Data parallel address architecture. IEEE Comput. Archit. Lett. 5(1): 30-33 (2006) - [c87]Abhishek Das, William J. Dally, Peter R. Mattson:
Compiling for stream processing. PACT 2006: 33-42 - [c86]Andrew W. Howard, Gu-Yeon Wei, William J. Dally, Paul Horowitz:
Pulsenet - A Parallel Flash Sampler and Digital Processor IC for Optical SETI. CICC 2006: 261-264 - [c85]William J. Dally:
Computer Architecture in the Many-Core Era. ICCD 2006: 1 - [c84]James D. Balfour, William J. Dally:
Design tradeoffs for tiled CMP on-chip networks. ICS 2006: 187-198 - [c83]Steve Scott, Dennis Abts, John Kim, William J. Dally:
The BlackWidow High-Radix Clos Network. ISCA 2006: 16-28 - [c82]Thomas L. Sterling, Peter M. Kogge, William J. Dally, Steve Scott, William Gropp, David E. Keyes, Peter H. Beckman:
Multi-core issues - Multi-Core for HPC: breakthrough or breakdown? SC 2006: 73 - [c81]Jung Ho Ahn, Mattan Erez, William J. Dally:
Architecture - The design space of data-parallel memory systems. SC 2006: 80 - [c80]Kayvon Fatahalian, Daniel Reiter Horn, Timothy J. Knight, Larkhoon Leem, Mike Houston, Ji Young Park, Mattan Erez, Manman Ren, Alex Aiken, William J. Dally, Pat Hanrahan:
Sequoia: programming the memory hierarchy. SC 2006: 83 - [c79]John Kim, William J. Dally, Dennis Abts:
Interconnect routing and scheduling - Adaptive routing in high-radix clos network. SC 2006: 92 - 2005
- [j38]Patrick Chiang, William J. Dally, Ming-Ju Edward Lee, Ramesh Senthinathan, Yangjin Oh, Mark A. Horowitz:
A 20-Gb/s 0.13-μm CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer. IEEE J. Solid State Circuits 40(4): 1004-1011 (2005) - [j37]William J. Dally, Keith Diefendorff:
Hot Chips 16: Power, Parallelism, and Memory Performance. IEEE Micro 25(2): 8-9 (2005) - [c78]Andrew Chang, William J. Dally:
Explaining the gap between ASIC and custom power: a custom perspective. DAC 2005: 281-284 - [c77]Jung Ho Ahn, Mattan Erez, William J. Dally:
Scatter-Add in Data Parallel Architectures. HPCA 2005: 132-142 - [c76]John Kim, William J. Dally, Brian Towles, Amit K. Gupta:
Microarchitecture of a High-Radix Router. ISCA 2005: 420-431 - [c75]Mattan Erez, Nuwan Jayasena, Timothy J. Knight, William J. Dally:
Fault Tolerance Techniques for the Merrimac Streaming Supercomputer. SC 2005: 29 - 2004
- [j36]Arjun Singh, William J. Dally:
Buffer and Delay Bounds in High Radix Interconnection Networks. IEEE Comput. Archit. Lett. 3 (2004) - [j35]Arjun Singh, William J. Dally, Brian Towles, Amit K. Gupta:
Globally Adaptive Load-Balanced Routing on Tori. IEEE Comput. Archit. Lett. 3 (2004) - [j34]Ramin Farjad-Rad, Anhtuyet Nguyen, James Tran, Trey Greer, John Poulton, William J. Dally, John H. Edmondson, Ramesh Senthinathan, Rohit Rathi, Ming-Ju Edward Lee, Hiok-Tiaq Ng:
A 33-mW 8-Gb/s CMOS clock multiplier and CDR for highly integrated I/Os. IEEE J. Solid State Circuits 39(9): 1553-1561 (2004) - [j33]William J. Dally, Ujval J. Kapasi, Brucek Khailany, Jung Ho Ahn, Abhishek Das:
Stream Processors: Programmability and Efficiency. ACM Queue 2(1): 52-62 (2004) - [c74]Nuwan Jayasena, Mattan Erez, Jung Ho Ahn, William J. Dally:
Stream Register Files with Indexed Access. HPCA 2004: 60-72 - [c73]Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, Abhishek Das:
Evaluating the Imagine Stream Architecture. ISCA 2004: 14-25 - [c72]Mattan Erez, Jung Ho Ahn, Ankit Garg, William J. Dally, Eric Darve:
Analysis and Performance Results of a Molecular Modeling Application on Merrimac. SC 2004: 42 - [c71]Arjun Singh, William J. Dally, Amit K. Gupta, Brian Towles:
Adaptive channel queue routing on k-ary n-cubes. SPAA 2004: 11-19 - [c70]William J. Dally:
The case for broader computer architecture education: keynote address. WCAE 2004: 10 - 2003
- [j32]Ujval J. Kapasi, Scott Rixner, William J. Dally, Brucek Khailany, Jung Ho Ahn, Peter R. Mattson, John D. Owens:
Programmable Stream Processors. Computer 36(8): 54-62 (2003) - [j31]Ming-Ju Edward Lee, William J. Dally, Trey Greer, Hiok-Tiaq Ng, Ramin Farjad-Rad, John Poulton, Ramesh Senthinathan:
Jitter transfer characteristics of delay-locked loops - theories and design techniques. IEEE J. Solid State Circuits 38(4): 614-621 (2003) - [j30]Hiok-Tiaq Ng, Ramin Farjad-Rad, Ming-Ju Edward Lee, William J. Dally, Trey Greer, John Poulton, John H. Edmondson, Rohit Rathi, Ramesh Senthinathan:
A second-order semidigital clock recovery circuit based on injection locking. IEEE J. Solid State Circuits 38(12): 2101-2110 (2003) - [j29]Brian Towles, William J. Dally:
Guaranteed scheduling for switches with configuration overhead. IEEE/ACM Trans. Netw. 11(5): 835-847 (2003) - [c69]Hiok-Tiaq Ng, Ming-Ju Edward Lee, Ramin Farjad-Rad, Ramesh Senthinathan, William J. Dally, Anhtuyet Nguyen, Rohit Rathi, Trey Greer, John Poulton, John H. Edmondson, James Tran:
A 33mW 8Gb/s CMOS clock multiplier and CDR for highly integrated I/Os. CICC 2003: 77-80 - [c68]Brucek Khailany, William J. Dally, Scott Rixner, Ujval J. Kapasi, John D. Owens, Brian Towles:
Exploring the VLSI Scalability of Stream Processors. HPCA 2003: 153-164 - [c67]Ming-Ju Edward Lee, William J. Dally, Ramin Farjad-Rad, Hiok-Tiaq Ng, Ramesh Senthinathan, John H. Edmondson, John W. Poulton:
CMOS High-Speed I/Os - Present and Future. ICCD 2003: 454-461 - [c66]Arjun Singh, William J. Dally, Amit K. Gupta, Brian Towles:
GOAL: A Load-Balanced Adaptive Routing Algorithm for Torus Networks. ISCA 2003: 194-205 - [c65]William J. Dally, Francois Labonte, Abhishek Das, Pat Hanrahan, Jung Ho Ahn, Jayanth Gummaraju, Mattan Erez, Nuwan Jayasena, Ian Buck, Timothy J. Knight, Ujval J. Kapasi:
Merrimac: Supercomputing with Streams. SC 2003: 35 - [c64]Brian Towles, William J. Dally, Stephen P. Boyd:
Throughput-centric routing algorithm design. SPAA 2003: 200-209 - 2002
- [j28]Kelly A. Shaw, William J. Dally:
Migration in Single Chip Multiprocessors. IEEE Comput. Archit. Lett. 1 (2002) - [j27]Brian Towles, William J. Dally:
Worst-case Traffic for Oblivious Routing Functions. IEEE Comput. Archit. Lett. 1 (2002) - [j26]Ramin Farjad-Rad, William J. Dally, Hiok-Tiaq Ng, Ramesh Senthinathan, Ming-Ju Edward Lee, Rohit Rathi, John Poulton:
A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips. IEEE J. Solid State Circuits 37(12): 1804-1812 (2002) - [c63]John D. Owens, Brucek Khailany, Brian Towles, William J. Dally:
Comparing Reyes and OpenGL on a Stream Architecture. Graphics Hardware 2002: 47-56 - [c62]Amit K. Gupta, William J. Dally, Arjun Singh, Brian Towles:
Scalable Opto-Electronic Network (SOENet). Hot Interconnects 2002: 71-76 - [c61]Ujval J. Kapasi, William J. Dally, Scott Rixner, John D. Owens, Brucek Khailany:
The Imagine Stream Processor. ICCD 2002: 282-288 - [c60]Brucek Khailany, William J. Dally, Andrew Chang, Ujval J. Kapasi, Jinyung Namkoong, Brian Towles:
VLSI Design and Verification of the Imagine Processor. ICCD 2002: 289-294 - [c59]John D. Owens, Scott Rixner, Ujval J. Kapasi, Peter R. Mattson, Brian Towles, Ben Serebrin, William J. Dally:
Media Processing Applications on the Imagine Stream Processor. ICCD 2002: 295-302 - [c58]Ben Serebrin, John D. Owens, Chen H. Chen, Stephen P. Crago, Ujval J. Kapasi, Peter R. Mattson, Jinyung Namkoong, Scott Rixner, William J. Dally:
A Stream Processor Development Platform. ICCD 2002: 303- - [c57]Brian Towles, William J. Dally:
Guaranteed Scheduling for Switches with Configuration Overhead. INFOCOM 2002: 342-351 - [c56]Brian Towles, William J. Dally:
Worst-case traffic for oblivious routing functions. SPAA 2002: 1-8 - [c55]Arjun Singh, William J. Dally, Brian Towles, Amit K. Gupta:
Locality-preserving randomized oblivious routing on torus networks. SPAA 2002: 9-13 - 2001
- [b2]William J. Dally, John W. Poulton:
Digital systems engineering. Cambridge University Press 2001, ISBN 978-0-521-59292-5, pp. I-XXIV, 1-663 - [j25]Li-Shiuan Peh, William J. Dally:
A Delay Model for Router Microarchitectures. IEEE Micro 21(1): 26-34 (2001) - [j24]William J. Dally, Marc Tremblay, Allen J. Baum:
Guest Editors' Introduction: Hot Chips 12. IEEE Micro 21(2): 13-15 (2001) - [j23]Brucek Khailany, William J. Dally, Ujval J. Kapasi, Peter R. Mattson, Jinyung Namkoong, John D. Owens, Brian Towles, Andrew Chang, Scott Rixner:
Imagine: Media Processing with Streams. IEEE Micro 21(2): 35-46 (2001) - [c54]William J. Dally, Brian Towles:
Route Packets, Not Wires: On-Chip Interconnection Networks. DAC 2001: 684-689 - [c53]Li-Shiuan Peh, William J. Dally:
A Delay Model and Speculative Architecture for Pipelined Routers. HPCA 2001: 255-266 - [c52]Patrick Chiang, William J. Dally, Ming-Ju Edward Lee:
Monolithic chaotic communications system. ISCAS (3) 2001: 325-328 - 2000
- [j22]Ming-Ju Edward Lee, William J. Dally, Patrick Chiang:
Low-power area-efficient high-speed I/O circuit techniques. IEEE J. Solid State Circuits 35(11): 1591-1599 (2000) - [c51]Peter R. Mattson, William J. Dally, Scott Rixner, Ujval J. Kapasi, John D. Owens:
Communication Scheduling. ASPLOS 2000: 82-92 - [c50]William J. Dally, Andrew Chang:
The role of custom design in ASIC Chips. DAC 2000: 643-647 - [c49]John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter R. Mattson, Ben Mowery:
Polygon Rendering on a Stream Architecture. Workshop on Graphics Hardware 2000: 23-32 - [c48]Li-Shiuan Peh, William J. Dally:
Flit-Reservation Flow Control. HPCA 2000: 73-84 - [c47]Scott Rixner, William J. Dally, Brucek Khailany, Peter R. Mattson, Ujval J. Kapasi, John D. Owens:
Register Organization for Media Processing. HPCA 2000: 375-386 - [c46]Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter R. Mattson, John D. Owens:
Memory access scheduling. ISCA 2000: 128-138 - [c45]Ken Mai, Tim Paaske, Nuwan Jayasena, Ron Ho, William J. Dally, Mark Horowitz:
Smart Memories: a modular reconfigurable architecture. ISCA 2000: 161-171 - [c44]Nicholas P. Carter, William J. Dally, Whay Sing Lee, Stephen W. Keckler, Andrew Chang:
Processor Mechanisms for Software Shared Memory. ISHPC 2000: 120-133 - [c43]Ujval J. Kapasi, William J. Dally, Scott Rixner, Peter R. Mattson, John D. Owens, Brucek Khailany:
Efficient conditional operations for data-parallel architectures. MICRO 2000: 159-170
1990 – 1999
- 1999
- [j21]Stephen W. Keckler, Andrew Chang, Whay Sing Lee, Sandeep Chatterjee, William J. Dally:
Concurrent Event Handling through Multithreading. IEEE Trans. Computers 48(9): 903-916 (1999) - [c42]William J. Dally, Steve Lacy:
VLSI Architecture: Past, Present, and Future. ARVLSI 1999: 232-241 - [e2]Allan Gottlieb, William J. Dally:
Proceedings of the 26th Annual International Symposium on Computer Architecture, ISCA 1999, Atlanta, Georgia, USA, May 2-4, 1999. IEEE Computer Society 1999, ISBN 0-7695-0170-2 - 1998
- [j20]Whay Sing Lee, William J. Dally, Stephen W. Keckler, Nicholas P. Carter, Andrew Chang:
An Efficient, Protected Message Interface. Computer 31(11): 69-75 (1998) - [j19]Randall Rettberg, William J. Dally, David E. Culler:
The bleeding edge. IEEE Micro 18(1): 10-11 (1998) - [j18]John Poulton, William J. Dally, Steve Tell:
A tracking clock recovery receiver for 4-Gbps signaling. IEEE Micro 18(1): 25-27 (1998) - [c41]Andrew Chang, William J. Dally, Stephen W. Keckler, Nicholas P. Carter, Whay Sing Lee:
The effects of explicitly parallel mechanisms on the multi-ALU processor cluster pipeline. ICCD 1998: 474-481 - [c40]William J. Dally, Andrew A. Chien, Stuart Fiske, Waldemar Horwat, Richard A. Lethin, Michael D. Noakes, Peter R. Nuth, Ellen Spertus, Deborah A. Wallach, D. Scott Wills, Andrew Chang, John S. Keen:
Retrospective: the J-machine. 25 Years ISCA: Retrospectives and Reprints 1998: 54-58 - [c39]Stephen W. Keckler, William J. Dally, Daniel Maskit, Nicholas P. Carter, Andrew Chang, Whay Sing Lee:
Exploiting Fine-grain Thread Level Parallelism on the MIT Multi-ALU Processor. ISCA 1998: 306-317 - [c38]William J. Dally, Linda Chao, Andrew A. Chien, Soha Hassoun, Waldemar Horwat, Jon Kaplan, Paul Song, Brian Totty, D. Scott Wills:
Architecture of a Message-Driven Processor. 25 Years ISCA: Retrospectives and Reprints 1998: 337-344 - [c37]Scott Rixner, William J. Dally, Ujval J. Kapasi, Brucek Khailany, Abelardo López-Lagunas, Peter R. Mattson, John D. Owens:
A Bandwidth-efficient Architecture for Media Processing. MICRO 1998: 3-13 - [c36]J. P. Grossman, William J. Dally:
Point Sample Rendering. Rendering Techniques 1998: 181-192 - 1997
- [j17]Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, Whay Sing Lee:
The M-machine multicomputer. Int. J. Parallel Program. 25(3): 183-212 (1997) - [j16]William J. Dally, John W. Poulton:
Transmitter equalization for 4-Gbps signaling. IEEE Micro 17(1): 48-56 (1997) - [j15]John S. Keen, William J. Dally:
Extended Ephemeral Logging: Log Storage Management for Applications with Long Lived Transactions. ACM Trans. Database Syst. 22(1): 1-42 (1997) - 1996
- [e1]Bill Dally, Susan J. Eggers:
ASPLOS-VII Proceedings - Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge, Massachusetts, USA, October 1-5, 1996. ACM Press 1996, ISBN 0-89791-767-7 - 1995
- [j14]Stuart Fiske, William J. Dally:
Thread prioritization: A thread scheduling mechanism for multiple-context parallel processors. Future Gener. Comput. Syst. 11(6): 503-518 (1995) - [c35]Larry R. Dennison, William J. Dally, Thucydides Xanthopoulos:
Low-latency plesiochronous data retiming. ARVLSI 1995: 304-315 - [c34]Peter R. Nuth, William J. Dally:
The Named-State Register File: Implementation and Performance. HPCA 1995: 4-13 - [c33]Stuart Fiske, William J. Dally:
Thread Prioritization: A Thread Scheduling Mechanism for Multiple-Context Parallel Processors. HPCA 1995: 210-221 - [c32]Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, Whay Sing Lee:
The M-Machine multicomputer. MICRO 1995: 146-156 - [c31]Ellen Spertus, William J. Dally:
Evaluating the Locality Benefits of Active Messages. PPoPP 1995: 189-198 - 1994
- [j13]Robert A. Iannucci, Anant Agarwal, Bill Dally, Anoop Gupta, Greg Papadopoulos, Burton J. Smith:
Architectural and implementation issues for multithreading (panel session I). SIGARCH Comput. Archit. News 22(1): 3-18 (1994) - [c30]Nicholas P. Carter, Stephen W. Keckler, William J. Dally:
Hardware Support for Fast Capability-based Addressing. ASPLOS 1994: 319-327 - [c29]John S. Keen, William J. Dally:
XEL: Extended Ephemeral Logging for Log Storage Management. CIKM 1994: 312-321 - [c28]William J. Dally, Larry R. Dennison, David Money Harris, Kinhong Kan, Thucydides Xanthopoulos:
Architecture and implementation of the reliable router. Hot Interconnects 1994: 197-208 - [c27]William J. Dally, Larry R. Dennison, David Money Harris, Kinhong Kan, Thucydides Xanthopoulos:
The Reliable Router: A Reliable and High-Performance Communication Substrate for Parallel Computers. PCRCW 1994: 241-255 - [p3]William J. Dally:
Issues in the Design and Implementation of Instruction Processors for Multicomputers (Position Statement). Multithreaded Computer Architecture 1994: 79-82 - [p2]Kathleen Knobe, William J. Dally:
Subspace Optimizations. Automatic Parallelization 1994: 153-176 - [p1]Peter R. Nuth, William J. Dally:
Named State and Efficient Context Switching. Multithreaded Computer Architecture 1994: 201-212 - 1993
- [j12]William J. Dally:
A Universal Parallel Computer Architecture. New Gener. Comput. 11(3): 227-249 (1993) - [j11]William J. Dally, Hiromichi Aoki:
Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels. IEEE Trans. Parallel Distributed Syst. 4(4): 466-475 (1993) - [c26]Michael D. Noakes, Deborah A. Wallach, William J. Dally:
The J-Machine Multicomputer: An Architectural Evaluation. ISCA 1993: 224-235 - [c25]Ellen Spertus, Seth Copen Goldstein, Klaus E. Schauser, Thorsten von Eicken, David E. Culler, William J. Dally:
Evaluation of Mechanisms for Fine-Grained Parallel Programs in the J-Machine and the CM-5. ISCA 1993: 302-313 - [c24]John S. Keen, William J. Dally:
Performance Evaluation of Ephemeral Logging. SIGMOD Conference 1993: 187-196 - 1992
- [j10]William J. Dally, Stuart Fiske, John S. Keen, Richard A. Lethin, Michael D. Noakes, Peter R. Nuth, Roy E. Davison, Gregory A. Fyler:
The message-driven processor: a multicomputer processing node with efficient mechanisms. IEEE Micro 12(2): 23-39 (1992) - [j9]William J. Dally:
A Fast Translation Method for Paging on top of Segmentation. IEEE Trans. Computers 41(2): 247-250 (1992) - [j8]William J. Dally:
Virtual-Channel Flow Control. IEEE Trans. Parallel Distributed Syst. 3(2): 194-205 (1992) - [c23]William J. Dally:
A Universal Parallel Computer Architecture. FGCS 1992: 746-758 - [c22]William J. Dally, Andrew A. Chien, Stuart Fiske, Gregory A. Fyler, Waldemar Horwat, John S. Keen, Richard A. Lethin, Michael D. Noakes, Peter R. Nuth, D. Scott Wills:
The Message Driven Processor: An Integrated Multicomputer Processing Element. ICCD 1992: 416-419 - [c21]Peter R. Nuth, William J. Dally:
The J-Machine Network. ICCD 1992: 420-423 - [c20]Richard A. Lethin, William J. Dally:
MDP Design Tools and Methods. ICCD 1992: 424-428 - [c19]Stephen W. Keckler, William J. Dally:
Processor Coupling: Integrating Compile Time and Runtime Scheduling for Parallelism. ISCA 1992: 202-213 - 1991
- [j7]William J. Dally:
Express Cubes: Improving the Performance of k-Ary n-Cube Interconnection Networks. IEEE Trans. Computers 40(9): 1016-1023 (1991) - [c18]Peter R. Nuth, William J. Dally:
A Mechanism for Efficient Context Switching. ICCD 1991: 301-304 - [c17]Ellen Spertus, William J. Dally:
Experiences Implementing Dataflow on a General-Purpose Parallel Computer. ICPP (2) 1991: 231-235 - 1990
- [j6]William J. Dally:
Performance Analysis of k-Ary n-Cube Interconnection Networks. IEEE Trans. Computers 39(6): 775-785 (1990) - [j5]Prathima Agrawal, William J. Dally:
A hardware logic simulation system. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 9(1): 19-29 (1990) - [c16]Kevin Lam, Larry R. Dennison, William J. Dally:
Simultaneous bidirectional signalling for IC systems. ICCD 1990: 430-433 - [c15]William J. Dally:
Virtual-Channel Flow Control. ISCA 1990: 60-68 - [c14]Andrew A. Chien, William J. Dally:
Concurrent Aggregates (CA). PPoPP 1990: 187-196
1980 – 1989
- 1989
- [c13]William J. Dally:
Micro-Optimization of Floating Point Operations. ASPLOS 1989: 283-289 - [c12]Prathima Agrawal, Raffi Tutundjian, William J. Dally:
Algorithms for Accuracy Enhancement in a Hardware Logic Simulator. DAC 1989: 645-648 - [c11]William J. Dally, Andrew A. Chien, Stuart Fiske, Waldemar Horwat, John S. Keen, Michael Larivee, Richard A. Lethin, Peter R. Nuth, D. Scott Wills:
The J-Machine: A Fine-Grain Concurrent Computer. IFIP Congress 1989: 1147-1153 - [c10]William J. Dally, D. Scott Wills:
Universal Mechanisms for Concurrency. PARLE (1) 1989: 19-33 - [c9]Waldemar Horwat, Andrew A. Chien, William J. Dally:
Experience with CST: Programming and Implementation. PLDI 1989: 101-109 - 1988
- [c8]William J. Dally:
Fine-grain message passing concurrent computers. C³P 1988: 2-12 - [c7]William J. Dally, Andrew A. Chien:
Object-oriented concurrent programming in CST. C³P 1988: 434-439 - [c6]William J. Dally:
Mechanisms for Concurrent Computing. FGCS 1988: 154-156 - [c5]Stuart Fiske, William J. Dally:
The Reconfigurable Arithmetic Processor. ISCA 1988: 30-36 - [c4]William J. Dally, Andrew A. Chien:
Object-oriented concurrent programming in CST. OOPSLA/ECOOP Workshop on Object-based Concurrent Programming 1988: 28-31 - 1987
- [j4]Prathima Agrawal, William J. Dally, W. C. Fischer, H. V. Jagadish, A. S. Krishnakumar, Raffi Tutundjian:
MARS: A Multiprocessor-Based Programmable Accelerator. IEEE Des. Test 4(5): 28-36 (1987) - [j3]William J. Dally, Charles L. Seitz:
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks. IEEE Trans. Computers 36(5): 547-553 (1987) - [c3]Prathima Agrawal, William J. Dally, Ahmed K. Ezzat, W. C. Fischer, H. V. Jagadish, A. S. Krishnakumar:
Architecture and Design of the MARS Hardware Accelerator. DAC 1987: 101-107 - [c2]William J. Dally, Linda Chao, Andrew A. Chien, Soha Hassoun, Waldemar Horwat, Jon Kaplan, Paul Song, Brian Totty, D. Scott Wills:
Architecture of a Message-Driven Processor. ISCA 1987: 189-196 - 1986
- [b1]William J. Dally:
A VLSI Architecture for Concurrent Data Structures. California Institute of Technology, USA, 1986 - [j2]William J. Dally, Charles L. Seitz:
The Torus Routing Chip. Distributed Comput. 1(4): 187-196 (1986) - 1985
- [j1]William J. Dally, Randal E. Bryant:
A Hardware Architecture for Switch-Level Simulation. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 4(3): 239-250 (1985) - [c1]William J. Dally, James T. Kajiya:
An Object Oriented Architecture. ISCA 1985: 154-161
last updated on 2024-10-07 21:20 CEST by the dblp team
all metadata released as open data under CC0 1.0 license