Ching-Hsiang Chu
2020 – today
- 2024
- [c27]Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov:
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large Scale Recommendation. MLSys 2024
- [i5]Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov:
Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation. CoRR abs/2403.00877 (2024)
- [i4]Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao:
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression. CoRR abs/2407.04272 (2024)
- 2023
- [c26]Kshiteej Mahajan, Ching-Hsiang Chu, Srinivas Sridharan, Aditya Akella:
Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE. NSDI 2023: 809-824
- 2022
- [c25]Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, Yinbin Ma, Junjie Yang, Ellie Wen, Hong Li, Lin Yang, Chonglin Sun, Whitney Zhao, Dimitry Melts, Krishna Dhulipala, K. R. Kishore, Tyler Graf, Assaf Eisenman, Kiran Kumar Matam, Adi Gangidi, Guoqiang Jerry Chen, Manoj Krishnan, Avinash Nayak, Krishnakumar Nair, Bharath Muthiah, Mahmoud khorashadi, Pallab Bhattacharya, Petr Lapukhov, Maxim Naumov, Ajit Mathews, Lin Qiao, Mikhail Smelyanskiy, Bill Jia, Vijay Rao:
Software-hardware co-design for fast and scalable training of deep learning recommendation models. ISCA 2022: 993-1011
- 2021
- [j9]Dhabaleswar Kumar Panda, Hari Subramoni, Ching-Hsiang Chu, Mohammadreza Bayatpour:
The MVAPICH project: Transforming research into high-performance MPI library for HPC community. J. Comput. Sci. 52: 101208 (2021)
- [c24]Kawthar Shafie Khorassani, Ching-Hsiang Chu, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda:
Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems. CCGRID 2021: 113-122
- [c23]Kawthar Shafie Khorassani, Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
Designing a ROCm-Aware MPI Library for AMD GPUs: Early Experiences. ISC 2021: 118-136
- [i3]Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, Yinbin Ma, Junjie Yang, Ellie Wen, Hong Li, Lin Yang, Chonglin Sun, Whitney Zhao, Dimitry Melts, Krishna Dhulipala, K. R. Kishore, Tyler Graf, Assaf Eisenman, Kiran Kumar Matam, Adi Gangidi, Guoqiang Jerry Chen, Manoj Krishnan, Avinash Nayak, Krishnakumar Nair, Bharath Muthiah, Mahmoud khorashadi, Pallab Bhattacharya, Petr Lapukhov, Maxim Naumov, Lin Qiao, Mikhail Smelyanskiy, Bill Jia, Vijay Rao:
High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models. CoRR abs/2104.05158 (2021)
- 2020
- [j8]Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures. J. Parallel Distributed Comput. 144: 1-13 (2020)
- [j7]Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. IEEE Micro 40(1): 35-43 (2020)
- [c22]Ching-Hsiang Chu, Kawthar Shafie Khorassani, Qinghua Zhou, Hari Subramoni, Dhabaleswar K. Panda:
Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters. CLUSTER 2020: 130-141
- [c21]Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda:
NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. ICS 2020: 6:1-6:12
2010 – 2019
- 2019
- [j6]Ammar Ahmad Awan, Karthik Vadambacheri Manian, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? Parallel Comput. 85: 141-152 (2019)
- [j5]Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Bracy Elton, Dhabaleswar K. Panda:
Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Trans. Parallel Distributed Syst. 30(3): 575-588 (2019)
- [c20]Karthik Vadambacheri Manian, A. A. Ammar, Amit Ruhela, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures. GPGPU@ASPLOS 2019: 43-52
- [c19]Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CCGRID 2019: 498-507
- [c18]Pouya Kousha, Bharath Ramesh, Kaushik Kandadi Suresh, Ching-Hsiang Chu, Arpan Jain, Nick Sarkauskas, Hari Subramoni, Dhabaleswar K. Panda:
Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters. HiPC 2019: 93-102
- [c17]Ching-Hsiang Chu, Jahanzeb Maqbool Hashmi, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda:
High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems. HiPC 2019: 267-276
- [c16]Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects. Hot Interconnects 2019: 49-53
- [c15]Jie Zhang, Xiaoyi Lu, Ching-Hsiang Chu, Dhabaleswar K. Panda:
C-GDR: High-Performance Container-Aware GPUDirect MPI Communication Schemes on RDMA Networks. IPDPS 2019: 242-251
- [c14]Karthik Vadambacheri Manian, Ching-Hsiang Chu, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni:
OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks. PMBS@SC 2019: 82-92
- [c13]Kawthar Shafie Khorassani, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Performance Evaluation of MPI Libraries on GPU-Enabled OpenPOWER Architectures: Early Experiences. ISC Workshops 2019: 361-378
- 2018
- [j4]Min-Te Sun, Ching-Hsiang Chu, Eric Hsiao-Kuang Wu, Chi-Sen Hsiao, Andy An-Kai Jeng:
Distributed Topology Control for Energy-Efficient and Reliable Wireless Communications. IEEE Syst. J. 12(3): 2152-2161 (2018)
- [c12]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda:
OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. HiPC 2018: 143-152
- [c11]Ching-Hsiang Chu, Sreeram Potluri, Anshuman Goswami, Manjunath Gorentla Venkata, Neena Imam, Chris J. Newburn:
Designing High-Performance In-Memory Key-Value Operations with Persistent GPU Kernels and OpenSHMEM. OpenSHMEM 2018: 148-164
- [c10]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? EuroMPI 2018: 2:1-2:9
- [i2]Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CoRR abs/1810.11112 (2018)
- 2017
- [c9]Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Davide Rossetti, Ching-Hsiang Chu, Dhabaleswar K. Panda:
MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling. ICPP 2017: 151-160
- [c8]Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Jahanzeb Maqbool Hashmi, Bracy Elton, Dhabaleswar K. Panda:
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. ICPP 2017: 161-170
- [i1]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? CoRR abs/1707.09414 (2017)
- 2016
- [j3]Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters. Parallel Comput. 58: 27-36 (2016)
- [c7]Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Dhabaleswar K. Panda:
CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters. CCGrid 2016: 726-735
- [c6]Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Dip Sankar Banerjee, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems. IPDPS 2016: 983-992
- [c5]Ching-Hsiang Chu, Khaled Hamidouche, Hari Subramoni, Akshay Venkatesh, Bracy Elton, Dhabaleswar K. Panda:
Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters. SBAC-PAD 2016: 59-66
- [c4]Ching-Hsiang Chu, Khaled Hamidouche, Hari Subramoni, Akshay Venkatesh, Bracy Elton, Dhabaleswar K. Panda:
Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications. COMHPC@SC 2016: 29-38
- 2015
- [c3]Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters. CLUSTER 2015: 78-87
- [c2]A. A. Awan, Khaled Hamidouche, Ching-Hsiang Chu, Dhabaleswar K. Panda:
A Case for Non-blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X. OpenSHMEM 2015: 69-86
- 2014
- [c1]Ching-Hsiang Chu, You-Ming Chen, Yu-Te Huang, Roberto Carvalho, Chiun-Chieh Hsu, Ling-Jyh Chen:
Measurement of long-distance Wi-Fi connections: An empirical study. ICC 2014: 2418-2423
- 2013
- [j2]Jyh-Ming Chen, Eric Hsiao-Kuang Wu, Hsiang-Wei Lu, Ching-Hsiang Chu, Meng-Feng Tsai:
Channel condition self-clocked packet scheduling scheme for wireless networks. EURASIP J. Wirel. Commun. Netw. 2013: 131 (2013)
- 2011
- [j1]Jyh-Ming Chen, Ching-Hsiang Chu, Eric Hsiao-Kuang Wu, Meng-Feng Tsai, Jian-Ren Wang:
Improving SCTP Performance by Jitter-Based Congestion Control over Wired-Wireless Networks. EURASIP J. Wirel. Commun. Netw. 2011 (2011)
last updated on 2024-08-14 22:16 CEST by the dblp team
all metadata released as open data under CC0 1.0 license