ICPP 2021: Virtual Event / Lemont (near Chicago), IL, USA
- Xian-He Sun, Sameer Shende, Laxmikant V. Kalé, Yong Chen:
ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9 - 12, 2021. ACM 2021, ISBN 978-1-4503-9068-2
Best Paper Candidates
- Hanfeng Liu, Zeyi Wen, Wei Cai:
FastPSO: Towards Efficient Swarm Intelligence Algorithm on GPUs. 1:1-1:10
- Tanmoy Sen, Haiying Shen:
Context-aware Data Operation Strategies in Edge Systems for High Application Performance. 2:1-2:10
- Yang Yang, Qiang Cao, Jie Yao, Yuanyuan Dong, Weikang Kong:
SPMFS: A Scalable Persistent Memory File System on Optane Persistent Memory. 3:1-3:10
- Lucas Leandro Nesi, Arnaud Legrand, Lucas Mello Schnorr:
Exploiting system level heterogeneity to improve the performance of a GeoStatistics multi-phase task-based application. 4:1-4:10
Memory Systems and NVM
- Shizhi Jiang, Yiwei Ci, Qiusong Yang, Mingshu Li:
Matryoshka: A Coalesced Delta Sequence Prefetcher. 5:1-5:11
- Jingwen Du, Fang Wang, Dan Feng, Weiguang Li, Fan Li:
Fast and Consistent Remote Direct Access to Non-volatile Memory. 6:1-6:11
- Mengya Lei, Fang Wang, Dan Feng, Fan Li, Xueliang Wei:
Crash-Consistency-Aware Encryption for Non-Volatile Memories. 7:1-7:10
- Bagus Hanindhito, Ruihao Li, Dimitrios Gourounas, Arash Fathi, Karan Govil, Dimitar Trenev, Andreas Gerstlauer, Lizy Kurian John:
Wave-PIM: Accelerating Wave Simulation Using Processing-in-Memory. 8:1-8:11
GPU Computing and Task-based Programming Models
- Yan-Hao Chen, Fei Hua, Yuwei Jin, Eddy Z. Zhang:
BGPQ: A Heap-Based Priority Queue Design for GPUs. 9:1-9:10
- Antoni Navarro Muñoz, Arthur Francisco Lorenzon, Eduard Ayguadé Parra, Vicenç Beltran Querol:
Combining Dynamic Concurrency Throttling with Voltage and Frequency Scaling on Task-based Programming Models. 10:1-10:11
- Seiya Kozakai, Noriyuki Fujimoto, Koichi Wada:
Efficient GPU-Implementation for Integer Sorting Based on Histogram and Prefix-Sums. 11:1-11:11
- Martin Koppehel, Tobias Groth, Sven Groppe, Thilo Pionteck:
CuART - a CUDA-based, scalable Radix-Tree lookup and update engine. 12:1-12:10
Resource Management and Infrastructure
- Jinyu Yu, Dan Feng, Wei Tong, Pengze Lv, Yufei Xiong:
CERES: Container-Based Elastic Resource Management System for Mixed Workloads. 13:1-13:10
- Jananie Jarachanthan, Li Chen, Fei Xu, Bo Li:
AMPS-Inf: Automatic Model Partitioning for Serverless Inference with Cost Efficiency. 14:1-14:12
- Hongyan Li, Hang Lu, Jiawen Huang, Wenxu Wang, Mingzhe Zhang, Wei Chen, Liang Chang, Xiaowei Li:
BitX: Empower Versatile Inference with Hardware Runtime Pruning. 15:1-15:12
- Longfang Zhou, Xiaorong Zhang, Wenxiang Yang, Yongguo Han, Fang Wang, Yadong Wu, Jie Yu:
PREP: Predicting Job Runtime with Job Running Path on Supercomputers. 16:1-16:10
Storage Systems and Parallel I/O
- Liangfeng Cheng, Yuchong Hu, Zhaokang Ke, Zhongjie Wu:
Coupling Right-Provisioned Cold Storage Data Centers with Deduplication. 17:1-17:11
- Yang Zhou, Fang Wang, Dan Feng:
ASLDP: An Active Semi-supervised Learning method for Disk Failure Prediction. 18:1-18:11
- Hai Zhou, Dan Feng, Yuchong Hu:
Multi-level Forwarding and Scheduling Repair Technique in Heterogeneous Network for Erasure-coded Clusters. 19:1-19:11
- Haiwei Deng, Ranhao Jia, Chentao Wu:
A Graph-Assisted Out-of-Place Update Scheme for Erasure Coded Storage Systems. 20:1-20:10
Scheduling Algorithms and Optimizations
- Yubin Duan, Jie Wu:
Joint Optimization of DNN Partition and Scheduling for Mobile Cloud Computing. 21:1-21:10
- Ali Eker, David Timmerman, Barry Williams, Kenneth Chiu, Dmitry Ponomarev:
GVT-Guided Demand-Driven Scheduling in Parallel Discrete Event Simulation. 22:1-22:10
- Lucas Perotin, Hongyang Sun, Padma Raghavan:
Multi-Resource List Scheduling of Moldable Parallel Jobs under Precedence Constraints. 23:1-23:10
- YuAng Chen, Yeh-Ching Chung:
HiPa: Hierarchical Partitioning for Fast PageRank on NUMA Multicore Systems. 24:1-24:10
GPU-Accelerated Applications
- Robin Kobus, André Müller, Daniel Jünger, Christian Hundt, Bertil Schmidt:
MetaCache-GPU: Ultra-Fast Metagenomic Classification. 25:1-25:11
- Zonghao Feng, Qiong Luo:
Accelerating Sequence-to-Graph Alignment on Heterogeneous Processors. 26:1-26:10
- Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa:
Fourth-Order Exhaustive Epistasis Detection for the xPU Era. 27:1-27:10
- Junsong Wang, Xiaofan Zhang, Yubo Li, Yonghua Lin:
Exploring HW/SW Co-Optimizations for Accelerating Large-scale Texture Identification on Distributed GPUs. 28:1-28:10
Performance Modeling and Evaluation
- Guillem López-Paradís, Adrià Armejach, Miquel Moretó:
gem5 + rtl: A Framework to Enable RTL Models Inside a Full-System Simulator. 29:1-29:11
- Abdullah Alperen, Md. Afibuzzaman, Fazlay Rabbi, M. Yusuf Özkaya, Ümit V. Çatalyürek, Hasan Metin Aktulga:
An Evaluation of Task-Parallel Frameworks for Sparse Solvers on Multicore and Manycore CPU Architectures. 30:1-30:11
- Alexandre Denis, Emmanuel Jeannot, Philippe Swartvagher:
Interferences between Communications and Computations in Distributed HPC Systems. 31:1-31:11
- Junyao Yang, Yuchen Wang, Zhenlin Wang:
Efficient Modeling of Random Sampling-Based LRU. 32:1-32:11
Parallelization and Code Generation
- Qiang Fu, H. Howie Huang:
Automatic Generation of High-Performance Inference Kernels for Graph Neural Networks on Multi-Core Systems. 33:1-33:11
- Mingzhen Li, Yi Liu, Hailong Yang, Yongmin Hu, Qingxiao Sun, Bangduo Chen, Xin You, Xiaoyan Liu, Zhongzhi Luan, Depei Qian:
Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors. 34:1-34:12
- Jan-Patrick Lehr, Christian H. Bischof, Florian Dewald, Heiko Mantel, Mohammad Norouzi, Felix Wolf:
Tool-Supported Mini-App Extraction to Facilitate Program Analysis and Parallelization. 35:1-35:10
- Hannah Cartier, James Dinan, D. Brian Larkins:
Optimizing Work Stealing Communication with Structured Atomic Operations. 36:1-36:10
Applications with Machine Learning
- Haoyu Wang, Haiying Shen, Jiechao Gao, Kevin Zheng, Xiaoying Li:
Multi-Agent Reinforcement Learning based Distributed Renewable Energy Matching for Datacenters. 37:1-37:10
- Sijiang Fan, Jiawei Fei, Xiaowei Guo, Canqun Yang, Alistair Revell:
CNN+LSTM Accelerated Turbulent Flow Simulation with Link-Wise Artificial Compressibility Method. 38:1-38:10
- Garvit Goel, Atharva Gondhalekar, Jingyuan Qi, Zhicheng Zhang, Guohua Cao, Wu Feng:
ComputeCOVID19+: Accelerating COVID-19 Diagnosis and Monitoring via High-Performance Deep Learning on CT Images. 39:1-39:11
- Aymen Al Saadi, Dario Alfè, Yadu N. Babuji, Agastya Bhati, Ben Blaiszik, Alexander Brace, Thomas S. Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Peter V. Coveney, Ian T. Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Dieter Kranzlmüller, Thorsten Kurth, Hyungro Lee, Zhuozhao Li, Heng Ma, Gerald Mathias, André Merzky, Alexander Partin, Arvind Ramanathan, Ashka Shah, Abraham C. Stern, Rick Stevens, Li Tan, Mikhail Titov, Anda Trifan, Aristeidis Tsaris, Matteo Turilli, Huub J. J. Van Dam, Shunzhou Wan, David Wifling, Junqi Yin:
IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads. 40:1-40:12
Graph Computing
- Ruiqi Tang, Ziyi Zhao, Kailun Wang, Xiaoli Gong, Jin Zhang, Wenwen Wang, Pen-Chung Yew:
Ascetic: Enhancing Cross-Iterations Data Efficiency in Out-of-Memory Graph Processing on GPUs. 41:1-41:10
- Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck:
Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing. 42:1-42:10
- Huashan Yu, Xiaolin Wang, Yingwei Luo:
An Edge-Fencing Strategy for Optimizing SSSP Computations on Large-Scale Graphs. 43:1-43:11
- Lin Zhu, Qiang-Sheng Hua, Hai Jin:
Communication Avoiding All-Pairs Shortest Paths Algorithm for Sparse Graphs. 44:1-44:10
Storage Software and Optimizations
- Qiliang Li, Min Lyu, Liangliang Xu, Yinlong Xu, Wei Wang:
Fast Reconstruction for Large Disk Enclosures Based on RAID2.0. 45:1-45:10
- Jun Li, Minjun Li, Zhigang Cai, François Trahay, Mohamed Wahib, Balazs Gerofi, Zhiming Liu, Min Huang, Jianwei Liao:
Intra-page Cache Update in SLC-mode with Partial Programming in High Density SSDs. 46:1-46:10
- Junhao Zhu, Kaixin Huang, Xiaomin Zou, Chenglong Huang, Nuo Xu, Liang Fang:
HDNH: a read-efficient and write-optimized hashing scheme for hybrid DRAM-NVM memory. 47:1-47:10
- Jing Hu, Jianxi Chen, Yifeng Zhu, Qing Yang, Zhouxuan Peng, Ya Yu:
Parallel Multi-split Extendible Hashing for Persistent Memory. 48:1-48:10
Algorithms and Applications
- Zitong Li, Qiming Fang, Grey Ballard:
Parallel Tucker Decomposition with Numerically Accurate SVD. 49:1-49:11
- Nikita Mishin, Daniil Berezun, Alexander Tiskin:
Efficient Parallel Algorithms for String Comparison. 50:1-50:10
- Zhuoran Ji, Cho-Li Wang:
Accelerating DBSCAN Algorithm with AI Chips for Large Datasets. 51:1-51:11
- Runtian Ren, Xueyan Tang:
Generalized Skyline Interval Coloring and Dynamic Geometric Bin Packing Problems. 52:1-52:10
Linear Algebra Algorithms
- Chenhao Xie, Jieyang Chen, Jesun Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin J. Barker, Mark Raugas, Ang Li:
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures. 53:1-53:11
- Christoph Klein, Robert Strzodka:
Tridiagonal GPU Solver with Scaled Partial Pivoting at Maximum Bandwidth. 54:1-54:10
- Yuan Tang, Weiguo Gao:
Processor-Aware Cache-Oblivious Algorithms✱. 55:1-55:10
- Viviana Arrigoni, Filippo Maggioli, Annalisa Massini, Emanuele Rodolà:
Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose. 56:1-56:12
Data Analytics Systems and Runtime
- Yijie Shen, Jin Xiong, Dejun Jiang:
Using Vectorized Execution to Improve SQL Query Performance on Spark. 57:1-57:11
- Bowen Yu, Huanqi Cao, Tianyi Shan, Haojie Wang, Xiongchao Tang, Wenguang Chen:
Sparker: Efficient Reduction for More Scalable Machine Learning with Spark. 58:1-58:11
- Qianwen Ye, Wuji Liu, Chase Q. Wu:
NoStop: A Novel Configuration Optimization Scheme for Spark Streaming. 59:1-59:10
- Md. Muhib Khan, Weikuan Yu:
ROBOTune: High-Dimensional Configuration Tuning for Cluster-Based Data Analytics. 60:1-60:10
Applications and Performance
- Hanpei Wu, Tongliang Deng, Yanliang Zou, Shu Yin, Si Chen, Tao Xie:
ADA: An Application-Conscious Data Acquirer for Visual Molecular Dynamics. 61:1-61:9
- Kun Qiu, Harry Chang, Yang Hong, Wenjun Zhu, Xiang Wang, Baoqian Li:
Teddy: An Efficient SIMD-based Literal Matching Engine for Scalable Deep Packet Inspection. 62:1-62:11
- Jianda Wang, Yang Hu:
Enabling Efficient SIMD Acceleration for Virtual Radio Access Network. 63:1-63:10
- Marquita Ellis, Aydin Buluç, Katherine A. Yelick:
Scaling Generalized N-Body Problems, A Case Study from Genomics. 64:1-64:9
Networking and Routing
- Yang Shi, Mei Wen:
sRouting: Towards a Better Flow Size Estimation Performance through Routing and Sketch Configuration. 65:1-65:11
- Yiran Zhang, Kun Qian, Fengyuan Ren:
Receiver-Driven Congestion Control for InfiniBand. 66:1-66:10
- En Wang, Dongming Luan, Yongjian Yang, Zihe Wang, Pengmin Dong, Dawei Li, Wenbin Liu, Jie Wu:
Distributed Game-Theoretical Route Navigation for Vehicular Crowdsensing. 67:1-67:11
- Sen Liu, Xiang Lin, Zehua Guo, Yi Wang, Mohamed Adel Serhani, Yang Xu:
Optimizing Flow Completion Time via Adaptive Buffer Management in Data Center Networks. 68:1-68:10
Machine Learning and Acceleration
- Zhenwei Zhang, Qiang Qi, Ruitao Shang, Li Chen, Fei Xu:
Prophet: Speeding up Distributed DNN Training with Predictable Communication Scheduling. 69:1-69:11
- Dongsheng Li, Dan Huang, Zhiguang Chen, Yutong Lu:
Optimizing Massively Parallel Winograd Convolution on ARM Processor. 70:1-70:12
- Xiangyu Ye, Zhiquan Lai, Shengwei Li, Lei Cai, Ding Sun, Linbo Qiao, Dongsheng Li:
Hippie: A Data-Paralleled Pipeline Approach to Improve Memory-Efficiency and Scalability for Large DNN Training. 71:1-71:10
- Hao Lan, Li Chen, Baochun Li:
Accelerated Device Placement Optimization with Contrastive Learning. 72:1-72:10
Data Structures and Applications
- Haosen Wen, Wentao Cai, Mingzhe Du, Louis Jenkins, Benjamin Valpey, Michael L. Scott:
A Fast, General System for Buffered Persistent Data Structures. 73:1-73:11
- Zhengming Yi, Yiping Yao, Kai Chen:
A Universal Construction to implement Concurrent Data Structure for NUMA-muticore. 74:1-74:11
- Hui Zeng, Tongqing Zhou, Yeting Guo, Zhiping Cai, Fang Liu:
FedCav: Contribution-aware Model Aggregation on Distributed Heterogeneous Data in Federated Learning. 75:1-75:10
- Yizhi Huang, Yanlong Yin, Yan Liu, Shuibing He, Yang Bai, Renfa Li:
A Novel Multi-CPU/GPU Collaborative Computing Framework for SGD-based Matrix Factorization. 76:1-76:12
Performance Optimization
- Xiang Fei, Youhui Zhang:
Regu2D: Accelerating Vectorization of SpMV on Intel Processors through 2D-partitioning and Regular Arrangement. 77:1-77:11
- Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura:
Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme. 78:1-78:11
- Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao:
CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation. 79:1-79:10
- Shaoshuai Zhang, Panruo Wu:
Recursion Brings Speedup to Out-of-Core TensorCore-based Linear Algebra Algorithms: A Case Study of Classic Gram-Schmidt QR Factorization. 80:1-80:11
Machine Learning Algorithms
- Guangli Li, Zhen Jia, Xiaobing Feng, Yida Wang:
LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs. 81:1-81:11
- Liang Gao, Li Li, Yingwen Chen, Wenli Zheng, Chengzhong Xu, Ming Xu:
FIFL: A Fair Incentive Mechanism for Federated Learning. 82:1-82:10
- Shulai Zhang, Zirui Li, Quan Chen, Wenli Zheng, Jingwen Leng, Minyi Guo:
Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection. 83:1-83:10
- Junhong Liu, Dongxu Yang, Junjie Lai:
Optimizing Winograd-Based Convolution with Tensor Cores. 84:1-84:10
Virtualization and Stream Processing
- Yuewen Wu, Heng Wu, Yuanjia Xu, Yi Hu, Wenbo Zhang, Hua Zhong, Tao Huang:
Best VM Selection for Big Data Applications across Multiple Frameworks by Transfer Learning. 85:1-85:11
- Lulu Yao, Yongkun Li, Jiawei Li, Weijie Wu, Yinlong Xu:
Progressive Memory Adjustment with Performance Guarantee in Virtualized Systems. 86:1-86:11
- Stijn Schildermans, Kris Aerts, Jianchen Shan, Xiaoning Ding:
Paratick: Reducing Timer Overhead in Virtual Machines. 87:1-87:10
- Huiyao Mei, Hanhua Chen, Hai Jin, Qiang-Sheng Hua, Bing Bing Zhou:
Efficient Complete Event Trend Detection over High-Velocity Streams. 88:1-88:12