24th ICS 2010: Tsukuba, Ibaraki, Japan
- Taisuke Boku, Hiroshi Nakashima, Avi Mendelson:
Proceedings of the 24th International Conference on Supercomputing, 2010, Tsukuba, Ibaraki, Japan, June 2-4, 2010. ACM 2010, ISBN 978-1-4503-0018-6
Keynotes
- Stephen S. Pawlowski:
Exascale science: the next frontier in high performance computing. 1
- William J. Dally:
Throughput computing. 2
- Kimihiko Hirao:
The next-generation supercomputer project and a plan for the advanced institute for computational science. 3
MPI
- Vladimir Marjanovic, Jesús Labarta, Eduard Ayguadé, Mateo Valero:
Overlapping communication and computation by using a hybrid MPI/SMPSs approach. 5-16
- Sreeram Potluri, Ping Lai, Karen A. Tomko, Sayantan Sur, Yifeng Cui, Mahidhar Tatineni, Karl W. Schulz, William L. Barth, Amitava Majumdar, Dhabaleswar K. Panda:
Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application. 17-25
- Nikhil Jain, Yogish Sabharwal:
Optimal bucket algorithms for large MPI collectives on torus interconnects. 27-36
Cache and transaction memory
- Javier Lira, Carlos Molina, Antonio González:
The auction: optimizing banks usage in Non-Uniform Cache Architectures. 37-47
- Robert Strzodka, Mohammed Shaheen, Dawid Pajak, Hans-Peter Seidel:
Cache oblivious parallelograms in iterative stencil computations. 49-59
- Woongki Baek, Nathan Grasso Bronson, Christos Kozyrakis, Kunle Olukotun:
Making nested parallel transactions practical using lightweight hardware support. 61-71
Applications (1)
- Atabak Mahram, Martin C. Herbordt:
Fast and accurate NCBI BLASTP: acceleration with multiphase FPGA-based prefiltering. 73-82
- Narges Bani Asadi, Christopher W. Fletcher, Greg Gibeling, John Wawrzynek, Wing H. Wong, Garry P. Nolan:
ParaLearn: a massively parallel, scalable system for learning interaction networks on FPGAs. 83-94
- Michael D. Linderman, Robert V. Bruggner, Vivek Athalye, Teresa H. Meng, Narges Bani Asadi, Garry P. Nolan:
High-throughput Bayesian network learning using heterogeneous multicore computers. 95-104
- Chi Ching Chi, Ben H. H. Juurlink, Cor Meenderinck:
Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine. 105-114
GPGPU and accelerators (1)
- Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen:
Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping. 115-126
- Allen D. Malony, Scott Biersdorff, Wyatt Spear, Shangkar Mayanglambam:
An experimental approach to performance measurement of heterogeneous parallel applications using CUDA. 127-136
- Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal:
Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. 137-146
Architecture
- Ramon Bertran, Marc González, Xavier Martorell, Nacho Navarro, Eduard Ayguadé:
Decomposable and responsive power models for multicore processors using performance counters. 147-158
- Lixin Zhang, Evan Speight, Ramakrishnan Rajamony, Jiang Lin:
Enigma: architectural and operating system support for reducing the impact of address translation. 159-168
- Huaiyu Zhu, Yong Chen, Xian-He Sun:
Timing local streams: improving timeliness in data prefetching. 169-178
- Chunyang Gou, Georgi Kuzmanov, Georgi Gaydadjiev:
SAMS multi-layout memory: providing multiple views of data to boost SIMD performance. 179-188
System and IO issues
- Major Bhadauria, Sally A. McKee:
An approach to resource-aware co-scheduling for CMPs. 189-199
- Adam J. Oliner, Alex Aiken:
A query language for understanding component interactions in production systems. 201-210
- Ramya Prabhakar, Shekhar Srikantaiah, Mahmut T. Kandemir, Christina M. Patrick:
Adaptive multi-level cache allocation in distributed storage architectures. 211-221
- Xuechen Zhang, Song Jiang:
InterferenceRemoval: removing interference of disk access for MPI programs through data replication. 223-232
Applications (2)
- Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, Yifei Ma, Madhav V. Marathe:
Indemics: an interactive data intensive framework for high performance epidemic simulation. 233-242
- Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler, Daniel A. Reed:
Clustering performance data efficiently at massive scales. 243-252
- Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul F. Fischer, Paul D. Hovland:
Speeding up Nek5000 with autotuning and specialization. 253-262
Compilers
- Josep M. Pérez, Rosa M. Badia, Jesús Labarta:
Handling task dependencies under strided and aliased references. 263-274
- Harmen L. A. van der Spek, C. W. Mattias Holm, Harry A. G. Wijshoff:
How to unleash array optimizations on code using recursive data structures. 275-284
- Lixia Liu, Zhiyuan Li:
A compiler-automated array compression scheme for optimizing memory intensive programs. 285-294
- Arun Chauhan, Chun-Yu Shei:
Static reuse distances for locality-based optimizations in MATLAB. 295-304
GPGPU and accelerators (2)
- Liang Gu, Xiaoming Li, Jakob Siegel:
An empirically tuned 2D and 3D FFT library on CUDA GPU. 305-314
- Yifeng Chen, Xiang Cui, Hong Mei:
Large-scale FFT on GPU clusters. 315-324
- Yong Dou, Yuanwu Lei, Guiming Wu, Song Guo, Jie Zhou, Li Shen:
FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing. 325-336
- Jamin Naghmouchi, Daniele Paolo Scarpazza, Mladen Berekovic:
Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization. 337-348