Fine-grain multithreading with the EM-X multiprocessor

A Sohn, Y Kodama, J Ku, M Sato, H Sakane, H Yamana, S Sakai, Y Yamaguchi
Proceedings of the ninth annual ACM symposium on parallel algorithms and …, 1997 - dl.acm.org
Abstract
Multithreading aims to tolerate latency by overlapping communication with computation. This report explicates the multithreading capabilities of the EM-X distributed-memory multiprocessor through empirical studies. The EM-X provides hardware support for fine-grain multithreading, including a by-passing mechanism for direct remote reads and writes, hardware FIFO thread scheduling, and dedicated instructions for generating fixed-sized communication packets. Bitonic sorting and Fast Fourier Transform (FFT) are selected for the experiments. Parameters that characterize the performance of multithreading are investigated, including the number of threads, the number of thread switches, the run length, and the number of remote reads. Experimental results indicate that the best communication performance occurs when the number of threads is two to four. FFT yielded over 95% overlapping due to a large amount of computation and communication parallelism across threads. Even in the absence of thread computation parallelism, multithreading helps overlap over 35% of the communication time for bitonic sorting.
ACM Digital Library