A dynamically adaptive approach for speculative loop execution in SMT architectures
M Li, Y Zhao - 2014 IEEE Intl Conf on High Performance Computing and …, 2014 - ieeexplore.ieee.org
Simultaneous multithreading (SMT) allows thread-level speculation to be exploited on a single processor. Because speculative threads contend for shared processor resources, their performance often suffers from inter-thread interference, which is hard for the compiler to estimate statically. We therefore propose an approach that defers the selection and extraction of speculative threads from parallel regions until runtime. It relies on a cycle counter architecture to collect performance profiles of each parallelized loop and uncover the potential of loop-level parallelism. These profiles are derived from a prediction of the relative single-threaded execution time for speculative threads, based on a breakdown of thread execution cycles. The prediction is used to evaluate the performance of different loop levels dynamically, and only the best loop level is chosen for parallelization. Several performance tuning policies are also examined. The best policy achieves an average speedup of 1.45 on SPEC CPU2000 benchmarks and outperforms static loop selection by 33%.
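The loop-level selection described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' mechanism: the `LoopProfile` fields, the use of summed useful cycles as the single-threaded time prediction, and the `threshold` parameter are all hypothetical, since the abstract does not specify the cycle breakdown categories or the tuning policies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoopProfile:
    """Cycle counts for one loop level, collected while it ran speculatively.

    Hypothetical fields: the abstract does not list the breakdown categories.
    """
    name: str
    total_cycles: int   # elapsed cycles of the speculative (multithreaded) run, > 0
    useful_cycles: int  # cycles of committed work, summed over all threads


def predicted_sequential_cycles(p: LoopProfile) -> float:
    # Assumption: committed work summed across threads approximates
    # the time the loop would take on a single thread.
    return float(p.useful_cycles)


def estimated_speedup(p: LoopProfile) -> float:
    # Ratio of predicted single-threaded time to measured speculative time.
    return predicted_sequential_cycles(p) / p.total_cycles


def select_loop_level(nest: List[LoopProfile],
                      threshold: float = 1.0) -> Optional[LoopProfile]:
    """Pick the single best level of a loop nest, or None if no level
    is predicted to beat sequential execution."""
    best = max(nest, key=estimated_speedup, default=None)
    if best is None or estimated_speedup(best) <= threshold:
        return None
    return best


if __name__ == "__main__":
    nest = [
        LoopProfile("outer", total_cycles=1000, useful_cycles=1500),
        LoopProfile("inner", total_cycles=1200, useful_cycles=1100),
    ]
    chosen = select_loop_level(nest)
    print(chosen.name if chosen else "run sequentially")  # prints "outer"
```

In this sketch, a level whose estimated speedup does not exceed the threshold is left sequential, which mirrors the idea that interference can make speculation unprofitable; how the paper's policies actually make that decision is not stated in the abstract.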