CoolSim: Statistical techniques to replace cache warming with efficient, virtualized profiling
2016 International Conference on Embedded Computer Systems …, 2016•ieeexplore.ieee.org
Simulation is an important part of the evaluation of next-generation computing systems. Detailed, cycle-accurate simulation, however, can be very slow when evaluating realistic workloads on modern microarchitectures. Sampled simulation (e.g., SMARTS and SimPoint) improves simulation performance by an order of magnitude or more by reducing large workloads to a small but representative sample. Additionally, the execution state just prior to each simulation sample can be stored in checkpoints, allowing for fast restoration and evaluation. Unfortunately, changes to the software, the architecture, or fundamental pieces of the microarchitecture (e.g., in hardware-software co-design) require the checkpoints to be regenerated. For co-design, the end result degenerates to creating new checkpoints for every modification, the very task checkpointing was designed to eliminate. A solution is therefore needed that allows fast and accurate simulation without checkpoints. Virtualized fast-forwarding (VFF), an alternative to checkpoints, allows execution at near-native speed between simulation points. Warming the microarchitectural state before each simulation point, however, requires functional simulation, a costly operation for large caches (e.g., 8 MB). Simulating future systems with caches of many megabytes can require warming over billions of instructions, which dominates simulation time. This paper proposes CoolSim, an efficient simulation framework that eliminates cache warming. CoolSim uses VFF to advance between simulation points while simultaneously collecting sparse memory reuse information (MRI). The MRI is collected more than an order of magnitude faster than functional simulation. At each simulation point, detailed simulation with a statistical cache model is used to evaluate the design, and the previously acquired MRI is used to estimate whether each memory request hits in the cache.
The MRI is an architecture-independent metric, and a single profile can be used to simulate caches of any size. We describe a prototype implementation of CoolSim based on KVM and gem5 that runs 19× faster than state-of-the-art sampled simulation while estimating the CPI of the SPEC CPU2006 benchmarks with an average error of 3.62% across a wide range of cache sizes.
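The abstract does not spell out the statistical cache model, so the following is only an illustrative sketch of how sparse reuse information can stand in for a warmed cache. It assumes a StatCache-style model (random replacement, reuse distance counted in memory accesses between two accesses to the same cache line, and "no earlier access" treated as a cold miss); it is not necessarily the model CoolSim uses. The function names and the sample MRI values are hypothetical. The sketch solves for the miss ratio R that satisfies R = mean over samples of 1 - (1 - 1/L)^(r·R) for a cache of L lines, and it reuses the same profile for several cache sizes to illustrate why one architecture-independent profile suffices.

```python
from typing import Optional, Sequence


def line_miss_probability(reuse: Optional[int], miss_ratio: float, num_lines: int) -> float:
    """Per-request miss estimate under an assumed StatCache-style model:
    random replacement in a cache of num_lines lines."""
    if reuse is None:  # no earlier access to this line was observed: cold miss
        return 1.0
    survive_one_miss = 1.0 - 1.0 / num_lines
    # Roughly reuse * miss_ratio misses occur between the two accesses,
    # and each one evicts this line with probability 1 / num_lines.
    return 1.0 - survive_one_miss ** (reuse * miss_ratio)


def estimate_miss_ratio(samples: Sequence[Optional[int]], num_lines: int,
                        tol: float = 1e-9, max_iter: int = 10_000) -> float:
    """Solve R = mean_i p(r_i, R) by fixed-point iteration from R = 1.
    The right-hand side is increasing in R, so the iterates decrease
    monotonically toward the largest (non-trivial) solution."""
    if not samples:
        raise ValueError("need at least one reuse sample")
    ratio = 1.0
    for _ in range(max_iter):
        new_ratio = sum(line_miss_probability(r, ratio, num_lines) for r in samples) / len(samples)
        if abs(new_ratio - ratio) < tol:
            return new_ratio
        ratio = new_ratio
    return ratio


# Hypothetical sparse MRI: sampled reuse distances; None marks a cold access.
mri = [3, 10, 40, 150, 900, 5_000, 20_000, 100_000, None, 64]
LINE_BYTES = 64  # assumed cache-line size

# The same profile is reused for every cache size; no re-profiling is needed.
for cache_bytes in (256 * 1024, 2 * 1024**2, 8 * 1024**2):
    lines = cache_bytes // LINE_BYTES
    print(f"{cache_bytes // 1024:>5} KiB: estimated miss ratio {estimate_miss_ratio(mri, lines):.3f}")
```

In a detailed simulator, `line_miss_probability` could be evaluated for each memory request (for example, by drawing a Bernoulli sample) to decide hit or miss, which mirrors the abstract's per-request estimate without modeling cache contents. Fixed-point iteration from R = 1 was chosen over a generic root finder because R = 0 is always a trivial solution when there are no cold misses.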