Description:
For decades, high-performance computing systems have focused on maximizing performance at any cost. One consequence of this devotion to performance is a sharp increase in power consumption: the most powerful supercomputers draw up to 10 megawatts at peak, enough to sustain a city of 40,000. Yet much of that power can be wasted with little or no performance gain, because applications do not demand peak performance all the time. Improving power and performance efficiency has therefore become a primary concern in parallel and distributed computing. Our goal is to build a runtime system that understands power-performance tradeoffs and adaptively balances power consumption against performance loss.
In this thesis, we make the following contributions.

First, we develop an MPI runtime system that dynamically balances power-performance tradeoffs in MPI applications. The system identifies power-saving opportunities at runtime, without prior knowledge of system behavior, and selects the p-state that best improves power and performance efficiency. It is entirely transparent to MPI applications and requires no user intervention.

Second, we develop a method for determining the minimum energy consumption achievable on voltage- and frequency-scaling systems under a given time delay. This bound makes it possible to evaluate how well a specific DVFS algorithm balances power and performance.

Third, we develop a power prediction model that correlates power and performance data on a chip multiprocessor. The model shows that power consumption can be estimated from hardware performance counters with reasonable accuracy across a variety of execution environments. With such a model, a runtime system can balance power-performance tradeoffs on a chip multiprocessor without waiting for actual power measurements.

Last, we develop an algorithm that saves power by dynamically migrating virtual machines and consolidating them onto fewer physical machines according to workload. Our scheme uses a two-level, adaptive buffering scheme that reserves processing capacity, adapting the buffer sizes to the workload so as to balance performance violations against energy savings and to reduce the energy wasted on the buffers. A simulation framework quantifies the algorithm's energy benefits and performance effects, along with its sensitivity to various parameters.
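The idea behind the second contribution, finding a minimum-energy operating point under a delay bound, can be sketched as follows. This is a minimal illustration under assumed models, not the thesis's actual method: it assumes a cubic dynamic-power model (C·f³ plus static power) and a workload split into a frequency-sensitive part and a frequency-insensitive (e.g. memory-bound) part, then picks the p-state with the lowest energy whose runtime stays within the allowed slowdown.

```python
def best_pstate(freqs, cpu_work, mem_time, p_static, c, max_delay):
    """Pick the frequency minimizing energy subject to a delay bound.

    Illustrative assumptions (not from the thesis):
      freqs     -- available p-state frequencies (GHz)
      cpu_work  -- work that scales as 1/f (cycle-equivalent units)
      mem_time  -- runtime component unaffected by frequency (s)
      p_static  -- static power (W); c -- dynamic power coefficient
      max_delay -- allowed slowdown factor vs. the fastest frequency
    Returns (freq, energy) or None if no p-state meets the bound.
    """
    f_max = max(freqs)
    t_base = cpu_work / f_max + mem_time  # runtime at full speed
    best = None
    for f in freqs:
        t = cpu_work / f + mem_time
        if t > max_delay * t_base:
            continue  # this p-state violates the delay constraint
        energy = (c * f ** 3 + p_static) * t  # power x time
        if best is None or energy < best[1]:
            best = (f, energy)
    return best
```

With this toy model, a lower frequency wins whenever the cubic drop in dynamic power outweighs the longer runtime spent burning static power, which is exactly the tradeoff a DVFS analysis has to capture.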
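The third contribution, estimating power from hardware performance counters, can be illustrated with a simple linear fit. The single-counter ordinary-least-squares model below is a hypothetical stand-in: the thesis's actual model, and which counters it uses, are not specified here.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a*x + b; returns (a, b).

    xs -- counter rates (e.g. instructions retired per second; assumed)
    ys -- measured power samples in watts for the same intervals
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x

def predict_power(model, counter_rate):
    """Estimate power (W) for a new counter reading using the fitted model."""
    a, b = model
    return a * counter_rate + b
```

Once such a model is calibrated offline, a runtime system can substitute a cheap counter read for an actual power measurement when deciding how to balance power against performance.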