Abstract:
The advances in the transistor technology have resulted in the fabrication of
tens of processing units or cores on a single chip. These multi-processor chips
are widely known as Many-Cores and have largely replaced the single-core
processors in computing devices. However, the underlying concentration of
cores on a single chip makes the heat-dissipation a bottle-neck in gaining the
required performance. The excessive heating of chip leads to reliability issues
and in worst cases may even result in physical damage to the chip.
Several techniques for moderating heating issues have been proposed at
both the hardware and software levels. Techniques like Dynamic Voltage
and Frequency Scaling (DVFS) handle thermal issues at the expense of per
formance by tuning the operating voltage and frequency of the cores. An
alternate solution is to carry out the Dynamic Thermal Management (DTM)
by means of task migration among the cores, resulting in a lesser perfor
mance degradation as compared to DVFS techniques. DTM techniques can
be classi ed as central (c-) or distributed (d-), based on a single or multiple
DTM controllers employed on a many-core chip.
Traditionally, DTM techniques are analyzed using simulation or emula
tion methods. However, the growing number of cores, the varied nature of
DTM techniques and the inherent sampling-based nature of traditional analy
sis techniques, like simulation and emulation, makes a complete and rigorous
viii
analysis of these DTM techniques almost impossible. Also these analysis
technique cannot provide a comprehensive comparison between DTM tech
niques owing to the wide range of corresponding DTM design parameters.
These limitations compromise the analysis integrity and may lead to the de
ployment of an ine cient and inaccurate DTM scheme on chip, which in turn
can cause permanent defects in the chip due to extreme temperatures.
In this thesis, we utilize model checking, i.e., a rigorous formal veri cation
technique, to overcome the above mentioned limitations. In particular, the
thesis proposes a modeling, veri cation, and analysis methodology for DTM
techniques and identi es a set of generic properties for the formal veri ca
tion of task migration based DTM schemes, i.e., central and distributed. We
have developed the formal models for i) various aspects of many-core chips
like, power, thermal, task queues, and inter-core communication, ii) DTM
parameters like task migration policy and migration criterion, iii) di erent
task loads, i.e., single- and multi-threaded. The thesis also formalizes veri
cation properties that are applicable to both central and distributed DTM
schemes. The work also suggests a common ground for comparison of DTM
schemes by evaluating parameters like tasks migrations, stalls, completion,
creation of hot spots and time to achieve stability using the nuXmv model
checker.
To show the rigor and e cacy of our work, we apply our proposed analy
sis methodology to a variety of task migration based central and distributed
DTM schemes. The selected DTM schemes are veri ed against the formal
ized DTM properties, for a range of many-core system sizes, DTM parame
ters and di erent application pro les i.e., multi-threaded, malleable, and the
applications which do not support migration.