Thermal distribution and reliability prediction for 3D networks-on-chip
VNU Journal of Science: Comp. Science & Com. Eng, Vol. 36, No. 1 (2020) 65-77
Original Article
Thermal Distribution and Reliability Prediction
for 3D Networks-on-Chip
Khanh N. Dang1,*, Akram Ben Ahmed2, Abderazek Ben Abdallah3, Xuan-Tu Tran1
1VNU University of Engineering and Technology, Vietnam National University, Hanoi,
144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
2National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, 305-8568, Japan
3University of Aizu, Aizu-Wakamatsu, Japan
Received 02 April 2020
Revised 02 June 2020; Accepted 06 June 2020
Abstract: As one of the most promising technologies to reduce footprint, power consumption and wire latency, Three-Dimensional Integrated Circuits (3D-ICs) are considered the near future of VLSI systems. Combining them with the Network-on-Chip infrastructure yields 3D Networks-on-Chip (3D-NoCs), a new on-chip communication paradigm that brings several advantages. However, thermal dissipation is one of the most critical challenges for 3D-ICs, where heat cannot easily transfer through several layers of silicon. Consequently, high-temperature areas also face a reliability threat, as the Mean Time to Failure (MTTF) decreases exponentially with the operating temperature, as in Black’s model. 3D-NoCs and 3D-ICs must therefore tackle this fundamental problem in order to be widely used. However, thermal analyses usually require complicated simulations and can take an enormous execution time. In a closed-loop design flow, designers may iterate several times to optimize their designs, which significantly increases the thermal analysis time. Furthermore, reliability prediction requires both a completed design and a thermal prediction, and designers can use the result as feedback for their optimization. Because of these two big gaps in the design flow, it is difficult to obtain both predictions, which puts 3D-NoCs under thermal throttling and reliability threats. Therefore, in this work, we investigate the thermal distribution and reliability prediction of 3D-NoCs. We first propose a new method to simulate the temperature (both steady-state and transient) using traffic values from realistic and synthetic benchmarks and the power consumption from a standard VLSI design flow. Then, based on the proposed method, we further predict the relative reliability between different parts of the network. Experimental results show that the method has an extremely fast execution time in comparison to the accelerated lifetime test. Furthermore, we compare the thermal behavior and reliability of Monolithic and TSV (Through-Silicon-Via) based designs. We also explore the use of a thermal via mechanism to help reduce the operating temperature.
Keywords: Thermal dissipation, Reliability, Through-Silicon-Via, 3D-ICs, 3D-NoCs.
_______
* Corresponding author.
E-mail address: khanh.n.dang@vnu.edu.vn
K.N. Dang et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 36, No. 1 (2020) 65-77
like to note that the activation energy of Copper is much higher than that of CMOS materials, which makes TSVs more vulnerable than normal gates. Since TSVs can act as cooling devices, a TSV-based NoC has a lower operating temperature than a Monolithic one; however, TSVs also have lower reliability. Therefore, the reliability differences between Monolithic and TSV-based 3D-ICs need to be investigated.
While the thermal behavior can be extracted by measuring a real chip, reliability cannot be directly measured. Most industrial approaches apply Equation 1, baking the chip at high temperature to accelerate failures [10-12].
In this work, we investigate the thermal dissipation difficulty of Network-on-Chip based 3D-ICs by proposing a method to predict the temperature and MTTF of each region of the targeted system. We first use commercial EDA tools to design the 3D-NoC router and analyze its power and energy per data bit. Then, we extract the number of bits and the operating time of synthetic and PARSEC benchmarks to obtain the average power consumption of each router inside the network. We then use a thermal emulation tool named Hotspot 6.0 to estimate the temperature of the system. By adopting Black’s model of reliability, the tool follows up with a reliability prediction of the system. By following the method, designers can quickly identify the potential hotspots inside the 3D-ICs and predict the regions made vulnerable by high operating temperatures. The results
also suggest the possible mapping of fluid
contribution of this work is as follows:
1. Introduction
3D Networks-on-Chip (3D-NoCs), as a
result of combining Networks-on-Chip (NoCs)
considered as one of the most promising
parallelism and scalability of the NoCs to 3D-
ICs, we even obtain lower power consumption,
shorter wire length while reducing the design
area cost by several times. Among several
3D-ICs, Through-Silicon-Via (TSV), which serves as an inter-layer wire, is one of the near-future technologies. Monolithic 3D-ICs are another method to implement 3D-ICs [4, 5]. With both technologies, we expect to have multiple layers in the system. To support communication within the system, 3D-NoCs offer a router-based infrastructure where the 3D mesh topology is used.
Despite several advantages, 3D-ICs and
3D-NoCs have to confront the thermal
dissipation issue. The temperature variation
between the two layers has been reported to
conducted an experiment with four layers and 48 cores, which gives a temperature variation of up to 10°C within a single layer. The main reason for the thermal dissipation difficulty in 3D-ICs is that the top layers act as obstacles that prevent heat from being dissipated by the heatsink. To solve this
[8] has been proposed.
With higher operating temperatures, 3D-NoCs can easily encounter thermal throttling. Moreover, in terms of reliability, there is an expected acceleration in the failure rate (or a reduction in Mean Time to Failure). For semiconductor devices, one of the most well-known models of the thermal impact on the fault rate acceleration $\pi_T$ is:
- A platform to model the power, temperature, and reliability of any NoC system. Here, we focus on 3D-NoCs, but the technique is general and can also be applied to traditional planar NoC systems.
- The reliability analyses of Monolithic and TSV-based NoCs. While TSV-based NoCs have a lower operating temperature, the TSV material (Copper) has lower reliability.
where $A$ is a constant, $J$ is the energy, $k_B$ is the Boltzmann constant, $E_a$ is the activation energy and $T$ is the temperature in Kelvin. Here, we would
- Exploration and comparison between
different layout strategies and cooling methods.
The remaining part of this paper is
organized as follows. Section 2 surveys the
existing works. Section 3 describes the
proposed method in detail. Experimental results
are discussed in Section 4. Finally, Section 5
concludes this work.
be obtained from its switching activity. By counting the number of flits that went through the router during simulation, one can estimate the dynamic power consumption. Meanwhile, the static power consumption is constant for the same configuration (voltage, frequency,
power consumption as dynamic and static power.
Physical parameters such as wire length and
leakage current are calculated to estimate the
estimate the power consumption of the system
based on the existing values. Other works in
[19][20] also consider dynamic voltage frequency
scaling in power consumption.
While these works can help estimate the power consumption of our system, we observe that their estimates are not the most accurate because of differences in design choices and libraries. Therefore, in this work, we propose our own power extraction method. We use EDA tools to estimate the dynamic and static power and then combine them with the switching activity of the routers in the benchmarks used.
2. Related Works
In this section, we summarize the literature related to our proposed method. We start with the power model and then present the work on thermal estimation. Finally, the reliability estimations for 3D-NoCs are presented.
2.1. Power Modeling for 3D Network-on-Chip
To measure the power consumption of a 3D-IC, the straightforward method is to fabricate the chip and set up a measuring system [16]. However, such a system is difficult to obtain: designing and fabricating the chip are expensive and time-consuming, and designers want to estimate the value before sending the design to production. Therefore, modeling the power consumption is a necessary step.
2.2. Thermal Behavior Prediction for 3D
Network-on-Chip
Once we obtain the power consumption of
modules within a system, we can estimate the
the earlier tools to help estimate the temperature
grid. The 6th version of HotSpot now can
estimate the temperature of 3D-ICs. There are
as Hotspot by using the finite element method,
3D-ICE focuses on the potential of liquid
layout strategies and liquid cooling for 3D-ICs.
To model the power of any digital IC system, two major parts, static and dynamic power, are considered as follows:

$$P = P_{dynamic} + P_{static} = \alpha f C_L V_{DD}^2 + I_{leak} V_{DD} \quad (2)$$

where $\alpha$ is the switching probability (or activity ratio), $f$ is the clock frequency, $C_L$ is the load capacitance, $I_{leak}$ is the leakage current and $V_{DD}$ is the supply voltage. Based on Equation 2, common EDA tools can estimate the power consumption from the library parameters and the switching activity. In fact, power estimation tools such as PrimeTime require the switching activity to obtain the most accurate result.
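As a small illustration of this two-part model, the sketch below evaluates the textbook CMOS form: dynamic power as activity × frequency × load capacitance × supply voltage squared, and static power as leakage current × supply voltage. The function name and the numeric values are hypothetical and serve only to exercise the formula.

```python
def cmos_power(alpha, freq_hz, c_load_f, i_leak_a, vdd_v):
    """Return (dynamic, static, total) power in watts.

    Assumed textbook model, not taken from a specific EDA tool:
      dynamic = alpha * f * C_L * V_DD^2
      static  = I_leak * V_DD
    """
    dynamic = alpha * freq_hz * c_load_f * vdd_v ** 2
    static = i_leak_a * vdd_v
    return dynamic, static, dynamic + static


if __name__ == "__main__":
    # Hypothetical operating point: 500 MHz, 1.1 V, 1 pF switched load.
    dyn, sta, total = cmos_power(
        alpha=0.2, freq_hz=500e6, c_load_f=1e-12, i_leak_a=1e-6, vdd_v=1.1
    )
    print(f"dynamic={dyn:.3e} W, static={sta:.3e} W, total={total:.3e} W")
```

The split into two return values mirrors how the flow later treats static power as a constant and dynamic power as traffic dependent.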
Equation 2 can be used to estimate the power consumption of any circuit; however, for a fast prediction, the power consumption of NoCs can
2.3. Reliability Prediction for 3D Network-on-Chip
Given the temperature of the system, we can now estimate the potential reliability. As we have previously mentioned, Black’s
models for CMOS designs. MIL-HDBK-217F
model of reliability acceleration related to
models to estimate the reliability of the system. Among these models, HRD4 considers the reliability to be the same for chips below 70°C. The rest of the models follow an exponential acceleration with operating temperature (in Kelvin).
activation energy also varies among materials. The output of the reliability prediction can also affect the redundancy mapping as a closed loop. Consequently, designers can further optimize the system to reach the best balance of temperature, reliability, and area overhead. In the following part, we explain each part of the proposed method in detail.
On the other hand, industrial approaches to reliability prediction [10-12] bake the chip at high temperature and measure the average time to failure of the samples. By using Black’s model, they can then estimate the potential lifetime under normal temperature.
3. Proposed Method
Figure 1 shows the proposed method for the thermal and reliability prediction of 3D-NoCs. We first build the Verilog HDL model of the 3D-NoC. Then, synthesis and place & route follow to obtain the layout, netlist file, wire lengths, and physical parameters.
We then perform post-layout simulation and use Synopsys PrimeTime to extract the power consumption of the system. Based on the number of data bits, we further extract the energy per data bit. We can then estimate the power consumption of all benchmarks by multiplying the obtained value by the number of bits per router per unit time. The power consumption of each router is fed to the temperature estimation tool (Hotspot 6.0) to obtain the temperature map. At the end of this step, we obtain the temperature maps of all benchmarks.
One notable feature of 3D-NoCs is the possibility of having redundant Through-Silicon-Vias (TSVs). TSVs are usually made of Copper and are larger than normal wires, so they can dissipate heat faster than normal silicon. Monolithic 3D-ICs lack this feature since their vias are extremely small. Consequently, we take the redundancy mapping into account in the hotspot prediction.
Figure 1. Thermal and reliability prediction method
of 3D Networks-on-Chip.
We would like to note that our method
reuses and follows the principle of existing
works in academic and industrial approaches
[10-12, 22-24].
3.1. Design of 3D Network-on-Chip
Here, we adopt our previous work in [3] with some modifications, where the TSVs of a router are divided into four groups and placed in the four directions (west, east, north, south) of the router to support sharing and fault tolerance. However, we provide more flexibility in the design here since fault tolerance is not the objective of this work. Figure 4 shows the architecture of our 3×3×3 Network-on-Chip. Each router can connect to at most six neighboring routers in six directions, plus one local connection to its attached processing element. The inter-layer connections are TSVs, and we optionally support a redundant TSV group (yellow TSVs) which can be used to repair a faulty group in the router. Borrowing and sharing mechanisms are other features
Once we can predict the temperature, we
can obtain the reliability prediction using the
Black’s model in Equation 1. Note that the
we support to achieve high reliability in our system. More details on the fault tolerance method can be found in our previous work [3].
Each router receives the header flit of a packet and performs routing inside the network. Based on the destination, it forwards the header flit and the following flits (body and tail flits) to the desired port. Once the tail flit completes its transmission, the router starts to route a new packet.
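The header-driven forwarding described above can be sketched as follows. This is a minimal illustration, not the router's actual RTL: the dimension-order (X, then Y, then Z) routing function, the port names `up`, `down` and `local`, and the `InputPort` class are assumptions introduced here for illustration.

```python
from typing import Optional, Tuple

Coord = Tuple[int, int, int]


def xyz_route(cur: Coord, dst: Coord) -> str:
    """Pick an output port with dimension-order routing (an assumed policy)."""
    (cx, cy, cz), (dx, dy, dz) = cur, dst
    if dx != cx:
        return "east" if dx > cx else "west"
    if dy != cy:
        return "north" if dy > cy else "south"
    if dz != cz:
        return "up" if dz > cz else "down"
    return "local"


class InputPort:
    """Tracks the output port reserved by the packet currently in flight."""

    def __init__(self, position: Coord):
        self.position = position
        self.output: Optional[str] = None

    def accept(self, flit_type: str, dst: Optional[Coord] = None) -> str:
        # The header flit decides the route; body and tail flits follow it.
        if flit_type == "head":
            self.output = xyz_route(self.position, dst)
        if self.output is None:
            raise ValueError("body/tail flit arrived before a header flit")
        out = self.output
        # The tail flit releases the route so a new packet can be routed.
        if flit_type == "tail":
            self.output = None
        return out


if __name__ == "__main__":
    port = InputPort((0, 0, 0))
    for kind in ("head", "body", "tail"):
        print(kind, "->", port.accept(kind, dst=(2, 1, 0)))
```

Counting the flits returned per output port in such a model is one way to obtain the per-router traffic that the power estimation relies on.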
module. Since routers are usually hotspots inside the system, placing a router near a hot area can raise its temperature significantly. Here, by surrounding the router with TSVs, we isolate it. Furthermore, Copper has a low thermal resistivity, so it can conduct the heat from the router to the upper layers. By doing so, we can transfer the heat to the top layer and the heatsink. In the evaluation section, we discuss the efficiency and cost of inserting thermal vias in our design.
Figure 3 shows the difference between Monolithic and TSV-based 3D-ICs. TSVs are made of Copper, which dissipates heat faster than the Silicon layers. However, stacking with TSVs requires bonding layers between the dies, which thermally isolate the layers from each other.
Figure 2. Layout option for 3D-NoC router:
(c) Surround TSV region.
3.2. EDA tools and Power Extraction
The next part of the method is to use EDA tools to extract the power consumption. In principle, any suitable EDA tool can be used to obtain the power consumption. For our experiment, we use Synopsys Design Compiler, ICC and PrimeTime to do the physical design and extract the power consumption.
To extract the power, we perform a heuristic transmission benchmark on a single router. Here, we generate two packets of ten flits in all possible directions. Because our router can return a flit through the port it arrived on, we have 7×7=49 possible directions. By using PrimeTime, we can obtain the dynamic and static power.
Here, we also classify the energy into static and dynamic. Since the static power consumption is stable, we keep its value as it is. For the dynamic power, we calculate the total energy and the energy per data bit.
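The extraction step above can be sketched in a few lines. The port names are hypothetical; the packet count, flit count and 32-bit flit width follow the benchmark described in Section 4.1, and the dynamic power and simulation time in the usage example are the Table 2 values.

```python
from itertools import product

# Six neighbor directions plus the local port (names are illustrative).
PORTS = ["east", "west", "north", "south", "up", "down", "local"]
PACKETS_PER_PAIR = 2
FLITS_PER_PACKET = 10
BITS_PER_FLIT = 32


def benchmark_bits() -> int:
    """Total data bits when every port sends to every port, itself included."""
    pairs = list(product(PORTS, repeat=2))  # 7 x 7 = 49 direction pairs
    return len(pairs) * PACKETS_PER_PAIR * FLITS_PER_PACKET * BITS_PER_FLIT


def energy_per_bit(dynamic_power_w: float, sim_time_s: float, n_bits: int) -> float:
    """Dynamic energy (power x time) divided by the number of data bits."""
    return dynamic_power_w * sim_time_s / n_bits


if __name__ == "__main__":
    n_bits = benchmark_bits()
    e_bit = energy_per_bit(1.028e-2, 2.8232e-6, n_bits)
    print(f"{n_bits} bits, {e_bit:.4e} J/bit")
```

Keeping the energy per bit as the single extracted figure means the EDA flow runs once, and later benchmarks only need bit counts and execution times.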
Figure 3. 3D IC layer structure (heat sink on top)
of Monolithic 3D IC vs TSV-based 3D IC.
well optimized since it leaves space between routers in the layout. Figure 2(a) shows the layout
different floorplans in this work. We first place the TSVs and the router logic in separate regions as in Figure 2(b). Then, we place the TSVs around the router logic as in Figure 2(c). We can notice that the size of the router is reduced significantly by removing the empty space.
3.3. Power and Temperature Estimation
Once we obtain the energy per data-bit, we
can obtain the overall power consumption
as follows:
Among the two new layouts, Figure 2(c) provides the best thermal balance because it isolates the logic of a router from the nearby
Figure 4. Architecture of our 3D Network-on-Chip with the size of 3x3x3.
where $N_{bit}$ is the number of data bits in the
acceleration model in academics and industry.
We illustrate the MIL-HDBK-217F of the US
we could also adopt the existing model if
needed as in Figure 6. One common feature of the models is the exponential acceleration of the fault rate with temperature. Note that HRD4 uses 70°C as the threshold of reliability concern.
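The exponential shape shared by these models can be illustrated with an Arrhenius-type acceleration factor normalized to 70°C. This is a sketch under assumptions: the functional form and the activation energy of 1.0 eV are not stated in the text; they are chosen here because they reproduce the Table 1 entries between 303.15 K and 363.15 K.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K
E_A_EV = 1.0             # assumed activation energy in eV


def normalized_fault_rate(temp_k: float, ref_k: float = 343.15,
                          ea_ev: float = E_A_EV) -> float:
    """Fault rate at temp_k relative to the rate at ref_k (Arrhenius form)."""
    return math.exp(ea_ev / K_B_EV * (1.0 / ref_k - 1.0 / temp_k))


if __name__ == "__main__":
    for temp_k in (303.15, 323.15, 343.15, 353.15, 363.15):
        print(f"{temp_k:.2f} K -> {normalized_fault_rate(temp_k):.6f}")
```

Because only a ratio is computed, the constant prefactor of the underlying model cancels out, which is why a relative defect map can be built from temperatures alone.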
benchmark. We can also scale the power with
the dynamic frequency and voltage if needed.
Here, we support dynamic voltage and frequency scaling by using Equation 2, where the power at different voltage and frequency pairs can be converted using the following equations:
where $(V_1, f_1)$ and $(V_2, f_2)$ are two pairs of supply voltage and frequency.
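A minimal sketch of the power estimation and scaling steps, assuming that the router power is the static power plus the energy per data bit times the bit rate, that dynamic power scales with $V^2 f$, and that static power scales with $V$ (consistent with a two-term dynamic/static model). The function names are hypothetical; the example values come from Table 2.

```python
def router_power(static_w: float, energy_per_bit_j: float,
                 n_bits: int, exec_time_s: float) -> float:
    """Average router power: static part plus dynamic energy over time."""
    return static_w + energy_per_bit_j * n_bits / exec_time_s


def scale_dynamic(p_dyn_w: float, v1: float, f1: float,
                  v2: float, f2: float) -> float:
    """Rescale dynamic power from (v1, f1) to (v2, f2), assuming P ~ V^2 * f."""
    return p_dyn_w * (v2 / v1) ** 2 * (f2 / f1)


def scale_static(p_static_w: float, v1: float, v2: float) -> float:
    """Rescale static power from v1 to v2, assuming P ~ V (fixed leakage current)."""
    return p_static_w * (v2 / v1)


if __name__ == "__main__":
    # Table 2 operating point: 500 MHz, 1.1 V.
    p_500 = router_power(7.64e-4, 9.2546e-13, 31360, 2.8232e-6)
    print(f"router power at 500 MHz: {p_500:.4e} W")
    # Same voltage, frequency raised to 2 GHz.
    p_dyn_2g = scale_dynamic(p_500 - 7.64e-4, 1.1, 500e6, 1.1, 2e9)
    print(f"dynamic power at 2 GHz:  {p_dyn_2g:.4e} W")
```

Scaling the two parts separately matters because only the dynamic part depends on frequency under these assumptions.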
The power trace and floorplan are fed into Hotspot 6.0 to obtain the thermal map of the design. The results of Hotspot 6.0 are the steady-state temperatures of each router and its TSVs. We can also support transient power and temperature. However, since we consider reliability as the major target, the steady-state temperature is the most important value.
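As an illustration of how the power trace and floorplan could be prepared, the sketch below builds both as text. It assumes HotSpot's plain-text conventions (one floorplan row per block with name, width, height, left-x and bottom-y in meters; a power trace with a header row of block names followed by rows of power in watts), which should be checked against the HotSpot release in use. The block naming scheme, grid size, tile size and power values are hypothetical.

```python
from typing import List, Sequence


def floorplan_lines(nx: int, ny: int, tile_m: float) -> List[str]:
    """One tab-separated row per router tile of a single layer."""
    lines = []
    for y in range(ny):
        for x in range(nx):
            lines.append(
                f"R_{x}_{y}\t{tile_m:.6e}\t{tile_m:.6e}"
                f"\t{x * tile_m:.6e}\t{y * tile_m:.6e}"
            )
    return lines


def ptrace_text(names: Sequence[str], power_rows: Sequence[Sequence[float]]) -> str:
    """Header row of block names, then one row of power values per sample."""
    header = "\t".join(names)
    rows = ["\t".join(f"{p:.6e}" for p in row) for row in power_rows]
    return "\n".join([header] + rows) + "\n"


if __name__ == "__main__":
    flp = floorplan_lines(3, 3, 290e-6)  # 3x3 tiles, illustrative tile size
    names = [line.split("\t")[0] for line in flp]
    # A single steady-state sample with the same hypothetical power per router.
    trace = ptrace_text(names, [[1.1e-2] * len(names)])
    print("\n".join(flp))
    print(trace)
```

For a stacked design, one such floorplan would be generated per layer; how the layers are declared to the tool is outside this sketch.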
Figure 6. Normalized thermal acceleration
of fault rate.
3.4. Defect Mapping
Table 1 shows the fault rate mapping
fault rate is less than 2% at 70°C (343.15K). However, once the IC operates at 80°C (353.15K), its fault rate is 2.6× that at 70°C
After getting the thermal map, we can
extract the reliability to obtain the defect map.
Figure 6 shows the normalized thermal
(343.15K) and 220× that at 30°C (303.15K). By mapping to fault rates, we can find the critical part of the 3D-NoCs in terms of reliability.

Table 1. Normalized fault rate of Copper TSV
Temperature (K)    Normalized fault rate (to 70°C)
303.15             0.011537
313.15             0.039174
323.15             0.123317
333.15             0.362371
343.15             1
353.15             2.605435
363.15             6.439561
373.15             13.94691

Table 2. Hardware complexity of our 3D-NoC router
Parameter                     Value
Area cost                     38,838
Maximum Frequency             537.63 MHz
Operating Frequency           500 MHz
Technology                    45nm (NANGATE 45)
Voltage                       1.1 V
Static Power (at 500MHz)      7.64e-4 Watt
Dynamic Power (at 500MHz)     1.028e-2 Watt
Simulation time               2.823200e-6 second
Energy                        2.9022496e-8 Joule
Energy per data bit           9.2546e-13 Joule/bit
4.2. 3D-NoC System Power Estimation
4. Experimental Results
To estimate the power of the 3D-NoC system, we use Equation 3 with the scaling Equations 4 and 5 for different voltage and frequency pairs if needed. To do so, we need to obtain the number of bits passing through each router during its operation. Here, we perform both synthetic
benchmarks (Matrix, HotSpot, Uniform, and
is one of the most well-known benchmarks for
multi-core computing systems. Here, we use 64 core x64 processors as the processing elements of the PARSEC benchmarks. We only extract the number of flits that went through the routers to estimate the power consumption. The power consumption of the processing elements can be obtained by using McPAT [29]; however, it is outside the scope of this work.
Figure 7 shows the power consumption of our 3D-NoC under the PARSEC benchmarks. Here, we scale the frequency to 2GHz to fit the configuration of gem5 using Equations 4 and 5. Among these benchmarks, we observe that canneal has the highest power consumption and also the highest variation (between the minimum and maximum router power).
In this section, we evaluate the 3D Network
Furthermore, we explore different floorplans and cooling strategies. First, we extract the power consumption from the synthetic benchmark of a router. Then, we estimate the power consumption of the 3D-NoC system under various benchmarks. Next, the temperature and reliability predictions are presented. In the final part, we compare different strategies for layout and cooling.
4.1. 3D-NoC Router Power Estimation
We used the router model in our previous
and the energy. Note that we modified the
router with some optimizations and further fault
tolerances. We use NANGATE 45nm library
hardware complexity of the router is shown in
Table 2. We perform a heuristic benchmark for this router by sending, from each port to every possible port, two packets of ten 32-bit flits. The total number of bits is 7×7×2×10×32 = 31,360. The desired injection rate is 1 flit/port/cycle. The final results for static power and energy per data bit are 7.66e-4 W and 9.2546e-13 J/bit, respectively.
4.2. 3D-NoC Thermal Estimation
By using the power estimation of the
previous section, we conduct the thermal
shows the configurations for thermal estimation
using Hotspot 6.0. We modify the thermal resistivity corresponding to our designed TSV (Copper with the size of 4.06μm×4.06μm) using the following equation [30]:
Figure 7. Power consumption of our 3D-NoC under
PARSEC benchmarks.
where TIM is the thermal interface material.
The result of the thermal resistivity of the
layout in Figure 2(c) can be found in Table 3.
The final TSV area thermal resistivity is
0.0226mK/W.
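One way to arrive at this value is an area-weighted parallel combination of Copper and TIM. This form is an assumption (the equation from [30] may differ in detail), but with the Table 3 values it reproduces the reported 0.0226 mK/W. The function and parameter names are hypothetical.

```python
def joint_resistivity(r_via: float, r_fill: float, via_fraction: float) -> float:
    """Effective thermal resistivity of a region partly filled with vias.

    Assumes heat flows through the via material and the fill material in
    parallel, so their conductivities add in proportion to the area they occupy.
    """
    conductivity = via_fraction / r_via + (1.0 - via_fraction) / r_fill
    return 1.0 / conductivity


if __name__ == "__main__":
    # Table 3: Copper 0.0025 mK/W, TIM 0.25 mK/W, TSV utilization 10.16%.
    r_tsv_area = joint_resistivity(0.0025, 0.25, 0.1016)
    print(f"TSV area thermal resistivity: {r_tsv_area:.4f} mK/W")
```

Under this reading, even a 10% Copper fill lowers the resistivity of the TSV region by roughly a factor of ten relative to TIM alone, which is why the TSV area behaves as a cooling path.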
Figure 8 shows the power consumption of the 3D-NoC system under synthetic benchmarks. We keep the frequency at 500MHz and inject flits at the maximum injection rate. Note that we perform two Hotspot benchmarks where two nodes are the destination of 5% and 10% of the total flits. We can easily observe a significant drop in power when increasing the number of flits sent to the hotspot nodes. This can be explained by the congestion created by more flits coming to these nodes, which extends the execution time of the system. On the other hand, the matrix benchmark has the lowest router power consumption. We also notice that the synthetic benchmarks have much higher power consumption than the PARSEC benchmarks since no computation takes place in these benchmarks. As a consequence, the execution time is shorter, which makes the power consumption higher than with PARSEC.
Table 3. Configurations for thermal estimation
Parameter                       Value
Router floorplan                290μm × 290μm
Floorplan                       Figure 2(c)
One TSV area                    4.06μm × 4.06μm
Router logic area               220μm × 220μm
Router logic utilization        80%
TSV area / utilization          35,700μm² / 10.16%
Copper thermal resistivity      0.0025mK/W
TIM thermal resistivity         0.25mK/W
TSV area thermal resistivity    0.0226mK/W
Figure 8. Power consumption of our 3D-NoC under
synthetic benchmarks.
Figure 9. Temperature of our 3D-NoC under
PARSEC benchmarks.
To compare with Monolithic 3D-IC, we
the bonding layers between silicon layers. We
keep the thickness of the silicon layer as it is for
a fair comparison. Obviously, if we thin the
layer, the transfer of heat is much faster.
the PARSEC benchmark. With synthetic
benchmarks, TSV-based 3D-NoC is slightly
better than Monolithic ones.
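To make the normalization concrete, the sketch below computes the MTTF of each layer relative to 323.15 K using only the temperature term of Black's model. The activation energy of 1.0 eV and the per-layer temperatures are illustrative assumptions, and the remaining terms of the model are assumed equal across layers so that they cancel in the ratio.

```python
import math
from typing import Dict, Tuple

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def normalized_mttf(temp_k: float, ref_k: float = 323.15,
                    ea_ev: float = 1.0) -> float:
    """MTTF at temp_k divided by MTTF at ref_k (temperature term only)."""
    return math.exp(ea_ev / K_B_EV * (1.0 / temp_k - 1.0 / ref_k))


def most_critical_layer(layer_temps_k: Dict[int, float]) -> Tuple[int, Dict[int, float]]:
    """Return the layer with the lowest normalized MTTF and the full map."""
    mttf = {layer: normalized_mttf(t) for layer, t in layer_temps_k.items()}
    return min(mttf, key=mttf.get), mttf


if __name__ == "__main__":
    # Hypothetical steady-state temperatures, layer 0 being the bottom layer.
    worst, mttf = most_critical_layer({0: 340.0, 1: 335.0, 2: 330.0})
    for layer, value in sorted(mttf.items()):
        print(f"layer {layer}: normalized MTTF = {value:.4f}")
    print("most critical layer:", worst)
```

A value below 1 means the layer is expected to fail sooner than it would at the 50°C reference, so the hottest layer is always the most critical one under this model.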
4.4. Exploring Different Layout and Thermal
Dissipation Method
Figure 9 shows the router temperature
under the PARSEC benchmark. Here, we also
compare with the monolithic technology where
Figure 9, the TSV-based system has a lower operating temperature thanks to the heat-transfer ability of the Copper TSVs. The difference in temperature is around 1K at the bottom layer and even reaches 3.5K in the canneal benchmark.
Figure 10 shows the operating temperature of our 3D-NoC under synthetic benchmarks. We can easily notice that the operating temperature of the Monolithic systems is much higher than that of the TSV ones since we stress the system at its saturation point. The highest temperature of the Monolithic 3D-NoC even reaches 351.64 K (78.49°C). The hottest layer of the TSV-based system has a temperature similar to the coolest layer of the Monolithic 3D-NoC.
In this section, we explore different layouts and their thermal dissipation behaviors for our 3D-NoC. First, we perform thermal and reliability prediction for our layout in Figure 2(b). Then, we insert four thermal TSVs with the size 15×15 in the four corners of the router floorplan in Figure 2(c). This size of TSV is still feasible in the existing manufacture
Keep-out-Zone
distance this thermal TSV to avoid mechanical
stress. The thermal TSV went through all layers
of TSVs but did not contact with the heatsink.
The heatsink and thermal TSV are separated by
a layer of thermal interface material.
Figure 11. Normalized MTTF of our 3D-NoC under
PARSEC benchmarks.
Figure 10. Temperature of our 3D-NoC under
synthetic benchmarks.
4.2. 3D-NoC Reliability Estimation
In this section, we use Black’s model to evaluate the MTTF of the 3D-NoC. Figure 11 and Figure 12 show the MTTF of each layer, normalized to 323.15K (50°C), under PARSEC and synthetic benchmarks. Here, we can observe that the TSV-based 3D-NoC dominates the Monolithic one in
Figure 12. Normalized MTTF of our 3D-NoC under
synthetic benchmarks.
Figure 13 and Figure 14 show the thermal
significantly cool down the bottom layer. Also,
liquid cooling could be extremely helpful in
this situation.
In comparison to the traditional 2D-ICs, we
observe that the TSV-based ICs have higher
operating temperatures. The 2D-based 3D-
NoCs operate under 319K and 322K with
behaviors under PARSEC and synthetic benchmarks for different layouts and cooling methods. We can notice that the layout in Figure 2(b) has the worst thermal behavior among the TSV designs. On the other hand, adding thermal TSVs can help reduce the operating temperature significantly. By adding four TSVs, we can even reduce the temperature by nearly 1K at the bottom layer in the uniform benchmark, which is the most stressed benchmark. Other benchmarks’ results also show a slight improvement in thermal behavior.
We can also notice that the top layer’s temperatures do not change. This is because the top layer is already cooled down by the heatsink, so adding TSVs cannot help it reduce the temperature further. Also, the heatsink temperature is raised close to the top layer temperature, which reduces its ability to transfer heat. If the
PARSEC and synthetic benchmarks, respectively. On the other hand, the TSV-based system increases the maximum temperature by at most 10K with the layout in Figure 2(b).
In summary, different layouts lead to different thermal behaviors. The layout in Figure 2(b) does not surround the router with the TSV area; therefore, the routers can heat each other up and reach a higher temperature. On the other hand, adding thermal TSVs to cool down the bottom layer is helpful since it can reduce the temperature by nearly 1 Kelvin in the worst case. By mapping to the reliability, we can easily obtain a 2×~3× improvement of MTTF.
thermal TSV can contact the heatsink, it can
Figure 13. Thermal behavior of different layouts and cooling methods under the PARSEC benchmark.
Figure 14. Thermal behavior of different layouts and cooling methods under the synthetic benchmarks.
4.5. Execution Time
TSV-based 3D-NoCs due to two major reasons: i) TSVs act like thermal conduction devices and ii) Monolithic 3D-ICs have a higher density than TSV-based systems. However, we would like to note that Monolithic 3D-ICs have a lower area cost than TSV-based systems.
advanced methods to reduce the operating temperature of the system. Although we have not explored this method, it has shown promising efficiency for 3D-ICs [7]. With a high fluid velocity, we expect that the system can be cooled down significantly. However, we would like to note that fluid cooling has unknown reliability, which needs to be carefully investigated before it can be widely used.
In this work, we evaluate the proposed method using a system with an 8-core Xeon E5-2620 at 2.1GHz and 16GB RAM, running the Linux Subsystem and PowerShell under Windows 10. The platform is written in C++, Python, and Bash. The execution time is measured using the time command under Linux and Measure-Command under Windows PowerShell. Here, the simulation times of the PARSEC and synthetic benchmarks are not considered because they are separate from our flow. As shown in Table 4, all steps in our flow complete in under two seconds. In terms of execution time, our method easily outperforms the fabrication-based methods, which usually take hours, not counting design, fabrication and assembly time [10-12].
5. Conclusion
Table 4. Execution time of the proposed flow
In this work, we proposed a platform to quickly estimate the power, thermal behavior, and reliability of 3D-NoC systems. The method has shown an extremely short execution time. We also analyzed and simulated the reliability of TSV-based and Monolithic 3D-ICs. Furthermore, we explored and compared different layout strategies and cooling methods.
From our experiments with 3D-NoC, we observe that lower-index layers have higher operating temperatures and are more critical in terms of reliability. Although this conclusion cannot cover all possible cases, it is consistent across the tested benchmarks. Based on these experiments, designers can decide on their fault-tolerance or thermal dissipation strategy depending on their required specification.
Work       Step                                      Time
Ours       Power extraction (one benchmark)          1.22 s
           Floorplan generation                      0.095 s
           Temperature estimation (one benchmark)    81 s
           Reliability estimation (12 benchmarks)    1.12 s
[10-12]    Reliability test                          96h
           The longest step in reliability test      1000h
           Lifetime acceleration test                100-5000h
Although our approach is faster than real-chip testing [10-12], it cannot be as accurate as the baking tests due to the deviations during simulation and potential manufacturing variations. However, in a closed-loop design flow, having an understanding of the potential reliability threats is helpful for designers.
In the future, advanced cooling techniques such as liquid cooling could be investigated. The impact of DVFS and fault tolerance on performance and thermal behavior could also be studied.
4.6. Discussion
In this section, we would like to discuss
some technical details of our methods.
Advantages and drawbacks are also mentioned
in this part.
In our evaluation, we point out that
Monolithic has a higher temperature than
Acknowledgments
This research is funded by the Vietnam
National Foundation for Science and
Technology Development (NAFOSTED) under
grant number 102.01-2018.312.
References
[1] Khanh N. Dang, Akram Ben Ahmed, Xuan Tu Tran, Yuichi Okuyama, Abderazek Ben Abdallah, “A Comprehensive Reliability Assessment of Fault-Resilient Network-on-Chip Using Analytical Model,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 25(11) (2017) 3099-3112.
[2] K. Banerjee, S.J. Souri, P. Kapur and K.C. Saraswat, “3-D ICs: A novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration,” Proc. IEEE. 89(5) (2001) 602-633.
[3] Khanh N. Dang, Akram Ben Ahmed, Yuichi Okuyama, Abderazek Ben Abdallah, “Scalable design methodology and online algorithm for TSV-cluster defects recovery in highly reliable 3D-NoC systems”, IEEE Transactions on Emerging Topics in Computing, 2017, pp. 1-14 (in-press).
[4] Wong, Simon, et al. "Monolithic 3D integrated circuits" International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA), IEEE, 2007.
[5] Y.J. Park et al., “Thermal Analysis for 3D Multi-core Processors with Dynamic Frequency Scaling”, in IEEE/ACIS 9th Int. Conf. on Computer and Information Science, Aug 2010, pp. 69-74.
[6] Van der Plas, Geert, et al., "Design issues and considerations for low-cost 3-D TSV IC technology". IEEE Journal of Solid-State Circuits 46(1) (2010) 293-307.
[7] D. Cuesta et al., “Thermal-aware floorplanner for 3D IC, including TSVs, liquid microchannels and thermal domains optimization,” Applied Soft Computing 34 (2015) 164-177.
[8] Park, Changyok, "Dummy TSV to improve process uniformity and heat dissipation", U.S. Patent 10,181,454, 15 Jan, 2019. 7A1/en (access 16 March 2020).
[10] Hamada, M. Dorothy June, J. William, Roesch, "Evaluating device reliability using wafer-level methodology", CS Mantech Conference, 2008.
[11] Renesas’s Semiconductor Reliability Handbook 51zz0001ej0250.pdf/, 2017 (access 17 March 2020).
[12] Toshiba’s Reliability Handbook handbook-tdsc-en.pdf /, 2018 (access 17 March 2020).
[13] Zhang, Runjie, Mircea R. Stan, Kevin Skadron, “Hotspot 6.0: Validation, acceleration and extension”, University of Virginia, Tech. Rep., 2015.
[14] Sridhar, Arvind, et al., "3D-ICE: Fast compact transient thermal modeling for 3D ICs with inter-tier liquid cooling", 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), IEEE, 2010.
[15] Scott Ladenheim, Yi-Chung Chen, Milan Mihajlović, Vasilis F. Pavlidis, "The MTA: An Advanced and Versatile Thermal Simulator for Integrated Systems", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37(12) (2018) 3123-3136.
[16] Erdmann, Christophe, et al., "A heterogeneous 3D-IC consisting of two 28 nm FPGA die and 32 reconfigurable high-performance data converters", IEEE Journal of Solid-State Circuits 50(1) (2014) 258-269.
[17] Kahng, B. Andrew, et al., "ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration", Design, Automation & Test in Europe Conference & Exhibition, IEEE, 2009.
[18] Lee, Seung Eun, and Nader Bagherzadeh, "A high level power model for Network-on-Chip (NoC) router", Computers & Electrical Engineering 35(6) (2009) 837-845.
[19] Lee, Seung Eun, Nader Bagherzadeh, "A variable
frequency link for a power-aware network-on-
chip (NoC)", Integration 42(4) (2009) 479-485.
[20] Lebreton, Hugo, Pascal Vivet, "Power modeling in
SystemC at transaction level, application to a DVFS
architecture", 2008 IEEE Computer Society Annual
Symposium on VLSI, IEEE, 2008.
[9] J.R. Black, “Mass transport of aluminum by
momentum
exchange
with
conducting
electrons”, in 6th Annual Reliability Physics
Symposium (IEEE), IEEE, 1967, pp. 148-159.
[21] Khanh N. Dang Akram Ben Ahmed, Abderazek
Ben Abdallah, Xuan-Tu Tran, “TSV-OCT: A
K.N. Dang et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 36, No. 1 (2020) 65-77
77
Scalable
Online
Multiple-TSV
Defects
[28] Bienia, Christian, et al., "The PARSEC
benchmark suite: Characterization and
Localization for Real-Time 3-D-IC systems”
IEEE Transactions on Very Large Scale
Integration Systems 28(3) (2020) 672 - 685.
architectural implications", Proceedings of the
17th international conference on Parallel
architectures and compilation techniques, 2008.
[22] United States of America: Department of Defense,
Military Handbook: Reliability Prediction of
Electronic Equipment: MIL-HDBK-217F, 1991.
[23] J.B. Bowles, “A survey of reliability-prediction
procedures for microelectronic devices”, IEEE
[29] Li, Sheng, et al., "McPAT: an integrated power, area
and timing modeling framework for multicore and
manycore architectures", Proceedings of the 42nd
Annual IEEE/ACM International Symposium on
Microarchitecture, 2009.
Trans,
Rel.
41(1)
(1992)
2-12.
[30] J. Meng, K. Kawakami, A.K. Coskun,
“Optimizing energy efficiency of 3-d multicore
systems with stacked dram under power and
thermal constraints”, in DAC Design Automation
Conference 2012, IEEE, 2012, pp. 648-655.
[24] J. Srinivasan et al., “Lifetime reliability: Toward an
architectural solution”, IEEE Micro. 25(3) (2005)
[31] Khanh N. Dang, Akram Ben Ahmed, Abderazek
Ben Abdallah, Michael Corad Meyer, Xuan-Tu
Tran, “2D Parity Product Code for TSV online
fault correction and detection”, REV Journal on
Electronics and Communications (in-press).
[25] NanGate Inc., “Nangate Open Cell Library 45nm”
http://www.nangate.com/, 2016 (accessed 16 June 2016).
[26] NCSU
Electronic
Design
Automation,
“FreePDK3D45 3D-IC process design kit”,
tents/, 2016 (accessed 16 June 2016).
[32] Samal, Sandeep Kumar, et al., "Fast and accurate
thermal modeling and optimization for monolithic
3D ICs", 2014 51st ACM/EDAC/IEEE Design
Automation Conference (DAC), IEEE, 2014.
[27] Binkert, Nathan, et al., "The gem5 simulator",
ACM SIGARCH computer architecture news
39(2) (2011) 1-7.