

# An optimization method for NBTI-aware design of domino logic circuits in nano-scale CMOS

## Masaoud Houshmand Kaffashian<sup>1a)</sup>, Reza Lotfi<sup>1</sup>, Khalil Mafinezhad<sup>1</sup>, and Hamid Mahmoodi<sup>2</sup>

<sup>1</sup> Dept. of Electrical Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

<sup>2</sup> Dept. of Electrical and Computer Engineering, San Francisco State University, CA, USA

a) ma\_ho316@stu-mail.um.ac.ir

**Abstract:** In this paper, the impact of negative bias temperature instability (NBTI) on dynamic logic circuits is analyzed, and a design technique for a wide fan-in domino gate based on Genetic Algorithm (GA) optimization is proposed. In this technique, the NBTI-degraded delay over the lifetime of the circuit is minimized subject to constraints on area, power consumption and unity noise gain (UNG). The proposed optimization method is implemented in a 65-nm technology for a lifetime of 3 years. In comparison with a typical design, the optimized results show an improvement of more than 21.6% in delay during the circuit lifetime, with a negligible change in power and UNG. The proposed method can be used for any desired circuit lifetime and any reasonable constraints on the design parameters, simply by setting the corresponding parameters in the algorithm.

**Keywords:** dynamic logic, NBTI, reliability, genetic algorithms

**Classification:** Integrated circuits

## References

- [1] J. R. G. David and N. Bhat, "A low power, process invariant keeper for high speed dynamic logic circuits," *Proc. IEEE International Symposium on Circuits and Systems (ISCAS)*, pp. 1668–1671, 2008.
- [2] K. Yelamarthi and C.-I. H. Chen, "Process variation aware transistor sizing for load balance of multiple paths in dynamic CMOS for timing optimization," *Journal of Computers (JCP)*, vol. 3, no. 2, pp. 21–28, Feb. 2008.
- [3] V. Huard, M. Denais, and C. R. Parthasarathy, "NBTI degradation: From physical mechanisms to modelling," *Microelectron. Reliab.*, vol. 46, no. 1, pp. 1–23, Jan. 2006.
- [4] S. Borkar, "Electronics beyond nano-scale CMOS," *Proc. ACM/IEEE Design Automation Conf.*, pp. 807–808, 2006.
- [5] C. H. Kim, K. Roy, S. Hsu, R. Krishnamurthy, and S. Borkar, "A process variation compensating technique with an on-die leakage current sensor for nanometer scale dynamic circuits," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 14, no. 6, pp. 646–649, 2006.
- [6] "Predictive Technology Model," [Online] http://www.eas.asu.edu/~ptm.
- [7] W. Wang, Z. Wei, S. Yang, and Y. Cao, "An Efficient Method to Identify Critical Gates under Circuit Aging," *Proc. IEEE/ACM Int. Conf. Computer-Aided Design*, pp. 735–740, 2007.
- [8] H. Mahmoodi-Meimand and K. Roy, "Diode-footed domino: a leakage-tolerant high fan-in dynamic circuit design style," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 51, no. 3, pp. 495–503, 2004.

## 1 Introduction

High fan-in domino OR circuits and similar structures are widely used in the design of register and cache array bit lines because of their high performance and compactness. In a domino gate, a keeper transistor is necessary to improve the immunity of the gate against charge sharing and against charge loss due to leakage and noise at the input terminals. However, the keeper transistor degrades the performance of the gate during evaluation by introducing contention current [1]. Proper transistor sizing in domino gates has become one of the main challenges in timing optimization of dynamic circuits, since it has a strong impact on delay, power and noise immunity [2].

In addition, NBTI in pMOS transistors has become a major reliability concern in digital circuit design, especially in deep submicron technologies [3], and its effects must be considered for a robust design. NBTI occurs when a pMOS transistor is negatively biased ( $V_{gs} = -V_{DD}$ ) at elevated temperatures, and it causes the absolute value of the threshold voltage ( $|V_{th}|$ ) to increase. This shift can increase the delay of the transistor and degrade the circuit speed by about 10%–20% [4].

In this paper, an NBTI-aware sizing optimization technique using GA is proposed. In the proposed technique, the maximum degraded delay of the circuit is minimized while satisfying the constraints on area, power consumption and UNG during the lifetime of the circuit. UNG is a robustness metric for a domino gate and is defined as the DC input noise voltage that generates an equal level of noise at the final output of the domino gate [5].

## 2 Impact of NBTI degradation on wide fan-in domino logic

To investigate the impact of NBTI on dynamic logic circuits, a dynamic OR gate with a fan-in of 8, a supply voltage of 1.2 V, and a UNG greater than $0.25\,V_{DD}$ has been designed based on the standard footed domino structure shown in Fig. 1 (a). We have used a typical design methodology in which the keeper transistor is sized so that its saturation current equals 15% of the current provided by the pull-down network (PDN), in order to provide reasonable noise immunity for the dynamic node without excessively slowing down its discharge transition. The width of the nMOS evaluation transistor has been set to 1.2 times the width of the transistors in the pull-down network, as a compromise between speed (which requires this transistor to be wide) and clock load (since the clock must drive the gate capacitance of this transistor). The output static inverter is skewed for a fast low-to-high transition, so its pMOS transistor is wider than its nMOS transistor by a factor of four. The capacitive load has been set equal to the input capacitance of a 16× minimum-size inverter. The circuit has been simulated in HSPICE using the 65-nm Predictive Technology Model (PTM) [6].



Fig. 1. (a) Standard Footed Domino OR Gate (b) Change in delay due to the degradation of different pMOS transistors in the circuit

Considering a 50% duty cycle for the clock signal and an activity factor of 0.5 for the inputs, which is a typical situation, the NBTI degradation of the pMOS transistors of the circuit over a lifetime of 3 years ( $\approx 10^8$ s ) has been calculated. We have used the simplified long-term NBTI prediction model developed in [7]. This model can accurately estimate $\Delta V_{th}$ at a given time $t$ as follows

$$\Delta V_{th} = b \cdot \alpha^{n} \cdot t^{n} \tag{1}$$

where $b = 3.9 \times 10^{-3}\ \mathrm{V \cdot s^{-1/6}}$, $\alpha$ is the input signal probability (the fraction of a given period spent in the stress state), and $n$ is the time exponent, equal to 0.16.
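As a quick numerical illustration of Eq. (1), the sketch below evaluates the threshold-voltage shift at the end of the 3-year lifetime using the constants quoted above. The function name `delta_vth` and the list of $\alpha$ values are illustrative assumptions; the text does not list the stress probability assigned to each pMOS transistor.

```python
# Long-term NBTI shift from Eq. (1): dVth = b * alpha^n * t^n.
B = 3.9e-3        # V * s^(-1/6), constant quoted in the text
N = 0.16          # time exponent quoted in the text
LIFETIME_S = 1e8  # roughly 3 years, as in the text


def delta_vth(t_seconds, alpha, b=B, n=N):
    """Return the predicted |Vth| shift in volts after t_seconds."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a probability in [0, 1]")
    # b * alpha^n * t^n is the same as b * (alpha * t)^n
    return b * (alpha * t_seconds) ** n


if __name__ == "__main__":
    # The alpha values below are illustrative, not taken from the paper.
    for alpha in (0.25, 0.5, 1.0):
        shift_mv = 1e3 * delta_vth(LIFETIME_S, alpha)
        print(f"alpha = {alpha:4.2f}: dVth = {shift_mv:5.1f} mV")
```

For $\alpha = 0.5$ and $t = 10^8$ s the model gives a shift of roughly 66 mV, which indicates the order of magnitude of the degradation applied to each stressed pMOS transistor.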

To understand the effect of the degradation of each pMOS transistor in the circuit, the circuit has been simulated in HSPICE with the NBTI-induced $\Delta V_{th}$ applied to each pMOS transistor separately, and the circuit delay has been measured. The simulation results are shown in Fig. 1 (b). As can be seen, NBTI degradation of the precharge transistor has a negligible impact on delay, since this transistor is off during evaluation and does not affect the evaluation delay. NBTI degradation of the keeper decreases the circuit delay because the keeper becomes weaker, which reduces the contention between the keeper and the pull-down network; consequently, the transition at the dynamic node occurs faster. However, degradation of the inverter pMOS transistor increases the delay because of the slower low-to-high output transition in the evaluation phase. When NBTI degradation is considered for all pMOS transistors of the circuit, the delay increase caused by the inverter pMOS transistor is partly offset by the keeper degradation.

From the above analysis it can be seen that NBTI degradation of the keeper and of the inverter pMOS affect the delay in opposite directions. The sensitivity (or rate of change) of the overall performance to the threshold voltage shift depends on the sizing of these two transistors. However, any change in the sizing of these transistors may require resizing other transistors in the circuit to keep the initial performance metrics within their acceptable ranges, which leads to a complex design space. Consequently, Genetic Algorithm optimization is a suitable option for designing an NBTI-aware wide fan-in domino gate.

## 3 NBTI-aware optimization

Denoting the degraded delay of the circuit at time $t$ during its lifetime by delay(t), we formulate the problem as follows:

Minimize: $f = \max\{\text{delay}(t)\}$, $t_0 < t < t_{lifetime}$

subject to:

1) $\max(\text{power}(t)) < \text{max\_power}$

2) $\min(\text{noise margin}(t)) > \text{min\_UNG}$

3) area < max\_area (specified by sizing constraints as explained in section 4)
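One common way to hand such a constrained problem to a GA is to fold the constraints into the cost as penalties. The sketch below shows that idea under stated assumptions: the text does not say how constraints are enforced, so the penalty form, the `Metrics` container, and the `simulate` callable (a stand-in for one HSPICE run at a given threshold shift) are all hypothetical. The area constraint is assumed to be handled by bounding the widths in the chromosome, so it does not appear in the cost.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical container for the results of one circuit simulation."""
    delay_ps: float
    power_uw: float
    ung_mv: float


def constrained_cost(widths, simulate, dvth_points,
                     max_power_uw, min_ung_mv, penalty=1e3):
    """Worst-case delay over the lifetime plus a penalty for violations.

    simulate(widths, dvth) is assumed to return a Metrics object for the
    circuit with the given widths and pMOS threshold shift dvth.
    """
    results = [simulate(widths, dvth) for dvth in dvth_points]
    worst_delay = max(m.delay_ps for m in results)  # objective f
    worst_power = max(m.power_uw for m in results)  # constraint 1
    worst_ung = min(m.ung_mv for m in results)      # constraint 2
    # Relative violations: zero when a constraint is met.
    violation = (max(0.0, worst_power / max_power_uw - 1.0)
                 + max(0.0, 1.0 - worst_ung / min_ung_mv))
    return worst_delay + penalty * violation
```

With this form, any feasible design has a cost equal to its worst-case delay, while an infeasible design is ranked behind feasible ones as long as `penalty` is large compared with typical delay values.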

The following flow has been used as our design methodology:

1) Set up the design constraints: required lifetime $(t_{lifetime})$, power constraint, UNG constraint, area constraints

2) Compute the $\Delta V_{th}$ degradation at $t_{lifetime}$ using Eq. (1) for each of the pMOS transistors in the circuit.

3) Apply GA and obtain the NBTI-aware optimal design

We have defined a chromosome format for our GA consisting of the transistor widths of the circuit. The lengths of all the circuit transistors were considered equal to 65 nm. Single point crossover has been used in the GA. We have used random binary toggling of chromosome bits as mutation. The roulette-wheel method is used for selection. The best solution of each generation (known as elite) has been directly transferred to the next generation.
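The operators named above (single-point crossover, bit-toggle mutation, roulette-wheel selection and elitism) can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB implementation: the population size, mutation rate, the mapping from cost to selection weight, and the toy cost used in the usage note are all assumptions.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def single_point_crossover(a, b, rng):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits, rate, rng):
    """Toggle each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in bits]


def run_ga(cost, n_bits, pop_size=20, generations=40,
           mutation_rate=0.02, seed=0):
    """Minimize `cost` over bit lists; returns (best, best_cost, history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        costs = [cost(ind) for ind in population]
        elite = population[costs.index(min(costs))]
        history.append(min(costs))
        # Assumed mapping from a non-negative cost to a selection weight.
        fitnesses = [1.0 / (1.0 + c) for c in costs]
        next_gen = [elite[:]]  # elitism: best individual copied unchanged
        while len(next_gen) < pop_size:
            p1 = roulette_select(population, fitnesses, rng)
            p2 = roulette_select(population, fitnesses, rng)
            c1, c2 = single_point_crossover(p1, p2, rng)
            next_gen.append(mutate(c1, mutation_rate, rng))
            if len(next_gen) < pop_size:
                next_gen.append(mutate(c2, mutation_rate, rng))
        population = next_gen
    costs = [cost(ind) for ind in population]
    return population[costs.index(min(costs))], min(costs), history
```

As a usage example, `run_ga(sum, n_bits=16)` minimizes the number of set bits in a 16-bit chromosome. In the actual flow the cost would instead decode the bits into transistor widths and call the circuit simulator. Because the elite is carried over unmodified, the best cost recorded in `history` never increases from one generation to the next.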

The Genetic Algorithm has been implemented in MATLAB. To evaluate the fitness function, HSPICE is run as a simulator linked to MATLAB. The algorithm was run for a domino OR gate with 8 inputs driving a capacitive load equal to the input capacitance of a 16× minimum-size inverter. Delay is measured from input IN0 to OUT (referring to Fig. 1 (a)) in the evaluation phase. The minimum UNG, normalized to $V_{DD}$, has been set to 25%. To measure the UNG, a pulse emulating the input noise is applied to all inputs and the output is measured; the pulse level is adjusted until the output level equals the input pulse level. The maximum permitted power was chosen to be 5% above the average power of the non-degraded circuit with the typical design (described in section 2). Power consumption is taken as the average power over one clock period, measured when one input goes high, discharging the precharge node and producing a low-to-high transition at the output in the evaluation phase.
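The UNG measurement described above amounts to searching for the input noise level at which the output noise equals the input noise. A minimal sketch using bisection is given below. The text does not say how the pulse level is stepped, so bisection is an assumption, and `output_noise` is a hypothetical callable standing in for one simulator run that returns the output noise level for a given input pulse level.

```python
def find_ung(output_noise, vdd, tol=1e-4):
    """Return the input noise level (V) at which output noise equals input.

    Assumes the output noise is below the input for small pulses and
    reaches or exceeds it somewhere at or below vdd.
    """
    lo, hi = 0.0, vdd
    if output_noise(hi) < hi:
        raise ValueError("no unity-gain crossing at or below VDD")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if output_noise(mid) < mid:
            lo = mid  # gate still attenuates the noise: raise the pulse
        else:
            hi = mid  # gate amplifies the noise: lower the pulse
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    VDD = 1.2

    def toy_gate(v_in):
        # Illustrative transfer curve only, not a circuit model.
        return min(VDD, VDD * (v_in / 0.5) ** 3)

    print(f"UNG of the toy curve: {1e3 * find_ung(toy_gate, VDD):.1f} mV")
```

Each bisection step would correspond to one transient simulation, so a tolerance of 0.1 mV on a 1.2 V range requires about 14 runs per UNG evaluation.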

In order to restrict the area and to obtain faster convergence of the GA runs, the maximum width of the transistors in the pull-down network has been restricted to 24 times the minimum transistor length. The width of the nMOS evaluation transistor has been restricted to 1.5 times the width of the transistors in the pull-down network. The output static inverter is usually a high-skewed inverter, so the maximum width of its nMOS transistor was restricted to 6 times the minimum transistor length and the maximum width of its pMOS transistor to 24 times the minimum transistor length. To find a reasonable maximum value for the keeper width, we used the keeper ratio criterion. The keeper ratio (K) is defined as [8]

$$K = \frac{\mu_p (W/L)_{keeper}}{\mu_n (W/L)_{evaluation}} \tag{2}$$

where  $\mu_n$  and  $\mu_p$  are the mobilities of electrons and holes in a given technology, respectively. The keeper ratio provides a way to trade off robustness and performance in standard domino gates. As the size of the keeper transistor increases, the noise immunity increases; however, the performance degrades and the power consumption increases. So the keeper width has been restricted to a value satisfying the condition K < 0.5.
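To make the bound K < 0.5 concrete, the sketch below evaluates Eq. (2) and inverts it for the largest allowed keeper width. The mobility ratio $\mu_p / \mu_n$ is not given in the text, so the value 0.4 used in the example is a placeholder assumption, as are the function names. Both lengths default to the 65 nm channel length stated earlier.

```python
L_MIN = 65e-9  # channel length used for all transistors (m)


def keeper_ratio(w_keeper, w_eval, mobility_ratio,
                 l_keeper=L_MIN, l_eval=L_MIN):
    """Evaluate K from Eq. (2); mobility_ratio is mu_p / mu_n."""
    return mobility_ratio * (w_keeper / l_keeper) / (w_eval / l_eval)


def max_keeper_width(w_eval, mobility_ratio, k_max=0.5,
                     l_keeper=L_MIN, l_eval=L_MIN):
    """Keeper width at which K reaches k_max; smaller widths keep K below it."""
    return k_max * (w_eval / l_eval) * l_keeper / mobility_ratio


if __name__ == "__main__":
    MOBILITY_RATIO = 0.4  # placeholder assumption, not from the paper
    k = keeper_ratio(0.26e-6, 0.64e-6, MOBILITY_RATIO)
    w_max = max_keeper_width(0.64e-6, MOBILITY_RATIO)
    print(f"K = {k:.4f}, keeper width bound = {1e6 * w_max:.2f} um")
```

Because all lengths are equal here, K reduces to the mobility ratio times the width ratio, so the bound on the keeper width scales linearly with the width in the denominator of Eq. (2).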

Based on these constraints, we have run our design flow. Table I shows the results: the transistor sizes as well as the circuit performance parameters at the beginning and at the end of the specified circuit lifetime. As can be seen, all design constraints have been met.

Table I. Optimization Results

| $W_{precharge}$ | $W_{pull-down}$ | $W_{evaluation}$ | $W_{keeper}$ | $W_{inv-pMOS}$ | $W_{inv-nMOS}$ |
|-----------------|-----------------|------------------|--------------|----------------|----------------|
| 0.55 µm         | 0.533 µm        | 0.64 µm          | 0.26 µm      | 0.3965 µm      | 0.0845 µm      |

| Metric        | $t = 0$ s | $t = 10^8$ s |
|---------------|-----------|--------------|
| Delay         | 53.358 ps | 56.796 ps    |
| Average power | 25.238 µW | 24.694 µW    |
| UNG           | 338.8 mV  | 346 mV       |

In order to provide a comparison, the difference between the degraded delay values of the optimized circuit and those of the circuit with the typical design (presented in section 2) during the circuit lifetime is shown in Fig. 2. All values are normalized to the delay of the typical circuit at the corresponding time. It can be seen that the delay of the GA-based optimized circuit shows an improvement of more than 21.65% in comparison with the delay of the typical circuit during the specified lifetime.



Fig. 2. The difference of delay change in the optimized design and the typical design

The measured power of the optimized circuit is 1.5% to 2.04% higher than that of the typical circuit over the circuit lifetime. The UNG of the optimized design normalized to $V_{DD}$ is 28.24% at the beginning of the circuit lifetime (without any NBTI degradation) and 28.83% at the end of the specified lifetime ( $t = 10^8$ s ). Both values are slightly better than those of the circuit with the typical design.
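The figures in Table I can be cross-checked with a few lines of arithmetic. The sketch below computes the lifetime drift of the optimized design and the UNG normalized to the 1.2 V supply. The variable names are illustrative, and only values printed in Table I are used.

```python
VDD_MV = 1200.0  # 1.2 V supply, in millivolts

# Values copied from Table I (optimized design).
delay_ps = {"t0": 53.358, "end": 56.796}
power_uw = {"t0": 25.238, "end": 24.694}
ung_mv = {"t0": 338.8, "end": 346.0}


def percent_change(start, end):
    """Relative change from start to end, in percent."""
    return 100.0 * (end - start) / start


delay_drift = percent_change(delay_ps["t0"], delay_ps["end"])
power_drift = percent_change(power_uw["t0"], power_uw["end"])
ung_t0_pct = 100.0 * ung_mv["t0"] / VDD_MV
ung_end_pct = 100.0 * ung_mv["end"] / VDD_MV

if __name__ == "__main__":
    print(f"delay drift over lifetime: {delay_drift:+.2f} %")
    print(f"power drift over lifetime: {power_drift:+.2f} %")
    print(f"UNG / VDD at t = 0:        {ung_t0_pct:.2f} %")
    print(f"UNG / VDD at t = 1e8 s:    {ung_end_pct:.2f} %")
```

The optimized design slows down by about 6.4% over the lifetime while its average power falls by about 2.2%, and the normalized UNG agrees with the percentages quoted in the text to within rounding, staying above the 25% constraint at both ends of the lifetime.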

## 4 Conclusion

In this paper an NBTI-aware optimization method for transistor sizing has been proposed. In the proposed optimization method, the increased delay of the circuit under degradation is minimized subject to the constraints on power, UNG and area. The proposed optimization method is a systematic and comprehensive approach suitable for scaled-technology domino circuits where delay, power and noise immunity are highly challenging concerns. Meanwhile, the proposed design flow can be easily set up based on any given design constraints for a pre-specified circuit lifetime.

