https://doi.org/10.33180/InfMIDEM2021.202



Journal of Microelectronics, Electronic Components and Materials Vol. 51, No. 2(2021), 101 – 112

# Vector Controlled Delay Cell with Nearly Identical Rise/Fall Time for Processor Clock Application

Pritam Bhattacharjee<sup>1</sup>, Bidyut K. Bhattacharyya<sup>2</sup>, Alak Majumder<sup>3</sup>

<sup>1</sup>Department of Computer Science & Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India.

<sup>2</sup>Packaging Research Center at Georgia Institute of Technology, Atlanta, USA.

<sup>3</sup>National Institute of Technology (NIT), Department of Electronics & Communication Engineering, Integrated Circuit And System (i-CAS) Laboratory, Arunachal Pradesh, India.

**Abstract:** In the design of modern processor chips, proper clock distribution is a very important aspect that impacts chip performance. Delay circuits, and in particular cells with variable delay, are the active elements most involved in clock distribution, and they therefore decide the time slack of every function inside the chip. They ensure proper input-to-output signal transmission by adjusting variable timing delays, and they must keep the rise and fall times of the output signal equal, which most existing delay elements fail to deliver. Therefore, in this article, we propose an input-vector-based variable delay design with balanced rise and fall times for the output signal. We also estimate the delay and output voltage with a mathematical model. The new configuration is implemented on the commercial Cadence Virtuoso<sup>®</sup> platform using a 90nm technology node, driven by a 1GHz input signal with a 1.1 V power supply. The results confirm the desired features of the proposed design under typical conditions and across process corner variations.

Keywords: vector-controlled circuit design; variable delay cell; rise/fall time; processor clock; CMOS process technology

# Vector-Controlled Delay Cell with Nearly Equal Rise/Fall Time for Processor Clock Use

**Abstract:** In the design of modern processor chips, proper clock distribution is a very important aspect that affects chip operation. The active cell of delay circuits and of cells with variable delay plays the main role in clock distribution and thus determines the timing delays of all functions inside the chip. It supports correct transmission of the input signal by adjusting variable time offsets and controls the output signal so that it has equal rise/fall times, which most existing delay elements do not achieve. In this article, we propose an input-vector-based variable delay design with balanced rise and fall times of the output signal. We estimated the delay and the output voltage with a mathematical model. The new configuration is implemented on the commercial Cadence Virtuoso<sup>®</sup> platform using 90nm technology, with a 1 GHz driving signal and a 1.1 V supply. The implementation results confirm the desired characteristics of our proposed design under typical and atypical conditions.

Keywords: vector-controlled circuit design; cell with variable delay; rise/fall time; processor clock; CMOS technology

\* Corresponding Author's e-mail: pritambhattacharjee@am.amrita.edu

## 1 Introduction

Over the past few decades, we have witnessed considerable advancement in consumer electronics such as computers, computer accessories and mobile phones, as well as in their inner components, for example the central processing unit (CPU) and the graphics processing unit (GPU). Semiconductor companies such as Intel, AMD and Qualcomm have successfully brought the discrete-level integration of CPU and GPU onto a single platform [1, 2]. As a result of this integration, however, clock signaling and its efficient routing have become very important for maintaining the proper functioning and performance of each CPU and GPU. The efficacy of clock signaling and transmission to the CPU/GPU is determined by the components involved in clock distribution, as seen in Fig. 1(a). Basically, the clock signal traverses multiple buffer stages (arranged in a tree-like structure, viz. a clock tree) before reaching the dedicated CPU, graphics or PCIe sockets. All these units operate at different frequencies but are supposed to function in parallel. Therefore, the timing parameters involved in signal transmission are always a matter of concern if the best performance is to be extracted from the end product [3].



**Figure 1:** (a) Typical style of clock distribution inside a processor chip (b) clock tree design.

In fact, it is these buffers that play a crucial role in forming the clock tree of the clock distribution network (CDN), as shown in Fig. 1(b); the CDN outputs a synchronizing signal that coordinates the functioning of each circuit block inside the processor chip. The clock-tree buffers set the signal transmission delay along the branches of the tree so that signal timing is balanced at every node (or leaf, as indicated in Fig. 1(b)) connected to units such as the CPU, graphics or PCIe socket. However, the delay introduced by these buffer cells is constant, and differently sized buffers are usually required (i.e., the sizes of Buffer\_1  $\neq$  Buffer\_2  $\neq$  Buffer\_3 and so on) so that the clock arrival times across all sequential elements inside the CPU or GPU chip remain synchronized. Nowadays, however, CDN designers increasingly rely on variable delay cells (with proper control to adjust the delay) so that the clock trees inside the CDN are more versatile. Variable delay cells are also popular in other CDN components such as locked loops (both DLL and PLL), oscillators, frequency multipliers and dividers, and in many other systems-on-chip (SoCs). Clusters of these delay elements (i.e., delay lines) are likewise used to build SoC time measurement circuits (TMCs), which measure internal timing parameters of the chip [4-6]. Hence, delay circuits offering finely tuned delay values indirectly support the performance of TMCs. Nevertheless, designing such delay elements for modern SoCs is difficult because of the trade-offs among design specifications, and the concern grows when they are involved in the computational aspects of embedded systems [7]. Therefore, many researchers and circuit designers have invested in the development of different delay circuits.

### 1.1 Background of delay cell design

Although delay circuit design has been researched for quite a long time and the literature is extensive, we focus on basic design structures such as the transmission gate-controlled delay element (Trans-DE) [8, 9], the concatenated inverter-controlling delay element (viz., CI-DE) [8, 10] and the current-starved controlling delay element (viz., CS-DE) [8, 10]. Up to now, circuit modifications of delay cells have revolved around these primitives, and all of them produce delay by changing the physical dimensions of the devices used in the architecture. Nowadays, however, substantial research is invested in delay cell architectures with fixed dimensions that can generate variable delay values at the output. Such designs were pioneered with the Vernier Delay Line (VDL) [12, 13], presented in Fig. 2. It has many buffers connected along customised rows and columns. The delay introduced by a buffer equals one of two values, $\delta_1$ or $\delta_2$ ($\delta_1 \neq \delta_2$). The delay obtained at circuit node $(a, b)$ is $\delta_{a,b} = (a \times \delta_1) + (b \times \delta_2)$, referenced to the input cycle time 't'. The magnitude difference of $\delta_1$ and $\delta_2$ (i.e., $|\delta_1 - \delta_2|$) represents the adjustability of the delay in this design. As the buffers are typically designed in complementary metal-oxide-semiconductor (CMOS) technology, the input gate of every MOS device along the customised rows and columns serves as the knob to tune the delay value, which is not convenient, and the architecture is unnecessarily crowded.



**Figure 2:** Design style of the Vernier delay line [12].
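As a minimal sketch of the VDL node-delay relation, assuming the additive form $\delta_{a,b} = (a \times \delta_1) + (b \times \delta_2)$, the Python snippet below tabulates the delay at every node of a small grid. The buffer delays of 50 ps and 45 ps and the 4 × 4 grid are hypothetical values chosen for illustration only, and the reference to the input cycle time 't' is not modelled.

```python
"""Tabulate node delays of a Vernier delay line (illustrative sketch)."""


def vdl_node_delay(a: int, b: int, delta1: float, delta2: float) -> float:
    """Delay at node (a, b): a buffers of delay delta1 plus b buffers of delay delta2."""
    return a * delta1 + b * delta2


def delay_table(rows: int, cols: int, delta1: float, delta2: float) -> dict:
    """Map every node (a, b) of a rows x cols grid to its delay."""
    return {
        (a, b): vdl_node_delay(a, b, delta1, delta2)
        for a in range(rows)
        for b in range(cols)
    }


if __name__ == "__main__":
    DELTA1_PS, DELTA2_PS = 50.0, 45.0  # hypothetical buffer delays in ps
    table = delay_table(4, 4, DELTA1_PS, DELTA2_PS)
    for (a, b), delay in sorted(table.items()):
        print(f"node ({a}, {b}): {delay:.1f} ps")
    # Adjustability of the design as defined in the text: |delta1 - delta2|
    print(f"adjustability: {abs(DELTA1_PS - DELTA2_PS):.1f} ps")
```

With these values, nodes (1, 0) and (0, 1) differ by exactly the 5 ps adjustability step.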

In [14], the concept of the Voltage Controlled Delay Element (VC-DE) was presented. It has also been the foundation for designing digitally controlled and digitally programmable delay elements (DC-DE/DP-DE). From a design perspective, DC-DE is not much different from DP-DE, and both are treated as sub-classes of vector-controlled delay elements. In DC-DE or DP-DE, the delay value changes with the combination of input vectors [15-17]. In VC-DE, different bias/control voltages are typically employed to obtain variable delay values. However, DC-DE, VC-DE and DP-DE all centre on controlling the terminal voltages/currents of the MOS devices in the fundamental designs, viz., Trans-DE, CI-DE and sometimes CS-DE. The channel resistance when the device is ON ($R_{ON}$) and the logical gate capacitance ($C_{G'}$), given in equations (1) and (2), directly determine the propagation delay ($\tau = R_{ON} \times C_{G'}$) of the delay circuit [18, 19].

$$R_{ON} = \frac{1}{k(V_{GS} - V_{th})} \tag{1}$$

$$C_{G'} = \frac{\Delta Q_G}{V_{dd}} \tag{2}$$

Parameter 'k' comprises device-related terms, V<sub>GS</sub> is the gate-to-source voltage, V<sub>th</sub> is the threshold voltage of the MOS device, V<sub>dd</sub> is the power supply voltage, and $\Delta Q_G$ is the gate charge, which depends on V<sub>GS</sub> [20].
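Equations (1) and (2) can be combined numerically to show how the gate drive changes the delay. The Python sketch below evaluates $\tau = R_{ON} \times C_{G'}$ for a few values of V<sub>GS</sub>; the device factor k, the threshold voltage and the gate charge $\Delta Q_G$ are hypothetical values, and $\Delta Q_G$ is held constant even though the text notes that it depends on V<sub>GS</sub>.

```python
"""Propagation delay tau = R_ON * C_G' from equations (1) and (2) (illustrative)."""


def on_resistance(k: float, v_gs: float, v_th: float) -> float:
    """Equation (1): R_ON = 1 / (k * (V_GS - V_th)); defined only when V_GS > V_th."""
    overdrive = v_gs - v_th
    if overdrive <= 0:
        raise ValueError("V_GS must exceed V_th for the device to be ON")
    return 1.0 / (k * overdrive)


def gate_capacitance(delta_q_g: float, v_dd: float) -> float:
    """Equation (2): C_G' = delta_Q_G / V_dd."""
    return delta_q_g / v_dd


def propagation_delay(k: float, v_gs: float, v_th: float,
                      delta_q_g: float, v_dd: float) -> float:
    """tau = R_ON * C_G'."""
    return on_resistance(k, v_gs, v_th) * gate_capacitance(delta_q_g, v_dd)


if __name__ == "__main__":
    K = 1e-3             # hypothetical device factor k, A/V^2
    V_TH = 0.3           # hypothetical threshold voltage, V
    DELTA_Q_G = 1.1e-15  # hypothetical gate charge, C (C_G' = 1 fF at 1.1 V)
    V_DD = 1.1           # supply voltage used in this work, V
    for v_gs in (0.8, 0.9, 1.0, 1.1):
        tau = propagation_delay(K, v_gs, V_TH, DELTA_Q_G, V_DD)
        print(f"V_GS = {v_gs:.2f} V -> tau = {tau * 1e12:.3f} ps")
```

Raising V<sub>GS</sub> lowers R<sub>ON</sub> and hence $\tau$, which is the kind of gate-voltage control that vector-controlled delay elements rely on.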

The purpose of associating DC/DP techniques with delay circuits is to make the delay cell design robust and stable. The ability of these techniques to tune the delay determines how well they can generate variable delay at the output. It is therefore important to understand how well they suit the fundamental delay elements.

### 1.2 Consequences in the design of delay cell structures

During the literature survey, we concluded that DC/DP-DE implementation is more compatible with CI-DE and CS-DE than with Trans-DE. The reason can be seen in Fig. 3(a): the n-channel MOS (nMOS) $M_2$ and the p-channel MOS (pMOS) $M_1$ of the transmission gate ($T_G$) are ON most of the time to maintain signal integrity from the input ($V_{in}$) to the output (here, node 'P'), which results in significant power dissipation from $V_{dd}$. This calls into question the suitability of the Trans-DE cell design.



**Figure 3:** (a) CMOS based Schmitt trigger attached to TG (b) Design style of CI-DE.

CI-DE, one of the primitive delay-element architectures, comprises two CMOS inverters connected back-to-back, as depicted in Fig. 3(b). Its physical time delay is given by equation (3), where $C_A$ is the capacitance at node 'A' and $V_{out}$ is the voltage change at the output.

$$\tau = \frac{C_A V_{out}(t)}{I} \tag{3}$$

**Table 1:** Design analysis of various circuit approaches to variable delay.

| Circuit scheme | VDL<br>[12, 13] | Trans-DE<br>[8, 10] | CI-DE<br>[8, 10] | CS-DE<br>[8, 9] |
|---|---|---|---|---|
| Tuning approach | | Voltage-controlled | Vector-controlled | Vector-controlled |
| Pros | <ul><li>Constructed from a series of stable buffer cells.</li><li>Delivers a different delay at each output tap.</li></ul> | <ul><li>No issue with output signal strength.</li><li>Small circuit.</li><li>Voltage levels of $S$ and $\overline{S}$ help generate variable delay.</li></ul> | <ul><li>Simple CMOS-based design.</li><li>Symmetric architecture.</li><li>DC/DP technique generates variable delay.</li></ul> | <ul><li>Good adjustment of I<sub>ch</sub> and I<sub>dis</sub> (I<sub>ch</sub> ≈ I<sub>dis</sub>).</li><li>Implements the DC/DP technique.</li></ul> |
| Cons | <ul><li>Design is crowded with many redundant elements.</li><li>The delay value cannot be tuned.</li></ul> | <ul><li>Adjusting transistor sizes is difficult due to the device body effect.</li><li>Depends on proper generation of $S$ and $\overline{S}$.</li></ul> | <ul><li>Transistor sizes are difficult to adjust while maintaining symmetry, mostly because I<sub>ch</sub> ≠ I<sub>dis</sub>.</li></ul> | <ul><li>Requires current-mirror and bias circuits.</li></ul> |

Here, 'I' denotes the charging or discharging current (viz., $I_{ch}$ and $I_{dis}$, respectively) under steady-state input conditions. When these delay elements (specifically CI-DE) are used in on-chip sections such as the CDN, it is important that the output rise and fall times (viz., $t_{rise}/t_{fall}$, also referred to as rise/fall delay) of the delay element are almost equal. Without nearly symmetric rise/fall times, several adverse effects appear in on-chip signaling, such as unequal clock pulse widths, which vary the ON/OFF time as shown in Fig. 4.



**Figure 4:** Output signal depicted across the operation of CI-DE.

If $I_{ch} = I_{dis}$, t<sub>rise</sub> equals t<sub>fall</sub>, which results in equal ON and OFF times for a clock signal. In a CI-DE, $I_{ch} = I_{dis}$ cannot be guaranteed because it is a CMOS inverter-based design. Such a design has a pull-up section of pMOS transistors that charges the output load and a pull-down section of nMOS transistors that discharges it. Because nMOS transistors have higher charge-carrier mobility than pMOS transistors, the nMOS and pMOS dimensions in a CI-DE must differ to compensate: the pMOS transistors must have a greater channel width.

For this reason, CI-DE is not a symmetric architecture capable of delivering nearly balanced output timing components (viz. rise time and fall time). Even if the input signal has $t_{rise} = t_{fall}$, the CI-DE output fails to replicate this, and the effect is amplified along a long buffer chain. Since our concern is the delay elements of the CDN, it can be inferred that a CI-DE used inside a CDN will tend to drive on-chip sequential circuits incorrectly, especially level-triggered ones.
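To make the effect of a current mismatch concrete, the Python sketch below applies equation (3) separately to each edge, taking $t_{rise} = C_A V_{out}/I_{ch}$ and $t_{fall} = C_A V_{out}/I_{dis}$. The capacitance, voltage swing and currents are hypothetical, and the ON-time estimate rests on an assumed linear-edge model in which each 50% crossing lags the start of its edge by half the transition time.

```python
"""Rise/fall asymmetry predicted by equation (3), tau = C_A * V_out / I (illustrative)."""
from dataclasses import dataclass


@dataclass
class EdgeTimes:
    t_rise: float  # seconds
    t_fall: float  # seconds

    @property
    def mismatch(self) -> float:
        """|t_rise - t_fall|, the quantity a balanced delay cell should minimise."""
        return abs(self.t_rise - self.t_fall)


def edge_times(c_a: float, v_swing: float, i_ch: float, i_dis: float) -> EdgeTimes:
    """Apply equation (3) separately to the charging and discharging currents."""
    return EdgeTimes(t_rise=c_a * v_swing / i_ch, t_fall=c_a * v_swing / i_dis)


def high_time(period: float, edges: EdgeTimes) -> float:
    """ON time of a 50% duty-cycle clock after the cell.

    Assumes linear edges whose 50% crossing lags the start of each edge by half
    of its transition time, so the ON time shifts by (t_fall - t_rise) / 2.
    """
    return period / 2 + (edges.t_fall - edges.t_rise) / 2


if __name__ == "__main__":
    C_A, V_SWING, PERIOD = 2e-15, 1.1, 1e-9  # 2 fF (hypothetical), 1.1 V, 1 GHz clock
    for i_ch, i_dis in ((70e-6, 70e-6), (50e-6, 70e-6)):  # hypothetical currents
        edges = edge_times(C_A, V_SWING, i_ch, i_dis)
        print(f"I_ch={i_ch * 1e6:.0f} uA, I_dis={i_dis * 1e6:.0f} uA: "
              f"t_rise={edges.t_rise * 1e12:.2f} ps, t_fall={edges.t_fall * 1e12:.2f} ps, "
              f"ON time={high_time(PERIOD, edges) * 1e12:.2f} ps")
```

With equal currents the two transition times coincide and the ON time stays at half the period; with $I_{ch} < I_{dis}$ the slower rising edge shortens the ON time, which is the kind of pulse-width distortion discussed above.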

So, it is quite important that $I_{ch}$ and $I_{dis}$ are matched. For that, extra transistors are added to the CI-DE (viz., P3 and N3 in Fig. 5(a)), giving the design commonly referred to as CS-DE. P3 and N3 provide a current path from $V_{dd}$ so that $I_{ch}$ and $I_{dis}$ can be matched. However, CS-DE also contains P4 and N4, which limit current and restrict the supply voltage reaching the inverter formed by P2 and N2. This can even couple power-supply noise into the CS-DE output and degrade output signal integrity. To avoid this problem, the CS-DE structure is often improved as shown in Fig. 5(b).

**Figure 5:** Structure of (a) conventional CS-DE (b) redesigned CS-DE [17].

Initially, most nodes of this CS-DE (viz., M and N) are stuck-at logic '0', which turns P1 ON and therefore drives the output node 'out' high. This keeps P3 and N3 OFF at an early stage. Although the logic state of 'N4' depends on the input 'in', it does not affect the real-time transmission of 'in' to 'out'. This stability in transmission is due to a CMOS inverter together with an nMOS 'N1' at the output. Interestingly, this version of CS-DE provides matching rise/fall delay by tweaking the charging and discharging capacitances (viz., C<sub>o</sub>1 and C<sub>o</sub>2, respectively) at the output. Despite this, the problem $I_{ch} \neq I_{dis}$ still prevails (since the paths of C<sub>o</sub>1 and C<sub>o</sub>2 are different), and as a whole this affects the magnitudes of $t_{rise}$ and $t_{fall}$.

Above all, the prevalent DC/DP techniques [15-17] used to obtain different delay values may further widen the mismatch between $I_{ch}$ and $I_{dis}$, which ideally should be equal. In fact, this problem exists in almost all the kinds of delay circuits reviewed in Table 1. Very few circuit designers have looked into this aspect and tried to balance the rise delay of the output with its fall delay. Hence, our motivation is to design a new delay element with almost equal I<sub>ch</sub> and I<sub>dis</sub>, so that it generates near-symmetric output t<sub>rise</sub> and t<sub>fall</sub>. Besides, we use a DC/DP-based technique to generate variable delay with the proposed delay element. This technique can be thought of as a simple all-digital approach to producing variable delay at the output with near-symmetric t<sub>rise</sub>/t<sub>fall</sub>.

### 1.3 Organization of this article

This article is structured as follows: section 2 provides the justification for our proposed circuit design. Section 3 introduces the new delay element design, a simple mathematical model, and our alternative approach to the DC/DP technique. Section 4 describes the performance analysis of the whole circuit setup. Finally, section 5 concludes the work by restating the relevance of the proposed delay circuit design in modern processor systems.

## 2 Major Highlights

An efficient delay circuit design is only possible if the output exhibits almost equal $t_{rise}/t_{fall}$. So, in this article, we focus on a delay cell structure that efficiently produces varied input-to-output physical time delay by tuning the proposed alternative DC/DP technique, while its output signal features $t_{rise} \approx t_{fall}$.

The contents of this article are as follows:

- The need for variable delay cells in modern processors.
- Development of a new delay cell with nearly balanced output timing components (viz. rise time and fall time).
- An alternative to the DC-DE/DP-DE methodology for controlling the delay values of the proposed delay cell.
- Detailed schematic- and layout-level performance analysis of the proposed vector-controlled variable delay cell using a 90nm process design kit (PDK) [21].

## 3 The New Design of CI-DE

As mentioned earlier, current CI-DE designs are not capable of delivering equal t<sub>rise</sub> and t<sub>fall</sub> at the output. The major issue is the non-symmetric design of the pull-up network (PUN) and pull-down network (PDN) in CMOS inverters. Despite this, circuit designers have relied on the CI-DE structure and in most cases improved it by adding intermediate shunt capacitors, which adjust the symmetry between PUN and PDN [22]. However, fabricating these shunt capacitors in any deep sub-micron technology is difficult. A more effective solution is to embed MOS-based resistors and capacitors in the CI-DE design instead of shunt capacitors. This approach was first published in [20]. It is shown in Fig. 6(a), where a resistance (R<sub>1</sub>) and a capacitance (C) are placed adjacent to the inverter output, and another resistance (R<sub>2</sub>) is placed beneath the pull-down section. Nevertheless, R<sub>1</sub>, R<sub>2</sub> and C were not MOS-based cells, and using them was inefficient in terms of layout design.



**Figure 6:** (a) Inverter design from [23] (b) improved version of the circuit in Fig. 6(a), which is the basis for the new CI-DE.

The inverter design in Fig. 6(a) delivered a good amount of propagation delay, provided that $R_2=0\Omega$ (otherwise there were issues in establishing output logic level '0') and $R_1 \gg R_2$. Although the value of $R_1$ could be managed, setting $R_2$ to $0\Omega$ with MOS devices was technically quite difficult, since intrinsic parameters always affect the device ON resistance to some extent. To solve this problem, we modified the circuit in Fig. 6(a) by discarding $R_2$ and implementing $R_1$ and C as a MOS-based resistance and capacitance, respectively. Since nMOS devices are faster than pMOS devices [24, 25], we preferred nMOS-based implementations of the resistance and capacitance.

### 3.1 Mathematical model of delay estimation

Based on the circuit of Fig. 6(a), a different kind of CI-DE is obtained, as shown in Fig. 6(b). Equation (1) shows that $R_{ON}$ varies with changes in V<sub>GS</sub> and V<sub>th</sub>. In this design, the R<sub>ON</sub> of T3 and T7 can be varied through their common V<sub>GS</sub> (denoted by 'X' in Fig. 6(b)). Considering the first modified inverter in Fig. 6(b), let us assess the output voltage at node 'C' and the propagation delay incurred. While node 'C' switches from high to low, the nMOS 'T2' is in saturation. Therefore, the current through 'T2' is given by:

$$I_2 = \frac{1}{2} k'_n \left( \frac{W}{L} \right)_n (V_{in} - V_{Tn})^2 \tag{4}$$

In equation (4), $k'_n = \mu_n C_{ox}$, where $\mu_n$ is the electron mobility and $C_{ox}$ is the oxide capacitance per unit area; W/L is the aspect ratio of 'T2', and $V_{Tn}$ is the threshold voltage of the nMOS. Substituting $I_2$ into equation (3), i.e., writing the current as the discharge rate $-C_{T4}\,dV_C/dt$ of the capacitor at node 'C', gives:

$$-C_{T4} \frac{dV_C(t)}{dt} = \frac{1}{2} k'_n \left(\frac{W}{L}\right)_n \left(V_{in}(t) - V_{Tn}\right)^2$$

where $C_{T4}$ is the capacitance of the MOS capacitor 'T4' and $V_C$ is the potential at node 'C'.

$$\frac{dV_C(t)}{dt} = \frac{k'_n}{2C_{T4}} \left(\frac{W}{L}\right)_n \left(V_{Tn}^2 \times \left\{\frac{2 \times V_{in}(t)}{V_{Tn}} - 1\right\}\right) \tag{5}$$

In deriving equation (5), we have neglected the squared term $V_{in}^2(t)$ in the expansion of $(V_{in}(t)-V_{Tn})^2$. This condition can be taken into account when $V_{in} \ll V_{Tn}$ and 'T2' switches to cut-off. Now, the usual case is:

$$\frac{2 \times V_{in}(t)}{V_{Tn}} >> 1$$

So, equation (5) can be rewritten as:

$$\frac{dV_C(t)}{dt} = \frac{k'_n V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_n \times V_{in}(t) \tag{6}$$
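The two approximations that lead from equation (4) to equation (6) can be checked term by term. The Python sketch below is not a circuit simulation; it only confirms the bookkeeping, namely that equation (5) differs from the unapproximated slope by the neglected $V_{in}^2$ term and that equation (6) differs from equation (5) by the neglected $V_{Tn}^2$ term, each scaled by $k'_n (W/L)_n / (2C_{T4})$. All parameter values are hypothetical and are not taken from the 90nm PDK.

```python
"""Numerical check of the approximations leading from equation (4) to (6)."""


def slope_exact(v_in, v_tn, kp_n, w_over_l, c_t4):
    """dV_C/dt from -C_T4 dV_C/dt = (1/2) k'_n (W/L)_n (V_in - V_Tn)^2."""
    return -0.5 * kp_n * w_over_l * (v_in - v_tn) ** 2 / c_t4


def slope_eq5(v_in, v_tn, kp_n, w_over_l, c_t4):
    """Equation (5): the V_in^2 term of the expansion is neglected."""
    return kp_n / (2 * c_t4) * w_over_l * v_tn ** 2 * (2 * v_in / v_tn - 1)


def slope_eq6(v_in, v_tn, kp_n, w_over_l, c_t4):
    """Equation (6): additionally assumes 2 V_in / V_Tn >> 1."""
    return kp_n * v_tn / c_t4 * w_over_l * v_in


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    p = dict(v_in=1.1, v_tn=0.3, kp_n=300e-6, w_over_l=2.0, c_t4=1e-15)
    scale = p["kp_n"] / (2 * p["c_t4"]) * p["w_over_l"]
    # Neglecting V_in^2 adds scale * V_in^2 to the exact slope ...
    print(slope_eq5(**p) - slope_exact(**p), scale * p["v_in"] ** 2)
    # ... and dropping the "-1" in equation (5) adds scale * V_Tn^2.
    print(slope_eq6(**p) - slope_eq5(**p), scale * p["v_tn"] ** 2)
```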

Assuming that 'T4' is initially charged to voltage 'V<sub>0</sub>', it gradually discharges through the MOS resistance 'T3' (whose variable ON resistance 'R<sub>var</sub>' is set by the gate voltage 'X') and the fixed, finite resistance offered by 'T2' (denoted $R_{sat}$). Therefore, equation (6) can be rewritten as:

$$\frac{d}{dt}\left(V_0 \times e^{\frac{-t}{(R_{var} + R_{sat}) \times C_{T4}}}\right) = \frac{k'_n V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_n \times V_{in}(t) \tag{7}$$

Taking the Laplace transform of both sides of equation (7) with zero initial conditions gives:

$$V_0\left[\frac{s}{s+\frac{1}{(R_{var}+R_{sat})\times C_{T4}}}\right] = \frac{k'_n V_{Tn}}{C_{T4}}\left(\frac{W}{L}\right)_n \times V_{in}(s)$$

Substituting $s=j\omega$ and taking the modulus of $V_0$, the relation can be rewritten as:

$$\left|V_0\right| = \sqrt{1 + \frac{1}{\omega^2 (R_{var} + R_{sat})^2 C_{T4}^2}} \times \frac{k'_n V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_n \times V_{in}(\omega) \tag{8}$$
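To see how the resistance of 'T3' and the input frequency enter equation (8), the Python sketch below evaluates only its frequency-dependent factor, $\sqrt{1 + 1/(\omega^2 (R_{var}+R_{sat})^2 C_{T4}^2)}$; the remaining factor is a constant scale for a given device and is omitted. $R_{sat}$, $C_{T4}$ and the swept $R_{var}$ values are hypothetical.

```python
"""Frequency/R_var-dependent factor of equation (8) (illustrative sketch)."""
import math


def rc_factor(omega: float, r_var: float, r_sat: float, c_t4: float) -> float:
    """sqrt(1 + 1 / (omega^2 (R_var + R_sat)^2 C_T4^2)) from equation (8)."""
    omega_tau = omega * (r_var + r_sat) * c_t4
    return math.sqrt(1.0 + 1.0 / omega_tau ** 2)


if __name__ == "__main__":
    OMEGA = 2 * math.pi * 1e9    # 1 GHz input, as in the simulation setup
    R_SAT, C_T4 = 1e3, 1e-15     # hypothetical: 1 kOhm, 1 fF
    for r_var in (1e3, 5e3, 20e3, 100e3):  # hypothetical R_var values of 'T3'
        print(f"R_var = {r_var / 1e3:6.1f} kOhm -> factor = "
              f"{rc_factor(OMEGA, r_var, R_SAT, C_T4):.3f}")
```

The factor equals $\sqrt{2}$ when $\omega (R_{var}+R_{sat}) C_{T4} = 1$ and decreases toward 1 as $R_{var}$ or $\omega$ grows.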

Equation (8) models the voltage at output node 'C'. The crucial observation is that the output voltage is a function of the variable resistance of nMOS 'T3' and of the input signal frequency. Reconsidering equation (6) over the interval from 0 to $\tau$, it can also be written as:

$$V_C = \frac{k'_n V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_n \times V_{in} \times \int_0^{\tau} dt \tag{9}$$

where $\tau$ is the propagation delay. Evaluating the integral and solving equation (9) for $\tau$ gives equation (10):

$$\tau = \frac{V_C}{\frac{k'_n V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_n \times V_{in}} \tag{10}$$
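Equation (10) can be evaluated directly once the device terms are known. The Python sketch below does so for one set of hypothetical values ($V_C$ at half of the 1.1 V supply, a 1 fF MOS capacitor, $k'_n = 300\,\mu A/V^2$, $V_{Tn} = 0.3$ V, $(W/L)_n = 2$); the numbers illustrate the scaling of the model and are not simulation results from this article.

```python
"""Propagation delay from equation (10) (illustrative sketch)."""


def slope_eq6(kp_n: float, v_tn: float, c_t4: float, w_over_l: float, v_in: float) -> float:
    """Slope (k'_n V_Tn / C_T4) (W/L)_n V_in of equation (6), in V/s."""
    return kp_n * v_tn / c_t4 * w_over_l * v_in


def delay_eq10(v_c: float, kp_n: float, v_tn: float, c_t4: float,
               w_over_l: float, v_in: float) -> float:
    """Equation (10): tau = V_C / ((k'_n V_Tn / C_T4) (W/L)_n V_in)."""
    return v_c / slope_eq6(kp_n, v_tn, c_t4, w_over_l, v_in)


if __name__ == "__main__":
    # Hypothetical values: V_C at half of the 1.1 V supply, 1 fF MOS capacitor.
    tau = delay_eq10(v_c=0.55, kp_n=300e-6, v_tn=0.3, c_t4=1e-15,
                     w_over_l=2.0, v_in=1.1)
    print(f"tau = {tau * 1e12:.3f} ps")
```

In this model $\tau$ grows linearly with $C_{T4}$ and $V_C$ and shrinks as $k'_n$, $(W/L)_n$ or $V_{in}$ increase.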

We consider the design in Fig. 6(b) to be symmetric, i.e., the structural components from the input node 'in' to node 'C' are identical to those from node 'C' to node 'E'. All device dimensions and intrinsic parameters are set in accordance with the 90nm PDK, and the dimensions are adjusted to ensure that $I_1=I_2$ and $I_4=I_5$. Since our improved CI-DE is, ideally, a symmetric design, the currents through node 'C' ($I_3$) and node 'E' ($I_6$) can be correlated in magnitude. The signal at node 'C' is inverted when it reaches node 'E', with $t_{rise}=t_{fall}$. A CMOS buffer (with $t_{rise}=t_{fall}$) is attached at the end to enhance the range of delay. Therefore, the proposed CI-DE is capable of delivering an output signal with balanced rise and fall times.

### 3.2 Proposed System Architecture

The proposed CI-DE is incomplete without an alternative DC/DP technique that generates the values of the gate voltage 'X'. We propose a new circuit for setting the delay of our CI-DE, shown in Fig. 7. The resources used to construct it are taken from the 90nm PDK libraries. The proposed circuit comprises three circuit blocks:

- Potential Generator (PG).
- 8:1 Multiplexer (MUX).
- Proposed CI-DE module.

The PG unit generates different voltages from the supply voltage ' $V_{dd}$ '. These voltages are passed to node 'X' through an 8:1 MUX controlled by the select lines (S<sub>1</sub>, S<sub>2</sub> and S<sub>3</sub>).



**Figure 7:** Proposed architecture of the new CI-DE.

A significant part of the circuit in Fig. 7 is the PG unit, which generates 8 voltage levels: 780mV, 820mV, 840mV, 860mV, 900mV, 920mV, 970mV and 1V. These levels are chosen, given the simulation parameters in Table 2, so that the proposed delay cell generates a meaningful range of delay values.
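The select-line behaviour of the 8:1 MUX can be modelled as a lookup from the 3-bit vector S<sub>1</sub>S<sub>2</sub>S<sub>3</sub> to one of the eight PG levels listed above. In the Python sketch below, the ascending assignment ("000" selects 780 mV, ..., "111" selects 1 V, with S<sub>1</sub> as the most significant bit) is an assumption made for illustration; the article does not state which vector selects which level.

```python
"""Behavioural model of the 8:1 MUX that routes a PG level to node 'X'."""

# PG output levels in mV, as listed in the text.
PG_LEVELS_MV = (780, 820, 840, 860, 900, 920, 970, 1000)


def mux_select(vector: str) -> int:
    """Return the PG level (mV) selected by the 3-bit vector 'S1S2S3'.

    ASSUMPTION: the vector is read as a binary index with S1 as the MSB and the
    levels are assigned in ascending order; the article does not state the mapping.
    """
    if len(vector) != 3 or any(bit not in "01" for bit in vector):
        raise ValueError(f"expected a 3-bit vector such as '101', got {vector!r}")
    return PG_LEVELS_MV[int(vector, 2)]


if __name__ == "__main__":
    for index in range(8):
        vector = format(index, "03b")
        print(f"S1S2S3 = {vector} -> X = {mux_select(vector)} mV")
```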

**Table 2:** Simulation setup used in this work.

| Process technology | Temp. (°C) | V<sub>dd</sub> (V) | Input rise time (ps) | Input fall time (ps) | Input frequency (GHz) |
|---|---|---|---|---|---|
| Typical 90nm PDK [21] | 27 | 1.1 | 100 | 100 | 1 |

The PG unit generates the voltage levels based on the potential-divider principle. The resistors are made of polysilicon ('resnpoly'), whose sheet resistance is intrinsically high; such resistors are often used in MOS-based circuit designs [26]. The 'resnpoly' on the V<sub>dd</sub> side of the PG unit is fixed at 22 $\Omega$, and the resistor near the ground line of the PG is varied as listed in Table 3. The physical designs of the PG, MUX and CI-DE follow the definitions given in the 90nm PDK [21]. The layout of the proposed circuit is given in Fig. 8, and the estimated area is 1139.645µm<sup>2</sup> (the individual blocks occupy PG = 394.856µm<sup>2</sup>, 8:1 MUX = 702.159µm<sup>2</sup>, and the delayed-clock section, i.e., the proposed CI-DE module, = 42.63µm<sup>2</sup>).
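Each PG level follows from the potential-divider relation $V = V_{dd} R_{bottom}/(R_{top} + R_{bottom})$. The Python sketch below inverts that relation to find the ground-side resistor for each target level, using the 22 $\Omega$ top resistor and 1.1 V supply from the text. It assumes an ideal, unloaded two-resistor divider per level (ignoring MUX loading and resnpoly tolerance), so its numbers are idealised and are not the values in Table 3.

```python
"""Ideal two-resistor potential divider for one PG output level (illustrative)."""

V_DD = 1.1     # supply voltage in V (Table 2)
R_TOP = 22.0   # 'resnpoly' on the V_dd side, in ohm (from the text)


def divider_output(r_top: float, r_bottom: float, v_dd: float = V_DD) -> float:
    """Unloaded divider output: V = V_dd * R_bottom / (R_top + R_bottom)."""
    return v_dd * r_bottom / (r_top + r_bottom)


def bottom_resistor_for(v_target: float, r_top: float = R_TOP, v_dd: float = V_DD) -> float:
    """Ground-side resistor giving v_target: R_bottom = R_top * V / (V_dd - V)."""
    if not 0.0 < v_target < v_dd:
        raise ValueError("target must lie strictly between 0 V and V_dd")
    return r_top * v_target / (v_dd - v_target)


if __name__ == "__main__":
    for v in (0.78, 0.82, 0.84, 0.86, 0.90, 0.92, 0.97, 1.00):
        r = bottom_resistor_for(v)
        print(f"{v * 1e3:6.0f} mV -> R_bottom = {r:7.2f} ohm "
              f"(check: {divider_output(R_TOP, r):.3f} V)")
```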

**Table 3:** Resistance values used in the PG unit.



**Figure 8:** Layout of the proposed delay cell architecture using 90nm PDK.

For pre- and post-layout circuit simulation, commercial electronic design automation (EDA) tools such as Cadence Virtuoso<sup>®</sup> and Mentor Graphics Calibre<sup>®</sup> were used. The results of the transient analysis are shown in Fig. 9.

The circuit exhibits a greater delay in post-layout simulation. This can be considered an added advantage for the process technology used (i.e., 90nm PDK). However, the main concern is whether the circuit can generate equal  $t_{rise}/t_{fall}$  at its output.



**Figure 9:** Output signal obtained from pre- and post-layout simulation.

For this reason, we have plotted the rise time and fall time of the proposed delay cell output against the input vector combinations in Fig. 10(a) and 10(b). The difference between rise time and fall time in pre-layout simulation (denoted by ' $\Delta_1$ ') is much smaller than the difference obtained in post-layout simulation (denoted by ' $\Delta_2$ '). $\Delta_1$ is approximately 0 when "100" is the input vector, whereas $\Delta_2$ is approximately 0 for the input vector "111". This is mainly because the post-layout netlist includes the parasitics extracted with Calibre® PEX Runtime [27], which are absent from the pre-layout schematic. However, this is not a matter of concern, since sophisticated layout techniques [28-31] offer ways to avoid such design-level mismatch.

## 4 Estimation of circuit performance of the proposed delay cell

In this section, the performance analysis of our proposed circuit is presented in terms of rise and fall delay, their difference (denoted as ' $\Delta$ '), the average delay (t<sub>avg</sub>) and the power-delay product (PDP). The input vector used for simulation is "111".

**Figure 10:** Rise delay & fall delay obtained from (a) pre-layout and (b) post-layout simulations.

### 4.1 Circuit performance based on process variation and corner analysis

It is important to test the delay cell performance at various temperatures (T) and supply voltages ($V_{dd}$). The results are plotted in Fig. 11(a) and 11(b).

At low temperatures, the difference between rise delay and fall delay is negligible and the average delay is low. The average delay increases as the temperature rises. The balanced output rise/fall time is maintained for V<sub>dd</sub> variations within ±9.08%. Operating at room temperature (300 K) and V<sub>dd</sub> = 1.1V, the proposed delay cell delivers balanced rise and fall delay at the output together with an appropriate average delay.
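For reference, the supply window and operating temperature quoted above correspond to the following values (plain unit arithmetic on the stated numbers, using $\delta V_{dd}$ for the supply deviation to avoid confusion with $\Delta$):

$$
\delta V_{dd} = 0.0908 \times 1.1\ \text{V} \approx 0.10\ \text{V}
\;\Rightarrow\; V_{dd} \in [1.0\ \text{V},\ 1.2\ \text{V}],
\qquad
T = 300\ \text{K} = (300 - 273.15)\ ^{\circ}\text{C} = 26.85\ ^{\circ}\text{C}
$$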



**Figure 11:** $t_{rise}$, $t_{fall}$ and $t_{avg}$ as functions of (a) T (°C) and (b) $V_{dd}$.

**Table 4:** Proposed delay cell performance for 3 distinct process corners.

| Process Corner | Rise Delay (ps) | Fall Delay (ps) | Δ (ps) | Avg. Delay (ps) | PDP (fJ) |
|----------------|-----------------|-----------------|--------|-----------------|----------|
| FF             | 157.808         | 154.815         | 2.993  | 156.31          | 3.965    |
| TT             | 201.201         | 192.843         | 8.358  | 197.02          | 4.036    |
| SS             | 284.833         | 262.114         | 22.719 | 273.47          | 4.815    |

The post-layout performance of the presented delay circuit is simulated for 3 process corners (viz., Fast-Fast $\rightarrow$ 'FF', Typical-Typical $\rightarrow$ 'TT' and Slow-Slow $\rightarrow$ 'SS'). The results are displayed in Table 4, from which we note the following: (a) $\Delta$ in FF is 64.18% smaller than in TT, whereas $\Delta$ in TT is 63.21% smaller than in SS (i.e., $\Delta$ in SS is about 2.7 times that in TT); (b) the average delay in TT is 26.04% higher than in FF, and higher still in the SS corner; (c) as far as power dissipation is concerned, the PDP in the TT corner (4.036 fJ) stays within 2% of the lowest value, recorded in FF (3.965 fJ), while SS records the highest PDP (4.815 fJ).
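The percentages quoted above can be re-derived directly from Table 4. The short script below is only an arithmetic check on the tabulated values (it does not re-simulate anything); the average delay is assumed to be the mean of rise and fall delay, which reproduces the Avg. Delay column.

```python
# Post-layout values from Table 4 (delays in ps, PDP in fJ).
CORNERS = {
    "FF": {"rise": 157.808, "fall": 154.815, "pdp": 3.965},
    "TT": {"rise": 201.201, "fall": 192.843, "pdp": 4.036},
    "SS": {"rise": 284.833, "fall": 262.114, "pdp": 4.815},
}


def pct_change(value: float, reference: float) -> float:
    """Percentage change of `value` relative to `reference`."""
    return 100.0 * (value - reference) / reference


# Delta = rise - fall; average delay assumed to be (rise + fall) / 2.
delta = {c: v["rise"] - v["fall"] for c, v in CORNERS.items()}
avg = {c: (v["rise"] + v["fall"]) / 2.0 for c, v in CORNERS.items()}
pdp = {c: v["pdp"] for c, v in CORNERS.items()}

print(f"Delta, FF vs TT: {pct_change(delta['FF'], delta['TT']):+.2f}%")  # -64.19%
print(f"Delta, TT vs SS: {pct_change(delta['TT'], delta['SS']):+.2f}%")  # -63.21%
print(f"Delta, SS / TT:  {delta['SS'] / delta['TT']:.2f}x")              # 2.72x
print(f"Avg,   TT vs FF: {pct_change(avg['TT'], avg['FF']):+.2f}%")      # +26.04%
print(f"PDP,   TT vs FF: {pct_change(pdp['TT'], pdp['FF']):+.2f}%")      # +1.79%
print(f"PDP,   SS vs TT: {pct_change(pdp['SS'], pdp['TT']):+.2f}%")      # +19.30%
```

The FF/TT comparison of $\Delta$ comes out at 64.19%, which matches the reported 64.18% up to the last digit of rounding.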

## 4.2 Analysis of the proposed delay cell through Monte-Carlo simulation

This section reports the results of the Monte-Carlo simulation of the proposed delay cell under the nominal operating parameters of the TT process corner. All results are obtained from Cadence ADE XL<sup>®</sup>.



**Figure 12:** Monte-Carlo simulation results for (a) rise delay and (b) fall delay.

The histograms of the output rise and fall delay in Fig. 12(a) and 12(b) are based on data collected while all design parameters are varied randomly over 500 instances. Considering a 3$\sigma$ spread, the rise delay ranges from 172ps to 232ps, a variability of only 4.96%, whereas the fall delay shows a variability of 8.01% under the same statistical variations. The means of both metrics are nevertheless close to the values noted in the TT corner simulation, which supports the reliability of the design. The $\Delta$ delay is as small as $\pm$6.2%, further supporting the proposed delay cell configuration.
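The rise-delay figures can be related to each other under two assumptions that the text does not state explicitly: that the 172ps to 232ps range is the mean ± 3σ interval, and that "variability" means σ/μ. Under these assumptions the interval implies μ = 202 ps and σ = 10 ps, i.e., about 4.95%, close to the reported 4.96% (the small gap presumably reflects the actual sample statistics). The second part of the sketch draws 500 hypothetical Gaussian samples as a stand-in for the Monte-Carlo instances and summarizes them the same way; it does not reproduce the ADE XL data.

```python
import random
import statistics


def sigma_from_3sigma_range(lo: float, hi: float) -> tuple[float, float]:
    """Mean and sigma implied by a symmetric mean +/- 3*sigma interval [lo, hi]."""
    mean = (lo + hi) / 2.0
    sigma = (hi - lo) / 6.0
    return mean, sigma


def summarize(samples: list[float]) -> dict[str, float]:
    """Sample mean, sample standard deviation, sigma/mean and 3-sigma bounds."""
    mu = statistics.mean(samples)
    sd = statistics.stdev(samples)
    return {
        "mean": mu,
        "sigma": sd,
        "variability": sd / mu,
        "lo_3sigma": mu - 3.0 * sd,
        "hi_3sigma": mu + 3.0 * sd,
    }


# Reported rise-delay interval (ps).
mu, sigma = sigma_from_3sigma_range(172.0, 232.0)
print(f"mean = {mu:.1f} ps, sigma = {sigma:.1f} ps, sigma/mean = {100 * sigma / mu:.2f}%")
# mean = 202.0 ps, sigma = 10.0 ps, sigma/mean = 4.95%

# Hypothetical stand-in for 500 Monte-Carlo instances: Gaussian draws, fixed seed.
rng = random.Random(2021)
samples = [rng.gauss(mu, sigma) for _ in range(500)]
stats = summarize(samples)
print({k: round(v, 3) for k, v in stats.items()})
```

The implied mean of 202 ps sits close to the TT-corner rise delay of 201.201 ps in Table 4, in line with the observation above that the Monte-Carlo means track the TT corner.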

# 5 Conclusion

The proposed delay cell is designed by restructuring the primitive CI-DE architecture and adding resistors and capacitors at appropriate places so that equal rise and fall times are obtained at the output. The output delay can be adjusted by setting the gate voltages of the MOS-based resistors. The proposed delay circuit is tested in a 90nm PDK with a 1GHz input signal and V<sub>dd</sub> = 1.1V. The difference between rise and fall time is only 4.24% of the average delay incurred by the proposed circuit, and this delay ranges from 260ps to 360ps. These values can be increased further by incorporating long buffer chains. We conclude that the proposed vector-controlled variable delay cell is fit for its purpose.

# 6 Acknowledgments

The author(s) would like to acknowledge the insights of Dr. A.J. Mondal during this work.

# 7 Conflict of Interest

The authors declare no conflict of interest in preparing this article.

# 8 References

1. Dong, T., Dobrev, V., Kolev, T., Rieben, R., Tomov, S., & Dongarra, J. (2014, May). A step towards energy efficient computing: Redesigning a hydrodynamic application on CPU-GPU. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium (pp. 972-981). IEEE. https://doi.org/10.1109/IPDPS.2014.103
2. Chen, W., & Dömer, R. (2015). Out-of-order Parallel Discrete Event Simulation for Electronic System-level Design. Springer International Publishing. https://doi.org/10.1007/978-3-319-08753-5
3. Anju, C., & Pande, K. S. (2012). Low Power GALS Interface Implementation with Stretchable Clocking Scheme. International Journal of Computer Science Issues (IJCSI), 9(4), 209. https://www.ijcsi.org/papers/IJCSI-9-4-3-209-213.pdf
4. Dhirubhai, L. M., & Pande, K. S. (2019, July). Critical Path Delay Improvement in Logic Circuit Operated at Subthreshold Region. In 2019 International Conference on Communication and Electronics Systems (ICCES) (pp. 633-637). IEEE. https://doi.org/10.1109/ICCES45898.2019.9002233
5. Abas, M. A., Russell, G., & Kinniment, D. J. (2007). Embedded high-resolution delay measurement system using time amplification. IET Computers & Digital Techniques, 1(2), 77-86. https://doi.org/10.1049/iet-cdt:20060099
6. Abas, M. A., Russell, G., & Kinniment, D. J. (2007). Built-in time measurement circuits-a comparative design study. IET Computers & Digital Techniques, 1(2), 87-97. https://doi.org/10.1049/iet-cdt:20060111
7. Banerjee, A., & Das, D. K. (2016). A New Squarer design with reduced area and delay. IET Computers & Digital Techniques, 10(5), 205-214. https://doi.org/10.1049/iet-cdt.2015.0170
8. Mahapatra, N. R., Tareen, A., & Garimella, S. V. (2002). Comparison and analysis of delay elements. In Circuits and Systems, 2002. MWS-CAS-2002. The 2002 45th Midwest Symposium on (Vol. 2, pp. II-II). IEEE. https://doi.org/10.1109/MWSCAS.2002.1186901
9. Zhang, X., & Sridhar, R. (1994, September). CMOS wave pipelining using transmission-gate logic. In Proceedings Seventh Annual IEEE International ASIC Conference and Exhibit (pp. 92-95). IEEE. https://doi.org/10.1109/ASIC.1994.404602
10. Mahapatra, N. R., Garimella, S. V., & Tareen, A. (2000, April). An empirical and analytical comparison of delay elements and a new delay element design. In Proceedings IEEE Computer Society Workshop on VLSI 2000. System Design for a System-on-Chip Era (pp. 81-86). IEEE. https://doi.org/10.1109/IWV.2000.844534
11. Jovanović, G. S., & Stojčev, M. K. (2006). Current starved delay element with symmetric load. International Journal of Electronics, 93(3), 167-175. https://doi.org/10.1080/00207210600560078
12. Moyer, G. C., Clements, M., & Liu, W. (1996). Precise delay generation using the Vernier technique. Electronics Letters, 32(18), 1658-1659. https://doi.org/10.1049/el:19961149
13. Li, G. H., & Chou, H. P. (2007, November). A high resolution time-to-digital converter using two-level vernier delay line technique. In 2007 IEEE Nuclear Science Symposium Conference Record (Vol. 1, pp. 276-280). IEEE. https://doi.org/10.1109/NSSMIC.2007.4436330
14. Johnson, M. G., & Hudson, E. L. (1988). A variable delay line PLL for CPU-coprocessor synchronization. IEEE Journal of Solid-State Circuits, 23(5), 1218-1223.
15. Maymandi-Nejad, M., & Sachdev, M. (2003). A digitally programmable delay element: design and analysis. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 11(5), 871-878. https://doi.org/10.1109/TVLSI.2003.810787
16. Maymandi-Nejad, M., & Sachdev, M. (2005). A monotonic digitally controlled delay element. IEEE Journal of Solid-State Circuits, 40(11), 2212-2219. https://doi.org/10.1109/JSSC.2005.857370
17. Kobenge, S. B., & Yang, H. (2009). A power efficient digitally programmable delay element for low power VLSI applications. In Quality Electronic Design, 2009. ASQED 2009. 1st Asia Symposium on (pp. 83-87). IEEE. https://doi.org/10.1109/ASQED.2009.5206292
18. Sadhu, A., Bhattacharjee, P., & Koley, S. (2014). Performance Estimation of VLSI Design. Journal of VLSI Design Tools & Technology, 4(2), 59-66. https://doi.org/10.37591/jovdtt.v4i2.3167
19. Rajeswari, P., Shekar, G., Devi, S., & Purushothaman, A. (2018). Geometric Programming-Based Power Optimization and Design Automation for a Digitally Controlled Pulse Width Modulator. Circuits, Systems, and Signal Processing, 37(9), 4049-4064. https://doi.org/10.1007/s00034-017-0734-z
20. Nose, K., Chae, S. I., & Sakurai, T. (2000). Voltage dependent gate capacitance and its impact in estimating power and delay of CMOS digital circuits with low supply voltage (poster session). In Proceedings of the 2000 international symposium on Low power electronics and design (pp. 228-230). ACM. https://doi.org/10.1145/344166.344601
21. 90nm CMOS based Process Design Kit. https://www.themosisservice.com/products/fab-processes
22. Andreani, P., Bigongiari, F., Roncella, R., Saletti, R., & Terreni, P. (1999). A digitally controlled shunt capacitor CMOS delay line. Analog Integrated Circuits and Signal Processing, 18(1), 89-96. https://doi.org/10.1023/A:1008359721539
23. Mondal, A. J., Majumder, A., Bhattacharyya, B. K., & Chakraborty, P. (2017). A Process Aware Delay Circuit with Reduce Impact of Input Switching at GHz Frequencies. IEEE VLSI Circuits and Systems Letters, 3(2), 6-12. https://ieeecs-media.computer.org/media/technical-activities/tcvlsi/newsletters/2017/VLSI_Circuits_and_Systems_Vol-3_Issue-2_June2017.pdf
24. Kang, S. M., & Leblebici, Y. (2003). CMOS digital integrated circuits. Tata McGraw-Hill Education. https://www.amazon.in/dp/0071243429/ref=cm_sw_em_r_mt_dp_wiuSFb5M7YZ5M
25. Xiang, Q. (2003). U.S. Patent No. 6,600,170. Washington, DC: U.S. Patent & Trademark Office. https://patentimages.storage.googleapis.com/71/e0/22/fea13947c00b4a/US6600170.pdf
26. Roy, A., Ender, F., Azadmehr, M., Ta, B. Q., & Aasmundtveit, K. E. (2017, July). Design considerations of CMOS micro-heaters to directly synthesize carbon nanotubes for gas sensing applications. In 2017 IEEE 17th International Conference on Nanotechnology (IEEE-NANO) (pp. 828-833). IEEE. https://doi.org/10.1109/TNANO.2019.2961415
27. Quantus RC Extraction. https://www.cadence.com/content/cadence-www/global/en_US/home/tools/digital-design-and-signoff/silicon-signoff/quantus-extraction-solution.html
28. Saint, C., & Saint, J. (2002). IC mask design: Essential layout techniques. New York: McGraw-Hill. https://dl.acm.org/doi/abs/10.5555/1593630
29. Martin-Gonthier, P., Havard, E., & Magnan, P. (2010). Custom transistor layout design techniques for random telegraph signal noise reduction in CMOS image sensors. Electronics Letters, 46(19), 1323-1324. https://doi.org/10.1049/el.2010.1767
30. Megalingam, R. K., & Lal, L. S. (2014, April). Piezoresistive MEMS pressure sensors using Si, Ge, and SiC diaphragms: A VLSI layout optimization. In 2014 International Conference on Communication and Signal Processing (pp. 597-601). IEEE. https://doi.org/10.1109/ICCSP.2014.6949911
31. Geiger, R. L., Allen, P. E., & Strader, N. R. (1990). VLSI design techniques for analog and digital circuits (Vol. 90). New York: McGraw-Hill. https://cds.cern.ch/record/1544515



Copyright © 2021 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 17. 01. 2021; Accepted: 15. 04. 2021