# AREA-POWER EFFICIENT SHIFT REGISTER USING NON-OVERLAP DELAYED PULSED CLOCK

Gundavarapu Krishnaveni<sup>1</sup>, Gumadi Madhumathi<sup>2</sup>

<sup>1,2</sup>Department of Electronics and Communication Engineering

<sup>1</sup>PG Scholar, Dhanekula Institute of Engineering and Technology, Vijayawada, India

<sup>2</sup>Head of the Department, Dhanekula Institute of Engineering and Technology, Vijayawada, India

Abstract: Flip-flops are critical timing elements in digital circuits and have a large influence on circuit speed and power consumption. This paper presents a low-power, area-efficient shift register using pulsed latches, in which area and power consumption are reduced by replacing flip-flops with pulsed latches. The timing problem between pulsed latches is solved by using multiple non-overlap delayed pulsed clock signals instead of the conventional single pulsed clock signal. The shift register also keeps the number of pulsed clock signals small by grouping the latches into several sub shift registers and adding temporary storage latches. A 256-bit shift register using pulsed latches was simulated in a 0.18 µm CMOS process with VDD = 1.8 V. The core area is 160,000 µm², and the power consumption is 25.48 mW at a 100 MHz clock frequency. Compared to the conventional shift register with flip-flops, the proposed shift register occupies about 42% of the area and consumes about 41% less power.

Keywords: flip-flop, pulsed latch, pulsed clock, shift register, area efficient.

# I. INTRODUCTION

The growing importance of portable systems and the need to limit power consumption in Very Large Scale Integration (VLSI) chips have driven extensive innovation in low-power design in recent years. Because of the ever-increasing demand for portable electronics, designers strive for small silicon area, high speed, low power consumption, and reliability. Flip-flops are critical timing elements in digital circuits and have a large impact on circuit speed, area, and power consumption. A flip-flop is an electronic circuit that stores the logical state of its data input in response to a clock signal. D-type flip-flops account for a significant part of the total power dissipation of a system and are among the most fundamental building blocks in VLSI systems. A D flip-flop captures the value of its D input at a specific point of the clock cycle (for example, the rising clock edge), and the captured value becomes the Q output. Flip-flops are an essential part of many electronic devices because they form the basis of shift registers.

## Basic concept:

The architecture of a shift register is quite simple. An N-bit shift register consists of N D flip-flops connected in series. In a shift register, the speed of the flip-flop is less important than its area and power consumption because there is no logic between the flip-flops, so the smallest flip-flop is preferred to reduce area and power consumption. Recently, pulsed latches have replaced flip-flops in many applications [7]–[12] because a pulsed latch is much smaller than a flip-flop [2]–[5]. However, a pulsed latch cannot be used directly in a shift register because of the timing problem between pulsed latches. The proposed shift register solves this timing problem by using multiple non-overlap delayed pulsed clock signals instead of the conventional single pulsed clock signal. The rest of the paper is organized as follows: Section II describes the proposed shift register architecture, Section III presents the chip implementation, Section IV presents the simulation results, Section V describes an application, and Section VI concludes the paper.
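As a behavioral illustration of the series-connected structure described above, the following Python sketch models one clock edge of an N-bit D flip-flop shift register. It is an idealized, cycle-level model with no timing, and the function name `dff_shift` is hypothetical; index 0 stands for the first flip-flop.

```python
def dff_shift(state: list[int], data_in: int) -> list[int]:
    """One rising clock edge of an N-bit shift register built from D flip-flops.

    Every flip-flop samples its D input (the previous stage's Q) at the same
    edge, so all stages update simultaneously from the *old* values.
    """
    return [data_in] + state[:-1]


# Shift the bits 1, 0, 1, 1 into a 4-bit register that starts at all zeros.
reg = [0, 0, 0, 0]
for bit in [1, 0, 1, 1]:
    reg = dff_shift(reg, bit)
print(reg)  # [1, 1, 0, 1]
```

Because all flip-flops sample on the same edge, no stage ever sees a value produced in the same clock cycle; this is the property that pulsed latches lose.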

# II. ARCHITECTURE

This section describes the conventional and proposed shift register architectures and shows the simulated waveforms of the design implementation.

## Conventional method:

A shift register is a fundamental building block in VLSI circuits. Shift registers are widely used in applications such as digital filters, communication receivers, and image processing ICs. As image data sizes continue to grow with the demand for high-quality images, the word length of the shift registers in image processing ICs increases accordingly. A shift register is a sequential circuit formed by cascading flip-flops that share the same clock: the output of each flip-flop drives the data input of the next one, which produces the shift. The speed of the flip-flop is less important than its area and power consumption because there is no logic between the flip-flops. Since area and power consumption grow with the length of the shift register, they are the primary design concerns.



Fig 1: Master-slave flip-flop

Hence, to reduce area and power, the smallest flip-flop is chosen for the design. In the conventional method, the shift register is built by connecting master-slave flip-flops in series. Figure 1 shows the master-slave flip-flop.
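The edge-triggered behavior of a master-slave flip-flop can be sketched with two level-sensitive latches driven by opposite clock phases. This is a hypothetical behavioral model, not the transistor-level circuit of Figure 1; the classes `DLatch` and `MasterSlaveFF` are illustrative names, and a positive-edge-triggered flip-flop whose master latch is transparent while CLK is low is assumed.

```python
class DLatch:
    """Level-sensitive D latch: transparent while enable is high, holds otherwise."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, enable: int) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveFF:
    """Positive-edge flip-flop built from two latches on opposite clock phases."""

    def __init__(self) -> None:
        self.master = DLatch()  # transparent while CLK is low
        self.slave = DLatch()   # transparent while CLK is high

    def update(self, d: int, clk: int) -> int:
        m = self.master.update(d, enable=1 - clk)
        return self.slave.update(m, enable=clk)


ff = MasterSlaveFF()
# (D, CLK) samples: Q changes only when CLK goes from 0 to 1.
trace = [ff.update(d, clk) for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]]
print(trace)  # [0, 1, 1, 1, 0]
```

The third sample shows why two latches are needed: D falls while CLK is still high, but the frozen master latch prevents the change from reaching Q until the next rising edge.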

## Proposed method:

Master-slave flip-flops do not yield an optimized shift register design; area and power can be reduced by using pulsed latches instead. Hence the master-slave flip-flop, which uses two latches, can be replaced by a pulsed latch consisting of a single latch driven by a pulsed clock signal [3], as shown in Figure 2.



Fig 2: Pulsed latch

The main challenge in designing a low-power, area-efficient shift register with latches is to reduce area and power without introducing timing problems.

## Timing problem:

A pulsed latch cannot be used directly in a shift register because of a timing problem between consecutive latches. Figure 3 shows this timing problem in the shift register.



Fig 3: Shift register with latches and a pulsed clock signal

The output signal of the first latch (q1) changes correctly because the input signal of the first latch (d) is constant during the clock pulse width. However, the second latch has an uncertain output signal (q2) because its input signal (q1) changes during the clock pulse width, as shown in Figure 4.



Fig 4: Simulation waveforms for Shift register with latches and a pulsed clock signal.
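To make the race between latches concrete, the sketch below models a chain of latches that all share one pulsed clock. As a simplification, it assumes that every latch has a delay of exactly one time step and that the pulse stays high for `pulse_width` steps; `shared_pulse_shift` is a hypothetical name. With a one-step pulse the chain shifts correctly, but a wider pulse lets the new input ripple through several latches within a single pulse.

```python
def shared_pulse_shift(state: list[int], data_in: int, pulse_width: int) -> list[int]:
    """Apply one shared clock pulse to a chain of transparent latches.

    Assumed model: each latch has a delay of one time step, so while the
    pulse is high a latch output follows its input one step later.
    """
    q = list(state)
    for _ in range(pulse_width):
        # Every latch is transparent: its next output is the current output
        # of the previous latch (or the register input for the first latch).
        q = [data_in] + q[:-1]
    return q


start = [1, 0, 1, 0]
print(shared_pulse_shift(start, 0, pulse_width=1))  # [0, 1, 0, 1]  correct shift
print(shared_pulse_shift(start, 0, pulse_width=2))  # [0, 0, 1, 0]  data raced two stages
```

In the second call the bit stored in the first latch is lost, which corresponds to the uncertain q2 described above.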

## Solving timing problem:

This timing problem can be overcome in two ways:

- Add delay circuits between latches
- Use multiple non-overlap delayed pulsed clock signals

Add delay circuits between latches:



Fig 5: Shift register with latches, delay circuits, and a pulsed clock signal

One solution to the timing problem is to add delay circuits between latches, as shown in Figure 5. The output signal of each latch is delayed so that it reaches the next latch after the clock pulse. As shown in Figure 6, the output signals of the first and second latches (q1 and q2) change during the clock pulse width, but the input signals of the second and third latches (d2 and d3) become equal to q1 and q2 only after the clock pulse.



Fig 6: Simulation waveforms of shift register with latches, delay circuits, and a pulsed clock signal

As a result, all latches have constant input signals during the clock pulse, and no timing problem occurs between the latches. However, the delay circuits cause large area and power overheads.

Use multiple non-overlap delayed pulsed clock signals:



Fig 7: Shift register with pulsed latches and non\_overlapped delayed pulsed clock signal

Another solution is to use multiple non-overlap delayed pulsed clock signals, as shown in Figure 7. The delayed pulsed clock signals are generated by passing a pulsed clock signal through delay circuits. Each latch uses a pulsed clock signal that is delayed relative to the pulsed clock signal of the next latch, so each latch updates its data only after the next latch has updated its data. As a result, each latch has a constant input during its clock pulse, and no timing problem occurs between latches. However, this solution also requires many delay circuits. Figure 8 shows the simulation waveforms for this arrangement.
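The ordering that makes non-overlap delayed pulses work can be sketched in the same idealized style: the last latch is pulsed first and the first latch last, so each latch copies its predecessor before the predecessor changes. Each non-overlapping pulse is modeled as a single atomic update (an assumption that holds only if the pulses really do not overlap), and `delayed_pulse_shift` is a hypothetical name.

```python
def delayed_pulse_shift(state: list[int], data_in: int) -> list[int]:
    """One cycle of a latch chain clocked by non-overlap delayed pulses.

    Pulse order is latch N first, latch 1 last. Because only one latch is
    transparent at a time, each latch sees a constant input during its pulse.
    """
    q = list(state)
    for i in reversed(range(len(q))):           # latch N, N-1, ..., 1
        q[i] = data_in if i == 0 else q[i - 1]  # copy the not-yet-updated predecessor
    return q


start = [1, 0, 1, 0]
print(delayed_pulse_shift(start, 0))  # [0, 1, 0, 1], same as an ideal shift
```

The result matches a flip-flop shift register for every input, at the cost of one delayed pulse per latch.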



Fig 8: Simulation waveforms of shift register with pulsed latches and non\_overlapped delayed pulsed clock signal

To reduce the number of delay circuits while still avoiding the timing problem, the following design is proposed.

## Design implementation:

The proposed shift register is divided into M sub shift registers to reduce the number of delayed pulsed clock signals, as shown in Figure 9.



Fig 9: Proposed Shift register

When an N-bit shift register is divided into M K-bit sub shift registers, the number of delayed pulsed clock signals is K+1 and the number of latches is N+N/K. Each K-bit sub shift register consists of K+1 latches (K data latches and one temporary storage latch) and requires K+1 pulsed clock signals, which are shared by all sub shift registers. Because the number of sub shift registers is M = N/K and each has one temporary storage latch, N/K latches are added for temporary storage.
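The counting argument above can be checked with a few lines of arithmetic. The sketch below (the function name is hypothetical) computes M, the number of delayed pulsed clock signals, and the number of latches for given N and K; for N = 256 and K = 4 it gives 64 sub shift registers, 5 pulsed clock signals, and 320 latches, which agrees with the latch count reported in Table 2.

```python
def proposed_register_cost(n_bits: int, k: int) -> tuple[int, int, int]:
    """Return (M, pulsed clock signals, latches) for N bits split into K-bit sub shift registers."""
    if k <= 0 or n_bits % k:
        raise ValueError("N must be a positive multiple of K")
    m = n_bits // k          # number of sub shift registers, M = N/K
    clock_signals = k + 1    # CLK_pulse<1:K> plus CLK_pulse<T>, shared by all sub registers
    latches = n_bits + m     # N data latches plus one temporary storage latch per sub register
    return m, clock_signals, latches


for k in (1, 2, 4, 8, 16):
    print(k, proposed_register_cost(256, k))
```

The loop shows the trade-off: a larger K needs more pulsed clock signals but fewer temporary storage latches.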

A 4-bit sub shift register consists of five latches and performs shift operations with five non-overlap delayed pulsed clock signals (CLK\_pulse<1:4> and CLK\_pulse<T>). In 4-bit sub shift register #1, four latches store 4-bit data (Q1–Q4), and the last latch stores 1-bit temporary data (T1), which will be stored in the first latch (Q5) of 4-bit sub shift register #2. First, the pulsed clock signal CLK\_pulse<T> updates the latch data T1 from Q4. Then the pulsed clock signals CLK\_pulse<1:4> update the four latches sequentially from Q4 to Q1. Latches Q2–Q4 receive data from their preceding latches Q1–Q3, while the first latch Q1 receives data from the input of the shift register (IN). The other sub shift registers operate in the same way as sub shift register #1, except that their first latch receives data from the temporary storage latch of the previous sub shift register. The proposed shift register significantly reduces the number of delayed pulsed clock signals, but it increases the number of latches because of the additional temporary storage latches.
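The pulse sequence described above can be checked with a cycle-level sketch. The function `proposed_shift` (a hypothetical name) applies CLK\_pulse<T> first and then the K data pulses from the last position to the first in every sub shift register; each pulse is treated as an atomic update, assuming non-overlapping pulses. Comparing the result with an ideal shift shows that the temporary storage latches preserve the bit that crosses each sub shift register boundary.

```python
def proposed_shift(q: list[int], t: list[int], data_in: int, k: int) -> tuple[list[int], list[int]]:
    """One clock cycle of the proposed shift register.

    q holds the data latches Q1..QN (index 0 is Q1); t holds the temporary
    storage latches T1..TM, one per K-bit sub shift register.
    """
    q, t = list(q), list(t)
    m = len(q) // k
    # CLK_pulse<T>: every temporary latch captures the last bit of its sub register.
    for j in range(m):
        t[j] = q[j * k + k - 1]
    # CLK_pulse<K> ... CLK_pulse<1>: shift inside every sub register, last position first.
    for pos in reversed(range(k)):
        for j in range(m):
            idx = j * k + pos
            if pos > 0:
                q[idx] = q[idx - 1]   # take the not-yet-updated previous latch
            elif j == 0:
                q[idx] = data_in      # Q1 takes the register input IN
            else:
                q[idx] = t[j - 1]     # first latch takes the previous temporary latch
    return q, t


# 8-bit register made of two 4-bit sub shift registers, compared with an ideal shift.
q, t, ideal = [0] * 8, [0] * 2, [0] * 8
for bit in [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]:
    q, t = proposed_shift(q, t, bit, k=4)
    ideal = [bit] + ideal[:-1]
    assert q == ideal
print(q)  # [0, 1, 1, 1, 0, 0, 1, 0]
```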

# III. CHIP IMPLEMENTATION

This section covers the chip implementation of the Static Differential Sense Amp Shared Pulse Latch (SSASPL), the PowerPC-style Flip-Flop (PPCFF), and the clock pulse generator. In the conventional shift register, the maximum clock frequency is limited only by the delay of the flip-flops because there is no logic between the flip-flops. Therefore, area and power consumption matter more than speed when selecting the flip-flop. The proposed shift register uses latches instead of flip-flops to reduce area and power consumption.

## SSASPL:

The SSASPL uses the smallest number of transistors (7 transistors) [2] and it consumes the lowest clock power because it has a single transistor driven by the pulsed clock signal.



The SSASPL updates the data with three NMOS transistors and holds the data with four transistors in two cross-coupled inverters. It requires two differential data inputs (D and $D_b$) and a pulsed clock signal. When the pulsed clock signal is high, the data is updated: node Q or $Q_b$ is pulled down to ground according to the input data (D and $D_b$). The pull-down current of the NMOS transistors must be larger than the pull-up current of the PMOS transistors in the inverters so that the stored state can be overwritten.

## Schematic:

The SSASPL schematic is shown in Figure 11. The SSASPL was implemented and simulated in a 0.18 µm CMOS process at V<sub>DD</sub> = 1.8 V.



Fig 11: Schematic of SSASPL

## Layout:

The layout of the SSASPL is shown in Figure 12. The layout was extracted using the Cadence Virtuoso analog design tool.







## PPCFF:

Figure 14 shows the schematic of the PPCFF, which is a typical master-slave flip-flop [7]–[12] composed of two latches. The PPCFF consists of 16 transistors, 8 of which are driven by clock signals.



## Schematic:

The PPCFF schematic is shown in Figure 15. The PPCFF was implemented and simulated in a 0.18 µm CMOS process at V<sub>DD</sub> = 1.8 V.



Fig 15: Schematic of PPCFF

## Layout:

The layout of the PPCFF is shown in Figure 16.



Fig 16: Layout of PPCFF



Fig 17: Simulation waveforms of PPCFF

## Clock pulse generator:

The conventional delayed pulsed clock circuits in Figure 3 can be used to save the AND gates in the delayed pulsed clock generator in Figure 18. In the conventional delayed pulsed clock circuits, the clock pulse width must be larger than the sum of the rise and fall times of all inverters in the delay circuits to preserve the shape of the pulsed clock. In the delayed pulsed clock generator in Figure 18, however, the clock pulse width can be shorter than this sum because each sharp pulsed clock signal is generated from an AND gate and two delayed signals. Therefore, the delayed pulsed clock generator can produce shorter pulsed clock signals than the conventional delayed pulsed clock circuits.
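The idea of forming each sharp pulse from an AND gate and two delayed copies of the clock can be illustrated with a sampled-time sketch. The tap arrangement below (pulse i is the clock delayed by 2i taps AND the inverted clock delayed by 2i+1 taps), the ideal zero-delay gates, and the names `clk` and `delayed_pulses` are assumptions for illustration and may not match the exact connections of Figure 18. In this model the pulse width equals the delay between the two taps feeding each AND gate.

```python
def clk(t: int, period: int) -> int:
    """Square-wave clock: high during the first half of each period."""
    return 1 if t % period < period // 2 else 0


def delayed_pulses(t: int, period: int, tap_delay: int, n: int) -> list[int]:
    """Sample n delayed pulsed clocks at time t.

    Assumed structure: pulse i is the AND of the clock delayed by 2*i taps
    and the inverted clock delayed by 2*i + 1 taps, giving pulses of width
    tap_delay that start every 2 * tap_delay (a gap of tap_delay between them).
    """
    return [
        clk(t - 2 * i * tap_delay, period) & (1 - clk(t - (2 * i + 1) * tap_delay, period))
        for i in range(n)
    ]


# Five pulses for K = 4: index 0 plays CLK_pulse<T>, then CLK_pulse<4>, ..., CLK_pulse<1>.
waves = [delayed_pulses(t, period=100, tap_delay=2, n=5) for t in range(100)]
first_high = [next(t for t, w in enumerate(waves) if w[i]) for i in range(5)]
print(first_high)                   # [0, 4, 8, 12, 16]
print(max(sum(w) for w in waves))   # 1: at most one pulse is high at any time
```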

Fig 18: Delayed pulsed clock generator (outputs CLK\_pulse<T>, CLK\_pulse<4>, …, CLK\_pulse<1>)



The K+1 pulsed clock signals in Figure 19 are supplied to all sub shift registers. Each pulsed clock signal arrives at the sub shift registers at different times because of the pulse skew in the wire, and the pulse skew increases in proportion to the wire distance from the delayed pulsed clock generator. All pulsed clock signals have almost the same pulse skew when they arrive at the same sub shift register, so within one sub shift register the pulse skew differences between the pulsed clock signals are very small. Making the clock pulse intervals larger than these pulse skew differences cancels out their effect. The pulse skew differences between different sub shift registers do not cause any timing problem either, because the two latches connecting two sub shift registers use the first and last pulsed clocks (CLK\_pulse<T> and CLK\_pulse<1>), which are separated by a long clock pulse interval.
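The skew argument above reduces to a simple timing check: two pulses stay non-overlapping as long as their skew difference is smaller than their start-to-start spacing minus the pulse width. The sketch below uses hypothetical values (pulse width 2 and interval 4, in arbitrary time units) and hypothetical function names; it is not derived from the simulated circuit. The larger margin between CLK\_pulse<T> and CLK\_pulse<1> illustrates why the latches at sub shift register boundaries tolerate larger skew differences.

```python
def overlap_free(starts: list[float], width: float) -> bool:
    """True if equal-width pulses, listed in firing order, never overlap."""
    return all(a + width <= b for a, b in zip(starts, starts[1:]))


def skew_margin(interval: float, width: float, pulses_apart: int) -> float:
    """Largest skew difference two pulses `pulses_apart` positions apart can tolerate."""
    return pulses_apart * interval - width


WIDTH, INTERVAL, K = 2.0, 4.0, 4
nominal = [i * INTERVAL for i in range(K + 1)]   # CLK_pulse<T>, <4>, <3>, <2>, <1>

# Inside one sub shift register every pulse sees nearly the same wire skew.
skews = [1.0, 1.3, 0.8, 1.2, 1.1]
arrived = [s + d for s, d in zip(nominal, skews)]
print(overlap_free(arrived, WIDTH))              # True

print(skew_margin(INTERVAL, WIDTH, 1))           # 2.0 between adjacent pulses
print(skew_margin(INTERVAL, WIDTH, K))           # 14.0 between CLK_pulse<T> and CLK_pulse<1>
```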



Fig 19: Simulation result of delayed pulsed clock generator

# IV. SIMULATION RESULTS

The designs presented in this paper were developed using Cadence Virtuoso.

All shift register schematics were drawn in the Virtuoso Schematic Editor, and the simulations were run in the Virtuoso Analog Design Environment (ADE L) to generate the waveforms.

Two 256-bit shift registers, one using the SSASPL and one using the PPCFF, were implemented to show the effectiveness of the proposed shift register.

## 256-bit shift register using pulsed latch:

The SSASPL uses 7 transistors, the smallest number of transistors among the pulsed latches [8]–[10]. The schematic of the 256-bit shift register is shown in Figure 20.





Fig 20: schematic of 256-bit shift register using pulsed latch

## Layout:

Figure 21 shows the layout of the 256-bit shift register using pulsed latches. The area occupied is 160,000 µm². The power consumption is 25.48 mW at a frequency of 100 MHz.



Fig 21: Layout of 256-bit shift register using pulsed latch

## Simulated waveform:



Fig 22: simulated waveform of 256-bit shift register using pulsed latch

## 256-bit shift register using flip-flop:

The PPCFF uses 16 transistors, which is the smallest number of transistors among the flip-flops, and 8 of them are driven by clock signals (Figure 15). For a fair comparison, the flip-flop uses minimum-size transistors.

## Schematic:



## Layout:

Figure 24 shows the layout of the 256-bit shift register using flip-flops. The area occupied is 380,000 µm². The power consumption is 43.26 mW at a frequency of 100 MHz.



Fig 24: layout of 256-bit shift register using flip-flop

## Simulated waveform:



Fig 25: Simulated waveform of 256-bit shift register using flip-flop

## Performance comparison:

Table 1 compares the SSASPL and PPCFF cells in terms of area, power, and number of transistors. Table 2 compares the two shift registers using flip-flops and latches.

TABLE 1: Comparison between SSASPL and PPCFF

| Type                                | PPCFF                              | SSASPL                                                               |
|-------------------------------------|------------------------------------|----------------------------------------------------------------------|
| Number of transistors (total)       | 16                                 | 7                                                                    |
| Number of clock-driven transistors  | 8                                  | 1                                                                    |
| Area                                | 456.62 µm²                         | 65.4682 µm²                                                          |
| Power @ 100 MHz                     | 1447.7 µW (1.4477 mW)              | 864.6 µW                                                             |
| Transistor sizes (W/L)              | NMOS = 0.5/0.18<br>PMOS = 1/0.18   | M1–M3 = 1/0.18<br>Inverter NMOS = 0.5/0.18<br>Inverter PMOS = 1/0.18 |

TABLE 2: Comparison between the two shift registers

| Type                                     | SSASPL                   | PPCFF                    |
|------------------------------------------|--------------------------|--------------------------|
| Word length of shift register (N)        | 256                      | 256                      |
| Word length of sub shift register (K)    | 4                        | No division              |
| Total number of latches / flip-flops     | 320                      | 256                      |
| Area                                     | 160,000 µm² (0.16 mm²)   | 380,000 µm² (0.38 mm²)   |
| Power @ 100 MHz                          | 25.48 mW                 | 43.26 mW                 |

ISSN (Online): 2347-4718
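The relative savings implied by the tables can be computed directly from the reported values. The short Python sketch below only reuses the numbers in Tables 1 and 2, and the variable names are illustrative. From Table 2, the latch-based register occupies about 42% of the flip-flop register's area (a reduction of about 58%) and consumes about 41% less power at 100 MHz.

```python
# Values copied from Tables 1 and 2.
latch_area_um2, ff_area_um2 = 160_000, 380_000
latch_power_mw, ff_power_mw = 25.48, 43.26
cell_area_ratio = 65.4682 / 456.62      # SSASPL cell area / PPCFF cell area
cell_power_ratio = 864.6 / 1447.7       # SSASPL cell power / PPCFF cell power

area_ratio = latch_area_um2 / ff_area_um2
power_saving = 1 - latch_power_mw / ff_power_mw

print(f"area ratio        {area_ratio:.1%}")      # 42.1%
print(f"area saving       {1 - area_ratio:.1%}")  # 57.9%
print(f"power saving      {power_saving:.1%}")    # 41.1%
print(f"cell area ratio   {cell_area_ratio:.1%}")
print(f"cell power ratio  {cell_power_ratio:.1%}")
```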

# V. APPLICATION

## Universal shift register:

A universal shift register is an integrated logic circuit that can transfer data in three different modes. Like a parallel register, it can load and transmit data in parallel. Like a shift register, it can load and transmit data serially, through left shifts or right shifts. In addition, the universal shift register can combine the capabilities of parallel and shift registers to accomplish tasks that neither basic type of register can perform on its own. For instance, a universal register can load data serially (e.g., through a sequence of left shifts) and then output the data in parallel. To operate in a specific mode, the universal shift register must first select that mode using two select inputs, S1 and S0. As shown in Table 3, each combination of the select inputs corresponds to an operating mode.

Table 3: Operating modes of the universal shift register

| Operating mode   | S1 | S0 |
|------------------|----|----|
| Locked (hold)    | 0  | 0  |
| Shift right      | 0  | 1  |
| Shift left       | 1  | 0  |
| Parallel loading | 1  | 1  |
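The mode selection in Table 3 can be sketched as a behavioral model of one clock edge. The sketch assumes a common convention (index 0 is the leftmost bit, a right shift takes a new bit in on the left, a left shift takes a new bit in on the right); the function and argument names are hypothetical and are not taken from the schematic in Figure 26.

```python
from typing import Optional


def universal_step(q: list[int], s1: int, s0: int, serial_right: int = 0,
                   serial_left: int = 0, parallel: Optional[list[int]] = None) -> list[int]:
    """Next state of a universal shift register after one clock edge.

    Modes follow Table 3: (S1, S0) = 00 hold, 01 shift right, 10 shift left,
    11 parallel load. Index 0 is the leftmost bit.
    """
    if (s1, s0) == (0, 0):
        return list(q)                      # locked: keep the stored data
    if (s1, s0) == (0, 1):
        return [serial_right] + q[:-1]      # shift right, new bit enters on the left
    if (s1, s0) == (1, 0):
        return q[1:] + [serial_left]        # shift left, new bit enters on the right
    if parallel is None or len(parallel) != len(q):
        raise ValueError("parallel load needs one bit per stage")
    return list(parallel)                   # parallel load


reg = universal_step([0, 0, 0, 0], 1, 1, parallel=[1, 0, 1, 1])  # load 1011
reg = universal_step(reg, 0, 1, serial_right=0)                  # shift right -> 0101
reg = universal_step(reg, 1, 0, serial_left=1)                   # shift left  -> 1011
print(reg)  # [1, 0, 1, 1]
```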

Schematic of universal shift register:



Fig 26: Schematic of universal shift register



Fig 27: simulated waveforms of universal shift register

# VI. CONCLUSION AND FUTURE SCOPE

The area and power consumption of the shift register are reduced by replacing flip-flops with pulsed latches. A 256-bit shift register was implemented in a 0.18 µm CMOS technology with V<sub>DD</sub> = 1.8 V at a clock frequency of 100 MHz. The 256-bit shift register using pulsed latches consumes 25.48 mW and occupies 160,000 µm²; compared to the conventional shift register using flip-flops, it occupies about 42% of the area and consumes about 41% less power. Because the shift register is a basic building block in encryption hardware, using the proposed shift register in place of a conventional shift register in encryption circuits could reduce their area and power consumption.

# REFERENCES

- [1]. B.-D. Yang, "Low-power and area-efficient shift register using pulsed latches," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 6, Jun. 2015.
- [2]. S. S. Khot and Pedgaonkar Snehal, "Design and implementation of area efficient and low power shift register using pulsed latch," International Journal of Electrical and Electronics Engineering Research (IJEEER).
- [3]. S. Heo, R. Krashinsky, and K. Asanovic, "Activity-sensitive flip-flop and latch selection for reduced energy," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 9, pp. 1060–1064, Sep. 2007.
- [4]. S. Naffziger and G. Hammond, "The implementation of the next-generation 64 b Itanium microprocessor," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2002, pp. 276–504.
- [5]. H. Partovi et al., "Flow-through latch and edge-triggered flip-flop hybrid elements," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 1996, pp. 138–139.
- [6]. E. Consoli, M. Alioto, G. Palumbo, and J. Rabaey, "Conditional push-pull pulsed latch with 726 fJ·ps energy delay product in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2012, pp. 482–483.
- [7]. V. Stojanovic and V. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 536–548, Apr. 1999.
- [8]. J. Montanaro et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor," IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
- [9]. S. Nomura et al., "A 9.7 mW AAC-decoding, 620 mW H.264 720p 60fps decoding, 8-core media processor with embedded forward-body-biasing and power-gating circuit in 65 nm CMOS technology," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2008, pp. 262–264.
- [10]. Y. Ueda et al., "6.33 mW MPEG audio decoding on a multimedia processor," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2006, pp. 1636–1637.
- [11]. B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop for statistical power reduction," IEEE J. Solid-State Circuits, vol. 36, pp. 1263–1271, Aug. 2001.
- [12]. C. K. Teh, T. Fujita, H. Hara, and M. Hamada, "A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40 nm CMOS," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2011, pp. 338–339.