### 128-BIT AREA-EFFICIENT CARRY SELECT ADDER

Saraswati, VLSI Design and Embedded Systems, VTU Regional Centre, Gulbarga, Karnataka, India.

Abstract: The Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform arithmetic functions. There is scope for reducing the area and power consumption of the CSLA. This work uses a simple gate-level modification to significantly reduce the area and power of the CSLA. Based on this modification, a CSLA architecture has been developed and compared with the regular CSLA architecture. The proposed design has reduced area and power compared with the regular CSLA, with only a slight increase in delay. This work evaluates the performance of the proposed design in terms of delay, area, power, and their products, both by hand with logical effort and through custom design and layout in a 0.18-µm CMOS process technology. The analysis of the results shows that the proposed CSLA structure is better than the regular CSLA.

#### I. INTRODUCTION

In VLSI system design, area- and power-efficient high-speed data-path logic is one of the major areas of research. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially, only after the previous bit position has been summed and a carry has propagated into the next position. The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum [1]. However, the regular CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate partial sums and carries for carry inputs $C_{in}=0$ and $C_{in}=1$; the final sum and carry are then selected by multiplexers (mux). The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with $C_{in}=1$ in the regular CSLA to achieve lower area and power consumption [2]-[4]. The main advantage of the BEC logic is that it is built from fewer logic gates than the n-bit Full Adder (FA) structure used in the RCA.

The rest of the paper is organized as follows. Section II deals with the delay and area evaluation methodology of the basic adder blocks. Section III presents the structure and function of the BEC logic. The square-root (SQRT) CSLA has been chosen for comparison with the proposed design because it has a more balanced delay and requires lower power and area [5], [6]. The delay and area evaluation methodology of the regular and modified CSLA are presented in Sections IV and V, respectively. The ASIC implementation details and results are analyzed in Section VI. Finally, the work is concluded in Section VII.



Fig. 1. XOR gate.



Fig. 2. Gate-level implementation of the 4-b BEC.



Fig. 3. 4-b BEC with 8:4 mux.

#### II. DELAY AND AREA EVALUATION METHODOLOGY OF THE BASIC ADDER BLOCKS

The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Fig. 1. Gates at the same level of the circuit operate in parallel, and the numeric label on each gate indicates the delay contributed by that gate. The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter gates, each having a delay of 1 unit and an area of 1 unit. The maximum delay of a logic block is obtained by adding up the number of gates on its longest path. The area is obtained by counting the total number of AOI gates required for the logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are evaluated and listed in Table I.
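The unit-delay, unit-area evaluation described above can be sketched in a few lines of Python. This is only an illustrative model: the netlists below are one common AOI decomposition of each block (an assumption, since the exact gate arrangement of the figures is not reproduced here), and the helper names `xor_gate`, `mux2`, `half_adder`, `full_adder`, `area`, and `delay` are hypothetical.

```python
# A netlist maps each gate output name to (gate_type, [input_names]).
# Any name that is not a key is treated as a primary input (arrival time 0).

def xor_gate(a, b, out):
    """AOI expansion of XOR: out = (a AND NOT b) OR (NOT a AND b)."""
    return {
        out + "_na": ("NOT", [a]),
        out + "_nb": ("NOT", [b]),
        out + "_p": ("AND", [a, out + "_nb"]),
        out + "_q": ("AND", [out + "_na", b]),
        out: ("OR", [out + "_p", out + "_q"]),
    }

def mux2(d0, d1, sel, out):
    """2:1 mux: out = (d0 AND NOT sel) OR (d1 AND sel)."""
    return {
        out + "_ns": ("NOT", [sel]),
        out + "_p": ("AND", [d0, out + "_ns"]),
        out + "_q": ("AND", [d1, sel]),
        out: ("OR", [out + "_p", out + "_q"]),
    }

def half_adder(a, b):
    net = xor_gate(a, b, "sum")
    net["carry"] = ("AND", [a, b])
    return net

def full_adder(a, b, cin):
    net = {}
    net.update(xor_gate(a, b, "x"))        # x = a XOR b
    net.update(xor_gate("x", cin, "sum"))  # sum = x XOR cin
    net["g"] = ("AND", [a, b])
    net["p"] = ("AND", ["x", cin])
    net["carry"] = ("OR", ["g", "p"])
    return net

def area(net):
    """Area = number of AND/OR/NOT gates, one unit each."""
    return len(net)

def delay(net):
    """Delay = number of gates on the longest input-to-output path."""
    memo = {}

    def depth(name):
        if name not in net:
            return 0
        if name not in memo:
            memo[name] = 1 + max(depth(i) for i in net[name][1])
        return memo[name]

    return max(depth(name) for name in net)

BLOCKS = {
    "XOR": xor_gate("a", "b", "y"),
    "2:1 mux": mux2("d0", "d1", "s", "y"),
    "HA": half_adder("a", "b"),
    "FA": full_adder("a", "b", "cin"),
}

for name, net in BLOCKS.items():
    print(f"{name}: delay={delay(net)}, area={area(net)}")
```

Under these assumed decompositions, the model gives an XOR area of 5, a mux area of 4, an HA area of 6, and an FA area of 13, which agrees with the per-block areas used in the group2 gate count of Section IV.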

#### III. BINARY TO EXCESS-1 CONVERTER

As stated above, the main idea of this work is to use a BEC instead of the RCA with $C_{in}=1$ in order to reduce the area and power consumption of the regular CSLA. To replace an n-bit RCA, an (n+1)-bit BEC is required. The structure and the function table of a 4-bit BEC are shown in Fig. 2 and Table II, respectively.
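The reason an (n+1)-bit BEC can stand in for an n-bit RCA with $C_{in}=1$ can be checked with a short behavioral sketch. The functions `bec` and `rca` below are hypothetical helpers written for illustration; they model the blocks arithmetically rather than at gate level. The point is that the BEC must also convert the carry-out of the $C_{in}=0$ adder, which is why it needs one more bit than the RCA it replaces.

```python
def bec(value, width):
    """Binary to Excess-1 conversion: add 1 and wrap to `width` bits."""
    return (value + 1) % (1 << width)

def rca(a, b, cin, n):
    """n-bit ripple-carry addition, returned as the (n+1)-bit word {carry, sum}."""
    return (a + b + cin) & ((1 << (n + 1)) - 1)

# For every pair of n-bit operands, the Cin=1 result equals the
# (n+1)-bit BEC applied to the Cin=0 result.
n = 2
for a in range(1 << n):
    for b in range(1 << n):
        assert rca(a, b, 1, n) == bec(rca(a, b, 0, n), n + 1)

# Function table of a 4-bit BEC, as (input, output) bit strings.
table = [(format(v, "04b"), format(bec(v, 4), "04b")) for v in range(16)]
for b_in, x_out in table:
    print(b_in, "->", x_out)
```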



Fig. 4. Regular 16-b SQRT CSLA.

TABLE I
DELAY AND AREA COUNT OF THE BASIC BLOCKS OF CSLA

| Adder block | Delay | Area |
|-------------|-------|------|
| XOR         | 3     | 5    |
| 2:1 Mux     | 3     | 4    |
| Half adder  | 3     | 6    |
| Full adder  | 6     | 13   |

Fig. 3 illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC together with the mux. One input of the 8:4 mux receives the direct inputs (B3, B2, B1, and B0), and the other input receives the BEC output. The two possible partial results are thus produced in parallel, and the mux selects either the BEC output or the direct inputs according to the control signal $C_{in}$. The importance of the BEC logic stems from the large silicon area reduction it offers when CSLAs with a large number of bits are designed. Writing the BEC outputs as $X_3$ to $X_0$, the Boolean expressions of the 4-bit BEC are:

$$X_0 = \overline{B_0}$$

$$X_1 = B_0 \oplus B_1$$

$$X_2 = B_2 \oplus (B_0 \cdot B_1)$$

$$X_3 = B_3 \oplus (B_0 \cdot B_1 \cdot B_2)$$
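The behavior of the 4-bit BEC with its 8:4 mux can be checked exhaustively with a small bit-level sketch. It assumes the usual increment-by-one logic for the BEC (one NOT, XOR gates, and AND terms over the lower bits); the output names `x3` to `x0` and the helpers `bec4`, `bec_with_mux`, and `to_int` are hypothetical and used only for illustration.

```python
def bec4(b3, b2, b1, b0):
    """Gate-level 4-bit BEC: returns (x3, x2, x1, x0) = B + 1 (mod 16)."""
    x0 = 1 - b0                    # NOT
    x1 = b0 ^ b1                   # XOR
    x2 = b2 ^ (b0 & b1)            # XOR of b2 with AND of lower bits
    x3 = b3 ^ (b0 & b1 & b2)
    return x3, x2, x1, x0

def bec_with_mux(bits, cin):
    """8:4 mux: pass the direct inputs when cin = 0, the BEC output when cin = 1."""
    return bec4(*bits) if cin else tuple(bits)

def to_int(bits):
    """Interpret a tuple of bits (MSB first) as an unsigned integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

# Exhaustive check over all 16 input words and both values of cin.
for v in range(16):
    bits = tuple((v >> i) & 1 for i in (3, 2, 1, 0))
    assert to_int(bec_with_mux(bits, 0)) == v
    assert to_int(bec_with_mux(bits, 1)) == (v + 1) % 16
```

Both partial results exist before `cin` is known, so in hardware only the mux delay is added once the carry arrives.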



#### IV. DELAY AND AREA ESTIMATION METHODOLOGY OF REGULAR 16-B CSLA

The structure of the 16-b regular SQRT CSLA is shown in Fig. 4. It has five groups of different-size RCAs. The delay and area estimation of each group are shown in Fig. 5, in which the numerals within specify the delay values, e.g., sum2 requires 10 gate delays. The steps leading to the evaluation are as follows.

1) Group2 [see Fig. 5(a)] has two sets of 2-b RCA. Based on the delay values of Table I, the selection input of the 6:3 mux arrives earlier than some of the RCA data outputs and later than others. The delay of each group2 output is therefore the later of its two arrival times (data input or selection input) plus the mux delay.

2) Except for group2, the arrival time of the mux selection input is always greater than the arrival time of the data outputs from the RCAs. Thus, the delay of group3 to group5 is determined by the arrival time of the selection input plus the mux delay.

3) One set of 2-b RCA in group2 has 2 FA and the other set has 1 FA and 1 HA. Based on the area count of Table I, the gate count of group2, including the three 2:1 muxes of its 6:3 mux, is 57 (FA = 39 (3 \* 13), HA = 6 (1 \* 6), Mux = 12 (3 \* 4)). When the set with 2 FA is replaced by a 3-b BEC (one AND, one NOT, and two XOR gates), as in the modified design of Section V, the gate count of group2 is determined as follows:

Gate count = 43 (FA + HA + Mux + BEC)

FA = 13 (1 \* 13)

HA = 6 (1 \* 6)

AND = 1

NOT = 1

XOR = 10 (2 \* 5)

Mux = 12 (3 \* 4).

4) Similarly, the maximum delay and area of the other groups in the regular SQRT CSLA are evaluated and listed in Table III.
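The group2 gate counts can be reproduced with a short tally. The per-block areas are those used in the breakdown above (FA 13, HA 6, 2:1 mux 4, XOR 5, AND 1, NOT 1). The composition of the regular group2 (3 FA, 1 HA, three 2:1 muxes) follows from the two 2-b RCAs and the 6:3 mux described in the text; the dictionary names and the `gate_count` helper are hypothetical.

```python
# Area of each building block in AOI gate units.
AREA = {"FA": 13, "HA": 6, "MUX": 4, "XOR": 5, "AND": 1, "NOT": 1}

def gate_count(parts):
    """Total AOI gate count for a dict of {block_name: quantity}."""
    return sum(AREA[name] * qty for name, qty in parts.items())

# Regular group2: RCA with Cin=1 (2 FA) + RCA with Cin=0 (1 FA, 1 HA) + 6:3 mux.
regular_group2 = {"FA": 3, "HA": 1, "MUX": 3}

# BEC-based group2: RCA with Cin=0 (1 FA, 1 HA) + 3-b BEC + 6:3 mux.
modified_group2 = {"FA": 1, "HA": 1, "MUX": 3, "XOR": 2, "AND": 1, "NOT": 1}

regular = gate_count(regular_group2)
modified = gate_count(modified_group2)
print("regular group2 :", regular)
print("modified group2:", modified)
print("gates saved    :", regular - modified)
```

The saving in this group comes entirely from replacing two full adders (26 gates) with a 3-b BEC (12 gates).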

#### V. DELAY AND AREA ESTIMATION METHODOLOGY OF MODIFIED 16-B CSLA

The structure of the proposed 16-b SQRT CSLA, which uses a BEC in place of the RCA with $C_{in}=1$ to minimize area and power, is shown in Fig. 6. The structure is again split into five groups. The delay and area estimates of the groups of the modified CSLA are listed in Table IV. Comparing Tables III and IV, it is clear that the proposed modified CSLA saves 113 gate-area units compared with the regular CSLA, at the cost of only 11 additional gate delays. To further evaluate the performance, we have resorted to ASIC implementation and simulation.
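As a functional check of the BEC-based structure, the following behavioral sketch models a 16-b modified SQRT CSLA at word level. The group widths (2, 2, 3, 4, 5) and the use of a single plain RCA for group1 are assumptions made for illustration, chosen so that five groups cover 16 bits with a 2-b group2; the function name `modified_csla_add` is hypothetical. The model only verifies arithmetic correctness and says nothing about delay or area.

```python
GROUP_WIDTHS = (2, 2, 3, 4, 5)  # assumed split of the 16 bits into five groups

def modified_csla_add(a, b, cin=0, widths=GROUP_WIDTHS):
    """Add two unsigned words group by group; returns (sum, carry_out)."""
    total, carry, shift = 0, cin, 0
    for index, w in enumerate(widths):
        mask = (1 << w) - 1
        a_g = (a >> shift) & mask
        b_g = (b >> shift) & mask
        if index == 0:
            # Group1 (assumed): a single RCA driven by the real carry-in.
            word = a_g + b_g + carry
        else:
            # RCA with Cin = 0 produces the (w+1)-bit word {carry, sum}.
            word0 = a_g + b_g
            # (w+1)-bit BEC produces the Cin = 1 candidate in parallel.
            word1 = (word0 + 1) & ((1 << (w + 1)) - 1)
            # Mux selects a candidate once the incoming carry is known.
            word = word1 if carry else word0
        total |= (word & mask) << shift
        carry = word >> w
        shift += w
    return total, carry

# Spot-check against ordinary integer addition.
for a in range(0, 1 << 16, 257):
    for b in range(0, 1 << 16, 263):
        s, c = modified_csla_add(a, b)
        assert s == (a + b) & 0xFFFF and c == (a + b) >> 16
```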

#### VI. ASIC IMPLEMENTATION RESULTS

The design proposed in this paper has been developed using Verilog-HDL and synthesized in Cadence RTL Compiler using typical libraries of the TSMC 0.18-µm technology. The synthesized Verilog netlist and its design constraints file (SDC) are imported into Cadence SoC Encounter and used to generate an automated layout from standard cells, including placement and routing [7]. Parasitic extraction is performed using Encounter's native RC extraction tool, and the extracted parasitic RC (SPEF format) is back-annotated to the Common Timing Engine in the Encounter platform for static timing analysis. For each word size of the adder, a value change dump (VCD) file is generated for all possible input conditions and imported into Cadence Encounter Power Analysis to perform the power simulations.

TABLE IV
DELAY AND AREA COUNT OF MODIFIED SQRT CSLA

| Group  | Delay | Area |
|--------|-------|------|
| Group2 | 13    | 43   |
| Group3 | 16    | 61   |
| Group4 | 19    | 84   |
| Group5 | 22    | 107  |



Fig. 7. Delay and area evaluation of modified SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. H is a Half Adder.

#### VII. CONCLUSION

An approach is proposed in this paper to reduce the area and power of the CSLA architecture. The reduced number of gates in this work offers a great advantage in the reduction of area and total power. The comparison results show that the modified SQRT CSLA has a slightly larger delay (only 3.70%), but the area and power of the 128-b modified CSLA are significantly reduced, by 17.6% and 15.8%, respectively. The power-delay product and the area-delay product of the proposed design decrease for all sizes, which indicates that the method succeeds overall rather than merely trading delay for power and area. The modified CSLA architecture is therefore low power, low area, simple, and efficient for VLSI hardware implementation.
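The claim about the products can be made concrete with a short calculation. It assumes that the three quoted percentages (3.70% delay increase, 17.6% area reduction, 15.8% power reduction) all refer to the same 128-b design; the variable names are illustrative.

```python
# Ratios of modified to regular CSLA, taken from the quoted percentages.
delay_ratio = 1 + 3.70 / 100   # delay increases by 3.70%
area_ratio = 1 - 17.6 / 100    # area decreases by 17.6%
power_ratio = 1 - 15.8 / 100   # power decreases by 15.8%

# A product of two quantities scales by the product of their ratios.
adp_ratio = area_ratio * delay_ratio
pdp_ratio = power_ratio * delay_ratio

print(f"area-delay product change : {(adp_ratio - 1) * 100:+.1f}%")
print(f"power-delay product change: {(pdp_ratio - 1) * 100:+.1f}%")
```

Under that assumption, the area-delay product falls by roughly 14.6% and the power-delay product by roughly 12.7%, so the small delay penalty is more than offset by the area and power savings.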

#### REFERENCES

- [1] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., 1962.
- [2] B. Ramkumar, H. M. Kittur, and P. M. Kannan, "ASIC implementation of modified faster carry save adder," Eur. J. Sci. Res., vol. 41, no. 1, 2009.
- [3] T. Y. Chang and M. J. Hsiao, "Carry-select adder using single ripple carry adder," Electron. Lett., vol. 34, no. 22, pp. 2101–2103, Oct. 1998.
- [4] Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 36, no. 9, May 2000.
- [5] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.
- [6] Y. He, C. H. Chang, and J. Gu, "An area efficient 64-bit square root carry-select adder for low power applications," in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082–4085.