DESIGN OF AN AREA EFFICIENT MOTION ESTIMATION ARCHITECTURE THROUGH VERILOG HDL

Motapalukula Rajesh¹, Dr. V. Padmanabha Reddy², Mrs. K. Ragini³
¹PG scholar, ²Professor, ³Associate Professor
¹Dept. of ECE (VLSI), CMRIT, Kandlakoya (V), Medchal Road, Hyderabad, TS, India
²³Dept. of ECE, CMRIT, Kandlakoya (V), Medchal Road, Hyderabad, TS, India

Abstract: Variable block size motion estimation in H.264 is a key technique for removing inter-frame redundancy. However, it requires a large memory bandwidth and has high computational complexity. This paper therefore proposes an efficient sub-pixel search algorithm that reduces computational complexity and block utilization, together with a novel VLSI architecture for this algorithm that simplifies variable block size motion estimation. Compared with existing methods, which have negative effects on compression, the proposed method is more efficient in terms of chip area, operating frequency and throughput. The proposed sub-pixel search architecture reduces the number of search pixels of full-pixel motion estimation by around 70% and the chip area by around 40% compared with other search algorithms. In addition, an optimized motion vector (MV) prediction algorithm is used to remove data dependency, and optimized storage policies are used to save hardware resources. The proposed sub-pixel search architecture operates at 200 MHz with a gate count of 530k and supports the 1920×1080 high-definition television (HDTV) format.

Key-Words: Sub-pixel search, Systolic array, H.264 Encoder, Motion Estimation, 1080P, HDTV, VLSI.

I. INTRODUCTION

The H.264 Advanced Video Coding (AVC) standard is an ITU-T standard for encoding and decoding video, with a target coding efficiency twice that of H.263 at comparable quality [1]. The growing number of video services and the popularity of HDTV are creating a strong need for higher coding efficiency. An important coding tool of H.264 is the variable block size matching algorithm used for motion estimation (ME), which is part of the prediction step [2]. Because H.264 uses variable block size motion estimation (VBSME), multiple reference frame motion compensation (MC), Lagrangian rate-distortion optimization (RDO) and other advanced coding techniques, the inter-frame motion estimation process, consisting of integer-pixel motion estimation (IME) and fractional-pixel motion estimation (FME), takes up more than 70% of the total encoding time. Integer-pixel motion estimation is therefore the bottleneck of a hardware H.264 encoder. The key problem for an HDTV H.264 encoder is the limited memory access bandwidth. The JM reference software provides several search techniques, for example full-search motion estimation, UMHexagon search and NTSS search, of which the fast ones reduce complexity. At the same time, many hardware architectures have been proposed; researchers who have studied hardware implementations of full-pixel search motion estimation include Anchao Tsai, Mohammed Sayed and Weifeng He. In one such paper, the authors present an efficient architecture based on search point reduction for HDTV variable block size ME in H.264/AVC. The architecture is implemented with a 2-D systolic array, which increases the coding speed at the expense of hardware cost. The 2-D systolic array improves data reuse for the pixel sum of absolute differences (SAD) computation, but its complexity increases the number of control circuits.
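The SAD criterion and the full-search procedure referred to above can be sketched in Python as follows. This is a software model only, not the hardware architecture: it assumes 8-bit luma frames stored as nested lists, a square n × n block and a search range of ±p pixels, and the function names `sad` and `full_search` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int, p: int
                ) -> Tuple[Tuple[int, int], Optional[int], int]:
    """Evaluate every displacement in [-p, p] x [-p, p] that keeps the candidate
    inside ref; return (best MV, best SAD, number of candidates evaluated)."""
    height, width = len(ref), len(ref[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_sad: Optional[int] = None
    points = 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
                continue  # candidate block would fall outside the reference frame
            points += 1
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_sad is None or cost < best_sad:  # keep the first minimum found
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad, points


# Usage: the current frame is the reference frame displaced so the true MV is (3, -2).
rng = random.Random(1)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [[ref[(y - 2) % 32][(x + 3) % 32] for x in range(32)] for y in range(32)]
mv, best, points = full_search(cur, ref, bx=8, by=8, n=8, p=4)
print(mv, best, points)
```

The exhaustive double loop is what makes full search regular and easy to map to hardware, and also what makes it expensive: with p = 16, every block requires (2·16 + 1)² = 1089 SAD evaluations.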

Other authors studied a parallel-pipelined architecture based on the full-search block matching algorithm, consisting of two main parts, a SAD computing part and a SAD comparing part, with pipeline registers between them and a control unit to control their operation. Pipelining and a reduced supply voltage lower the power consumption and simplify the control circuit, but the drawback of this technique is a low data reuse rate when reading data from storage. The full-search algorithm exhaustively evaluates all candidate blocks within a search window to find the best match, and therefore has enormous complexity. Many fast search algorithms have been presented to reduce motion estimation complexity, but none of them is a perfect solution. One paper uses a fast ME algorithm called HMDS to reduce bandwidth, but the hardware implementation of HMDS requires more logic circuits. A modified three step search (TSS) algorithm has been used to reduce the computational cost and memory access of the motion estimation part. Such fast ME algorithms dramatically reduce the number of search points, but their lack of regularity lowers the efficiency of the VLSI architecture. Consequently, most VLSI implementations of motion estimation adopt the full-search mode for its regular design. However, such full-search chips are not suitable for portable systems because of their higher bandwidth and power consumption at HDTV resolution.

This paper proposes an efficient sub-pixel search algorithm for variable block size motion estimation. The efficiency of the sub-pixel search combined with a simplified predicted MV is verified for H.264/AVC encoders. We found that sub-pixel search ME can reduce hardware consumption by around 40% compared with the JM reference software, with negligible loss of video quality. To realize the sub-pixel search algorithm, the VBSME architecture is designed using a 1-D systolic array, which allows the proposed architecture to compute the optimal MV more efficiently than the existing architectures found in the literature. The proposed VBSME architecture, together with its input memory array, computes the sum of absolute differences (SAD) and the Lagrangian cost function. Simulation results demonstrate that the proposed scheme achieves better coding performance than conventional architectures.
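A common way to make VBSME hardware-friendly is to compute only the sixteen 4x4 SADs of a macroblock and merge them into the SADs of all larger partitions. The sketch below shows that merging for the 41 H.264 partitions (one 16x16, two 16x8, two 8x16, four 8x8, eight 8x4, eight 4x8 and sixteen 4x4) and a cost J = SAD + λ·R(MVD). It assumes the motion-vector rate R is approximated by the length of the signed Exp-Golomb code se(v) of each MVD component; λ, the predicted MV and the helper names `vbs_sads`, `se_bits` and `lagrangian_cost` are hypothetical, since the exact cost function used in the paper is not given in the text.

```python
from typing import Dict, List, Tuple


def vbs_sads(sad4: List[List[int]]) -> Dict[str, List[int]]:
    """Merge a 4 x 4 grid of 4x4-block SADs (row-major) into the SADs of all
    41 H.264 partitions; keys are 'WxH' block sizes."""
    s = sad4
    out = {"4x4": [s[r][c] for r in range(4) for c in range(4)]}
    out["8x4"] = [s[r][2 * j] + s[r][2 * j + 1] for r in range(4) for j in range(2)]
    out["4x8"] = [s[2 * i][c] + s[2 * i + 1][c] for i in range(2) for c in range(4)]
    s8 = [[s[2 * i][2 * j] + s[2 * i][2 * j + 1] + s[2 * i + 1][2 * j] + s[2 * i + 1][2 * j + 1]
           for j in range(2)] for i in range(2)]
    out["8x8"] = [s8[i][j] for i in range(2) for j in range(2)]
    out["16x8"] = [s8[i][0] + s8[i][1] for i in range(2)]   # top and bottom halves
    out["8x16"] = [s8[0][j] + s8[1][j] for j in range(2)]   # left and right halves
    out["16x16"] = [sum(out["8x8"])]
    return out


def se_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb code se(v)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def lagrangian_cost(sad_value: int, mv: Tuple[int, int], pred_mv: Tuple[int, int], lam: int) -> int:
    """J = SAD + lambda * R(MVD), with R taken as the se(v) length of each MVD component."""
    mvd_x, mvd_y = mv[0] - pred_mv[0], mv[1] - pred_mv[1]
    return sad_value + lam * (se_bits(mvd_x) + se_bits(mvd_y))


# Usage with illustrative 4x4 SADs 0..15 and an illustrative lambda of 4.
sad4 = [[4 * r + c for c in range(4)] for r in range(4)]
parts = vbs_sads(sad4)
j = lagrangian_cost(parts["16x16"][0], mv=(3, -2), pred_mv=(1, 0), lam=4)
print(sum(len(v) for v in parts.values()), parts["8x8"], j)
```

Because every larger SAD is a sum of already-computed 4x4 SADs, the merge costs only adders, which is why a single SAD array can serve all seven block sizes in parallel.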

II. LITERATURE SURVEY

Efficient Hardware Architecture for Selective Gray Coded Bit Plane Based Low Complexity Motion Estimation.

In video compression, motion estimation is exploited to remove temporal redundancy. Computationally, motion estimation is one of the most expensive parts of a video encoder. In this work, an efficient and novel hardware architecture is proposed to implement a selective Gray-coded bit-plane based motion estimation algorithm, with the spiral search algorithm employed as its search scheme. Experimental results show that the proposed architecture saves a considerable amount of hardware resources compared with recent works in the literature.

HIGH PERFORMANCE HARDWARE ARCHITECTURE FOR CONSTRAINED ONE-BIT TRANSFORM BASED MOTION ESTIMATION

Motion estimation (ME) is considered the most computationally intensive part of conventional video compression standards. ME approaches based on low bit-depth representations provide an important alternative for reducing this computational load, because they use lightweight, hardware-efficient matching criteria. Constrained one-bit transform (C-1BT) based ME employs only two bit-planes and stands out for its superior performance compared with other low bit-depth ME approaches. An adaptive search range determination algorithm has recently been proposed to further speed up C-1BT based ME. This paper presents a novel hardware architecture for this adaptive search range determination based ME method. The proposed architecture implements the spiral search method. No on-chip memory is needed for either the reference or the current macroblocks, so there is no need to design a complex memory hierarchy and control logic to implement spiral search. A data reuse scheme among adjacent search windows is realised by means of a two-axis rotatable two-dimensional shift register architecture. As a result, a very low off-chip memory bandwidth is achieved.
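The data reuse idea behind the two-axis rotatable shift register can be illustrated with a small software model. This is a hedged sketch, not the paper's circuit: the class `ShiftWindow` and its `shift` method are hypothetical, and the model only counts which pixels must be fetched from off-chip memory. When the candidate moves by one pixel along x or y, the window shifts and only one new row or column of n pixels is loaded instead of all n² pixels.

```python
from typing import List


class ShiftWindow:
    """Hypothetical model of an n x n 2-D shift register holding the reference
    block whose top-left corner is at (x, y); each unit move loads n new pixels."""

    def __init__(self, ref: List[List[int]], x: int, y: int, n: int) -> None:
        self.ref, self.x, self.y, self.n = ref, x, y, n
        self.win = [row[x:x + n] for row in ref[y:y + n]]
        self.loads = n * n  # the initial fill reads the whole block

    def shift(self, dx: int, dy: int) -> None:
        """Move the window by one pixel along one axis (|dx| + |dy| == 1)."""
        assert abs(dx) + abs(dy) == 1, "only unit moves are supported"
        n = self.n
        self.x += dx
        self.y += dy
        if dx == 1:     # contents move left; load the new right-hand column
            for r in range(n):
                self.win[r] = self.win[r][1:] + [self.ref[self.y + r][self.x + n - 1]]
        elif dx == -1:  # contents move right; load the new left-hand column
            for r in range(n):
                self.win[r] = [self.ref[self.y + r][self.x]] + self.win[r][:-1]
        elif dy == 1:   # contents move up; load the new bottom row
            self.win = self.win[1:] + [self.ref[self.y + n - 1][self.x:self.x + n]]
        else:           # contents move down; load the new top row
            self.win = [self.ref[self.y][self.x:self.x + n]] + self.win[:-1]
        self.loads += n


# Usage: visit six neighbouring candidates with unit moves.
ref = [[24 * y + x for x in range(24)] for y in range(24)]
w = ShiftWindow(ref, x=4, y=4, n=8)
for move in [(1, 0), (0, 1), (-1, 0), (-1, 0), (0, -1)]:
    w.shift(*move)
naive_loads = 6 * 8 * 8  # refetching the full block at every candidate
print(w.loads, naive_loads)
```

Here the five unit moves fetch 64 + 5·8 = 104 pixels in total, compared with 384 if the 8 × 8 block were refetched at each of the six positions; the saving grows with block size because each move costs n rather than n² reads.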

C-1BT BASED ME USING ADAPTIVE SEARCH RANGE

A novel hardware architecture for C-1BT based ME with the adaptive search range method is proposed. Additionally, a spiral search scheme, which was not specified in the original adaptive search range method, is utilised. The block diagram of the data path of the hardware architecture is shown in Figure 1.
Figure 3: Spiral search scheme utilized in the proposed hardware architecture.
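A spiral scan visits the candidates ring by ring outward from the search centre, and consecutive candidates differ by exactly one pixel, which is what lets a shift-register window reuse data between them. The sketch below makes an assumption about the scan direction (clockwise, starting to the right, with y growing downward); the exact order in Figure 3 may differ, and `spiral_order` is a hypothetical name.

```python
from typing import List, Tuple


def spiral_order(p: int) -> List[Tuple[int, int]]:
    """Candidate displacements (dx, dy) in [-p, p]^2 visited as a square spiral
    starting at (0, 0); consecutive points are 4-neighbours."""
    total = (2 * p + 1) ** 2
    order = [(0, 0)]
    x = y = 0
    directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, down, left, up
    leg, d = 1, 0
    while len(order) < total:
        for _ in range(2):  # two consecutive legs share each length
            step_x, step_y = directions[d % 4]
            for _ in range(leg):
                x, y = x + step_x, y + step_y
                if abs(x) <= p and abs(y) <= p:  # skip the overshoot of the final leg
                    order.append((x, y))
            d += 1
        leg += 1
    return order[:total]


# Usage: the order for a +/-1 window, and a full +/-7 window.
print(spiral_order(1))
order7 = spiral_order(7)
```

Because the best match usually lies near the predicted centre, a spiral order also makes it natural to stop early once the matching cost falls below a threshold, although the text does not state whether the architecture does so.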

The combinational path delay of the parallel counter block is about 11.57 ns on a 45 nm FPGA device. This delay limits the maximum achievable clock frequency to about 86 MHz (1/11.57 ns), which is why two pipeline stages are added: one between the 15:4 and 31:5 sub-counter stages and one between the 63:6 and 127:7 stages, where an n:m stage counts n input bits into an m-bit result. The output of the parallel counter is also registered, which splits the combinational path from the next stage, where the comparison is performed. Among earlier approaches, an 8-input LUT-based architecture was proposed to count the number of non-matching pixels. Several architectures were then proposed to overcome the bottleneck caused by the logarithmic relation between the input width and the LUT depth, and a 4-input LUT-based non-matching counter architecture was proposed to reduce hardware area. More recently, a parallel counter method has been proposed to count the number of non-matching pixels; to the best of our knowledge, it is the most hardware-efficient method reported so far.
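The text does not describe the internal structure of the parallel counter, so the following Python model uses one standard recursive construction as an assumption: a (2^(m+1) - 1)-input counter is built from two (2^m - 1)-input counters whose m-bit results are added, with the one remaining input bit used as the adder's carry-in. This construction produces exactly the 15-, 31-, 63- and 127-input stages named in the text, with 4-, 5-, 6- and 7-bit outputs. The function names are hypothetical; the sketch also checks the clock-frequency arithmetic.

```python
from typing import List


def parallel_count(bits: List[int]) -> int:
    """Recursive model of a (2^m - 1)-input parallel counter: two half-size
    counters feed an adder, and the leftover input bit is the adder's carry-in."""
    n = len(bits)
    assert n >= 1 and (n + 1) & n == 0, "input width must be 2^m - 1"
    if n == 1:
        return bits[0]
    half = (n - 1) // 2
    a = parallel_count(bits[:half])            # m-bit result of the first half
    b = parallel_count(bits[half:2 * half])    # m-bit result of the second half
    carry_in = bits[2 * half]                  # the single remaining input bit
    return a + b + carry_in                    # fits in m + 1 bits


def output_width(n: int) -> int:
    """Number of output bits of an n-input counter, n = 2^m - 1."""
    return (n + 1).bit_length() - 1


# Sub-counter stages n:m as they appear inside a 127-input counter.
stages = [(2 ** m - 1, output_width(2 ** m - 1)) for m in range(4, 8)]

# Clock limit implied by the 11.57 ns combinational delay (1 / 11.57 ns, in MHz).
fmax_mhz = 1000 / 11.57
print(stages, round(fmax_mhz, 1))
```

Since each recursion level adds one adder to the critical path, inserting registers between levels (as the text does between the 15:4/31:5 and 63:6/127:7 stages) shortens the combinational path at the cost of two extra cycles of latency.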

III. PROPOSED SYSTEM
BINARIZATION APPROACH
The first step of bit-plane matching (BPM) based ME methods is to convert full bit-depth image frames into a lower bit-depth representation. Motion estimation is then performed using a suitable matching criterion and search range. The main advantages of BPM-based ME methods are their higher speed and their smaller area and power footprint in hardware. Many recent works [38]-[43] present efficient hardware architectures for BPM-based ME methods. However, the hardware cost of binarization in 1BT, MF-1BT, 2BT, C-1BT and WC-1BT based ME has been neglected. The T-GCBPM based method has a significant advantage here, since its binarization can be implemented with simple EX-OR operations or look-up tables (LUTs). Gray-coded bit-plane matching based methods employ either a single pre-selected bit-plane or several bit-planes. In the eight Gray-coded bit-planes of an image frame from the Foreman sequence, the higher bit-planes contain most of the information available in the original frame. However, a single Gray-coded bit-plane on its own does not provide enough information about the original frame. Since the method in [21] uses only one particular Gray-coded bit-plane, its ME performance may not be adequate for all image contents, although using a single bit-plane keeps the computational complexity of the matching stage low. The T-GCBPM based method, in contrast, employs the three most significant bit-planes (g7, g6, g5) to represent images and thus provides better performance at the cost of additional computation in the matching stage. In this paper, we propose a novel combination of the methods presented by Çelebi et al. [23] and Kuo et al. [37] that constructs, for each candidate position, a single bit-plane containing Gray-coded pixel bits from the three most significant bit-planes.
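The EX-OR based binarization mentioned above amounts to a binary-to-Gray conversion followed by bit-plane extraction. A minimal sketch, assuming 8-bit pixels stored in nested lists; `gray_code` and `bit_plane` are hypothetical helper names.

```python
from typing import List


def gray_code(b: int) -> int:
    """8-bit binary-to-Gray conversion: g_k = b_k XOR b_(k+1), and g_7 = b_7."""
    return b ^ (b >> 1)


def bit_plane(block: List[List[int]], k: int) -> List[List[int]]:
    """Binary plane k (0 = LSB, 7 = MSB) of the Gray-coded block."""
    return [[(gray_code(p) >> k) & 1 for p in row] for row in block]


# Usage: the three most significant Gray-coded planes g7, g6, g5 of a 2 x 2 block.
block = [[127, 128], [200, 15]]
for k in (7, 6, 5):
    print(f"g{k}:", bit_plane(block, k))
```

The example also shows why Gray coding suits BPM: the neighbouring intensities 127 and 128 differ in all eight binary bits but in only one Gray-coded bit, so a small intensity change flips few bit-planes.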
Selecting and placing the three most significant Gray-code bits of each pixel to construct a single bit-plane for matching makes it possible to exploit the advantages of both methods. Note that, for each candidate location, the proposed method uses a bit-plane selection and placement scheme that differs from the earlier method in which 4 bits are utilized in an interlaced fashion, as shown in Fig. 4. In this paper, we present a novel bit-plane selection and placement scheme that improves ME accuracy compared with the previous schemes. The proposed bit-plane selection approach is shown in Fig. 5 for a 16x16 image block. Note that binary image blocks are constructed separately for each candidate location. Related works in GCBPM-based ME show that the contribution of the five least significant bit-planes to ME accuracy is limited compared with that of the three most significant bit-planes. We therefore do not include g4 in our selection scheme. Additionally, compared with the method presented by Kuo et al., the distributed placement of bit-planes enables more accurate matching, since the distance between selected bit-plane positions is increased for neighbouring pixels. Our experiments show that the proposed bit-plane selection and placement approach improves ME accuracy.
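Fig. 5 is not reproduced here, so the placement pattern below is a hypothetical stand-in that only illustrates the idea: each pixel contributes one of g7, g6 or g5, chosen by its position so that horizontally and vertically adjacent pixels use different bit-planes, and blocks are compared by counting non-matching pixels (bitwise XOR followed by a bit count). The pattern `(i + j) mod 3` and the names `selective_plane` and `nnmp` are assumptions, not the paper's scheme.

```python
from typing import List


def selective_plane(block: List[List[int]]) -> List[List[int]]:
    """Hypothetical placement: pixel (i, j) contributes Gray bit g7, g6 or g5
    according to (i + j) mod 3, so 4-neighbours never share a bit-plane."""
    return [[((p ^ (p >> 1)) >> (7 - (i + j) % 3)) & 1 for j, p in enumerate(row)]
            for i, row in enumerate(block)]


def nnmp(cur_bits: List[List[int]], ref_bits: List[List[int]]) -> int:
    """Number of non-matching pixels: positions where the two binary blocks differ."""
    return sum(a ^ b for ra, rb in zip(cur_bits, ref_bits) for a, b in zip(ra, rb))


# Usage: the same placement is applied to the current block and to each candidate.
cur_block = [[127, 128], [200, 15]]
cand_block = [[128, 128], [200, 15]]
mismatches = nnmp(selective_plane(cur_block), selective_plane(cand_block))
print(selective_plane(cur_block), mismatches)
```

Whatever the actual pattern in Fig. 5, the matching stage stays one XOR gate per pixel plus a bit counter, which is why a single composite bit-plane keeps the matching cost of single-plane methods while drawing information from three planes.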

IV. CONCLUSION
A selective Gray-coded bit-plane based binarization approach for low complexity motion estimation, together with its hardware architecture, is presented. The proposed BPM-based ME method outperforms the single bit-plane based methods in the literature while providing similar or better performance than methods that use two bit-planes. Notably, the selective Gray-coded bit-plane based method has the lowest binarization cost among the compared methods, except for the conventional Gray-coded BPM methods. The proposed binarization approach is implemented efficiently in hardware, and the proposed architecture is shown to be suitable for seamless integration into state-of-the-art consumer electronics devices through a common bus interconnect. Experimental results reveal that the data reuse provided by the proposed architecture dramatically reduces both off-chip data access time and power consumption.

REFERENCES

Authors profile
MOTAPALUKULA RAJESH received his bachelor's degree in electronics and communication engineering in 2015 from AIZZA COLLEGE OF ENGG & TECH, India, which is affiliated with JNTU Hyderabad, India. His areas of interest include VLSI design. He is pursuing his M.Tech in VLSI System Design at CMR Institute of Technology.

Dr. V. PADMANABHA REDDY received the B.Tech in Electronics and Communications Engineering from Regional Engineering College, Warangal, Telangana, India, the M.Tech in Digital Systems and Computer Electronics from J.N.T. University College of Engineering, Hyderabad, Telangana, and the Ph.D. from Sri Venkateswara University College of Engineering, Tirupati, Andhra Pradesh, India. He has 20 years of experience teaching undergraduate and postgraduate students and is currently a faculty member in the Department of ECE, CMR Institute of Technology, Hyderabad, Telangana, India. His research interests are in the areas of signal processing, digital image processing and digital image watermarking.

Mrs. K. RAGINI is working as an assistant professor at CMR Institute of Technology.