2D Digital Filter Implementation on a FPGA

Tsuei, Danny Teng-Hsiang

2D Digital Filter Implementation on a FPGA

Files

Tsuei_Danny.pdf (2.33 MB)

Date

2011-08-31T18:27:59Z

Authors

Tsuei, Danny Teng-Hsiang

Publisher

University of Waterloo

Abstract

The use of two dimensional (2D) digital filters for real-time 2D data processing has found important practical applications in many areas, such as aerial surveillance, satellite imaging and pattern recognition. In the case of military operations, real-time image pro-cessing is extensively used in target acquisition and tracking, automatic target recognition and identi cation, and guidance of autonomous robots. Furthermore, equal opportunities exist in civil industries such as vacuum cleaner path recognition and mapping and car collision detection and avoidance. Many of these applications require dedicated hardware for signal processing. It is not efficient to implement 2D digital filters using a single processor for real-time applications due to the large amount of data. A multiprocessor implementation can be used in order to reduce processing time. Previous work explored several realizations of 2D denominator separable digital filters with minimal throughput delay by utilizing parallel processors. It was shown that regardless of the order of the filter, a throughput delay of one adder and one multiplier can be achieved. The proposed realizations have high regularity due to the nature of the processors. In this thesis, all four realizations are implemented in a Field Programming Gate Array (FPGA) with floating point adders, multipliers and shift registers. The implementation details and design trade-offs are discussed. Simulation results in terms of performance, area and power are compared. From the experimental results, realization four is the ideal candidate for implementation on an Application Specific Integrated Circuit (ASIC) since it has the best performance, dissipates the lowest power, and uses the least amount of logic when compared to other realizations of the same filter size. For a filter size of 5 by 5, realization four can produce a throughput of 16.3 million pixels per second, which is comparable to realization one and about 34% increase in performance compared to realization one and two. For the given filter size, realization four dissipates the same amount of dynamic power as realization one, and roughly 54% less than realization three and 140% less than realization two. Furthermore, area reduction can be applied by converting floating point algorithms to fixed point algorithms. Alternatively, the denormalization and normalization stage of the floating point pipeline can be eliminated and fused together in order to save hardware resources.