Chapter 6
In many digital system processing (DSP) and communication algorithms a large proportion of multiplications are by constant numbers. For example, the finite impulse response (FIR) and infinite impulse response (IIR) filters are realized by difference equations with constant coefficients. In image compression, the discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) are computed using data that is multiplied by cosine values that have been pre-computed and implemented as multiplication by constants. The same is the case for fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) computation. For fully dedicated architecture (FDA), where multiplication by a constant is mapped on a dedicated multiplier, the complexity of a general purpose multiplier is not required.
The binary representation of a constant clearly shows the non-zero bits that require the generation of respective partial products (PPs) whereas the bits that are zero in the representation can be ignored for the PP generation operation. Representing the constant in canonic sign digit (CSD) form can further reduce the number of partial products as the CSD representation of a number has minimum number of non-zero bits. All the constant multipliers in an algorithm are in double-precision floating-point format. These numbers are first converted to appropriate fixed-point format. In the case of hardware mapping of the algorithm as FDA, these numbers in fixed-point format are then converted into CSD representation.
The chapter gives the example of an FIR filter. This filter is one of the most commonly used algorithmic building blocks in DSP and digital communication applications. An FIR filter is implemented by a convolution equation. To compute an output sample, the equation takes the dot product of a tap delay line of the inputs with the array of filter coefficients. The coefficients are predesigned and are double-precision floating-point numbers. These numbers are first converted to fixed-point format and then to CSD representation by applying the string property on their binary representation. A simple realization generates the PPs for all the multiplication operations in the dot product and reduces them using any reduction tree discussed in Chapter 5. The reduction reduces the PPs to two layers of sum and carry, which are then added using any carry propagate adder (CPA). The combinational cloud of the reduction logic can be pipelined to reduce the critical path delay of the design. Retiming is applied on an FIR filter and the transformed filter becomes a transposed direct form (TDF) FIR filter.
The chapter then describes techniques for complexity reduction. This further reduces the complexity of design that involves multiplication by constants. These techniques exploit the multiple appearances of common sub-expressions in the CSD representation of constants. The techniques are also applicable for designs where a variable is multiplied by an array of constants, as in a TDF implementation of an FIR filter.
The chapter discusses mapping a signal processing algorithm represented as a dataflow graph (DFG) on optimal hardware. The optimization techniques extensively use compression trees and avoid the use of CPAs, because from the area and timing perspectives a fast CPA is one of the most expensive building blocks in FDA implementation. The DFG can be transformed to avoid or reduce the use of CPAs. The technique is applied on IIR systems as well. These systems are recursive in nature. All the multipliers are implemented as compression trees that reduce all PPs to two layers of carry and sum. These two layers are not added inside the feedback loop, rather they are fed back as a partial solution to the next block. This helps in improving the timing of the implementation.
|