Chapter 8
Major decisions in digital design are based on the ratio of the sampling clock to the circuit clock. The sampling clock is specific to an application and is derived from the Nyquist sampling criterion or the band-pass sampling constraint. The circuit clock, on the other hand, depends primarily on the design and the technology used for implementation. In many high-end applications, the main focus of the design is to run the circuit at the highest possible clock rate to get the desired throughput. If a simple mapping of the dataflow graph (DFG) onto hardware cannot be synthesized at the required clock rate, the designer can turn to several techniques.
In feedforward designs, an unfolding transformation makes parallel processing possible, which increases throughput. Pipelining is another option in feedforward designs for better timing, and it is usually the option of choice because it results in a smaller area than an unfolded design. In feedback DFGs, the unfolding transformation does not result in true parallel processing: the circuit clock rate needs to be reduced, so unfolding gives no improvement in the iteration period bound (IPB). The only benefit of the unfolding transformation in this case is that the circuit can run at a slower clock, because each register is clocked at a rate reduced by the unfolding factor. In FPGA-based design, with a fixed number of registers and embedded computational units, unfolding helps in optimizing designs that require too many algorithmic registers. The design is first unfolded and then the excess registers are retimed to give better timing. The chapter presents designs of FIR and IIR filters where unfolding followed by retiming achieves better performance.
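The effect of unfolding on a feedback loop can be checked with a small sketch. It assumes the unfolding rule as it is commonly stated in the DSP architecture literature: an edge U -> V with w delays becomes J edges U_i -> V_((i + w) mod J), each carrying floor((i + w) / J) delays. The `Edge` type, the `unfold` function and the two-node loop are hypothetical names chosen for illustration, not taken from the chapter.

```python
from collections import namedtuple

# Hypothetical edge record: source node, destination node, number of delays.
Edge = namedtuple("Edge", "src dst delays")

def unfold(edges, J):
    """Return the edges of the J-unfolded DFG.

    Each edge U -> V with w delays becomes J edges
    U_i -> V_((i + w) % J) carrying (i + w) // J delays, i = 0..J-1.
    """
    unfolded = []
    for e in edges:
        for i in range(J):
            dst_index = (i + e.delays) % J
            new_delays = (i + e.delays) // J
            unfolded.append(Edge(f"{e.src}{i}", f"{e.dst}{dst_index}", new_delays))
    return unfolded

# Assumed first-order feedback loop: adder A feeds multiplier M directly,
# and M feeds A back through one register.
loop = [Edge("A", "M", 0), Edge("M", "A", 1)]

unfolded = unfold(loop, 2)
total_delays = sum(e.delays for e in unfolded)

for e in unfolded:
    print(f"{e.src} -> {e.dst} : {e.delays} delay(s)")
```

In this sketch the unfolded graph forms a single loop A0 -> M0 -> A1 -> M1 -> A0 that still holds only one register. Two samples are produced per circuit clock, but the loop now contains twice the combinational logic, so the time per sample is unchanged, which is consistent with the statement that unfolding brings no IPB improvement in feedback designs.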
In contrast to dedicated or parallel architectures, time-shared architectures are used when the circuit clock is at least twice as fast as the sampling clock. A design running at the circuit clock rate can reuse its hardware resources, because the input data remains valid for multiple circuit clock cycles. For many applications the designer can easily come up with a time-shared architecture. The chapter describes several examples to highlight these design issues.
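As a minimal behavioral sketch of hardware reuse, consider an FIR filter whose circuit clock is assumed to run as many times faster than the sampling clock as the filter has taps, so that a single multiply-accumulate (MAC) unit can compute every tap of one output. The function name `time_shared_fir`, the coefficient values and the cycle counter are illustrative assumptions and are not one of the chapter's designs.

```python
def time_shared_fir(samples, coeffs):
    """Model an FIR filter that reuses one MAC unit for all taps.

    The outer loop advances once per sampling clock; the inner loop
    advances once per circuit clock and uses the single MAC each time.
    """
    n_taps = len(coeffs)
    delay_line = [0] * n_taps
    outputs = []
    mac_cycles = 0  # counts circuit clocks spent in the shared MAC

    for x in samples:                      # one pass per sampling clock
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for k in range(n_taps):            # one pass per circuit clock
            acc += coeffs[k] * delay_line[k]
            mac_cycles += 1
        outputs.append(acc)

    return outputs, mac_cycles

# Impulse input with assumed coefficients: the output is the impulse response.
outputs, mac_cycles = time_shared_fir([1, 0, 0, 0], [1, 2, 3, 4])
print(outputs, mac_cycles)
```

A dedicated architecture would need four multipliers to produce each output in one clock. The time-shared version trades those for one MAC and four circuit clocks per sample, which is only possible because the input sample stays valid during those clocks.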
For many applications, designing an optimal time-shared architecture may not be simple. This chapter covers mathematical transformation techniques for folding in time-multiplexed architectures. These transformations take the DFG representation of a synchronous digital signal processing (DSP) algorithm, a folding factor and a folding schedule, and systematically generate a folded architecture that maps the DFG onto fewer hardware computational units. The folded architecture and the schedule can then be easily implemented using, respectively, a datapath consisting of computational nodes and a controller based on a finite state machine. The chapter gives examples to illustrate the methodology. These examples are linked with their implementation as state machine-based architectures.
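The core of such a transformation can be sketched with the folding equation as it is commonly written in the DSP architecture literature; the chapter's own notation may differ. For an edge U -> V with w delays, folding factor N, P_u pipeline stages in the hardware unit that executes U, and time slots u and v assigned to U and V by the schedule, the folded edge needs N*w - P_u + v - u registers. The function name and the numeric values below are assumptions chosen only for illustration.

```python
def folded_delay(N, w, P_u, u, v):
    """Number of registers on the folded edge U -> V.

    N    : folding factor (circuit clocks per sample)
    w    : delays on the edge in the original DFG
    P_u  : pipeline stages of the hardware unit executing U
    u, v : time slots of U and V in the folding schedule, each in 0..N-1
    """
    return N * w - P_u + v - u

# Assumed schedule entries for two edges of a hypothetical DFG.
d_ok = folded_delay(N=4, w=1, P_u=1, u=1, v=2)
d_bad = folded_delay(N=4, w=0, P_u=1, u=3, v=0)

print(d_ok, d_bad)
```

A non-negative result gives the number of registers to place between the two shared units. A negative result, as in the second call, indicates that the schedule cannot be realized as given, and the DFG typically has to be retimed or the folding order changed before folding.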