Chapter 4

Although there are many applications that are commonly mapped on field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs), the primary focus of this book is to explore architectures and design techniques that implement real-time signal processing systems in hardware. A high data rate, secure wireless communication gateway is a good example of a signal processing system that is mapped in hardware. Such a device multiplexes several individual voice, data and video channels, performs encryption and digital modulation on the discrete signal, up-converts the signal, and digitally mixes it with an intermediate frequency (IF) carrier.

These types of digital design implement complex algorithms that operate on large amounts of data in real time. For example, a digital receiver for aG703compliantE3microwave link needs to demodulate 34.368Mbps of information that were modulated using binary phase-shift keying (BPSK) or quadrature phase shift keying (QPSK). The system further requires forward error-correction (FEC) decoding and decompression along with several other auxiliary operations in real time.

These applications can be conveniently conceived as a network of interconnected components. For effective HW mapping, these components should be placed to work independently of other components in the system. The Kahn Process Network (KPN) is the simplest way of synchronizing their interworking. Although the implementation of a system as a KPN theoretically uses an unbounded FIFO (first-in/first-out) between two components, several techniques exist to optimally size these FIFOs. The execution of each component is blocked on a read operation on FIFOs on incoming links. A KPN also works well to implement event-driven and multi-rate systems. For signal processing applications, each component in the KPN implements one distinct block of the system. For example, the FEC in a communication receiver is perceived as a distinct block. This chapter gives a detailed account of the KPN and demonstrates its effectiveness and viability for digital system design with a case study.

In addition to this, for implementing logic in each block, the algorithm for the block can first be described graphically. There are different graphical ways of representing signal processing algorithms. The dataflow graph (DFG), signal flow graph, block diagram and data dependency graph (DDG) are a few choices for the designer. These representations are quite similar but have subtle differences. This chapter gives an extensive account of DFG and its variants. Graphical representation of the algorithm in any of these formats gives convenient visualization of algorithms for architecture mapping.

For blocks that need to process data at a very high data rate, the fastest achievable clock frequency is almost the same as the number of samples the block needs to process every second. For example, a typical BPSK receiver needs four samples per symbol to decode bits; for processing 36.5 Mbps of data, the receiver must process 146 million samples every second. The block implementing the front end of such a digital receiver is tasked to process even higher numbers of samples per symbol. In these applications the designer aims to achieve a clock frequency that matches the sampling frequency. This matching eases the digital design of the system, because the entire algorithm then requires one-to-one mapping of algorithmic operations to HW operators. This class of architecture is called ‘fully dedicated architecture’ (FDA). The designer, after mapping each operation to a HW operator, may need to appropriately place pipeline registers to bring the clock frequency equal to the sampling frequency.

The chapter describes this one-to-one mapping and techniques and transformations for adding one or multiple stages of pipelining for better timing. The scope of the design is limited to synchronous systems where all changes to the design are mastered by either a global clock or multiple clocks. The focus is to design digital logic at register transfer level (RTL). The chapter highlights that the design at RTL should be visualized as a mix of combinational clouds and registers, and the developer then optimizes the combinational clouds by using faster computational units to make the HW run at the desired clock while constraining it to fit within a budgeted silicon area. With feed forward algorithms the designer has the option to add pipeline registers in slower paths, whereas in feedback designs the registers can be added only after applying certain mathematical transformations. Although this chapter mentions pipelining, a detail treatment of pipelining and retiming are given exclusive coverage in Chapter 7.