Description
The book is written to be adopted for a number of courses at the senior undergraduate and graduate levels. It can be used for a senior undergraduate course on Advanced Digital Design and VLSI Signal Processing, and its contents can be selected to form a graduate-level course in these two subjects.
The material is presented with the goal that a student, after taking the course, will be productive in an industrial setting in different roles, whether as an algorithm developer, a digital designer or a verification engineer. This website also hosts RTL Verilog code for the examples in the book, and readers can download PDF files of the Microsoft PowerPoint presentations of lectures covering the material. Lab exercises are also provided to support teachers.
The main features of the book are given below:
- The contents of this book show how to code algorithms in high-level languages in a way that facilitates their subsequent mapping onto hardware-specific platforms. The book covers issues in implementing algorithms in fixed-point format. The ultimate conversion of algorithms developed in double-precision floating-point format to fixed-point is a critical design stage in system implementation. The conversion is not a simple translation: in many cases it also requires the designer to explore other structural options for mitigating the quantization effects of a fixed-point implementation. A number of commercially available system design and simulation tools are adding support for fixed-point conversion and simulation. The MATLAB fixed-point toolbox and utilities are important, and so is the support extended for fixed-point arithmetic in other high-level description languages such as SystemC. The issues of overflow, saturation, truncation and rounding are critical, and the designer should also learn the normalization and block floating-point options for optimizing an implementation. Chapter 3 covers all these issues and demonstrates the methodology with examples (a brief fixed-point quantization sketch follows this list).
- The next step in system design is to perform HW–SW partitioning. Usually this decision is made by an experienced designer. Chapters 1 and 3 give broad rules that help the designer make this decision. The portion that is set aside for mapping in hardware is then explored for several architectural design options. Different ways of representing algorithms and coding them in MATLAB are covered in Chapter 4. The chapter also covers mapping of the graphical representation of an algorithm onto fully dedicated hardware.
- Following the discussion on fully dedicated architectures, Chapter 5 lists designs of basic computational blocks. The chapter also highlights the architecture of embedded computational units in FPGAs and their effective use in the design. This discussion logically extends to algorithms that require multiplications with constants.
- Chapter 6 gives an account of architectural optimization for designs that involve multiplications with constants. Architectural design decisions are made depending on the throughput requirement and the target technology that constrains the clock rate. Mapping an application written in a high-level language to hardware requires insight into the algorithm. Signal processing applications usually use nested loops. Unfolding and folding techniques are presented in Chapter 7 (a loop-unfolding sketch follows this list). These techniques are discussed for code written in high-level languages and for algorithms described graphically as a dataflow graph (DFG). Chapter 4 covers the representation of algorithms as dataflow graphs, and different classes of DFGs are discussed. Many top-level design options are also discussed in Chapters 4 and 13. These options include a peer-to-peer KPN-connected network, shared bus-based designs, and network-on-chip (NoC) based architectures. The top-level design is critical for overall performance, ease of programming and verification.
- In Chapter 13 a complex application is considered as a network of connected processing elements (PEs). The PEs implement the functionality of the algorithm, whereas the interconnection framework provides inter-PE communication.
- While discussing the hardware mapping of functionality in a PE, several design options are considered. These options include fully dedicated architectures (Chapters 4 and 6), parallel and unfolded architectures (Chapter 8), folded and time-shared architectures (Chapters 8 and 9), and programmable instruction set architectures (Chapter 10). Each architectural design option is discussed in detail with examples. Tradeoffs are also specified to help the designer weigh one option against another. Special consideration is given to the target platform. Examples of FPGAs with embedded blocks of multipliers and a fixed set of registers are discussed in Chapter 5.
- Mapping of an algorithm in hardware must take into account the target technology. Novel methodologies are elaborated for designing optimal architectures that meet stringent design constraints while keeping the target technology in perspective. For a time-shared design, systolic and simple folded architectures are covered. The intricacies of folding a design usually require a dedicated controller to schedule operands on a shared HW resource. The controller is implemented as a finite state machine (FSM). FSM representations and designs are covered in Chapter 9, with design examples (a small FSM controller sketch follows this list). The testing of complex FSMs requires a lot of thought. Different coverage metrics are listed, and techniques are described that ensure maximum path coverage for effective testing of FSMs. For many complex applications, the designer has the option to define an instruction set that can effectively implement the application. A micro-programmed state machine design is covered in Chapter 10, with design examples that demonstrate the effectiveness of this design option. The designs are coded in RTL Verilog. The designer must know the coding rules and RTL guidelines from a synthesis perspective. Verilog HDL is covered, with guidelines for effective coding, in Chapter 2. The chapter also gives a brief description of SystemVerilog, which primarily facilitates testing and verification of the design. It also helps in modeling and simulating a system at higher levels of abstraction, especially at the transaction level. Features of SystemVerilog that help in writing an effective stimulus are also given. For many examples the RTL Verilog code is listed with synthesis results. The book also provides an example of a communication receiver.
- Two case studies of designs are discussed in detail. Chapter 11 presents an instruction set for implementing an adaptive algorithm for computationally intensive applications. Several architectural options are explored that trade off area with performance. Chapter 12 explores design options for a CORDIC-based DDFS algorithm. The chapter provides a MATLAB implementation of the basic CORDIC algorithm (a CORDIC sketch follows this list), and then explores fully parallel and folded architectures for implementing it.
- The book presents novel architectures for signal processing applications. Chapter 7 presents novel IIR filter-based decimation and interpolation designs. IIR filters are traditionally not used in these applications because computing the current output sample requires previous output samples, so all output samples need to be computed; this forces the design to run at a faster clock in decimation and interpolation applications (see the sketch after this list). Transformations are defined that require an IIR filter to compute samples only at the slower rate in decimation and interpolation applications.
- In Chapter 10, the design of a DDFS based on the CORDIC algorithm is given. The chapter also presents the complete working of a novel design that requires only one stage of a CORDIC element to compute sine and cosine values. Chapter 13 then presents novel time-shared and systolic AES architectures. These architectures transform the AES algorithm to fit in an 8-bit datapath. Several innovative techniques are used to reduce the hardware complexity and memory requirements while enhancing the throughput performance of the design. Similarly, novel architectures for massively parallel data compression applications are covered.
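
As a companion to the fixed-point discussion above, the following is a minimal C sketch, not taken from the book, of converting double-precision values to a signed Q1.15 format with rounding to nearest and saturation on overflow; the Q1.15 choice and the function names are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Illustrative Q1.15 format: 1 sign bit, 15 fractional bits. */
#define Q15_FRAC_BITS 15
#define Q15_MAX  32767    /* most positive representable value */
#define Q15_MIN (-32768)  /* most negative representable value */

/* Convert a double to Q1.15 with rounding to nearest and saturation. */
static int16_t double_to_q15(double x)
{
    double scaled = x * (double)(1 << Q15_FRAC_BITS);
    long   q      = lround(scaled);       /* round to nearest integer */

    /* Saturate instead of letting the value wrap on overflow. */
    if (q > Q15_MAX) q = Q15_MAX;
    if (q < Q15_MIN) q = Q15_MIN;
    return (int16_t)q;
}

/* Convert back to double to inspect the quantization error. */
static double q15_to_double(int16_t q)
{
    return (double)q / (double)(1 << Q15_FRAC_BITS);
}

int main(void)
{
    double samples[] = { 0.5, -0.30000001, 0.9999999, 1.25, -1.5 };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int16_t q = double_to_q15(samples[i]);
        printf("%+.7f -> %6d -> %+.7f\n",
               samples[i], q, q15_to_double(q));
    }
    return 0;
}
```

The last two inputs fall outside the Q1.15 range and are clipped to the format limits, which is the saturation behavior referred to above.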
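The unfolding transformation mentioned for Chapter 7 can be previewed in software terms. The sketch below, an illustration rather than an example from the book, unfolds a dot-product loop by a factor of two so that two multiply-accumulate operations are exposed per iteration, mirroring an unfolded architecture that processes two operands per cycle.

```c
#include <stdint.h>
#include <stdio.h>

/* Reference dot product: one multiply-accumulate per loop iteration. */
int32_t dot_product(const int16_t *x, const int16_t *h, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];
    return acc;
}

/* The same computation unfolded by a factor of two: two independent
 * multiply-accumulate operations appear in each iteration, which an
 * unfolded architecture can execute in parallel. */
int32_t dot_product_unfolded2(const int16_t *x, const int16_t *h, int n)
{
    int32_t acc0 = 0, acc1 = 0;
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        acc0 += (int32_t)x[i]     * h[i];
        acc1 += (int32_t)x[i + 1] * h[i + 1];
    }
    if (i < n)                       /* handle an odd trailing sample */
        acc0 += (int32_t)x[i] * h[i];
    return acc0 + acc1;
}

int main(void)
{
    int16_t x[5] = { 1, 2, 3, 4, 5 };
    int16_t h[5] = { 5, 4, 3, 2, 1 };
    printf("%ld %ld\n", (long)dot_product(x, h, 5),
           (long)dot_product_unfolded2(x, h, 5));   /* both print 35 */
    return 0;
}
```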
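To suggest the kind of controller discussed for Chapter 9, here is a small C model of a finite state machine that folds a 4-tap filter onto a single shared multiply-accumulate resource, stepping through one tap per state. This is an illustrative sketch only; the book's controllers are designed and coded in RTL Verilog.

```c
#include <stdint.h>
#include <stdio.h>

/* States of a toy controller that folds a 4-tap filter onto one MAC unit. */
typedef enum { S_TAP0, S_TAP1, S_TAP2, S_TAP3, S_DONE } state_t;

/* One output sample of a 4-tap FIR, computed over four FSM steps that
 * time-share a single multiplier, as a folded architecture would. */
int32_t folded_fir_sample(const int16_t x[4], const int16_t h[4])
{
    state_t state = S_TAP0;
    int32_t acc = 0;

    while (state != S_DONE) {
        switch (state) {            /* next-state and datapath control */
        case S_TAP0: acc += (int32_t)x[0] * h[0]; state = S_TAP1; break;
        case S_TAP1: acc += (int32_t)x[1] * h[1]; state = S_TAP2; break;
        case S_TAP2: acc += (int32_t)x[2] * h[2]; state = S_TAP3; break;
        case S_TAP3: acc += (int32_t)x[3] * h[3]; state = S_DONE; break;
        default:     state = S_DONE;              break;
        }
    }
    return acc;
}

int main(void)
{
    int16_t x[4] = { 1, 2, 3, 4 };
    int16_t h[4] = { 1, 1, 1, 1 };
    printf("folded FIR output: %ld\n", (long)folded_fir_sample(x, h)); /* 10 */
    return 0;
}
```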
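As background for the CORDIC material in Chapters 10 and 12, the following floating-point C sketch runs the rotation-mode CORDIC iterations to compute the sine and cosine of an angle; the iteration count and the gain handling are illustrative assumptions and do not reflect the book's fixed-point designs.

```c
#include <stdio.h>
#include <math.h>

#define CORDIC_ITERS 24

/* Rotation-mode CORDIC: rotate the vector (1, 0) by 'angle' radians
 * (|angle| <= pi/2) through a sequence of micro-rotations that in
 * hardware reduce to shifts and adds; modeled here in floating point. */
void cordic_sin_cos(double angle, double *sin_out, double *cos_out)
{
    double x = 1.0, y = 0.0, z = angle;
    double k = 1.0;                      /* accumulates the CORDIC gain */

    for (int i = 0; i < CORDIC_ITERS; i++) {
        double t = pow(2.0, -i);         /* 2^-i, a shift in hardware   */
        double d = (z >= 0.0) ? 1.0 : -1.0;
        double x_new = x - d * y * t;
        double y_new = y + d * x * t;
        z -= d * atan(t);                /* micro-rotation angle table  */
        x = x_new;
        y = y_new;
        k *= sqrt(1.0 + t * t);
    }
    *cos_out = x / k;                    /* remove the accumulated gain */
    *sin_out = y / k;
}

int main(void)
{
    const double pi = 3.14159265358979323846;
    double s, c;
    cordic_sin_cos(pi / 6.0, &s, &c);
    printf("sin: %.6f  cos: %.6f\n", s, c);   /* ~0.500000, ~0.866025 */
    return 0;
}
```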
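To make the data-dependence argument for IIR decimation concrete, the sketch below, again an illustration and not the book's transformation, implements a first-order IIR filter followed by decimation by four: even though only every fourth output is kept, the recursive term forces every intermediate output to be computed at the full input rate.

```c
#include <stdio.h>

#define DECIMATION 4

/* First-order IIR: y[n] = a*y[n-1] + b*x[n].  Because y[n] depends on
 * y[n-1], every output must be computed even though only one in every
 * DECIMATION outputs is retained. */
void iir_decimate(const double *x, int n, double a, double b, double *y_dec)
{
    double y_prev = 0.0;
    int out = 0;
    for (int i = 0; i < n; i++) {
        double y = a * y_prev + b * x[i];       /* full-rate recursion  */
        if (i % DECIMATION == DECIMATION - 1)
            y_dec[out++] = y;                   /* keep every 4th sample */
        y_prev = y;
    }
}

int main(void)
{
    double x[16], y_dec[4];
    for (int i = 0; i < 16; i++) x[i] = 1.0;    /* step input */
    iir_decimate(x, 16, 0.5, 0.5, y_dec);
    for (int i = 0; i < 4; i++) printf("%.4f\n", y_dec[i]);
    return 0;
}
```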