The applications of Digital Signal Processing (DSP) continue to expand, driven by trends such as the increased use of video and still images and the demand for increasingly reconfigurable systems such as Software Defined Radio (SDR). Many of these applications combine the need for significant DSP processing with cost sensitivity, creating demand for high-performance, low-cost DSP solutions.
To meet this growing demand, the processing elements and their supporting hardware platforms must deliver higher computational throughput without adding latency. For example, in the 3G and 4G wireless space, both baseband and Remote Radio Head (RRH) cards must handle multiple protocols and higher throughput to support faster cellular data rates while maintaining a high Signal-to-Noise Ratio (SNR).
- Up to 320 multipliers (18x18)
- Enhanced 3rd Generation sysDSP Architecture
  - Dual-slice architecture
  - Fully cascadable blocks
  - Backward compatible with the LatticeECP2M sysDSP block
- Programmable Multipliers
  - Two 18x18 or four 9x9 multipliers per slice
  - Single 36x36 multiplier across two adjacent slices for double precision / floating point
  - 18x36 MAC & 18x18 MMAC modes
- 54-bit Cascadable ALU
  - Rounding & truncation
  - Neighboring ALU output chainable as a third input for ternary adds
- High performance modes
  - MULT (Multiplier)
  - MAC (Multiplier Accumulate)
  - MMAC (Multiplier Multiplier Accumulate)
  - MULTADDSUB (Multiplier Add/Subtract)
  - MULTADDSUBSUM (Multiplier Add/Subtract and Sum)
  - SLICE (Fully-configurable sysDSP slice used for advanced functions)
    - Adder Tree
    - Wide Mux
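The arithmetic performed by the modes listed above can be sketched behaviorally in Python. This is a minimal model, not a description of the Lattice primitives: it assumes signed 18-bit operands for the 18x18 modes and unsigned operands for the 36x36 composition, the function names are hypothetical, and pipeline registers, rounding and the 54-bit ALU width are ignored. The `mult36_unsigned` function shows how a 36x36 product can be assembled from four 18x18 partial products, which is the idea behind spanning two adjacent slices.

```python
MASK18 = (1 << 18) - 1

def check18(*vals):
    """Assumption: 18x18 modes take signed 18-bit operands."""
    for v in vals:
        assert -(1 << 17) <= v < (1 << 17), "operand exceeds signed 18-bit range"

def mult(a, b):                      # MULT
    check18(a, b)
    return a * b

def mac(acc, a, b):                  # MAC: accumulate one product
    return acc + mult(a, b)

def mmac(acc, a0, b0, a1, b1):       # MMAC: accumulate two products
    return acc + mult(a0, b0) + mult(a1, b1)

def multaddsub(a0, b0, a1, b1, subtract=False):   # MULTADDSUB
    p0, p1 = mult(a0, b0), mult(a1, b1)
    return p0 - p1 if subtract else p0 + p1

def multaddsubsum(a0, b0, a1, b1, a2, b2, a3, b3, sub0=False, sub1=False):
    # MULTADDSUBSUM: two add/subtract results (one per slice) summed together
    return multaddsub(a0, b0, a1, b1, sub0) + multaddsub(a2, b2, a3, b3, sub1)

def mult36_unsigned(a, b):
    """36x36 product from four 18x18 partial products (unsigned operands only)."""
    assert 0 <= a < (1 << 36) and 0 <= b < (1 << 36)
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    return ((a_hi * b_hi) << 36) + ((a_hi * b_lo + a_lo * b_hi) << 18) + a_lo * b_lo
```

Signed 36x36 multiplication needs extra handling of the partial-product signs, which this sketch deliberately leaves out.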
Dual Slice Architecture
The LatticeECP3 sysDSP block consists of two identical slices, which increases performance within the DSP block, provides finer control, and allows each slice's ALU to operate independently. Bypassable pipeline registers within each slice allow the designer to remove pipeline latency where it is not needed. The slices can also be chained with no routing penalty, enabling wider multiplication and accumulation operations.
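The latency effect of the bypassable registers can be illustrated with a cycle-by-cycle Python model. It is a sketch under simplifying assumptions: the register flags `input_reg`, `pipeline_reg` and `output_reg` are hypothetical names, each enabled register is assumed to add exactly one clock of latency regardless of its position, and `None` marks cycles before the first valid result appears. Timing closure is not modeled; in practice, bypassing registers generally lengthens the combinational path and can lower the achievable clock rate.

```python
from collections import deque

def pipelined_multiplier(a_seq, b_seq, input_reg=True, pipeline_reg=True, output_reg=True):
    """Return one output per clock; each enabled register delays results by one cycle."""
    stages = sum([input_reg, pipeline_reg, output_reg])
    regs = deque([None] * stages)     # register contents; newest on the left
    out = []
    for a, b in zip(a_seq, b_seq):
        regs.appendleft(a * b)        # new product enters the register chain
        out.append(regs.pop())        # oldest value leaves; with 0 stages this is a * b
    return out
```

With all three registers enabled, `pipelined_multiplier([1, 2, 3, 4], [5, 5, 5, 5])` yields three `None` cycles before the first product, while bypassing every register returns each product in the same cycle.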
Cascadability
Many signal processing applications, such as large FIR filters or FFTs, require processing functions larger than a single DSP block can provide, so DSP blocks must be cascaded together. The LatticeECP3 addresses this need for high performance signal processing by connecting the accumulator output of each block directly to the input of the adjacent DSP block.
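The cascade can be modeled behaviorally as a chain in which each block adds its own products to the value arriving from its neighbor. This is a minimal sketch, assuming four 18x18 multipliers per sysDSP block (consistent with the 64-tap FIR example that uses 16 sysDSP blocks) and two's-complement wrap at the 54-bit ALU width; `dsp_block` and `cascaded_dot` are hypothetical names, and register timing is ignored.

```python
ACC_BITS = 54          # ALU width from the feature list
MULTS_PER_BLOCK = 4    # assumption: two slices x two 18x18 multipliers

def wrap_signed(value, bits=ACC_BITS):
    """Wrap an integer to a two's-complement value of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def dsp_block(samples, coeffs, cascade_in=0):
    """Hypothetical block: sum of up to four products plus the neighbor's output."""
    assert len(samples) == len(coeffs) <= MULTS_PER_BLOCK
    products = sum(s * c for s, c in zip(samples, coeffs))
    return wrap_signed(cascade_in + products)

def cascaded_dot(samples, coeffs):
    """Chain blocks so each block's accumulator output feeds the next block."""
    cascade = 0
    for i in range(0, len(samples), MULTS_PER_BLOCK):
        cascade = dsp_block(samples[i:i + MULTS_PER_BLOCK],
                            coeffs[i:i + MULTS_PER_BLOCK], cascade)
    return cascade
```

For 64 taps this chains 16 blocks, and the final cascade output equals the full dot product as long as it fits in 54 bits.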
Lattice has developed the following filter designs to demonstrate the powerful DSP capability of the LatticeECP3 FPGA.
- Direct Form 64-Tap FIR Filter: In the direct-form FIR filter, the input samples are shifted into a shift-register queue, and each register stage is connected to a multiplier. The products from the multipliers are added together to produce the FIR filter’s output sample. This example shows a 64-tap FIR filter using 16 sysDSP blocks and approximately 512 slices in the LatticeECP3 FPGA.

- 128-Tap Long Asymmetrical Filters Using Ladder Architecture: In the ladder architecture, the FIR filter is split into sections, each holding the coefficients it would hold if the filter were a single continuous chain. Instead of connecting the shifted data and result outputs of the first section to the corresponding inputs of the next section, the ladder network feeds a delayed version of the first section's input data to the second section's data input, and adds a delayed version of the first section's sum output to the second section's sum output.

- 256-Tap Long Symmetrical Filters Using Ladder Architecture: The impulse response of most FIR filters is symmetric. This symmetry can generally be exploited to reduce the arithmetic requirements and produce area-efficient filter realizations: because each pair of equal coefficients can share one multiplier once the two corresponding input samples are pre-added, a filter with symmetric coefficients needs only half the multipliers of a similar filter with non-symmetric coefficients. An implementation for symmetric coefficients is shown in the figure below. The 256-tap symmetrical filter example uses only 32 sysDSP slices, 2 EBRs (Embedded Block RAMs) and 3.5K slices.

- Polyphase Interpolator FIR Filter Designs: The polyphase interpolation filter implements a computationally efficient 1-to-P interpolation filter, where P is an integer greater than 1. The example below shows an interpolate-by-16 design that uses 128 taps. This requires 16 polyphase filters (sub-filters) with 8 coefficients each, each sub-filter running at the input sample rate.
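The direct-form, ladder and symmetric structures described above can be compared in a short Python sketch. It is a behavioral model under stated assumptions, not the Lattice reference designs: integer samples and coefficients, an even-length symmetric coefficient set for the folded filter, equal-length ladder sections, and a hypothetical `stage_delay` parameter standing in for the pipeline delay between ladder sections. All function names are illustrative.

```python
def delay(seq, d):
    """Delay a finite sequence by d samples, keeping its length (zeros shift in)."""
    return ([0] * d + list(seq))[:len(seq)]

def fir_direct(x, h):
    """Direct form: shift register of past inputs, one multiply per tap, summed."""
    taps = [0] * len(h)                       # taps[k] holds x[n - k]
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]
        y.append(sum(c * t for c, t in zip(h, taps)))
    return y

def fir_symmetric(x, h):
    """Folded form for symmetric h: pre-add sample pairs, then len(h) // 2 multiplies."""
    n = len(h)
    assert n % 2 == 0 and all(h[k] == h[n - 1 - k] for k in range(n))
    taps = [0] * n
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]
        y.append(sum(h[k] * (taps[k] + taps[n - 1 - k]) for k in range(n // 2)))
    return y

def fir_ladder(x, h, sections, stage_delay=1):
    """Ladder form: each section gets delayed input data and adds the delayed prior sum."""
    m = len(h) // sections
    assert m * sections == len(h)
    acc = [0] * len(x)
    for j in range(sections):
        x_j = delay(x, j * (m + stage_delay))          # delayed copy of the input data
        s_j = fir_direct(x_j, h[j * m:(j + 1) * m])    # this section's coefficients
        acc = [a + s for a, s in zip(delay(acc, stage_delay), s_j)]
    return acc
```

With this construction, the ladder output equals the direct-form output delayed by `(sections - 1) * stage_delay` samples, so splitting one long chain into pipelined sections changes only the latency, not the filter response.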

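The polyphase interpolator described above can be sketched in Python as follows, assuming integer data and a coefficient count that is a multiple of P. Coefficient `h[j]` is assigned to sub-filter `j mod P`, so each sub-filter runs at the input rate and a commutator interleaves their outputs. The result is checked against the naive approach of zero-stuffing the input by P and running the full filter. All function names are hypothetical.

```python
def fir_direct(x, h):
    """Reference direct-form FIR."""
    taps = [0] * len(h)
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]
        y.append(sum(c * t for c, t in zip(h, taps)))
    return y

def upsample_then_filter(x, h, p):
    """Naive interpolation: insert p - 1 zeros after each sample, filter at the high rate."""
    up = []
    for s in x:
        up.append(s)
        up.extend([0] * (p - 1))
    return fir_direct(up, h)

def polyphase_branches(h, p):
    """Split h into p sub-filters; sub-filter k holds h[k], h[k + p], h[k + 2p], ..."""
    assert len(h) % p == 0
    return [h[k::p] for k in range(p)]

def polyphase_interpolate(x, h, p):
    """Run every sub-filter at the input rate, then interleave (commutate) outputs."""
    branch_out = [fir_direct(x, e) for e in polyphase_branches(h, p)]
    out = []
    for n in range(len(x)):
        for k in range(p):
            out.append(branch_out[k][n])
    return out
```

For the interpolate-by-16, 128-tap case, `polyphase_branches` yields 16 sub-filters of 8 coefficients each, and no multiplier ever operates on the inserted zeros.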