ADSP-TS101S
Rev. C | Page 3 of 48 | May 2009
GENERAL DESCRIPTION
The ADSP-TS101S TigerSHARC processor is an ultrahigh performance,
Static Superscalar™ processor optimized for large signal
processing tasks and communications infrastructure. The DSP
combines very wide memory widths with dual computation blocks
(supporting 32- and 40-bit floating-point and 8-, 16-, 32-, and
64-bit fixed-point processing) to set a new standard of
performance for digital signal processors. The TigerSHARC
processor’s Static Superscalar architecture lets the processor
execute up to four instructions each cycle, performing 24
fixed-point (16-bit) operations or six floating-point operations.
Three independent 128-bit-wide internal data buses, each
connecting to one of the three 2M bit memory banks, enable
quad word data, instruction, and I/O accesses and provide
14.4G bytes per second of internal memory bandwidth. Operating
at 300 MHz, the ADSP-TS101S processor’s core has a 3.3 ns
instruction cycle time. Using its single-instruction,
multiple-data (SIMD) features, the ADSP-TS101S can perform
2.4 billion 40-bit MACs or 600 million 80-bit MACs per second.
Table 1 and Table 2 show the DSP’s performance benchmarks.
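The headline figures above follow from the 300 MHz clock by simple arithmetic. The sketch below reproduces them; it assumes each 128-bit bus moves one full-width transfer per core cycle, and the function and constant names are illustrative only, not taken from any Analog Devices tool.

```python
# Figures quoted in the text above.
CORE_CLOCK_HZ = 300e6   # 300 MHz core clock
BUS_WIDTH_BITS = 128    # width of each internal data bus
NUM_BUSES = 3           # one bus per memory bank


def cycle_time_ns(clock_hz):
    """Instruction cycle time in nanoseconds."""
    return 1e9 / clock_hz


def internal_bandwidth_bytes_per_s(clock_hz, bus_bits, buses):
    """Peak internal bandwidth, assuming one full-width transfer
    per bus per cycle (an assumption of this sketch)."""
    return clock_hz * (bus_bits // 8) * buses


def per_cycle(rate_per_s, clock_hz):
    """Convert a per-second rate into a per-cycle rate."""
    return rate_per_s / clock_hz


if __name__ == "__main__":
    print("cycle time (ns):", round(cycle_time_ns(CORE_CLOCK_HZ), 2))
    print("bandwidth (bytes/s):",
          internal_bandwidth_bytes_per_s(CORE_CLOCK_HZ, BUS_WIDTH_BITS, NUM_BUSES))
    print("40-bit MACs per cycle:", per_cycle(2.4e9, CORE_CLOCK_HZ))
    print("80-bit MACs per cycle:", per_cycle(600e6, CORE_CLOCK_HZ))
```

Under that assumption the three buses account for the quoted 14.4G bytes per second, and the MAC rates correspond to eight 40-bit MACs or two 80-bit MACs per cycle.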
The ADSP-TS101S is code compatible with the other
TigerSHARC processors.
The Functional Block Diagram on
Page 1 shows the processor’s
architectural blocks. These blocks include:
• Dual compute blocks, each consisting of an ALU, multiplier,
  64-bit shifter, and 32-word register file and associated
  data alignment buffers (DABs)
• Dual integer ALUs (IALUs), each with its own 31-word
  register file for data addressing
• A program sequencer with instruction alignment buffer
  (IAB), branch target buffer (BTB), and interrupt controller
• Three 128-bit internal data buses, each connecting to one
  of three 2M bit memory banks
• On-chip SRAM (6M bit)
• An external port that provides the interface to host
  processors, multiprocessing space (DSPs), off-chip
  memory-mapped peripherals, and external SRAM and SDRAM
• A 14-channel DMA controller
• Four link ports
• Two 64-bit interval timers and timer expired pin
• A 1149.1 IEEE compliant JTAG test access port for on-chip
  emulation
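As a quick check of the memory figures in the list above, the following sketch converts the bank size into bytes, 32-bit words, and 128-bit quad words. It assumes "M" means 2**20 bits and that a quad word is four 32-bit words; both are assumptions of this example rather than statements from the text.

```python
BANKS = 3
BANK_BITS = 2 * 2**20     # "2M bit" per bank, assuming M = 2**20
WORD_BITS = 32            # normal word size assumed for this sketch
QUAD_WORD_BITS = 128      # matches the 128-bit bus width


def total_sram_bits(banks=BANKS, bank_bits=BANK_BITS):
    """Total on-chip SRAM in bits."""
    return banks * bank_bits


def bank_capacity(bank_bits=BANK_BITS):
    """Return (bytes, 32-bit words, quad words) held by one bank."""
    return (bank_bits // 8,
            bank_bits // WORD_BITS,
            bank_bits // QUAD_WORD_BITS)


if __name__ == "__main__":
    print("total SRAM (M bit):", total_sram_bits() // 2**20)
    print("per bank (bytes, words, quad words):", bank_capacity())
```

Three 2M bit banks add up to the 6M bit of on-chip SRAM listed above.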
Figure 2 shows a typical single-processor system with external
SDRAM.
The TigerSHARC processor uses a Static Superscalar architecture.
This architecture is superscalar in that the ADSP-TS101S
processor’s core can execute simultaneously from one to four
32-bit instructions encoded in a very large instruction word
(VLIW) instruction line using the DSP’s dual compute blocks.
Because the DSP does not perform instruction reordering at
runtime (the programmer selects which operations will execute
in parallel prior to runtime), the order of instructions is
static.
With few exceptions, an instruction line, whether it contains
one, two, three, or four 32-bit instructions, executes with a
throughput of one cycle in an eight-deep processor pipeline.
For optimal DSP program execution, programmers must follow
the DSP’s set of instruction parallelism rules when encoding an
instruction line. In general, the selection of instructions that
the DSP can execute in parallel each cycle depends on the
instruction line resources each instruction requires and on the
source and destination registers used in the instructions. The
programmer has direct control of three core components: the
IALUs, the compute blocks, and the program sequencer.
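To make the idea of a statically scheduled instruction line concrete, here is a toy model in Python. It is only a sketch: the two rules it checks (one to four instructions per line, and no two instructions writing the same destination register) are simplified stand-ins chosen for illustration, not the processor's actual parallelism rules, and the `Instr` type, unit labels, and register names are hypothetical.

```python
from dataclasses import dataclass

MAX_SLOTS = 4  # an instruction line holds one to four 32-bit instructions


@dataclass(frozen=True)
class Instr:
    unit: str            # hypothetical resource label, e.g. "COMPUTE_X"
    dest: str            # hypothetical destination register name
    sources: tuple = ()  # hypothetical source register names


def check_line(line):
    """Return a list of violations of the toy rules for one line."""
    problems = []
    if not 1 <= len(line) <= MAX_SLOTS:
        problems.append("line must hold 1 to 4 instructions")
    seen_dest = set()
    for ins in line:
        # Toy rule: two instructions in one line may not share a destination.
        if ins.dest in seen_dest:
            problems.append(f"two instructions write {ins.dest}")
        seen_dest.add(ins.dest)
    return problems


if __name__ == "__main__":
    ok = [Instr("COMPUTE_X", "R0", ("R1", "R2")),
          Instr("COMPUTE_Y", "R3", ("R4", "R5"))]
    clash = [Instr("COMPUTE_X", "R0"), Instr("COMPUTE_Y", "R0")]
    print(check_line(ok))
    print(check_line(clash))
```

The point of the sketch is that all such checks happen before runtime, when the line is written or assembled; the hardware itself does not reorder instructions.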
Static Superscalar is a trademark of Analog Devices, Inc.
Table 1. General-Purpose Algorithm Benchmarks at 300 MHz

Benchmark                                 Speed           Clock Cycles
----------------------------------------  --------------  ------------
32-bit algorithm, 600 million MACs/s peak performance
  1024 point complex FFT (Radix 2)        32.78 μs        9,835
  50-tap FIR on 1024 input                91.67 μs        27,500
  Single FIR MAC                          1.83 ns         0.55
16-bit algorithm, 2.4 billion MACs/s peak performance
  256 point complex FFT (Radix 2)         3.67 μs         1,100
  50-tap FIR on 1024 input                24.0 μs         7,200
  Single FIR MAC                          0.47 ns         0.14
  Single complex FIR MAC                  1.9 ns          0.57
I/O DMA transfer rate
  External port                           800M bytes/s    n/a
  Link ports (each)                       250M bytes/s    n/a
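The Speed and Clock Cycles columns of Table 1 are related by the 300 MHz clock (time = cycles / clock). The sketch below recomputes each time from its cycle count and reports the relative difference from the published value; the data are copied from the table, and the helper names are illustrative.

```python
CLOCK_HZ = 300e6  # benchmark clock from the Table 1 title

# (benchmark, published time in seconds, published clock cycles)
BENCHMARKS = [
    ("32-bit 1024-point complex FFT", 32.78e-6, 9835),
    ("32-bit 50-tap FIR on 1024 input", 91.67e-6, 27500),
    ("32-bit single FIR MAC", 1.83e-9, 0.55),
    ("16-bit 256-point complex FFT", 3.67e-6, 1100),
    ("16-bit 50-tap FIR on 1024 input", 24.0e-6, 7200),
    ("16-bit single FIR MAC", 0.47e-9, 0.14),
    ("16-bit single complex FIR MAC", 1.9e-9, 0.57),
]


def time_from_cycles(cycles, clock_hz=CLOCK_HZ):
    """Execution time in seconds for a given cycle count."""
    return cycles / clock_hz


def relative_error(published, derived):
    """Relative difference between published and derived values."""
    return abs(published - derived) / published


def fir_cycles_per_mac(total_cycles, taps, samples):
    """Average cycles per MAC for a FIR run of taps * samples MACs."""
    return total_cycles / (taps * samples)


if __name__ == "__main__":
    for name, published_s, cycles in BENCHMARKS:
        derived_s = time_from_cycles(cycles)
        print(f"{name}: {relative_error(published_s, derived_s):.4f}")
```

Dividing the FIR cycle counts by 50 x 1024 MACs gives per-MAC figures close to the "Single FIR MAC" rows, which suggests those rows are averages over the full filter run; that reading is an inference from the numbers, not something the table states.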
Table 2. 3G Wireless Algorithm Benchmarks

Benchmark                                         Execution (MIPS)¹
------------------------------------------------  -----------------
Turbo decode
  384 kbps data channel                           51 MIPS²
Viterbi decode
  12.2 kbps AMR³ voice channel                    0.86 MIPS
Complex correlation
  3.84 Mcps⁴ with a spreading factor of 256       0.27 MIPS

¹ The execution speed is in instruction cycles per second.
² This value is for six iterations of the algorithm. For eight
  iterations of the turbo decoder, this benchmark is 67 MIPS.
³ Adaptive multi rate (AMR)
⁴ Megachips per second (Mcps)
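Because Table 2 expresses each benchmark in millions of instruction cycles per second, the figures can be read as a share of the 300 million cycles per second available at 300 MHz. The sketch below does that conversion; it ignores scheduling, memory, and I/O overhead, so the channel counts are rough upper bounds, and the function names are illustrative.

```python
CLOCK_MHZ = 300.0  # millions of instruction cycles available per second


def core_load(mips, clock_mhz=CLOCK_MHZ):
    """Fraction of the core's cycles consumed by one channel."""
    return mips / clock_mhz


def max_channels(mips, clock_mhz=CLOCK_MHZ):
    """Upper bound on channels per core, ignoring all overhead."""
    return int(clock_mhz // mips)


def scale_iterations(mips, from_iters, to_iters):
    """Estimate cost at a different iteration count, assuming the
    cost grows linearly with iterations (an assumption)."""
    return mips * to_iters / from_iters


if __name__ == "__main__":
    for name, mips in [("Turbo decode", 51.0),
                       ("Viterbi decode", 0.86),
                       ("Complex correlation", 0.27)]:
        print(name, round(core_load(mips), 4), max_channels(mips))
    print("Turbo, 8 iterations (linear estimate):",
          scale_iterations(51.0, 6, 8))
```

The linear estimate for eight turbo iterations lands within 1 MIPS of the 67 MIPS given in the footnote, so the cost appears to scale roughly with iteration count.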