Preliminary Technical Data
TigerSHARC and the TigerSHARC logo are registered trademarks of Analog Devices, Inc.
TigerSHARC
Embedded Processor
ADSP-TS201S
Rev. PrH
Information furnished by Analog Devices is believed to be accurate and reliable.
However, no responsibility is assumed by Analog Devices for its use, nor for any
infringements of patents or other rights of third parties that may result from its use.
Specifications subject to change without notice. No license is granted by implication
or otherwise under any patent or patent rights of Analog Devices. Trademarks and
registered trademarks are the property of their respective owners.
One Technology Way, P.O.Box 9106, Norwood, MA 02062-9106 U.S.A.
Tel:781/329-4700
www.analog.com
Fax:781/326-8703
© 2003 Analog Devices, Inc. All rights reserved.
KEY FEATURES
Up to 600 MHz, 1.67 ns Instruction Cycle Rate
24M Bits of Internal--On-Chip--DRAM Memory
25 mm × 25 mm (576-Ball) Thermally Enhanced Ball Grid Array Package
Dual Computation Blocks--Each Containing an ALU, a Multiplier, a Shifter, a Register File, and a Communications Logic Unit (CLU)
Dual Integer ALUs, providing Data Addressing and Pointer Manipulation
Integrated I/O Includes 14 Channel DMA Controller, External Port, Four Link Ports, SDRAM Controller, Programmable Flag Pins, Two Timers, and Timer Expired Pin for System Integration
1149.1 IEEE Compliant JTAG Test Access Port for On-Chip Emulation
On-Chip Arbitration for Glueless Multiprocessing
KEY BENEFITS
Provides High-Performance Static Superscalar DSP Operations, Optimized for Telecommunications Infrastructure and Other Large, Demanding Multiprocessor DSP Applications
Performs Exceptionally Well on DSP Algorithm and I/O Benchmarks (See Benchmarks in Table 1)
Supports Low-Overhead DMA Transfers Between Internal Memory, External Memory, Memory-Mapped Peripherals, Link Ports, Host Processors, and Other (Multiprocessor) DSPs
Eases DSP Programming Through Extremely Flexible Instruction Set and High-Level-Language Friendly DSP Architecture
Enables Scalable Multiprocessing Systems With Low Communications Overhead
[Figure 1. Functional block diagram. Blocks shown: program sequencer
(PC, BTB, IAB, address fetch); data address generation (integer J ALU
and integer K ALU, each with a 32x32 register file); computational
blocks X and Y (each with a 32x32 register file, multiplier, ALU,
shifter, CLU, and DAB); memory blocks (24M bits internal memory, 4x
crossbar connect, page cache); J-bus, K-bus, I-bus, and S-bus address
and data buses; SOC interface and SOC bus; DMA; external port (32-bit
address, 64-bit data, SDRAM control, external DMA request, host,
multiprocessor, cluster bus arbitration); link ports L0 through L3
(in and out); JTAG port.]
Rev. PrH | Page 2 of 40 | December 2003
TABLE OF CONTENTS
General Description
  Dual Compute Blocks
  Data Alignment Buffer (DAB)
  Dual Integer ALU (IALU)
  Program Sequencer
    Interrupt Controller
    Flexible Instruction Set
  DSP Memory
  External Port (Off-Chip Memory/Peripherals Interface)
    Host Interface
    Multiprocessor Interface
    SDRAM Controller
    EPROM Interface
  DMA Controller
  Link Ports (LVDS)
  Timer and General-Purpose I/O
  Reset and Booting
  Clock Domains
  Power Domains
  Filtering Reference Voltage and Clocks
  Development Tools
  Designing an Emulator-Compatible DSP Board (Target)
  Additional Information
Pin Function Descriptions
Strap Pin Function Descriptions
ADSP-TS201S--Specifications
  Recommended Operating Conditions
  Electrical Characteristics
  Absolute Maximum Ratings
  ESD Sensitivity
  Timing Specifications
    General AC Timing
    Link Port Low-Voltage, Differential-Signal (LVDS) Electrical Characteristics and Timing
      Link Port--Data Out Timing
      Link Port--Data In Timing
  Output Drive Currents
  Test Conditions
    Output Disable Time
    Output Enable Time
    Capacitive Loading
  Environmental Conditions
    Thermal Characteristics
576-Ball BGA_ED Pin Configurations
Outline Dimensions
Ordering Guide
REVISION HISTORY
Revision PrH:
Applies corrections and additional information (including information
on 600 MHz parts) to VREF Filtering Scheme (page 10), SCLK_VREF
Filtering Scheme (page 10), Drive Strength/Output Impedance Selection
(page 19), Recommended Operating Conditions (page 22), Electrical
Characteristics (page 22), Reference Clocks (page 24), Power-Up Reset
Timing (page 25), AC Signal Specifications (page 26), Link Port--Data
Out Timing (page 29), Link Port--Data In Timing (page 32), and Ordering
Guide (page 42).
Provides unused pin termination data in Pin Function Descriptions
(page 13).
Changes pins R2 and R3 to NC in 576-Ball (25 mm × 25 mm) BGA_ED Pin
Assignments (page 38).
GENERAL DESCRIPTION
The ADSP-TS201S TigerSHARC processor is an ultra-high performance,
static superscalar processor optimized for large signal processing
tasks and communications infrastructure. The DSP combines very wide
memory widths with dual computation blocks--supporting 32- and 40-bit
floating-point and supporting 8-, 16-, 32-, and 64-bit fixed-point
processing--to set a new standard of performance for digital signal
processors. The TigerSHARC static superscalar architecture lets the DSP
execute up to four instructions each cycle, performing twenty-four
16-bit fixed-point operations or six floating-point operations.
Four independent 128-bit wide internal data buses, each connecting to
the six 4M bit memory banks, enable quad-word data, instruction, and
I/O accesses and provide 33.6G bytes per second of internal memory
bandwidth. Operating at 600 MHz, the ADSP-TS201S processor's core has a
1.67 ns instruction cycle time. Using its Single-Instruction,
Multiple-Data (SIMD) features, the ADSP-TS201S processor can perform
4.8 billion 40-bit MACs or 1.2 billion 80-bit MACs per second. Table 1
shows the DSP's performance benchmarks.
The ADSP-TS201S processor is code-compatible with the other
TigerSHARC processors.
The Functional Block Diagram on page 1 shows the ADSP-TS201S
processor's architectural blocks. These blocks include:
Dual compute blocks, each consisting of an ALU, multiplier, 64-bit shifter, 128-bit CLU, and 32-word register file and associated Data Alignment Buffers (DABs)
Dual integer ALUs (IALUs), each with its own 31-word register file for data addressing and a status register
A program sequencer with Instruction Alignment Buffer (IAB) and Branch Target Buffer (BTB)
An interrupt controller that supports hardware and software interrupts, supports level- or edge-triggers, and supports prioritized, nested interrupts
Four 128-bit internal data buses, each connecting to the six 4M bit memory banks
On-chip DRAM (24M bit)
An external port that provides the interface to host processors, multiprocessing space (DSPs), off-chip memory-mapped peripherals, and external SRAM and SDRAM
A 14 channel DMA controller
Four full-duplex LVDS link ports
Two 64-bit interval timers and timer expired pin
A 1149.1 IEEE compliant JTAG test access port for on-chip emulation
Figure 2 on page 3 shows a typical single-processor system with
external SRAM and SDRAM. Figure 3 on page 6 shows a typical
multiprocessor system.
The TigerSHARC DSP uses a Static Superscalar* architecture. This
architecture is superscalar in that the ADSP-TS201S processor's core
can execute simultaneously from one to four 32-bit instructions encoded
in a Very Large Instruction Word (VLIW) instruction line using the
DSP's dual compute blocks. Because the DSP does not perform instruction
re-ordering at runtime--the programmer selects which operations will
execute in parallel prior to runtime--the order of instructions is
static.

* Static Superscalar™ is a trademark of Analog Devices, Inc.

[Figure 2. ADSP-TS201S Single-Processor System With External SDRAM.
The diagram shows the ADSP-TS201S with its clock and reference inputs,
reset and JTAG connections, and the optional external devices: boot
EPROM, memory, SDRAM memory, host processor interface, DMA device, and
link devices (4 max).]

Table 1. General Purpose Algorithm Benchmarks at 600 MHz

Benchmark                                                   Speed       Clock Cycles
32-bit Algorithm, 1.2 billion MACs/s peak performance
  1K Point Complex FFT¹ (Radix 2)                           15.7 µs     9419
  64K Point Complex FFT¹ (Radix 2)                          2.33 ms     1397544
  FIR Filter (per real tap)                                 0.83 ns     0.5
  [8 × 8][8 × 8] Matrix Multiply (Complex, Floating-point)  2.3 µs      1399
16-bit Algorithm, 4.8 billion MACs/s peak performance
  256 Point Complex FFT¹ (Radix 2)                          1.5 µs      928
I/O DMA Transfer Rate
  External port                                             1G bytes/s  n/a
  Link ports (each)                                         1G bytes/s  n/a
¹ Cache preloaded
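
The Speed column of Table 1 is consistent with dividing each cycle count by the 600 MHz core clock. The following is a minimal sketch of that arithmetic, not vendor code; the dictionary layout and the function name execution_time are chosen here for illustration, while the cycle counts and quoted speeds are the Table 1 values.

```python
# Cycle counts and quoted speeds (in seconds) as listed in Table 1.
CLOCK_HZ = 600e6  # core clock stated in the table title

benchmarks = {
    "1K point complex FFT": (9419, 15.7e-6),
    "64K point complex FFT": (1397544, 2.33e-3),
    "FIR filter, per real tap": (0.5, 0.83e-9),
    "8x8 complex matrix multiply": (1399, 2.3e-6),
    "256 point complex FFT": (928, 1.5e-6),
}


def execution_time(cycles, clock_hz=CLOCK_HZ):
    """Seconds needed for a given number of core clock cycles."""
    return cycles / clock_hz


for name, (cycles, quoted) in benchmarks.items():
    computed = execution_time(cycles)
    print(f"{name}: computed {computed:.3e} s, table {quoted:.3e} s")
```

Each computed value agrees with the table entry to within the rounding used in the table.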
With few exceptions, an instruction line, whether it contains
one, two, three, or four 32-bit instructions, executes with a
throughput of one cycle in a ten-deep processor pipeline.
For optimal DSP program execution, programmers must follow the DSP's
set of instruction parallelism rules when encoding an instruction line.
In general, the selection of instructions that the DSP can execute in
parallel each cycle depends on the instruction line resources each
instruction requires and on the source and destination registers used
in the instructions. The programmer has direct control of three core
components--the IALUs, the compute blocks, and the program sequencer.
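
As a rough illustration of what checking an instruction line against parallelism rules can look like, the sketch below validates a line against three simplified constraints. The slot limit (one to four instructions) and the limit of two compute instructions per compute block come from this datasheet; the rule that no destination register is written twice in a line is an assumption made for the example, and the (unit, destination) tuple format and the name check_line are hypothetical. The real rule set is defined in the processor's programming documentation and is more detailed than this.

```python
from collections import Counter

MAX_SLOTS = 4               # one to four 32-bit instructions per line
MAX_PER_COMPUTE_BLOCK = 2   # compute instructions per block per cycle


def check_line(line):
    """Return a list of rule violations for one instruction line.

    `line` is a list of (unit, destination_register) pairs, where unit is
    "X" or "Y" for a compute block and any other string otherwise. This
    representation is hypothetical and exists only for this sketch.
    """
    problems = []
    if not 1 <= len(line) <= MAX_SLOTS:
        problems.append("a line holds one to four instructions")
    per_unit = Counter(unit for unit, _ in line)
    for block in ("X", "Y"):
        if per_unit[block] > MAX_PER_COMPUTE_BLOCK:
            problems.append(f"more than two compute instructions for block {block}")
    # Assumed rule for illustration: one writer per destination register.
    per_dest = Counter(dest for _, dest in line)
    for dest, count in per_dest.items():
        if count > 1:
            problems.append(f"{dest} is written more than once")
    return problems


ok_line = [("X", "XR0"), ("X", "XR1"), ("Y", "YR0"), ("J", "J0")]
bad_line = [("X", "XR0"), ("X", "XR1"), ("X", "XR2")]
print(check_line(ok_line))
print(check_line(bad_line))
```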
The ADSP-TS201S processor, in most cases, has a two-cycle execution
pipeline that is fully interlocked, so--whenever a computation result
is unavailable for another operation dependent on it--the DSP
automatically inserts one or more stall cycles as needed. Efficient
programming with dependency-free instructions can eliminate most
computational and memory transfer data dependencies.
In addition, the ADSP-TS201S processor supports SIMD operations two
ways--SIMD compute blocks and SIMD computations. The programmer can
load both compute blocks with the same data (broadcast distribution) or
different data (merged distribution).
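
The difference between the two distributions can be pictured with a small model. This is only a conceptual sketch: the function names are invented here, and the choice that a merged load gives the lower half of the data to block X and the upper half to block Y is an assumption, not something this datasheet specifies.

```python
def load_broadcast(words):
    """Broadcast distribution: both compute blocks receive the same data."""
    return {"X": list(words), "Y": list(words)}


def load_merged(words):
    """Merged distribution: each compute block receives different data.

    Assumption for this sketch: the first half goes to X, the second to Y.
    """
    half = len(words) // 2
    return {"X": list(words[:half]), "Y": list(words[half:])}


quad_word = [10, 11, 12, 13]
print(load_broadcast(quad_word))
print(load_merged(quad_word))
```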
DUAL COMPUTE BLOCKS
The ADSP-TS201S processor has compute blocks that can execute
computations either independently or together as a Single-Instruction,
Multiple-Data (SIMD) engine. The DSP can issue up to two compute
instructions per compute block each cycle, instructing the ALU,
multiplier, shifter, or CLU to perform independent, simultaneous
operations. Each compute block can execute eight 8-bit, four 16-bit,
two 32-bit, or one 64-bit SIMD computations in parallel with the
operation in the other block. The compute blocks are referred to as X
and Y in assembly syntax, and each block contains four computational
units--an ALU, a multiplier, a 64-bit shifter, a 128-bit CLU--and a
32-word register file.
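
The lane counts above (eight 8-bit, four 16-bit, two 32-bit, or one 64-bit) all describe the same 64 bits split into independent lanes. The sketch below models a lane-wise add in software to show that carries do not cross lane boundaries. It uses wraparound arithmetic as a simplifying assumption; the function names are illustrative and this is not a model of any specific instruction.

```python
OPERAND_BITS = 64


def lanes_per_operand(lane_bits):
    """Number of independent lanes in a 64-bit operand."""
    if OPERAND_BITS % lane_bits:
        raise ValueError("lane width must divide 64")
    return OPERAND_BITS // lane_bits


def simd_add(a, b, lane_bits):
    """Add two 64-bit operands lane by lane; each lane wraps on its own."""
    mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(lanes_per_operand(lane_bits)):
        shift = lane * lane_bits
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift  # drop the carry out of the lane
    return result


# The same bit patterns give different results for different lane widths.
print(hex(simd_add(0x00FF, 0x0001, 8)))
print(hex(simd_add(0x00FF, 0x0001, 16)))
```

With 8-bit lanes the low lane wraps to zero without disturbing its neighbor; with 16-bit lanes the same inputs carry into bit 8.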
Register File--Each Compute Block has a multiported 32-word, fully orthogonal register file used for transferring data between the computation units and data buses and for storing intermediate results. Instructions can access the registers in the register file individually (word-aligned), in sets of two (dual-aligned), or in sets of four (quad-aligned).
ALU--The ALU performs a standard set of arithmetic operations in both fixed- and floating-point formats. It also performs logic operations.
Multiplier--The multiplier performs both fixed- and floating-point multiplication and fixed-point multiply and accumulate.
Shifter--The 64-bit shifter performs logical and arithmetic shifts, bit and bitstream manipulation, and field deposit and extraction operations.
Communications Logic Unit (CLU)--This 128-bit unit provides Trellis Decoding (for example, Viterbi and Turbo decoders) and executes complex correlations for CDMA communication applications (for example, chip-rate and symbol-rate functions).
Using these features, the compute blocks can:
Provide 8 MACs per cycle peak and 7.1 MACs per cycle sustained 16-bit performance and provide 2 MACs per cycle peak and 1.8 MACs per cycle sustained 32-bit performance (based on FIR)
Execute six single-precision floating-point or execute twenty-four 16-bit fixed-point operations per cycle, providing 3 GFLOPS or 12.0 GOPS performance
Perform two complex 16-bit MACs per cycle
Execute eight Trellis butterflies in one cycle
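
The per-cycle figures in this list convert to per-second rates by multiplying by the core clock. The sketch below does that arithmetic; the function name per_second is illustrative. One observation from the numbers alone: the quoted 3 GFLOPS and 12.0 GOPS equal six and twenty-four operations per cycle at 500 MHz, while the same per-cycle counts at the 600 MHz clock quoted elsewhere in this document work out to 3.6 and 14.4. The datasheet does not say which clock those two figures assume, so treat that reading as an inference.

```python
CLOCK_HZ = 600e6  # top clock rate quoted in this datasheet


def per_second(per_cycle, clock_hz=CLOCK_HZ):
    """Convert a per-cycle operation count into operations per second."""
    return per_cycle * clock_hz


cycle_time_ns = 1e9 / CLOCK_HZ
print(f"cycle time: {cycle_time_ns:.2f} ns")
print(f"16-bit MACs/s: {per_second(8) / 1e9:.1f} billion")
print(f"32-bit MACs/s: {per_second(2) / 1e9:.1f} billion")
for clock in (500e6, 600e6):
    gflops = per_second(6, clock) / 1e9
    gops = per_second(24, clock) / 1e9
    print(f"at {clock / 1e6:.0f} MHz: {gflops:.1f} GFLOPS, {gops:.1f} GOPS")
```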
DATA ALIGNMENT BUFFER (DAB)
The DAB is a quad-word FIFO that enables loading of quad-word data from
nonaligned addresses. Normally, load instructions must be aligned to
their data size so that quad words are loaded from a quad-aligned
address. Using the DAB significantly improves the efficiency of some
applications, such as FIR filters.
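
One way to picture how a quad-word buffer can serve nonaligned loads is the toy model below: memory is only ever read in quad-aligned chunks, and the buffer keeps the most recent chunk so that a sequential stream of nonaligned loads needs just one new aligned read per load after the first. This is a conceptual sketch under that assumption, with invented class and method names; it does not describe the DAB's actual internal operation.

```python
QUAD = 4  # words per quad word


def aligned_quad_load(memory, address):
    """Read four words; the address must be a multiple of four."""
    if address % QUAD:
        raise ValueError("quad-word loads need a quad-aligned address")
    return memory[address:address + QUAD]


class AlignmentBuffer:
    """Toy alignment buffer that holds the last aligned quad word read."""

    def __init__(self, memory):
        self.memory = memory
        self.base = None      # aligned address of the held quad word
        self.held = None
        self.fetches = 0      # number of aligned memory reads performed

    def _fetch(self, base):
        self.fetches += 1
        return aligned_quad_load(self.memory, base)

    def load(self, address):
        """Return four consecutive words starting at any word address."""
        base = address - address % QUAD
        if self.base != base:
            self.held, self.base = self._fetch(base), base
        offset = address - base
        if offset == 0:
            return list(self.held)
        following = self._fetch(base + QUAD)
        words = self.held[offset:] + following[:offset]
        self.held, self.base = following, base + QUAD
        return words


buf = AlignmentBuffer(list(range(32)))
print(buf.load(1), buf.fetches)
print(buf.load(5), buf.fetches)
```

In this model the first nonaligned load costs two aligned reads and each following sequential load costs one, which is the kind of saving that helps a FIR filter stepping through its delay line.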
DUAL INTEGER ALU (IALU)
The ADSP-TS201S processor has two IALUs that provide powerful address
generation capabilities and perform many general-purpose integer
operations. The IALUs are referred to as J and K in assembly syntax and
have the following features:
Provides memory addresses for data and update pointers
Supports circular buffering and bit-reverse addressing
Performs general-purpose integer operations, increasing programming flexibility
Includes a 31-word register file for each IALU
As address generators, the IALUs perform immediate or indirect (pre-
and post-modify) addressing. They perform modulus and bit-reverse
operations with no constraints placed on memory addresses for the
modulus data buffer placement. Each IALU can specify either a single-,
dual-, or quad-word access from memory.
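
Bit-reverse addressing is the index permutation a radix-2 FFT needs to reorder its input or output. The IALUs provide it in hardware; the sketch below shows the same permutation in software for a power-of-two length. The function names are illustrative.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def bit_reversed_order(n):
    """Bit-reversed index sequence for a length-n buffer (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


print(bit_reversed_order(8))
```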
The IALUs have hardware support for circular buffers, bit reverse, and
zero-overhead looping. Circular buffers facilitate efficient
programming of delay lines and other data structures required in
digital signal processing, and they are commonly used in digital
filters and Fourier transforms. Each IALU provides registers for four
circular buffers, so applications can set up a total of eight circular
buffers. The IALUs handle address pointer wraparound automatically,
reducing overhead, increasing performance, and simplifying
implementation. Circular buffers can start and end at any memory
location.
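
The pointer wraparound described above can be modeled as post-modify addressing with a modulus taken relative to the buffer's base address, which is why the buffer may sit at any address. The class below is a software sketch of that behavior with names chosen for illustration; it does not mirror the IALU register set.

```python
class CircularPointer:
    """Address pointer that wraps within [base, base + length)."""

    def __init__(self, base, length):
        self.base = base
        self.length = length
        self.addr = base

    def post_modify(self, step):
        """Return the current address, then advance by step with wraparound."""
        current = self.addr
        self.addr = self.base + (self.addr - self.base + step) % self.length
        return current


# A five-word buffer at an arbitrary, unaligned base address.
pointer = CircularPointer(base=1003, length=5)
print([pointer.post_modify(2) for _ in range(6)])
```

Negative steps wrap the same way, which suits a delay line walked from newest to oldest sample.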
Because the IALU's computational pipeline is one cycle deep, in most
cases integer results are available in the next cycle. Hardware
(register dependency check) causes a stall if a result is unavailable
in a given cycle.
PROGRAM SEQUENCER
The ADSP-TS201S processor's program sequencer supports the following:
A fully interruptible programming model with flexible programming in assembly and C/C++ languages; handles hardware interrupts with high throughput and no aborted instruction cycles
A ten-cycle instruction pipeline--four-cycle fetch pipe and six-cycle execution pipe--computation results available two cycles after operands are available
Supply of instruction fetch memory addresses; the sequencer's Instruction Alignment Buffer (IAB) caches up to five fetched instruction lines waiting to execute; the program sequencer extracts an instruction line from the IAB and distributes it to the appropriate core component for execution
Management of program structures and program flow determined according to JUMP, CALL, RTI, RTS instructions, loop structures, conditions, interrupts, and software exceptions
Branch prediction and a 128-entry branch target buffer (BTB) to reduce branch delays for efficient execution of conditional and unconditional branch instructions and zero-overhead looping; correctly predicted branches that are taken occur with zero overhead cycles, overcoming the five-to-nine stage branch penalty
Compact code without the requirement to align code in memory; the IAB handles alignment
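
To get a feel for what zero-overhead taken branches are worth, the sketch below estimates loop cycle counts with and without correct prediction. It is a simplified cost model, not a pipeline simulation: it assumes every taken branch that is not correctly predicted pays a fixed penalty within the five-to-nine cycle range quoted above, and it ignores stalls and the final fall-through branch. The function names are illustrative.

```python
def branch_overhead(taken_branches, penalty_cycles, correctly_predicted):
    """Overhead cycles spent on taken branches under the simplified model."""
    if not 5 <= penalty_cycles <= 9:
        raise ValueError("penalty is quoted as five to nine cycles")
    return 0 if correctly_predicted else taken_branches * penalty_cycles


def loop_cycles(iterations, body_cycles, penalty_cycles, correctly_predicted):
    """Total cycles for a loop whose closing branch is taken iterations - 1 times."""
    taken = max(iterations - 1, 0)
    overhead = branch_overhead(taken, penalty_cycles, correctly_predicted)
    return iterations * body_cycles + overhead


for penalty in (5, 9):
    predicted = loop_cycles(1000, 4, penalty, correctly_predicted=True)
    unpredicted = loop_cycles(1000, 4, penalty, correctly_predicted=False)
    print(f"penalty {penalty}: {predicted} vs {unpredicted} cycles")
```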
Interrupt Controller
The DSP supports nested and nonnested interrupts. Each interrupt type
has a register in the interrupt vector table. Also, each has a bit in
both the interrupt latch register and the interrupt mask register. All
interrupts are fixed as either level-sensitive or edge-sensitive,
except the IRQ3-0 hardware interrupts, which are programmable.
The DSP distinguishes between hardware interrupts and software
exceptions, handling them differently. When a software exception
occurs, the DSP aborts all other instructions in the instruction pipe.
When a hardware interrupt occurs, the DSP continues to execute
instructions already in the instruction pipe.
Flexible Instruction Set
The 128-bit instruction line, which can contain up to four 32-bit
instructions, accommodates a variety of parallel operations for concise
programming. For example, one instruction line can direct the DSP to
conditionally execute a multiply, an add, and a subtract in both
computation blocks while it also branches to another location in the
program. Some key features of the instruction set include:
CLU instructions for communications infrastructure to govern Trellis Decoding (for example, Viterbi and Turbo decoders) and Despreading via complex correlations
Algebraic assembly language syntax
Direct support for all DSP, imaging, and video arithmetic types
Eliminates toggling DSP hardware modes because modes are supported as options (for example, rounding, saturation, and others) within instructions
Branch prediction encoded in instruction; enables zero-overhead loops
Parallelism encoded in instruction line
Conditional execution optional for all instructions
User defined partitioning between program and data memory
DSP MEMORY
The DSP's internal and external memory is organized into a unified
memory map, which defines the location (address) of all elements in the
system, as shown in Figure 3.
The memory map is divided into four memory areas--host space, external
memory, multiprocessor space, and internal memory--and each memory
space, except host memory, is subdivided into smaller memory spaces.
The ADSP-TS201S processor internal memory has 24M bits of on-chip DRAM
memory, divided into six blocks of 4M bits (128K words × 32 bits). Each
block--M0, M2, M4, M6, M8, and M10--can store program, data, or both,
so applications can configure memory to suit specific needs. Placing
program instructions and data in different memory blocks, however,
enables the DSP to access data while performing an instruction fetch.
Each memory segment contains a 128K bit cache to enable single cycle
accesses to internal DRAM.
The six internal memory blocks connect to the four 128-bit wide
internal buses through a crossbar connection, enabling the DSP
to perform four memory transfers in the same cycle. The DSP's
internal bus architecture provides a total memory bandwidth of
33.6G bytes per second, enabling the core and I/O to access
eight 32-bit data words and four 32-bit instructions each cycle.
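
The memory sizes and per-cycle word counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch assumes the binary prefixes K = 1024 and M = 1024 × 1024, and reads "eight 32-bit data words and four 32-bit instructions" as two 128-bit data accesses plus one 128-bit instruction access; the variable names are illustrative.

```python
K = 1024
M = K * K            # assumed binary prefixes

WORD_BITS = 32
WORDS_PER_BLOCK = 128 * K
BLOCKS = 6           # M0, M2, M4, M6, M8, and M10
BUS_BITS = 128

block_bits = WORDS_PER_BLOCK * WORD_BITS   # bits in one memory block
total_bits = block_bits * BLOCKS           # bits of internal memory

words_per_access = BUS_BITS // WORD_BITS   # 32-bit words moved per bus access
data_words = 2 * words_per_access          # two data accesses per cycle
instruction_words = 1 * words_per_access   # one instruction access per cycle

print(f"block size: {block_bits // M}M bits")
print(f"internal memory: {total_bits // M}M bits")
print(f"per cycle: {data_words} data words, {instruction_words} instructions")
```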
The DSP's flexible memory structure enables:
DSP core and I/O accesses to different memory blocks in the same cycle
DSP core access to three memory blocks in parallel--one instruction and two data accesses
Programmable partitioning of program and data memory
Program access of all memory as 32-, 64-, or 128-bit words--16-bit words with the DAB