
Philips Semiconductors
Cache Architecture
File: cache.fm5, modified 7/24/99
PRELIMINARY INFORMATION
5-9
5.4.4
Replacement Policy
The hierarchical LRU replacement policy implemented by the instruction cache is identical to that implemented by the data cache; see the data-cache description for details of the hierarchical LRU algorithm.
5.4.5
Location of Program Code
All program code must first be loaded into SDRAM. The
instruction cache cannot fetch instructions from other
memories or devices. In particular, the cache cannot
fetch code from on-chip devices or over the PCI bus.
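As an illustration only, a loader or debug monitor could check this restriction before transferring control to a code address. The sketch below assumes the SDRAM aperture is known as a half-open range given by two hypothetical values, `sdram_start` and `sdram_end`; how a real system obtains those bounds is not described in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the SDRAM aperture as [sdram_start, sdram_end). */
typedef struct {
    uint32_t sdram_start; /* first byte address of SDRAM (inclusive) */
    uint32_t sdram_end;   /* one past the last SDRAM byte (exclusive) */
} SdramAperture;

/* Returns true if the code range [addr, addr + len) lies entirely in SDRAM,
   i.e. the instruction cache could fetch it. Code located in on-chip
   devices or PCI space cannot be fetched by the instruction cache. */
static bool code_range_fetchable(const SdramAperture *a, uint32_t addr, uint32_t len)
{
    if (addr < a->sdram_start || addr >= a->sdram_end)
        return false;
    /* Compare against the remaining space instead of computing addr + len,
       which could overflow 32 bits. */
    return len <= a->sdram_end - addr;
}
```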
5.4.6
Branch Units
The instruction cache is closely coupled to three branch
units. Each unit can accept a branch independently, so
three branches can be processed simultaneously in the
same cycle.
Branches in TM1100 are so-called delayed branches because the effect of a successful (taken) branch is not seen in the flow of control until some number of cycles after the successful branch is executed. The number of cycles of latency is called the branch delay, and on TM1100, the branch delay is three cycles.
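The following C sketch models the effect of the three-cycle branch delay on the sequence of issued instructions. It is a simplification under stated assumptions: one instruction issues per cycle, there are no stalls, and the program contains a single taken branch. `trace_delayed_branch` and its parameters are hypothetical names.

```c
#define BRANCH_DELAY 3 /* TM1100 branch delay, in cycles */

/* Records in trace[c] the index of the instruction issued in cycle c.
   Models straight-line code with one taken branch at index branch_at
   whose target is index target; one instruction issues per cycle. */
static void trace_delayed_branch(int branch_at, int target, int cycles, int trace[])
{
    int pc = 0;
    int pending = -1; /* delay-slot instructions still to issue; -1 = none */

    for (int c = 0; c < cycles; c++) {
        trace[c] = pc;
        if (pc == branch_at)
            pending = BRANCH_DELAY;   /* the taken branch issues this cycle */

        if (pending == 0) {
            pc = target;              /* delay slots used up: control transfers */
            pending = -1;
        } else {
            if (pending > 0)
                pending--;            /* another delay-slot instruction follows */
            pc++;
        }
    }
}
```

With a taken branch at index 2 targeting index 10, the trace begins 0, 1, 2, 3, 4, 5, 10, 11: the three instructions after the branch (indices 3 to 5) still execute before control reaches the target.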
Although three branches can be executed simultaneously, correct operation of the DSPCPU requires that only one be successful (taken) in any one cycle. DSPCPU operation is undefined if more than one concurrent branch operation is successful.
Each branch unit takes four inputs from the DSPCPU:
the branch opcode, a guard bit, a branch condition, and
a branch target address. A branch is deemed successful
if and only if the opcode is a branch opcode, the guard bit
is TRUE (i.e., = 1), and the condition (determined by the
opcode) is satisfied.
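The success rule and the one-taken-branch-per-cycle restriction can be expressed as a small software model. This is a sketch, not a description of the hardware: the struct and function names are hypothetical, and the branch condition is modeled as an already-evaluated flag, whereas in the DSPCPU the opcode determines which condition applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four inputs a branch unit receives from the DSPCPU (modeled). */
typedef struct {
    bool     is_branch_op; /* opcode is a branch opcode */
    bool     guard;        /* guard bit; true corresponds to 1 */
    bool     condition;    /* opcode-determined condition, already evaluated */
    uint32_t target;       /* branch target address */
} BranchUnitInput;

/* A branch is successful if and only if all three tests hold. */
static bool branch_successful(const BranchUnitInput *in)
{
    return in->is_branch_op && in->guard && in->condition;
}

/* Applies the one-taken-branch-per-cycle rule to the three branch units.
   Returns the index of the taken unit, -1 if none is taken, or -2 if more
   than one is taken (DSPCPU operation would be undefined). */
static int resolve_branch_units(const BranchUnitInput units[3])
{
    int taken = -1;
    for (int i = 0; i < 3; i++) {
        if (!branch_successful(&units[i]))
            continue;
        if (taken >= 0)
            return -2;
        taken = i;
    }
    return taken;
}
```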
5.4.7
Coherency: Special iclr Operation
A program can exercise some control over the operation of the instruction cache by executing the special iclr operation. This operation clears the valid bits of all blocks in the cache, including locked blocks, and resets the LRU replacement status of all blocks to its initialization value. The CPU is stalled while iclr is executing.
Coherency issues are discussed in more detail elsewhere in this document.
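A software model makes the effect of iclr on cache state concrete. The sketch assumes 64 sets (from Table 5-13) and, for illustration only, 8 members per set. The per-set LRU encoding and its initialization value `LRU_INIT` are hypothetical, and leaving each block's lock flag unchanged is an assumption, since the text says only that valid bits are cleared. The CPU stall is not modeled.

```c
#include <stdbool.h>

#define ICACHE_SETS 64 /* TM1100 set count, per Table 5-13 */
#define ICACHE_WAYS 8  /* assumed members per set, for illustration only */
#define LRU_INIT    0u /* hypothetical LRU initialization value */

typedef struct {
    bool     valid;
    bool     locked;
    unsigned tag;
} ICacheBlock;

typedef struct {
    ICacheBlock block[ICACHE_SETS][ICACHE_WAYS];
    unsigned    lru[ICACHE_SETS]; /* per-set LRU state, encoding abstracted */
} ICacheModel;

/* Models iclr: invalidate every block, locked blocks included, and return
   every set's LRU state to its initialization value. */
static void model_iclr(ICacheModel *c)
{
    for (int s = 0; s < ICACHE_SETS; s++) {
        for (int w = 0; w < ICACHE_WAYS; w++)
            c->block[s][w].valid = false; /* lock flag left as-is (assumption) */
        c->lru[s] = LRU_INIT;
    }
}
```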
5.4.8
Reading Tags and Cache Status
The instruction cache supports read access to its tag and status bits, but not through special operations as the data cache does. Because the instruction cache and branch units can execute only resultless operations, access to the instruction-cache tags and status bits is implemented using normal load operations that reference a special region in the MMIO address aperture. The region is 64 KB long and starts at MMIO_BASE. Instruction cache tags and status bits are read-only; store operations to this region have no effect. Only the DSPCPU may perform MMIO operations on this special region; other masters of the on-chip data highway, such as external PCI initiators, may not.
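Because the special region is a fixed 64 KB window starting at MMIO_BASE, checking whether an address falls inside it is a single comparison. The helper name below is hypothetical; `mmio_base` stands for whatever MMIO_BASE value the system uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICACHE_TAG_REGION_SIZE 0x10000u /* 64 KB, starting at MMIO_BASE */

/* Returns true if addr lies in [mmio_base, mmio_base + 64 KB), the region
   where DSPCPU loads return instruction-cache tag or status bits and
   stores have no effect. */
static bool in_icache_tag_region(uint32_t addr, uint32_t mmio_base)
{
    /* For addr < mmio_base the unsigned subtraction wraps to a large value,
       so one comparison covers both bounds. */
    return (uint32_t)(addr - mmio_base) < ICACHE_TAG_REGION_SIZE;
}
```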
Programmer’s note: Tag and status information cannot be read by PCI access, only by DSPCPU access. A tag or status read cannot be scheduled in the same cycle as an iclr operation or in the cycle immediately after it.
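A scheduler or post-pass checker could enforce this spacing rule with a test like the one below. The function name and the use of integer issue-cycle numbers are assumptions made for illustration.

```c
#include <stdbool.h>

/* Returns true if a tag/status read issued in read_cycle respects the rule
   relative to an iclr issued in iclr_cycle: it must not share the iclr's
   cycle or occupy the cycle immediately after it. */
static bool tag_read_slot_ok(long iclr_cycle, long read_cycle)
{
    return read_cycle != iclr_cycle && read_cycle != iclr_cycle + 1;
}
```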
Reading A Tag And Valid Bit. To read the tag and valid bit for a block in the instruction cache, a program can execute a ld32 operation directed at the instruction-cache tag and status region. Figure 5-10 shows the required format for the target address. The most-significant 16 bits must be equal to MMIO_BASE, the least-significant 15 bits select the block (by naming the set and set member), and bit 15 must be set to zero to perform a tag read. Note that in TM1100, valid set numbers range from 0 to 63; space to encode set numbers 64 to 511 is provided for future extensions.
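The address construction described above can be sketched in C as follows. The helper name is hypothetical, and because the text does not spell out how the set and member are packed into the low 15 bits (that layout belongs to Figure 5-10), the sketch takes the 15-bit selector as already encoded. Masking silently discards any bits above bit 14; a stricter helper might reject such input instead.

```c
#include <stdint.h>

/* Builds the ld32 target address for an instruction-cache tag-and-valid-bit
   read. block_sel is the 15-bit set/member selector, assumed pre-encoded. */
static uint32_t icache_tag_read_addr(uint32_t mmio_base, uint32_t block_sel)
{
    return (mmio_base & 0xFFFF0000u)  /* bits 31..16: MMIO_BASE */
         | (0u << 15)                 /* bit 15: 0 selects a tag read */
         | (block_sel & 0x7FFFu);     /* bits 14..0: set and set member */
}
```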
A ld32 with an address as specified above returns a 32-
Table 5-13. Instruction Address Field Partitioning

Field    Address Bits   Purpose
Offset   5..0           Byte offset into a block
Set      11..6          Selects one of the sets in the cache (one of 64 in the case of TM1100)
Tag      31..12         Compared against the address tags of the set members
Figure 5-9. Instruction-cache address partitioning. An instruction cache address is divided into Tag (bits 31..12), Set (bits 11..6), and Offset (bits 5..0).
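Splitting an instruction address into the fields of Table 5-13 and Figure 5-9 takes only shifts and masks. The struct and function names below are hypothetical.

```c
#include <stdint.h>

/* Fields of an instruction address, per Table 5-13 and Figure 5-9. */
typedef struct {
    uint32_t tag;    /* bits 31..12 */
    uint32_t set;    /* bits 11..6, 0..63 on TM1100 */
    uint32_t offset; /* bits 5..0, byte offset into a block */
} ICacheAddrFields;

static ICacheAddrFields icache_split_addr(uint32_t addr)
{
    ICacheAddrFields f;
    f.offset = addr & 0x3Fu;          /* low 6 bits */
    f.set    = (addr >> 6) & 0x3Fu;   /* next 6 bits */
    f.tag    = addr >> 12;            /* remaining 20 bits */
    return f;
}
```

For example, address 0x12345678 yields tag 0x12345, set 25, and offset 0x38.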
Figure 5-10. Required address format for reading instruction-cache tags and status. Two 32-bit formats are shown, "To Read Tag & Valid Bit" and "To Read LRU Bits", using the fields MMIO_BASE, 0, 10000, SET, TAG_I_MUX, and 00.