TM1100 Preliminary Data Book
Philips Semiconductors
5-4
PRELIMINARY INFORMATION
File: cache.fm5, modified 7/24/99
5.3.3
Miss Processing Order
When a miss occurs, the data cache fills the block containing the requested word, starting with the requested (critical) word. The CPU is stalled only until this first word has been transferred; the rest of the block is then filled while the CPU keeps running.
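The fill order can be illustrated with a short Python sketch. It assumes 4-byte words and the 64-byte block described in this chapter. It also assumes, although the text does not say so, that after the critical word the transfer wraps around to the start of the block. `fill_order` is a hypothetical helper, not a TM1100 API.

```python
WORD_BYTES = 4
BLOCK_BYTES = 64
WORDS_PER_BLOCK = BLOCK_BYTES // WORD_BYTES  # 16 words per block


def fill_order(miss_addr):
    """Return the word indices of the missing block in transfer order.

    The critical (requested) word comes first; the wrap-around order of the
    remaining words is an assumption made for illustration only.
    """
    critical = (miss_addr % BLOCK_BYTES) // WORD_BYTES
    return [(critical + i) % WORDS_PER_BLOCK for i in range(WORDS_PER_BLOCK)]


order = fill_order(0x1234)
print(order[0])   # 13: the CPU is stalled only until this word arrives
print(order[1:])  # these words are filled while the CPU keeps running
```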
5.3.4
Replacement Policies, Coherency
The cache implements a copyback replacement policy
with one dirty bit per 64-B block. Thus, when a miss oc-
curs and the block selected for replacement has its dirty
bit set, the dirty block must be written to main memory to
preserve its modified contents. On TM1100, the dirty
block is written to memory before the needed block is
fetched.
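The copyback ordering on a miss can be sketched as a minimal Python model. A cache frame is reduced to a block address and a dirty bit, and memory traffic is recorded in a list instead of being performed. `Frame` and `handle_miss` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLOCK_BYTES = 64


@dataclass
class Frame:
    block_addr: Optional[int] = None  # address of the cached block (None = invalid)
    dirty: bool = False               # one dirty bit per 64-byte block


def handle_miss(victim: Frame, miss_addr: int, bus_log: List[Tuple[str, int]]) -> None:
    """Replace `victim` with the block containing `miss_addr` (copyback policy)."""
    needed = miss_addr - miss_addr % BLOCK_BYTES
    if victim.block_addr is not None and victim.dirty:
        # The modified victim must be written back to preserve its contents;
        # on TM1100 this happens before the needed block is fetched.
        bus_log.append(("write", victim.block_addr))
    bus_log.append(("read", needed))
    victim.block_addr = needed
    victim.dirty = False


log: List[Tuple[str, int]] = []
handle_miss(Frame(block_addr=0x1000, dirty=True), 0x2010, log)
print(log)  # write-back of block 0x1000, then read of block 0x2000
```

A clean victim produces only the read; the write-back appears only when the victim's dirty bit is set.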
Coherency is not maintained in any way by hardware between the data cache, the instruction cache, and main memory. Special operations are available to implement coherency in software.
Write misses are handled with an allocate-on-write policy: the write that caused the miss stores its data in the cache after the missing block has been fetched into the cache.
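Allocate-on-write can be illustrated with a toy Python model holding a single 64-byte block frame. The single-frame organization, the byte-wide stores, and the `OneBlockCache` name are assumptions for illustration; the sketch only shows the order of events on a write miss: copyback of a dirty victim, fetch of the missing block, then the store into the cached copy.

```python
BLOCK_BYTES = 64


class OneBlockCache:
    """Toy cache with one block frame, used only to show allocate-on-write."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.block_addr = None
        self.data = bytearray(BLOCK_BYTES)
        self.dirty = False

    def store_byte(self, addr: int, value: int) -> None:
        block_addr = addr - addr % BLOCK_BYTES
        if self.block_addr != block_addr:  # write miss
            if self.block_addr is not None and self.dirty:
                # Copyback: write the dirty victim to memory first.
                end = self.block_addr + BLOCK_BYTES
                self.memory[self.block_addr:end] = self.data
            # Allocate-on-write: fetch the missing block into the cache ...
            self.data = bytearray(self.memory[block_addr:block_addr + BLOCK_BYTES])
            self.block_addr = block_addr
            self.dirty = False
        # ... then the store that caused the miss writes into the cached copy.
        self.data[addr % BLOCK_BYTES] = value
        self.dirty = True


mem = bytearray(range(256))
cache = OneBlockCache(mem)
cache.store_byte(70, 0xAA)     # miss: bytes 64..127 are fetched, then byte 70 is written
print(cache.data[6], mem[70])  # 170 70: the cache holds the new value, memory the old one
```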
The cache implements a hierarchical LRU replacement
algorithm to determine which of the eight elements
(blocks) in a set is replaced. The algorithm partitions the
eight set elements into four groups, each group with two
elements. The hierarchical LRU replacement victim is
determined by selecting the least-recently used group of
two elements and then selecting the least-recently used
element in that group. This hierarchical algorithm yields
performance close to full LRU but is simpler to imple-
ment.
the LRU algorithm.
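The two-level victim selection can be sketched in Python. The grouping (ways 2g and 2g+1 forming group g) and the state encoding (an ordered list of groups plus one bit per group) are assumptions chosen for clarity; the text does not specify how the hardware stores its LRU state. `HierarchicalLRU` is a hypothetical name.

```python
class HierarchicalLRU:
    """Replacement state of one 8-way set: four groups of two ways each.

    Assumed encoding: group_order lists the groups from most to least
    recently used, and lru_in_group[g] (0 or 1) names the least recently
    used way inside group g.
    """

    GROUPS = 4

    def __init__(self):
        self.group_order = list(range(self.GROUPS))
        self.lru_in_group = [0] * self.GROUPS

    def touch(self, way: int) -> None:
        """Record an access to `way` (0..7)."""
        g = way // 2
        self.group_order.remove(g)
        self.group_order.insert(0, g)        # group g becomes most recently used
        self.lru_in_group[g] = 1 - way % 2   # the other way of the group becomes its LRU

    def victim(self) -> int:
        """Least recently used way of the least recently used group."""
        g = self.group_order[-1]
        return 2 * g + self.lru_in_group[g]


lru = HierarchicalLRU()
for way in [0, 2, 3, 4, 5, 6, 7, 1]:
    lru.touch(way)
print(lru.victim())  # 2, although way 0 is the true least-recently-used way
```

In this access pattern the sketch evicts way 2 while full LRU would evict way 0, because touching way 1 refreshes its whole group. Divergences of this kind are the price of the cheaper bookkeeping; the text reports that overall performance stays close to full LRU.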
5.3.5
Alignment, Partial-Word Transfers,
Endian-ness
The cache implements 32-bit word, 16-bit half-word, and
8-bit byte transfers. All transfers, however, must be to
addresses that are naturally aligned; that is, 32-bit words
must be aligned on 32-bit boundaries, and 16-bit half-
words must be aligned on 16-bit boundaries.
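Natural alignment reduces to a divisibility test: an access of n bytes is naturally aligned when its address is a multiple of n. A minimal Python sketch follows. `is_naturally_aligned` is a hypothetical helper, and because the text does not say what the hardware does with a misaligned address, the sketch only reports the condition.

```python
TRANSFER_SIZES = (1, 2, 4)  # 8-bit byte, 16-bit half-word, 32-bit word


def is_naturally_aligned(addr: int, size: int) -> bool:
    """True if a `size`-byte transfer at `addr` is naturally aligned."""
    if size not in TRANSFER_SIZES:
        raise ValueError("the cache supports only 1-, 2- and 4-byte transfers")
    # Equivalent to addr % size == 0 because size is a power of two.
    return (addr & (size - 1)) == 0


for addr, size in [(0x1000, 4), (0x1002, 2), (0x1002, 4), (0x1003, 1)]:
    print(hex(addr), size, is_naturally_aligned(addr, size))
```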
Like TM1100’s other processing units, the CPU can use either big- or little-endian byte order. A detailed description of endian-ness can be found in Appendix …
5.3.6
Dual Ports
To allow two accesses to proceed in parallel, the data cache is quasi-dual ported. The cache is implemented as eight banks of single-ported memory, but the hardware allows each bank to operate independently. Thus, when the addresses of two simultaneous accesses select two different banks, both accesses can complete simultaneously. The bank is selected by address bits [4..2] of each address, that is, the three low-order bits of the word address. Consequently, the 16 words in a 64-byte cache block are distributed among the eight banks (two words per bank), which prevents conflicts between two simultaneously issued accesses to adjacent words in a cache block. The TM1100 compiling system attempts to avoid bank conflicts as much as possible.
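The bank mapping can be expressed in a few lines of Python. `bank` and `same_bank` are hypothetical helpers; the sketch only computes whether two addresses map to the same bank according to bits [4..2] and does not model timing or what happens on a conflict.

```python
def bank(addr: int) -> int:
    """Bank index of a byte address: address bits [4..2]."""
    return (addr >> 2) & 0b111


def same_bank(addr_a: int, addr_b: int) -> bool:
    """True if two simultaneous accesses would select the same bank."""
    return bank(addr_a) == bank(addr_b)


# The 16 words of a 64-byte block map to banks 0..7, then 0..7 again.
print([bank(a) for a in range(0, 64, 4)])
print(same_bank(0x100, 0x104))  # adjacent words: different banks
print(same_bank(0x100, 0x120))  # 32 bytes apart: same bank, a conflict
```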
The dual-ported cache can execute the load and store opcodes (ild8d, uld8d, ild16d, uld16d, ld32d, h_st8d, h_st16d, h_st32d, ild8r, uld8r, ild16r, uld16r, ld32r, ild16x, uld16x, ld32x) in either or both of the two ports. The special opcodes alloc, dcb, dinvalid, pref, rdtag, and rdstatus can be executed only in the second port, not in the first. Whenever one of these special opcodes is issued in the second port, no load or store operation should be issued concurrently in the first port. This is a special scheduling constraint.
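The port rules above can be written as a check that a scheduler might apply to the two data-cache operations issued in the same cycle. This is a hypothetical Python sketch: opcodes are represented by their mnemonics as plain strings, `None` stands for an empty port, and the text's "should not" is treated as a hard rule.

```python
from typing import Optional

LOAD_STORE = {
    "ild8d", "uld8d", "ild16d", "uld16d", "ld32d", "h_st8d", "h_st16d", "h_st32d",
    "ild8r", "uld8r", "ild16r", "uld16r", "ld32r", "ild16x", "uld16x", "ld32x",
}
SPECIAL = {"alloc", "dcb", "dinvalid", "pref", "rdtag", "rdstatus"}


def ports_ok(port1: Optional[str], port2: Optional[str]) -> bool:
    """Check one pair of operations issued to data-cache ports 1 and 2."""
    if port1 in SPECIAL:
        return False  # special opcodes may only use the second port
    if port2 in SPECIAL and port1 in LOAD_STORE:
        return False  # no concurrent load or store in the first port
    return True


print(ports_ok("ld32d", "h_st32d"))  # loads/stores may use both ports
print(ports_ok(None, "dcb"))         # special opcode alone in port 2
print(ports_ok("ld32d", "pref"))     # violates the scheduling constraint
```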
5.3.7
Cache Locking
The data cache allows the contents of up to one-half of its blocks to be locked. Thus, on TM1100, up to 8K bytes of the cache can be used as a high-speed local data memory. At most four of the eight blocks in any set can be locked.
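The 8K-byte figure follows from the cache geometry. The worked arithmetic below assumes a 16-KB data cache, which is what the text implies (8K bytes being one half of the blocks), together with the 64-byte blocks and eight blocks per set described in this chapter. Because four of every eight blocks in a set can be locked, the per-set limit and the overall one-half limit agree.

```latex
\[
\frac{16\,384\ \mathrm{B}}{64\ \mathrm{B/block}} = 256\ \text{blocks}, \qquad
\frac{256\ \text{blocks}}{8\ \text{blocks/set}} = 32\ \text{sets},
\]
\[
32\ \text{sets} \times 4\ \text{locked blocks/set} \times 64\ \mathrm{B} = 8\,192\ \mathrm{B} = 8\ \mathrm{KB}.
\]
```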
A locked block is never chosen as a victim by the re-
placement algorithm; its contents remain undisturbed
until either (1) the block’s locked status is changed ex-
plicitly by software, or (2) a dinvalid operation is executed
that targets the locked block.
Cache locking occurs only for data in the address range described by the MMIO registers DC_LOCK_ADDR and DC_LOCK_SIZE. The granularity of the address range is one 64-byte cache block. The MMIO register DC_LOCK_CTL contains the cache-lock-enable bit (DC_LOCK_ENABLE); Figure 5-5 shows the layout of the data-cache lock registers. Locking will occur for an address if locking is enabled and both of the following are true:
1. The address is greater than or equal to the value in
DC_LOCK_ADDR.
[Figure 5-5 (register layouts, bits 31..0, offsets relative to MMIO_BASE): DC_LOCK_CTL (r/w), offset 0x10 0010, fields DC_LOCK_ENABLE and APERTURE_CONTROL, other bits reserved; DC_LOCK_ADDR (r/w), offset 0x10 0014, field DC_LOCK_ADDRESS; DC_LOCK_SIZE (r/w), offset 0x10 0018, field DC_LOCK_SIZE.]
Figure 5-5. Formats of the registers in charge of data-cache locking.