Philips Semiconductors
Cache Architecture
File: cache.fm5, modified 7/24/99
PRELIMINARY INFORMATION
5-11
3. At the completion of the block invalidation scan, the
stall signal to the DSPCPU and data cache are deas-
serted.
4. The DSPCPU begins normal operation with an in-
struction fetch from the address reset_vector.
The initialization process takes 512 clock cycles. Reset
sets reset_vector equal to DRAM_BASE so that program
execution starts at the initial value of DRAM_BASE. The
initial value of DRAM_BASE is determined as described
5.5
LRU ALGORITHM
When a cache miss occurs, the block containing the re-
quested data must be brought in to the cache, and this
requested block must replace an existing block in the
cache. The LRU algorithm is responsible for selecting
the replacement victim, and the algorithm attempts to se-
lect the least-recently-used block.
The eight-way set-associative caches implement a hier-
archical LRU replacement algorithm, which works as fol-
lows:
The eight sets are partitioned into four groups of two
elements each. To select the LRU element:
First, the LRU pair out of the four pairs is selected
using a four-way LRU algorithm.
Second, the LRU element of the pair is selected
using a two-way LRU algorithm.
5.5.1
Two-Way Algorithm
The two-way LRU requires an administration of one bit
per pair of elements. On every cache hit to one of the two
blocks, the cache writes once to this bit (just a write, not
a read-modify-write). If the even-numbered block is ac-
cessed, the LRU bit is set to one; if the odd-numbered
block is accessed, the LRU bit is set to zero. On a miss,
the cache replaces the LRU element, i.e. if the LRU bit is
zero, the even numbered element will be replaced; if the
LRU bit is one, the odd numbered element will be re-
placed.
5.5.2
Four-Way Algorithm
For administration of the four-way algorithm, the cache
maintains an upper-left triangular matrix R of one-bit ele-
ments without the diagonal. R contains six bits (in gener-
al, n
×(n–1)/2 bits for n-way LRU). If set element k is ref-
erenced, the cache sets row k to one and column k to
zero:
R[k, 0..n–1]
← 1,
R[0..n–1, k]
← 0
The LRU element is the one for which the entire row is
zero (or empty) and the entire column is one (or empty):
R[k, 0..n–1] = 0
and R[0..n–1, k] = 1
For a four-way set-associative cache, this algorithm re-
quires six bits per set of four cache blocks. On every
cache hit, the LRU info is updated by setting three of the
six bits to zero or one, depending on the set element that
was accessed. The bits need only be written, no read-
modify-write is necessary. On a miss, the cache reads
the six LRU bits to determine the replacement block.
TM1100 combines the two-way and four-way algorithms
into an eight-way hierarchical LRU algorithm. A total of
ten administration bits are required: six to maintain the
four-way LRU plus four bits maintain the four two-way
LRUs.
The hierarchical algorithm has performance close to full
eight-way LRU, but it requires far fewer bits—ten instead
of 28 bits—and is much simpler to implement.
To update the LRU bits on a cache hit to element j (with
0 <= j <= 7), the cache applies m = (j div 2) to the four-
way LRU administration and (j mod 2) is applied to the
two-way administration of pair m. To select a replace-
ment victim, the cache first determines the pair p from
the four-way LRU and then retrieves the LRU bit q of pair
p. The overall LRU element is the p
×2+q.
5.5.3
LRU Initialization
Reset causes the LRU administration bits to initialized to
a legal state:
R[1,0]
← R[2,0] ← R[3,0] ← 1
R[2,1]
← R[3,1] ← R[3,2] ← 0
2_way[3]
← 2_way[2] ← 2_way[1] ← 2_way[0] ← 0
5.5.4
LRU Bit Definitions
The ten LRU bits per set are mapped as shown in
turned by the special operation rdstatus for the data
cache.
5.5.5
LRU for the Dual-Ported Cache
For the TM1100 dual-ported data cache, two memory
operations to the same set are possible in a single clock
cycle. To support this concurrency, two updates of the
LRU bits of a single set must be possible.
The following rules are used by TM1100:
1. LRU bits that are changed by exactly one port receive
the value according to the algorithm described above.
LRU bit 0
R[3,1]
R[3,0]
R[3,2]
R[2,0]
R[1,0]
R[2,1]
2_way[1]
2_way[0]
2_way[3]
2_way[2]
LRU bit 1
LRU bit 2
LRU bit 3
LRU bit 4
LRU bit 5
LRU bit 6
LRU bit 7
LRU bit 8
LRU bit 9
Figure 5-13. LRU bit definitions; 2_way[k] is the two-way LRU bit of pair k = (j div 2) for set element j.