Chapter 8
Cache Organization
177
20695H/0—March 1998
AMD-K6
Processor Data Sheet
Preliminary Information
8.6 Cache-Line Replacements
As programs execute and task switches occur, some cache lines
eventually require replacement.
Instruction cache lines are replaced using a Least Recently
Used (LRU) algorithm. When a replacement is required, it takes
place on an instruction cache read miss.
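As an illustration of LRU replacement, the following Python sketch models a single set of a cache. The class name `LRUSet`, the two-way organization, and the treatment of each entry as a single tag (ignoring sectors) are assumptions made for this example, not details of the processor's implementation.

```python
class LRUSet:
    """One set of a hypothetical 2-way set-associative cache with LRU replacement."""

    def __init__(self, ways=2):
        self.ways = ways
        self.tags = []  # least recently used first, most recently used last

    def read(self, tag):
        """Return True on a hit; on a read miss, fill the line and return False."""
        if tag in self.tags:
            # Hit: move this tag to the most-recently-used position.
            self.tags.remove(tag)
            self.tags.append(tag)
            return True
        # Read miss: if the set is full, evict the least recently used tag.
        if len(self.tags) == self.ways:
            self.tags.pop(0)
        self.tags.append(tag)
        return False


s = LRUSet()
for t in (0xA, 0xB, 0xA, 0xC):  # 0xA is reused, so 0xB becomes the LRU victim
    s.read(t)
print([hex(t) for t in s.tags])  # ['0xa', '0xc']
```

Because the second access to 0xA refreshes its position, the miss on 0xC evicts 0xB.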
The data cache uses a slightly different approach to line
replacement. If a miss occurs and a replacement is required,
lines are replaced using a Least Recently Allocated (LRA)
algorithm.
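The difference between LRA and LRU can be illustrated with a minimal Python sketch. In this hypothetical `LRASet` model (two ways and one tag per entry, both assumptions), a hit does not change the replacement order; the victim is always the line that was allocated longest ago. Under LRU, the same access sequence would instead evict 0xB, because 0xA was hit more recently.

```python
from collections import deque


class LRASet:
    """One set of a hypothetical 2-way cache with Least Recently Allocated replacement."""

    def __init__(self, ways=2):
        self.ways = ways
        self.tags = deque()  # ordered by allocation time, oldest first

    def access(self, tag):
        """Return True on a hit; on a miss, allocate the line and return False."""
        if tag in self.tags:
            # Hit: unlike LRU, the allocation order is left unchanged.
            return True
        # Miss: if the set is full, evict the line that was allocated longest ago.
        if len(self.tags) == self.ways:
            self.tags.popleft()
        self.tags.append(tag)
        return False


s = LRASet()
for t in (0xA, 0xB, 0xA, 0xC):  # the hit on 0xA does not protect it from eviction
    s.access(t)
print([hex(t) for t in s.tags])  # ['0xb', '0xc']
```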
Two forms of cache misses and associated cache fills can take
place: a sector replacement and a cache line replacement. In a
sector replacement, the miss is due to a tag mismatch. The
required cache line is filled from external memory, and the
other cache line within the sector, which was not required, is
marked invalid. In a cache line replacement, the address
matches the tag, but the requested cache line is marked invalid.
The required cache line is filled from external memory, and the
other cache line within the sector remains in its existing cache
state.
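The two miss types can be modeled with a single sector that holds one tag and a separate state for each of its lines. This Python sketch assumes 64-byte sectors made of two 32-byte lines and uses MESI letters for line state, with "I" meaning invalid. The fill state is a caller-supplied, hypothetical parameter (`fill_state`), since the actual state after a fill depends on external signals. Set selection and write-back of modified data on eviction are omitted, and the whole sector address stands in for the tag.

```python
SECTOR_SIZE = 64  # bytes per sector (assumption for this sketch)
LINE_SIZE = 32    # bytes per cache line; two lines share one sector tag (assumption)


class Sector:
    """One hypothetical cache sector: a shared tag plus a MESI state per line."""

    def __init__(self):
        self.tag = None
        self.states = ["I", "I"]  # "I" = invalid

    def read(self, addr, fill_state="S"):
        """Classify a read of `addr` and update the sector.

        `fill_state` is a hypothetical stand-in for the state selected by
        external signals when a line is filled.
        """
        tag = addr // SECTOR_SIZE                 # simplified: whole sector address as tag
        line = (addr % SECTOR_SIZE) // LINE_SIZE  # 0 or 1: which line in the sector
        if self.tag != tag:
            # Sector replacement: fill the required line, invalidate the other.
            self.tag = tag
            self.states = ["I", "I"]
            self.states[line] = fill_state
            return "sector replacement"
        if self.states[line] == "I":
            # Cache line replacement: fill the requested line only;
            # the other line keeps its current state.
            self.states[line] = fill_state
            return "line replacement"
        return "hit"


s = Sector()
print(s.read(0x1000))  # sector replacement; states -> ['S', 'I']
s.states[0] = "M"      # pretend line 0 was later modified
print(s.read(0x1020))  # line replacement; states -> ['M', 'S']
print(s.read(0x2000))  # sector replacement; states -> ['S', 'I']
```

Note how the modified line 0 survives the line replacement at 0x1020 but is discarded by the sector replacement at 0x2000.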
8.7 Write Allocate
Write allocate, if enabled, occurs when the processor has a
pending memory write cycle to a cacheable line and the line
does not currently reside in the L1 data cache. In this case, the
processor performs a burst read cycle to fetch the data-cache
line addressed by the pending write cycle. The data associated
with the pending write cycle is merged with the recently
allocated data-cache line and stored in the processor’s L1 data
cache. The final MESI state of the cache line depends on the
state of the WB/WT# and PWT signals during the burst read
cycle and the subsequent cache write hit (see Table 30 on
page 182 for the cache-line states and access types that follow
a cache read miss and a cache write hit).
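The write-allocate sequence can be sketched in Python as follows. The hypothetical `WriteAllocateCache` class fetches a whole 32-byte line on a write miss to a cacheable address, counts that fetch as one burst read, and merges the pending write data into the fetched line. This is a simplified model, not the processor's logic: it does not track MESI states (Table 30 defines those), does not model write-through or eviction, and leaves external memory unchanged after the merge.

```python
LINE_SIZE = 32  # bytes fetched by one burst read cycle


class WriteAllocateCache:
    """Hypothetical L1 data cache model with write allocation enabled.

    MESI states, write-through, and eviction are not modeled.
    """

    def __init__(self, memory):
        self.memory = memory  # bytearray standing in for external memory
        self.lines = {}       # line base address -> bytearray(LINE_SIZE)
        self.burst_reads = 0  # number of line fills (burst read cycles) issued

    def write(self, addr, data, cacheable=True):
        base = addr - addr % LINE_SIZE
        offset = addr - base
        if offset + len(data) > LINE_SIZE:
            raise ValueError("this sketch assumes a write stays within one line")
        if base not in self.lines:
            if not cacheable:
                # No allocation for non-cacheable writes: write memory directly.
                self.memory[addr:addr + len(data)] = data
                return
            # Write miss to a cacheable line: fetch the whole line first.
            self.lines[base] = bytearray(self.memory[base:base + LINE_SIZE])
            self.burst_reads += 1
        # Merge the pending write data into the resident line.
        self.lines[base][offset:offset + len(data)] = data


mem = bytearray(range(256))
cache = WriteAllocateCache(mem)
cache.write(0x44, b"\xff\xff")  # miss: fetch line 0x40-0x5F, then merge
cache.write(0x48, b"\x00")      # hit: merged without another burst read
print(cache.burst_reads)        # 1
```

The second write lands in the line fetched by the first, which is the kind of subsequent write hit that lets the extra burst read pay off.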
During write allocates, a 32-byte burst read cycle is executed in
place of a non-burst write cycle. While the burst read cycle
generally takes longer to execute than the write cycle,
performance gains are realized on subsequent write cycle hits