
Chapter 3. L1 and L2 Cache Operation
Cache Control
The data brought into the cache as a result of this instruction is validated in the same manner
that a load instruction would be (that is, it is marked as exclusive or shared). The memory
reference of a dcbt instruction causes the reference bit to be set. Note also that the
successful execution of the dcbt instruction affects the state of the TLB and cache LRU bits
as defined by the PLRU algorithm (see Section 3.6.8, “Cache Block Replacement Selection”).
3.5.3.2 Data Cache Block Touch for Store (dcbtst)
The Data Cache Block Touch for Store (dcbtst) instruction behaves similarly to the dcbt
instruction except for the following:
• If the target address of a dcbtst instruction is marked write-through (W = 1), the
  instruction is treated as a no-op
• If the dcbtst hits in the L1 data cache, the state of the block is not changed
• If the dcbtst misses in the L1 data cache, but hits in the L2 cache, the data is brought
  into the L1 data cache and is marked with the same state as in the L2 cache
• If the dcbtst misses in both the L1 data cache and the L2 cache, the cache block
  fill request is signaled on the bus as a read-with-intent-to-modify (60x-bus mode) or
  as a read-claim (MPX bus mode) and the data is marked exclusive when it is brought
  into the L1 data cache from the system bus
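
The four cases above can be summarized as a small decision function. The following C
sketch is illustrative only: the type and function names are hypothetical, and the order in
which the conditions are tested (write-through first, then L1, then L2) is an assumption,
since the list does not state a priority among the cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; they do not come from the manual. */
typedef enum {
    DCBTST_NOOP,          /* W = 1: instruction is treated as a no-op            */
    DCBTST_L1_UNCHANGED,  /* L1 hit: state of the block is not changed           */
    DCBTST_FILL_FROM_L2,  /* L1 miss, L2 hit: L1 copy takes the L2 block's state */
    DCBTST_FILL_FROM_BUS  /* miss in both: RWITM (60x) or read-claim (MPX),      */
                          /* block marked exclusive when it reaches the L1       */
} dcbtst_outcome;

/* Classify a dcbtst according to the cases listed above.
 * Assumption: the write-through check takes precedence over the cache lookups. */
dcbtst_outcome dcbtst_model(bool write_through, bool l1_hit, bool l2_hit)
{
    if (write_through)
        return DCBTST_NOOP;
    if (l1_hit)
        return DCBTST_L1_UNCHANGED;
    if (l2_hit)
        return DCBTST_FILL_FROM_L2;
    return DCBTST_FILL_FROM_BUS;
}

/* Human-readable label for an outcome, for logging or debugging. */
const char *dcbtst_outcome_name(dcbtst_outcome outcome)
{
    switch (outcome) {
    case DCBTST_NOOP:          return "no-op";
    case DCBTST_L1_UNCHANGED:  return "L1 hit, state unchanged";
    case DCBTST_FILL_FROM_L2:  return "fill from L2, L2 state kept";
    case DCBTST_FILL_FROM_BUS: return "fill from bus, marked exclusive";
    }
    assert(!"unknown dcbtst_outcome value");
    return "unknown";
}
```

Only the last case generates bus traffic, which is why it matters for the bandwidth
considerations discussed later in this section.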
Note that because the dcbtst instruction is treated like a load in the cache hierarchy, cache
blocks fetched by the dcbtst cannot participate in the store-miss-merging mechanism.
From a programming point of view, it is not wise to use a dcbtst unless the dcbtst can be
placed sufficiently far ahead of any subsequent store to that same cache block such that the
dcbtst can fully reload the L1 data cache before the store is attempted. If the store is
attempted while the dcbtst cache block fill is still outstanding, the store will stall until the
dcbtst has reloaded the L1. This can back up the load/store unit's committed store queue
(CSQ). If the dcbtst instruction cannot be placed sufficiently far ahead of the subsequent
store instruction, it may be better to omit the dcbtst entirely.
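
As an illustration of placing the touch well ahead of the store, the following C sketch
writes one byte into each 32-byte record of an array and touches each record for store
several records before it is written. It assumes a GCC- or Clang-compatible compiler;
__builtin_prefetch with a second argument of 1 requests a prefetch for write, which such
compilers typically emit as dcbtst on PowerPC targets. The function name and the lead
distance TOUCH_AHEAD are hypothetical, and a suitable distance has to be found by
measurement on the target system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RECORD_BYTES 32u  /* one record per 32-byte cache block             */
#define TOUCH_AHEAD  4u   /* assumed lead distance in records; tune by test */

/* Set the first byte of each 32-byte record to 1.
 * Each record is touched for store TOUCH_AHEAD iterations before the store
 * that modifies it, so the block fill has time to complete first. */
void mark_records(uint8_t *records, size_t count)
{
    assert(records != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        /* Touch only records that lie inside the array. */
        if (i + TOUCH_AHEAD < count) {
            __builtin_prefetch(&records[(i + TOUCH_AHEAD) * RECORD_BYTES], 1);
        }
        records[i * RECORD_BYTES] = 1;  /* partial-block store */
    }
}
```

Each store here modifies only part of a cache block, so the block must be fetched in any
case. If the lead distance is too short, the stores stall behind the outstanding fills as
described above, and removing the touch may be the better choice.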
If dcbtst (or dstst) is being used to prefetch a 32-byte coherency granule that will
eventually be fully consumed by 32 bytes' worth of stores (that is, two back-to-back
AltiVec stvx instructions), the inclusion of touch-for-store may reduce performance if the
system is bandwidth-limited. This is because a touch-for-store must perform both
a 32-byte coherency operation on the address bus (two or more bus cycles) and a 32-byte
data transfer (four or more bus cycles). On the other hand, caching-allowed, write-back
stores that merge to 32 bytes require only a 32-byte coherency operation (two or more bus
cycles) because of the store-miss-merging mechanism. Since these store misses are already
fully pipelined on the MPC7400, placing a touch-for-store before a series of adjacent stores
that will naturally merge may in fact degrade performance due to data bus bandwidth
limitations.
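
The bandwidth argument can be made concrete with a little arithmetic. The following C
sketch uses only the minimum cycle counts quoted above (two address-bus cycles and four
data-bus cycles per 32-byte block); actual counts may be higher, and the structure and
function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

enum {
    ADDR_CYCLES_MIN = 2,  /* 32-byte coherency operation on the address bus */
    DATA_CYCLES_MIN = 4   /* 32-byte data transfer on the data bus          */
};

/* Lower bound on bus cycles needed to bring blocks under the stores. */
typedef struct {
    unsigned long address_cycles;
    unsigned long data_cycles;
} bus_cost;

/* Touch-for-store: every block needs an address tenure and a data transfer. */
bus_cost touch_for_store_cost(unsigned long blocks)
{
    assert(blocks <= ULONG_MAX / DATA_CYCLES_MIN);  /* avoid overflow */
    bus_cost cost = { blocks * ADDR_CYCLES_MIN, blocks * DATA_CYCLES_MIN };
    return cost;
}

/* Stores that merge to a full 32 bytes: address tenure only, no data read. */
bus_cost merged_store_cost(unsigned long blocks)
{
    assert(blocks <= ULONG_MAX / ADDR_CYCLES_MIN);  /* avoid overflow */
    bus_cost cost = { blocks * ADDR_CYCLES_MIN, 0 };
    return cost;
}
```

For 1000 blocks, this lower bound gives at least 4000 data-bus cycles with touch-for-store
and none with merged stores, while the address-bus cost is the same in both cases.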