Philips Semiconductors
Cache Architecture
File: cache.fm5, modified 7/24/99
PRELIMINARY INFORMATION
locate a block and set the status of this block to valid. No
data is fetched from main memory. The contents of the
allocated block are undefined after this operation; the
programmer must fill the block with valid data using store
operations. Allocation operations to apertures other than
cacheable DRAM are discarded. Allocating a non-dirty
block causes 3 stall cycles. Allocating a dirty block causes
that block to be written back to SDRAM and takes at least
11 stall cycles.
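Because an allocated block is undefined until it has been filled by stores, allocation is only safe for blocks that the program will overwrite completely. The following is a minimal sketch, not part of the TM1100 tool set, of how a program might work out which blocks of a buffer qualify. The helper name fully_covered_blocks and the type block_range are hypothetical, and the block size is passed as a parameter because this section does not state it (the tests assume 64 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a run of whole cache blocks. */
typedef struct {
    uint32_t first;  /* address of the first fully covered block */
    size_t   count;  /* number of fully covered blocks */
} block_range;

/* Return the cache blocks that lie entirely inside [start, start + len).
 * Only these blocks can be allocated without losing data that shares a
 * block with the buffer, since an allocated block starts out undefined. */
block_range fully_covered_blocks(uint32_t start, uint32_t len,
                                 uint32_t block_size)
{
    /* The masking below requires a power-of-two block size. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    uint64_t mask  = (uint64_t)block_size - 1;
    uint64_t end   = (uint64_t)start + len;           /* one past last byte */
    uint64_t first = ((uint64_t)start + mask) & ~mask; /* round start up */
    uint64_t last  = end & ~mask;                      /* round end down */

    block_range r = { (uint32_t)first, 0 };
    if (last > first)
        r.count = (size_t)((last - first) / block_size);
    return r;
}
```

Partially covered blocks at either end of the buffer would be written with ordinary stores and no preceding allocation.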
5.3.10.4 Data-Cache Prefetch Operation
The data cache controller recognizes the prefetch
operations shown in Table 5-10. A prefetch operation
loads a full cache block from memory concurrently with
other computation. If the prefetched block is already in
the cache, no data is fetched from main memory. Prefetch
operations to apertures other than cacheable DRAM are
discarded. A prefetch operation is not guaranteed to
execute: it does not execute if the cache is already
occupied with two cache misses when the operation is
issued. A prefetch operation causes 3 stall cycles if there
is no copyback of a dirty block. If a dirty block is the
target of the prefetch, the dirty block is written back to
SDRAM, and at least 11 stall cycles are taken.
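The conditions above can be summarized as a small decision function. This is only an illustrative model of the rules stated in the text, not a description of the controller's implementation. The names pref_outcome, classify_prefetch and min_stall_cycles are hypothetical, and the order in which the conditions are tested is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for a prefetch operation. */
typedef enum {
    PREF_DISCARDED,      /* target is outside cacheable DRAM */
    PREF_NOT_EXECUTED,   /* two cache misses already occupy the cache */
    PREF_ALREADY_CACHED, /* block present; nothing fetched from memory */
    PREF_FETCH           /* block is fetched from main memory */
} pref_outcome;

/* Classify a prefetch according to the rules in the text.
 * The test order is an assumption made for this sketch. */
pref_outcome classify_prefetch(bool cacheable_dram,
                               int outstanding_misses,
                               bool block_in_cache)
{
    assert(outstanding_misses >= 0);
    if (!cacheable_dram)
        return PREF_DISCARDED;
    if (outstanding_misses >= 2)
        return PREF_NOT_EXECUTED;
    if (block_in_cache)
        return PREF_ALREADY_CACHED;
    return PREF_FETCH;
}

/* Lower bound on stall cycles quoted in the text: 3 when no dirty block
 * must be copied back, at least 11 when a dirty block is written back. */
int min_stall_cycles(bool dirty_block_copied_back)
{
    return dirty_block_copied_back ? 11 : 3;
}
```

The same stall figures are quoted for the allocation operations, so min_stall_cycles could serve as a rough lower bound for both.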
5.3.11 Memory Operation Ordering
The TM1100 memory system implements traditional or-
dering for memory operations that are issued in different
clock cycles. That is, the effects of a memory operation
issued in cycle j occur before the effects of a memory op-
eration issued in cycle j+1.
For memory operations issued in the same cycle, how-
ever, it is not possible to execute memory operations in
a traditional order. So long as the simultaneous memory
operations access different addresses (aliasing is not
possible in TM1100), no problems can occur. If two si-
multaneous operations do access the same address,
however, TM1100 behavior is undefined. Specifically,
two cases are possible:
1. When multiple values are written to the same address
in the same cycle, the resulting value in memory is
undefined.
2. When a read and a write occur to the same address
in the same clock cycle, the value returned by the
read is undefined.
The behavior of simultaneous accesses to the same ad-
dress is undefined regardless of whether one or both
memory operations hit in the cache.
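A scheduler or code checker could test for the two undefined cases with a predicate like the one below. This is a sketch under assumptions: the mem_op record and the function names are hypothetical, and "same address" is interpreted here as any overlap between the byte ranges of the two accesses, which the text does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one memory operation issued in a given cycle. */
typedef struct {
    uint32_t addr;      /* first byte accessed */
    uint32_t size;      /* number of bytes accessed (1, 2 or 4) */
    bool     is_store;  /* true for a store, false for a load */
} mem_op;

/* True if the byte ranges of the two operations overlap. */
bool ranges_overlap(const mem_op *a, const mem_op *b)
{
    uint64_t a_end = (uint64_t)a->addr + a->size;
    uint64_t b_end = (uint64_t)b->addr + b->size;
    return a->addr < b_end && b->addr < a_end;
}

/* True if issuing both operations in the same cycle falls into one of the
 * two undefined cases: write/write or read/write to the same address.
 * Two loads from the same address are not a problem. */
bool same_cycle_undefined(const mem_op *a, const mem_op *b)
{
    assert(a->size > 0 && b->size > 0);
    return (a->is_store || b->is_store) && ranges_overlap(a, b);
}
```

When the predicate returns true, the two operations would need to be issued in different cycles to obtain traditional ordering.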
Hidden Memory System Concurrency. Some cache
operations may be overlapped with CPU execution. In
general, a program cannot determine in what order
cache misses will complete nor can a program determine
when and in what order copyback operations will com-
plete. A program can, however, enforce the completion
of copyback transactions to main memory because copy-
back and invalidate operations can complete only if
pending copyback transactions for the same block have
completed. Thus, a program can synchronize to the com-
pletion of a copyback operation by dirtying a block, issu-
ing a copyback operation for the block, and then issuing
an invalidate operation for the block.
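The synchronization sequence can be illustrated with a toy software model. Everything in it is an assumption made for illustration: the toy_block structure, the toy_* functions and sync_to_memory are hypothetical and are not TM1100 operations or compiler intrinsics, and a single word stands in for a whole cache block. The model captures only the rule that copyback and invalidate operations complete after any pending copyback for the same block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one cache block backed by one word of "main memory". */
typedef struct {
    bool      valid;
    bool      dirty;
    bool      copyback_pending; /* copyback issued, not yet in memory */
    uint32_t  data;             /* current cached value */
    uint32_t  pending_data;     /* value latched by the pending copyback */
    uint32_t *backing;          /* location in the modeled main memory */
} toy_block;

/* Complete any pending copyback transaction for this block. */
void toy_drain(toy_block *b)
{
    if (b->copyback_pending) {
        *b->backing = b->pending_data;
        b->copyback_pending = false;
    }
}

/* A store dirties the block. */
void toy_store(toy_block *b, uint32_t value)
{
    b->valid = true;
    b->dirty = true;
    b->data  = value;
}

/* A copyback first waits for an earlier pending copyback of the same
 * block, then starts a new transaction that completes later. */
void toy_copyback(toy_block *b)
{
    toy_drain(b);
    if (b->valid && b->dirty) {
        b->pending_data     = b->data;
        b->copyback_pending = true;
        b->dirty            = false;
    }
}

/* An invalidate completes only after pending copybacks have completed. */
void toy_invalidate(toy_block *b)
{
    toy_drain(b);
    b->valid = false;
    b->dirty = false;
}

/* The sequence from the text: dirty, copy back, then invalidate. */
void sync_to_memory(toy_block *b, uint32_t value)
{
    assert(b->backing != 0);
    toy_store(b, value);
    toy_copyback(b);
    toy_invalidate(b);
    assert(!b->copyback_pending); /* memory now holds the value */
}
```

In the model, the value is guaranteed to be in memory only once the invalidate has returned; after the copyback alone it may still be pending.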
Ordering Of Special Memory Operations. The follow-
ing are special memory operations:
1. Loads or stores to MMIO addresses.
2. Non-cached loads or stores.
3. Any copyback or invalidate operation.
4. Loads or stores that cause a PCI-bus access.
The CPU is stalled until these special memory opera-
tions are completed; there is no overlap of CPU execu-
tion with these special memory operations. Thus, a pro-
grammer can assume that traditional memory operation
ordering applies to special memory operations. Note,
however, that ordering is undefined for two special mem-
ory operations issued in the same cycle.
5.3.12 Operation Latency
Load and store operations have an operation latency of
three cycles, regardless of the size of the data transfer.
Table 5-9. Data cache allocation operations

Mnemonic               Description
---------------------  --------------------------------------------
allocd(offset) rsrc1   Data-cache allocate block with displacement.
                       Causes the block with address
                       (rsrc1 + offset) & (~(cache_block_size - 1))
                       to be allocated and set valid.
allocr rsrc1 rsrc2     Data-cache allocate block with index.
                       Causes the block with address
                       (rsrc1 + rsrc2) & (~(cache_block_size - 1))
                       to be allocated and set valid.
allocx rsrc1 rsrc2     Data-cache allocate block with scaled index.
                       Causes the block with address
                       (rsrc1 + 4 * rsrc2) & (~(cache_block_size - 1))
                       to be allocated and set valid.
Table 5-10. Data cache prefetch operations

Mnemonic               Description
---------------------  --------------------------------------------
prefd(offset) rsrc1    Data-cache prefetch block with displacement.
                       Causes the block with address
                       (rsrc1 + offset) & (~(cache_block_size - 1))
                       to be prefetched.
prefr rsrc1 rsrc2      Data-cache prefetch block with index.
                       Causes the block with address
                       (rsrc1 + rsrc2) & (~(cache_block_size - 1))
                       to be prefetched.
pref16x rsrc1 rsrc2    Data-cache prefetch block with scaled 16-bit
                       index. Causes the block with address
                       (rsrc1 + 2 * rsrc2) & (~(cache_block_size - 1))
                       to be prefetched.
pref32x rsrc1 rsrc2    Data-cache prefetch block with scaled 32-bit
                       index. Causes the block with address
                       (rsrc1 + 4 * rsrc2) & (~(cache_block_size - 1))
                       to be prefetched.
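The block-address expressions in Table 5-9 and Table 5-10 all follow one pattern: form an effective address, then clear the low-order bits to reach the start of the enclosing block. The sketch below evaluates them in C. The function names are hypothetical, the block size is a parameter because this section does not state it, and 32-bit wraparound of the address sum is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of the cache block containing addr.
 * cache_block_size must be a power of two for the mask to be valid. */
uint32_t block_base(uint32_t addr, uint32_t cache_block_size)
{
    assert(cache_block_size != 0 &&
           (cache_block_size & (cache_block_size - 1u)) == 0);
    return addr & ~(cache_block_size - 1u);
}

/* Displacement form (allocd, prefd): (rsrc1 + offset). */
uint32_t ea_displacement(uint32_t rsrc1, int32_t offset, uint32_t bs)
{
    return block_base(rsrc1 + (uint32_t)offset, bs);
}

/* Index form (allocr, prefr): (rsrc1 + rsrc2). */
uint32_t ea_index(uint32_t rsrc1, uint32_t rsrc2, uint32_t bs)
{
    return block_base(rsrc1 + rsrc2, bs);
}

/* Scaled 16-bit index form (pref16x): (rsrc1 + 2 * rsrc2). */
uint32_t ea_scaled16(uint32_t rsrc1, uint32_t rsrc2, uint32_t bs)
{
    return block_base(rsrc1 + 2u * rsrc2, bs);
}

/* Scaled 32-bit index form (allocx, pref32x): (rsrc1 + 4 * rsrc2). */
uint32_t ea_scaled32(uint32_t rsrc1, uint32_t rsrc2, uint32_t bs)
{
    return block_base(rsrc1 + 4u * rsrc2, bs);
}
```

For example, with an assumed 64-byte block, a displacement access at 0x1000 + 0x7C targets the block that starts at 0x1040.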