Proprietary and Confidential to PMC-Sierra, Inc and for its Customer’s Internal Use
Document ID: PMC-2002175, Issue 1
RM7000 Microprocessor with On-Chip Secondary Cache Datasheet
Released
address translation by the DTLB, the DTLB is filled from the JTLB. The DTLB refill is pseudo-
LRU: the least recently used entry of the least recently used pair of entries is filled. The operation
of the DTLB is completely transparent to the user.
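The refill policy described above can be pictured as a two-level decision: first pick the least recently used pair, then the least recently used entry within that pair. The following is a minimal sketch only, assuming a four-entry DTLB organized as two pairs; the class and method names are hypothetical and it does not describe the actual hardware implementation.

```python
class PseudoLruDtlb:
    """Toy model of a 4-entry DTLB with pair-based pseudo-LRU refill (illustrative only)."""

    def __init__(self):
        self.entries = [None] * 4   # entries 0,1 form pair 0; entries 2,3 form pair 1
        self.lru_pair = 0           # pair that was used least recently
        self.lru_in_pair = [0, 0]   # per pair: slot (0 or 1) that was used least recently

    def _touch(self, index):
        """Record a use of entry `index`."""
        pair, slot = divmod(index, 2)
        self.lru_pair = 1 - pair            # the other pair becomes the LRU pair
        self.lru_in_pair[pair] = 1 - slot   # the other slot in this pair becomes LRU

    def lookup(self, vpn):
        """Return the index of the matching entry, or None on a DTLB miss."""
        for index, entry in enumerate(self.entries):
            if entry == vpn:
                self._touch(index)
                return index
        return None

    def refill(self, vpn):
        """Fill the LRU entry of the LRU pair (in hardware, from the JTLB)."""
        pair = self.lru_pair
        index = 2 * pair + self.lru_in_pair[pair]
        self.entries[index] = vpn
        self._touch(index)
        return index
```

Only three state bits are needed in this model (one for the pair, one per pair for the slot), which is why such a scheme is cheaper than true LRU while usually selecting the same victim.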
4.16 Cache Memory
In order to keep the RM7000’s superscalar pipeline full and operating efficiently, the RM7000 has
integrated primary instruction and data caches with single-cycle access, as well as a large unified
secondary cache with a three-cycle miss penalty from the primaries. Each primary cache has a
64-bit read path and a 128-bit write path, and both caches can be accessed simultaneously. The
primary caches provide the integer and floating-point units with an aggregate bandwidth of 4.8 GB
per second at an internal clock frequency of 300 MHz. During an instruction or data primary cache
refill, the secondary cache can provide a 64-bit datum every cycle following the initial three-cycle
latency, for a peak bandwidth of 2.4 GB per second. For applications requiring even higher
performance, the RM7000 also has a direct interface to a large external tertiary cache.
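The bandwidth figures above follow directly from the path widths and the clock frequency. The short calculation below is a sketch of that arithmetic, assuming 1 GB = 10^9 bytes and that the aggregate figure counts the 64-bit read paths of both primary caches; the function name is hypothetical.

```python
CLOCK_HZ = 300e6            # internal clock frequency quoted in the text
READ_PATH_BYTES = 64 // 8   # 64-bit read path = 8 bytes per cycle

def peak_bandwidth_gb_per_s(paths, bytes_per_cycle=READ_PATH_BYTES, clock_hz=CLOCK_HZ):
    """Peak bandwidth in GB/s for `paths` independent paths, one transfer per cycle each."""
    return paths * bytes_per_cycle * clock_hz / 1e9

# Both primary caches read simultaneously: 2 x 8 bytes x 300 MHz
primary_aggregate = peak_bandwidth_gb_per_s(paths=2)

# Secondary cache streams one 64-bit datum per cycle during a refill: 1 x 8 bytes x 300 MHz
secondary_refill = peak_bandwidth_gb_per_s(paths=1)
```

Under these assumptions the two results correspond to the 4.8 GB per second and 2.4 GB per second figures quoted above. The secondary cache figure is a peak rate and excludes the initial three-cycle latency.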
4.17 Instruction Cache
The RM7000 has an integrated 16 KB, four-way set associative instruction cache. Even though
instruction address translation is done in parallel with the cache access, the combination of 4-way
set associativity and 16 KB size results in a cache which is physically indexed and physically
tagged. Since the effective physical index eliminates the potential for virtual aliases in the cache, it
is possible that some operating system code can be simplified vis-a-vis the RM5200 Family,
R5000, and R4000 class processors.
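The reason the index is effectively physical can be checked with a little arithmetic: 16 KB divided across four ways gives 4 KB per way, so every index bit falls inside the page offset and is unchanged by translation. The sketch below illustrates this, assuming a minimum page size of 4 KB (not stated in this section) and the 32-byte line size given later in this section; the helper names are hypothetical.

```python
CACHE_BYTES = 16 * 1024
WAYS = 4
LINE_BYTES = 32
PAGE_BYTES = 4 * 1024   # assumption: smallest supported page size

WAY_BYTES = CACHE_BYTES // WAYS   # bytes covered by one way
SETS = WAY_BYTES // LINE_BYTES    # number of sets selected by the index

def set_index(addr):
    """Set index taken from the address bits just above the line offset."""
    return (addr // LINE_BYTES) % SETS

def translate(vaddr, pfn):
    """Toy translation: replace the virtual page number with a physical frame number."""
    return pfn * PAGE_BYTES + vaddr % PAGE_BYTES

vaddr = 0x12345678
paddr = translate(vaddr, pfn=0xABCDE)
same_index = set_index(vaddr) == set_index(paddr)
```

Because `WAY_BYTES` does not exceed `PAGE_BYTES` in this model, `same_index` holds for any virtual address and frame number, so two virtual addresses mapped to the same physical line always select the same set.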
The data array portion of the instruction cache is 64 bits wide and protected by word parity while
the tag array holds a 24-bit physical address, 14 housekeeping bits, a valid bit, and a single bit of
parity protection.
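As an illustration of the protection scheme described above, the sketch below computes a parity bit over a 32-bit word and totals the tag fields listed in the text. The parity polarity is not stated in this section, so even parity is assumed here purely for illustration; the function and dictionary names are hypothetical.

```python
def even_parity_bit(word, width=32):
    """Parity bit that makes the total count of 1s (data plus parity) even. Polarity is an assumption."""
    masked = word & ((1 << width) - 1)
    return bin(masked).count("1") & 1

def parity_ok(word, parity_bit, width=32):
    """True if the stored parity bit matches the data word."""
    return even_parity_bit(word, width) == parity_bit

# Tag fields as listed in the text, in bits
TAG_FIELD_BITS = {
    "physical_address": 24,
    "housekeeping": 14,
    "valid": 1,
    "parity": 1,
}
TAG_ENTRY_BITS = sum(TAG_FIELD_BITS.values())
```

A single parity bit detects any odd number of flipped bits in the protected word but cannot correct them or detect an even number of flips.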
By accessing 64 bits per cycle, the instruction cache is able to supply two instructions per cycle to
the superscalar dispatch unit. For signal processing, graphics, and other numerical code sequences
where a floating-point load or store and a floating-point computation instruction are being issued
together in a loop, the entire bandwidth available from the instruction cache will be consumed by
instruction issue. For typical integer code mixes, where instruction dependencies and other
resource constraints restrict the achievable parallelism, the extra instruction cache bandwidth is
used to fetch both the taken and non-taken branch paths to minimize the overall penalty for
branches.
A 32-byte (eight instruction) line size is used to maximize the communication efficiency between
the instruction cache and the secondary cache, tertiary cache, or memory system.
The RM7000 is the first MIPS RISC microprocessor to support cache locking on a per line basis.
The contents of each line of the cache can be locked by setting a bit in the Tag. Locking the line
prevents its contents from being overwritten by a subsequent cache miss. Refill will occur only
into unlocked cache lines. This mechanism allows the programmer to lock critical code into the
cache thereby guaranteeing deterministic behavior for the locked code sequence.
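The effect of the lock bit on refill can be modeled with a single cache set. The sketch below is illustrative only: the round-robin victim selection and the behavior when every line in a set is locked are assumptions, since this section does not specify them, and the class name is hypothetical.

```python
class LockableSet:
    """Toy model of one four-way cache set with a per-line lock bit."""

    def __init__(self, ways=4):
        self.tags = [None] * ways
        self.locked = [False] * ways
        self._next = 0   # round-robin victim pointer (assumption, not from the datasheet)

    def lock(self, way):
        """Set the lock bit for one line so that refills skip it."""
        self.locked[way] = True

    def refill(self, tag):
        """Place `tag` in the next unlocked way; return the way used, or None if all are locked."""
        ways = len(self.tags)
        for step in range(ways):
            way = (self._next + step) % ways
            if not self.locked[way]:
                self.tags[way] = tag
                self._next = (way + 1) % ways
                return way
        return None   # assumption: with every line locked, the model performs no refill
```

For example, after `s = LockableSet()`, `s.refill("critical")` and `s.lock(0)`, any number of later refills cycle through ways 1 to 3 and leave way 0 holding the locked line.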
4.18 Data Cache
The RM7000 has an integrated 16 KB, four-way set associative data cache, and even though data
address translation is done in parallel with the cache access, the combination of 4-way set