
more effective than a BTC. At these cache sizes the bandwidth requirements are sufficiently reduced as to make a shared instruction/data bus practicable.
Each cache entry (known as a block) contains four consecutive instructions.
They are tagged in a similar manner to the BTC mechanism of the Am29000 proces-
sor. This allows cache entries to be used for both User mode and Supervisor mode
code at the same time, and entries to remain valid during application system calls and
system interrupt handlers. However, since entries are not tagged with per–process
identifiers, the cache entries must be invalidated when a task context switch occurs.
The cache is 2–way set associative. The 4k bytes of instruction cache provided by
each set results in 256 entries per set (each entry being four instructions, i.e. 16 bytes).
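The way an instruction address selects a cache block follows directly from these sizes: the low 4 address bits locate a byte within the 16–byte block, the next 8 bits select one of the 256 entries in each set, and the remaining high–order bits form the address portion of the tag, which is compared against the corresponding entry in both sets. The C fragment below is a minimal sketch of this decomposition; the names and the example address are chosen here purely for illustration, and only the address–derived part of the tag is shown.

    #include <stdio.h>
    #include <stdint.h>

    #define BLOCK_BYTES 16U        /* four 32-bit instructions per block    */
    #define ENTRIES     256U       /* 4K bytes per set / 16 bytes per block */

    int main(void)
    {
        uint32_t addr = 0x40001234;                        /* example instruction address */

        uint32_t offset = addr & (BLOCK_BYTES - 1);        /* byte within the block       */
        uint32_t index  = (addr / BLOCK_BYTES) % ENTRIES;  /* entry selected in each set  */
        uint32_t tag    = addr / (BLOCK_BYTES * ENTRIES);  /* compared against both sets  */

        printf("offset=%u  index=%u  tag=0x%x\n",
               (unsigned)offset, (unsigned)index, (unsigned)tag);
        return 0;
    }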
When a branch instruction is executed and the block containing the target
instruction sequence is not found in the cache, the processor fetches the missing
block and marks it valid. Complete blocks are always fetched, even if the target
instruction lies at the end of the block. However, the cache forwards instructions to
the decoder without waiting for the block to be reloaded. If the cache is not disabled
and the block to be replaced in the cache is not valid–and–locked, then the fetched
block is placed in the cache. The 2–way cache associativity provides two possible
cache blocks for storing any selected memory block. When a cache miss occurs, and
both associated blocks are valid but not locked, a block is chosen at random for re-
placement.
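These replacement rules can be summarised with a short C model. It is only a sketch of the behaviour described above, not of the processor's internal logic; the structure and function names are introduced here for illustration, and the preference for filling an invalid block, where one exists, is assumed as the natural behaviour.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* A simplified model of one cache block: an address tag plus state bits. */
    struct block {
        uint32_t tag;
        bool     valid;
        bool     locked;
    };

    /* cache[set][entry]: two sets of 256 entries each. */
    static struct block cache[2][256];

    /* Return the set (0 or 1) whose block at 'index' should receive a block
       fetched after a miss, or -1 if neither block may be replaced.          */
    static int choose_victim(unsigned index)
    {
        struct block *a = &cache[0][index];
        struct block *b = &cache[1][index];

        if (!a->valid) return 0;                 /* an invalid block is filled first      */
        if (!b->valid) return 1;

        if (a->locked && b->locked) return -1;   /* valid-and-locked: never replaced      */
        if (a->locked) return 1;
        if (b->locked) return 0;

        return rand() & 1;                       /* both valid and unlocked: pick at random */
    }

A return value of -1 corresponds to the case in which the fetched block is forwarded to the decoder but not placed in the cache.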
Locking valid blocks into the cache is not provided on a per–block basis but only for the complete cache or for one of the two sets. When a set is locked, valid blocks are not replaced; invalid blocks will be replaced and marked valid and locked. Cache locking can be used to preload the cache with instruction sequences critical to performance. However, it is often difficult to use cache locking in a way that outperforms the supported random replacement algorithm.
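Continuing the C model above, the effect of set locking on the placement of a fetched block can be sketched as follows; set_locked and fill_block are names introduced here for illustration, and the fragment relies on the struct block and cache declarations from the previous sketch.

    /* Per-set lock flags: locking applies to a whole set, not to single blocks. */
    static bool set_locked[2];

    /* Place a fetched block into the chosen set, honouring the locking rules:
       in a locked set only invalid blocks are filled, and a block filled while
       its set is locked becomes valid-and-locked.                               */
    static void fill_block(int set, unsigned index, uint32_t tag)
    {
        struct block *blk = &cache[set][index];   /* declarations from the sketch above */

        if (blk->valid && (blk->locked || set_locked[set]))
            return;                               /* such blocks are not replaced */

        blk->tag    = tag;
        blk->valid  = true;
        blk->locked = set_locked[set];            /* locked in while the set is locked */
    }

In this way a performance–critical code sequence can be fetched into a locked set and retained there, while the other set continues to serve the remainder of the instruction stream.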
The processor supports Scalable Clocking, which enables it to operate at either the same speed as or twice the speed of the off–chip memory system. A 33 MHz processor could be built around a 20 MHz memory system, and depending on cache utilization there may be little drop–off in performance compared to having constructed
a 33 MHz memory system. This provides for higher system performance without in-
creasing memory system costs or design complexity. Additionally, a performance
upgrade path is provided for systems which were originally built to operate at lower
speeds. The processor need merely be replaced by a pin–compatible higher frequen-
cy device (at higher cost) to realize improved system performance.
Memory system design is further simplified by enforcing a 2–cycle minimum
access time for data and instruction accesses. Even if 1–cycle burst–mode is sup-
ported by a memory system, the first access in the burst is hardwired by the processor
to take 2 cycles. This is effective in relaxing memory system timing constraints and
generally appreciated by memory system designers. The high frequency operation of
the Am29030 processor can easily result in electrical noise [AMD1992c]. Enforcing