TSPC603P
5.2.1.2. Calculating effective addresses
The effective address (EA) is the 32-bit address computed by the processor when executing a memory access or branch instruction
or when fetching the next sequential instruction.
The PowerPC architecture supports two simple memory addressing modes:
• EA = (rA|0) + offset (including offset = 0) (register indirect with immediate index).
• EA = (rA|0) + rB (register indirect with index).
These simple addressing modes allow efficient address generation for memory accesses. Calculation of the effective address for
aligned transfers occurs in a single clock cycle.
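The two addressing modes above can be sketched in a few lines of Python. This is an illustrative model only, not a description of the 603p hardware: the function names and the `gpr` list are hypothetical, and the sketch assumes the immediate offset is a sign-extended 16-bit field and that naming register number 0 as the base selects the value 0 rather than the contents of r0.

```python
MASK32 = 0xFFFFFFFF  # effective addresses are 32 bits wide


def sign_extend16(value):
    """Treat a 16-bit field as a signed displacement (assumed encoding)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def ea_immediate(gpr, ra, d):
    """Register indirect with immediate index: EA = base + offset."""
    base = 0 if ra == 0 else gpr[ra]  # register number 0 selects the value 0
    return (base + sign_extend16(d)) & MASK32


def ea_indexed(gpr, ra, rb):
    """Register indirect with index: EA = base + contents of rB."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & MASK32


# Hypothetical register file contents, for illustration only.
gpr = [0] * 32
gpr[0] = 0xDEAD0000  # not used as a base when ra == 0
gpr[3] = 0x00001000
gpr[4] = 0x00000020

print(hex(ea_immediate(gpr, 3, 0x0008)))  # base + 8
print(hex(ea_immediate(gpr, 3, 0xFFFC)))  # base - 4
print(hex(ea_immediate(gpr, 0, 0x0100)))  # base term is 0
print(hex(ea_indexed(gpr, 3, 4)))         # base + rB
```

An offset of 0 in the first mode simply yields the base value, which is the "including offset = 0" case noted in the list.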
For a memory access instruction, if the sum of the effective address and the operand length exceeds the maximum effective address,
the memory operand is considered to wrap around from the maximum effective address to effective address 0.
Effective address computations for both data and instruction accesses use 32-bit unsigned binary arithmetic. A carry from bit 0 is
ignored in 32-bit implementations.
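The wrap-around behaviour can be illustrated with modulo-2^32 arithmetic. The following is a minimal sketch with hypothetical helper names; it models only the address arithmetic described above, not any bus or exception behaviour.

```python
MASK32 = 0xFFFFFFFF  # maximum 32-bit effective address


def add_ea(base, index):
    """32-bit unsigned add; the carry out of the most significant bit is dropped."""
    return (base + index) & MASK32


def operand_byte_addresses(ea, length):
    """Byte addresses covered by an operand, wrapping past the maximum EA to 0."""
    return [add_ea(ea, i) for i in range(length)]


# A 4-byte operand starting two bytes below the top of the address space.
print([hex(a) for a in operand_byte_addresses(0xFFFFFFFE, 4)])

# The sum overflows 32 bits; only the low 32 bits form the effective address.
print(hex(add_ea(0xFFFFFFF0, 0x20)))
```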
5.2.2. PowerPC 603p microprocessor instruction set
The 603p instruction set is defined as follows:
• The 603p provides hardware support for all 32-bit PowerPC instructions.
• The 603p provides two implementation-specific instructions used for software table search operations following TLB misses:
- Load Data TLB Entry (tlbld).
- Load Instruction TLB Entry (tlbli).
• The 603p implements the following instructions which are defined as optional by the PowerPC architecture:
- External Control In Word Indexed (eciwx).
- External Control Out Word Indexed (ecowx).
- Floating Select (fsel).
- Floating Reciprocal Estimate Single-Precision (fres).
- Floating Reciprocal Square Root Estimate (frsqrte).
- Store Floating-Point as Integer Word (stfiwx).
5.3. Cache implementation
The following subsections describe the PowerPC architecture's treatment of cache in general and the 603p-specific implementation,
respectively.
5.3.1. PowerPC cache characteristics
The PowerPC architecture does not define hardware aspects of cache implementations. For example, some PowerPC processors,
including the 603p, have separate instruction and data caches (Harvard architecture), while others, such as the PowerPC 601
microprocessor, implement a unified cache.
PowerPC microprocessors control the following memory access modes on a page or block basis:
• Write-back/write-through mode.
• Cache-inhibited mode.
• Memory coherency.
Note that in the 603p, a cache line is defined as eight words. The VEA defines cache management instructions that provide a means
by which the application programmer can affect the cache contents.
5.3.2. PowerPC 603p microprocessor cache implementation
The 603p has two 16-Kbyte, four-way set-associative (instruction and data) caches. The caches are physically addressed, and the
data cache can operate in either write-back or write-through mode as specified by the PowerPC architecture.
The data cache is configured as 128 sets of 4 lines each. Each line consists of 32 bytes, two state bits, and an address tag. The two
state bits implement the three-state MEI (modified/exclusive/invalid) protocol. Each line contains eight 32-bit words. Note that the
PowerPC architecture defines the term block as the cacheable unit. For the 603p, the block size is equivalent to a cache line. A block
diagram of the data cache organization is shown in Figure 16.
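The geometry above is self-consistent: 128 sets of 4 lines of 32 bytes gives 16 Kbytes. The sketch below checks that arithmetic and splits a 32-bit physical address into tag, set index and byte offset. The field layout (low 5 bits as byte offset, next 7 bits as set index, remaining 20 bits as tag) is an assumption based on the usual set-associative arrangement; the text above does not state which address bits select the set.

```python
LINE_BYTES = 32   # eight 32-bit words per line
WAYS = 4          # four-way set-associative
SETS = 128        # 128 sets per cache

OFFSET_BITS = 5   # log2(LINE_BYTES)
INDEX_BITS = 7    # log2(SETS)


def cache_size_bytes():
    """Total data capacity implied by the geometry."""
    return SETS * WAYS * LINE_BYTES


def split_address(pa):
    """Split a 32-bit physical address into (tag, set index, byte offset).

    Assumed layout: offset in the low 5 bits, index in the next 7 bits,
    tag in the remaining 20 bits.
    """
    offset = pa & (LINE_BYTES - 1)
    index = (pa >> OFFSET_BITS) & (SETS - 1)
    tag = pa >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(cache_size_bytes())  # 16384 bytes = 16 Kbytes
tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))
```

Under this assumed layout, the index and offset together span 12 bits, so each way covers 4 Kbytes of address space.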
The instruction cache also consists of 128 sets of 4 lines, and each line consists of 32 bytes, an address tag, and a valid bit. The
instruction cache may not be written to except through a line fill operation. The instruction cache is not snooped, and cache coherency
must be maintained by software. A fast hardware invalidation capability is provided to support cache maintenance. The organization
of the instruction cache is very similar to the data cache shown in Figure 16.
Each cache line contains eight contiguous words from memory that are loaded from an 8-word boundary (that is, bits A27-A31 of the
effective addresses are zero); thus, a cache line never crosses a page boundary. Misaligned accesses across a page boundary can
incur a performance penalty.
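The alignment property can be checked numerically. This sketch uses hypothetical helper names and assumes a 4-Kbyte page size, which is not stated in this section; it shows that a 32-byte line starting on an 8-word boundary stays inside one page, while a misaligned access may span two lines or two pages.

```python
LINE_BYTES = 32     # eight words per cache line
PAGE_BYTES = 4096   # assumption: 4-Kbyte pages


def line_base(ea):
    """Address of the 8-word boundary containing ea (low five bits cleared)."""
    return ea & ~(LINE_BYTES - 1) & 0xFFFFFFFF


def crosses(ea, length, unit):
    """True if the bytes [ea, ea + length) span more than one unit-sized block."""
    return (ea // unit) != ((ea + length - 1) // unit)


print(hex(line_base(0x1234567B)))

# The last line of a page ends exactly at the page boundary.
print(crosses(0x00000FE0, LINE_BYTES, PAGE_BYTES))

# A 4-byte access starting 2 bytes before the end of a page spans two pages.
print(crosses(0x00000FFE, 4, PAGE_BYTES))

# A 4-byte access starting 2 bytes before the end of a line spans two lines.
print(crosses(0x0000001E, 4, LINE_BYTES))
```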
The 603p's cache lines are loaded in four beats of 64 bits each. The burst load is performed as "critical double word first". The cache
that is being loaded is blocked to internal accesses until the load completes. The critical double word is simultaneously written to the
cache and forwarded to the requesting unit, thus minimizing stalls due to load delays.
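The beat ordering of such a burst can be sketched as follows. Only the "critical double word first" property comes from the text above; the ordering of the remaining three beats (ascending with wrap-around inside the line) is an assumption made for illustration, and the function name is hypothetical.

```python
DWORD_BYTES = 8       # each beat transfers 64 bits
BEATS_PER_LINE = 4    # 4 beats x 8 bytes = one 32-byte line


def burst_order(ea):
    """Order in which the four double words of a line are transferred.

    The double word containing ea goes first. The remaining beats are
    assumed to follow in ascending order, wrapping around within the line.
    """
    critical = (ea // DWORD_BYTES) % BEATS_PER_LINE
    return [(critical + beat) % BEATS_PER_LINE for beat in range(BEATS_PER_LINE)]


print(burst_order(0x1000))  # miss on the first double word of the line
print(burst_order(0x1010))  # miss on the third double word
print(burst_order(0x101C))  # miss on the fourth double word
```

In this model the requesting unit receives its data on the first beat regardless of where in the line the miss address falls.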