26
Internal Architecture
Chapter 2
AMD-K6-2E+ Embedded Processor Data Sheet
23542A/0—September 2000
Preliminary Information
Branch Target Cache
To avoid a one clock cache-fetch penalty when a branch is
predicted taken, a built-in branch target cache supplies the first
16 bytes of instructions directly to the instruction buffer
(assuming the target address hits this cache). (See
Figure 3 onThe branch target cache is organized as 16 entries of 16 bytes.
In total, the branch prediction logic achieves branch prediction
rates greater than 95%.
Return Address Stack
The return address stack is a special device designed to
optimize CALL and RET pairs. Software is typically compiled
with subroutines that are frequently called from various places
in a program. This is usually done to save space.
Entry into the subroutine occurs with the execution of a CALL
instruction. At that time, the processor pushes the address of
the next instruction in memory following the CALL instruction
onto the stack (allocated space in memory). When the processor
encounters a RET instruction (within or at the end of the
subroutine), the branch logic pops the address from the stack
and begins fetching from that location. To avoid the latency of
main memory accesses during CALL and RET operations, the
return address stack caches the pushed addresses.
Branch Execution
Unit
The branch execution unit enables efficient speculative
execution. This unit gives the processor the ability to execute
instructions beyond conditional branches before knowing
whether the branch prediction was correct.
The AMD-K6-2E+ processor does not permanently update the
x86 registers or memory locations until all speculatively
executed conditional branch instructions are resolved. When a
prediction is incorrect, the processor backs out to the point of
the mispredicted branch instruction and restores all registers.
Th e A M D-K6-2 E+ proces sor can s upport up to se ve n
outstanding branches.