Overview
SPC563M64
Doc ID 14642 Rev 6
performed to accelerate taken branches. Prefetched instructions are placed into an
instruction buffer capable of holding six instructions.
Branches can also be decoded at the instruction buffer and branch target addresses
calculated prior to the branch reaching the instruction decode stage, allowing the branch
target to be prefetched early. When a branch is detected at the instruction buffer, a
prediction may be made on whether the branch is taken or not. If the branch is predicted to
be taken, a target fetch is initiated and its target instructions are placed in the instruction
buffer following the branch instruction. Many branches execute in zero cycles through
branch folding: branches are folded out of the instruction execution pipe whenever
possible. These include unconditional branches and conditional branches with condition
codes that can be resolved early.
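
As a rough mental model, the folding behaviour above and the one- and two-clock cases stated in the next paragraph can be collapsed into a small lookup. This is only a sketch of the timing rules as worded in this section, not a cycle-accurate model of the core; the function name and its three flags are hypothetical.

```c
#include <stdbool.h>

/*
 * Effective execution time of a branch, in clocks, following the rules
 * quoted in this section. All parameter names are illustrative only.
 *
 *   folded            - branch was folded out of the execution pipe
 *   taken             - branch was taken
 *   target_prefetched - target prefetch succeeded before the branch executed
 */
unsigned branch_clocks(bool folded, bool taken, bool target_prefetched)
{
    if (folded) {
        return 0;   /* folded branches take zero cycles */
    }
    if (!taken) {
        return 1;   /* not taken, not folded: a single clock */
    }
    if (target_prefetched) {
        return 1;   /* successful target prefetch: effective one clock */
    }
    return 2;       /* all other taken branches: two clocks */
}
```

For example, branch_clocks(false, true, false) models a taken branch whose target was not prefetched and evaluates to 2.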
Conditional branches which are not taken and not folded execute in a single clock.
Branches with successful target prefetching which are not folded have an effective execution
time of one clock. All other taken branches have an execution time of two clocks. Memory
load and store operations are provided for byte, halfword, and word (32-bit) data with
automatic zero or sign extension of byte and halfword load data as well as optional byte
reversal of data. These instructions can be pipelined to allow effective single cycle
throughput. Load and store multiple word instructions allow low overhead context save and
restore operations. The load/store unit contains a dedicated effective address adder to allow
effective address generation to be optimized. In addition, a load-to-use dependency does
not incur any pipeline bubbles in most cases.
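
The load behaviour described above (zero or sign extension of sub-word data, and optional byte reversal) can be illustrated in portable C. The helpers below only model the resulting register values in software; their names are hypothetical and they do not correspond to any particular instruction mnemonic.

```c
#include <stdint.h>

/* Zero extension of a byte: the upper 24 bits of the result are cleared. */
uint32_t zero_extend_byte(uint8_t raw)
{
    return (uint32_t)raw;
}

/* Sign extension of a halfword: bit 15 is copied into bits 16..31.
 * Written with XOR/subtract so it does not rely on
 * implementation-defined narrowing conversions. */
int32_t sign_extend_half(uint16_t raw)
{
    return (int32_t)(raw ^ 0x8000u) - 0x8000;
}

/* Byte reversal of a 32-bit word: byte 0 swaps with byte 3, byte 1 with byte 2. */
uint32_t byte_reverse32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

For example, sign_extend_half(0xFFFF) yields -1, whereas a zero-extended load of the same halfword would yield 65535.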
The Condition Register unit supports the condition register (CR) and condition register
operations defined by the Power Architecture. The condition register consists of eight 4-bit
fields that reflect the results of certain operations, such as move, integer and floating-point
compare, arithmetic, and logical instructions, and provide a mechanism for testing and
branching. Vectored and autovectored interrupts are supported by the CPU. Vectored
interrupt support is provided to allow multiple interrupt sources to have unique interrupt
handlers invoked with no software overhead.
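
To make the layout of the eight 4-bit condition register fields concrete, the sketch below extracts one field from a 32-bit CR image. It assumes the usual Power Architecture convention that field 0 (CR0) occupies the most significant four bits, and that for integer compares the bits within a field are LT, GT, EQ, SO from most to least significant. The function and enum names are hypothetical.

```c
#include <stdint.h>

/* Bit meanings within one 4-bit CR field after an integer compare
 * (assumed Power Architecture convention). */
enum {
    CR_LT = 0x8,  /* less than */
    CR_GT = 0x4,  /* greater than */
    CR_EQ = 0x2,  /* equal */
    CR_SO = 0x1   /* summary overflow */
};

/* Return CR field n (0..7) from a 32-bit image of the condition register.
 * Field 0 is taken from the most significant nibble. */
unsigned cr_field(uint32_t cr, unsigned n)
{
    unsigned shift = 28u - 4u * (n & 7u);
    return (unsigned)((cr >> shift) & 0xFu);
}
```

A test such as (cr_field(cr, 0) & CR_EQ) != 0 then mirrors what a conditional branch on the EQ bit of CR0 checks.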
The hardware floating-point unit utilizes the IEEE-754 single-precision floating-point format
and supports single-precision floating-point operations in a pipelined fashion. The general
purpose register file is used for source and destination operands, so there is a unified
storage model for single-precision floating-point data types of 32 bits and the normal integer
type. Single-cycle floating-point add, subtract, multiply, compare, and conversion operations
are provided. Divide instructions are multi-cycle and are not pipelined.
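
The unified storage model means a single-precision value is simply a 32-bit pattern held in a general purpose register. The sketch below shows that equivalence on a host machine by reinterpreting a float as a 32-bit integer and back. It assumes the host's float type is IEEE-754 single precision; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return the 32-bit IEEE-754 pattern of a single-precision value,
 * i.e. what a 32-bit register would hold for it.
 * Assumes float is IEEE-754 binary32 on the host. */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined way to reinterpret the bytes */
    return u;
}

/* Inverse: interpret a 32-bit register image as a single-precision value. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

For instance, float_to_bits(1.0f) is 0x3F800000: sign bit 0, biased exponent 127, and a zero fraction.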
The Signal Processing Extension (SPE) Auxiliary Processing Unit (APU) provides hardware
SIMD operations and supports a full complement of dual integer arithmetic operations,
including Multiply Accumulate (MAC) and dual integer multiply (MUL) in a pipelined fashion.
The general purpose register file is enhanced such that all 32 of the GPRs are extended to
64 bits wide and are used for source and destination operands, so there is a unified
storage model for 32 x 32 MAC operations which generate greater than 32-bit results.
The majority of both scalar and vector operations (including MAC and MUL) are executed in
a single clock cycle. Both scalar and vector divides take multiple clocks. The SPE APU also
provides extended load and store operations to support the transfer of data to and from the
extended 64-bit GPRs.
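
The two ideas in this paragraph, a 64-bit register treated as two independent 32-bit elements and a 32 x 32 multiply-accumulate whose result exceeds 32 bits, can be modelled in plain C. This is a software illustration only: it does not use SPE instructions or compiler intrinsics, and every name in it is hypothetical.

```c
#include <stdint.h>

/* Pack two 32-bit elements into one 64-bit register image. */
uint64_t pack2x32(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Dual (two-element) modulo add: each 32-bit element is added
 * independently, so a carry out of the low element never reaches
 * the high element. */
uint64_t dual_add(uint64_t a, uint64_t b)
{
    uint32_t hi = (uint32_t)((a >> 32) + (b >> 32));
    uint32_t lo = (uint32_t)(a + b);
    return pack2x32(hi, lo);
}

/* Signed 32 x 32 multiply-accumulate into a 64-bit accumulator.
 * The product alone can need up to 64 bits, which is why a register
 * wider than 32 bits is required to hold the result. */
int64_t mac32x32(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}
```

For example, mac32x32(0, 0x40000000, 4) gives 0x100000000, a value that no longer fits in 32 bits.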
The CPU includes support for Variable Length Encoding (VLE) instruction enhancements.
This enables the classic Power Architecture instruction set to be represented by a modified
instruction set made up of a mixture of 16- and 32-bit instructions. This results in a
significantly smaller code size footprint without noticeably affecting performance. The classic
Power Architecture instruction set and VLE instruction set are available concurrently.
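
The code size effect is simple arithmetic: a 16-bit instruction occupies 2 bytes and a 32-bit instruction occupies 4. The sketch below compares the two encodings for a given instruction mix. It assumes, purely for illustration, that the same number of instructions is needed in both encodings; the function names and any counts passed to them are hypothetical and are not figures from this document.

```c
#include <stddef.h>

/* Size in bytes of a program encoded entirely with 32-bit instructions. */
size_t fixed32_size_bytes(size_t n_instr)
{
    return n_instr * 4u;
}

/* Size in bytes of a program using a mixture of 16-bit and 32-bit
 * instructions (n16 of the former, n32 of the latter). */
size_t mixed_size_bytes(size_t n16, size_t n32)
{
    return n16 * 2u + n32 * 4u;
}
```

With a hypothetical mix of 600 short and 400 long instructions, mixed_size_bytes(600, 400) is 2800 bytes against fixed32_size_bytes(1000) at 4000 bytes.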