Page 6
Version 1.02
12/8/99
PowerPC 740 and PowerPC 750 Microprocessor
CMOS 0.20
μ
m Copper Technology, PID-8p, PPC740L and PPC750L, dd3.2
Features
This section summarizes features of the implementation of the PowerPC 750 architecture. For details, see
the PowerPC 740 and PowerPC750 User’s Manual
Branch processing unit
- Four instructions fetched per clock.
- One branch processed per cycle (plus resolving 2 speculations).
- Up to 1 speculative stream in execution, 1 additional speculative stream in fetch.
- 512-entry branch history table (BHT) for dynamic prediction.
- 64-entry, 4-way set associative branch target instruction cache (BTIC) for eliminating branch delay
slots.
Dispatch unit
- Full hardware detection of dependencies (resolved in the execution units).
- Dispatch two instructions to six independent units (system, branch, load/store, fixed-point unit 1,
fixed-point unit 2, or floating-point).
- Serialization control (predispatch, postdispatch, execution, serialization).
Decode
- Register file access.
- Forwarding control.
- Partial instruction decode.
Load/store unit
- One cycle load or store cache access (byte, half word, word, double word).
- Effective address generation.
- Hits under misses (one outstanding miss).
- Single-cycle misaligned access within double word boundary.
- Alignment, zero padding, sign extend for integer register file.
- Floating-point internal format conversion (alignment, normalization).
- Sequencing for load/store multiples and string operations.
- Store gathering.
- Cache and TLB instructions.
- Big and little-endian byte addressing supported.
- Misaligned little-endian support in hardware.
Fixed-point units
- Fixed-point unit 1 (FXU1); multiply, divide, shift, rotate, arithmetic, logical.
- Fixed-point unit 2 (FXU2); shift, rotate, arithmetic, logical.
- Single-cycle arithmetic, shift, rotate, logical.
- Multiply and divide support (multi-cycle).
- Early out multiply.
Floating-point unit
- Support for IEEE-754 standard single- and double-precision floating-point arithmetic.
- 3 cycle latency, 1 cycle throughput, single-precision multiply-add.
- 3 cycle latency, 1 cycle throughput, double-precision add.
- 4 cycle latency, 2 cycle throughput, double-precision multiply-add.
- Hardware support for divide.
- Hardware support for denormalized numbers.
- Time deterministic non-IEEE mode.