IDT MIPS32 4Kc Processor Core
Functional Overview
79RC32438 User Reference Manual
November 4, 2002
Functional Overview
Figure 2.1 shows a block diagram of the 4Kc CPU core.
Figure 2.1 RC32438 Block Diagram
Blocks
The following sections describe the various blocks in the 4Kc processor core.
Execution Unit
The execution unit includes:
32-bit adder used for calculating the data address
Address unit for calculating the next instruction address
Logic for branch determination and branch target address calculation
Load aligner
Bypass multiplexers used to avoid stalls when executing instruction streams where data-producing instructions are followed closely by consumers of their results
Zero/One detect unit for implementing the CLZ and CLO instructions
ALU for performing bitwise logical operations
Shifter and Store aligner
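As an illustration of what the Zero/One detect unit computes, the following is a minimal software sketch of the CLZ and CLO results as defined by the MIPS32 instruction set (a count of 32 is returned when no terminating bit is found). The function names clz32 and clo32 are illustrative assumptions, and the bit-serial loop only models the result; it says nothing about how the hardware unit is built.

```c
#include <stdint.h>

/* Count leading zero bits of a 32-bit word.
   Returns 32 when x is zero, matching the MIPS32 CLZ result. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    /* Shift left until the most significant bit is set or all 32 bits are consumed. */
    while (n < 32 && (x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading one bits of a 32-bit word.
   Returns 32 when x is all ones, matching the MIPS32 CLO result.
   Leading ones of x are the leading zeros of its complement. */
unsigned clo32(uint32_t x)
{
    return clz32(~x);
}
```

For example, clz32(0x00010000) evaluates to 15 and clo32(0xF0000000) evaluates to 4.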
The core execution unit implements a load-store architecture with single-cycle Arithmetic Logic Unit
(ALU) operations (logical, shift, add, subtract) and an autonomous multiply-divide unit. The core contains
thirty-two 32-bit general-purpose registers used for scalar integer operations and address calculation. The
register file has two read ports and one write port and is fully bypassed to minimize operation latency
in the pipeline.
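The idea of a bypassed register file can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of the actual 4Kc pipeline: the types regfile_model_t and pending_result_t, the functions read_operand and write_back, and the choice of a single in-flight result are all hypothetical. The only architectural facts used are the thirty-two 32-bit registers and the MIPS32 rule that register 0 always reads as zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one result that has been computed
   but not yet written back to the register file. */
typedef struct {
    bool     valid;  /* true while a result is waiting for write-back */
    uint8_t  rd;     /* destination register number, 0..31 */
    uint32_t value;  /* the computed result */
} pending_result_t;

typedef struct {
    uint32_t         gpr[32];  /* thirty-two 32-bit general-purpose registers */
    pending_result_t pending;  /* single bypass source (an assumption of this sketch) */
} regfile_model_t;

/* Read a source operand. If the newest value of the register is still
   in flight, forward it instead of reading the stale register file entry. */
uint32_t read_operand(const regfile_model_t *rf, uint8_t rs)
{
    rs &= 31u;
    if (rs == 0) {
        return 0;  /* register 0 always reads as zero in MIPS32 */
    }
    if (rf->pending.valid && rf->pending.rd == rs) {
        return rf->pending.value;  /* bypass path */
    }
    return rf->gpr[rs];  /* normal register file read */
}

/* Commit the in-flight result through the single write port. */
void write_back(regfile_model_t *rf)
{
    if (rf->pending.valid && rf->pending.rd != 0) {
        rf->gpr[rf->pending.rd & 31u] = rf->pending.value;
    }
    rf->pending.valid = false;
}
```

In this model a consumer that reads a register immediately after its producer sees the forwarded value without waiting for write_back, which is the stall that the bypass multiplexers are described as avoiding.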
Multiply/Divide Unit (MDU)
The Multiply/Divide unit performs multiply and divide operations. In the 4Kc processor, the MDU consists
of a 32x16 Booth-encoded multiplier, result-accumulation registers (HI and LO), a divide state machine, and
all multiplexers and control logic required to perform these functions. This pipelined MDU supports
execution of a 16x16 or 32x16 multiply operation every clock cycle; 32x32 multiply operations can be issued
every other clock cycle. Appropriate interlocks are implemented to stall the issue of back-to-back 32x32
multiply operations. Divide operations are implemented with a simple 1-bit-per-clock iterative algorithm and
may require up to 35 clock cycles (worst case) to complete. In the early stages of execution, the
algorithm detects sign extension of the dividend and checks whether its actual size is 24, 16, or 8 bits. Based on this
[Figure 2.1 block labels: Execution Core (RF/ALU/Shift), MDU, System Coprocessor, MMU (TLB or FM), Cache Controller, I-Cache, D-Cache, BIU, EJTAG, TAP, Power Mgmt, Off-Chip Debug I/F]