IDT MIPS32 4Kc Processor Core
Pipeline Description
79RC32438 User Reference Manual
2 - 9
November 4, 2002
Once the data is returned to the core, the critical word of data passes through the aligner before being
forwarded to the execution unit and register file. The bypass mechanism allows the core to use the data
as soon as it becomes available, rather than waiting for the entire cache line to be written to the data
cache and then reading out the required word.
Figure 2.5 shows a timing diagram of a data cache miss for the 4Kc core.
Figure 2.5 Load/Store Cache Miss Timing
Multiply/Divide Operations
The 4Kc core implements the standard MIPS II multiply and divide instructions. In addition, several
new instructions (MUL, MADD, MADDU, MSUB, and MSUBU) have been added that improve the performance
of multiply-intensive code.
The targeted multiply instruction, MUL, specifies that multiply results are placed in the general purpose
register file instead of the HI/LO register pair. By avoiding the explicit MFLO instruction, required when
using the LO register, and by supporting multiple destination registers, the throughput of multiply-intensive
operations is increased.
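The difference between MULT followed by MFLO and the targeted MUL can be illustrated with a small
behavioral model. This is only a sketch of the architectural results, not of the hardware; the function
names mult, mul, and to_signed32 are hypothetical helpers introduced for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs, rt):
    """Model of MULT: signed 32x32 multiply, 64-bit result returned as (HI, LO)."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return (product >> 32) & MASK32, product & MASK32


def mul(rs, rt):
    """Model of MUL: the low 32 bits of the product go directly to a GPR."""
    return (to_signed32(rs) * to_signed32(rt)) & MASK32


# MULT needs a second instruction (MFLO) to get the result into a GPR.
hi, lo = mult(6, 7)
gpr_via_mflo = lo

# MUL delivers the same low word in a single instruction.
gpr_via_mul = mul(6, 7)
```

In this model both paths produce the same low word; the benefit described above comes from MUL not
requiring the separate MFLO and from allowing any general purpose register as the destination.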
Four instructions perform the multiply-accumulate and multiply-subtract operations: multiply-add (MADD),
multiply-add-unsigned (MADDU), multiply-subtract (MSUB), and multiply-subtract-unsigned (MSUBU).
The MADD/MADDU instruction multiplies two operands and then adds the product to the current contents
of the HI and LO registers. Similarly, the MSUB/MSUBU instruction multiplies two operands and then
subtracts the product from the HI and LO registers. The MADD/MADDU and MSUB/MSUBU operations are
commonly used in DSP algorithms.
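The accumulate behavior can be sketched as a behavioral model in which HI and LO together form a
64-bit accumulator. The function name mac and its subtract/unsigned flags are hypothetical, chosen
here only to cover MADD, MADDU, MSUB, and MSUBU with one helper.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mac(hi, lo, rs, rt, subtract=False, unsigned=False):
    """Model of MADD/MADDU/MSUB/MSUBU on the HI:LO pair; returns the new (HI, LO)."""
    acc = ((hi & MASK32) << 32) | (lo & MASK32)
    if unsigned:
        product = (rs & MASK32) * (rt & MASK32)
    else:
        product = to_signed32(rs) * to_signed32(rt)
    acc = (acc - product if subtract else acc + product) & MASK64
    return (acc >> 32) & MASK32, acc & MASK32


def dot_product(xs, ys):
    """Typical DSP use: a dot product built from repeated MADD steps."""
    hi, lo = 0, 0
    for x, y in zip(xs, ys):
        hi, lo = mac(hi, lo, x, y)
    return hi, lo
```

For example, dot_product([1, 2, 3], [4, 5, 6]) leaves 32 in LO and 0 in HI, which a single MFLO would
then move to the general purpose register file.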
All multiply operations (except the MUL instruction) write to the HI/LO register pair. All integer operations
write to the general purpose registers (GPR). Because MDU operations write to different registers than
integer operations, integer instructions that follow MDU operations can execute before the MDU operation
has finished. The MFLO and MFHI instructions are used to move data from the HI/LO register pair to the
GPR file. If an MFLO or MFHI instruction is issued before the MDU operation finishes, the instruction
stalls until the result is available.
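The benefit of placing independent integer instructions between an MDU operation and the MFLO or
MFHI that reads its result can be approximated with a very simple model. This sketch assumes one
integer instruction issues per cycle and that the MDU result becomes readable a fixed number of cycles
after issue; the function name and the latency values used with it are hypothetical and are not taken
from this manual.

```python
def mfxx_stall_cycles(mdu_latency, independent_instructions):
    """Estimate the cycles an MFLO/MFHI waits for the MDU result.

    mdu_latency: assumed cycles from MDU issue until HI/LO can be read.
    independent_instructions: integer instructions placed between the MDU
        operation and the MFLO/MFHI, each assumed to take one cycle.
    """
    if mdu_latency < 0 or independent_instructions < 0:
        raise ValueError("cycle counts must be non-negative")
    # Every independent instruction hides one cycle of MDU latency.
    return max(0, mdu_latency - independent_instructions)
```

Under these assumptions, an operation with a 10-cycle latency followed immediately by MFLO waits 10
cycles, while scheduling 4 independent instructions in between reduces the wait to 6.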
MDU Pipeline
The 4Kc processor core contains an autonomous multiply/divide unit (MDU) with a separate pipeline for
multiply and divide operations. This pipeline operates in parallel with the integer unit (IU) pipeline and does
not stall when the IU pipeline stalls. This allows long-running MDU operations, such as a divide, to be
partially masked by system stalls and/or other integer unit instructions.
The MDU consists of a 32x16 Booth-encoded multiplier, result/accumulation registers (HI and LO), a
divide state machine, and all necessary multiplexers and control logic. The first number shown (‘32’ of
32x16) represents the rs operand. The second number (‘16’ of 32x16) represents the rt operand. The core
checks only the rt operand value to determine how many times the operation must pass through the
multiplier. The 16x16 and 32x16 operations pass through the multiplier once. A 32x32 operation passes
through the multiplier twice.
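The pass count can be sketched as a function of the rt operand alone. The exact test the hardware applies
is not stated here; this sketch assumes that rt counts as a 16-bit operand when it equals the sign
extension of its own low 16 bits. The function name multiplier_passes is hypothetical.

```python
def multiplier_passes(rt):
    """Return the assumed number of passes through the 32x16 multiplier.

    Assumption: rt is treated as 16 bits wide when bits 31..15 are all
    zeros or all ones, that is, when rt is a sign-extended 16-bit value.
    The rs operand does not affect the result.
    """
    rt &= 0xFFFFFFFF
    upper17 = rt >> 15  # bits 31..15 of rt
    fits_in_16_bits = upper17 == 0 or upper17 == 0x1FFFF
    return 1 if fits_in_16_bits else 2
```

With this assumption, a small constant such as 0x7FFF or a sign-extended negative value such as
0xFFFF8000 needs one pass, while 0x8000 or 0x12345678 needs two.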
[Figure 2.5 timing diagram. Pipeline stages shown: E, M, A, A, A, A, W. Labels in the diagram: RegR,
B-ASel, ALU1, D-TLB, D-Cache, Bus*, Align, RegW.]
* Contains all of the time that address and data are utilizing the bus.