IDT MIPS32 4Kc Processor Core
Pipeline Description
79RC32438 User Reference Manual
November 4, 2002
The MDU can execute a 16x16 or 32x16 multiply operation every clock cycle; 32x32 multiply
operations can be issued every other clock cycle. Interlocks stall the issue of back-to-back 32x32
multiply operations. Logic built into the MDU determines the multiply operand size automatically.
Divide operations use a simple 1-bit-per-clock iterative algorithm with early-in detection of sign
extension on the dividend (rs). Any attempt to issue a subsequent MDU instruction while a divide is
still active stalls the IU pipeline until the divide operation completes.
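The multiply issue rates described above can be illustrated with a small model. This is only a sketch under a simplifying assumption: the MDU is treated as busy for one cycle after a multiply with a 16-bit operand and for two cycles after a 32x32 multiply, and no other stall source is modeled. The function name multiply_issue_cycles and its argument are hypothetical and are not part of the 4Kc core documentation.

```python
def multiply_issue_cycles(operand_sizes):
    """Return the cycle on which each multiply in a back-to-back run issues.

    operand_sizes: iterable of 16 or 32, the operand-size class of each
    multiply (16 for 16x16 or 32x16, 32 for 32x32).
    Assumption: the only delay modeled is the MDU issue interlock.
    """
    issue_cycles = []
    next_free = 0  # earliest cycle on which the MDU accepts a multiply
    for size in operand_sizes:
        if size not in (16, 32):
            raise ValueError("operand size class must be 16 or 32")
        issue_cycles.append(next_free)
        # A 16x16 or 32x16 multiply frees the MDU for the next cycle;
        # a 32x32 multiply holds it for one extra cycle.
        next_free += 1 if size == 16 else 2
    return issue_cycles


print(multiply_issue_cycles([16, 16, 16]))
print(multiply_issue_cycles([32, 32, 32]))
```

In this model a run of 16-bit multiplies issues on consecutive cycles, while a run of 32x32 multiplies issues on every other cycle.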
Table 2.1 lists the latencies (number of cycles until a result is available) for multiply and divide
instructions. The latencies are listed in terms of pipeline clocks. In this table, “latency” refers to the
number of cycles necessary for the first instruction to produce the result needed by the second instruction.
In Table 2.1, a latency of one means that the first and second instructions can be issued back to back in
the code without the MDU causing any stalls in the IU pipeline. A latency of two means that if the
instructions are issued back to back, the IU pipeline is stalled for one cycle. A MUL operation is special
because it must stall the IU pipeline in order to maintain its register file write slot. Consequently, a
16x16 or 32x16 MUL operation always forces a one-cycle stall of the IU pipeline, and a 32x32 MUL operation
forces a two-cycle stall. If the integer instruction immediately following the MUL operation uses the
result of the MUL, an additional stall is forced on the IU pipeline.
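The relationship between latency and stall cycles described above can be written out as a short sketch. The function names and the gap parameter are hypothetical. The gap term rests on an assumption that goes beyond the text: each independent non-MDU instruction placed between the two instructions hides one cycle of latency. Stalls from other sources are not modeled.

```python
def stall_cycles(latency, gap=0):
    """IU pipeline stall cycles for a dependent instruction pair.

    latency: latency in pipeline clocks, as listed in Table 2.1.
    gap: number of independent, non-MDU instructions placed between the
         first and second instruction (assumption: each hides one cycle).
    """
    if latency < 1 or gap < 0:
        raise ValueError("latency must be >= 1 and gap must be >= 0")
    return max(0, latency - 1 - gap)


def mul_stall_cycles(operand_size, next_uses_result):
    """Total IU stall cycles caused by a MUL operation.

    operand_size: 16 (16x16 or 32x16) or 32 (32x32).
    next_uses_result: True if the instruction immediately following the
                      MUL reads its result.
    """
    forced = {16: 1, 32: 2}[operand_size]  # stalls to keep the write slot
    return forced + (1 if next_uses_result else 0)


print(stall_cycles(1))             # back to back, latency of one
print(stall_cycles(2))             # back to back, latency of two
print(mul_stall_cycles(32, True))  # 32x32 MUL, result used immediately
```

Under these assumptions, scheduling independent instructions after a long-latency operation such as a divide reduces the stall seen by the instruction that reads the result.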
Operand Size of       Instruction Sequence                                                         Latency
1st Instruction [1]   1st Instruction                        2nd Instruction                       Clocks
--------------------  -------------------------------------  ------------------------------------  -------
16 bit                MULT/MULTU, MADD/MADDU, or MSUB/MSUBU  MADD/MADDU, MSUB/MSUBU, or MFHI/MFLO  1
32 bit                MULT/MULTU, MADD/MADDU, or MSUB/MSUBU  MADD/MADDU, MSUB/MSUBU, or MFHI/MFLO  2
16 bit                MUL                                    Integer operation [2]                 2 [3]
32 bit                MUL                                    Integer operation [2]                 2 [3]
8 bit                 DIVU                                   MFHI/MFLO                             9
16 bit                DIVU                                   MFHI/MFLO                             17
24 bit                DIVU                                   MFHI/MFLO                             25
32 bit                DIVU                                   MFHI/MFLO                             33
8 bit                 DIV                                    MFHI/MFLO                             10 [4]
16 bit                DIV                                    MFHI/MFLO                             18 [4]
24 bit                DIV                                    MFHI/MFLO                             26 [4]
32 bit                DIV                                    MFHI/MFLO                             34 [4]
any                   MFHI/MFLO                              Integer operation [2]                 2
any                   MTHI/MTLO                              MADD/MADDU or MSUB/MSUBU              1

Table 2.1 4Kc Core Instruction Latencies

[1] For multiply operations, this is the rt operand. For divide operations, this is the rs operand.
[2] Integer operation refers to any integer instruction that uses the result of a previous MDU operation.
[3] This does not include the 1 or 2 IU pipeline stalls (16 bit or 32 bit) that a MUL operation causes
    regardless of the following instruction. These stalls do not add to the latency of 2.
[4] If both operands are positive, the Sign Adjust stage is bypassed. Latency is then the same as for DIVU.
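The divide rows of Table 2.1 follow a regular pattern: each DIVU latency equals the dividend size class plus one clock, and DIV adds one further clock for the Sign Adjust stage unless both operands are positive (note 4). The sketch below encodes that pattern. The function name divide_latency and its parameters are hypothetical, and the formula is only a restatement of the table entries, not a description of the MDU implementation.

```python
DIVIDEND_SIZES = (8, 16, 24, 32)  # size classes of the rs operand in Table 2.1


def divide_latency(dividend_bits, signed, both_positive=False):
    """Latency in pipeline clocks from a divide to a following MFHI/MFLO.

    dividend_bits: size class of the dividend (rs): 8, 16, 24, or 32.
    signed: True for DIV, False for DIVU.
    both_positive: for DIV only, True if both operands are positive,
                   in which case the Sign Adjust stage is bypassed.
    """
    if dividend_bits not in DIVIDEND_SIZES:
        raise ValueError("dividend size class must be 8, 16, 24, or 32")
    latency = dividend_bits + 1      # DIVU rows: 9, 17, 25, 33
    if signed and not both_positive:
        latency += 1                 # DIV rows: 10, 18, 26, 34
    return latency


for bits in DIVIDEND_SIZES:
    print(bits, divide_latency(bits, signed=False), divide_latency(bits, signed=True))
```

A compiler or hand scheduler could use such an estimate to decide how many independent instructions to place between a divide and the MFHI/MFLO that reads its result.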