RM5271 Microprocessor, Document Rev. 1.3
Quantum Effect Devices
www.qedinc.com
result go directly to the integer register file rather than the
Lo register. The portion of the multiply that would normally
have gone into the Hi register is discarded. For applications
where it is known that the high half of the multiply
result is not required, using the MUL instruction eliminates
the necessity of executing an explicit MFLO instruction.
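The difference between the two sequences can be sketched in C. This is only an illustrative model of the arithmetic, assuming 32-bit signed operands; the names `hilo_t`, `model_mult` and `model_mul` are hypothetical and are not part of any RM5271 software interface.

```c
#include <stdint.h>

/* Hypothetical model of the Hi/Lo register pair (names are illustrative). */
typedef struct {
    uint32_t hi;   /* upper 32 bits of the product */
    uint32_t lo;   /* lower 32 bits of the product */
} hilo_t;

/* Models a conventional multiply: the full 64-bit signed product lands in
 * Hi/Lo, and a separate MFLO would be needed to fetch the low half. */
hilo_t model_mult(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    hilo_t r;
    r.hi = (uint32_t)((uint64_t)product >> 32);
    r.lo = (uint32_t)product;
    return r;
}

/* Models MUL as described above: only the low 32 bits are kept and go
 * straight to the destination register; the high half is discarded. */
uint32_t model_mul(int32_t a, int32_t b)
{
    return (uint32_t)((int64_t)a * (int64_t)b);
}
```

In this model, `model_mul(a, b)` always equals `model_mult(a, b).lo`, which is why the explicit MFLO step becomes unnecessary when the high half is not needed.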
The multiply-add instruction (MAD) multiplies two operands
and adds the resulting product to the current contents
of the Hi and Lo registers. The multiply-accumulate operation
is the core primitive of almost all signal processing
algorithms, allowing the RM5271 to eliminate the need for a
separate DSP engine in many embedded applications.
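As a rough illustration of why multiply-accumulate matters for signal processing, the sketch below models one MAD step in C and uses it for a dot product, the inner loop of a typical FIR filter. It assumes 32-bit signed operands and treats the Hi/Lo pair as a single 64-bit accumulator that does not overflow; `model_mad` and `dot_product` are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one MAD step: the Hi/Lo pair, viewed as one 64-bit
 * accumulator, gains the signed product of the two 32-bit operands.
 * Assumes the running sum stays within the 64-bit range. */
int64_t model_mad(int64_t hilo, int32_t a, int32_t b)
{
    return hilo + (int64_t)a * (int64_t)b;
}

/* Dot product of n coefficients and n samples, one modeled MAD per tap. */
int64_t dot_product(const int32_t *coef, const int32_t *x, size_t n)
{
    int64_t acc = 0;                  /* Hi/Lo cleared before the loop */
    for (size_t i = 0; i < n; i++) {
        acc = model_mad(acc, coef[i], x[i]);
    }
    return acc;
}
```

With coefficients {1, 2, 3} and samples {4, 5, 6}, the accumulator ends at 1*4 + 2*5 + 3*6 = 32.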
Floating-Point Co-Processor
The RM5271 incorporates a high-performance, fully pipelined
floating-point coprocessor which includes a floating-point
register file and autonomous execution units for
multiply/add/convert and divide/square root. The floating-point
coprocessor is a tightly coupled execution unit, decoding
and executing instructions in parallel with, and in the case
of floating-point loads and stores in cooperation with, the
integer unit. The superscalar capabilities of the RM5271
allow floating-point computation instructions to be issued
concurrently with integer instructions.
Floating-Point Unit
The RM5271 floating-point execution unit supports single
and double precision arithmetic, as specified in IEEE
Standard 754. The execution unit is broken into a separate
divide/square root unit and a pipelined multiply/add unit.
Overlap of divide/square root and multiply/add instructions
is supported.
The RM5271 maintains fully precise floating-point exceptions
while allowing both overlapped and pipelined operations.
Precise exceptions are extremely important in object-oriented
programming environments and highly desirable
for debugging in any environment.
Floating-point operations include:
add
subtract
multiply
divide
square root
reciprocal
reciprocal square root
conditional moves
conversion between fixed-point and floating-point format
conversion between floating-point formats
floating-point compare
Table 2 gives the latencies of the floating-point instructions
in internal processor cycles.
Floating-Point General Register File
The floating-point general register file (FGR) is made up of
thirty-two 64-bit registers. With the floating-point load
double (LDC1) and store double (SDC1) instructions, the
floating-point unit can take advantage of the 64-bit wide data
cache and issue a floating-point coprocessor load or store
doubleword instruction in every cycle.
The floating-point control register space contains two registers:
one for determining configuration and revision information
for the coprocessor and one for control and status
information. These are primarily used for diagnostic software,
exception handling, state saving and restoring, and
control of rounding modes.
To support superscalar operation, the FGR has four read ports
and two write ports, and is fully bypassed to minimize
operation latency in the pipeline. Three of the read ports
and one write port are used to support the combined
multiply-add instruction, while the fourth read port and
second write port allow a concurrent floating-point load or store.
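The port allocation described above can be checked with simple bookkeeping. The sketch below is a hypothetical accounting model, not a description of the actual issue logic: it assumes a multiply-add uses three reads and one write, a floating-point store uses one read, and a floating-point load uses one write. All names are invented for this example.

```c
#include <stdbool.h>

/* Port counts taken from the text: four read ports, two write ports. */
enum { FGR_READ_PORTS = 4, FGR_WRITE_PORTS = 2 };

/* Hypothetical per-instruction port usage record. */
typedef struct {
    int reads;
    int writes;
} port_use_t;

/* Assumed usage: multiply-add reads three operands and writes one result;
 * a store reads one register; a load writes one register. */
const port_use_t PORTS_FMADD    = { 3, 1 };
const port_use_t PORTS_FP_STORE = { 1, 0 };
const port_use_t PORTS_FP_LOAD  = { 0, 1 };

/* Returns true when two instructions issued together fit within the
 * register file's read and write ports in this simplified model. */
bool fits_port_budget(port_use_t a, port_use_t b)
{
    return a.reads + b.reads <= FGR_READ_PORTS &&
           a.writes + b.writes <= FGR_WRITE_PORTS;
}
```

Under these assumptions a multiply-add paired with either a load or a store stays within budget, which matches the concurrency the text describes.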
Table 2: Floating-Point Instruction Cycles

Operation     Latency     Repeat Rate
fadd          4           1
fsub          4           1
fmult         4/5         1/2
fmadd         4/5         1/2
fmsub         4/5         1/2
fdiv          21/36       19/34
fsqrt         21/36       19/34
frecip        21/36       19/34
frsqrt        38/68       36/66
fcvt.s.d      4           1
fcvt.s.w      6           3
fcvt.s.l      6           3
fcvt.d.s      4           1
fcvt.d.w      4           1
fcvt.d.l      4           1
fcvt.w.s      4           1
fcvt.w.d      4           1
fcvt.l.s      4           1
fcvt.l.d      4           1
fcmp          1           1
fmov          1           1
fmovc         1           1
fabs          1           1
fneg          1           1
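The latency and repeat rate figures in Table 2 can be combined to estimate how long a sequence of operations takes. The sketch below assumes the usual meaning of the two terms (latency is the number of cycles from issue until the result is available; repeat rate is the number of cycles between successive issues of the same operation) and ignores all other stalls. The function names are hypothetical.

```c
#include <assert.h>

/* Cycles until the last of n independent, identical operations completes:
 * the first result appears after `latency` cycles, and each later one
 * follows `repeat` cycles after its predecessor. */
unsigned pipelined_cycles(unsigned n, unsigned latency, unsigned repeat)
{
    assert(n >= 1);
    return latency + (n - 1) * repeat;
}

/* Cycles for a chain of n operations in which each one consumes the
 * result of the previous one, so no overlap is possible. */
unsigned dependent_cycles(unsigned n, unsigned latency)
{
    return n * latency;
}
```

For example, with the fadd figures (latency 4, repeat rate 1), eight independent adds finish in 4 + 7 * 1 = 11 cycles under this model, while eight dependent adds take 8 * 4 = 32 cycles. With the first pair of fdiv figures (21 and 19), two independent divides take 21 + 19 = 40 cycles.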