ColdFire CF4e Core User’s Manual
For More Information On This Product,
Go to: www.freescale.com
Operand Execution Pipeline (OEP)
Using the nomenclature from the superscalar MC68060, two sequential instructions are
loaded into the OEP when an opcode is requested from the IFP. The OEP is implemented
as the primary OEP (pOEP) plus the DS stage for the secondary OEP (sOEP). The sOEP
instruction dispatch criteria are evaluated in the DS pipeline stage; if successful, the
secondary instruction is issued to the OAG stage. V4 sOEP instructions are restricted to 16
bits with no operand memory references and are executed in the ExComputeEngine.
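The two sOEP restrictions stated above (a 16-bit encoding and no operand memory reference) can be sketched as a simple predicate. This is only an illustrative model: the function name and its parameters are hypothetical, and the real DS-stage dispatch criteria also depend on the pairing rules described below.

```python
def soep_candidate(length_bits: int, has_memory_operand: bool) -> bool:
    """Necessary (not sufficient) sOEP conditions taken from the text:
    the instruction is 16 bits long and makes no operand memory reference."""
    return length_bits == 16 and not has_memory_operand


# Illustrative inputs (encoding lengths are the standard ColdFire sizes):
movq_ok = soep_candidate(16, False)   # movq.l #1,d0   : one word, registers only
mov_imm = soep_candidate(48, False)   # mov.l #imm32,d0: opcode word + 32-bit immediate
add_mem = soep_candidate(16, True)    # add.l (a0),d0  : one word, but reads memory
```

Only the first of the three examples passes both checks.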
V4 instruction pairs are grouped into the following three broad categories. By supporting
superscalar dispatch for heavily used instruction constructs, the design approaches the
performance of a full dual-pipeline OEP at a much lower silicon cost.
Group 1: Zero-cycle loads
pOEP inst = mov.l {Ry | <mem>y},Rx
sOEP inst = op.l {Rw | #qimm},Rx
(#qimm represents a 3-bit quick immediate operand)
For this pair, a MOVE shares its destination register with the secondary instruction, so the
full capabilities of the ExComputeEngine’s three-terminal structure can be exploited. The
executeResult is a function of (operand1, operand2). The combined instructions are issued
into the OAG stage as op.l {Ry | <mem>y},{Rw | #qimm},Rx to be executed by the OEP
EX stage.
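The folding described for Group 1 can be checked with a small behavioral model. The sketch below is not the hardware's implementation: the helper names, the choice of operations in OPS, and the register-file dictionary are all hypothetical. It assumes that the moved value acts as the first operand and {Rw | #qimm} as the second, and that Rw is a different register from Rx. Memory sources are left out for brevity.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical stand-ins for the generic "op.l" of the text.
OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "and": lambda a, b: a & b,
}

def value_of(regs, operand):
    """A register name (str) reads the register file; an int is an immediate."""
    return regs[operand] if isinstance(operand, str) else operand & MASK32

def run_sequential(regs, src, op, second, rx):
    """mov.l src,Rx followed by op.l second,Rx, one instruction at a time."""
    r = dict(regs)
    r[rx] = value_of(r, src)                      # pOEP: mov.l src,Rx
    r[rx] = OPS[op](r[rx], value_of(r, second))   # sOEP: op.l second,Rx
    return r

def run_folded(regs, src, op, second, rx):
    """Single three-operand form: op.l src,second,Rx."""
    r = dict(regs)
    r[rx] = OPS[op](value_of(regs, src), value_of(regs, second))
    return r

regs = {"d0": 5, "d1": 7, "d2": 100}
same_add = run_sequential(regs, "d1", "add", 3, "d2") == run_folded(regs, "d1", "add", 3, "d2")
same_sub = run_sequential(regs, "d1", "sub", "d0", "d2") == run_folded(regs, "d1", "sub", "d0", "d2")
```

Under the stated assumptions both forms leave the register file in the same state, which is why the MOVE adds no execution cycle of its own in this model.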
Group 2: Zero-cycle stores
pOEP inst = store.* Ry,<mem>x
sOEP inst = mov.l Rw,Rz or,
            movq.l #imm,Rz
For this pair, a store operation (mov.{b,w,l} or clr.{b,w,l}) is combined with a simple
register load. The ExComputeEngine store unit executes the operand write. This function
is tied directly to the operand1_ex register, providing the required post-alignment
multiplexing logic. The sOEP instruction is issued to the barrel shifter (BSU), which performs
a passOperand2 operation. The ExComputeEngine processes both operations concurrently.
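As a rough behavioral sketch of the Group 2 pairing, the function below performs the store and the register load in one combined step. All names are hypothetical, memory is modeled as a plain address-to-value dictionary, and alignment and byte ordering are ignored. The only facts taken from the text are that the store data comes from operand1 and that the register load is a pass-through of operand2.

```python
MASK32 = 0xFFFFFFFF

def execute_store_pair(regs, mem, store_src, store_addr, load_src, rz, size=4):
    """Model of: store.* Ry,<mem>x  paired with  mov.l Rw,Rz or movq.l #imm,Rz.

    store_src is a register name, or None for clr.* (which writes zero).
    load_src is a register name (mov.l Rw,Rz) or an int (movq.l #imm,Rz).
    size is the store width in bytes: 1, 2, or 4.
    """
    # Both operands are read from the register state before the pair executes.
    operand1 = regs[store_src] if store_src is not None else 0
    operand2 = regs[load_src] if isinstance(load_src, str) else load_src & MASK32

    new_regs, new_mem = dict(regs), dict(mem)
    new_mem[store_addr] = operand1 & ((1 << (8 * size)) - 1)  # store path: operand write
    new_regs[rz] = operand2                                   # passOperand2 into Rz
    return new_regs, new_mem

regs = {"d0": 0x12345678, "d1": 9}
# mov.w d0,<mem> paired with mov.l d1,d2
new_regs, new_mem = execute_store_pair(regs, {}, "d0", 0x1000, "d1", "d2", size=2)
```

Because both operands are sampled before either result is written, the combined step matches executing the store first and the register load second.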
Group 3: Zero-cycle address results
pOEP inst = lea <ea>y,Rx or,
            mov.l #imm,Rx or,
            movq.l #imm,Rx or,
            clr.l Rx or,
            mov3q.l #qimm,Rx
sOEP inst = op.l {Rw | #qimm},Rz or,
            mov.l Rw,Rz or,
            movq.l #imm,Rz or,
            cmp.l Rw,Rz
This pair combines a pOEP instruction executed by the OagComputeEngine (the
AddressResult) with a sOEP instruction executed in the ExComputeEngine.
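The three pair categories can be summarized as a small classifier over instruction forms. This is a sketch for illustration only: the Inst record, the operand-kind strings, and the classify function are hypothetical, "op.l" is kept as the text's generic placeholder mnemonic, and only the instruction forms listed for each group are checked. Any further dispatch conditions the hardware applies are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Inst:
    mnemonic: str                    # e.g. "mov.l", "lea", or the generic "op.l"
    src: str                         # "reg", "mem", "imm", "qimm", "ea" or "none"
    dst: str                         # "reg" or "mem"
    dst_reg: Optional[str] = None    # destination register name when dst == "reg"

def _reg_load(i: Inst) -> bool:
    """mov.l Rw,Rz or movq.l #imm,Rz"""
    return i.dst == "reg" and (i.mnemonic, i.src) in {("mov.l", "reg"), ("movq.l", "imm")}

def _op(i: Inst) -> bool:
    """op.l {Rw | #qimm},R"""
    return i.dst == "reg" and i.mnemonic == "op.l" and i.src in ("reg", "qimm")

def classify(p: Inst, s: Inst) -> Optional[int]:
    """Return 1, 2 or 3 for a (pOEP, sOEP) pair matching a listed form, else None."""
    # Group 1: mov.l {Ry | <mem>y},Rx with op.l {Rw | #qimm},Rx (shared Rx)
    if (p.mnemonic == "mov.l" and p.src in ("reg", "mem") and p.dst == "reg"
            and _op(s) and p.dst_reg == s.dst_reg):
        return 1
    # Group 2: store.* Ry,<mem>x (mov.{b,w,l} or clr.{b,w,l}) with a register load
    if p.dst == "mem" and p.mnemonic.split(".")[0] in ("mov", "clr") and _reg_load(s):
        return 2
    # Group 3: address-result pOEP forms with an ExComputeEngine sOEP form
    g3_poep = {("lea", "ea"), ("mov.l", "imm"), ("movq.l", "imm"),
               ("clr.l", "none"), ("mov3q.l", "qimm")}
    g3_soep = _op(s) or _reg_load(s) or (
        s.dst == "reg" and s.mnemonic == "cmp.l" and s.src == "reg")
    if p.dst == "reg" and (p.mnemonic, p.src) in g3_poep and g3_soep:
        return 3
    return None

g1 = classify(Inst("mov.l", "mem", "reg", "d0"), Inst("op.l", "qimm", "reg", "d0"))
g2 = classify(Inst("clr.w", "none", "mem"), Inst("movq.l", "imm", "reg", "d3"))
g3 = classify(Inst("lea", "ea", "reg", "a0"), Inst("cmp.l", "reg", "reg", "d2"))
```

A pair that matches none of the listed forms returns None, which in this model corresponds to the secondary instruction not being dispatched alongside the primary one.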
Measurements show that folding instructions to create zero-cycle moves improves overall
processor performance by 10% on compiled code. Combining this with improvements
Freescale Semiconductor, Inc.