Chapter 6. Instruction Pipeline and Timing
Operand Execution Pipeline (OEP)
The following example shows the basic operation of the CF4e OEP with the FPU. The code
is taken from a popular floating-point benchmark:
rInnerproduct (result, a, b, row, column)
float *result, a[rowsize+1][rowsize+1], b[rowsize+1][rowsize+1];
int row, column;
/* computes the inner product of A[row,*] and B[*,column] */
{
    int i;
    *result = 0.0;
    for (i = 1; i <= rowsize; i++)
        *result = *result + a[row][i] * b[i][column];
}
The inner for loop generates the following compiled code:
for_loop:
    fmov.s  (a5),fp0        ; fp0 = b[i][column]
    fmul.s  (a1)+,fp0       ; fp0 = a[row][i] * b[i][column]
    fadd.s  (a4),fp0        ; fp0 = result + a[][] * b[][]
    subq.l  #1,d7           ; decrement loop counter
    lea     (const,a5),a5   ; adjust pointer for b[i][column]
    fmov.s  fp0,(a4)        ; store result
    bpl.b   for_loop        ; if done, exit, else continue loop
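
The compiled loop can be read back as C, one statement per instruction, which may make the register roles easier to follow. This is only a sketch under stated assumptions: the function name `inner_product_as_compiled`, the value of `ROWSIZE`, and the index variables `ia` and `ib` (standing in for the pointers in a1 and a5) are illustrative and do not come from the benchmark. The initial counter value `ROWSIZE - 1` is inferred from the `bpl.b` test, since the listing does not show the loop setup.

```c
#include <assert.h>

#define ROWSIZE 4   /* assumed value; the benchmark defines its own rowsize */

/* Statement-by-statement C reading of the compiled loop (illustrative only). */
void inner_product_as_compiled(float *result,
                               float a[ROWSIZE + 1][ROWSIZE + 1],
                               float b[ROWSIZE + 1][ROWSIZE + 1],
                               int row, int column)
{
    int ia = 1;             /* stands in for a1: walks a[row][1..ROWSIZE]    */
    int ib = 1;             /* stands in for a5: walks b[1..ROWSIZE][column] */
    int d7 = ROWSIZE - 1;   /* assumed initial loop count (setup not shown)  */
    float fp0;

    assert(row >= 0 && row <= ROWSIZE);
    assert(column >= 0 && column <= ROWSIZE);

    *result = 0.0f;
    do {
        fp0 = b[ib][column];    /* fmov.s (a5),fp0                          */
        fp0 *= a[row][ia++];    /* fmul.s (a1)+,fp0                         */
        fp0 += *result;         /* fadd.s (a4),fp0                          */
        d7 -= 1;                /* subq.l #1,d7                             */
        ib += 1;                /* lea (const,a5),a5 : next row of b        */
        *result = fp0;          /* fmov.s fp0,(a4)                          */
    } while (d7 >= 0);          /* bpl.b for_loop                           */
}
```

Note that the running sum is reloaded from and stored back to memory through `result` on every iteration, exactly as the `fadd.s (a4),fp0` and `fmov.s fp0,(a4)` pair does in the listing.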
Because the OEP and FPU execute instructions concurrently, the visible execution time for
this loop (11 cycles) is less than the sum of the individual instruction latencies
(14 cycles), as Table 6-4 shows.
Figure 6-5 shows the OEP/FPU pipeline diagrams for the example in Table 6-4.
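Table 6-4. FPU Execution Example

| Instruction | Instruction Latency (CPU cycles) | Apparent Latency (CPU cycles) | Comments |
|-------------|----------------------------------|-------------------------------|----------|
| fmov.s      | 1                                | 1                             | FP load |
| fmul.s      | 4                                | 4                             | FP multiply |
| fadd.s      | 4                                | 4                             | FP add |
| subq.l      | 1                                | 0                             | Hidden in OEP |
| lea         | 1                                | 0                             | Hidden in OEP |
| fmov.s      | 2                                | 2                             | FP store |
| bpl.b       | 1                                | 0                             | Hidden via instruction folding |
| TOTAL       | 14                               | 11                            | |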
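
The totals in Table 6-4 can be checked with a few lines of C. The sketch below simply sums the two latency columns; the type `loop_step`, the array `fpu_loop`, and the function names are illustrative assumptions, and the cycle counts are copied from the table rather than derived from a pipeline model.

```c
#include <assert.h>
#include <stddef.h>

/* One row of Table 6-4 (names are illustrative, values are from the table). */
struct loop_step {
    const char *mnemonic;
    int latency;    /* instruction latency, CPU cycles */
    int apparent;   /* apparent latency, CPU cycles    */
};

const struct loop_step fpu_loop[] = {
    { "fmov.s", 1, 1 },   /* FP load                        */
    { "fmul.s", 4, 4 },   /* FP multiply                    */
    { "fadd.s", 4, 4 },   /* FP add                         */
    { "subq.l", 1, 0 },   /* hidden in OEP                  */
    { "lea",    1, 0 },   /* hidden in OEP                  */
    { "fmov.s", 2, 2 },   /* FP store                       */
    { "bpl.b",  1, 0 },   /* hidden via instruction folding */
};

#define FPU_LOOP_LEN (sizeof fpu_loop / sizeof fpu_loop[0])

/* Sum of the individual instruction latencies. */
int sum_latency(const struct loop_step *steps, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += steps[i].latency;
    return total;
}

/* Sum of the latencies that remain visible in the loop's execution time. */
int sum_apparent(const struct loop_step *steps, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += steps[i].apparent;
    return total;
}

/* Cycles hidden by concurrent execution and instruction folding. */
int hidden_cycles(const struct loop_step *steps, size_t n)
{
    int hidden = sum_latency(steps, n) - sum_apparent(steps, n);
    assert(hidden >= 0);
    return hidden;
}
```

With the table's values, the two sums come to 14 and 11 cycles, so 3 cycles per iteration are hidden: the `subq.l`, `lea`, and `bpl.b` instructions, each with an apparent latency of zero.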
Freescale Semiconductor, Inc.