Instruction Cycle Times
Copyright 2000 ARM Limited. All rights reserved.
ARM DDI 0165B
For example, the following sequence incurs a two-cycle interlock on the first ADD
instruction, but the second ADD does not incur any interlocks:
    LDRB    r0, [r1, #1]
    ADD     r2, r0, r3
    ADD     r4, r0, r5
The length of an interlock, such as two cycles, is counted in unwaited ARM9E-S
clock cycles. If a multi-cycle instruction separates a load instruction from the
instruction that uses the result of the load, no interlock applies. The following
example does not incur an interlock:
    LDRB    r0, [r1]
    MUL     r6, r7, r8
    ADD     r4, r0, r5
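The two load-use sequences above can be checked with a small timing model. The Python sketch below is not taken from the ARM documentation: the Instr record, the interlock_cycles function, and the two-cycle figure used for MUL are illustrative assumptions. It assumes that a loaded value becomes usable a fixed number of cycles after the load completes (two for LDRB, as stated in the text) and that an instruction waits until all of its source registers are usable.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical instruction record, used only for this sketch."""
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]
    cycles: int = 1            # cycles the instruction occupies when not stalled
    load_use_penalty: int = 0  # extra cycles before a loaded result is usable


def interlock_cycles(program: List[Instr]) -> List[int]:
    """Return the number of interlock (stall) cycles for each instruction."""
    ready_at: Dict[str, int] = {}  # cycle at which each register becomes usable
    cycle = 0                      # cycle at which the next instruction could start
    stalls = []
    for ins in program:
        # Wait until every source register is usable.
        start = max([cycle] + [ready_at.get(r, 0) for r in ins.srcs])
        stalls.append(start - cycle)
        cycle = start + ins.cycles
        if ins.dest is not None:
            ready_at[ins.dest] = cycle + ins.load_use_penalty
    return stalls


# LDRB followed directly by two ADDs that both read r0.
back_to_back = [
    Instr("LDRB", "r0", ("r1",), load_use_penalty=2),
    Instr("ADD", "r2", ("r0", "r3")),
    Instr("ADD", "r4", ("r0", "r5")),
]

# LDRB separated from its consumer by a multi-cycle MUL
# (two cycles is an assumption made for this sketch).
separated = [
    Instr("LDRB", "r0", ("r1",), load_use_penalty=2),
    Instr("MUL", "r6", ("r7", "r8"), cycles=2),
    Instr("ADD", "r4", ("r0", "r5")),
]

print(interlock_cycles(back_to_back))  # [0, 2, 0]
print(interlock_cycles(separated))     # [0, 0, 0]
```

In this model the first ADD absorbs the whole two-cycle wait, so r0 is already usable by the time the second ADD starts, which matches the behavior described for the first sequence.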
There is no forwarding path from loaded data to the C read port of the register bank,
which is used for the store data of STR and STM instructions and for the accumulate
operand of multiply accumulate instructions. The result of a load must reach the Write
stage of the pipeline before it can be made available at the C read port, resulting in a
single-cycle load-use interlock from loaded data to the C read port.
The following example incurs a single-cycle interlock:
    LDR     r0, [r1]
    STR     r0, [r2]
The following example also incurs a single-cycle interlock:
    LDR     r0, [r1]
    MLA     r2, r3, r4, r0
The following example does not incur an interlock:
    LDR     r0, [r1]
    NOP                     ; code to be changed to remove NOP
    STR     r0, [r2]
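The three C read port examples above reduce to a short piece of arithmetic. The following Python sketch is illustrative only: the function names and the operand-list representation are assumptions, it handles only the STR store data and the MLA accumulate operand, and it does not model any other interlock.

```python
from typing import Optional, Sequence

# Single-cycle interlock from loaded data to the C read port, as stated in the text.
C_PORT_PENALTY = 1


def c_port_register(mnemonic: str, operands: Sequence[str]) -> Optional[str]:
    """Return the register read through the C read port, or None.

    Only STR (store data) and MLA (accumulate operand) are handled here.
    """
    if mnemonic == "STR":
        return operands[0]  # STR Rd, [Rn]: Rd is the store data
    if mnemonic == "MLA":
        return operands[3]  # MLA Rd, Rm, Rs, Rn: Rn is the accumulate operand
    return None


def c_port_interlock(loaded_reg: str, gap_cycles: int,
                     mnemonic: str, operands: Sequence[str]) -> int:
    """Interlock cycles caused by the C read port rule alone.

    gap_cycles is the number of cycles taken by instructions placed
    between the load and the consuming instruction.
    """
    if c_port_register(mnemonic, operands) != loaded_reg:
        return 0  # the loaded register is not read through the C read port
    return max(0, C_PORT_PENALTY - gap_cycles)


print(c_port_interlock("r0", 0, "STR", ["r0", "r2"]))              # 1
print(c_port_interlock("r0", 0, "MLA", ["r2", "r3", "r4", "r0"]))  # 1
print(c_port_interlock("r0", 1, "STR", ["r0", "r2"]))              # 0
```

The third call corresponds to the sequence with one single-cycle instruction between the LDR and the STR.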
Most interlock conditions are determined when the instruction being interlocked is still
in the Decode stage of the pipeline. Load multiple and Store multiple instructions can
incur a Decode stage interlock when the base register is not available due to a previous
instruction. Store multiple instructions can also incur an Execute stage interlock when
the first register to be stored is not available due to a previous instruction. This
Execute stage interlock is referred to as a second-cycle interlock.
The following example incurs a single-cycle interlock, because the base register r0 of
the STMIA is loaded by the preceding LDR:
    LDR     r0, [r1]
    STMIA   r0, {r1-r2}
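The distinction between the two interlock stages for load and store multiple instructions can be expressed as a simple classification. The Python sketch below is an illustration, not ARM's definition: the function name and arguments are hypothetical, registers are written as "r0" to "r15", and it assumes that the first register to be stored is the lowest-numbered register in the list. It reports only which stage would detect an interlock, not how many cycles it lasts.

```python
from typing import List, Sequence


def multiple_interlock_stages(pending: str, base: str,
                              reg_list: Sequence[str],
                              is_store: bool) -> List[str]:
    """Return the pipeline stages in which an interlock would be detected.

    pending  -- register not yet available from a previous instruction
    base     -- base register of the LDM or STM
    reg_list -- registers in the transfer list, for example ["r1", "r2"]
    is_store -- True for STM, False for LDM
    """
    stages = []
    if pending == base:
        stages.append("Decode")  # base register not available
    if is_store and reg_list:
        # Assumption: the lowest-numbered register is stored first.
        first_stored = min(reg_list, key=lambda r: int(r[1:]))
        if pending == first_stored:
            stages.append("Execute")  # the second-cycle interlock
    return stages


# LDR r0, [r1] followed by STMIA r0, {r1-r2}: r0 is the base register.
print(multiple_interlock_stages("r0", "r0", ["r1", "r2"], is_store=True))  # ['Decode']

# Hypothetical case: r1 is pending and is the first register to be stored.
print(multiple_interlock_stages("r1", "r0", ["r1", "r2"], is_store=True))  # ['Execute']
```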