
Chapter 3 Assembly Language Programming
plication, and one for division/square root. The functional units used by the pipelines
all operate separately. This enables multiple floating–point instructions to be in
execution at the same time. Additionally, floating–point operations can commence in
parallel with operations carried out by the processor’s integer pipeline. The operation
of some of the pipeline functional units can be multicycle, and contention for
resources can result if simultaneous floating–point operations are being performed.
However, all floating–point operations are fully interlocked, and operations requiring
the result of a previous functional unit operation are prevented from proceeding
until that result is available. The programmer never has to become involved in the
pipeline stage details to ensure the success of an operation.
To sustain efficient use of the floating–point pipelines, four floating–point
accumulator registers are provided. The programmer must multiplex their use during
heavily pipelined code sequences to reduce resource contention. The Am29050
processor can issue a new floating–point instruction every cycle, but many of the
operations have multicycle latency. Thus, to avoid pipeline stalls, a result should not
be used until a sufficient number of delay cycles has passed (see the Am29050 processor
User’s Manual). The processor has an additional 64–bit write port on the general purpose
register file for use by the floating–point unit. This enables floating–point
results to be written back at the same time as integer pipeline results.
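The scheduling advice above can be illustrated with a short sketch. It assumes that FMUL and FADD are executed directly by the Am29050. The register choices and the number of independent instructions placed between the producer and the consumer are illustrative only; the actual latencies are given in the Am29050 processor User’s Manual.

```
        fmul    lr4,lr2,lr3     ; start a single-precision multiply (multicycle)
        add     lr5,lr5,1       ; independent integer work; overlaps the multiply
        sll     lr6,lr6,2       ; more independent work; does not touch lr4
        fadd    lr7,lr7,lr4     ; first use of the product; interlocks if lr4 is not ready
```

Because the operations are fully interlocked, placing the FADD directly after the FMUL would still produce the correct result. It would only cost stall cycles.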
The floating–point accumulators can be accessed by the MTACC (move–to)
and MFACC (move–from) instructions, which are available to User mode code. Only
29K family members that directly support floating–point operations implement
these instructions.
3.4 DELAYED EFFECTS OF INSTRUCTIONS
Modification of some registers has a delayed effect on processor behavior.
When developing assembly code, care must be taken to prevent unexpected behavior.
The easiest of the delayed effects to remember is the one-cycle delay that must pass
before an indirect pointer can be used after it has been set. This occurs most often
with the register stack pointer, gr1. It cannot be used to access a local register in the
instruction that follows the instruction that writes to gr1. An instruction that does not
require gr1 (which also rules out every local register, since these are referenced via gr1)
can be placed immediately after the instruction that updates gr1.
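A minimal sketch of this rule follows, modeled on the usual 29K procedure prologue. The frame size of 16 bytes and the offset of 24 are arbitrary, V_SPILL is assumed to be a symbol for the spill trap vector defined elsewhere, and gr126 is assumed to hold the register allocate bound as in the standard calling convention.

```
        sub     gr1,gr1,16        ; lower the register stack pointer by four words
        asgeu   V_SPILL,gr1,gr126 ; uses gr1 only as a global operand: safe in the delay cycle
        add     lr1,gr1,24        ; writes a local register, addressed via the updated gr1
```

When no useful instruction that avoids the local registers is available, a nop can fill the cycle after the write to gr1 instead.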
Direct modification of the CPS register must also be done carefully, particularly
where the freeze (FZ) bit is cleared. When the processor is frozen, the special-purpose
registers are not updated during instruction execution. This means that the PC1
register does not reflect the actual program counter value at the current execution
address, but rather at the point where freeze mode was entered. When the processor is
unfrozen, either by an interrupt return or direct modification of the CPS, two cycles
are required before the PC1 buffer register reflects the new execution address. Unless
the CPS register is being modified directly, this creates no problem.
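The following sketch shows one way Supervisor mode code might leave freeze mode by writing the CPS directly and then wait before reading PC1. It is an illustration under assumptions, not a definitive sequence: FZ is assumed to be a symbol, defined elsewhere, for the mask of the freeze bit, the use of gr64 through gr66 as scratch registers is arbitrary, and the two nop instructions simply stand in for the two cycles mentioned above. The exact timing should be confirmed in the processor manual.

```
        mfsr    gr64,cps        ; read the current processor status
        const   gr65,FZ         ; FZ: assumed symbol for the freeze-bit mask
        andn    gr64,gr64,gr65  ; clear the freeze bit in the copy
        mtsr    cps,gr64        ; write it back: the processor is now unfrozen
        nop                     ; first delay cycle
        nop                     ; second delay cycle
        mfsr    gr66,pc1        ; PC1 is read only after the two delay cycles
```

Reading PC1 in either of the two slots occupied by the nop instructions could return a value that still belongs to the point where freeze mode was entered.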