
RISC Microprocessor Division
To understand the flow within a superscalar architecture, one cannot ignore instruction-specific details.
For example, consider Figure 1, in which load C would ordinarily bypass stores A and B. However, if the data address of C can potentially collide (alias) with the data address of A or B, then C will stall in the LSU EA slot until the aliasing store passes out of the LSU store queue.
Alias checking may take place before address translation. Because only the lower 12 bits of an address (the offset within a 4-Kbyte page) remain unchanged by translation, these are the only bits that can be checked. In addition, the addresses are checked with word granularity (four bytes, mask = 0xffc) if both the load and the store are four bytes or smaller, or with double-word granularity (eight bytes, mask = 0xff8) otherwise. For instance, 0x2000 and 0x3003 would alias each other, but 0x2000 and 0x2020 would not.
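A minimal Python sketch of the masking rule follows, assuming 4-Kbyte pages so that the low 12 bits are the untranslated page offset. The function `may_alias` and its parameters are hypothetical names chosen for illustration; this is a software model of the rule stated above, not the LSU circuitry itself.

```python
WORD_MASK = 0xFFC        # page-offset bits 11..2: word (4-byte) granularity
DOUBLEWORD_MASK = 0xFF8  # page-offset bits 11..3: double-word (8-byte) granularity


def may_alias(load_ea: int, load_size: int, store_ea: int, store_size: int) -> bool:
    """Hypothetical model: True if the load must be treated as aliasing the store.

    Only the low 12 bits of each effective address are compared (both masks
    lie within 0xFFF), since only those bits survive translation with 4-Kbyte pages.
    """
    if load_size <= 4 and store_size <= 4:
        mask = WORD_MASK
    else:
        mask = DOUBLEWORD_MASK
    return (load_ea & mask) == (store_ea & mask)


# The two examples from the text, assuming word-sized accesses.
print(may_alias(0x2000, 4, 0x3003, 4))  # True
print(may_alias(0x2000, 4, 0x2020, 4))  # False
# An 8-byte access switches the check to double-word granularity.
print(may_alias(0x2004, 4, 0x2000, 8))  # True
```

The last call shows the effect of granularity: with two 4-byte accesses, 0x2004 and 0x2000 fall in different words and would not be flagged.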
Note that an alias stall can occur even if the load and store do not actually access the same location, because only the lower 12 bits of the addresses are compared, and only at word or double-word granularity.
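To see why a stall need not indicate a real conflict, this hypothetical Python sketch compares the masked check against the actual byte ranges and labels each flagged pair. It compares effective addresses directly, so it assumes 4-Kbyte pages and that distinct effective addresses do not translate to the same physical location; the names `classify`, `granule_mask`, and `bytes_overlap` are illustrative.

```python
PAGE_OFFSET_MASK = 0xFFF  # low 12 bits, assumed unchanged by translation


def granule_mask(load_size: int, store_size: int) -> int:
    # Word granularity if both accesses are at most 4 bytes, else double-word.
    return 0xFFC if load_size <= 4 and store_size <= 4 else 0xFF8


def bytes_overlap(a: int, a_size: int, b: int, b_size: int) -> bool:
    # True if byte ranges [a, a + a_size) and [b, b + b_size) intersect.
    return a < b + b_size and b < a + a_size


def classify(load_ea: int, load_size: int, store_ea: int, store_size: int) -> str:
    mask = granule_mask(load_size, store_size)
    if (load_ea & mask) != (store_ea & mask):
        return "no stall"
    if bytes_overlap(load_ea, load_size, store_ea, store_size):
        return "true alias"
    if (load_ea & ~PAGE_OFFSET_MASK) != (store_ea & ~PAGE_OFFSET_MASK):
        # Masked offsets match, but the upper (page) bits, never compared, differ.
        return "false alias: different page"
    # Same page and same word or double word, but the bytes do not overlap.
    return "false alias: granularity"


print(classify(0x2000, 4, 0x3003, 4))  # false alias: different page
print(classify(0x2000, 1, 0x2003, 1))  # false alias: granularity
print(classify(0x2000, 4, 0x2000, 4))  # true alias
print(classify(0x2000, 4, 0x2020, 4))  # no stall
```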
In a superscalar architecture, other stalls may occur because of timing considerations. For example, if a load that aliases a store has spent only one cycle in the LSU EA stage, the LSU circuitry is not fast enough to prevent the load from bypassing the store and accessing the data cache. Since this aliased load must not access the cache before the store, the LSU must cancel the load in the subsequent cycle. Figures 2-5 depict this situation.
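The timing rule can be modeled as a two-way decision. This Python sketch is hypothetical: `aliased_load_action` and its `cycles_in_ea` parameter are illustrative names, and the model reduces the LSU's behavior to the two outcomes the text describes.

```python
def aliased_load_action(cycles_in_ea: int) -> str:
    """Hypothetical model of how an aliased load is handled.

    cycles_in_ea: cycles the load has spent in the LSU EA stage when the
    alias is detected (assumed to be at least 1).
    """
    if cycles_in_ea < 1:
        raise ValueError("cycles_in_ea must be at least 1")
    if cycles_in_ea > 1:
        # The check resolves in time: hold the load in EA until the
        # aliasing store has left the store queue.
        return "hold in EA"
    # Only one cycle in EA: the load has already reached the data cache,
    # so it must be canceled in the next cycle.
    return "cancel next cycle"


for cycles in (1, 2, 3):
    print(cycles, aliased_load_action(cycles))
```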
In Figure 2, load B and store A have aliasing addresses. If B has been in the LSU EA stage for more than one cycle (because of some other stall), there is time to prevent it from accessing the data cache, and in the next cycle A accesses the data cache. However, if B has been in the LSU EA stage for only one cycle, the alias check comes too late to prevent the cache access shown in Figure 3, and A is stalled and cannot access the cache.
In the next cycle (Figure 4), the load is canceled, and in Figure 5 the store propagates to the data cache. Note that in this example the store also misses in the cache and blocks the load from accessing the data cache in the following cycle.