TM1100 Preliminary Data Book
Philips Semiconductors
4-6
PRELIMINARY INFORMATION
File: cstm.fm5, modified 7/26/99
computations (one group per pixel) do not depend on
each other.
After some experience is gained with custom operations,
it is not necessary to unroll loops to discover situations
where custom operations are useful. Often, a good
programmer who knows what the custom operations do
can spot opportunities to exploit them by simple
inspection.
To understand how quadavg and dspuquadaddui can be
used in this code, we examine the function of these
custom operations.
The quadavg custom operation performs pixel averaging
on four pairs of pixels in parallel. Formally, the operation
of quadavg is as follows:
quadavg rsrc1 rsrc2 -> rdest
takes arguments in registers rsrc1 and rsrc2 and
computes a result into register rdest. Let rsrc1 = [abcd],
rsrc2 = [wxyz], and rdest = [pqrs], where a, b, c, d, w, x,
y, z, p, q, r, and s are all unsigned eight-bit values. Then
quadavg computes the output vector [pqrs] as follows:
p = (a + w + 1) >> 1
q = (b + x + 1) >> 1
r = (c + y + 1) >> 1
s = (d + z + 1) >> 1
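The four equations above can be checked off-target with a small portable emulation. This is only a sketch: the function name quadavg_emul is hypothetical, and the placement of a and w in the most significant byte is an assumption made for illustration. Because each byte lane is processed independently, the result does not depend on that choice.

```c
#include <stdint.h>

/* Emulates quadavg on two packed words, rsrc1 = [abcd] and rsrc2 = [wxyz].
   Assumption: a and w occupy the most significant byte of each word. */
uint32_t quadavg_emul(uint32_t rsrc1, uint32_t rsrc2)
{
    uint32_t rdest = 0;
    int shift;

    for (shift = 24; shift >= 0; shift -= 8) {
        uint32_t a = (rsrc1 >> shift) & 0xFFu;   /* unsigned byte, 0..255 */
        uint32_t w = (rsrc2 >> shift) & 0xFFu;   /* unsigned byte, 0..255 */

        /* Rounded average; (255 + 255 + 1) >> 1 = 255, so it fits in a byte. */
        rdest |= ((a + w + 1u) >> 1) << shift;
    }
    return rdest;
}
```

For example, quadavg_emul(0x00FF0A01, 0x00FF0B02) yields 0x00FF0B02: the lanes average to 0, 255, 11, and 2.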
The pixel averaging in Figure 4-5 is evident in the first
statement of each of the four groups of statements. The
rest of the code (adding the idct[i] value and performing
the saturation test) can be performed by the
dspuquadaddui operation. Formally, its function is as
follows:
dspuquadaddui rsrc1 rsrc2 -> rdest
takes arguments in registers rsrc1 and rsrc2 and
computes a result into register rdest. Let rsrc1 = [efgh],
rsrc2 = [stuv], and rdest = [ijkl], where e, f, g, h, i, j, k,
and l are unsigned eight-bit values and s, t, u, and v are
signed eight-bit values. Then dspuquadaddui computes
the output vector [ijkl] as follows:
i = uclipi(e + s, 255)
j = uclipi(f + t, 255)
k = uclipi(g + u, 255)
l = uclipi(h + v, 255)
The uclipi operation is defined in this case as it is for the
separate TM1100 operation of the same name. Its
definition is as follows:
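int uclipi (int m, int n)
{
    if (m < 0) return 0;        /* clip negative values to 0 */
    else if (m > n) return n;   /* clip values above n to n */
    else return m;              /* already within 0..n */
}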
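As with quadavg, the behavior of dspuquadaddui described above can be sketched as a portable emulation for testing off-target. The names dspuquadaddui_emul and clip_unsigned are hypothetical, and the byte order within the word is again an assumption that does not affect the lane-by-lane result.

```c
#include <stdint.h>

/* Clips m into the range 0..n, following the uclipi definition above. */
static int clip_unsigned(int m, int n)
{
    if (m < 0) return 0;
    else if (m > n) return n;
    else return m;
}

/* Emulates dspuquadaddui: rsrc1 holds four unsigned bytes [efgh],
   rsrc2 holds four signed bytes [stuv]; each sum is clipped to 0..255. */
uint32_t dspuquadaddui_emul(uint32_t rsrc1, uint32_t rsrc2)
{
    uint32_t rdest = 0;
    int shift;

    for (shift = 24; shift >= 0; shift -= 8) {
        int e = (int)((rsrc1 >> shift) & 0xFFu);   /* unsigned, 0..255 */
        int s = (int)((rsrc2 >> shift) & 0xFFu);

        if (s > 127) s -= 256;                     /* reinterpret as signed, -128..127 */

        rdest |= (uint32_t)clip_unsigned(e + s, 255) << shift;
    }
    return rdest;
}
```

For example, dspuquadaddui_emul(0xFF001080, 0x01FF7F80) yields 0xFF008F00: 255 + 1 saturates to 255, 0 - 1 clips to 0, 16 + 127 gives 143, and 128 - 128 gives 0.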
To make it easier to see how these operations can be
substituted into the code, consider the
same code rearranged to group the related functions.
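One plausible form of that rearrangement is sketched here; it is an illustration, not a reproduction of the data book's own listing. The function name reconstruct_grouped and the temporaries temp0 through temp3 are hypothetical, and idct is declared signed char (rather than plain char) so that the sketch behaves the same on compilers where char is unsigned.

```c
/* Unrolled loop with like operations grouped: four averages, four
   additions, four saturation tests, then four stores. */
void reconstruct_grouped(const unsigned char *back,
                         const unsigned char *forward,
                         const signed char *idct,
                         unsigned char *destination)
{
    int i, temp0, temp1, temp2, temp3;

    for (i = 0; i < 64; i += 4) {
        /* Group 1: pixel averaging (the part quadavg can replace). */
        temp0 = (back[i+0] + forward[i+0] + 1) >> 1;
        temp1 = (back[i+1] + forward[i+1] + 1) >> 1;
        temp2 = (back[i+2] + forward[i+2] + 1) >> 1;
        temp3 = (back[i+3] + forward[i+3] + 1) >> 1;

        /* Group 2: add the signed idct values. */
        temp0 += idct[i+0];
        temp1 += idct[i+1];
        temp2 += idct[i+2];
        temp3 += idct[i+3];

        /* Group 3: saturate to 0..255. */
        if (temp0 > 255) temp0 = 255; else if (temp0 < 0) temp0 = 0;
        if (temp1 > 255) temp1 = 255; else if (temp1 < 0) temp1 = 0;
        if (temp2 > 255) temp2 = 255; else if (temp2 < 0) temp2 = 0;
        if (temp3 > 255) temp3 = 255; else if (temp3 < 0) temp3 = 0;

        /* Group 4: store the results. */
        destination[i+0] = (unsigned char)temp0;
        destination[i+1] = (unsigned char)temp1;
        destination[i+2] = (unsigned char)temp2;
        destination[i+3] = (unsigned char)temp3;
    }
}
```

Grouped this way, the first four statements have exactly the shape of the quadavg equations, and groups 2 and 3 together have the shape of the dspuquadaddui equations.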
Now it should be clear that the quadavg operation can
replace the first four lines of the loop, assuming that we
can get the individual 8-bit elements of the back[] and
forward[] arrays positioned correctly into the bytes of a
32-bit word. That, of course, is easy: simply align the byte
arrays on word boundaries and access them with word
(integer) pointers.
Similarly, it should now be clear that the dspuquadaddui
operation can replace the remaining code (except, of
course, for storing the result into the destination[] array)
assuming, as above, that the 8-bit elements are aligned
and packed into 32-bit words.
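The substitution described above can be tried on any host with the following sketch, which processes four pixels per iteration. It rests on several assumptions: the helper functions quadavg_emul and dspuquadaddui_emul are hypothetical software stand-ins for the two custom operations, reconstruct_packed is a hypothetical name, and memcpy stands in for the aligned word-pointer accesses so that the sketch is safe regardless of alignment. On the TM1100 itself, the helpers would be replaced by the custom operations and the arrays accessed through word pointers.

```c
#include <stdint.h>
#include <string.h>

/* Software stand-in for quadavg: rounded average of four byte lanes. */
static uint32_t quadavg_emul(uint32_t x, uint32_t y)
{
    uint32_t r = 0;
    int shift;
    for (shift = 24; shift >= 0; shift -= 8) {
        uint32_t a = (x >> shift) & 0xFFu, w = (y >> shift) & 0xFFu;
        r |= ((a + w + 1u) >> 1) << shift;
    }
    return r;
}

/* Software stand-in for dspuquadaddui: unsigned + signed, clipped to 0..255. */
static uint32_t dspuquadaddui_emul(uint32_t x, uint32_t y)
{
    uint32_t r = 0;
    int shift;
    for (shift = 24; shift >= 0; shift -= 8) {
        int e = (int)((x >> shift) & 0xFFu);
        int s = (int)((y >> shift) & 0xFFu);
        int sum;
        if (s > 127) s -= 256;              /* reinterpret lane as signed */
        sum = e + s;
        if (sum < 0) sum = 0; else if (sum > 255) sum = 255;
        r |= (uint32_t)sum << shift;
    }
    return r;
}

/* Reconstructs 64 pixels, four at a time, using the packed operations. */
void reconstruct_packed(const unsigned char *back,
                        const unsigned char *forward,
                        const signed char *idct,
                        unsigned char *destination)
{
    int i;
    for (i = 0; i < 64; i += 4) {
        uint32_t b, f, d, temp;

        memcpy(&b, back + i, 4);            /* load four packed pixels */
        memcpy(&f, forward + i, 4);
        memcpy(&d, idct + i, 4);

        temp = quadavg_emul(b, f);          /* four averages at once */
        temp = dspuquadaddui_emul(temp, d); /* four saturating adds at once */

        memcpy(destination + i, &temp, 4);  /* store four results */
    }
}
```

Because every byte lane is handled independently and the loads and stores use the same byte order, the output matches the scalar loop byte for byte on both big-endian and little-endian hosts.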
Figure 4-7 shows the new code. The arrays are now ac-
cessed in 32-bit (int-sized) chunks, the loop iteration con-
trol has been modified to reflect the “four-at-a-time” oper-
void reconstruct (unsigned char *back,
                  unsigned char *forward,
                  char *idct,
                  unsigned char *destination)
{
    int i, temp;

    for (i = 0; i < 64; i += 4)
    {
        temp = ((back[i+0] + forward[i+0] + 1) >> 1) + idct[i+0];
        if (temp > 255) temp = 255;
        else if (temp < 0) temp = 0;
        destination[i+0] = temp;

        temp = ((back[i+1] + forward[i+1] + 1) >> 1) + idct[i+1];
        if (temp > 255) temp = 255;
        else if (temp < 0) temp = 0;
        destination[i+1] = temp;

        temp = ((back[i+2] + forward[i+2] + 1) >> 1) + idct[i+2];
        if (temp > 255) temp = 255;
        else if (temp < 0) temp = 0;
        destination[i+2] = temp;

        temp = ((back[i+3] + forward[i+3] + 1) >> 1) + idct[i+3];
        if (temp > 255) temp = 255;
        else if (temp < 0) temp = 0;
        destination[i+3] = temp;
    }
}
Figure 4-5. MPEG frame reconstruction code using TM1100 custom operations; compare with Figure 4-4.