APPENDIX A
Look-Ahead Packet Processing (LAPP) Concept
Introduction
A driver for the Am79C976 controller would normally require the CPU to copy receive frame data from the controller's buffer space to the application's buffer space after the entire frame has been received by the controller. For applications that use a ping-pong windowing style, traffic on the network is halted until the current frame has been completely processed by the entire application stack. This means that the arrival of the last byte of a receive frame at the client's Ethernet controller and the client's transmission of the first byte of the next outgoing frame will be separated by:
1. The time that it takes the client's CPU interrupt procedure to pass software control from the current task to the driver,
2. Plus the time that it takes the client driver to pass the header data to the application and request an application buffer,
3. Plus the time that it takes the application to generate the buffer pointer and then return the buffer pointer to the driver,
4. Plus the time that it takes the client driver to transfer all of the frame data from the controller's buffer space into the application's buffer space and then call the application again to process the complete frame,
5. Plus the time that it takes the application to process the frame and generate the next outgoing frame,
6. Plus the time that it takes the client driver to set up the transmit descriptor for the controller and then write to CSR0 to set the TDMD (Transmit Demand) bit.
The sum of these times can often be about the same as the time taken to actually transmit the frames on the wire. Because the wire then sits idle for roughly as long as it carries data, the network utilization rate falls to 50 percent or less.
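The utilization figure follows from a simplified model that assumes each frame time on the wire, T_wire, is followed by one idle turnaround gap, T_gap, equal to the sum of the six step times t_1 through t_6 listed above. These symbols are introduced here for illustration only.

```latex
\begin{aligned}
T_{\text{gap}} &= \sum_{i=1}^{6} t_i, \\
U &= \frac{T_{\text{wire}}}{T_{\text{wire}} + T_{\text{gap}}}, \\
T_{\text{gap}} \geq T_{\text{wire}} &\;\Longrightarrow\; U \leq \tfrac{1}{2}.
\end{aligned}
```

When the gap is about equal to the frame time, U is about one half; any additional turnaround time pushes it lower.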
Note that the Am79C976 controller's data transfers into its buffer space occupy the system bus only about 4 percent of the time. That leaves 96 percent of the system bus bandwidth available for the CPU to perform some of the interframe operations before network receive activity has completed, where the software allows it.
The question then becomes: how much of the work that must be performed between the reception of a frame and the transmission of the next frame can be performed before the reception of the frame actually ends at the network, and how can the CPU be instructed to perform this work during the network reception time?
The answer depends on exactly what is happening in the driver and application code, but the work that can be performed while the receive data is still arriving can include up to the first three steps and part of the fourth step in the sequence above. By performing these steps before the entire frame has arrived, the frame throughput can be substantially increased.
A good increase in performance can be expected when the first three steps are performed before the end of the network receive operation. A much more significant performance increase could be realized if the Am79C976 controller could place the frame data directly into the application's buffer space (i.e., eliminate the need for step 4). To make this work, the application buffer pointer must be determined before the frame has completely arrived; the buffer pointer in the next descriptor for the receive frame must then be modified to direct the Am79C976 controller to write directly to the application buffer. More details on this operation will be given later.
An alternative modification to the existing system can gain a smaller but still significant improvement in performance. This alternative leaves step 4 unchanged in that the CPU is still required to perform the copy operation, but it allows a large portion of the copy to be done before the controller has completely received the frame. That is, the CPU can copy receive data from the Am79C976 controller's buffer space into the application buffer space before the frame data has completely arrived from the network. This allows the copy operation of step 4 to be performed concurrently with the arrival of network data, rather than sequentially, following the end of network receive activity.