PSD3XX – Application Note 020
16-Bit Performance Advantages
A 16-bit bus clearly provides more performance than an 8-bit bus; at the very least, the
data bus bandwidth doubles. The following factors contribute to the performance
improvement:
Program Code Fetch
Some instructions, such as ANDB of the 80C196, consist of 4 bytes. On an 8-bit bus,
fetching such an instruction takes 4 bus cycles; on a 16-bit bus it takes only 2 bus
cycles.
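The fetch arithmetic can be sketched with a simple model: the number of bus cycles needed to fetch an instruction is its length in bytes divided by the bus width in bytes, rounded up. This is a minimal sketch that assumes every bus cycle moves a full bus width and ignores alignment and wait states; the function name `fetch_cycles` is hypothetical.

```python
def fetch_cycles(instr_bytes: int, bus_width_bits: int) -> int:
    """Bus cycles needed to fetch an instruction of instr_bytes bytes.

    Simplified model (assumption): every bus cycle transfers a full bus
    width of bytes; alignment and wait states are ignored.
    """
    bytes_per_cycle = bus_width_bits // 8
    # Ceiling division: a partially used last word still costs a whole bus cycle.
    return -(-instr_bytes // bytes_per_cycle)


# ANDB on the 80C196 is the 4-byte example used in the text.
for width in (8, 16):
    print(f"{width}-bit bus: {fetch_cycles(4, width)} bus cycles")
```

Under this model the 4-byte ANDB costs 4 bus cycles on an 8-bit bus and 2 on a 16-bit bus, matching the counts above.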
Data Fetch
For applications with high data transfer rates, where indexed or indirect references are
used frequently, a 16-bit bus accomplishes the same job in much less time, because each
16-bit word operand moves in one bus cycle instead of two.
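To make the data-fetch difference concrete, the sketch below walks a table of 16-bit words through a hypothetical `BusModel` that counts bus cycles. It assumes little-endian word layout (as on the 80C196), word-aligned operands, and one full bus width per cycle; the class and the `sum_table` helper are illustrative only.

```python
class BusModel:
    """Hypothetical byte-addressed memory that counts bus cycles.

    Assumptions: little-endian word layout (as on the 80C196), word
    operands are word-aligned, and one bus cycle moves one bus width.
    """

    def __init__(self, data: bytes, bus_width_bits: int):
        self.data = data
        self.width = bus_width_bits // 8  # bytes moved per bus cycle
        self.cycles = 0

    def read_word(self, addr: int) -> int:
        if addr % 2:
            raise ValueError("word operands are assumed to be aligned")
        # An 8-bit bus needs two cycles (low byte, then high byte);
        # a 16-bit bus moves both bytes in one cycle.
        self.cycles += 2 // self.width
        return self.data[addr] | (self.data[addr + 1] << 8)


def sum_table(mem: BusModel, base: int, count: int) -> int:
    """Indexed walk over a table of 16-bit words."""
    return sum(mem.read_word(base + 2 * i) for i in range(count))


table = bytes(range(32))  # 16 words of sample data
for width in (8, 16):
    mem = BusModel(table, width)
    total = sum_table(mem, 0, 16)
    print(f"{width}-bit bus: sum={total}, bus cycles={mem.cycles}")
```

Both buses return the same result, but the 8-bit bus spends twice as many bus cycles getting it.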
Queue Flush for Branch/Jump Instructions
A pre-fetch queue usually speeds up instruction execution by supplying instructions to
the Execution Unit in a timely manner. However, the queue carries a penalty whenever a
branch or jump is taken: the queue must be flushed, the Program Counter reloaded, and new
instructions fetched. A 16-bit bus refills the queue much faster. This is critical to
system performance, since branch and jump instructions are among the most frequently
executed instructions in typical code.
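The refill cost per taken branch can be sketched as follows, using the 4-byte queue size this note gives for the 80C196. The bus cycle length of 3 state times and the branch count are hypothetical values chosen for illustration, and the model assumes the queue is refilled completely with back-to-back bus cycles and no wait states.

```python
def refill_states(queue_bytes: int, bus_width_bits: int, states_per_bus_cycle: int) -> int:
    """State times to refill an emptied prefetch queue after a taken branch.

    Simplified model (assumption): the queue is refilled completely with
    back-to-back bus cycles and no wait states.
    """
    bytes_per_cycle = bus_width_bits // 8
    bus_cycles = -(-queue_bytes // bytes_per_cycle)  # ceiling division
    return bus_cycles * states_per_bus_cycle


QUEUE_BYTES = 4          # the 80C196 queue size given in this note
STATES_PER_CYCLE = 3     # hypothetical bus cycle length, in state times
TAKEN_BRANCHES = 1000    # hypothetical number of taken branches in a run

for width in (8, 16):
    per_branch = refill_states(QUEUE_BYTES, width, STATES_PER_CYCLE)
    print(f"{width}-bit bus: {per_branch} states per flush, "
          f"{per_branch * TAKEN_BRANCHES} states over the run")
```

Because the refill cost is paid on every taken branch, halving it matters more the more often the program branches.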
Free Up The System Bus
On a 16-bit bus the microcontroller needs fewer bus cycles for its code and operand
fetches, which frees the bus for other devices that share it. In a system where a DMA
Controller or Slave Processor shares the memory space with the microcontroller, lower
memory bus usage by the microcontroller enhances overall system performance.
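A rough way to see how much bus time is left for a DMA Controller or Slave Processor is to compare the CPU's bus-busy time against a time window. This is a minimal sketch under stated assumptions: the workload numbers and the 3-state bus cycle are hypothetical, and the model assumes full-width bus cycles that never overlap another master's accesses.

```python
def bus_free_fraction(bytes_fetched: int, bus_width_bits: int,
                      states_per_bus_cycle: int, window_states: int) -> float:
    """Fraction of a time window the bus is left idle for other masters.

    Simplified model (assumption): every fetch uses full-width bus cycles
    and the CPU's bus cycles never overlap another master's.
    """
    bytes_per_cycle = bus_width_bits // 8
    busy = -(-bytes_fetched // bytes_per_cycle) * states_per_bus_cycle
    return max(0.0, 1.0 - busy / window_states)


# Hypothetical workload: 200 bytes of code and data fetched in 1000 states.
for width in (8, 16):
    free = bus_free_fraction(200, width, 3, 1000)
    print(f"{width}-bit bus: {free:.0%} of the bus left for DMA")
```

With these illustrative numbers, the share of bus time available to other masters rises from 40% to 70% when the bus is widened.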
Let us look at a sample program to calculate the difference in execution time between an
8-bit and a 16-bit bus. In the typical 16-bit design example above, a look-up table resides
in the EPROM. A look-up table lets the program quickly produce an output for an I/O device
from an input value without resorting to complex mathematical operations. The following
program, published in Intel application note AP-248, performs table look-up and
interpolation.
Assuming the 80C196 queue is always full, executing the following code takes 128 state
times on a 16-bit bus. On an 8-bit bus it takes 32 more state times just to fetch the code
and data, not counting the time the microcontroller waits for the queue to fill. The
estimated performance penalty for an 8-bit bus in this application is therefore at least
25% (32/128), and it will certainly be larger in the actual run-time environment. Intel
states that the 8-bit bus performance penalty is difficult to measure but has been shown
to be as high as 30%, depending on the instruction mix.
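The 25% lower bound follows directly from the two state-time counts quoted above. The short computation below uses only those figures; because it excludes the queue-refill waits, it is a floor rather than an estimate of actual run time.

```python
STATES_16BIT = 128        # state times on a 16-bit bus (from the text)
EXTRA_FETCH_STATES = 32   # extra fetch state times on an 8-bit bus (from the text)

states_8bit = STATES_16BIT + EXTRA_FETCH_STATES
# Relative slowdown of the 8-bit bus compared with the 16-bit bus.
penalty = (states_8bit - STATES_16BIT) / STATES_16BIT
print(f"8-bit bus: {states_8bit} states, penalty >= {penalty:.0%}")
```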
A 16-bit bus design increases system performance, especially for microcontrollers, which
usually have no internal program cache or pre-fetch queue to lessen the penalty caused by
the memory bus bottleneck. The 80C196 has an internal 4-byte queue. This improves
execution time, but bus width remains the critical factor.
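Why a queue cannot compensate for a narrow bus can be sketched with a simple throughput bound: an ideal queue overlaps fetching with execution, so the slower of the two streams sets the pace. This is a simplified model, not a cycle-accurate 80C196 simulation; the code size, execution time, and 3-state bus cycle are hypothetical.

```python
def run_states(exec_states: int, code_bytes: int, bus_width_bits: int,
               states_per_bus_cycle: int) -> int:
    """Lower bound on run time when a prefetch queue overlaps fetch and execute.

    Simplified model (assumption): an ideal queue can hide fetch time only
    up to the execution time, so the slower of the two streams sets the pace.
    """
    bytes_per_cycle = bus_width_bits // 8
    fetch_states = -(-code_bytes // bytes_per_cycle) * states_per_bus_cycle
    return max(exec_states, fetch_states)


# Hypothetical code block: 60 bytes of code that executes in 120 states.
for width in (8, 16):
    print(f"{width}-bit bus: at least {run_states(120, 60, width, 3)} states")
```

In this example the 8-bit bus needs 180 states just to deliver the code, so the queue never catches up and the program is bus-bound. On the 16-bit bus, fetching takes 90 states and hides entirely behind the 120 states of execution.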