Guidelines for Using the Optimizer
SC100 C Compiler
5.4.2 Multisample Techniques
To obtain high performance, a pipelining technique called “multisample” programming is used to process
multiple samples simultaneously. The multisample programming techniques enable you to obtain high
performance by taking full advantage of the SC100 multiple-ALU architecture.
The following terminology is used throughout this section:
Generic Kernel: The minimum required operations of the algorithm. The generic kernel is the
theoretical minimum size of the kernel without considering implementation constraints.
Basic Kernel: The inner loop of a DSP algorithm. This may contain several replications of the
generic kernel or additional code for pipelining. The basic kernel is actually implemented on the
DSP and is subject to implementation constraints.
Operand: A value used as an input to an ALU.
Delays: Values stored as a delay line for referencing past values.
Iteration: The complete execution of a basic kernel.
Loop pass: The execution of the instructions within the basic kernel. Many loop passes may be
needed to complete a single iteration of the kernel.
To process several samples simultaneously, operands (both coefficients and variables) are reused within
the kernel. Although a coefficient or operand is loaded once from memory, multiple ALUs may use the
value, or the value may be used in a later step of the kernel.
Figure 5-5 illustrates the structure of a single sample algorithm and of a multisample algorithm.
Figure 5-5. Single Sample and Multisample Kernels
In a single sample algorithm (Figure 5-5 A), samples are processed by the algorithm serially. The kernel
processes a single input sample and generates a single output sample. For an algorithm such as an FIR,
samples are input to the FIR kernel one at a time. The FIR kernel generates a single output for each input
sample. Blocks of samples are processed using loops and executing the FIR kernel several times.
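As an illustration of the single sample structure described above, the following is a minimal C sketch, not code taken from this manual. The function names fir_kernel_single and fir_block_single, the buffer layout (ntaps-1 delay values stored ahead of the new samples), and the use of short inputs with a long accumulator are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical single sample FIR kernel (names are illustrative only).
 * Computes y(n) = sum over k of h(k) * x(n-k).
 * x      : input buffer
 * newest : index of x(n) in that buffer; assumes newest >= ntaps - 1
 *          so that every delayed value x(n-k) exists in the buffer */
static long fir_kernel_single(const short *x, size_t newest,
                              const short *h, size_t ntaps)
{
    long acc = 0;
    for (size_t k = 0; k < ntaps; k++) {
        /* Each multiply-accumulate loads one coefficient and one sample. */
        acc += (long)h[k] * x[newest - k];
    }
    return acc;
}

/* Block processing: the kernel is executed once per input sample.
 * Assumed layout: x holds ntaps-1 delay values followed by nsamples
 * new samples, so x(n) for output n sits at index n + ntaps - 1. */
void fir_block_single(const short *x, const short *h, size_t ntaps,
                      long *y, size_t nsamples)
{
    for (size_t n = 0; n < nsamples; n++) {
        y[n] = fir_kernel_single(x, n + ntaps - 1, h, ntaps);
    }
}
```

In this sketch every output sample re-reads all ntaps coefficients from memory, which is the cost that the multisample form aims to reduce.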
In contrast, the multisample algorithm (Figure 5-5 B) takes multiple samples at the input in parallel and
generates multiple samples at the output simultaneously. The multisample algorithm operates on data in
small blocks. Operands and coefficients are held in registers, and applied to both samples simultaneously,
resulting in fewer memory accesses.
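The operand reuse described above can be sketched in C for the two-sample case. This is an illustrative sketch under stated assumptions, not the manual's implementation: the function name fir_block_multi2, the buffer layout (ntaps-1 delay values stored ahead of the new samples), and the requirement that nsamples be even are all hypothetical choices. Whether the local variables actually end up in registers and the two accumulations on separate ALUs depends on the compiler and target.

```c
#include <stddef.h>

/* Hypothetical two-sample FIR (names are illustrative only).
 * Each outer pass produces y(n) and y(n+1) together.
 * Assumed layout: x holds ntaps-1 delay values followed by nsamples
 * new samples. nsamples is assumed even; a trailing odd sample would
 * be left unprocessed by this sketch. */
void fir_block_multi2(const short *x, const short *h, size_t ntaps,
                      long *y, size_t nsamples)
{
    for (size_t n = 0; n + 1 < nsamples; n += 2) {
        size_t base = n + ntaps - 1;   /* index of x(n) in the buffer */
        long acc0 = 0;                 /* accumulates y(n)   */
        long acc1 = 0;                 /* accumulates y(n+1) */
        short x_new = x[base + 1];     /* x(n+1), loaded once up front */

        for (size_t k = 0; k < ntaps; k++) {
            short c = h[k];            /* coefficient loaded once ...     */
            short x_old = x[base - k]; /* x(n-k), the only sample load    */
            acc1 += (long)c * x_new;   /* ... used for y(n+1): h(k)*x(n+1-k) */
            acc0 += (long)c * x_old;   /* ... and for y(n):    h(k)*x(n-k)   */
            x_new = x_old;             /* x(n-k) is reused at step k+1    */
        }
        y[n] = acc0;
        y[n + 1] = acc1;
    }
}
```

In this sketch each inner step performs two multiply-accumulates with one coefficient load and one sample load, whereas a single sample loop needs two loads for every multiply-accumulate.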
Multisample algorithms are ideal for block processing algorithms where data is buffered and processed in
groups (such as speech coders). Figure 5-5 B shows two samples being processed simultaneously.
However, the number of simultaneous samples depends on the processor architecture and type of
algorithm.
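Figure 5-5 panels:
A. Single Sample Algorithm: the single sample DSP kernel receives x(n), x(n+1) one after another and produces y(n), y(n+1) one after another.
B. Multiple Sample Algorithm: the multiple sample DSP kernel receives x(n) and x(n+1) in parallel and produces y(n) and y(n+1) in parallel.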