\documentclass{article}
\input{6111-preamble}
\usepackage{graphicx,epsfig,verbatim,lscape,listings}


\begin{document}
\pagenumbering{roman}
\title{A Digital Signal Processor}
%\subtitle{Using Finite Impulse Response Convolution}
\author{Dan Ports}
%\maketitle
\null \vfil
%\vskip 1em
\begin{center}
\par
\huge{A Digital Signal Processor}
\par
\LARGE{Using Finite Impulse Response Convolution}
\vskip 4em
{\lineskip .75em
\Large{Dan Ports}\par
\Large{October 28, 2003}\par
\vskip 5em
\large{6.111 Lab 3}\par
\large{TA: Frank Honore}\par
}\end{center}
\newpage
\begin{abstract}
\normalsize{Digital signal processing can be performed using the method of finite impulse response convolution. This implementation performs signal processing by sampling the input signal at 44.287 kHz using an analog-to-digital converter, storing the sample values in a circular buffer in RAM, computing an output value by convolving the most recent 16 sample values with coefficients stored in ROM, and outputting the computed value using a digital-to-analog converter. The system is controlled by a synchronous finite state machine. Convolution is performed by a minor FSM and an 8-bit sequential multiplier. The design is realized in hardware using three chips: an FPGA, an ADC, and a DAC. Testing and debugging issues are discussed. The digital signal processor was successfully implemented and passed all tests.}
\end{abstract}
\newpage

\tableofcontents
\newpage
\listoffigures
%\listoftables
\newpage
\pagenumbering{arabic}
\section{Overview}
This report describes a signal processor that processes an input signal using one of sixteen selectable, predefined filters. The signal processor is implemented digitally, by converting the input signal to a digital representation, processing it using an FPGA, and outputting it through a digital-to-analog converter. Computations are performed using the method of finite impulse response convolution, with impulse responses of sixteen 8-bit samples.

\section{Operation}
This device accepts a differential voltage input (V$_\mathrm{in}$) with amplitude at most $\pm 1.28$V and processes it. The output is generated with a $1.28$V offset, i.e., a range of 0V -- 2.56V referenced to ground. Sampling is performed at approximately 44.287 kHz, so input signals can contain frequency components up to approximately 22.1 kHz (half the sample rate) without causing aliasing.

The RESET button should be pressed after powering on the system to ensure that all internal components are in valid states. Afterwards, the system will automatically begin sampling and processing the input signal. The filter to be applied may be chosen from the sixteen available filters using the SELECT switches.

A BYPASS switch (active high) is provided. It should be left disabled for normal operation. When enabled, the output from the ADC is passed directly to the DAC, bypassing the convolution process. This function is provided for testing purposes.

\section{Design}
\subsection{Overview}
The design of this device is represented by the block diagram shown in Figure~\ref{blockdiagram}. All synchronous components are connected to the standard NuBus clock. An analog-to-digital converter (ADC) and a digital-to-analog converter (DAC) are used to digitize the analog input signal and to generate the output signal. These are controlled by the central Control FSM, which begins a sampling and processing cycle every time it receives a pulse from the Divider module. The Divider outputs a pulse once every 226 clock cycles to maintain the appropriate sample rate. The Control FSM first uses the DAC to output the most recently computed filtered sample, then simultaneously starts the analog-to-digital conversion and the convolution arithmetic. When both have completed, the Control FSM stores the newly read value from the ADC in RAM, then waits for the next sample pulse from the Divider.

Mathematical computations are performed by the Convolver FSM, which is controlled by the Controller FSM and uses a Multiplier module. The Convolver FSM waits for a START signal from the Controller FSM, and then records the base address supplied by the Controller FSM. It performs each convolution by summing sixteen partial products in an accumulator. Partial product $i$ is computed by multiplying a sample stored in RAM (at address $\mathrm{BaseAddr} - i - 1$) by a coefficient stored in ROM (at address $i$). The Convolver FSM outputs the addresses to the RAM and ROM, then starts the multiplier, whose inputs are connected to the data outputs of the RAM and ROM. The multiplier is sequential, requiring eight cycles per multiplication. Once the multiplier finishes, the convolver adds the result to its accumulator and prepares for the next multiplication. When all 16 partial products have been summed, the convolver returns the highest-order 8 bits of the result. Numeric conversion modules are used throughout the design to convert sign-magnitude representations of numbers to two's complement, and vice versa.

\begin{center}
\begin{landscape}
\begin{figure}[bpht]
\begin{center}
\includegraphics[scale=.5]{logicdiagram}
\end{center}
\caption{System block diagram}
\label{blockdiagram}
\end{figure}
\end{landscape}
\end{center}

All components except the clock, digital-to-analog converter, and analog-to-digital converter are implemented using an Altera FLEX 10K10 FPGA, as depicted in the wiring diagram in Figure~\ref{wiringdiagram}. The VHDL files used to program the FPGA are attached in the appendix, and the design of the individual modules is described in this document. The clock is a standard 10 MHz NuBus clock, and the ADC and DAC are Analog Devices AD670 and AD558 chips, respectively. A common 8-bit data bus is used for the ADC and DAC. Debounced switches provide the SELECT and BYPASS inputs, and a debounced push-button provides the RESET signal.

\begin{center}
\begin{landscape}
\begin{figure}[pbht]
\begin{center}
\includegraphics[scale=.5]{wiringdiagram}
\end{center}
\caption{Hardware wiring diagram}
\label{wiringdiagram}
\end{figure}
\end{landscape}
\end{center}


\subsection{Clock}
All synchronous components in this system operate on the rising edge of a single global clock signal. This clock signal is generated by the standard 10 MHz, 25\% duty cycle NuBus clock built into the lab kit.

\subsection{Control FSM}
The Control FSM is the major FSM in this system. It operates the analog-to-digital and digital-to-analog converters to read the input signal and write the output signal, and starts the Convolver FSM to perform convolutions. The six possible states, their outputs, and the transitions between them are shown in the state diagram in Figure~\ref{statediagram}. All outputs not listed in a state's node in the diagram are set to their default inactive value (0 for active high signals and 1 for active low signals).
\begin{figure}[pbht]
\begin{center}
\includegraphics[scale=.5]{fsm-state}
\caption{Control FSM state diagram}
\label{statediagram}
\end{center}
\end{figure}

The FSM is defined by four processes that operate concurrently. One process determines the next state asynchronously as a function of the current state and the FSM's inputs. The next two processes respectively generate the outputs as a function of the current state and read the input data bus in the ADCRead state. The final process performs state transitions on the rising edge of the clock.

The system starts up in the WaitForTimer state. It remains in this state until a pulse is received from the Divider module that serves as the sample timer.

When this signal is received, the FSM transitions to the DACWrite state. In this state, the DAC's enable signal is set low (active), and the most recently computed output value is placed on the data bus. The FSM stays in this state for three clock cycles; an internal counter variable, $i$, is decremented on each clock cycle to measure this interval.

After $i$ reaches zero, the FSM moves to the ADCEnable state, in which the ADCEnable and ADCRead signals are set low to begin the ADC's conversion process. The ConvolverStart signal is also asserted high in order to begin computing the next output value at the same time.

When the ADCStatus and ConvolverBusy input signals are asserted, indicating that the ADC and the Convolver have received their start signals, the Controller FSM moves to the ADCWait state, disabling the ADCRead and ConvolverStart signals. It remains in this state until the ADC and the Convolver complete their processing, as indicated by the ADCStatus and ConvolverBusy signals both returning low.

The FSM then transitions to the ADCRead state, in which the value on the ADC data bus and the convolver result are read into the input and output buffers, respectively. The FSM remains in this state for one clock cycle, then moves to the RAMWrite state. In the RAMWrite state, the most recently read input value is stored in the RAM: the RAM address is set to the value of the FSM's address counter, the contents of the input buffer are output on the RAM data bus, and the RAM's write enable signal (RAMWE) is asserted.

After one cycle, the FSM returns to the WaitForTimer state and waits for the next pulse from the Divider. The timing of the Control FSM is shown in Figure~\ref{fsmtiming}.
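
The transitions described above can be summarized as a next-state function. The Python sketch below is a behavioral model written for illustration, not a translation of fsm.vhd: the names \texttt{State} and \texttt{next\_state}, and the choice to load $i$ with 2 on entry to DACWrite so that the FSM spends three cycles there, are assumptions. RESET and the output logic are omitted.

\begin{lstlisting}[language=Python]
from enum import Enum, auto


class State(Enum):
    WAIT_FOR_TIMER = auto()
    DAC_WRITE = auto()
    ADC_ENABLE = auto()
    ADC_WAIT = auto()
    ADC_READ = auto()
    RAM_WRITE = auto()


DAC_WRITE_CYCLES = 3   # cycles spent in DACWrite (assumed counter preload below)


def next_state(state, i, timer, adc_status, conv_busy):
    """Return (next state, next value of counter i) for one rising clock edge.

    timer:      one-cycle pulse from the Divider
    adc_status: ADC STATUS line (high while a conversion is in progress)
    conv_busy:  Convolver FSM BUSY output
    """
    if state is State.WAIT_FOR_TIMER:
        # Hypothetical preload of i so that DACWrite lasts DAC_WRITE_CYCLES cycles.
        return (State.DAC_WRITE, DAC_WRITE_CYCLES - 1) if timer else (state, i)
    if state is State.DAC_WRITE:
        return (State.ADC_ENABLE, i) if i == 0 else (state, i - 1)
    if state is State.ADC_ENABLE:
        # Leave once both the ADC and the convolver have acknowledged the start.
        return (State.ADC_WAIT, i) if adc_status and conv_busy else (state, i)
    if state is State.ADC_WAIT:
        # Leave once both have finished.
        done = not adc_status and not conv_busy
        return (State.ADC_READ, i) if done else (state, i)
    if state is State.ADC_READ:
        return State.RAM_WRITE, i
    return State.WAIT_FOR_TIMER, i   # RAMWrite lasts a single cycle


# Example: one complete sample cycle, inputs given as (timer, adc_status, conv_busy).
state, i, trace = State.WAIT_FOR_TIMER, 0, []
for inputs in [(1, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0),
               (0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0)]:
    state, i = next_state(state, i, *inputs)
    trace.append(state.name)
print(trace)   # DAC_WRITE three times, then ADC_ENABLE through WAIT_FOR_TIMER in order
\end{lstlisting}

Keeping the next-state logic in its own function mirrors the separate next-state process described above, and stepping it through a sequence of inputs is a software analogue of the waveform-based simulation used in testing.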

\begin{figure}[pbht]
\begin{center}
\includegraphics[scale=.4]{fsm-timing}
\end{center}
\caption{Timing diagram for Control FSM}
\label{fsmtiming}
\end{figure}

\subsection{Convolver FSM}
The Convolver FSM performs convolutions using the data values stored in the RAM and the impulse response coefficients stored in ROM. Since the impulse responses are 16 samples in length, the convolution is performed according to the equation
\[y[n] = \frac{1}{128} \sum_{i=0}^{15} x[n-i] h[i]\]
which, in terms of the RAM and ROM contents, is
\[\mathrm{output} = \frac{1}{128} \sum_{i=0}^{15} \mathrm{RAM}\left[\mathit{base}-i\right] \cdot \mathrm{ROM}\left[i\right]\]
where $\mathit{base}$ is the RAM address of the most recently stored sample, $x[n]$.

The division by 128 is performed by dropping the lowest-order seven bits. It normalizes the output by the maximum area under an impulse response (as specified for the supplied impulse coefficients), which ensures that the output is a proper 8-bit integer with appropriate magnitude relative to the input signal.
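
The arithmetic of this equation can be checked with a short integer model. The Python sketch below is an illustration under stated assumptions, not the VHDL implementation: the function name \texttt{fir\_output} is hypothetical, the samples are assumed to be ordered newest first, the coefficients are taken as ordinary signed integers rather than in the sign-magnitude form stored in ROM, and the example values are illustrative rather than taken from impulses.hex.

\begin{lstlisting}[language=Python]
def fir_output(samples, coeffs):
    """Integer model of y[n] = (1/128) * sum_{i=0}^{15} x[n-i] * h[i].

    samples: the 16 most recent inputs, newest first (samples[i] is x[n-i]),
             as signed integers in the range -128..127.
    coeffs:  the 16 coefficients h[0]..h[15] as signed integers in -127..127.
    """
    if len(samples) != 16 or len(coeffs) != 16:
        raise ValueError("expected 16 samples and 16 coefficients")
    acc = sum(x * h for x, h in zip(samples, coeffs))
    # The hardware accumulator is 15 bits wide; larger sums would wrap.
    if not -(1 << 14) <= acc < (1 << 14):
        raise OverflowError("sum does not fit in a 15-bit accumulator")
    # Dropping the seven low-order bits is an arithmetic shift right by 7,
    # i.e. floor division by 128 (rounds toward minus infinity).
    return acc >> 7


print(fir_output([100] + [0] * 15, [127] + [0] * 15))    # 99
print(fir_output([-100] + [0] * 15, [127] + [0] * 15))   # -100
print(fir_output([50] * 16, [8] * 16))                   # 50
\end{lstlisting}

Because dropping the low-order bits of a two's complement number is an arithmetic right shift, the truncation rounds toward $-\infty$ rather than toward zero, which is why the second example yields $-100$ rather than $-99$.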

The convolver is implemented as an FSM with five states, a 4-bit counter variable $i$, and a 15-bit accumulator. The five states and their transitions are depicted in the state diagram in Figure~\ref{convolver-state}.

The Convolver FSM is implemented using two processes that operate concurrently. One generates the next state as a function of the current state and the FSM's inputs, and the other performs state transitions and updates the outputs on the rising edge of the clock.

The FSM is triggered by a start signal from the control FSM, and outputs a busy signal whenever it is currently performing a convolution. When the busy signal returns to its inactive low state, the result is available on the 8-bit output signal.

\begin{figure}[bhpt]
\begin{center}
\includegraphics[scale=.5]{convolver-state}
\caption{Convolver FSM state diagram}
\label{convolver-state}
\end{center}
\end{figure}

The convolver starts up in the Idle state. When it receives the START signal from the control FSM, it also receives the memory offset of the current sample, which is stored in the BaseAddr register. The Convolver FSM then moves to the AddrSelect state, which outputs the appropriate values on the RAM and ROM address lines. The RAM address is $\mathrm{BaseAddr} - i - 1$: because BaseAddr is the memory location of the sample that the ADC is converting while the convolver runs, 1 must be subtracted. The ROM address is simply $i$. The convolver remains in this state for one cycle to allow for the RAM and ROM access time, then moves to the StartMult state.

In the StartMult state, the RAM and ROM address lines keep their values, and the multiplier start signal is set high. When the multiplier starts and asserts its busy signal, the Convolver FSM moves to the WaitMult state and waits for the multiplier busy signal to return low, indicating that the multiplication is complete. The 15-bit multiplier result, converted to two's complement, is then added to the accumulator. If $i$ is 15, the FSM moves to the Output state; otherwise, $i$ is incremented and the FSM returns to the AddrSelect state to compute the next partial product.

In the Output state, the convolver truncates the value in the accumulator, sending its highest-order eight bits (again in two's complement format) to the result output. The timing of the Convolver FSM is shown in Figure~\ref{convolver-timing}.
\begin{figure}[bhpt]
\begin{center}
\includegraphics[scale=.4]{convolver-timing}
\end{center}
\caption{Timing diagram for Convolver FSM}
\label{convolver-timing}
\end{figure}

\subsection{Multiplier}
An 8-bit sequential multiplier is used to generate the partial products in the convolution. The multiplier takes in two eight-bit integers in sign-magnitude format (1 sign bit followed by 7 magnitude bits), and generates a 15-bit sign-magnitude result. Each multiplication requires eight cycles to perform. The multiplier is triggered by a start signal from the convolver FSM, and outputs a busy signal whenever it is currently multiplying. When the busy signal returns to its inactive low state, the result is available.

Internally, the multiplier maintains two fifteen-bit shift registers, SREG and HREG, and a fifteen-bit accumulator. When the start signal is received, the A input is copied into SREG and the B input is copied into HREG. On each clock cycle, SREG is added to the accumulator if the least significant bit of HREG is 1, while SREG shifts to the left and HREG shifts to the right. After 8 cycles, the accumulator contains the product of the magnitudes of A and B. The sign bit is generated by taking the XOR of the sign bits of A and B. The multiplier's timing is shown in Figure~\ref{multiplier-timing}.
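
The shift-and-add procedure can be modeled bit by bit in Python. This is a sketch with a hypothetical function name, \texttt{sm\_multiply}, not a transcription of multiplier.vhd; in particular, it assumes that only the seven magnitude bits of each operand are loaded into SREG and HREG, and it performs the add and the shifts of one cycle using the register values from the start of that cycle.

\begin{lstlisting}[language=Python]
def sm_multiply(a, b):
    """Shift-and-add multiply of two 8-bit sign-magnitude numbers.

    Bit 7 of each operand is the sign and bits 6..0 are the magnitude.
    Returns a 15-bit sign-magnitude product: sign in bit 14, magnitude in bits 13..0.
    """
    sreg = a & 0x7F   # magnitude of A; shifts left each cycle
    hreg = b & 0x7F   # magnitude of B; shifts right each cycle
    acc = 0
    for _ in range(8):                 # one iteration per clock cycle
        if hreg & 1:                   # LSB of HREG selects whether to add SREG
            acc += sreg
        sreg = (sreg << 1) & 0x7FFF    # keep SREG at 15 bits
        hreg >>= 1
    sign = ((a ^ b) >> 7) & 1          # XOR of the two sign bits
    return (sign << 14) | acc


print(sm_multiply(0x03, 0x05))        # 15      (+3 * +5)
print(hex(sm_multiply(0x83, 0x05)))   # 0x400f  (-3 * +5 = -15)
print(hex(sm_multiply(0x83, 0x00)))   # 0x4000  (-3 * 0 = "minus zero")
\end{lstlisting}

The last example shows that a negative operand multiplied by zero yields a ``minus zero'' product, with the sign bit set and a zero magnitude, which the 15-bit numeric conversion module must then handle.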

\begin{figure}[bhpt]
\begin{center}
\includegraphics[scale=.5]{multiplier-timing}
\end{center}
\caption{Multiplier timing diagram}
\label{multiplier-timing}
\end{figure}

\subsection{Divider}
The purpose of the divider is to generate a 44.287 kHz sample timer signal from the 10 MHz clock signal. The divider's output is high for one system clock period each time a sample needs to be taken. This is done by maintaining an 8-bit counter that is incremented on every clock pulse. When the counter reaches 225, the output is set high. On the next clock pulse, instead of advancing to 226, the counter is reset to zero and the output is set low again, so a pulse occurs once every 226 clock cycles.
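
A cycle-level Python model makes the counting scheme concrete. It is a sketch with a hypothetical function name, \texttt{divider\_outputs}, and it omits RESET. The output is modeled as a direct function of the count; if the VHDL registers the output instead, each pulse would appear one cycle later without changing the 226-cycle spacing between pulses. Setting \texttt{terminal=9} reproduces the 10:1 ratio used for simulation.

\begin{lstlisting}[language=Python]
def divider_outputs(n_cycles, terminal=225):
    """Cycle-by-cycle model of the sample-timer divider.

    Returns the output level (0 or 1) on each of n_cycles clock cycles. The output
    is high for the single cycle in which the counter holds `terminal`; on the next
    clock edge the counter returns to 0, so one pulse occurs every terminal + 1 cycles.
    """
    count = 0
    outputs = []
    for _ in range(n_cycles):
        outputs.append(1 if count == terminal else 0)
        count = 0 if count == terminal else count + 1
    return outputs


pulses = [t for t, level in enumerate(divider_outputs(1000)) if level]
print(pulses)                                   # [225, 451, 677, 903]
print(sum(divider_outputs(100, terminal=9)))    # 10 pulses with the 10:1 test setting
\end{lstlisting}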

The divider also has a reset signal that allows the counter to be reset to zero. This is connected to the RESET input.

The divider was originally designed for use in Lab 2; it was adapted for use in this lab.
 
\begin{figure}[bthp]
\begin{center}
\includegraphics[scale=.5]{divider-timing}
\end{center}
\caption{Divider timing diagram (ratio changed from 226:1 to 10:1)}
\label{dividertiming}
\end{figure}

\subsection{Numeric Conversion}
While the ADC and the convolver use two's complement representations of numbers, the multiplier uses sign-magnitude representations for its inputs and output because this makes multiplication easier to implement. Hence, it is necessary to convert between the two representations. This is the purpose of the numeric conversion module, which converts two's complement numbers to sign-magnitude, and vice versa.

Conversion is performed asynchronously. If the sign bit (most significant bit) is one, the remaining magnitude bits are inverted and one is added to them, while the sign bit is preserved; otherwise, the output is the same as the input. The same operation works in both directions. A VHDL generic is used to allow the width of the number to be specified.

An 8-bit numeric conversion module is used to convert the two's complement numbers stored in RAM into sign-magnitude representations to be input into the multiplier, and a 15-bit module is used to convert the sign-magnitude output of the multiplier to two's complement. The impulse coefficients stored in ROM are already in sign-magnitude format, so no conversion is necessary. The timing of the conversion is shown in Figure~\ref{numconvtiming}.
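
The conversion rule can be sketched in Python for an arbitrary word width, with the \texttt{width} argument playing the role of the VHDL generic. The function name \texttt{convert} is hypothetical, and the explicit zero-magnitude check corresponds to the ``minus-zero'' fix described in the testing section: without it, sign-magnitude minus zero would come out as the pattern $100\ldots0$, which in two's complement is the most negative $w$-bit value, $-2^{w-1}$. That same pattern, read as a two's complement input, has no sign-magnitude equivalent; this sketch maps it to zero as well, which may not match the VHDL.

\begin{lstlisting}[language=Python]
def convert(value, width):
    """Convert a width-bit word between two's complement and sign-magnitude.

    The same rule works in both directions: if the sign bit is clear the word is
    unchanged; otherwise the magnitude bits are inverted and incremented while the
    sign bit is kept.
    """
    sign_bit = 1 << (width - 1)
    mag_mask = sign_bit - 1
    if not value & sign_bit:
        return value                     # non-negative: identical in both formats
    magnitude = (~value + 1) & mag_mask  # invert and add one, magnitude bits only
    if magnitude == 0:
        # "Minus zero" (sign bit set, zero magnitude): return plain zero instead
        # of the pattern 100...0, which means -2**(width-1) in two's complement.
        return 0
    return sign_bit | magnitude


print(hex(convert(0xFD, 8)))      # 0x83: two's complement -3 -> sign-magnitude -3
print(hex(convert(0x83, 8)))      # 0xfd: and back again
print(hex(convert(0x400F, 15)))   # 0x7ff1: 15-bit sign-magnitude -15 -> two's complement
print(convert(0x4000, 15))        # 0: 15-bit minus zero
\end{lstlisting}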

\begin{figure}[bthp]
\begin{center}
\includegraphics[scale=.4]{numconv-timing}
\end{center}
\caption{Numeric conversion timing diagram}
\label{numconvtiming}
\end{figure}

\subsection{Analog-to-Digital Converter}
An analog-to-digital converter (ADC) is required to convert the incoming analog signal to a digital representation. The Analog Devices AD670 chip is used to fulfill this requirement. The chip accepts a differential voltage input, and is configured for a bipolar $\pm$1.28V range. The two's complement output format is selected, using the wiring configuration shown in Figure~\ref{wiringdiagram}.

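The control FSM handles the task of starting ADC conversions when appropriate. Because an ADC conversion requires over a hundred clock cycles, the convolution arithmetic must be performed at the same time as the conversion: the sixteen multiplications alone take at least $16 \times 8 = 128$ cycles, so performing the conversion and the convolution one after the other would exceed the 226-cycle sample period.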

\subsection{Digital-to-Analog Converter}
The digital-to-analog converter (DAC) converts the computed output value to an analog voltage. This is implemented with an Analog Devices AD558 chip. The V$_\mathrm{out}$, V$_\mathrm{out}$ Sense, and V$_\mathrm{out}$ Select pins are tied together to select the +2.56V range. The DAC is powered by the lab kit's +5V analog supply and referenced to ground. The output range is 0 -- +2.56V, and a zero output value is represented by the mid-scale code, so it appears at 1.28V relative to ground.
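
Producing the 1.28V offset requires mapping each two's complement output value onto the DAC's unsigned input code. One standard way to do this, assumed in the following Python sketch rather than taken from the VHDL sources, is to invert the sign bit, which converts two's complement to offset binary. The function names are hypothetical; the voltage calculation uses the ideal transfer function of the 0 -- 2.56V range, in which each code step is 10 mV.

\begin{lstlisting}[language=Python]
FULL_SCALE_V = 2.56   # AD558 range selected by the pin strapping described above
CODES = 256           # 8-bit DAC, so one code step is 10 mV


def dac_code(sample):
    """Map a signed sample (-128..127) to an unsigned offset-binary DAC code (0..255)."""
    if not -128 <= sample <= 127:
        raise ValueError("sample out of 8-bit range")
    return (sample & 0xFF) ^ 0x80   # two's complement bits with the sign bit inverted


def dac_voltage(code):
    """Ideal output voltage for a DAC input code."""
    return code * FULL_SCALE_V / CODES


for s in (-128, 0, 127):
    print(s, dac_code(s), round(dac_voltage(dac_code(s)), 2))
# -128 0 0.0
# 0 128 1.28
# 127 255 2.55
\end{lstlisting}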

The DAC and ADC share a data bus. Because both chips are controlled by the Controller FSM, only one is active at any given time, and bus contention can never occur.

\subsection{RAM}
A 16-location by 8-bit memory is used to store the 16 most recent input sample values from the ADC, in two's complement representation. It is a single-port asynchronous RAM, implemented with the Altera lpm\_ram\_dq predefined component. The D input is connected to the Controller FSM via an 8-bit bus, and the Q output is connected to the multiplier's B input via a numeric conversion module that converts the two's complement number read from RAM to sign-magnitude.

The RAM's write-enable (WE) input is driven by the logical AND of the Controller FSM's RAMWE output and the inverted clock. Thus, the RAM is enabled for writing only when the clock is low and the Controller FSM has requested a write. This ensures that glitches on the RAMWE line, caused by static hazards in the implementation of the Controller FSM, do not cause spurious writes that overwrite data in the RAM.

A multiplexer controls access to the 4-bit RAM address bus. When the Convolver FSM's BUSY output is low, the multiplexer selects the Controller FSM's RAMADDR output; when the convolver is busy, it instead selects the Convolver FSM's RAMADDR output. This ensures that whichever FSM is currently active controls the RAM address input. The scheme works because only one FSM accesses the RAM at any given time; moreover, the Controller FSM only writes to the RAM, and the Convolver FSM only selects addresses for the Multiplier module to read.

A dual-port RAM was originally selected in order to simplify this implementation. However, the FLEX 10K10 FPGA could not implement a dual-port RAM without synthesizing it from discrete latches, and there was not enough space on the chip for such an implementation. Hence, the RAM was replaced with a single-port RAM, and the multiplexer described above was added.

The Controller FSM maintains a counter to keep track of which location is to be filled next, and passes this offset to the Convolver FSM for use in computation. Because a 4-bit counter is used, it allows only 16 values. The counter is allowed to overflow, so address 15 is followed by address 0. This makes the memory function essentially as a circular buffer, beginning with the offset specified by the counter. In a similar manner, 4-bit integer overflow is allowed when the Convolver FSM calculates RAM addresses, so that the correct address will be determined.
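
The wrap-around addressing can be illustrated with masked integer arithmetic, in which keeping only the low four bits plays the role of 4-bit overflow. Following the Convolver FSM description, the RAM address for partial product $i$ is $\mathrm{BaseAddr} - i - 1$, taken modulo 16; the function names below are hypothetical.

\begin{lstlisting}[language=Python]
RAM_DEPTH = 16
ADDR_MASK = RAM_DEPTH - 1   # keep 4 bits, so 15 + 1 wraps to 0 and 0 - 1 wraps to 15


def next_write_addr(addr):
    """Controller FSM: advance the write counter, overflowing from 15 to 0."""
    return (addr + 1) & ADDR_MASK


def convolver_ram_addrs(base_addr):
    """Convolver FSM: RAM address read for each partial product i = 0..15."""
    return [(base_addr - i - 1) & ADDR_MASK for i in range(RAM_DEPTH)]


print(next_write_addr(15))          # 0
print(convolver_ram_addrs(2)[:4])   # [1, 0, 15, 14]
print(convolver_ram_addrs(2)[15])   # 2 (BaseAddr itself)
\end{lstlisting}

For $i = 15$ the read address wraps around to BaseAddr itself. That location still holds the oldest of the sixteen samples, because the new ADC value is not written there until the RAMWrite state, after the convolution has finished.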

\subsection{ROM}
A 256 location by 8 bit ROM is used to store the impulse response coefficients used in convolution calculations. These predefined values are provided by a file of hex values, in sign-magnitude format. The ROM operates asynchronously, with the address lines connected to the convolver's ROMADDR output, and the data bus connected directly to the multiplier's A input.
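
Sixteen filters of sixteen coefficients each fill exactly the 256 ROM locations. The Python sketch below assumes one plausible address layout, in which the four SELECT switches form the upper four ROM address bits and the convolver's counter $i$ forms the lower four, so that each filter occupies a contiguous block of 16 locations. This layout and the name \texttt{rom\_address} are hypothetical; the VHDL sources and impulses.hex in the appendix are authoritative.

\begin{lstlisting}[language=Python]
COEFFS_PER_FILTER = 16
NUM_FILTERS = 16   # chosen by the four SELECT switches


def rom_address(select, i):
    """Hypothetical layout: filter `select` occupies addresses 16*select .. 16*select + 15."""
    if not (0 <= select < NUM_FILTERS and 0 <= i < COEFFS_PER_FILTER):
        raise ValueError("select and i must both be 4-bit values")
    return (select << 4) | i   # SELECT in bits 7..4, i in bits 3..0


print(rom_address(0, 0), rom_address(3, 2), rom_address(15, 15))   # 0 50 255
\end{lstlisting}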

The ROM is implemented on the FPGA using Altera's lpm\_rom library.

\section{Testing and Debugging}
All modules of the system were tested first in simulation using MAX+PLUS II software, then in hardware once the FPGA was programmed. 

The numeric conversion module and multiplier were tested by providing them with a representative sample of input values, including both positive and negative numbers, and verifying that they produced the correct output values.

To simplify testing the divider, it was modified slightly so that it generated a pulse every 10 cycles of the clock rather than every 226 (as in Figure~\ref{dividertiming}). This made the behavior easier to visualize and validate in simulation. A clock signal was applied to the input, and the output and state of the internal counter were verified. The reset signal was also tested.

The two FSMs were tested in simulation using a waveform that tested the possible states and transitions. The values of the state variables and internal counters were used to verify that the transitions were being performed correctly, in addition to verifying that the FSMs produced the correct outputs at the correct times.

Prior to testing the arithmetic modules, the analog-to-digital and digital-to-analog converters and the control FSM and divider required to operate them were tested for the analog checkoff. This testing used a simplified version of the control FSM (listed in the appendix), and no convolver or other computational components. The FSM simply inverted the input signal and output it on the data bus to the DAC. The ADC was configured in offset binary mode rather than two's complement; this simplified implementation because the DAC accepts an offset binary input. The input signal was inverted to ensure that the FSM was actually controlling the DAC, and the data bus was not simply holding its value due to parasitic capacitance on a tristate bus.

The top-level structure combining all of the modules was next tested in simulation to verify that it produced the correct sequence of output values. After this simulation testing was completed, the FPGA was programmed and tested in hardware.

In addition to verifying the correct operation of the FSMs with the logic analyzer, testing included connecting a function generator to the input and monitoring the input and output on the digital oscilloscope. Filters were tested with sine and square waves of various frequencies. In particular, the first four impulse responses were tested with a square wave from the function generator. The input and output of these filters are shown in Figures~\ref{test-0}, \ref{test-1}, \ref{test-2}, and~\ref{test-3}.
\begin{figure}[pbht]
\begin{center}
\includegraphics[scale=.5]{3_0}
\end{center}
\caption{Input and output of test filter 0 (single positive impulse)}
\label{test-0}
\end{figure}

\begin{figure}[pbht]
\begin{center}
\includegraphics[scale=.5]{3_1}
\end{center}
\caption{Input and output of test filter 1 (single negative impulse)}
\label{test-1}
\end{figure}

\begin{figure}[pbht]
\begin{center}
\includegraphics[scale=.5]{3_2}
\end{center}
\caption{Input and output of test filter 2 (boxcar filter)}
\label{test-2}
\end{figure}

\begin{figure}[pbht]
\begin{center}
\includegraphics[scale=.5]{3_3}
\end{center}
\caption{Input and output of test filter 3 (exponential filter)}
\label{test-3}
\end{figure}

During testing, a small bug was discovered that caused occasional output samples to have values radically different from those expected. Testing revealed that this discrepancy was caused by the numeric conversion module, which generated invalid results when the sign of a sign-magnitude number was negative but the magnitude was zero (the ``minus-zero'' case, which the multiplier produces, for example, when a negative sample is multiplied by a zero coefficient). Logic was added to identify this case and produce the correct output value. Once this fix was implemented, the signal processor produced the correct values.

\section{Conclusion}
A digital signal processor can be implemented using the method of finite impulse response convolution. It can be built using an analog-to-digital converter, a digital-to-analog converter, two synchronous finite state machines, and a sequential multiplier, implemented using an FPGA and AD558 and AD670 chips. The processor described above was successfully implemented and tested.

\appendix
\newpage
\section{Appendix: VHDL Source}
The following code was compiled with MAX+PLUS II and used to program the FPGA to implement this signal processor, along with the pin assignments represented in Figure~\ref{blockdiagram}.

\subsection{top.vhd}
\lstinputlisting[language=vhdl]{top.vhd}

\newpage
\subsection{fsm.vhd}
\lstinputlisting[language=vhdl]{fsm.vhd}

\newpage
\subsection{convolver.vhd}
\lstinputlisting[language=vhdl]{convolver.vhd}

\newpage
\subsection{multiplier.vhd}
\lstinputlisting[language=vhdl]{multiplier.vhd}

\newpage
\subsection{numconv.vhd}
\lstinputlisting[language=vhdl]{numconv.vhd}

\newpage
\subsection{divider.vhd}
\lstinputlisting[language=vhdl]{divider.vhd}

\newpage
\subsection{ram.vhd}
\lstinputlisting[language=vhdl]{ram.vhd}

\newpage
\subsection{rom.vhd}
\lstinputlisting[language=vhdl]{rom.vhd}

\newpage
\subsection{impulses.hex}
\lstinputlisting{impulses.hex}

\newpage
\section{Appendix: VHDL Source for Analog Checkoff}
The following two additional source files were used to program the FPGA for the analog checkoff, in which the digital-to-analog and analog-to-digital converters were tested, along with a simple control FSM in the FPGA. Note also that the ADC was wired in the offset binary rather than two's complement configuration for the analog checkoff.

\subsection{top-analog.vhd}
\lstinputlisting[language=vhdl]{top-analog.vhd}

\newpage
\subsection{fsm-analog.vhd}
\lstinputlisting[language=vhdl]{fsm-analog.vhd}

\end{document}