Key-Words: DDFS, Direct-Digital Frequency Synthesizer, Nonuniform Segmentation

Direct Digital Frequency Synthesizer
ABSTRACT
This project investigates a novel direct digital frequency synthesizer architecture. The new approach reduces the total number of segments with respect to the well-known uniform segmentation. In this way the size of the coefficient ROM is also reduced, with beneficial effects in terms of speed and power. We show that the optimal nonuniform segmentation (the one that maximizes the spurious-free dynamic range for a given number of nonuniform segments) can be obtained as the solution of a mixed-integer linear programming problem. Three simple, suboptimal, nonuniform segmentation schemes (which lend themselves to efficient hardware implementation) are proposed in this paper. We also present several design examples and VLSI implementation results, which demonstrate the effectiveness of the developed technique. Key-Words: DDFS, direct-digital frequency synthesizer, nonuniform segmentation, piecewise linear approximation, polynomial interpolation.
E&C
CHAPTER 1
INTRODUCTION
A frequency synthesizer is defined as a system that generates one or many frequencies derived from a single time base (frequency reference), in such a way that the ratio of the output to the reference frequency is a rational fraction. The frequency synthesizer output frequency preserves the long-term frequency stability (the accuracy) of the reference and operates as a device whose function is to generate frequencies that are multiples of the reference frequency (multiples by a single or many numbers). These multiples may be whole or fractions, but since only linear operations are used (in the frequency domain), these numbers can only be rational.
Three main conventional techniques are currently used for sine-wave synthesizers and are common throughout the industry. The most common and most popular technique is phase-locked loop (PLL) synthesis. PLL synthesizers can be found in the most sophisticated radar systems or the most demanding satellite communications terminals as well as in car radios and stereo systems for home entertainment. The PLL is a feedback mechanism that locks its output frequency to a reference. PLL synthesizers gained popularity for their simplicity and economics.
Another synthesizer technique is known as direct analog (DA) frequency synthesis. In this technique, a group of reference frequencies is derived from the main reference; and these frequencies are mixed and filtered, added, subtracted, or divided according to the required output. However, there are no feedback mechanisms in the basic technique. The DA frequency synthesis technique offers excellent spectral purity, especially close to the carrier, and excellent switching speed, which is a critical parameter in many designs and determines how fast the synthesizer can hop from one frequency to another.
The DA technique is usually much more complicated than PLL to execute and is therefore more expensive. DA synthesizers found applications in medical imaging and spectrometers, fast-switching anti-jam communications and radar, electronic warfare (EW) simulation, automatic test equipment (ATE), radar cross-section (RCS) measurement, and such uses where the advantages of the DA technique are a must at a premium cost.
The third technique is direct digital synthesis (DDS), which is a digital signal processing (DSP) discipline. It uses digital circuitry and techniques to create, manipulate, and modulate a signal digitally, and eventually converts the digital signal to its analog form by using a digital-to-analog converter (DAC).
Although the direct digital synthesizer [sometimes referred to as a numerically controlled oscillator (NCO)] was invented almost 30 years ago, it started to attract attention only in the last 10 to 12 years. Due to the enormous evolution of digital technology and its tools, the technique evolved remarkably into an economical, high-performance tool and is now a major frequency synthesis method used by almost all synthesizer designers, from instrument makers to applications like satellite communications, radar, medical imaging, cellular telephony and amateur radios (most of which are anything but amateur). Direct digital synthesizers offer fast switching speed, high resolution (the step size of the synthesizer), small size and low power, good economics, and the reliability and producibility of digital designs. In addition, since the signal is manipulated digitally, it is easy to modulate, to achieve accuracies not attained by analog techniques, and to interface conveniently with the computing machines that usually control the synthesizer.
A direct digital frequency synthesizer (DDFS) uses digital signal processing to generate frequency and phase tunable output signals. The generated output frequency is a division of the reference clock frequency. The division factor is set in a binary tuning word. The DDFS has the advantages of fast frequency switching, fine frequency resolution, direct phase and frequency modulation in the digital domain, and low phase noise. DDFS has a variety of applications, from instrumentation and measurement to modern digital communication systems. For example, a DDFS can be used as a clock generator that produces output frequencies in steps of fclk/2^N, where fclk is the reference clock frequency and N is the resolution of its phase accumulator. This characteristic is useful for systems that need multiple clock frequencies with no integer relationship between them and that need to change them rapidly and frequently.
In modern communication systems, DDFS seems to be an alternative to phase-locked loops (PLL).
Fast switching speed is becoming more and more important in today's wireless communication systems, such as spread spectrum communication systems. The tuning speed of the PLL is limited by the delay of its internal feedback loop. Despite these advantages, a DDFS produces an exact integer division of the reference clock frequency only when the FCW is a power of two. A PLL, however, has the ability to lock its output to the input phase of a reference clock. Moreover, a PLL is capable of producing higher output frequencies. In order to take advantage of both, some applications use a hybrid frequency synthesizer that combines a PLL and a DDFS. Conventional direct digital frequency synthesizers are also considered power-hungry systems because of the ROM look-up table in their architecture. Consequently, ROM-Less architectures have been introduced. The first approach in ROM-Less DDFS architecture was to use an all-thermometer sine-weighted DAC. However, this approach needed a huge number of current cells. Therefore, to decrease the number of current cells, a segmentation algorithm for the nonlinear DAC was proposed. The segmentation of a nonlinear DAC is more complicated than that of a linear one, and this architecture suffers from greater complexity. The second approach in ROM-Less DDFS design is triangle to sine wave conversion. This method uses a parabolic approximation and exploits the exponential current-voltage relationship of transistors to implement it electronically. It achieves moderate precision in the triangle to sine wave conversion.
CHAPTER 2
DDFS PRINCIPLES AND ARCHITECTURES

As stated earlier, the direct digital frequency synthesizer (DDFS) uses digital signal processing to generate frequency and phase tunable output signals. The frequency of the output signal can be changed by changing either the frequency control word (FCW) or the frequency of the reference clock. In this chapter the DDFS principles are described by explaining the conventional DDFS architecture. The most common DDFS architectures are also presented.
2.1 CONVENTIONAL DDFS

The block diagram of a conventional DDFS is shown in figure 2.1. The DDFS consists of a phase accumulator, a phase to sinusoid amplitude converter (PAC) and a digital to analog converter (DAC) followed by a filter. The phase accumulator consists of a counter and a register. The register stores the frequency control word (FCW), which is the jump size of the counter. With each clock cycle, the FCW is added to the current value of the counter, which wraps around when it overflows. The result of this counting is the phase information of the sine wave. The output of the phase accumulator is fed to the PAC, which converts the phase information of the sine wave to amplitude. The discrete-time, discrete-amplitude information of the sine wave is converted to analog by passing it through a DAC. The final block of the system is an anti-aliasing filter. The functionality of each block is described in more detail in the following sections.
Fig 2.1 Block Diagram of Conventional DDFS.
2.1.1 PHASE ACCUMULATOR

The phase accumulator is basically a counter which has the responsibility of generating the phase information of the sine wave. In order to understand how the frequency is synthesized using a phase accumulator, consider the phase wheel in figure 2.2.
Fig 2.2 Phase Wheel.
Each point on the phase wheel corresponds to an equivalent phase of the sine wave. A complete rotation of the phase wheel at constant speed generates one complete period of a sine wave. In every clock cycle, the FCW, which is stored in the phase accumulator register, is added to the current value of the counter. Consequently, the FCW determines how fast the counter travels around the phase wheel. With a higher jump size, the counter completes one rotation around the phase wheel faster, and consequently a higher output frequency is synthesized. The resolution of the phase accumulator (N) determines how many phase points the phase wheel contains, and consequently it determines the resolution of the synthesized output frequency. For example, if N is taken to be 32, then the FCW of 00000001 (only the least significant bit set) causes the counter to overflow after 2^32 reference clock cycles (a complete rotation) and gives the lowest possible output frequency. The FCW of 01111111 (every bit below the most significant bit set) causes the counter to overflow after only about two reference clock cycles (a complete rotation). The output of the phase accumulator is shown in figure 2.3. The relation between the reference clock frequency, the output frequency, the FCW and the resolution of the phase accumulator is given in equation 2.1, where P is the value of the FCW and j is the resolution of the phase accumulator (N above).
$$f_{out} = \frac{P \cdot f_{clk}}{2^{j}}, \qquad f_{out} \le \frac{f_{clk}}{2}$$
(2.1)
According to the Nyquist theorem, at least two samples per cycle are needed to reconstruct the sine wave; consequently, the highest output frequency that can be achieved is fclk / 2. The frequency resolution of the synthesizer (Δf) is found when the FCW is set to one:
$$\Delta f = \frac{f_{clk}}{2^{j}}$$
(2.2)
Fig 2.3 Phase Accumulator Output.
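The behaviour of the phase accumulator and equations (2.1) and (2.2) can be illustrated with a short behavioural model. This is only a sketch: the function names and the 100 MHz clock in the example are assumptions chosen for illustration and do not come from the design described in this report.

```python
def phase_accumulator(fcw, bits, cycles):
    """Return the accumulator output for the given number of clock cycles."""
    mask = (1 << bits) - 1            # the register is `bits` wide, so it wraps modulo 2**bits
    phase = 0
    samples = []
    for _ in range(cycles):
        samples.append(phase)
        phase = (phase + fcw) & mask  # add the FCW every clock cycle; the overflow is discarded
    return samples


def output_frequency(fcw, bits, f_clk):
    """Equation (2.1): f_out = FCW * f_clk / 2**bits."""
    return fcw * f_clk / (1 << bits)


def frequency_resolution(bits, f_clk):
    """Equation (2.2): the frequency step obtained with FCW = 1."""
    return f_clk / (1 << bits)


if __name__ == "__main__":
    # A 4-bit accumulator with FCW = 3 wraps around after the sixth sample.
    print(phase_accumulator(3, 4, 8))
    # Assumed 100 MHz clock, 32-bit accumulator.
    print(output_frequency(1 << 30, 32, 100e6))
    print(frequency_resolution(32, 100e6))
```

With a 32-bit accumulator and the assumed 100 MHz clock, the frequency step is about 0.023 Hz, which shows why a wide accumulator gives fine frequency resolution.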
2.1.2 The phase to amplitude converter

After the phase information is generated by the phase accumulator, it is fed to the phase to amplitude converter, which is a ROM look up table in the conventional DDFS. The look up table contains the amplitude information corresponding to each of the phase points of the phase wheel. In order to avoid a very large look up table, it is common to use only a fraction of the most significant bits of the phase accumulator output in order to produce a sine wave. In this case we say that the DDFS is truncated from j bits to k bits, for example from 32 bits to 12 bits. The truncation results in spurs in the output spectrum of the DDFS, which will be discussed in the next chapter. However, 12 bits still results in a large look up table. A large look up table decreases the speed of the synthesizer and increases the power consumption and die area; moreover, a high resolution DAC must be designed. Therefore, a great deal of work has been done to reduce the size of the look up table. A very basic method is to use the quarter wave symmetry of the sine wave. The block diagram of this method is shown in figure 2.5. In this case only the amplitude information from 0 to π/2 of the sine wave is stored in the ROM, and the two most significant bits of the phase accumulator output are used to distinguish the quarter of the sine wave. The most significant bit indicates the sign of the sine wave amplitude and the second most significant bit is used to determine whether the amplitude is increasing or decreasing. The output of the phase to amplitude converter is shown in figure 2.4.
Other ROM compression techniques include the Sunderland architecture, the Nicholas architecture, polynomial approximation and the CORDIC algorithm. In the Sunderland architecture the large look up table is divided into two smaller memories. The Nicholas architecture improves on the Sunderland architecture and hence achieves a higher ROM compression. In the polynomial approximations, the coefficients of the polynomial are stored in the ROM. In this method the interval [0, π/2] is divided into smaller subintervals and the sine/cosine is approximated by a polynomial in each of them. The CORDIC algorithm has an advantage over the ROM when the needed accuracy is more than 9 bits. With this algorithm the needed hardware does not grow exponentially as the output word size increases.
Fig 2.4 Phase to Amplitude Converter.
2.1.3 Exploitation of sine function symmetry:

A well-known technique is to store only π/2 radians of sine information and to generate the sine look-up table samples for the full range of 2π by exploiting the quarter wave symmetry of the sine function, as mentioned earlier. The decrease in the look-up table capacity is paid for by the additional logic necessary to generate the complements of the accumulator and the look-up table output. The details of this method are shown in figure 2.5. The two most significant bits are used to decode the quadrant, while the remaining k-2 bits are used to address a one-quadrant sine look-up table. The most significant bit determines the sign of the result, and the second most significant bit determines whether the amplitude is increasing or decreasing. The accumulator output is used as is for the first and the third quadrants. The bits must be complemented so that the slope of the saw-tooth is inverted for the second and fourth quadrants. As shown in figure 2.5, the sampled waveform at the output of the look-up table is a full wave rectified version of the desired sine wave. The final output sine wave is then generated by multiplying the full wave rectified version by -1 when the phase is between π and 2π. In most practical DDFS digital implementations, the numbers are represented in 2's complement format. Therefore, 2's complement must be used to invert the phase and multiply the output of the look-up table by -1. However, it can be shown that if a 1/2 LSB offset is introduced into a number that is to be complemented, then a 1's complement may be used in place of the 2's complement without introducing error. This provides savings in hardware, since a 1's complement may be implemented as a set of simple Exclusive-OR gates. This 1/2 LSB offset is provided by choosing look-up table samples such that there is a 1/2 LSB offset in both the phase and the amplitude of the samples, as shown in figure 2.6. In figure 2.6, the phase offset has been used to reduce the address bits by two. If there is no phase offset, 0 and π/2 have the same phase address, and one more address bit is needed to distinguish between these two values.
Fig 2.5 Logic to exploit quarter wave symmetry of sine wave.
Fig 2.6 Phase addresses with LSB phase offset.
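The quarter wave symmetry logic with the 1/2 LSB offsets can be sketched in software as follows. The word lengths (k = 8 phase bits, 8 amplitude bits) and the function names are assumptions made for this example only; the point is that both the address mirroring and the sign inversion reduce to a 1's complement (an XOR with all ones).

```python
import math


def build_quarter_rom(addr_bits, amp_bits):
    """First-quadrant samples with a 1/2 LSB offset in phase and amplitude."""
    size = 1 << addr_bits
    scale = 1 << (amp_bits - 1)                      # full scale of the signed output
    rom = []
    for a in range(size):
        angle = (math.pi / 2) * (a + 0.5) / size     # 1/2 LSB phase offset
        # A stored integer v stands for the amplitude v + 1/2 (amplitude offset).
        rom.append(int(math.floor(scale * math.sin(angle))))
    return rom


def quarter_wave_sine(phase, k, rom):
    """Map a k-bit truncated phase word to a signed amplitude code."""
    addr_bits = k - 2
    addr_mask = (1 << addr_bits) - 1
    sign_bit = (phase >> (k - 1)) & 1                # MSB: sign of the result
    mirror_bit = (phase >> (k - 2)) & 1              # second MSB: rising or falling quarter
    addr = phase & addr_mask
    if mirror_bit:
        addr ^= addr_mask                            # 1's complement of the address
    value = rom[addr]
    if sign_bit:
        value = ~value                               # 1's complement: -(v + 1/2) = ~v + 1/2
    return value


if __name__ == "__main__":
    rom = build_quarter_rom(6, 8)
    print([quarter_wave_sine(p, 8, rom) for p in (0, 63, 64, 127, 128, 255)])
```

Because every output code v represents v + 1/2, the two half cycles are exact mirror images of each other even though no adder is used for the negation.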
2.1.4 The Digital to Analog Converter

As shown in figure 2.1, the discrete-time, discrete-amplitude information of the sine wave is fed to a digital to analog converter to be converted to a continuous-amplitude, continuous-time sine wave. Current steering DACs are the best choice for high speed applications because of their fast switching speed. They can be implemented in binary weighted, thermometer coded and segmented architectures. The binary weighted architecture has the advantage of small area and low power consumption. However, it suffers from differential nonlinearity (DNL) and from glitches, which degrade its dynamic performance. The thermometer coded architecture, on the other hand, has more complexity and higher power consumption, but it offers improved DNL, low glitches and small switching errors. In this architecture, all the current sources are equal. The digital input code is first fed to a thermometer decoder, and the thermometer code turns on the switches accordingly. The segmented architecture combines the two to take advantage of the benefits of both: it uses thermometer coding for its most significant bits (MSBs), which are more responsible for the dynamic performance, and binary weighting for its least significant bits (LSBs). A dummy decoder should be used for the binary weighted part to compensate for the delay of the thermometer decoder of the thermometer coded part. It should be noted that in the DDFS the dynamic performance of the DAC plays a significant role in the spectral purity of the output spectrum.
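The way a segmented DAC splits its input code can be illustrated with an idealized model in which every current source is perfectly matched. The 8-bit word with a 4-bit thermometer part is an assumed example, and the function names are hypothetical.

```python
def thermometer_code(value, bits):
    """Unary code: `value` ones followed by zeros, 2**bits - 1 cells in total."""
    cells = (1 << bits) - 1
    return [1 if i < value else 0 for i in range(cells)]


def segmented_dac(code, total_bits, msb_bits):
    """Ideal output, in LSB units, of a segmented current steering DAC."""
    lsb_bits = total_bits - msb_bits
    msb = code >> lsb_bits                       # upper bits drive the thermometer decoder
    lsb = code & ((1 << lsb_bits) - 1)           # lower bits drive binary weighted cells
    unary_cells = thermometer_code(msb, msb_bits)
    unit_weight = 1 << lsb_bits                  # every unary cell carries the same current
    unary_current = sum(unary_cells) * unit_weight
    binary_current = sum(((lsb >> b) & 1) << b for b in range(lsb_bits))
    return unary_current + binary_current


if __name__ == "__main__":
    print(thermometer_code(3, 3))
    print(segmented_dac(0b10010110, 8, 4))
```

In the ideal case the output equals the input code; in a real DAC the mismatch between cells and the timing of the switches make the two parts behave differently, which is the reason for the trade-off described above.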
2.1.5 Anti-aliasing Filter

As will be discussed in more detail in the next chapter, the DDFS is a sampling system. Therefore, there will be images in the output spectrum at the frequencies n·fclk ± fo (n = 1, 2, ...), with fo the output frequency and fclk the sampling clock. As a result of the zero order hold functionality of the DAC, the amplitudes of the images are weighted by the sinc function, sin(πf/fclk)/(πf/fclk). For most applications, these images are undesirable. In order to remove them, a low-pass filter called the anti-aliasing filter is used at the end of the system; it can also apply an inverse sinc response to compensate for the sinc roll-off. Ideally, this filter should have unity response over the Nyquist bandwidth and zero beyond that. However, designing such a filter is not practical; consequently, some percentage of the available bandwidth will be unusable. Therefore, the synthesized output frequency of the DDFS is usually limited to less than 3/8 of the sampling frequency. Figure 2.7(a) shows the spectrum of the DDFS, taking into account only the image replicas. Figure 2.7(b) and (c) show the effect of the ideal and non-ideal filter on the output spectrum of the synthesizer. The design of this filter is beyond the scope of this project.
Fig 2.7 Frequency response of DDFS (a), with ideal anti-aliasing filter (b), with realistic anti-aliasing filter (c)
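The position and level of the image replicas can be estimated with a few lines of code. This sketch assumes an ideal zero order hold DAC, so that the images lie at n·fclk ± fo and are weighted by sin(πf/fclk)/(πf/fclk); the function names and the 100 MHz / 25 MHz example are illustrative assumptions.

```python
import math


def sinc_weight(f, f_clk):
    """Magnitude of the zero order hold response at frequency f."""
    x = math.pi * f / f_clk
    return 1.0 if x == 0 else abs(math.sin(x) / x)


def images(f_out, f_clk, n_max):
    """List (frequency, level in dB relative to the fundamental) of the images."""
    reference = sinc_weight(f_out, f_clk)
    result = []
    for n in range(1, n_max + 1):
        for f in (n * f_clk - f_out, n * f_clk + f_out):
            level_db = 20 * math.log10(sinc_weight(f, f_clk) / reference)
            result.append((f, level_db))
    return result


if __name__ == "__main__":
    for freq, level in images(25e6, 100e6, 2):
        print(f"{freq / 1e6:7.1f} MHz  {level:7.2f} dBc")
```

For the assumed example, the first image at 75 MHz is only about 9.5 dB below the 25 MHz fundamental, which illustrates why the filter is needed and why the usable output band stops well below fclk / 2.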
2.2 ROM-Less Direct Digital Synthesizers

As stated earlier, the ROM look up table is the speed, power and area bottleneck of direct digital synthesizers. Although a great deal of work has been done to compress the ROM look up table, direct digital synthesizers using this method still have high power consumption and limitations in higher frequency operation. Consequently, ROM-Less architectures have been introduced. The two most common ones are described briefly in the following sections.
2.2.1 Direct digital synthesizer using a sine weighted DAC

In order to reduce the power consumption of direct digital synthesizers, ROM-Less architectures based on sine weighted DACs have been proposed. The block diagram of a DDFS using a sine weighted DAC is shown in figure 2.8. In this architecture the sine/cosine mapping and the digital to analog conversion are performed in the same block, called the sine weighted DAC. The design challenges of the sine weighted DAC are mostly the same as those of the linear DAC. The main difference is that in the linear DAC the current sources are either identical to each other or weighted by powers of two, whereas in the sine weighted DAC the current sources are weighted according to the amplitude of the sine wave. In this architecture, for each phase of the sine wave the sine weighted DAC switches the corresponding amount of current to the output. The two most significant bits are used to exploit the quarter wave symmetry of the sine wave. Initially, these architectures used all-thermometer sine weighted DACs. In order to reduce the number of DAC cells, segmentation techniques were proposed. Segmentation techniques for nonlinear DACs are more complicated than for linear ones, and these architectures suffer from complexity when the resolution is high.
Fig 2.8 DDFS Block Diagram using sine weighted DAC
2.2.2 Direct digital synthesizer using triangle to sine wave converter

The block diagram of a DDFS using a triangle to sine wave converter is shown in figure 2.9. This architecture uses the most significant bit to exploit the half wave symmetry of the sine wave; consequently, it decreases the truncation error. The output of the complementor is then fed to a linear DAC. The linear DAC produces a triangle wave which contains the analog phase information of the sine wave. The triangle wave is then converted to a sine wave using an analog sine-mapping methodology. This methodology uses a parabolic approximation.
Fig 2.9 DDFS Block Diagram using triangle to sine wave converter
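The precision that a parabolic mapping can reach may be checked numerically. The sketch below assumes one common parabola, sin ≈ t(2 - |t|) for a triangle value t in [-1, 1]; the analog converter referred to in the text may implement a different parabola, so the numbers are indicative only. All names are hypothetical.

```python
import math


def triangle_wave(phase):
    """Triangle wave for a phase given as a fraction of one cycle.

    The wave is 0 at phase 0, +1 at phase 0.25 and -1 at phase 0.75.
    """
    p = phase % 1.0
    if p < 0.25:
        return 4 * p
    if p < 0.75:
        return 2 - 4 * p
    return 4 * p - 4


def parabolic_sine(t):
    """Assumed parabolic mapping of a triangle value t in [-1, 1] to a sine value."""
    return t * (2 - abs(t))


def max_error(points=10000):
    """Largest deviation from the ideal sine over one cycle."""
    return max(
        abs(parabolic_sine(triangle_wave(i / points)) - math.sin(2 * math.pi * i / points))
        for i in range(points)
    )


if __name__ == "__main__":
    print(max_error())
```

Under this assumption the worst-case error is a few percent of full scale, which is consistent with the moderate precision attributed to the method.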
CHAPTER 3
NOISE ANALYSIS OF DDFS OUTPUT SPECTRUM

The direct digital frequency synthesizer has four sources of spurs, which are shown in figure 3.1. These error sources are the truncation error of the phase accumulator, the phase to amplitude conversion error, the errors due to the nonlinearity of the DAC, and the phase noise. In this chapter these error sources and their effect on the output spectrum of the DDFS are discussed.
Fig 3.1 DDFS Spur Sources
3.1 Spurious related to the phase truncation error

As stated earlier, in order to have fine frequency resolution we would like to increase the resolution of the phase accumulator. However, this would require large circuits to convert the phase data to amplitude data. Therefore, the output of the phase accumulator is usually truncated from J bits to K bits. This truncation results in a phase error between the phase generated by the accumulator and the phase used by the PAC for amplitude generation; consequently, there is an error in the generated amplitude. This error is periodic in the time domain and hence shows itself as spurs in the frequency domain. The periodic nature of the error is due to the fact that after sufficient rotations of the phase wheel the accumulator phase and the truncated phase coincide and there is no phase error. The pattern continues as the phase accumulator continues to count. However, certain frequency control words result in the maximum level of the phase truncation spurs while some result in no error. The control words that yield the maximum spur level satisfy equation 3.1:
$$\mathrm{GCD}(FCW, 2^{J-K}) = 2^{J-K-1}$$
(3.1)
where GCD denotes the greatest common divisor of the two arguments in the parentheses. Hence, any control word with a 1 in the bit position of weight $2^{J-K-1}$ and 0 in all less significant bit positions results in the maximum truncation spur level. Moreover, the control words that yield no truncation error satisfy equation 3.2:
$$\mathrm{GCD}(FCW, 2^{J-K}) = 2^{J-K}$$
(3.2)
Hence, any control word whose J-K least significant bits are all 0 (for example, one with a 1 in the bit position of weight $2^{J-K}$ and 0 in all less significant bit positions) results in no phase truncation spurs. The spurs generated by the phase truncation are the most significant spurs if we consider the DAC ideal. The truncation error repeats with the frequency given below; it is mixed with the DDFS output frequency and generates spurs at multiples of this frequency:
$$f_{spurs} = f_{clk} \cdot \frac{\mathrm{GCD}(FCW, 2^{J-K})}{2^{J-K}}$$
(3.3)
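Equations (3.1) to (3.3) are easy to evaluate for a given control word. The following sketch classifies an FCW and computes the frequency of equation (3.3); the function names, the labels returned by `classify_fcw`, and the 32-bit to 12-bit truncation in the example are assumptions for illustration.

```python
import math


def truncation_gcd(fcw, j, k):
    """GCD(FCW, 2**(J-K)) as used in equations (3.1) to (3.3)."""
    return math.gcd(fcw, 1 << (j - k))


def classify_fcw(fcw, j, k):
    """Label an FCW by the level of its phase truncation spurs."""
    discarded = j - k                          # number of truncated bits
    g = truncation_gcd(fcw, j, k)
    if g == 1 << discarded:
        return "no truncation spurs"           # equation (3.2)
    if g == 1 << (discarded - 1):
        return "maximum truncation spurs"      # equation (3.1)
    return "intermediate"


def spur_frequency(fcw, j, k, f_clk):
    """Equation (3.3): f_clk * GCD(FCW, 2**(J-K)) / 2**(J-K)."""
    return f_clk * truncation_gcd(fcw, j, k) / (1 << (j - k))


if __name__ == "__main__":
    for word in (1 << 20, 1 << 19, 3):
        print(hex(word), classify_fcw(word, 32, 12), spur_frequency(word, 32, 12, 100e6))
```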
3.2 Spurious related to the DACs finite resolution

The finite resolution of the DAC, and consequently its finite number of quantization levels, results in an error called the quantization error. The quantization error is the difference between the amplitude of the reconstructed sine wave and that of the ideal sine wave, which is due to the limited resolution of the DAC. This error shows itself as spurs in the output spectrum of the DDFS. The quantization error can be decreased by increasing the resolution of the DAC. The relationship between the resolution of the DAC and the amount of quantization distortion can be quantified with the following equation:
$$SQR = 1.76 + 6.02\,P \ \text{dB}$$
(3.4)
where P is the number of bits of the DAC and SQR is the ratio of the signal power to the quantization noise power, expressed in dB. It should be noted that this equation does not provide any information about the total SFDR of the system; it considers only the spurs due to the quantization error.
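Equation (3.4) can be compared with a direct simulation of a quantized sine wave. The sketch below assumes an ideal uniform quantizer over a full scale of [-1, 1] and an arbitrary test frequency; the sample count and the frequency are illustrative choices, not values from this project.

```python
import math


def sqr_db(bits):
    """Equation (3.4): signal to quantization noise ratio in dB."""
    return 1.76 + 6.02 * bits


def simulated_sqr_db(bits, samples=4096, cycles_per_sample=0.1234567):
    """Quantize a full-scale sine wave and measure signal power over error power."""
    step = 2.0 / (1 << bits)              # the range [-1, 1] split into 2**bits steps
    signal_power = 0.0
    noise_power = 0.0
    for n in range(samples):
        x = math.sin(2 * math.pi * cycles_per_sample * n)
        q = round(x / step) * step        # ideal uniform quantizer
        signal_power += x * x
        noise_power += (q - x) ** 2
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for p in (8, 10, 12):
        print(p, sqr_db(p), simulated_sqr_db(p))
```

The simulated ratio should land close to the value predicted by the formula, and each additional bit improves it by about 6 dB.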
3.3 Spurious related to the nonlinearities of the DAC

The most dominant spurs in the output spectrum of the DDFS are the spurs related to the nonlinearities of the DAC. Both static and dynamic nonlinearities are discussed in the following sections; however, in circuits with high sampling rates the dynamic nonlinearities play the significant role, and being statically linear is a prerequisite for the DAC to have good dynamic linearity.
3.3.1 Static performance

The static specifications of a digital to analog converter include offset error, gain error, integral nonlinearity (INL) and differential nonlinearity (DNL). These errors result in a nonlinear relation between the actual output level produced by the DAC and the ideal output level that the designer expects; consequently, there will be harmonic distortion in the output spectrum of the digital to analog converter. Figure 3.2 shows the ideal and actual transfer functions of a three bit DAC, together with the corresponding static nonlinearities.
Fig 3.2 Transfer Characteristics of a DAC.
Offset error: The offset error is the shift of the transfer function of the DAC along the vertical axis. It means that for an input value of zero, the DAC outputs an analog value that is not equal to zero.
Gain error: In the transfer function of the DAC, the difference between the actual slope and the ideal slope is defined as the gain error. The gain error is not a big concern when a single converter is being used, because the relative accuracy, rather than the absolute accuracy, is what matters.
Monotonicity: A digital to analog converter is monotonic if its output always moves in the same direction as its input signal, that is, the output never decreases when the input code increases.
Integral nonlinearity (INL) and differential nonlinearity (DNL): If we consider a line that passes through the end points of the transfer function of the DAC, the integral nonlinearity (INL) would be the maximum deviation between that line and the actual analog output of the DAC. The differential nonlinearity (DNL) is the difference between the actual step size and the ideal one least significant bit step size in the transfer function of the DAC. These errors are shown in the figure 3-2.
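The definitions above translate directly into a small calculation on a list of measured output levels. The sketch uses the end-point line for the INL, as in the definition, and expresses both errors in LSB; the function names and the sample levels of the three bit DAC are made up for illustration.

```python
def dnl_inl(levels):
    """Return (DNL per step, INL per code), both in LSB, using the end-point line."""
    codes = len(levels)
    lsb = (levels[-1] - levels[0]) / (codes - 1)      # ideal step from the end points
    dnl = [(levels[i + 1] - levels[i]) / lsb - 1 for i in range(codes - 1)]
    inl = [(levels[i] - (levels[0] + i * lsb)) / lsb for i in range(codes)]
    return dnl, inl


def is_monotonic(levels):
    """True when the output never decreases as the input code increases."""
    return all(b >= a for a, b in zip(levels, levels[1:]))


if __name__ == "__main__":
    # Hypothetical measured levels of a three bit DAC (arbitrary units).
    measured = [0, 1, 2.5, 3, 4, 5, 6, 7]
    dnl, inl = dnl_inl(measured)
    print("DNL:", dnl)
    print("INL:", inl, "max |INL| =", max(abs(v) for v in inl))
    print("monotonic:", is_monotonic(measured))
```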
3.3.2 Dynamic performance

The dynamic errors of the digital to analog converter include glitches, settling time and feed through effects. These errors are shown in figure 3.3. Dynamic errors have a significant impact on the performance of the DAC, and they become even more critical at higher output frequencies and sampling rates. These errors are presented below.
Fig 3.3 DACs Full Scale Transition.
Glitches: Glitches happen as a result of unmatched switching times between different bits, which can be due to skew between bits in the digital part or to timing mismatch in the switches of the DAC. The result is a signal dependent error at the output of the DAC during code transitions. For example, consider the case in which the input code changes from 0111 to 1000. If the switching of all the current cells is not synchronized, the output can briefly take the analog value of an intermediate code such as 1111; consequently, a glitch occurs at the output. This phenomenon is much more severe at high frequencies. Careful layout and thermometer decoding can be used to reduce this effect.
Settling time: The settling time is defined as the time needed for the analog output to settle within the accepted error band around its final value, and it is due to the parasitic capacitances of the circuit. The settling time should be kept as small as possible to obtain low distortion on the analog output signal.
Feed through effects: Feed through effects have two sources in the DAC cells. The first one is the feed through of the digital signal through the parasitic capacitances of the switch transistors, which results in distortion in the Nyquist bandwidth of the output spectrum, since it is a code dependent error. This error can be minimized by careful layout and switch sizing. The second one is the feed through of the clock to the analog output, which can also be reduced by minimizing the size of the switches and hence reducing the capacitive coupling of the switches to the output. All the dynamic nonlinearities associated with the switches can be addressed by using the return to zero (RTZ) technique, which can be implemented with either analog or digital solutions. In the analog return to zero technique the output of the current cells is forced to zero when the clock is low and their current is switched to the output only when the clock is high; consequently, the switching transients do not appear at the DAC's output. The finite output impedance of the DAC also results in dynamic nonlinearities.
3.4 The phase noise of the DDFS

The dominant contributor to the DDFS phase noise is the phase noise of the reference clock. In fact, because the DDFS is a divider of the sampling clock, the purity of its output spectrum is directly affected by the purity of its reference clock. However, the DDFS has a great advantage over the PLL with regard to phase noise. This is because the PLL multiplies the phase noise of the reference clock in its feedback loop, whereas the DDFS is a feed forward system whose output is a fractional division of the reference clock; consequently, the phase noise present in the output spectrum of the DDFS is lower than that of the reference clock by 20 log (N), where N is the division ratio. Moreover, because the DDFS is a sampling system in which the time interval between the samples is important, the jitter of the reference clock plays an important role in the output spectral purity.
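As a quick numerical illustration of the 20 log (N) relation, the division ratio can be computed either from the two frequencies or from the FCW and the accumulator width. The function names and the example values below are assumptions made for this sketch.

```python
import math


def phase_noise_reduction_db(f_clk, f_out):
    """20*log10(N), with N = f_clk / f_out the division ratio."""
    return 20 * math.log10(f_clk / f_out)


def reduction_from_fcw_db(fcw, bits):
    """Same quantity expressed with the FCW: N = 2**bits / FCW."""
    return 20 * math.log10((1 << bits) / fcw)


if __name__ == "__main__":
    # Assumed example: 100 MHz reference clock, 10 MHz output.
    print(phase_noise_reduction_db(100e6, 10e6))
    # FCW = 2**28 on a 32-bit accumulator divides the clock by 16.
    print(reduction_from_fcw_db(1 << 28, 32))
```

A division by 10 therefore lowers the reference phase noise by 20 dB at the output, and a division by 16 lowers it by about 24 dB.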
CHAPTER 4
DESIGNED DIRECT DIGITAL FREQUENCY SYNTHESIZER

4.1 Concept of the architecture used
Instead of a ROM LUT, a hardware-optimized phase-to-sine amplitude converter approximates the first quadrant of the sine function with eight equal-length piecewise linear segments. The main goal is to maintain low system complexity and to reduce power consumption and chip area requirements. The second aim is to achieve a specified spectral purity, which is defined as the ratio of the power in the desired frequency to the power in the greatest harmonic, across the synthesizer's tuning bandwidth. Spectral purity is an essential design parameter for synthesizers used in communication systems, ensuring that undesired in-band signals remain below a given threshold and are not detected.
In order to achieve the first goal, we approximate a sinusoid as a series of eight equal-length piecewise continuous linear segments $s_i$, where
$$s_i(x) = m_i \left(x - \frac{i}{8}\right) + y_i, \qquad i \in [0, 7]$$
(4.1)
Here $m_i$ is the slope of each segment and $y_i$ is the value of the segment at its starting point x = i/8. Each slope is carefully selected to eliminate the requirement for multiplication by representing it as a sum of at most two powers of two. This is a well-known and often used technique. We also restrict the precision of the slope representation, i.e., the difference between the smallest and the largest powers of two used, in effect putting an upper bound on the adder's width. Equal-length segments are selected to reduce the cost of the control circuitry. In order to achieve a desired spectral purity, different sets of $m_i$ and $y_i$ coefficients are evaluated and the best one meeting the requirements is selected.
The first important feature of our architecture is that we constrain the quantization of the segment slopes such that they are represented with at most two non-zero binary digits. We exploit the well-known principle that multiplication by a power of two can be accomplished with a trivial bit shift, and that multiplication by a factor equal to a sum of two powers of two can be accomplished with at most two trivial bit shifts and one addition. Consequently, implementing the multiplication in equation (4.1) requires at most one addition.
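The shift-and-add principle can be sketched in a few lines of Python. This is only an illustrative model, not the synthesized hardware: the function name is hypothetical, negative digits are not handled, and each shifted term is truncated separately, as a plain right shift would do.

```python
def shift_add_multiply(x: int, shifts: tuple) -> int:
    """Multiply x by sum(2**-n for n in shifts) using right shifts and one add.

    'shifts' holds at most two right-shift amounts, e.g. (0, 2) for the
    slope 1 + 1/4 = 1.25. Each term is truncated by its own shift.
    """
    if len(shifts) > 2:
        raise ValueError("a slope uses at most two non-zero digits")
    return sum(x >> n for n in shifts)


if __name__ == "__main__":
    print(shift_add_multiply(100, (0, 2)))  # 100 * 1.25
    print(shift_add_multiply(127, (0, 1)))  # 127 * 1.5, truncated
```

Because each shift truncates, the result can fall slightly below the exact product (190 instead of 190.5 in the second call), which is one of the finite-bit-width effects an optimization of the coefficients has to absorb.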
The second important feature of our architecture is that we limit the dynamic range of each slope $m_i$ so that each can be expressed with four bits. This implies 16 possibilities; however, we use only a subset of 12. We discard those slopes with more than two non-zero digits but accept -1 as a valid digit. We scale first-quadrant angles from the interval [0, π/2] to the interval [0, 1] in order to represent them as a binary fraction. Hence, the first derivative of the sine function in the first quadrant is scaled from the range [0, 1] to [0, π/2] ≈ [0, 1.57]. Consequently, we select segment slopes from the following set:
{1.5, 1.25, 1.125, 1, .875, .75, .625, .5, .375, .25, .125, 0}.
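The set of 12 slopes can be reproduced by enumeration. The sketch below assumes the four digit weights are 1, 1/2, 1/4 and 1/8 (an inference from the listed values, not something stated explicitly) and allows each of at most two non-zero digits to be +1 or -1.

```python
from itertools import combinations

# Assumed digit weights for a 4-bit slope: 2^0, 2^-1, 2^-2, 2^-3.
WEIGHTS = (1.0, 0.5, 0.25, 0.125)


def two_digit_slopes() -> list:
    """Non-negative slopes expressible with at most two non-zero signed digits."""
    values = {0.0}
    for w in WEIGHTS:
        values.add(w)            # exactly one non-zero digit
    for a, b in combinations(WEIGHTS, 2):
        values.add(a + b)        # two digits, both +1
        values.add(a - b)        # two digits, the smaller one is -1 (a > b)
    return sorted(values, reverse=True)


if __name__ == "__main__":
    slopes = two_digit_slopes()
    print(len(slopes), slopes)
```

The four 4-bit values that do not appear (1.375, 1.625, 1.75 and 1.875) are exactly those that would need three non-zero digits under this assumption.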
As mentioned previously, common wisdom in designing a DDFS says that one should minimize the error on the sinusoid amplitude calculated for any phase angle. While this may be an important performance parameter for a sine function block, it is not necessarily so for a DDFS. Spectral purity, defined as the ratio of the power at the desired frequency to the power in the greatest harmonic across the synthesizer's tuning bandwidth, is much more important. Spectral purity is an essential design parameter for synthesizers in communication systems, ensuring that undesired in-band signals remain below a given threshold and are not detected.
In order to achieve a desired spectral purity, we evaluate different sets of eight pairs of $m_i$ and $y_i$ coefficients, and select the best one meeting our requirements. We solve this optimization problem with a genetic algorithm whose fitness function is the spectral purity. All calculations are done taking finite bit-width effects into account.
Equation (4.2) below gives the slopes and y approximations that we have used for this architecture. They meet the requirement of 60 dBc spectral purity. Figure 4.1 shows the corresponding output for angles in the first quadrant.
(4.2)
Fig 4.1 First Quadrant Sine Approximation.
In Figure 4.1, the 8 segments are noticeable, as are the amplitude quantization effects for each angle. Discontinuities at quadrant transitions may also be observed. The maximum amplitude is equal to 123/128, or 0.9609375. Taking the Discrete Fourier Transform of a full period of data reveals that the amplitude of the fundamental is approximately 123.1/128. This reduction from a maximum of 127/128 in full-scale output is inconsequential from a system perspective.
4.2 SYSTEM ARCHITECTURE

The system architecture is shown in Figure 4.2. The phase accumulator's 16 bits are truncated to 12. This limits spurs due to phase truncation to approximately -72 dBc. The two MSBs are used for quadrant symmetry. The first MSB determines the sign of the output data. It controls a format converter block, which converts the sign-and-magnitude format to the two's complement format required by the DAC. The second MSB controls a 1's complement block, which inverts the remaining phase accumulator bits for angles in quadrants 2 and 4. The consequence is that the ramp output from the phase accumulator is converted to a triangular wave of equal frequency and twice the amplitude.
The next three MSBs identify one of eight linear segments, and thus they control the multiplexers that implement equation (4.2), which is defined in 8 parts. The remaining 7 bits identify different sub-angles, or positions along any of the 8 segments. In equations (4.1) and (4.2), these 7 bits are equal to the quantity (x - xi), so this operation does not require any processing.
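The bit-field split described above can be modelled as follows. This is a minimal sketch that assumes the 16-bit accumulator and 12-bit truncated phase of the text; the function and variable names are hypothetical.

```python
def decode_phase(acc16: int) -> tuple:
    """Split a 16-bit accumulator value into (sign, mirror, segment, offset)."""
    phase12 = (acc16 & 0xFFFF) >> 4    # keep the 12 MSBs (phase truncation)
    sign = (phase12 >> 11) & 1         # first MSB: sign of the output
    mirror = (phase12 >> 10) & 1       # second MSB: set in quadrants 2 and 4
    low10 = phase12 & 0x3FF            # remaining 10 phase bits
    if mirror:
        low10 ^= 0x3FF                 # 1's complement for quadrants 2 and 4
    segment = low10 >> 7               # next three bits: one of 8 segments
    offset = low10 & 0x7F              # last 7 bits: position within the segment
    return sign, mirror, segment, offset


if __name__ == "__main__":
    for value in (0x0000, 0x1230, 0x4000, 0xFFFF):
        print(hex(value), decode_phase(value))
```

Note that no arithmetic is needed to obtain the offset within a segment: it is simply the seven least significant bits of the (possibly complemented) phase.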
The two upper multiplexers select shifted versions of the 7 least significant phase bits, passing them to the three-operand adder according to the corresponding segment. In the figure, the notation {>>n} signifies a right shift by n bits, or division by $2^n$. The addition of two shifted versions of an angle x realizes the multiplication of x by a slope $m_i$ in equation (4.2). The bottom multiplexer selects one of eight initial approximations and also passes it to the three-operand adder.
The output from the multiplexers is shown to be 13 bits wide, in order to properly align the three terms to be added. In fact, the first three bits of the two upper multiplexers are 0, as are the last three bits of the lower multiplexer.
The three-operand adder adds the multiplexer outputs together and rounds the result to 7 bits. The rounding operation is accomplished by adding the 8th bit to the truncated 7-bit sum.
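The rounding step can be illustrated with a short model. It assumes, as a simplification, that the adder result is a 13-bit word whose 7 MSBs form the output and whose next bit is the "8th bit"; the text does not spell out this alignment, so treat it as an assumption.

```python
def round_to_7_bits(sum13: int) -> int:
    """Round a 13-bit sum to 7 bits by adding the 8th bit to the truncated value.

    Assumed layout: bits 12..6 are the 7-bit result, bit 5 is the rounding bit.
    """
    truncated = (sum13 >> 6) & 0x7F   # keep the 7 most significant bits
    round_bit = (sum13 >> 5) & 1      # the first discarded bit (the "8th bit")
    return truncated + round_bit


if __name__ == "__main__":
    print(round_to_7_bits((123 << 6) + 31))  # fraction just below one half
    print(round_to_7_bits((123 << 6) + 32))  # fraction of exactly one half
```

In this model a truncated value of 127 with the rounding bit set would produce 128, which no longer fits in 7 bits; a design with a maximum amplitude of 123/128 never reaches that case.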
This architecture is significantly less complex than all those listed in section 2 for a similar output spectral purity. It does not include a ROM. Neither multipliers nor squaring circuits are required. Equal-length segments are used to simplify the control circuitry. Only 3 integers need to be added, and the multiplexers shown in Figure 4.2 can be optimized by combining similar inputs and implemented with combinational logic.
Fig 4.2 Proposed Architecture.
4.3 IMPLEMENTATION DETAILS

The system was described in VHDL with fewer than 200 lines of code. During placement and routing with automated tools, a clock constraint of 125 MHz was easily met without having to add pipelining registers. Pipelining would increase this maximum clock rate, but at the expense of a longer latency when changing the synthesizer's output frequency. Power consumption is estimated at under 10 mW for a 100 MHz clock, or 0.1 mW/MHz.
The Frequency Control Word is 16 bits wide, yielding a frequency resolution of approximately 1526 Hz for a 100 MHz reference clock. The 8-bit wide output data is in two's complement format, compatible with most commercial DACs. As stated above, this design is severely I/O bound. This is a direct consequence of the tremendous reduction in complexity when compared to previously reported designs of similar spectral purity. Due to the limited allocation of silicon area, it was decided not to increase the phase accumulator width to 32 bits, as is common. This would have added 16 pins to the chip and approximately 300 µm to each side of the die. The frequency control word input could also have been serialized, but that would have increased the tuning latency.
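The quoted resolution follows from the standard DDFS relation f_out = FCW × f_clk / 2^N. The sketch below evaluates it for the 16-bit case and, for comparison, for a 32-bit accumulator; the function names are illustrative only.

```python
def frequency_resolution_hz(f_clk_hz: float, acc_bits: int) -> float:
    """Smallest output frequency step of an N-bit phase accumulator."""
    return f_clk_hz / (1 << acc_bits)


def output_frequency_hz(fcw: int, f_clk_hz: float, acc_bits: int) -> float:
    """Output frequency for a given frequency control word (FCW)."""
    return fcw * f_clk_hz / (1 << acc_bits)


if __name__ == "__main__":
    print(frequency_resolution_hz(100e6, 16))     # 16-bit accumulator
    print(frequency_resolution_hz(100e6, 32))     # 32-bit accumulator
    print(output_frequency_hz(16384, 100e6, 16))  # FCW = 2^14
```

With 16 bits the step is about 1525.9 Hz, while 32 bits would bring it down to roughly 0.023 Hz, which is why wider accumulators are common when area and pin count allow.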
If system frequency resolution requirements called for a 32 bit wide accumulator and the same 125 MHz clock rate, a modest increase in system complexity would follow. This is because several pipelining registers would be required. Alternatively, a more efficient adder configuration would have to be used with a corresponding increase in the number of cells. In any case, the present core is very small, which makes it an ideal building block in a System On a Chip digital receiver.
Fig 4.3 Simulation Results.
CHAPTER 5
CONCLUSION
We have presented a low-power sine-output Direct Digital Frequency Synthesizer (DDFS) realized in 0.18 µm CMOS that achieves 60 dBc spectral purity from DC to the Nyquist frequency. It includes no ROM and no multipliers but requires an external DAC if an analog output is desired. Power consumption is 10 mW for a 100 MHz clock, which is significantly less than previously reported figures. System complexity is greatly reduced by using an efficient linear interpolation scheme to approximate the sine function.
APPENDIX A
INTRODUCTION TO PLATFORMS
What is an FPGA?
Before the advent of programmable logic, custom logic circuits were built at the board level using standard components, or at the gate level in expensive application-specific (custom) integrated circuits. The FPGA is an integrated circuit that contains many (64 to over 10,000) identical logic cells that can be viewed as standard components. Each logic cell can independently take on any one of a limited set of personalities. The individual cells are interconnected by a matrix of wires and programmable switches. A user's design is implemented by specifying the simple logic function for each cell and selectively closing the switches in the interconnect matrix. The arrays of logic cells and interconnect form a fabric of basic building blocks for logic circuits. Complex designs are created by combining these basic blocks to create the desired circuit.
What does a logic cell do?

The logic cell architecture varies between different device families. Generally speaking, each logic cell combines a few binary inputs (typically between 3 and 10) to one or two outputs according to a Boolean logic function specified in the user program. In most families, the user also has the option of registering the combinatorial output of the cell, so that clocked logic can be easily implemented. The cell's combinatorial logic may be physically implemented as a small look-up table memory (LUT) or as a set of multiplexers and gates. LUT devices tend to be a bit more flexible and provide more inputs per cell than multiplexer cells, at the expense of propagation delay.
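To make the look-up-table idea concrete, the sketch below models a 4-input LUT as a 16-bit truth table indexed by the inputs. The helper name and the two example functions are illustrative and do not correspond to any vendor primitive.

```python
def make_lut4(truth_table: int):
    """Return a 4-input Boolean function defined by a 16-bit truth table.

    Bit k of 'truth_table' is the output for the input combination whose
    binary address (d c b a) equals k.
    """
    def lut(a: int, b: int, c: int, d: int) -> int:
        address = (d << 3) | (c << 2) | (b << 1) | a
        return (truth_table >> address) & 1
    return lut


# The same structure implements different functions purely by its contents.
and4 = make_lut4(0x8000)   # output is 1 only at address 15 (all inputs 1)
xor4 = make_lut4(0x6996)   # output is the parity of the four inputs

if __name__ == "__main__":
    print(and4(1, 1, 1, 1), and4(1, 1, 1, 0))
    print(xor4(1, 0, 0, 0), xor4(1, 1, 0, 0))
```

Changing the 16-bit constant is the software analogue of reprogramming the cell: the structure stays fixed while the function changes.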
So what does 'Field Programmable' mean?

Field Programmable means that the FPGA's function is defined by a user's program rather than by the manufacturer of the device. A typical integrated circuit performs a particular function defined at the time of manufacture. In contrast, the FPGA's function is defined by a program written by someone other than the device manufacturer. Depending on the particular device, the program is either burned in permanently or semi-permanently as part of a board assembly process, or is loaded from an external memory each time the device is powered up. This user programmability gives the user access to complex integrated designs without the high engineering costs associated with application-specific integrated circuits.
How are FPGA programs created?

Individually defining the many switch connections and cell logic functions would be a daunting task. Fortunately, this task is handled by special software. The software translates a user's schematic diagrams or textual hardware description language code, then places and routes the translated design. Most of the software packages have hooks that allow the user to influence implementation, placement and routing to obtain better performance and utilization of the device. Libraries of more complex function macros (e.g., adders) further simplify the design process by providing common circuits that are already optimized for speed or area.
Gates

1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions

Market size

1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
CPLDs and FPGAs

The primary differences between CPLDs and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or more programmable sum-of-products logic arrays feeding a relatively small number of clocked registers. The result is less flexibility, with the advantage of more predictable timing delays and a higher logic-to-interconnect ratio. FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible (in terms of the range of designs that are practical to implement within them) but also far more complex to design for. Another notable difference between CPLDs and FPGAs is the presence in most FPGAs of higher-level embedded functions (such as adders and multipliers) and embedded memories, as well as logic blocks that implement decoders or mathematical functions.
Security considerations
With respect to security, FPGAs have both advantages and disadvantages as compared to ASICs or secure microprocessors. FPGAs' flexibility makes malicious modifications during fabrication a lower risk. For many FPGAs, the loaded design is exposed while it is loaded (typically on every power-on). To address this issue, some FPGAs support bit stream encryption.
Applications
FPGA applications include digital signal processing, radio, aerospace and defence systems, ASIC prototyping, medical imaging, speech recognition, cryptography, bioinformatics, computer hardware emulation, radio astronomy, metal detection and a growing range of other areas. FPGAs especially find applications in any area or algorithm that can make use of the massive parallelism offered by their architecture. One such area is code breaking, in particular brute-force attacks on cryptographic algorithms. FPGAs are increasingly used in conventional high-performance computing applications, where computational kernels such as FFT or convolution are performed on the FPGA instead of a microprocessor.
APPENDIX B
FPGA: SPARTAN II
INTRODUCTION
The Spartan-II Field Programmable Gate Array family gives users high performance, abundant logic resources, and a rich feature set, all at an exceptionally low price. The six-member family offers densities ranging from 15000 to 200000 system gates. System performance is supported up to 200 MHz. Features include block RAM (to 56K bits), distributed RAM (to 75264 bits), 16 selectable input-output standards, and four DLLs. Fast predictable interconnect means that successive design iterations continue to meet timing requirements. The Spartan-II family is a superior alternative to mask-programmed ASICs. The FPGA avoids the initial cost, lengthy development cycles, and inherent risk of conventional ASICs. Also, FPGA programmability permits design upgrades in the field with no hardware replacement necessary (impossible with ASICs).
FEATURES
- Second generation ASIC replacement technology
  - Densities as high as 5,292 logic cells with up to 200,000 system gates
  - Streamlined features based on Virtex FPGA architecture
  - Unlimited reprogrammability
  - Very low cost
  - Cost-effective 0.18 micron process
- System level features
  - SelectRAM hierarchical memory: 16 bits/LUT distributed RAM, configurable 4K bit block RAM, fast interfaces to external RAM
  - Fully PCI compliant
  - Low-power segmented routing architecture
  - Full readback ability for verification/observability
  - Dedicated carry logic for high-speed arithmetic
  - Efficient multiplier support
  - Cascade chain for wide-input functions
  - Abundant registers/latches with enable, set, reset
  - Four dedicated DLLs for advanced clock control
  - Four primary low-skew global clock distribution nets
  - IEEE 1149.1 compatible boundary scan logic
- Versatile I/O and packaging
  - Pb-free package options
  - Low-cost packages available in all densities
  - Family footprint compatibility in common packages
  - 16 high-performance interface standards
  - Hot swap Compact PCI friendly
  - Zero hold time simplifies system timing
- Core logic powered at 2.5V and I/Os powered at 1.5V, 2.5V, or 3.3V
- Fully supported by powerful Xilinx ISE development system
  - Fully automatic mapping, placement, and routing
Table Spartan-II FPGA Family Members
GENERAL OVERVIEW
The Spartan-II family of FPGAs has a regular, flexible, programmable architecture of Configurable Logic Blocks (CLBs), surrounded by a perimeter of programmable Input-Output Blocks (IOBs). There are four Delay-Locked Loops (DLLs), one at each corner of the die. Two columns of block RAM lie on opposite sides of the die, between the CLBs and the IOB columns. These functional elements are interconnected by a powerful hierarchy of versatile routing channels.
Spartan-II FPGAs are customized by loading configuration data into internal static memory cells. Unlimited reprogramming cycles are possible with this approach. Stored values in these cells determine the logic functions and interconnections implemented in the FPGA. Configuration data can be read from an external serial PROM (master serial mode), or written into the FPGA in slave serial, slave parallel, or Boundary Scan modes.
Spartan-II FPGAs are typically used in high-volume applications where the versatility of a fast programmable solution adds benefits. They are ideal for shortening product development cycles while offering a cost-effective solution for high-volume production. Spartan-II FPGAs achieve high-performance, low-cost operation through advanced architecture and semiconductor technology, and provide system clock rates up to 200 MHz. In addition to the conventional benefits of high-volume programmable logic solutions, Spartan-II FPGAs also offer on-chip synchronous single-port and dual-port RAM (in block and distributed form), DLL clock drivers, programmable set and reset on all flip-flops, fast carry logic, and many other features.
Fig 5.1 Basic Spartan-II Family FPGA Block Diagram
SPARTAN-II PRODUCT AVAILABILITY

The table below shows the maximum user I/Os available on the device and the number of user I/Os available for each device/package combination. The four global clock pins are usable as additional user I/Os when not used as global clock pins. These pins are not included in the user I/O counts.
Table Spartan-II FPGA User I/O Chart
ARCHITECTURAL DESCRIPTION
SPARTAN-II FPGA ARRAY
The Spartan-II field-programmable gate array is composed of five major configurable elements:
- IOBs provide the interface between the package pins and the internal logic.
- CLBs provide the functional elements for constructing most logic.
- Dedicated block RAM memories of 4096 bits each.
- Clock DLLs for clock-distribution delay compensation and clock domain control.
- Versatile multi-level interconnect structure.
The CLBs form the central logic structure with easy access to all support and routing structures. The IOBs are located around all the logic and memory elements for easy and quick routing of signals on and off the chip. Values stored in static memory cells control all the configurable logic elements and interconnect resources. These values load into the memory cells on power-up, and can reload if necessary to change the function of the device.
INPUT/OUTPUT BLOCK
The Spartan-II FPGA IOB features inputs and outputs that support a wide variety of I/O signalling standards. These high-speed inputs and outputs are capable of supporting various state-of-the-art memory and bus interfaces. The table lists several of the supported standards along with the reference, output and termination voltages needed to meet them.
Fig Spartan-II FPGA Input-Output Block (IOB)
The three IOB registers function either as edge-triggered D-type flip-flops or as level-sensitive latches. Each IOB has a clock signal (CLK) shared by the three registers and independent Clock Enable (CE) signals for each register. In addition to the CLK and CE control signals, the three registers share a Set/Reset (SR) signal. For each register, this signal can be independently configured as a synchronous Set, a synchronous Reset, an asynchronous Preset, or an asynchronous Clear. A feature not shown in the block diagram, but controlled by the software, is polarity control. The input and output buffers and all of the IOB control signals have independent polarity control.
LOOK-UP TABLES
Spartan-II FPGA function generators are implemented as 4-input look-up tables (LUTs). In addition to operating as a function generator, each LUT can provide a 16 x 1-bit synchronous RAM. Furthermore, the two LUTs within a slice can be combined to create a 16 x 2-bit or 32 x 1-bit synchronous RAM, or a 16 x 1-bit dual-port synchronous RAM. The Spartan-II FPGA LUT can also provide a 16-bit shift register that is ideal for capturing high-speed or burst-mode data. This mode can also be used to store data in applications such as Digital Signal Processing.
BOUNDARY SCAN
Spartan-II devices support all the mandatory boundary-scan instructions specified in the IEEE standard 1149.1. A Test Access Port (TAP) and registers are provided that implement the EXTEST, SAMPLE/PRELOAD, and BYPASS instructions. The TAP also supports two USERCODE instructions and internal scan chains. The TAP uses dedicated package pins that always operate using LVTTL. For TDO to operate using LVTTL, the VCCO for bank 2 must be 3.3V. Otherwise, TDO switches rail-to-rail between ground and VCCO. TDI, TMS, and TCK have a default internal weak pull-up resistor, and TDO has no default resistor. Bitstream options allow setting any of the four TAP pins to have an internal pull-up, pull-down, or neither.
Boundary-scan operation is independent of individual IOB configurations and unaffected by package type. All IOBs, including unbonded ones, are treated as independent 3-state bidirectional pins in a single scan chain. Retention of the bidirectional test capability after configuration facilitates the testing of external interconnections. The public boundary-scan instructions are available prior to configuration. After configuration, the public instructions remain available together with any USERCODE instructions installed during the configuration. While the SAMPLE and BYPASS instructions are available during configuration, it is recommended that boundary-scan operations not be performed during this transitional period.
In addition to the test instructions outlined above, the boundary-scan circuitry can be used to configure the FPGA and to read back the configuration data. To facilitate internal scan chains, the User Register provides three outputs (Reset, Update and Shift) that represent the corresponding states in the boundary-scan internal state machine. The table lists the boundary-scan instructions supported in Spartan-II FPGAs. Internal signals can be captured during EXTEST by connecting them to unbonded or unused IOBs. They may also be connected to the unused outputs of IOBs defined as unidirectional input pins.
Boundary-Scan Command   Binary Code [4:0]   Description
EXTEST                  00000               Enables boundary-scan EXTEST operation
SAMPLE                  00001               Enables boundary-scan SAMPLE operation
USR1                    00010               Access user-defined register 1
USR2                    00011               Access user-defined register 2
CFG_OUT                 00100               Access the configuration bus for readback
CFG_IN                  00101               Access the configuration bus for configuration
INTEST                  00111               Enables boundary-scan INTEST operation
USRCODE                 01000               Enables shifting out USER code
IDCODE                  01001               Enables shifting out of ID code
HIZ                     01010               Disables output pins while enabling the Bypass Register
JSTART                  01100               Clock the start-up sequence when StartupClk is TCK
BYPASS                  11111               Enables BYPASS
RESERVED                All other codes     Xilinx reserved instructions
Table Boundary-Scan Instruction set
CONFIGURATION
Configuration is the process by which the Bitstream of a design, as generated by the Xilinx software, is loaded into the internal configuration memory of the FPGA. Spartan-II devices support both serial configuration, using the master/slave serial and JTAG modes, as well as byte-wide configuration employing the Slave Parallel mode.
CONFIGURATION FILE
Spartan-II devices are configured by sequentially loading frames of data that have been concatenated into a configuration file. The table shows how much non-volatile storage space is needed for Spartan-II devices. It is important to note that, while a PROM is commonly used to store configuration data before loading it into the FPGA, it is by no means required. Any of a number of different kinds of under-populated non-volatile storage already available either on or off the board (e.g., hard drives, FLASH cards, etc.) can be used.
Device     Configuration File Size (Bits)
XC2S15     197,696
XC2S30     336,768
XC2S50     559,200
XC2S100    781,216
XC2S150    1,040,096
XC2S200    1,335,840
Table Spartan-II Configuration File Size
5.7 MODES
Spartan-II devices support the following four configuration modes:
- Slave Serial mode
- Master Serial mode
- Slave Parallel mode
- Boundary-scan mode
The Configuration mode pins (M2, M1, M0) select among these configuration modes with the option in each case of having the IOB pins either pulled up or left floating prior to the end of configuration. The selection codes are listed in table. Configuration through the boundary-scan port is always available, independent of the mode selection. Selecting the boundary-scan mode simply turns off the other modes. The three mode pins have internal pull-up resistors, and default to a logic High if left unconnected.
Table Configuration Modes
SLAVE SERIAL MODE

In slave serial mode, the FPGA's CCLK pin is driven by an external source, allowing FPGAs to be configured from other logic devices such as microprocessors or in a daisy-chain configuration. A Spartan-II device in slave serial mode should be connected as shown for the third device from the left. Slave Serial mode is selected by a <11x> on the mode pins (M0, M1, M2).
Fig Master-Slave Serial Configuration Circuit Diagram
The serial bitstream must be set up at the DIN input pin a short time before each rising edge of an externally generated CCLK. Multiple FPGAs in Slave Serial mode can be daisy-chained for configuration from a single source. The maximum amount of data that can be sent to the DOUT pin for a serial daisy chain is $2^{20} - 1$ (1,048,575) 32-bit words, or 33,554,400 bits, which is approximately 25 XC2S200 bitstreams. The configuration bitstream of downstream devices is limited to this size. After an FPGA is configured, data for the next device is routed to the DOUT pin. Data on the DOUT pin changes on the rising edge of CCLK. Configuration must be delayed until the INIT pins of all daisy-chained FPGAs are High.
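The daisy-chain limit quoted above can be checked with simple arithmetic. The sketch below uses the XC2S200 file size from the configuration file size table; the function name is hypothetical, and counting whole bitstreams is only an estimate of what a real chain could hold.

```python
MAX_DOUT_WORDS = 2**20 - 1            # maximum number of 32-bit words via DOUT
MAX_DOUT_BITS = MAX_DOUT_WORDS * 32   # the same limit expressed in bits

XC2S200_BITS = 1_335_840              # configuration file size of one XC2S200


def whole_bitstreams_that_fit(bitstream_bits: int) -> int:
    """Number of complete bitstreams of the given size within the DOUT limit."""
    return MAX_DOUT_BITS // bitstream_bits


if __name__ == "__main__":
    print(MAX_DOUT_BITS)
    print(MAX_DOUT_BITS / XC2S200_BITS)
    print(whole_bitstreams_that_fit(XC2S200_BITS))
```

The ratio comes out slightly above 25, consistent with the "approximately 25 XC2S200 bitstreams" figure.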
Fig Slave Serial Mode Timing
5.8 PIN TYPES

Most pins on a Spartan-II FPGA are general-purpose, user-defined Input-Output pins. There are, however, several different functional types of pins on Spartan-II FPGA packages.
Table Spartan-II Family Package Options
Fig XC2S100TQ144 DEVICE
Fig DIP Switch & DAC Interface
XC2S100TQ144 DEVICE PINOUTS

XC2S100 Pad Name Function   Bank   TQ144
GND                         -      P143
TMS                         -      P142
I/O                         7      P141
I/O                         7      P140
I/O, VREF                   7      P139
I/O                         7      P138
I/O, VREF                   7      P137
I/O                         7      P136
GND                         -      P135
I/O                         7      P134
I/O                         7      P133
I/O, VREF                   7      P132
I/O                         7      P131
I/O                         7      P130
I/O, IRDY                   7      P129
GND                         -      P128
VCCO                        7      P127
VCCO                        6      P127
I/O, TRDY                   6      P126
VCCINT                      -      P125
I/O                         6      P124
I/O                         6      P123
I/O, VREF                   6      P122
I/O                         6      P121
I/O                         6      P120
GND                         -      P119
I/O                         6      P118
I/O, VREF                   6      P117
I/O                         6      P116
I/O, VREF                   6      P115
I/O                         6      P114
I/O                         6      P113
I/O                         6      P112
M1                          -      P111
GND                         -      P110
M0                          -      P109
VCCO                        6      P108
VCCO                        5      P107
M2                          -      P106
I/O                         5      P103
I/O, VREF                   5      P102
I/O                         5      P101
I/O, VREF                   5      P100
I/O                         5      P99
GND                         -      P98
VCCINT                      -      P97
I/O                         5      P96
I/O                         5      P95
I/O, VREF                   5      P94
I/O                         5      P93
VCCINT                      -      P92
I, GCK1                     5      P91
VCCO                        5      P90
VCCO                        4      P90
GND                         -      P89
I, GCK0                     4      P88
I/O                         4      P87
I/O                         4      P86
I/O, VREF                   4      P85
I/O                         4      P84
I/O                         4      P83
VCCINT                      -      P82
GND                         -      P81
I/O                         4      P80
I/O, VREF                   4      P79
I/O                         4      P78
I/O, VREF                   4      P77
I/O                         4      P76
I/O                         4      P75
I/O                         4      P74
GND                         -      P73
DONE                        3      P72
VCCO                        4      P71
VCCO                        3      P70
PROGRAM                     -      P69
I/O (INIT)                  3      P68
I/O (D7)                    3      P67
I/O                         3      P66
I/O, VREF                   3      P65
I/O                         3      P64
I/O, VREF                   3      P63
I/O (D6)                    3      P62
GND                         -      P61
I/O (D5)                    3      P60
I/O                         3      P59
I/O, VREF                   3      P58
I/O (D4)                    3      P57
I/O                         3      P56
VCCINT                      -      P55
I/O, TRDY                   3      P54
VCCO                        3      P53
VCCO                        2      P53
GND                         -      P52
I/O, IRDY                   2      P51
I/O                         2      P50
I/O (D3)                    2      P49
I/O, VREF                   2      P48
I/O                         2      P47
I/O (D2)                    2      P46
GND                         -      P45
I/O (D1)                    2      P44
I/O, VREF                   2      P43
I/O                         2      P42
I/O, VREF                   2      P41
I/O                         2      P40
I/O (DIN, D0)               2      P39
I/O (DOUT, BUSY)            2      P38
CCLK                        2      P37
VCCO                        2      P36
VCCO                        1      P35
TDO                         2      P34
GND                         -      P33
TDI                         -      P32
I/O (CS)                    1      P31
I/O (WRITE)                 1      P30
I/O                         1      P29
I/O, VREF                   1      P28
I/O, VREF                   1      P27
I/O                         1      P26
GND                         -      P25
VCCINT                      -      P24
I/O                         1      P23
I/O                         1      P22
I/O, VREF                   1      P21
I/O                         1      P20
I/O                         1      P19
I, GCK2                     1      P18
GND                         -      P17
VCCO                        1      P16
VCCO                        0      P16
I, GCK3                     0      P15
VCCINT                      -      P14
I/O                         0      P13
I/O, VREF                   0      P12
I/O                         0      P11
I/O                         0      P10
VCCINT                      -      P9
GND                         -      P8
I/O                         0      P7
I/O, VREF                   0      P6
I/O, VREF                   0      P5
I/O                         0      P4
I/O                         0      P3
TCK                         -      P2
VCCO                        0      P1
VCCO                        7      P144
Pins P104 and P105 are not connected.
APPENDIX C
CODES
Phase Accumulator:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity phase_accumulator is
    Port ( clk         : in  STD_LOGIC;
           rst         : in  STD_LOGIC;
           freq_offset : in  STD_LOGIC_VECTOR (5 downto 0);
           dout        : out STD_LOGIC_VECTOR (19 downto 0);
           comp1       : out std_logic);
end phase_accumulator;

architecture Behavioral of phase_accumulator is
    signal temp : std_logic_vector(19 downto 0);
begin
    process(clk, rst, freq_offset)
    begin
        if (rst = '1') then
            temp <= (others => '0');        -- asynchronous reset
        elsif (clk'event and clk = '1') then
            temp <= temp + freq_offset;     -- accumulate on each rising edge
        end if;
    end process;
    dout  <= temp;
    comp1 <= temp(18);                      -- second MSB, drives the complementer
end Behavioral;
Explanation of Code:
Inputs: clk, rst, freq_offset
Outputs: dout, comp1
The phase accumulator is essentially a counter with a programmable step. If rst = 1, the 20-bit accumulator register is reset to zero. Otherwise, on every rising clock edge the 6-bit input freq_offset is added to the register, which wraps around when it overflows.
The value of freq_offset is chosen by the user and can range from 000000 to 111111. A larger value makes the accumulator overflow sooner and therefore produces a higher output frequency.
The phase accumulator thus generates a ramp wave. The output comp1 is bit 18 of the accumulator, i.e., the second MSB.
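A small software model can help check the expected behaviour before simulating the VHDL. The class below is only a sketch that mirrors the listing above (20-bit register, 6-bit step, comp1 taken from bit 18); the class and method names are hypothetical.

```python
class PhaseAccumulator:
    """Behavioural model of the 20-bit phase accumulator listing."""

    WIDTH = 20

    def __init__(self) -> None:
        self.temp = 0

    def reset(self) -> None:
        """Model of rst = '1': clear the register."""
        self.temp = 0

    def clock(self, freq_offset: int) -> int:
        """Model of one rising clock edge: add the 6-bit step and wrap."""
        step = freq_offset & 0x3F
        self.temp = (self.temp + step) & ((1 << self.WIDTH) - 1)
        return self.temp

    @property
    def comp1(self) -> int:
        """Bit 18 of the register, the second MSB."""
        return (self.temp >> 18) & 1


if __name__ == "__main__":
    acc = PhaseAccumulator()
    print([acc.clock(63) for _ in range(4)])
```

With a step of f, the ramp repeats roughly every 2^20 / f clock cycles, so doubling freq_offset doubles the output frequency.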
Complimenter:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;

entity complimenter is
    Port ( clk       : in  std_logic;
           phase_out : in  STD_LOGIC_VECTOR (9 downto 0);
           comp      : in  STD_LOGIC;
           comp_out  : out STD_LOGIC_VECTOR (9 downto 0));
end complimenter;

architecture Behavioral of complimenter is
begin
    process(phase_out, comp, clk)
    begin
        if (comp = '0') then
            comp_out <= phase_out;                 -- pass through unchanged
        else
            for i in 0 to 9 loop
                comp_out(i) <= not(phase_out(i));  -- 1's complement
            end loop;
        end if;
    end process;
end Behavioral;
Inputs: clk, phase_out, comp
Output: comp_out
The complementer, as the name says, complements the output of the phase accumulator.
Only the selected 10-bit portion of the phase accumulator output is complemented. This is done using the second MSB as the reference: if comp = 1, the bits are inverted; otherwise they are passed through unchanged. As a result, a triangular wave is obtained at the output of the complementer.
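The ramp-to-triangle conversion can be seen in a few lines of Python. This sketch models only the combinational behaviour of the complementer on a 10-bit value; the helper that builds the triangle from an 11-bit count is an illustrative assumption, not part of the design.

```python
def complementer(phase_out: int, comp: int) -> int:
    """Invert the 10-bit input when comp is 1, otherwise pass it through."""
    phase_out &= 0x3FF
    return phase_out ^ 0x3FF if comp else phase_out


def triangle_from_ramp(count: int) -> list:
    """Apply the complementer to an 11-bit ramp: bit 10 acts as comp."""
    return [complementer(t & 0x3FF, (t >> 10) & 1) for t in range(count)]


if __name__ == "__main__":
    tri = triangle_from_ramp(2048)
    print(tri[0], tri[1023], tri[1024], tri[2047])
```

The output rises from 0 to 1023 while comp is 0 and falls back from 1023 to 0 while comp is 1, which is the triangular wave described above.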
Phase accumulator and Complimenter:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity phase1 is
    Port ( clk         : in  STD_LOGIC;
           rst         : in  STD_LOGIC;
           freq_offset : in  STD_LOGIC_VECTOR (5 downto 0);
           comp_out    : out STD_LOGIC_VECTOR (9 downto 0);
           phase_out   : out std_logic_vector(19 downto 0));
end phase1;

architecture Behavioral of phase1 is
    -- Component ports repeat the entity declarations given earlier.
    component phase_accumulator is
        Port ( clk         : in  STD_LOGIC;
               rst         : in  STD_LOGIC;
               freq_offset : in  STD_LOGIC_VECTOR (5 downto 0);
               dout        : out STD_LOGIC_VECTOR (19 downto 0);
               comp1       : out std_logic);
    end component;
    component complimenter is
        Port ( clk       : in  std_logic;
               phase_out : in  STD_LOGIC_VECTOR (9 downto 0);
               comp      : in  STD_LOGIC;
               comp_out  : out STD_LOGIC_VECTOR (9 downto 0));
    end component;
    signal dout    : std_logic_vector(19 downto 0);
    signal comp1   : std_logic;
    signal comp_in : std_logic_vector(9 downto 0);
begin
    phase_out <= dout;
    comp_in   <= dout(17 downto 8);   -- 10 bits below the two MSBs
    D1 : phase_accumulator port map(clk, rst, freq_offset, dout, comp1);
    D2 : complimenter port map(clk, comp_in, comp1, comp_out);
end Behavioral;
Inputs: clk, rst, freq_offset
Outputs: comp_out, phase_out
The codes for the phase accumulator and the complementer are combined by port mapping. The inputs are those of the phase accumulator; the outputs are those of both blocks.
Mux Tree:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity mux_tree is
    Port ( din   : in  STD_LOGIC_VECTOR (12 downto 0);
           sel   : in  STD_LOGIC_VECTOR (2 downto 0);
           dout1 : out STD_LOGIC_VECTOR (12 downto 0);
           dout2 : out STD_LOGIC_VECTOR (12 downto 0);
           dout3 : out STD_LOGIC_VECTOR (12 downto 0));
end mux_tree;

architecture Behavioral of mux_tree is
    component mux is
        Port ( din1 : in  STD_LOGIC_VECTOR (12 downto 0);
               din2 : in  STD_LOGIC_VECTOR (12 downto 0);
               din3 : in  STD_LOGIC_VECTOR (12 downto 0);
               din4 : in  STD_LOGIC_VECTOR (12 downto 0);
               din5 : in  STD_LOGIC_VECTOR (12 downto 0);
               din6 : in  STD_LOGIC_VECTOR (12 downto 0);
               din7 : in  STD_LOGIC_VECTOR (12 downto 0);
               din8 : in  STD_LOGIC_VECTOR (12 downto 0);
               sel  : in  STD_LOGIC_vector(2 downto 0);
               dout : out STD_LOGIC_vector(12 downto 0));
    end component;
    signal shift1 : std_logic_vector(12 downto 0);
    signal shift2 : std_logic_vector(12 downto 0);
    signal shift3 : std_logic_vector(12 downto 0);
    signal y0,y1,y2,y3,y4,y5,y6,y7 : std_logic_vector(12 downto 0);
    constant zero : std_logic_vector(12 downto 0) := "0000000000000";
begin
    -- Initial value of each of the eight segments
    y0 <= "0000000010000";   -- decimal value 16
    y1 <= "0010111111000";   -- decimal value 1528
    y2 <= "0110000000000";
    y3 <= "1000101000000";
    y4 <= "1010111001000";
    y5 <= "1100110011000";
    y6 <= "1110001101000";
    y7 <= "1111001011000";
    -- Right-shifted copies of the input (division by 2, 4 and 8)
    shift1 <= '0' & din(12 downto 1);
    shift2 <= "00" & din(12 downto 2);
    shift3 <= "000" & din(12 downto 3);
    Mux1 : mux port map(din,din,din,din,din,shift1,shift1,zero,sel,dout1);
    Mux2 : mux port map(shift1,shift1,shift2,shift2,shift3,shift3,zero,zero,sel,dout2);
    Mux3 : mux port map(y0,y1,y2,y3,y4,y5,y6,y7,sel,dout3);
end Behavioral;
Entry: din, sel
Out: dout1, dout2, dout3
The mux tree is a combination of three 8:1 muxes. All mux inputs are 13 bits wide (12 downto 0). For the first mux the inputs are din, din, din, din, din, shift1, shift1, zero. For the second one they are shift1, shift1, shift2, shift2, shift3, shift3, zero, zero. For the third one the inputs are the values from the ROM lookup table, i.e. y0, y1, y2, y3, y4, y5, y6, y7.
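Read together, the three port maps implement a piecewise-linear function: Mux3 supplies the lookup-table value and Mux1 plus Mux2 supply a slope built from shifted copies of din. The following works this out, writing x for din as an unsigned integer and i for the value of sel. The symbols S_i, m_i and x are introduced here for illustration, and truncation of shifted-out bits is neglected (it does not occur when the three LSBs of din are zero, as in the top-level wiring `"000" & phase_out(6 downto 0) & "000"`).

```latex
\[
S_i(x) \;=\; \underbrace{y_i}_{\text{dout3}} \;+\; \underbrace{m_i\,x}_{\text{dout1}+\text{dout2}},
\qquad i = \mathrm{sel} \in \{0,\dots,7\}
\]
\[
\begin{array}{c|cccccccc}
i   & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
\text{dout1} & x & x & x & x & x & x/2 & x/2 & 0 \\
\text{dout2} & x/2 & x/2 & x/4 & x/4 & x/8 & x/8 & 0 & 0 \\
m_i & 1.5 & 1.5 & 1.25 & 1.25 & 1.125 & 0.625 & 0.5 & 0 \\
y_i & 16 & 1528 & 3072 & 4416 & 5576 & 6552 & 7272 & 7768
\end{array}
\]
\[
\text{Example: } i = 2,\; x = 512:\quad
S_2(512) = 3072 + 512 + 128 = 3712 .
\]
```

Every slope is a sum of at most two powers of two, which is why the data path needs only shifts, muxes and one adder rather than a multiplier.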
Component of Mux used in Mux tree:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity mux is
  -- din2..din8 are inferred from the sensitivity list and the assignments below
  Port ( din1, din2, din3, din4 : in  STD_LOGIC_VECTOR (12 downto 0);
         din5, din6, din7, din8 : in  STD_LOGIC_VECTOR (12 downto 0);
         sel  : in  STD_LOGIC_VECTOR (2 downto 0);
         dout : out STD_LOGIC_VECTOR (12 downto 0));
end mux;

architecture Behavioral of mux is
begin
  -- 8:1 multiplexer: sel = "000" selects din1, ..., sel = "111" selects din8
  process(din1,din2,din3,din4,din5,din6,din7,din8,sel)
  begin
    if(sel = "000") then
      dout <= din1;
    elsif(sel = "001") then
      dout <= din2;
    elsif(sel = "010") then
      dout <= din3;
    elsif(sel = "011") then
      dout <= din4;
    elsif(sel = "100") then
      dout <= din5;
    elsif(sel = "101") then
      dout <= din6;
    elsif(sel = "110") then
      dout <= din7;
    else
      dout <= din8;
    end if;
  end process;
end Behavioral;
Entry: din1, din2, din3, din4, din5, din6, din7, din8, sel
Out: dout
The mux used here is an 8:1 mux. All eight inputs are 13 bits wide (12 downto 0). The output is chosen by the select input, which ranges from 000 to 111 and selects din1 to din8 respectively. The output is also 13 bits wide.
Summer:
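library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity sum_out is
  Port ( dout1 : in  STD_LOGIC_VECTOR (12 downto 0);
         dout2 : in  STD_LOGIC_VECTOR (12 downto 0);
         dout3 : in  STD_LOGIC_VECTOR (12 downto 0);  -- inferred from the expression below
         sum   : out STD_LOGIC_VECTOR (14 downto 0));
end sum_out;

architecture Behavioral of sum_out is
begin
  -- "00" & dout1 widens the first operand to 15 bits, so the whole
  -- addition is carried out at the width of sum and no carry is lost.
  sum <= std_logic_vector(unsigned("00" & dout1) + unsigned(dout2) + unsigned(dout3));
end Behavioral;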
Entry: dout1, dout2, dout3
Out: sum
The function of the summer is simple: it adds the outputs of the three muxes and passes the result to the format converter. The three inputs are 13 bits wide (12 downto 0) and the output is 15 bits wide (14 downto 0). Two zeros are concatenated in front of the MSB of dout1, so the addition is carried out at the full output width.
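A quick bound shows why two extra bits are enough for the output. This is a worked check added for illustration, using only the port widths given above.

```latex
\[
\text{dout1} + \text{dout2} + \text{dout3}
\;\le\; 3\,(2^{13}-1) \;=\; 24573
\;<\; 2^{15}-1 \;=\; 32767 ,
\]
\[
\text{whereas}\quad 24573 \;>\; 2^{14}-1 \;=\; 16383 .
\]
```

So a 15-bit sum can never overflow for any three 13-bit operands, while a 14-bit sum could.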
Format Converter:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity format_converter is
  Port ( clk     : in  STD_LOGIC;  -- inferred: the process below is clocked by clk
         sum     : in  STD_LOGIC_VECTOR (14 downto 0);
         sel     : in  STD_LOGIC;
         dfs_out : out STD_LOGIC_VECTOR (14 downto 0));
end format_converter;

architecture Behavioral of format_converter is
begin
  -- Registered output, so only clk is needed in the sensitivity list.
  process(clk)
  begin
    if rising_edge(clk) then
      if(sel = '1') then
        dfs_out <= not(sum);  -- invert every bit of the half sine wave
      else
        dfs_out <= sum;       -- pass the half sine wave unchanged
      end if;
    end if;
  end process;
end Behavioral;
Entry: clk, sum, sel
Out: dfs_out
The output of the summer is a half sine wave. The format converter is simply a complementer that converts the half sine wave into a full sine wave.
The output is decided by the select input, described as the XOR of the first and second MSBs of the phase accumulator output. Note that in the complete architecture listing the XOR term is commented out, so sel is driven by the MSB alone.
If sel = 1, the output is complemented; otherwise it is retained as it is.
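The effect of the bitwise complement on a 15-bit word can be written out as follows. The symbol s stands for the summer output and is introduced here for illustration; how the DAC interprets the word (offset binary or two's complement) is not stated in this section, so both readings are shown.

```latex
\[
\overline{s} \;=\; (2^{15}-1) - s \;=\; 32767 - s
\qquad \text{(read as an unsigned word)}
\]
\[
\overline{s} \;=\; -s - 1
\qquad \text{(read as a 15-bit two's-complement word)}
\]
\[
\text{Example: } s = 3712 \;\Rightarrow\; \overline{s} = 29055
\;\text{(unsigned)} \;=\; -3713 \;\text{(two's complement)} .
\]
```

Under the two's-complement reading, the complemented half cycle is the negated half sine wave offset by one LSB.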
Complete Architecture:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity dfs_arch is
  -- freq_offset and dfs_out are inferred from the port maps of U1 and U4
  Port ( clk,rst     : in  std_logic;
         freq_offset : in  STD_LOGIC_VECTOR (5 downto 0);
         dfs_out     : out STD_LOGIC_VECTOR (14 downto 0));
end dfs_arch;

architecture Behavioral of dfs_arch is
  component phase1 is
    Port ( clk         : in  STD_LOGIC;
           rst         : in  STD_LOGIC;
           freq_offset : in  STD_LOGIC_VECTOR (5 downto 0);
           comp_out    : out STD_LOGIC_VECTOR (9 downto 0);
           phase_out   : out std_logic_vector(19 downto 0));
  end component;
  component mux_tree is
    Port ( din   : in  STD_LOGIC_VECTOR (12 downto 0);
           sel   : in  STD_LOGIC_VECTOR (2 downto 0);
           dout1 : out STD_LOGIC_VECTOR (12 downto 0);
           dout2 : out STD_LOGIC_VECTOR (12 downto 0);
           dout3 : out STD_LOGIC_VECTOR (12 downto 0));
  end component;
  component sum_out is
    Port ( dout1 : in  STD_LOGIC_VECTOR (12 downto 0);
           dout2 : in  STD_LOGIC_VECTOR (12 downto 0);
           dout3 : in  STD_LOGIC_VECTOR (12 downto 0);
           sum   : out STD_LOGIC_VECTOR (14 downto 0));
  end component;
  component format_converter is
    Port ( clk     : in  std_logic;
           sum     : in  STD_LOGIC_VECTOR (14 downto 0);
           sel     : in  std_logic;
           dfs_out : out STD_LOGIC_VECTOR (14 downto 0));
  end component;
  signal phase_out : std_logic_vector(9 downto 0);   -- complementer output
  signal full_phase : std_logic_vector(19 downto 0); -- raw accumulator output
  signal mux_in : std_logic_vector(12 downto 0);
  signal dout1,dout2,dout3 : std_logic_vector(12 downto 0);
  signal sum : std_logic_vector(14 downto 0);
  signal sel_format : std_logic;
begin
  -- The MSB alone drives the format converter; the XOR term is commented out.
  sel_format <= full_phase(19) ;--xor full_phase(18);
  -- Lower 7 phase bits, scaled by 8, are the offset inside a segment.
  mux_in <= "000" & phase_out(6 downto 0) & "000";
  U1 : phase1 port map(clk,rst,freq_offset,phase_out,full_phase);
  -- Upper 3 phase bits select one of the eight segments.
  U2 : mux_tree port map(mux_in,phase_out(9 downto 7),dout1,dout2,dout3);
  U3 : sum_out port map(dout1,dout2,dout3,sum);
  U4 : format_converter port map(clk,sum,sel_format,dfs_out);
end Behavioral;
Entry: freq_offset, clk, rst
Out: dfs_out
All the code explained above is combined into the dfs architecture using port mapping. The RTL schematic is shown in the figure below.
The input is that of the phase accumulator: a 6-bit binary word. The output is a 15-bit binary stream of digital values that make up the sine wave when converted to analog by a DAC.
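To exercise the complete architecture in simulation, a testbench along the following lines could be used. This is a sketch rather than the project's own testbench: the entity name dfs_arch_tb, the 10 ns clock period, the active-high reset and the run length are all assumptions, and the port names are taken from the Entry/Out list above.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity dfs_arch_tb is                      -- hypothetical testbench name
end dfs_arch_tb;

architecture sim of dfs_arch_tb is
  constant CLK_PERIOD : time := 10 ns;     -- assumed clock period
  signal clk         : std_logic := '0';
  signal rst         : std_logic := '1';   -- assumption: reset is active high
  signal freq_offset : std_logic_vector(5 downto 0) := "111111";
  signal dfs_out     : std_logic_vector(14 downto 0);
begin
  -- Device under test, connected by name so port order does not matter.
  dut : entity work.dfs_arch
    port map (clk => clk, rst => rst,
              freq_offset => freq_offset, dfs_out => dfs_out);

  -- Free-running clock.
  clk <= not clk after CLK_PERIOD / 2;

  stimulus : process
  begin
    wait for 4 * CLK_PERIOD;               -- hold reset for a few cycles
    rst <= '0';
    wait for 40000 * CLK_PERIOD;           -- assumed long enough for full output cycles
    freq_offset <= "100000";               -- change the tuning word
    wait for 40000 * CLK_PERIOD;
    -- A failure-severity assertion is a common way to end a simulation run.
    assert false report "End of simulation" severity failure;
  end process;
end sim;
```

Viewing dfs_out in analog format in the waveform window should show whether the output frequency follows freq_offset.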

