Archive for the ‘Audio Theory’ Category

Audio Fundamentals Part Three: Additive Synthesis

February 10th, 2009

Audio Fundamentals – Index
Part One: The Nature of Sound and its Electronic Representation
Part Two: The Digital Representation of Sound
Part Three: Additive Synthesis (this post)

In part two of this series you saw how a waveform could be represented digitally. In this part, you will see how simple waveforms can be combined to create more complex waveforms, and be introduced to some of the waveforms most commonly used in sound synthesis.

Audio waveforms can be either periodic or aperiodic. A periodic waveform consists of the same shape repeated at regular intervals. This type of waveform occurs rarely outside of audio synthesis. An aperiodic waveform has a shape that changes over time. Most acoustic instruments produce aperiodic waveforms.

As well as describing a waveform by its shape, we can also describe it by its harmonic content. Musically useful waveforms have a strong fundamental frequency (this is the frequency that we identify as the pitch of the note being played), but they can also contain other frequencies that are less strong. These are known as overtones and are what give a particular sound its character, or timbre.

The simplest type of audio waveform is the sine wave.

 

An oscilloscope trace of a sine wave.

 

Why is the sine wave the simplest type of audio waveform?

  1. Because it is periodic.
  2. Because its harmonic content consists of just the fundamental frequency. In other words, it has no overtones.

If you were to plot the sine wave as a frequency spectrum, you would see that only the fundamental frequency is present (shown by the yellow line at about 2 kHz).

 

The yellow line shows the presence of the fundamental frequency only.

By comparison, take a look at the waveform and frequency spectrum produced by a phrase played on the clarinet. You can see that the waveform is aperiodic and that the frequency spectrum contains many overtones. These characteristics are what give the clarinet its unique sound.

 

The waveform generated by a clarinet.

 

The frequency spectrum generated by a clarinet.

As you might expect, a sine wave on its own sounds pretty boring.

Select this link to hear a sine wave.

However, raw sine waves do have their uses, for example, adding a bit of sub-bass kick to a bass line. Where sine waves really come into their own, though, is when you start to combine them to make new, more complex waveforms.

When two sine waves are combined, their signals are added together. When both sine waves are in the positive or negative parts of their cycle at the same time then the second sine wave serves to amplify the first. We say that the signals are in phase with each other. You can see this in the diagram below. The first waveform is the original sine wave. The second waveform shows what happens to it when it is combined with an identical sine wave. The amplitude of the waveform doubles. This has the effect of making the sound louder.

 

Combining two sine waves of the same frequency maintains the waveform but increases the amplitude.

If one of the sine waves is in the positive part of its cycle, while the other is in the negative part of its cycle, then the second sine wave serves to attenuate the first. In this case the signals are out of phase. When the signals are out of phase by exactly 180 degrees (i.e. the first sine wave reaches its positive peak as the second sine wave reaches its negative peak) then they will cancel each other out entirely and no sound will be heard. This is known as phase cancellation and is shown in the diagram below.

 

Two sine waves of the same frequency that are out of phase by 180 degrees will cancel each other out.

Select this link to listen to an example of two identical sine waves being moved out of phase with each other. At the beginning of the sound, only the first sine wave is heard. Then the second sine wave is introduced in phase with the first. You can hear it amplifying the first signal. The sine waves are then moved increasingly out of phase with each other. This results in the sound becoming softer as the two waves begin to cancel each other. Eventually the sound dies away altogether as the sine waves become out of phase by 180 degrees.
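
If you would like to check the arithmetic behind this experiment, here is a minimal Python sketch. It adds two identical sine waves, each with a peak amplitude of 1, and reports the peak of the combined wave for a given phase offset. The function name combined_peak and the choice of 1000 points per cycle are assumptions made for this illustration only.

```python
import math

def combined_peak(phase_degrees, points=1000):
    """Peak amplitude of two unit sine waves added with a phase offset."""
    phase = math.radians(phase_degrees)
    peak = 0.0
    for i in range(points):
        angle = 2 * math.pi * i / points          # position within one cycle
        total = math.sin(angle) + math.sin(angle + phase)
        peak = max(peak, abs(total))
    return peak

# In phase the peak doubles; at 180 degrees the waves cancel.
for offset in (0, 45, 90, 135, 180):
    print(offset, "degrees:", round(combined_peak(offset), 3))
```

The peak falls steadily as the offset grows, which matches the way the sound in the example gets softer before dying away.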

In the experiments you have seen so far, both sine waves have had identical frequencies. However, even more complex waveforms can be created by combining sine waves of different frequencies.

A more musically useful experiment than phase shifting sine waves is to detune one of the sine waves just a little from the other. Because the two waves are no longer synchronised, this will result in the second wave amplifying the first wave for a short period of time and attenuating it for another period of time, as shown in the diagram below.

 

Combining two sine waves at slightly different frequencies will cause a beat.

This regular amplification and attenuation of the original signal is known as beating. The number of beats per second is equal to the difference between the two frequencies. It can be used to obtain a tremolo effect.

Select this link to hear an example of beating sine waves. At first the two sine waves are in sync, then the second sine wave is slightly detuned from the first to introduce a tremolo effect.
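
The beating can also be seen in the numbers. The sketch below, offered as an illustration rather than as the method used to make the audio example, relies on the standard identity that the sum of two sines equals a sine at the average frequency multiplied by a slowly changing cosine. The frequencies 440 Hz and 443 Hz and all of the function names are assumptions chosen for the example.

```python
import math

def two_sines(f1, f2, t):
    """Two unit sine waves added together at time t (in seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def carrier_times_envelope(f1, f2, t):
    """The same signal written as a slow envelope shaping a fast carrier."""
    envelope = 2 * math.cos(math.pi * (f1 - f2) * t)   # rises and falls slowly
    carrier = math.sin(math.pi * (f1 + f2) * t)        # the pitch you hear
    return envelope * carrier

def beat_frequency(f1, f2):
    """Number of loud-soft cycles per second."""
    return abs(f1 - f2)

print(beat_frequency(440.0, 443.0), "beats per second")
```

Both functions give the same value at any instant, so the detuned pair really is a single tone whose loudness swells and fades.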

When sine waves of different frequencies are combined to create a complex waveform, each sine wave is known as a partial. Adding partials at random frequencies and amplitudes will often result in the sound becoming discordant, or inharmonic.

Select this link to hear an example of a wave that has had random partials introduced.

To make a more musical waveform requires the combination of partials that are whole number multiples of the fundamental frequency. These partials are known as harmonics. By the usual convention the fundamental itself counts as the first harmonic, so the second harmonic is twice the fundamental frequency, the third harmonic is three times the fundamental frequency, the fourth harmonic is four times the fundamental frequency and so on. Together these are known as the harmonic series. The table below shows the first 11 harmonics of middle C.

Harmonic             Frequency      Approximate Pitch
1st (fundamental)    261.626 Hz     C4
2nd                  523.252 Hz     C5
3rd                  784.878 Hz     G5
4th                  1046.504 Hz    C6
5th                  1308.130 Hz    E6
6th                  1569.756 Hz    G6
7th                  1831.382 Hz    A#6
8th                  2093.008 Hz    C7
9th                  2354.634 Hz    D7
10th                 2616.260 Hz    E7
11th                 2877.886 Hz    F#7

You can see that, although the frequency increases by the same amount from one harmonic to the next, the perceived musical interval between successive harmonics gets smaller.
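
The table can be reproduced with a few lines of Python. This is a sketch under two assumptions that the text does not state: that pitches are named using equal temperament with A4 tuned to 440 Hz, and that each frequency is labelled with its nearest note. The names nearest_note and harmonic_series are made up for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(frequency, a4=440.0):
    """Nearest equal-tempered note name and the offset from it in cents."""
    semitones_from_a4 = 12 * math.log2(frequency / a4)
    nearest = round(semitones_from_a4)
    cents = 100 * (semitones_from_a4 - nearest)
    midi = 69 + nearest                      # MIDI note number, A4 = 69
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1), cents

def harmonic_series(fundamental, count):
    """The fundamental multiplied by 1, 2, 3 ... count."""
    return [fundamental * multiple for multiple in range(1, count + 1)]

for multiple, frequency in enumerate(harmonic_series(261.626, 11), start=1):
    name, cents = nearest_note(frequency)
    print(f"x{multiple:<3} {frequency:9.3f} Hz  {name:<4} {cents:+.0f} cents")
```

The cents column shows why the pitches are only approximate: some multiples, such as seven times the fundamental, fall noticeably between two notes of the keyboard.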

Generally speaking, the more harmonics that are introduced as partials, the more complex the sound is. You can put this knowledge to good use to create three of the most commonly used waves in synthesis: the triangle wave, the square wave, and the sawtooth wave.

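The triangle wave consists of only the odd-numbered harmonics (the fundamental, then three times, five times, seven times the fundamental frequency and so on), with each harmonic attenuated by the inverse square of its frequency relative to the fundamental frequency. The harmonic at three times the fundamental frequency therefore has one ninth of the amplitude of the fundamental.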

The diagrams below show a triangle wave and its frequency spectrum.

 

An oscilloscope trace of a triangle wave.

 

The frequency spectrum of a triangle wave.

The triangle wave still sounds quite pure, because of the scarcity of overtones, so it is often used as the basis for simulating instruments such as the flute and piccolo.

Select this link for an example of a triangle wave.
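
Here is a rough Python sketch of a triangle wave being built from sine waves. Two details in it are not mentioned in the text above and are stated here as assumptions of the sketch: every other partial is inverted (the alternating sign), and the result is scaled by 8/π² so that the peak comes out at 1. Both are the standard values for the Fourier series of a triangle wave. The function names are invented for the example.

```python
import math

def triangle_partials(count):
    """(multiple, amplitude) pairs for the first `count` odd multiples."""
    partials = []
    for k in range(count):
        multiple = 2 * k + 1                 # 1, 3, 5, 7 ...
        sign = -1 if k % 2 else 1            # every other partial is inverted
        partials.append((multiple, sign / multiple ** 2))
    return partials

def additive_triangle(phase, count):
    """Add the partials together at one point in the cycle (phase in radians)."""
    total = sum(a * math.sin(m * phase) for m, a in triangle_partials(count))
    return (8 / math.pi ** 2) * total

def ideal_triangle(phase):
    """A perfect triangle wave with a peak of 1, for comparison."""
    return (2 / math.pi) * math.asin(math.sin(phase))

for count in (1, 3, 10, 50):
    error = max(abs(additive_triangle(2 * math.pi * i / 400, count)
                    - ideal_triangle(2 * math.pi * i / 400))
                for i in range(400))
    print(count, "partials, largest error:", round(error, 4))
```

Because the amplitudes fall away so quickly, only a handful of partials are needed before the sum is very close to the ideal shape.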

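The square wave also uses only the odd-numbered harmonics, but these are not attenuated as sharply as they are for the triangle wave: the amplitude of each harmonic falls in inverse proportion to its frequency, rather than to the square of its frequency.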

The diagrams below show a square wave and its frequency spectrum.

 

An oscilloscope trace of a square wave.

 

The frequency spectrum of a square wave.

The square wave has a richer timbre than the triangle wave, but is still quite hollow sounding, so it is often used as the basis for simulating woodwind instruments such as the clarinet and oboe.

Select this link for an example of a square wave.
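
To put some numbers on the difference between the two waves, the sketch below builds a square wave from odd multiples of the fundamental and compares the level of each partial with the triangle wave's. It assumes the standard Fourier series for a square wave, in which each partial has an amplitude of one over its multiple and the sum is scaled by 4/π to give a peak of 1. The function names are hypothetical.

```python
import math

def square_wave(phase, partial_count):
    """Add `partial_count` odd partials at one point in the cycle."""
    total = 0.0
    for k in range(partial_count):
        multiple = 2 * k + 1                       # 1, 3, 5, 7 ...
        total += math.sin(multiple * phase) / multiple
    return (4 / math.pi) * total

def level_db(amplitude):
    """Level of a partial in decibels relative to the fundamental."""
    return 20 * math.log10(amplitude)

for multiple in (1, 3, 5, 7, 9):
    square = level_db(1 / multiple)
    triangle = level_db(1 / multiple ** 2)
    print(f"x{multiple}: square {square:7.2f} dB   triangle {triangle:7.2f} dB")
```

In decibel terms each partial of the triangle wave is attenuated twice as much as the same partial of the square wave, which is why the square wave sounds brighter.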

Finally, the sawtooth wave contains every harmonic, odd and even, with the amplitude of each falling in inverse proportion to its frequency.

The diagrams below show a sawtooth wave and its frequency spectrum.

 

An oscilloscope trace of a sawtooth wave.

 

The frequency spectrum of a sawtooth wave.

The sawtooth wave has a very rich timbre and is often used as the basis for simulating string instruments, such as the violin, or brass instruments, such as the trumpet.

Select this link for an example of a sawtooth wave.
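
A sawtooth wave can be sketched in the same way. The version below assumes the standard Fourier series for a rising ramp, in which every multiple of the fundamental is present, alternate partials are inverted, and the sum is scaled by 2/π. It also includes a small helper, highest_partial, which is an addition of this sketch: it works out how many partials can be used before they pass half the sampling rate discussed in part two of this series.

```python
import math

def highest_partial(fundamental, sample_rate):
    """Largest multiple of the fundamental below half the sampling rate."""
    return int((sample_rate / 2) // fundamental)

def sawtooth(phase, partial_count):
    """Add `partial_count` partials at one point in the cycle (radians)."""
    total = 0.0
    for multiple in range(1, partial_count + 1):
        sign = 1 if multiple % 2 else -1      # alternate partials are inverted
        total += sign * math.sin(multiple * phase) / multiple
    return (2 / math.pi) * total

count = highest_partial(261.626, 44100)
print("Partials available for middle C at 44.1 kHz:", count)
print("Value a quarter of the way through the cycle:",
      round(sawtooth(math.pi / 2, count), 3))
```

The ramp rises from -1 to +1 across the cycle, so a quarter of the way through it should be close to 0.5.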

In the next part of this series you will look at noise waveforms and a variant of the square wave: the pulse wave.


Audio Fundamentals Part Two: The Digital Representation of Sound

August 12th, 2008

In the first part of this tutorial you learned what sound was and saw how a sound could be represented in analogue equipment. In this part of the tutorial you will see how a sound can be represented in a digital medium.

First though, it is necessary to cover some fundamental theory of waveforms so that you have a full understanding of the terminology being used.

Here is a simple waveform.

An oscilloscope trace of a sine wave.

The strength of an audio signal is normally measured in decibels (dB). On a waveform display, the line running horizontally through the centre of the waveform marks the level at which no sound is heard; this tutorial calls it the 0 dB level. The further the waveform deviates above or below this line, the louder the signal is. The distance between the 0 dB level and the peak of a waveform is known as the peak amplitude of the waveform. The distance between the peak of a waveform and the corresponding trough is the peak-to-peak amplitude.

A waveform showing how amplitude is measured relative to the 0 dB level.

The period in which the waveform passes from the 0 dB level to its peak, through the 0 dB level to its trough and back to the 0 dB level again is a single cycle of the waveform. The number of times the waveform cycles every second is the waveform’s frequency. Finally, the distance between the same point on two consecutive cycles of the waveform is known as the wavelength.

A waveform showing how wavelength and frequency are measured.

The wavelength is inversely proportional to the frequency. So as the wavelength decreases the frequency increases. The oscilloscope trace below shows the wavelength getting shorter as the frequency increases.

As the wavelength decreases, the frequency increases.
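
The relationship is easy to try out in Python. The sketch below assumes that the sound is travelling through air at roughly 343 metres per second (a typical figure for room temperature, not one taken from this tutorial) and uses made-up function names. It also shows the length of one cycle in time, which is what the horizontal axis of an oscilloscope trace actually measures.

```python
SPEED_OF_SOUND = 343.0   # metres per second in air, an assumed typical value

def wavelength_metres(frequency_hz, speed=SPEED_OF_SOUND):
    """Distance covered by one cycle of the wave."""
    return speed / frequency_hz

def period_seconds(frequency_hz):
    """Time taken by one cycle of the wave."""
    return 1.0 / frequency_hz

for frequency in (20, 100, 1000, 20000):
    print(f"{frequency:>6} Hz: {wavelength_metres(frequency):8.4f} m, "
          f"{period_seconds(frequency) * 1000:8.4f} ms per cycle")
```

Doubling the frequency halves both figures, which is what inversely proportional means.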

Having established those basic principles, it is now possible to look at how the representation of sound in analogue and digital systems differs.

To begin with, it is important to understand the difference between analogue and digital systems. An analogue system has a continuously variable state. For example, in an electrical circuit the state of the circuit might vary continuously between -10 and +10 volts. A digital system’s state is defined as a series of discrete numbers. For example, a pocket calculator’s memory might be able to store 1024 numbers between 0 and 65535.

You will recall that when a waveform was represented in an analogue system, the shape of the wave could be reproduced very closely by, for example, having the circuit’s voltage change to match the changes in the waveform. If you want to store a waveform in a digital system then it has to be represented as a series of discrete numbers. This is achieved by a device known as an analogue-to-digital converter (often abbreviated to ADC or A/D).

This works by sampling the waveform at regular intervals and representing the level of the waveform at each point with a number. Let’s say, for the sake of argument, that a particular digital system has a numerical range of -32768 to 32767. When the waveform has a small non-zero amplitude, this might be represented by +6553 or -6553, while a larger non-zero amplitude might be represented by +26214 or -26214, as the following diagram shows.

In a digital system, the amplitude of the waveform is represented numerically.

The process of taking an analogue waveform and converting it into a series of numbers is called sampling. The frequency with which the ADC works is known as the sampling rate. For example, music intended to be reproduced on a compact disc is sampled at 44.1 kHz (i.e. the ADC samples the waveform 44,100 times every second).

An ADC samples the waveform at regular intervals.
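
The sampling process can be imitated in a few lines of Python. This is only a sketch of the idea, not a model of a real ADC: it assumes that the loudest possible signal maps to 32767, that amplitude is given as a fraction of that maximum, and that each reading is rounded to the nearest whole number. The function name sample_sine is hypothetical.

```python
import math

def sample_sine(frequency, amplitude, sample_rate, count, max_value=32767):
    """Take `count` readings of a sine wave and store each as a whole number."""
    samples = []
    for i in range(count):
        t = i / sample_rate                              # time of this reading
        level = amplitude * math.sin(2 * math.pi * frequency * t)
        samples.append(round(level * max_value))
    return samples

# A 1 kHz sine wave read 8,000 times a second gives 8 readings per cycle.
print(sample_sine(1000, 0.2, 8000, 8))   # a quiet signal
print(sample_sine(1000, 0.8, 8000, 8))   # a louder signal
```

With these settings the quiet wave peaks at 6553 and the louder one at 26214, the figures used in the example above.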

To reproduce a sound, once it has been stored digitally, the series of numbers representing the waveform must be converted back to a continuously variable (analogue) form, for example a variable voltage that can be used to drive the cone in a speaker. This is achieved with another device known as a digital-to-analogue converter (often abbreviated to DAC or D/A).

The DAC takes each number and uses it to set the state of the circuit to an appropriate level. For example, +6553 might be converted to +2 volts, while -26214 might be converted to -8 volts. As you might expect, for the results to be meaningful the DAC should convert the samples back at the same frequency (or sampling rate) that the ADC has used.
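
As a rough illustration of that conversion, the sketch below scales a stored number back to a voltage. It assumes a circuit with a range of -10 to +10 volts and a largest sample value of 32767, matching the examples used in this tutorial; the function name sample_to_volts is made up.

```python
def sample_to_volts(sample, max_value=32767, max_volts=10.0):
    """Scale a stored sample to the voltage the circuit should be set to."""
    return sample / max_value * max_volts

for sample in (6553, -6553, 26214, -26214):
    print(sample, "->", round(sample_to_volts(sample), 2), "volts")
```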

A point to note here is that the DAC cannot simply hold the voltage of the circuit at the relevant level until it reads the next sample. If it did so there would be sudden jumps in the voltage between one sample and the next and this would be heard as distortion when the sound was reproduced by the speaker.

To overcome this, the DAC uses a process known as interpolation. This is the process whereby the DAC applies a mathematical algorithm to create a smooth curve between data points. This will result in the voltage in the circuit transitioning smoothly and continuously between one data point and the next, which results in a far more faithful reproduction of the original sound.

The DAC uses an interpolation algorithm to smooth the signal.
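
To give a feel for what interpolation means, here is the simplest possible version: joining each pair of samples with a straight line. Please treat this as a stand-in chosen for clarity. The text does not say which algorithm a DAC uses, and real converters use more sophisticated filtering than this. The function name linear_interpolate is hypothetical, and the list of samples must not be empty.

```python
def linear_interpolate(samples, steps):
    """Place `steps - 1` evenly spaced values between each pair of samples."""
    output = []
    for current, following in zip(samples, samples[1:]):
        for step in range(steps):
            fraction = step / steps                  # 0, 1/steps, 2/steps ...
            output.append(current + (following - current) * fraction)
    output.append(samples[-1])                       # finish on the last sample
    return output

print(linear_interpolate([0, 4, 0], 4))
```

Instead of jumping straight from 0 to 4 and back, the output climbs and falls in small steps.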

The quality of the interpolation algorithm will affect the quality of reproduction. However, the quality of reproduction is also affected by two other factors:

  1. The sampling rate (how frequently the waveform is sampled);
  2. The resolution (how many discrete values can be used to represent the amplitude of the waveform).

Consider these in turn.

Sampling Rate

Generally speaking, the faster you are able to sample a waveform, the better the quality of the reproduction will be. As the sampling rate is lowered, less of the detail of the original waveform is able to be captured, so when the data is converted back, the resulting waveform will be cruder than the original waveform. What happens is that the higher frequency components of the signal become degraded, which makes the sound less bright. Furthermore, any components of the original signal that are too high in frequency for the sampling rate to capture are misrepresented as lower frequencies that were not present in the original waveform, thus adding unwanted noise to the signal. This is known as aliasing.

The waveform becomes deformed when the sampling rate is too low.

Select this link to listen to the original sound sampled at 44.1 kHz.

Now listen to the sound again with increasingly lower sample rates. You will be able to hear the quality of the signal degrade as more noise is introduced and the original high frequency sounds become attenuated.

Select this link to hear the sound sampled at 22.05 kHz.

Select this link to hear the sound sampled at 11.025 kHz.

Select this link to hear the sound sampled at 8 kHz.

The engineer Harry Nyquist came up with a theorem for avoiding aliasing. In a nutshell the theorem requires that the sampling frequency is at least twice that of the highest frequency in the waveform to be sampled. This is known as the Nyquist Rate. You can see that this results in at least two data points being taken for each cycle in the waveform, which is sufficient to reproduce all the frequencies accurately when the waveform is recreated.

The Nyquist rate is at least twice the highest frequency to be sampled.

If you consider that the range of human hearing is approximately 20 Hz to 20 kHz, you can understand why the sampling rate selected for compact disc technology was 44.1 kHz. Twice the highest audible frequency is 40 kHz, so a rate of 44.1 kHz allows all the frequencies normally heard by humans to be accurately reproduced.
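
The sums involved can be sketched in Python. The function alias_frequency below uses the standard folding rule to work out which frequency a tone will appear to have once it has been sampled; the rule and all of the function names are additions of this sketch rather than something described in the text. The last function shows why this happens: at a sampling rate of 8 kHz, a 5 kHz tone and a 3 kHz tone produce exactly the same readings.

```python
import math

def minimum_sample_rate(highest_frequency):
    """Lowest sampling rate that satisfies the rule of twice the highest frequency."""
    return 2 * highest_frequency

def alias_frequency(frequency, sample_rate):
    """Frequency that a tone appears to have after it has been sampled."""
    folded = frequency % sample_rate
    return min(folded, sample_rate - folded)

def sampled_cosine(frequency, sample_rate, count):
    """Readings of a cosine wave taken at the given sampling rate."""
    return [math.cos(2 * math.pi * frequency * i / sample_rate)
            for i in range(count)]

print(minimum_sample_rate(20000))        # 40000, comfortably below 44100
print(alias_frequency(5000, 8000))       # 3000
print(alias_frequency(20000, 44100))     # 20000, so no aliasing
```

Once the readings have been taken there is no way to tell the 5 kHz tone from a 3 kHz one, which is why frequencies above half the sampling rate have to be avoided in the first place.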

Resolution

Resolution refers to the number of discrete values that can be used to represent the waveform’s amplitude range. As most digital systems are binary, this is usually measured in binary digits, or bits. The resolution of a compact disc, for example, is 16 bits. This allows for 65,536 discrete values to represent the entire amplitude range of the signal. Generally speaking, the greater the resolution, the more faithfully the dynamic range of the original waveform can be reproduced. As the resolution is decreased, the reproduction of the waveform will lose much of the dynamic subtlety of the original.

Listen again to the original recording (link above).

Now select this link to listen to the same recording with a resolution of 8 bits.

The difference is more subtle in this case, but through a good quality sound card and speakers you can hear that the dynamic reproduction is poorer.
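
A short Python sketch shows what is lost when the resolution drops from 16 bits to 8. The dynamic range figure is the theoretical ratio between the largest value and the smallest step, expressed in decibels, and the requantize function simply throws away the lowest bits of each sample. Both are simplifying assumptions of this sketch; the text does not say how the 8-bit recording was produced.

```python
import math

def discrete_levels(bits):
    """How many different values a given number of bits can represent."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Theoretical ratio of the largest value to the smallest step, in dB."""
    return 20 * math.log10(2 ** bits)

def requantize(sample, from_bits=16, to_bits=8):
    """Reduce the resolution of a sample by discarding its lowest bits."""
    shift = from_bits - to_bits
    return (sample >> shift) << shift

for bits in (8, 16):
    print(bits, "bits:", discrete_levels(bits), "levels,",
          round(dynamic_range_db(bits), 1), "dB")
print(requantize(6553), requantize(6600))   # both land on the same 8-bit step
```

Nearby 16-bit values collapse onto the same 8-bit step, so small changes in level can no longer be told apart.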

In the next part of this series you will look at audio waveforms in more detail and see how complex waveforms can be created from simple waveforms using a process known as additive synthesis.


Audio Fundamentals Part One: The Nature of Sound and its Electronic Representation

August 11th, 2008

This is the first part of a series of tutorials on audio fundamentals. This first part looks at what sound is and how it is represented electronically.

Sound is a phenomenon by which the vibration of physical objects causes sympathetic vibrations in the surrounding media. When these vibrations reach the ear of a person, that person’s ear drum also vibrates sympathetically, and this is translated into signals that the brain interprets as sound. For example, a guitarist in a band plays a note on her guitar which causes the cone of a speaker to vibrate rapidly back and forth. This vibration changes the air pressure around the cone and these changes of pressure propagate outwards as waves. The molecules in the air are packed more densely in the areas of high pressure and less densely in the areas of low pressure. When the waves reach the ears of the people listening and dancing to the band, their ear drums move back and forth in time with the changes in air pressure and the brain interprets this as the music that the dancers are listening to.

Changes in air pressure propagate outwards and cause movements in the ear drums of listeners.

If you were to draw a graph showing the amount of vibration in the air on the y-axis against time on the x-axis, you would see a representation of the sound wave, like this:

This sound wave shows just a few fractions of a second of an electric guitar lick.

If you want to record and reproduce the sound you can do so with a microphone. This works in a similar way to an ear drum, because a diaphragm inside the microphone moves back and forth sympathetically as the air pressure changes and these movements are converted to a changing voltage in an electrical circuit. If the circuit has a range of -10 to +10 volts then a quiet sound might oscillate the circuit between -2 and +2 volts, while a louder sound would move it between -8 and +8 volts. If you represented the voltage changes in the circuit on the y-axis of a graph against time on the x-axis, then it would look the same as the sound wave shown above.

A graph representing changing voltage over time would look similar to the graph representing changing air pressure.

The voltage could then be passed into a speaker, causing the cone to vibrate back and forth, thus setting up the same pattern of vibrations in the air and thereby reproducing the original sound. Alternatively, the voltage could be converted into a changing magnetic field on a length of audio tape thereby making a recording of the sound, which, at a later point, could be converted back into a voltage to play the sounds back.

The circuit with its changing voltage and the audio tape with its patterns of magnetism are known as analogue devices because they represent the sound with a continuously variable form. In the next part of this series you will see how a sound can be represented digitally.
