Appendix A: An Introduction to Digital Audio

Digital Audio Attributes

Digital audio is composed of thousands of pieces of data, called
samples. Each sample holds the loudness, or amplitude, of a
sound at a given instant in time. This is similar to computer
graphics where each point of light (pixel) has a certain
brightness and location. All these points combine to make a
picture. In digital audio, all the samples combine to make a
sound. There are several attributes that determine the quality
and quantity of digital sound. They are the sampling rate, the
number of bits, and the number of channels.

The sampling rate is the number of times per second that a
sample is recorded. It is measured in hertz (Hz, or 1/second).
A high sampling rate yields higher quality digital sound, in the
same way that a higher graphics resolution shows a better
picture. Compact discs, for example, use a sampling rate of
44100Hz, whereas telephone systems use a rate of only 8000Hz.

The rate to use depends upon the type of sound and the amount of
memory and disk space you have available on your system. Higher
rates consume larger quantities of storage. In the above
example, the compact disc requires over 5 times as much storage
as the telephone system for the same length of sound, assuming
the same number of bits and channels.
Certain types of sounds can be recorded at lower rates without
loss of quality. Some standard rates are listed in Table A.1 at
the end of this section.

The number of bits determines how accurately the amplitude of a
sample is recorded. The two most common are 8-bit and 16-bit
formats. In an 8-bit sample, there are 256 different levels of
amplitude. 16-bit samples have 65,536 levels. To compare the
difference, let's say that you are a teacher grading tests and
you can use one of two marking schemes (figure 12). In scheme
#1, the mark is out of 10. In scheme #2, the mark is out of
1000. All marks must be rounded off (no decimals allowed). If
a student gets two thirds of the questions right, then in scheme
#1, the grade will be 7 out of 10. In scheme #2, the grade will
be 667 out of 1000. Obviously, scheme #2 is much more accurate.
In digital sound, low levels of accuracy can cause noise due to
quantization errors, as discussed in the next section.
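
The effect of the number of bits can be sketched in a few lines
of Python. This is only an illustration: the function name
quantize and the assumption that the levels are spread evenly
over the range -1.0 to 1.0 are ours, not a description of how any
particular sound card stores samples.

```python
def quantize(amplitude, bits):
    """Round an amplitude in [-1.0, 1.0] to the nearest of 2**bits levels.

    Assumes evenly spaced levels (a uniform quantizer); hypothetical helper.
    """
    levels = 2 ** bits                  # 256 for 8-bit, 65,536 for 16-bit
    step = 2.0 / (levels - 1)           # spacing between adjacent levels
    index = round((amplitude + 1.0) / step)
    return -1.0 + index * step


sample = 0.3
for bits in (8, 16):
    stored = quantize(sample, bits)
    print(bits, "bits:", 2 ** bits, "levels, error", abs(stored - sample))
```

The error printed for 16 bits is far smaller than for 8 bits, in
the same way that marking out of 1000 is more accurate than
marking out of 10.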

The number of channels also affects the quality and quantity of
digital sound. Single channel sound, referred to as a monaural
(or mono) sound, contains information for only one speaker and is
similar to AM radio. Two channel sound, or stereo sound,
contains data for two speakers, much like FM stereo. Stereo
sounds can add depth, but they require twice as much storage as
mono sounds.
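
The three attributes together determine how much storage
uncompressed sound needs. As a rough sketch (storage_bytes is a
name made up for this example), the figures in Table A.1 can be
reproduced as follows:

```python
def storage_bytes(rate_hz, bits, channels, seconds=1):
    """Bytes needed for uncompressed sound with the given attributes."""
    bytes_per_sample = bits // 8        # 1 byte for 8-bit, 2 for 16-bit
    return rate_hz * bytes_per_sample * channels * seconds


# CD quality: 44100Hz, 16-bit, stereo
print(storage_bytes(44100, 16, 2))              # bytes per second
print(storage_bytes(44100, 16, 2, seconds=60))  # bytes per minute

# Compact disc rate compared with telephone rate
print(44100 / 8000)
```

The last line shows where "over 5 times" comes from: with the
same number of bits and channels, storage grows in direct
proportion to the sampling rate.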

-----------------------------------------------------------------
Table A.1: Sound attributes

Attributes              Quality and sound type          Bytes/sec   Bytes/min

11025Hz, 8-bit, mono    Fair quality. Good for speech   11025       662,000
                        and low pitch sounds.
11025Hz, 8-bit, stereo  Fair quality stereo.            22050       1,323,000
11025Hz, 16-bit, mono   Less noise.                     22050       1,323,000
11025Hz, 16-bit, stereo Stereo, less noise.             44100       2,646,000
22050Hz, 8-bit, mono    Good quality. Good for music    22050       1,323,000
                        and relatively complex sounds.
22050Hz, 8-bit, stereo  Good quality stereo.            44100       2,646,000
22050Hz, 16-bit, mono   Very good quality. Less noise.  44100       2,646,000
22050Hz, 16-bit, stereo Very good quality stereo.       88200       5,292,000
                        Less noise.
44100Hz, 8-bit, mono    High quality. Good for all      44100       2,646,000
                        sounds.
44100Hz, 8-bit, stereo  High quality stereo.            88200       5,292,000
44100Hz, 16-bit, mono   Excellent quality. Less noise.  88200       5,292,000
44100Hz, 16-bit, stereo Excellent quality stereo        176400      10,584,000
                        (CD quality). Large storage
                        requirements.
-----------------------------------------------------------------

Problems with Recording

There are five potential problems when recording sound:
aliasing, clipping, quantization, internal noise, and system
configuration.

Aliasing occurs when the sampling process does not get enough
data to correctly determine the shape of the sound wave. The
recorded sound will have missing tones (figure 13, top) or new
tones that never existed in the original sound (figure 13,
bottom). These problems can be eliminated by using higher
sampling rates or by using anti-aliasing filters.

Higher sampling rates increase the number of sampling points.
To see how this works, try adding a few points between each
sampling point in the figure and redraw the graph. The recorded
sound will more closely resemble the input.

Anti-aliasing filters remove all tones that cannot be sampled
correctly. They prevent high-pitched tones from being aliased
to low-pitched ones. Many sound cards include anti-aliasing filters in
hardware.
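
To see which "new tone" an unfiltered high pitch turns into, the
following sketch may help. It relies on the standard sampling
result that only tones below half the sampling rate can be
recorded correctly; the function name alias_frequency is
hypothetical and the sketch handles pure tones only.

```python
def alias_frequency(tone_hz, sampling_rate_hz):
    """Frequency at which a pure tone shows up after sampling
    when no anti-aliasing filter is used."""
    folded = tone_hz % sampling_rate_hz
    # Anything above half the sampling rate folds back below it.
    return min(folded, sampling_rate_hz - folded)


print(alias_frequency(3000, 8000))   # below half the rate: unchanged
print(alias_frequency(7000, 8000))   # above half the rate: aliased
```

At a sampling rate of 8000Hz, a 3000Hz tone is recorded as
3000Hz, but a 7000Hz tone is recorded as a 1000Hz tone that was
never in the original sound. Raising the sampling rate raises the
limit, and an anti-aliasing filter removes the 7000Hz tone before
it is sampled.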

Clipping errors occur when the sampled amplitude is outside the
range of valid values. If, for example, the range is -1.0 to
1.0, and a value of 1.2 is sampled, then the value must be
clipped to 1.0 (see figure 14). This generates distortion. To
eliminate clipping, adjust the input volume before recording.
By using the Device Controls' monitor feature, you can analyse
the input to determine a suitable volume. The volume is low
enough when the red LEDs remain off.
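
Clipping itself is a simple operation, sketched below with the
-1.0 to 1.0 range used in the example above. The function name
clip is chosen for illustration only.

```python
def clip(sample, low=-1.0, high=1.0):
    """Force a sampled amplitude into the range of valid values."""
    return max(low, min(high, sample))


print(clip(1.2))    # 1.0 (too loud, so the peak is flattened)
print(clip(0.5))    # 0.5 (within range, so unchanged)
print(clip(-1.7))   # -1.0
```

Every value outside the range is flattened to the limit, which is
why the peaks of a clipped wave look cut off and sound distorted.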

Quantization errors occur when the sample is rounded to the
nearest level of amplitude. This can be explained by using the
"marking schemes" example in the previous section. The number
two thirds (2/3) is represented by 7/10 in scheme #1. This gives
a quantization error of:

| 7/10 - 2/3 | = 1/30

Similarly, in scheme #2, the quantization error is:

| 667/1000 - 2/3 | = 1/3000

Clearly, scheme #2 has the smaller error. Therefore, using 16
bits instead of 8 bits is a good way to reduce quantization
errors.
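
The two errors above can be checked with exact fractions. This is
a small sketch using Python's standard fractions module; the
function name quantization_error is made up for the example.

```python
from fractions import Fraction


def quantization_error(value, out_of):
    """Distance between value and the nearest mark out of `out_of`."""
    mark = round(value * out_of)            # rounded, no decimals allowed
    return abs(Fraction(mark, out_of) - value)


two_thirds = Fraction(2, 3)
print(quantization_error(two_thirds, 10))     # 1/30
print(quantization_error(two_thirds, 1000))   # 1/3000
```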

The other two recording problems deal with computer hardware. To
minimize internal noise, make sure your audio card is installed
as far away from your graphics/monitor adaptor card as possible.
If you use a microphone, keep it away from your monitor and
computer fan. Remember to use shielded cables.

System configuration can also affect audio quality. Recording to
a compressed drive (DriveSpace) is not recommended. Compression
ratios on audio are generally poor and the CPU overhead can cause
gaps during recording. When recording 16-bit, 44100Hz sound, you
should resize the Device Controls window to hide the
oscilloscopes. This also reduces CPU overhead.

Periodic defects can often be heard when playing pure tones (sine
waves). With most audio hardware, these defects occur during DMA
updates and are unavoidable.

Frequency Spectrums

GoldWave features built-in frequency spectrum analysers in the
Device Controls window. Essentially, they allow you to see what
frequencies (or pitches) are present in a sound. A rainbow is an
example of a frequency spectrum of visible light. The sun's
light is broken down into a set of fundamental colours.
GoldWave's spectrum analysers do the same thing for sound.

GoldWave generates the spectrum by using a radix-2 fast Fourier
transform (FFT). FFTs require intensive computations, making
them somewhat unsuitable for real-time applications. To speed up
these computations, GoldWave makes extensive use of 32-bit 386
assembly language instructions. For accuracy, 64-bit temporary
results are used.
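
For readers curious about what a radix-2 FFT does, here is a
minimal textbook sketch in Python. It is not GoldWave's assembly
language implementation; it is the common recursive form, which
splits the data into even and odd samples, and it assumes the
number of samples is a power of two.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("number of samples must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])           # transform of even-numbered samples
    odd = fft(samples[1::2])            # transform of odd-numbered samples
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


# One cycle of a sine wave spread over 8 samples
tone = [math.sin(2 * math.pi * i / 8) for i in range(8)]
magnitudes = [abs(x) for x in fft(tone)]
print([round(m, 3) for m in magnitudes])
```

For this pure tone, almost all of the magnitude lands in bin 1
(and its mirror image, bin 7). Each bin corresponds to one
frequency, which is what the spectrum graph displays.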

GoldWave optionally applies a windowing function to the data
before performing the FFT (see Setup in the Device Controls
Overview section). This reduces errors that occur when dividing
data into small chunks. The Hamming window, as defined below, is
used.
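
The commonly published form of the Hamming window for N samples
is w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), for n from 0 to
N - 1. The sketch below uses that textbook definition; the exact
coefficients used inside GoldWave are an assumption here, and the
function names are hypothetical.

```python
import math


def hamming(n_samples):
    """Textbook Hamming window: 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def apply_window(samples):
    """Multiply each sample by the matching window value."""
    return [s * w for s, w in zip(samples, hamming(len(samples)))]


print([round(w, 3) for w in hamming(5)])
```

The window is close to zero at both ends of the chunk and 1.0 in
the middle, so the abrupt edges created by cutting the sound into
chunks are faded out before the FFT is performed.
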

To make the spectrum more realistic to human hearing, magnitudes
are scaled logarithmically. This means that if one frequency
"sounds" twice as loud as another, it will be graphed with twice
the height (or the corresponding colour for the spectrogram).
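
A common way to scale magnitudes logarithmically is the decibel
formula, 20 times the base 10 logarithm of the magnitude. Whether
GoldWave uses exactly this formula is an assumption; the function
name, the reference level and the -96 dB floor below are chosen
only for illustration.

```python
import math


def to_decibels(magnitude, reference=1.0, floor_db=-96.0):
    """Convert a magnitude to decibels relative to `reference`.

    Values at or below zero (silence) are reported as floor_db,
    since the logarithm of zero is undefined.
    """
    if magnitude <= 0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(magnitude / reference))


for magnitude in (1.0, 0.1, 0.01, 0.0):
    print(magnitude, to_decibels(magnitude))
```

Each tenfold drop in magnitude lowers the result by the same
amount (20 dB), so quiet frequencies remain visible on the graph
instead of being squashed against the bottom.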

During playback with a spectrum oscilloscope, the following
operations are performed each time the oscilloscope is updated:
1) The current position is obtained.
2) The position is drawn on the Sound window's graph.
3) The sample data is windowed.
4) The FFT is performed.
5) The logs of the magnitudes are calculated.
6) The result is converted to screen coordinates or
colours.
7) The graph is drawn.

All this requires a significant amount of CPU time. Under some
circumstances, this may prevent dialogs from being displayed.
If you notice that a dialog is taking an unusually long time to
appear, press the pause or stop button to free up some CPU time
or hide the oscilloscopes by resizing the Device Controls window.