Sound replay: storage and formats

Author: Pery Pearson

Aural cues can enhance the realism of immersive virtual reality by providing sounds synchronized to a given situation. Properly coordinated background music, for instance, can set the general mood or tension of a simulation in the same way that motion pictures do. Storage and reproduction of these sounds are the topics of this article.

Raw sampled sound can require a large amount of storage space. For instance, 16-bit CD-quality sound sampled at 44.1 kHz takes 2 bytes * 44,100 samples/second = 88.2 Kbytes to store one second of sound per channel. Many sound formats compress the sampled data, or vary the sample rate or A/D conversion method, to reduce the required storage and transmission bandwidth. For example, the digital telephony standard uses 8-bit samples at an 8 kHz sampling frequency, while compact discs use 16-bit samples at 44.1 kHz. [ARL91]
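The storage arithmetic in the preceding paragraph can be sketched as a short Python calculation. The function name and its parameters are illustrative only, and the channel count is an added assumption (the 88.2 Kbyte figure corresponds to a single channel).

```python
def bytes_per_second(bits_per_sample, sample_rate_hz, channels=1):
    """Return the raw (uncompressed) storage rate in bytes per second."""
    return bits_per_sample * sample_rate_hz * channels // 8

# 16-bit samples at 44.1 kHz, one channel
cd_one_channel = bytes_per_second(16, 44_100)

# 8-bit samples at 8 kHz (digital telephony)
telephone = bytes_per_second(8, 8_000)

print(cd_one_channel, telephone)
```

Passing channels=2 doubles the rate, which is why stereo material needs twice the storage of the single-channel figure.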

Formats like Mu-law take advantage of the characteristics of certain types of sound, in this case speech. Mu-law is used by most telephone companies in North America. It uses an 8 kHz sampling rate with 8 bits of data per sample, which requires a serial transmission speed of 64,000 bits per second. Mu-law differs from standard A/D conversion techniques in that the steps defined by the 256 quantizing levels are not uniform: they are small at low amplitudes and larger at high amplitudes. This allows a wide range of speech amplitudes to be encoded with roughly the same signal-to-noise ratio. Refer to EVE topic I.B.3.a on Sound Sampling for more A/D conversion information.
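The non-uniform quantizing steps can be illustrated with the continuous mu-law companding curve, y = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu), followed by uniform 8-bit quantization. This is only a sketch: it assumes mu = 255 and samples normalized to the range -1.0 to 1.0, and it does not reproduce the piecewise-linear approximation or the bit layout used by real telephone codecs. All function names are hypothetical.

```python
import math

MU = 255  # assumed companding parameter for 8-bit mu-law

def mu_law_compress(x):
    """Map a sample in [-1.0, 1.0] onto the mu-law curve, also in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def quantize_8bit(y):
    """Round a compressed value in [-1.0, 1.0] to one of 256 codes, 0..255."""
    return min(255, int((y + 1.0) / 2.0 * 256))

# 8 kHz sampling rate * 8 bits per sample
bit_rate = 8_000 * 8

# A quiet sample (1% of full scale) is pushed well away from zero before
# quantization, so it is not lost among the lowest codes.
quiet_code = quantize_8bit(mu_law_compress(0.01))
```

Because small amplitudes are expanded before the uniform quantizer and compressed again on playback, the effective step size grows with amplitude.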

Listed below are some file extension conventions used on HP-UX systems; e.g. a Sun-format sound file might be 'HandClap.au'. [HP91]

.u MuLaw One of two common types of pulse code modulation (PCM) standardized by the International Telegraph and Telephone Consultative Committee (CCITT). The other type is ALaw (or A-Law).

.al ALaw One of two common types of pulse code modulation (PCM) standardized by the International Telegraph and Telephone Consultative Committee (CCITT). The other type is MuLaw (or Mu-Law).

.au Sun (NeXT) Standard sampled sound format used on Sun Microsystems workstations

.wav Microsoft RIFF waveform Standard sampled sound format used on Microsoft Windows

.snd Mac Standard sampled sound format used on Apple Computer Macintosh computers

.l16 Linear16 (16-bit signed samples)

.l8 Linear8 (8-bit signed samples)

.lo8 Linear8Offset (8-bit unsigned samples)
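The three linear formats differ only in sample width and in whether silence is stored as zero (signed) or as the midpoint of the range (unsigned). A minimal sketch of converting between them follows. It assumes two's-complement signed samples and an offset of 128 for the unsigned format; these details are assumptions rather than something stated in [HP91], and the function names are hypothetical.

```python
def linear8_to_offset8(sample):
    """Signed 8-bit (-128..127) to unsigned 8-bit (0..255), assuming offset 128."""
    return sample + 128

def offset8_to_linear8(sample):
    """Unsigned 8-bit (0..255) back to signed 8-bit (-128..127)."""
    return sample - 128

def linear8_to_linear16(sample):
    """Widen a signed 8-bit sample to signed 16-bit by scaling by 256."""
    return sample * 256

signed_samples = [-128, -1, 0, 1, 127]
unsigned_samples = [linear8_to_offset8(s) for s in signed_samples]
wide_samples = [linear8_to_linear16(s) for s in signed_samples]
```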

Many operations can be performed on digitally sampled data to produce various sound effects like three-dimensional sound, reverberation, and changes to pitch, timbre, and tempo to name a few. Some of these are discussed in other related EVE articles.

Sampled sound replay

The process of converting digital data into an analog signal which can be amplified to generate sound from a speaker is known as digital-to-analog (D/A) conversion. Previously sampled or synthesized digital data are sent as a stream of words to the D/A converter which produces varying voltage potentials corresponding to the binary value of the data words. The D/A converter output is amplified to drive a speaker which generates audible sound.

A typical D/A conversion circuit consists of an amplifier with a network of weighted resistors on its inverting input [see Figure]. The resistance values are R ohms multiplied by successive powers of 2; e.g. resistor values might be R, 2*R, 4*R, etc. The lowest resistor, R, is controlled by the most significant bit of the data word, and the highest resistor, 2^(n-1)*R, is controlled by the least significant bit. The sum of the currents through all the weighted resistors whose bits are set flows through the feedback resistor, Rf, producing a voltage potential across Rf that is proportional to that sum; since the resistor values are in powers of 2, each bit contributes current in proportion to its binary weight, and the output voltage is therefore proportional to the value of the binary data word. [FLOYD84]
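The behavior of the weighted-resistor network can be sketched numerically. The function below models an ideal inverting summing amplifier, so the output is negative and its magnitude is proportional to the data word. The reference voltage and resistor values are arbitrary assumptions chosen for illustration; they are not taken from [FLOYD84].

```python
def dac_output(word, n_bits, v_ref=5.0, r=10_000.0, r_f=5_000.0):
    """Ideal output voltage of an n-bit binary-weighted inverting summing DAC."""
    total_current = 0.0
    for k in range(n_bits):                   # k = 0 is the most significant bit
        bit = (word >> (n_bits - 1 - k)) & 1  # 1 if this bit switches v_ref in
        resistor = r * 2 ** k                 # R, 2R, 4R, ..., 2^(n-1) * R
        total_current += bit * v_ref / resistor
    return -total_current * r_f               # inverting amplifier: Vout = -I * Rf

# Output for every 4-bit word, 0000 through 1111
levels = [dac_output(word, 4) for word in range(16)]
```

With these assumed values each step of the data word changes the output by 0.3125 V in magnitude, so 1111 gives a magnitude of 4.6875 V.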

A binary zero, 0000, will produce a voltage potential of approximately 0 Volts, and a binary (2^n)-1, 1111, will produce a maximum voltage potential. This value essentially translates to a movement of the speaker diaphragm; 0 Volts is a speaker at rest, and a maximum voltage, properly amplified and matched to the speaker, causes a maximum speaker movement. Changing the speaker voltage rapidly causes the speaker diaphragm to vibrate and produce audible sound. Larger speaker movements (larger voltage changes) produce louder sound, and faster speaker vibrations (higher frequencies) produce sound of a higher pitch. A sampled sound may also be replayed at a rate different from the rate at which it was sampled; a faster replay increases the pitch and tempo of the replayed sound, and a slower replay decreases pitch and tempo.
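The effect of replaying at a different rate can be worked out directly: the duration scales by the inverse of the rate ratio and every frequency in the recording scales by the ratio itself. The helper below is a hypothetical illustration, and the 440 Hz tone and the rates used are assumed example values.

```python
def replay_effect(n_samples, sampled_rate_hz, replay_rate_hz, tone_hz):
    """Return (duration in seconds, replayed tone frequency in Hz)."""
    ratio = replay_rate_hz / sampled_rate_hz
    duration = n_samples / replay_rate_hz
    return duration, tone_hz * ratio

# One second of a 440 Hz tone sampled at 8 kHz (8000 samples)
normal = replay_effect(8_000, 8_000, 8_000, 440.0)   # same rate
faster = replay_effect(8_000, 8_000, 16_000, 440.0)  # replayed twice as fast
slower = replay_effect(8_000, 8_000, 4_000, 440.0)   # replayed at half speed
```

Doubling the replay rate halves the duration and raises the tone by one octave; halving the rate does the opposite.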

Since a D/A conversion by itself will create a stair-stepped output waveform, one other step is needed to help approximate the original sound and improve the sound quality [see Figure]. The steep vertical edges of the steps produce undesirable high-frequency noise. To alleviate this, the output voltage is filtered to smooth out the steps; a low-pass filter that passes only frequencies below one-half of the sampling rate works well for this. A simple example of a low-pass filter is a series resistor followed by a capacitor to ground on the D/A converter output. [HOLT78]
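The smoothing action of the series-resistor, capacitor-to-ground filter can be approximated in discrete time. The sketch below computes the cutoff frequency, fc = 1 / (2 * pi * R * C), and applies a first-order update to a stair-step input. The component values (1 kilohm and 10 nF) and the 1 MHz simulation rate are assumptions for illustration only; a single RC stage rolls off gently, so a practical design may need a steeper filter.

```python
import math

def rc_cutoff_hz(r_ohms, c_farads):
    """Cutoff (-3 dB) frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def rc_low_pass(samples, sample_rate_hz, r_ohms, c_farads):
    """Discrete-time approximation of a series-R, shunt-C low-pass filter."""
    dt = 1.0 / sample_rate_hz
    alpha = dt / (r_ohms * c_farads + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)   # output moves a fraction of the way to the input
        out.append(y)
    return out

# Assumed values: about 15.9 kHz cutoff, below half of a 44.1 kHz sampling rate
cutoff = rc_cutoff_hz(1_000.0, 10e-9)

# One stair step (0 V to 1 V) simulated at 1 MHz: the sharp edge becomes a ramp
step = [0.0] + [1.0] * 50
smoothed = rc_low_pass(step, 1_000_000, 1_000.0, 10e-9)
```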

Courses that would be useful for sound sampling and replay include electrical or computer engineering courses on microprocessor peripherals, operational amplifiers, and digital and analog circuits. Sound storage and formats involve data manipulation, which draws on a broader range of subjects, including digital signal processors, new data manipulation algorithms, and compression hardware and algorithms. A musical background is also desirable.

References

[FLOYD84]: Floyd, Electronic Devices, Charles E. Merrill Publishing Co., 1984, pp.765-770.

[HOLT78]: Holt, Charles A., Electronic Circuits: Digital and Analog, John Wiley & Sons, Inc., 1978, pp.789-796.

[ARL91]: The ARRL Handbook for Radio Amateurs, The American Radio Relay League, 1991, 68th edition, pp.7.1-7.11, 8.20-8.23.

[HP91]: HP S700 Workstation Audio Users Guide, Hewlett-Packard Co., 1991, Order No. A1991-90609, pp.1.8, G.1.

Human Interface Technology Laboratory