An Exploration of Virtual Auditory Shape Perception



2. The Auditory System

Auditory perception is complex and involuted with multiple levels of redundancy and sheer deceit. The auditory perceptual dimensions are many, and often almost arbitrarily defined. Nonetheless, they are often quite robust and compelling. The subliminal message in this chapter is: with the profusion of recognized dimensions available to us, it is not too outrageous to expect that we might be able to create new ones.

2.1. The Multidimensionality of the Auditory System

Perceptual dimensions are not like the spatial dimensions. The main difference is that perceptual dimensions are seldom truly orthogonal. Mathematically they might more properly be called perceptual subspaces. For the purposes of this paper, I define a perceptual dimension as a perceptual continuum that is consistently quantifiable within individual perceivers.

Before I attempt to explain where one might draw form out of chaos, I want to give a brief description of the chaos.

Figure 2.1 The Cochlea [From Williams, 1988]

All of audition relies on a sensory transducer with fundamentally two-dimensional output. This transducing mechanism, the cochlea, lies curled up inside the inner ear. The cochlea is physically coupled at one end to the tympanic membrane (eardrum) by the bones of the middle ear: the malleus (hammer), the incus (anvil), and the stapes (stirrup). When sound pressure waves from the outside world vibrate the eardrum, the bones of the middle ear initiate sympathetic motion in the fluid that fills the upper chamber of the cochlea. Waves in this fluid travel along the length of the cochlea. This in turn causes motion of the basilar membrane, which is part of a long partition that divides the interior of the cochlea into two channels. The motion of the basilar membrane reaches maximum amplitudes at positions corresponding to each frequency component of the external sound. Hair cells along the basilar membrane measure the amplitude of motion. In this way the cochlea is a mechanical Fourier analysis machine, measuring the intensity of each frequency component of external sounds that are within the range of human hearing.

Figure 2.2 Frequency Sensitivity Along The Basilar Membrane (After Carterette, 1978)

From just these two physical dimensions, frequency and intensity[2], we build a vast perceptual cosmos containing numerous perceptual dimensions. In order to establish the context for yet more potential complexity to our auditory experience, I will conduct a tour of some of the perceptual attributes within audition:

2.1.1. Loudness

Loudness is "the subjective intensity of a sound" [Scharf, 1978, p. 227]. It is the perceptual dimension that is probably most familiar. If you are a member of my generation, you have probably, at least once, found yourself yelling: "DAD, TURN DOWN THE STEREO, IT IS TOO LOUD!" at 3:00 A.M.

The perceptual scale of loudness is the sone scale. One sone is defined as the loudness of a 1000 Hz tone at 40 phons (or dB). The rest of the sone scale is defined in terms of that reference point. A tone which is perceived to be twice as loud as the reference tone is defined to be 2 sones, and a tone that is perceived half as loud is 1/2 sone. For 1000 Hz tones, this perceptual scale relates to the physical dimension of intensity roughly following a power law:

$$L = k P^{0.6}$$

where L is loudness in sones, P is the pressure in micropascals (µPa), and k is a constant fixed by the one-sone reference point.

Or

$$L = k' I^{0.3}$$

where I is the intensity, since intensity changes as the square of pressure. A better match is obtained at low sound pressure levels using:

$$L = k (P - P_0)^{0.6}$$

where P_0 is the effective threshold (~45 µPa).
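As a rough numerical illustration of the power-law relation described above, the following sketch converts sound pressure to sones. It assumes the commonly quoted pressure exponent of 0.6 and a scale constant of 0.01, chosen so that a 40 dB tone comes out near 1 sone. Both constants and all function names are assumptions of this sketch, not values stated in the text; only the ~45 µPa threshold comes from the passage above.

```python
REF_PRESSURE_UPA = 20.0   # standard 0 dB SPL reference pressure, in micropascals
EXPONENT = 0.6            # assumed power-law exponent for pressure
SCALE = 0.01              # assumed constant; gives roughly 1 sone at 40 dB
THRESHOLD_UPA = 45.0      # effective threshold quoted in the text


def spl_to_pressure_upa(level_db):
    """Convert a sound pressure level in dB to pressure in micropascals."""
    return REF_PRESSURE_UPA * 10 ** (level_db / 20.0)


def loudness_sones(pressure_upa, threshold_upa=0.0):
    """Power-law loudness estimate for a sustained 1000 Hz tone.

    Pass threshold_upa=THRESHOLD_UPA for the low-level correction.
    """
    excess = max(pressure_upa - threshold_upa, 0.0)
    return SCALE * excess ** EXPONENT


if __name__ == "__main__":
    for level in (40, 50, 60):
        p = spl_to_pressure_upa(level)
        print(level, "dB:", round(loudness_sones(p), 2), "sones")
```

Under these assumptions each 10 dB step roughly doubles the estimated loudness, which matches the way the sone scale is constructed.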

Another potentially important function to know regarding loudness is the relative difference limen, or just noticeable difference (JND). The JND of loudness varies from ~2 dB at the threshold to about 0.5 dB around 80 dB.

The above rules apply to a sustained tone at 1000 Hz. The behavior becomes somewhat more complicated with other types of sounds. Loudness varies in a non-uniform fashion with frequency, tapering off at both the low and high ends of the perceivable spectrum. This is just the first of many interactions of perceptual dimensions that we will see.

With complex tones, loudness increases with bandwidth, to a certain extent, irrespective of the overall energy. Surprisingly this works for both line spectra and continuous spectra. The reason involves the concept of critical bandwidth.

One method of evaluating the loudness of a narrow-band tone is to determine its perceptual threshold against a background of noise, using the known intensity of the noise as a reference. The threshold intensity of the tone is the point where the noise and the tone are equally loud. Experiments have shown[3] that narrowing the bandwidth of the noise, down to a specific limiting width, does not change the threshold level of the tone. Likewise, if the tone is broadened out to a limiting width, the threshold remains unchanged. This limiting bandwidth is called the critical bandwidth. The absolute loudness of broadband sounds is partially determined by the number of critical bandwidths that they span. The critical bandwidth is constant at about 100 Hz for center frequencies of up to 500 Hz, and thereafter is 10-15% of the center frequency [Scharf & Buus, p. 14.35].
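The rule of thumb for critical bandwidth quoted above can be written as a small function. This is only a sketch of that two-part rule: the function name is hypothetical, and returning a (low, high) pair to reflect the stated 10-15% range is a choice made here for illustration.

```python
def critical_bandwidth_range_hz(center_hz):
    """Return (low, high) estimates of the critical bandwidth in Hz.

    About 100 Hz for center frequencies up to 500 Hz, and
    10-15% of the center frequency above that.
    """
    if center_hz <= 500.0:
        return (100.0, 100.0)
    return (0.10 * center_hz, 0.15 * center_hz)


if __name__ == "__main__":
    for fc in (250.0, 500.0, 1000.0, 4000.0):
        print(fc, "Hz ->", critical_bandwidth_range_hz(fc))
```

Note that the rule as stated is not continuous at 500 Hz (100 Hz is 20% of 500 Hz), so it should be read as an approximation rather than a fitted curve.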

There is also time dependence in loudness. As the duration of a tone increases up to ~200ms the threshold loudness decreases: the ear integrates on time scales < 200ms.

With regard to all perceptual dimensions, it is important to keep in mind that there are individual differences. This means that variation from person to person is not merely likely, but guaranteed. In fact Levelt, et al. [1972][4] have shown that the sone scale, which is among the more consistent perceptual scales, varies not just from person to person, but between the ears of individuals. The basic structure of a perceptual attribute should, however, remain the same across many perceivers.

2.1.2. Pitch

2.1.2.1. Monochromatic Pitch

With the very purest of tones, pitch is the perceptual analog of frequency. Humans can detect frequencies from 20 Hz to above 20,000 Hz, although a true sensation of pitch is available only up to about 5000 Hz [Rasch & Plomp, 1982, p. 7]. The perceptual scale for pitch is the mel scale which, like the sone scale, is defined with reference to a standard: a sine wave at 1000 Hz. The 1000 Hz sine wave is defined as 1000 mel. A sound perceived to have a pitch that is half of the standard is 500 mel, and one at twice the standard pitch is 2000 mel.
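The mel scale is defined empirically, but a closed-form approximation is often convenient. The sketch below uses one widely quoted fit, m = 2595 log10(1 + f/700). This formula is an assumption brought in for illustration and is not taken from the text, although it does place 1000 Hz very close to 1000 mel, in line with the reference point described above.

```python
import math


def hz_to_mel(freq_hz):
    """Approximate mel value for a pure tone (assumed analytic fit)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel under the same assumed fit."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    for f in (250.0, 1000.0, 4000.0):
        print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```

Because the fit is compressive, doubling the frequency above 1000 Hz yields less than double the mel value, which reflects the non-linear map from frequency to perceived pitch.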

The just noticeable difference in frequency is dependent both on the frequency and the method of measurement. The JND has traditionally been measured in two ways: change in frequency and difference in frequency. At low frequencies, modulations of 2-5 Hz are detectable. If, however, two pitches are played sequentially, the JND is in the 1-3 Hz range. At high frequencies this bias reverses, and the JND for modulation is just below 100 Hz, while it is just above 100 Hz for a sequence.

2.1.2.2. Polychromatic pitch

With tones other than sinusoids, the perceptual mapping becomes more complicated. From Fourier's theorem, we know that any waveform can be expressed as the superposition of a potentially infinite series of sine waves. We also know that the cochlea performs a mechanical Fourier analysis of incoming sound waves. If, however, we assume that the evolutionary purpose of hearing is to associate sounds with objects, then it would make little sense for us to perceive sounds as potentially infinite series of frequencies. Some kind of grouping system or mechanism is necessary. These mechanisms do exist. In a later section I will discuss the concept of auditory streams, but for now I will restrict the discussion to the manner in which the grouping of frequency components affects the perception of pitch. In the natural world, a common configuration for a sound with tonal qualities is a harmonic complex. This type of sound consists of the superposition of a fundamental frequency f and any number of f's harmonics: 2f, 3f, 4f, 5f, and so on. Tones consisting of such a complex are perceived to have a pitch corresponding to that of the fundamental frequency.

When sounds deviate in structure from the harmonic complex, a generalized form of this rule for perceived pitch applies. If the fundamental frequency is weak or even missing, but some number of the harmonics are present, then the pitch perceived is still that of the fundamental. A good example of this phenomenon is the lowest key on the piano keyboard. There is no measurable energy at the fundamental frequency of this note (27.5 Hz), and yet it is still useful in a musical scale.[5]

Missing Fundamentals
There is also flexibility in the pitch perception system such that even if some of the components are off from perfect harmonics, the same fundamental pitch is perceived. As components begin to deviate more radically from the harmonic series, then fundamental pitch from the most likely harmonic series is perceived. If the series differs even more, then eventually the tone may split into several simultaneous pitches.
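A very crude stand-in for the missing-fundamental rule is to take the largest frequency of which every partial is a whole-number multiple. The sketch below does this with a greatest common divisor over partials rounded to integer Hz. It is an illustrative simplification with a hypothetical function name: it handles only exactly harmonic partials and does not model the tolerance for mistuned components described above.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(partials_hz):
    """Largest integer frequency that divides every partial exactly.

    Partials are rounded to whole Hz first, so this only makes sense
    for exactly harmonic complexes.
    """
    return reduce(gcd, (int(round(p)) for p in partials_hz))


if __name__ == "__main__":
    # The 100 Hz fundamental is absent, yet it is the implied pitch.
    print(implied_fundamental_hz([200, 300, 400]))
```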

Pitchspace
We have already seen that the perception of pitch is a complex and non-linear process. Thus far, the role of our sensory and perceptual systems has been a process of simplification: In simple (sine wave) tone perception, we have a monotonic, although non-linear, map of frequency on to perceived pitch. Complex tones are generally reduced to one or a small number of pitch objects. Our biological frequency analysis is performing a useful function: reducing a barrage of acoustic input to a manageable scale. The trouble begins when, instead of considering tones in isolation, we form meaningful relationships between them. This opens up a vast and complicated perceptual domain, one which forms one of the main foundations of the field of music.

There is something special about certain relationships between musical notes. One of the most important relationships is the octave. Octaves, notes whose frequencies stand in a ratio of 2:1 or a higher power of two,[6] seem to go well together. This is not surprising, as octaves form the harmonic sets that we saw were so important to pitch perception. Octaves therefore are perceptually similar, sometimes more similar, in fact, than certain sets of notes at less than octave intervals [Shepard, 1982, p. 346]. We cannot stick to a unidimensional scale unless we are willing to consider a non-monotonic, multi-valued perceptual mapping for frequency. If we step into a higher dimensional representation of pitch, a simple spatial analog for a monotonic scale that preserves the similarity relationship of the octave interval is a helix [Shepard, 1982, p. 352]. The pitch proximity of adjacent notes (tone chroma) is expressed as we travel around the coil, and the similarity of the octave is expressed in the proximity of successive winds.
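The helical representation can be made concrete by mapping each frequency to a point on a coil, with one full turn per octave. In the sketch below the reference frequency, the unit radius, and the rise of one unit per octave are all arbitrary choices made for illustration; they are not parameters given by Shepard or by the text.

```python
import math


def pitch_helix_point(freq_hz, ref_hz=440.0, radius=1.0):
    """Map a frequency to (x, y, z) on a helix with one turn per octave."""
    octaves = math.log2(freq_hz / ref_hz)          # pitch height
    angle = 2.0 * math.pi * (octaves % 1.0)        # chroma, as an angle
    return (radius * math.cos(angle), radius * math.sin(angle), octaves)


def helix_distance(f1_hz, f2_hz):
    """Straight-line distance between two pitches on the helix."""
    return math.dist(pitch_helix_point(f1_hz), pitch_helix_point(f2_hz))


if __name__ == "__main__":
    print("octave :", round(helix_distance(440.0, 880.0), 3))
    print("tritone:", round(helix_distance(440.0, 440.0 * math.sqrt(2.0)), 3))
```

With these proportions, a note and its octave sit directly above one another and end up closer together than a note and its tritone, even though the tritone is the smaller interval in frequency.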

There are other frequency ratios that are important, and hence other sets of significant tonal relationships, for example major thirds, and perfect fifths. Shepard [1982] proposes (among other structures) a seven dimensional manifold that can be partially described as a double helix wrapped around a helical cylinder.

2.1.3. Timbre

Timbre is artificially the most complex of auditory attributes. This is mostly because timbre has classically been defined in the negative: "Timbre is that attribute of auditory sensation in terms of which a listener can judge that two sounds similarly presented and having the same loudness and pitch are dissimilar"[7]. In addition to its more descriptive aspects, this over-general definition encompasses all of the spatial attributes of audition. I will save spatial attributes for later.

Brightness is probably the most well known of the timbral dimensions. Brightness is a measure of acoustic energy distribution roughly quantified as the centroid of the perceivable auditory spectrum. As noted previously, it is possible to generate equal pitched harmonic complexes with varying numbers of partials and even missing fundamentals. These can all be perceived as the same pitch, but they are perceived as having different brightnesses.
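The centroid description of brightness lends itself to a short calculation. The sketch below computes an amplitude-weighted mean frequency for a list of partials; weighting by amplitude rather than energy, and the helper names, are assumptions made here for illustration.

```python
def spectral_centroid_hz(partials):
    """Amplitude-weighted mean frequency of (frequency_hz, amplitude) pairs.

    `partials` must be a sequence (it is traversed twice).
    """
    total = sum(amplitude for _, amplitude in partials)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(freq * amplitude for freq, amplitude in partials) / total


def harmonic_complex(fundamental_hz, n_partials, amplitude=1.0):
    """Equal-amplitude harmonic complex: f, 2f, ..., n*f."""
    return [(fundamental_hz * k, amplitude) for k in range(1, n_partials + 1)]


if __name__ == "__main__":
    dull = harmonic_complex(100.0, 3)
    bright = harmonic_complex(100.0, 7)
    print(spectral_centroid_hz(dull), spectral_centroid_hz(bright))
```

Both complexes share a 100 Hz fundamental and so the same nominal pitch, but the one with more partials has the higher centroid and would be expected to sound brighter.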

Taking this process a bit further uncovers yet another less established auditory dimension. Since there are many possible sounds with the same spectral centroid, it is correspondingly also possible to construct many different sounds of the same pitch and brightness. These only differ in the density of spectral energy. Thus density joins the ranks of auditory dimensions.

Many more esoteric timbral dimensions have been suggested. In addition to the ones mentioned above, Carterette [1978] cites tonality and vocality. Other qualities that have been studied are roughness, vowel, beating, bite, and spectral flux. These dimensions may seem to be pushing the boundary between perceptual significance and arbitrary classification. However, several of these timbral dimensions are sufficiently robust as to allow the construction of multidimensional subspaces with interesting properties. David Wessel [1979] describes a two-dimensional timbre space in which analogical relationships are perceivable. Wessel performed an experiment in which subjects were asked to rate the appropriateness of different tonal analogies: tone A is to tone B as tone C is to tone D. Several different combinations were supplied, and Wessel found a significant tendency toward favoring analogies that preserved relationships within the timbre space.

It is important to note that time variation of properties has crept in during our discussion of timbre. When I stated before that every one of the perceptual dimensions was built out of the two dimensions frequency and intensity, I left out the influence of time. Frequency is properly a combination of time and intensity (and so it might be better to say that the auditory perceptual dimensions are all based on time and intensity). For perceptual purposes, however, this occurs on scales at or below 50 milliseconds, which corresponds to the minimum perceivable frequency of 20 Hz. Time (on larger scales) should therefore be considered one of the fundamental building blocks of the auditory perceptual dimensions.

2.2. Perceptual Grouping

2.2.1. Auditory Streams

Rather than continuing on in this taxidermitological fashion, let us examine some of the reasons for this stunning array of sensory attributes. I have already alluded to the idea of perceptual grouping. This is a pivotal concept for understanding audition in general. In this discussion I will draw heavily on a very large book, Auditory Scene Analysis, written by A. S. Bregman [1990].

Since our brains can only juggle so many details at once[8], we need a system for distilling content from cacophony. Bregman addresses such questions as: "How do we separate the speech of two people talking at once, or the voice of the singer from the orchestra?" He introduces the concept of auditory streams as the critical phase in the perceptual process of interpreting the auditory environment. We separate the sound in our environment into streams, and these streams can then be associated with objects.

One of the reasons that Bregman chose the word "stream" to describe this phase of perception is because a single auditory object may involve a series of sounds, and may take place over a period of time [Bregman, 1990, p. 10]. The basic assumption that the auditory system makes is that the characteristics of sound producing objects remain the same or change slowly over time.

At a given point in time, stream segregation involves proximity on various perceptual dimensions: space, pitch, loudness, brightness, etc. Due to its physical properties, the sounds made by a single object are frequently compact on some of these dimensions. A person's voice has a characteristic pitch and degree of roughness. So do the "voices" of biting insects or hungry predators.

One simple example of stream segregation involves the series of pitches shown in figure 2.3.

Figure 2.3 Auditory Stream Alternatives

If the difference in pitch between temporally successive sounds is small enough then the sequence will be perceived as one stream (figure 2.3A). If the differences are too extreme then the series splits into two streams (figure 2.3B).

The characteristics of a given stream are allowed to change slowly over time. If, however, the changes occur too rapidly then the streams will fragment. If, for example, the time scale of sequence in figure 2.3A is compressed, eventually the streams will split and take on the configuration of 2.3B. Temporal factors generalize to a larger class of streaming phenomena: sounds tend to group into one stream if they start together, stop together, or vary cohesively in some fashion over time.

There are several reasons that a singer is distinguishable from the orchestra. One is that singers (notably opera singers) have a formant in their voice which corresponds to a peak in the frequency spectrum somewhere in the 2.5-3 kHz range. The average spectrum of the orchestra peaks at about 500 Hz [Sundberg, 1982, p. 69]. As long as this "singer's formant" is present, the vocalist can stick out above the sound of the orchestra. Another tactic that singers use applies not only to singing but to soloists of any kind: when musicians use vibrato, they are in a sense providing a carrier wave for the listeners' ears. All of the various aspects of the instrument's sound (be it a violin or a vocal tract) vary together, and hence are easier to pick out against the background.

Bregman describes two classes of auditory stream segregation processes: primitive processes and schema based processes. Primitive processes are innate, they work for broad classes of stimuli, and tend to be associated with assigning auditory information to sound sources. Schema-based processes involve learned information, and tend to apply to a more limited set of acoustic events.

Schema based segregation involves the application of cognitive effort to pick out an auditory stream. As an example, in figure 2.3, especially when the time/pitch scale is such that neither grouping is particularly favored, if one applies effort, it is possible to force the sequence into either grouping. If a sequence is less ambiguous, then more effort is often required. One consequence of the necessity for cognitive effort is that where primitive processes improve with speed, schema-based ones get worse.

Another difference between primitive and schema based segregation is that primitive segregation is symmetrical and schema-based processes are not [Bregman, 1990, p. 669]. Whereas it may be possible to pick out the Gilligan's Island theme from a mixed sequence of notes, it is not likely that you would subsequently be able to hum the residual tune.

Grouping is of primary importance in our perception of the auditory portion of the environment. There is a rich variety of cues that can be used in auditory grouping. I suggest that this is one of the reasons why there are so many auditory perceptual attributes: This way we have a large number of potential dimensions on which an object might cohere.

2.2.2. Auditory Localization[9]

Even the spatial perceptual dimensions in audition are not very "space-like" in their properties. The auditory spatial dimensions are neither orthogonal, nor isotropic, and they rely on a large number of heterogeneous cues. Nonetheless, auditory localization is perhaps the most important factor in auditory grouping.

This is a paradox: in order to locate an auditory object, is it not first necessary to determine what portion of the ambient sound belongs uniquely to that object? Not a whole lot is known about the mechanics of such an operation. What is known is that localization is synergistically at its strongest as a grouping cue in the presence of other corroborating grouping factors [Bregman, 1990, p. 645].

2.2.2.1. Static Localization

Horizontal
Just as there are many factors that contribute to grouping sounds, there are also many perceptual cues associated with localizing sounds. The main cues are binaural, in that they involve differences between the signals at each ear. They are the interaural intensity difference and the interaural phase difference. Both of these cues come as a result of the manner in which sounds semicircumnavigate the head. The head casts a frequency dependent "acoustic shadow" which creates loudness differences. High pitched sounds with wavelengths that are small compared to the size of the head are largely blocked. Low pitched sounds proceed largely unimpeded. The additional distance that a sound may need to travel to reach the far ear causes a phase difference. This cue, once again, is highly dependent on the frequency of the incident sound.
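The extra path length to the far ear can be estimated with a simple spherical-head model. The sketch below uses the classic Woodworth approximation, ITD = (a/c)(θ + sin θ), for azimuths between 0° and 90°. The model, the 8.75 cm head radius, and the function names are assumptions introduced for illustration and do not come from the text.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # speed of sound in air at 20 °C
HEAD_RADIUS_M = 0.0875       # assumed average head radius


def interaural_time_difference_s(azimuth_deg,
                                 head_radius_m=HEAD_RADIUS_M,
                                 c=SPEED_OF_SOUND_M_S):
    """Spherical-head estimate of the arrival-time difference between ears.

    Intended for azimuths from 0 (straight ahead) to 90 degrees (to one side).
    """
    theta = math.radians(azimuth_deg)
    return head_radius_m / c * (theta + math.sin(theta))


def interaural_phase_difference_rad(freq_hz, azimuth_deg):
    """Phase difference implied by the time difference at one frequency."""
    return 2.0 * math.pi * freq_hz * interaural_time_difference_s(azimuth_deg)


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        itd_us = interaural_time_difference_s(az) * 1e6
        print(az, "deg ->", round(itd_us), "microseconds")
```

The same time difference maps to a larger phase difference at higher frequencies, which is one way to see why the phase cue depends so strongly on the frequency of the incident sound.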

These cues alone do not uniquely indicate the 3-space position of a sound source. Any given point in space is only one member of a locus of points that have the same interaural characteristics. This "cone of confusion" is disambiguated by another set of cues that are due to the effect of the external part of the ear: the pinnæ.

Since the pinnæ are irregularly shaped, they have an asymmetric effect on incoming sound waves. This effect not only serves to resolve the cone of confusion, but is also the main factor in vertical localization. Pinnæ cues are generally represented as slight modifications to the spectrum of incoming sounds.

2.2.2.2. Depth

There is very little data available on auditory depth perception. Depth perception appears to be based on two factors: relative intensity and the ratio of direct sound strength to echo strength [von Bekesy, 1960, p. 301-313]. Doubling the distance decreases the signal by 6 dB. Additional timbral factors come into play at large distances. Beyond 15 m the timbre deepens due to a differential attenuation of frequencies by air [Boff & Lincoln, 1988].
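The 6 dB figure for a doubling of distance follows from inverse-square spreading. A minimal sketch, assuming a point source in a free field and ignoring echoes and air absorption (the function name is hypothetical):

```python
import math


def level_change_db(distance_from_m, distance_to_m):
    """Change in direct-sound level when a source moves between two distances."""
    return -20.0 * math.log10(distance_to_m / distance_from_m)


if __name__ == "__main__":
    print(round(level_change_db(1.0, 2.0), 2))   # doubling the distance
    print(round(level_change_db(1.0, 4.0), 2))   # doubling it again
```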

2.2.2.3. Dynamic Localization

The perception of the motion of sound sources is yet another multifaceted process. Like static localization, dynamic localization makes use of interaural phase differences and interaural intensity differences. There is some debate in the sparse literature on this topic as to whether or not dynamic localization is simply a matter of detection of changes in position [Middlebrooks & Green, 1991, p. 151].

The third cue in auditory motion perception makes it clear that such perceptions are not based solely on change. Because objects in our environment frequently have velocities that are a significant fraction of the speed of sound in air (343 m/s at 20 °C in dry air at 1 atmosphere of pressure) they are subject to Doppler shifts in frequency:

$$\nu' = \nu \, \frac{c}{c + v}$$

where v is the velocity of the sound source away from the ear, ν is the initial frequency, ν' is the shifted frequency, and c is the speed of sound.
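To give a feel for the size of the Doppler cue, the sketch below evaluates the standard relation for a source moving directly toward or away from a stationary listener, f' = f c / (c + v), with v positive when the source recedes. The function name and the example speeds are illustrative assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0   # in dry air at 20 °C, 1 atmosphere


def doppler_shifted_hz(freq_hz, source_velocity_away_m_s, c=SPEED_OF_SOUND_M_S):
    """Frequency heard by a stationary listener from a moving source.

    Positive velocity means the source is moving away from the ear.
    """
    if source_velocity_away_m_s <= -c:
        raise ValueError("source approaching at or above the speed of sound")
    return freq_hz * c / (c + source_velocity_away_m_s)


if __name__ == "__main__":
    # A 1000 Hz source passing at 30 m/s (roughly highway speed).
    print(round(doppler_shifted_hz(1000.0, -30.0), 1), "Hz approaching")
    print(round(doppler_shifted_hz(1000.0, 30.0), 1), "Hz receding")
```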

Rosenblum, et al. performed a study of listeners' ability to identify the time when a moving sound passed directly in front. Their data showed that each of these cues in isolation is sufficient to perform this task. The precision was greatest with intensity differences, followed by phase differences (actually time differences in this case), and finally Doppler shifts.

2.2.2.4. Inhomogeneity and Anisotropy in Auditory Space

The Overview

A plethora of different cues, perceived in different permutations make up our spatial hearing system. It is hardly surprising that this system is both inhomogeneous and anisotropic in its assessment of space. There are three types of errors in localization. The first is simply an issue of the precision of the various localization processes. The second type of error is a manifestation of the ambiguity of certain cues (the "cone of confusion"). The third type of error is a complete lack of externalization of a sound [Durlach, et al., 1992]. This occurs most frequently with synthetically spatialized sounds.

Azimuth
In the plane of the horizon (assuming an egocentric planet), spatial acuity is best straight ahead. Under ideal conditions, we can resolve horizontal spatial differences of about 1° at this position. This acuity gradually degrades as we go left or right and then improves again behind the listener [Strybel, 1992; Blauert, 1983]. Blauert reports that the acuity is worst approximately 90° left or right (acuity = ±10°). See table 2.1 for some experimental values.

Table 2.1

Summary of Studies of Horizontal Acuity in the Horizontal Plane

Study                                                       0°    20°   40°   80°   90°   120°  180°
Preibisch-Effenberger, 1966; Haustein & Schirmer, 1970[10]  3.6°                    9.2°        5.5°
Oldfield & Parker, 1984                                     4°    6°    6°    6°    12°   20°   10°

Values are mean absolute error.

Off the horizontal plane, azimuthal acuity stays about the same (or perhaps improves a little in the 20-30° range [Oldfield & Parker, 1984]) and then begins to get worse at elevations in the 70-80° range [Strybel, 1992].

The main determinant for the accuracy of localization at a given position is the spectral content of the stimulus. Sinusoids are the worst, especially at low frequencies, and impulses of broad-band noise are best. This is even more the case with vertical localization.

Front-Back Confusions
A common error in localization is the front-back confusion. This error is likely due to interaural ambiguity (i.e., the cone of confusion). Typical frequencies of this error are in the 3-10% range.[11] These errors most often have been reported as front to back (as opposed to back to front)[12]. A possible explanation for such asymmetry is the absence of visual cues: if you can't see what is making the sound, it must be behind you.

During a recent informal experiment with a Macintosh-based auditory localization system [13] I observed people making a preponderance of back-front reversals. My suspicion is that the subjects' experience in sitting at the Macintosh monitor is quite analogous to sitting in front of a television. Generally sounds associated with a television issue forth from the television, and not from behind you.[14] I call this effect "television ventriloquism". Another common error in human behavior attests to the extent that televisions can dominate the environment as a spatial focus: No matter where the VCR is located, people usually point the remote-control at the TV.

Effect of head motion

One possible method of resolving the cone of confusion is the use of head motion. Thurlow & Runge [1967] observed that induced head rotation causes large reductions in the frequency of reversals (in one instance from 90% down to zero). Thurlow & Runge observed an overall reduction in localization error even after the data was corrected for reversals. This is likely due to the fact that head rotations bring sound sources through the higher-acuity portions of auditory space. Shelton, Rodger, & Searle [1982] noted that an important determinant of the effect of head motion is the presence of visual cues. They observed that the benefits of head motion are minimal unless there are also visual stimuli present. This is likely due to the augmentation of proprioceptive feedback that visual context provides.

Elevation
Elevation acuity is not quite as well mapped out as azimuthal acuity. Blauert [1984] reports on studies performed with various stimuli. He cites a study of the localization of white noise which reported a vertical acuity of 4° at a position of 0° azimuth and 0° elevation. Blauert also cites another study conducted with the speech of a familiar speaker which is summarized in table 2.2 below. Note the double value at 90°. The first number appears to be for judgments of the elevation with reference to the forward horizon, and the second for judgments with reference to the rear horizon. Blauert does not elaborate on the mechanism of measurement used, but it is likely a reflection of the fact that subjects tend to systematically underestimate elevations [Blauert, 1983].

Table 2.2

Summary of Studies of Vertical Acuity in the Median Plane

Elevation   Acuity (Damaske & Wagener, 1969)   Acuity (Oldfield & Parker, 1984)
-40°                                           4°
-30°                                           2°
-20°                                           6°
-10°                                           10°
0°          9°                                 10°
10°                                            10°
20°                                            4°
30°                                            6°
36°         10°
40°                                            6°
90°         13°, 22°
144°        15°

Values are mean absolute error.

Vertical Confusions
Vertical confusions, although not as common as front-back reversals, are another type of observed error in localization. These errors are much more frequent with synthetic localization cues.[15]
Depth
Auditory depth perception is generally quite poor, and highly dependent on specific sounds and environments. Familiar sounds are easier to place than unfamiliar sounds, and sounds are easier to place in familiar environments. One set of experimental values for distance acuity ranged from 0.25 to 0.5 meters [Blauert, 1983, p. 47].

2.3. Head-Related Transfer Functions and Virtual Sound

In order to portray a realistic synthetic environment, virtual sound systems must be able to faithfully reproduce all of the various cues that allow spatial hearing. Until recently, most of the work in spatial sound addressed the cues independently. This approach is relatively straightforward for the gross interaural differences in phase and intensity associated with horizontal position. Attempting to account algorithmically for the host of more subtle cues due to individual anatomies would be much more difficult. Wightman & Kistler [1989a,b] solved this problem[16] empirically, by returning to the properties on which all the auditory perceptual dimensions are based.[17]

Wightman and Kistler simply measured the effect of different positions on a stimulus containing equal amounts of all perceivable frequencies. They used tiny probe microphones to make recordings of the ear-canal perspective on impulses of white noise from 144 positions surrounding the listener. This record of the frequency domain effects of each of those positions is called a head-related transfer function (HRTF).

To reproduce the effect of the position with an arbitrary sound, all that is necessary is to perform a Fourier transform to break the sound into its frequency components, apply the HRTF, and then perform an inverse transform to return the sound to the time domain representation. Wightman & Kistler [1989b] verified the efficacy of this procedure by testing subjects' localization both with these synthetic cues, and in the free field. With the exception of an increased frequency of vertical and front-back confusions, the subjects' performance was consistent between these cases.
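The transform, multiply, inverse-transform procedure can be sketched in a few lines. The example below uses a naive discrete Fourier transform so that it needs no external libraries, and it stands in a made-up transfer function (a pure one-sample delay) for a measured HRTF. No real HRTF data is included, the function names are hypothetical, and a practical system would use a fast transform, one HRTF per ear, and block-wise processing.

```python
import cmath


def dft(samples):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def apply_transfer_function(samples, transfer):
    """Filter a block of samples by multiplying its spectrum bin by bin."""
    if len(samples) != len(transfer):
        raise ValueError("signal and transfer function must have equal length")
    shaped = [s * h for s, h in zip(dft(samples), transfer)]
    # Discard the tiny imaginary residue left by rounding error.
    return [value.real for value in idft(shaped)]


if __name__ == "__main__":
    click = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    # Placeholder "HRTF": unit gain with a phase ramp, i.e. a one-sample delay.
    delay = [cmath.exp(-2j * cmath.pi * k / len(click)) for k in range(len(click))]
    print([round(v, 3) for v in apply_transfer_function(click, delay)])
```

Multiplying spectra in this way corresponds to circular convolution within the block, so real signals would need padding or overlap handling to avoid wrap-around.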

The measurement of an HRTF (also called an "earprint") is a laborious process. It is not currently practical to measure the earprint for every individual desiring to use three dimensional sound. Wenzel, Wightman, & Kistler [1991][18] examined listeners' ability to localize sounds using other people's earprints. They determined that performance is largely dependent on the "quality" of the earprint. There are large variations in auditory spatial acuity from person to person. If the HRTF is measured from a person with generally poor localization ability, then others who use that earprint also will have poor localization, regardless of their initial acuity. If, however, the earprint is taken from a "good" localizer, then other good localizers will tend to retain much of their acuity. There are even some indications that poor localizers can benefit from a good earprint in slightly increased acuity.

The main failing in non-individualized earprints is in hemispheric confusion rates. Using another's earprint tends to increase the frequency of front-back confusions by a factor of four and vertical confusions by a factor of seven[19].