An Exploration of Virtual Auditory Shape Perception
All of the experiments in this series used the same HRTF. Wenzel, et al. [1991, 1993] have shown that there is only a minor acuity decrement due to the use of non-individualized HRTF's. The major performance degradation with non-individualized HRTF's comes in the form of increased frequencies of both front-back and vertical reversals. Non-individualized HRTF's were observed to cause a doubling of the frequency of front-back reversals over instances where the subject's own HRTF was used (which in turn were double the frequency of free-field reversals) [Wightman & Kistler, 1989b; Wenzel, et al., 1991, 1993]. These data are of little importance to my experiments, as all stimuli were presented in front of the subjects. However, Wenzel, et al. also observed a sevenfold increase in the frequency of vertical confusions.[28] One would expect the perception of spatial patterns to be most adversely affected by this type of error.
Only a certain percentage of the population are "good localizers" even with natural stimuli. Of those, only a portion would be well matched to the earprint. Thus it seemed essential to select for people who both seemed to be good at auditory localization tasks in general, and were specifically good with the particular HRTF employed. I chose as the primary subject screening criterion for my study the frequency of vertical confusions.
The screening experiment was a magnitude estimation task consisting of a judgment of the distance of a virtual sound source above or below a visually presented horizontal line.
Where
is in degrees.
The heights were geometrically spaced in order to reduce the number of levels while still covering the same range. This formula spaces the levels relatively nicely (with perhaps a slight overabundance of values near zero), but it constitutes the first of several minor errors I made in these studies. I had intended to have this spacing become regular on a log-log plot of expected height against judged height. However, this function is only regularly spaced on a log-log plot of expected angle against judged height. Fortunately, regular spacing is merely a convenience, and is not crucial to any analyses I wished to perform. Incidentally, the log of this function is extremely close to linear over the range that I used, so close, in fact, as to be indistinguishable from linear (see figure 7.1).
Figure 7.1
Log plot of spacing function
The end result was the somewhat arbitrary (but usefully spaced) set of 17 height values for the positions where the line of sight of the sound intersects the grid displayed on the monitor:
Heights Relative to Horizontal Line
0.00
0.35 inches -0.35 inches
0.58 inches -0.58 inches
0.95 inches -0.95 inches
1.57 inches -1.57 inches
2.59 inches -2.59 inches
4.32 inches -4.32 inches
7.31 inches -7.31 inches
13.05 inches -13.05 inches
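The spacing of the tabulated heights can be checked numerically. The following Python sketch uses only the values in the table above; it does not reconstruct the spacing formula itself. It computes the ratio between successive magnitudes and measures how far the log of the heights departs from a straight line when plotted against level index. Using the level index as the horizontal axis is an assumption made for illustration.

```python
import math

# Positive height magnitudes in inches, copied from the table above.
# Each magnitude also occurs negated, and 0.00 is the centre level.
magnitudes = [0.35, 0.58, 0.95, 1.57, 2.59, 4.32, 7.31, 13.05]
levels = sorted([-m for m in magnitudes] + [0.0] + magnitudes)

# Ratio between each magnitude and the previous one.
ratios = [b / a for a, b in zip(magnitudes, magnitudes[1:])]

# Ordinary least-squares line through (level index, log10 of height).
logs = [math.log10(m) for m in magnitudes]
n = len(logs)
xs = list(range(n))
mean_x = sum(xs) / n
mean_y = sum(logs) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, logs))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Largest departure of log10(height) from the fitted straight line.
max_dev = max(abs(y - (intercept + slope * x)) for x, y in zip(xs, logs))

print("number of levels:", len(levels))
print("successive ratios:", [round(r, 3) for r in ratios])
print("max deviation from linear (log10 units):", round(max_dev, 4))
```

The successive ratios all lie between about 1.6 and 1.8 and grow slightly toward the largest heights, which is what one would expect if the angles, rather than the heights, were geometrically spaced.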
At the beginning of an experimental session a text box containing instructions appeared on the screen. The subjects were instructed to estimate the height of each sound in inches above or below the horizontal line on the screen. They were to look straight ahead at the center of the screen when they listened to each tone.
When the subjects finished reading the instructions, they were given an opportunity to ask questions. Clicking on a button labeled "Done Reading" cleared the screen and started the experiment.
Each experimental trial began with a button labeled "Play Sound" in the middle of the grid. When the subjects clicked on this button, it disappeared. There was a one second pause, a sound was played, and there was another one second pause before they were asked to enter their height judgment.
Then a set of controls appeared on the screen (see figure 7.2).
The numbers that the subjects entered appeared in a text box. They were allowed to correct errors by clearing and re-entering the values, and illegal values were automatically rejected.
Figure 7.2
Input Controls for Screening Experiment
Normally constant error would not be a problem, as it could be easily identified by taking the mean of the data. In this case, however, the goal was to count vertical confusions, and there is little reason to expect that they would occur symmetrically.[29] The confusions would also disrupt regression analysis.
I chose to solve this analysis problem by using a robust regression technique. In order to make this work, I wrote my own weighting function which iteratively resolves vertical confusions (the S+ code for this appears in Appendix A).
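The S+ code in Appendix A is not reproduced here. As a rough illustration of the general idea only, the following Python sketch assumes that a vertical confusion can be resolved by reflecting the judgment about the horizontal line. It alternates between an ordinary least-squares fit and a step that flips any judgment whose reflection lies closer to the fitted line. The function names `fit_line` and `resolve_confusions`, the flip rule, and the use of plain least squares in place of a robust weighting function are all assumptions made for illustration, not the procedure actually used in this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx  # assumes the expected heights are not all identical
    a = mean_y - b * mean_x
    return a, b


def resolve_confusions(expected, judged, max_iter=50):
    """Alternate between fitting a line and flipping suspected confusions.

    A judgment is treated as a vertical confusion (hypothetical rule) when
    its reflection about zero is closer to the current line than the
    judgment itself. Returns (intercept, slope, flipped flags).
    """
    flipped = [False] * len(judged)
    ys = list(judged)
    for _ in range(max_iter):
        a, b = fit_line(expected, ys)
        changed = False
        for i, (x, y0) in enumerate(zip(expected, judged)):
            predicted = a + b * x
            should_flip = abs(-y0 - predicted) < abs(y0 - predicted)
            if should_flip != flipped[i]:
                flipped[i] = should_flip
                ys[i] = -y0 if should_flip else y0
                changed = True
        if not changed:
            break
    a, b = fit_line(expected, ys)  # final fit on the resolved data
    return a, b, flipped
```

Each step can only lower the sum of squared residuals, so the loop settles rather than oscillating. The number of `True` entries in the returned flags gives a confusion count under this hypothetical rule.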
Another anomaly in the data was excessive variance near the zero point (see figure 7.3). Wenzel, Wightman, & Kistler [1991] observed that arbitrarily resolving vertical confusions leads to potential biases in regions where the variance is larger than the distance from the zero point. For these reasons, I decided to discard the values with expected distances of less than 1 inch from the zero point.
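Applied to the 17 height levels listed earlier, the 1 inch cut-off can be sketched as follows. This is a minimal Python illustration using the tabulated heights; the variable names are hypothetical.

```python
# Height levels in inches, as listed in the table of heights.
magnitudes = [0.35, 0.58, 0.95, 1.57, 2.59, 4.32, 7.31, 13.05]
levels = sorted([-m for m in magnitudes] + [0.0] + magnitudes)

# Keep only levels at least 1 inch from the zero point.
kept = [h for h in levels if abs(h) >= 1.0]
discarded = [h for h in levels if abs(h) < 1.0]

print("kept:", kept)
print("discarded:", discarded)
```

Under this rule the zero level and the three smallest magnitudes on each side are discarded, leaving five levels above and five below the line.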
The vertical confusion rates ranged from 12.5% to 52.5%, with an average rate of 34%. This is high compared to the study by Wenzel, Wightman, and Kistler [1991, 1993], which had a range of 7% to 32% with an average of 18% using the same HRTF. The main difference between their study and mine is that they used white noise. This strongly indicates that the stimulus I used is confusing. Were it not for the fact that I wished to compare my results to another study (Lakatos, 1993a) that used the 12-partial harmonic complex, I would consider using another sound.
Of the 21 subjects, nine had vertical confusion rates of less than or equal to 30%. I picked this as my cut-off point. One of these nine had excessive variance in her data, and was eliminated. This left a pool of eight subjects for the rest of the experiments. The results are summarized in table 7.2 below.
Screening Experiment Data Summary
Subject   Confusion Count   Slope         Adjusted Absolute Mean Residue
RM        12.5              0.24837657    3.52685867
RJ        17.5              0.30750321    3.14275029
KH        20                0.25734472    4.11193305
TM        27.5              0.89000957    3.02234295
SW        27.5              0.39496629    3.72301845
CO        30                0.53811359    2.36984051
CD        30                0.43695416    3.04324691
KS        30                0.39156492    3.94108368
SG[30]    30                0.49905878    5.37504168
SC        32.5              0.318493      4.34736303
PT        32.5              0.08175296    10.423489
JK        35                0.2245698     3.07086686
HR        37.5              0.37689759    4.88152739
CM        37.5              0.19437143    5.02769816
GS        40                0.83635199    2.67562108
SK        42.5              0.52109905    3.42143489
MH        42.5              -0.634634     5.46589525
SF        47.5              0.19742101    4.19195675
DM        47.5              0.46161822    5.59563177
DS        47.5              0.35240334    6.18871595
AK        52.5              -0.0410007    6.43836948
Outliers are in boldface type
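The summary figures quoted in the text can be checked against the table. The following Python sketch uses only the confusion values from the table, treated as percentages, and the 30% cut-off described above; the variable names are hypothetical.

```python
# (subject, vertical confusion rate in percent), copied from the table.
confusion = [
    ("RM", 12.5), ("RJ", 17.5), ("KH", 20.0), ("TM", 27.5), ("SW", 27.5),
    ("CO", 30.0), ("CD", 30.0), ("KS", 30.0), ("SG", 30.0), ("SC", 32.5),
    ("PT", 32.5), ("JK", 35.0), ("HR", 37.5), ("CM", 37.5), ("GS", 40.0),
    ("SK", 42.5), ("MH", 42.5), ("SF", 47.5), ("DM", 47.5), ("DS", 47.5),
    ("AK", 52.5),
]

rates = [rate for _, rate in confusion]
mean_rate = sum(rates) / len(rates)

CUTOFF = 30.0  # percent, the cut-off chosen in the text
passed = [subject for subject, rate in confusion if rate <= CUTOFF]

print("subjects:", len(confusion))
print("range: %.1f%% to %.1f%%" % (min(rates), max(rates)))
print("mean: %.1f%%" % mean_rate)
print("at or below cut-off:", passed)
```

This reproduces the 21 subjects, the 12.5% to 52.5% range, the average of about 34%, and the nine subjects at or below the cut-off, of whom one was later eliminated for excessive variance.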