An Exploration of Virtual Auditory Shape Perception



7. Screening Experiment: HRTF Confusion Rating

This series of experiments was designed to determine whether virtual auditory shape perception is possible using a few potential display techniques. Since a great deal of inter-subject variability had been observed in other experiments involving virtual sound, I needed to develop a set of screening criteria to ensure that the main effect of shape recognition did not get lost in variability due to a poor match between subject and head-related transfer function (HRTF).

All of the experiments in this series used the same HRTF. Wenzel, et al. [1991, 1993] have shown that there is only a minor acuity decrement due to the use of non-individualized HRTFs. The major performance degradation with non-individualized HRTFs comes in the form of increased frequencies of both front-back and vertical reversals. Non-individualized HRTFs were observed to double the frequency of front-back reversals relative to cases where the subject's own HRTF was used (which in turn were double the frequency of free-field reversals) [Wightman & Kistler, 1989b; Wenzel, et al., 1991, 1993]. These data are of little importance to my experiments, as all stimuli were presented in front of the subjects. However, Wenzel, et al. also observed a sevenfold increase in the frequency of vertical confusions.[28] One would expect the perception of spatial patterns to be most adversely affected by this type of error.

Only a certain percentage of the population are "good localizers" even with natural stimuli. Of those, only a portion would be well matched to the earprint. Thus it seemed essential to select for people who both seemed to be good at auditory localization tasks in general and were specifically good with the particular HRTF employed. I chose the frequency of vertical confusions as the primary subject screening criterion for my study.

The screening experiment was a magnitude estimation task consisting of a judgment of the distance of a virtual sound source above or below a visually presented horizontal line.

7.1. Apparatus

The apparatus was mostly the same as that described in the previous chapter. The only addition was a cardboard border which was placed at the top and bottom of the screen such that the edges of the border were each 13 inches from the centerline of the monitor. The subjects sat in an adjustable chair such that their heads were level with the horizontal center of the monitor and the centers of their heads were 20 inches from the center of the screen. The subjects were not immobilized. Instead, a grid was placed on the screen, and the subjects were asked to look straight ahead at the center of the screen when making their judgments.

7.2. Subjects

Twenty-two subjects were recruited from conferences, Educational Technology classes at the University of Washington, and anyone at the HIT Lab who stood in one place for too long. Two of the subjects were audio engineers, or had extended experience in audio work. Motivation was provided in the form of tours of the HIT Lab for those who do not work there, and in the form of either altruism or guilt for those who do.

7.3. Stimulus Combinations

The stimulus was a twelve-partial harmonic complex with a fundamental frequency of 1000 Hz (as described in the previous chapter). This stimulus was presented at seventeen different elevations. The elevations were spaced geometrically, following the superfluously complicated formula:

[Formula not available.]

where the elevation angle is in degrees.

The heights were geometrically spaced in order to reduce the number of levels while still covering the same range. This formula spaces the levels relatively nicely (with perhaps a slight overabundance of values near zero), but it constitutes the first of several minor errors I made in these studies. I had intended to have this spacing become regular on a log-log plot of expected height against judged height. However, this function is only regularly spaced on a log-log plot of expected angle against judged height. Fortunately, regular spacing is merely a convenience, and is not crucial to any analyses I wished to perform. Incidentally, the log of this function is extremely close to linear over the range that I used, so close, in fact, as to be indistinguishable from linear (see figure 7.1).

Figure 7.1

Log plot of spacing function
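The distinction between spacing in angle and spacing in height can be written out. The following is only a sketch under stated assumptions: a flat screen at viewing distance $d$ (20 inches in this apparatus) and elevation angles spaced geometrically with some ratio $r$. The symbols $d$, $r$, $\theta_n$ and $h_n$ are introduced here for illustration and are not the notation of the original formula.

```latex
\begin{align*}
\theta_n &= \theta_0\, r^{n}
  &&\Rightarrow\quad \ln\theta_n = \ln\theta_0 + n\ln r
  \quad\text{(exactly linear in } n\text{)}\\
h_n &= d\,\tan\theta_n
  &&\Rightarrow\quad \ln h_n = \ln d + \ln\tan\theta_n\\
\tan\theta &\approx \theta\left(1 + \tfrac{\theta^{2}}{3}\right)
  &&\Rightarrow\quad \ln h_n \approx \ln d + \ln\theta_n + \tfrac{\theta_n^{2}}{3}
  \quad(\theta_n \text{ in radians})
\end{align*}
```

Under these assumptions the log of the angle is exactly linear in the level index, while the log of the height departs from linearity only through the term $\theta_n^{2}/3$. For an elevation of roughly 33 degrees (about 0.58 rad), that term is about 0.1, which is small next to the total span of the log heights. This is consistent with the observation that the log of the function is nearly indistinguishable from linear.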

The end result was the somewhat arbitrary (but usefully spaced) set of 17 height values for the positions where the line of sight to the sound intersects the grid displayed on the monitor:

Table 7.1

Heights Relative to Horizontal Line

            0.00 inches
  0.35 inches      -0.35 inches
  0.58 inches      -0.58 inches
  0.95 inches      -0.95 inches
  1.57 inches      -1.57 inches
  2.59 inches      -2.59 inches
  4.32 inches      -4.32 inches
  7.31 inches      -7.31 inches
 13.05 inches     -13.05 inches
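The relation between these heights and elevation angle follows from the geometry in section 7.1 (a flat screen 20 inches from the head). The sketch below converts the tabulated heights to angles. It also checks one candidate spacing, angles of exp(n/2) degrees for n = 0..7, which happens to reproduce the table to two decimal places. That candidate is an inference from the numbers and should not be read as the original formula, which is not shown in this text. All function and variable names are hypothetical.

```python
import math

VIEWING_DISTANCE_IN = 20.0  # head-to-screen distance stated in section 7.1

# Positive heights from Table 7.1, in inches above the horizontal line.
TABLE_HEIGHTS_IN = [0.35, 0.58, 0.95, 1.57, 2.59, 4.32, 7.31, 13.05]


def height_to_elevation_deg(height, distance=VIEWING_DISTANCE_IN):
    """Elevation angle in degrees subtended by a height on a flat screen."""
    return math.degrees(math.atan2(height, distance))


def elevation_to_height_in(angle_deg, distance=VIEWING_DISTANCE_IN):
    """Height on the screen in inches for an elevation angle in degrees."""
    return distance * math.tan(math.radians(angle_deg))


# ASSUMPTION (inferred from the table, not the author's formula):
# elevation angles of exp(n / 2) degrees for n = 0..7.
assumed_angles_deg = [math.exp(n / 2) for n in range(8)]
assumed_heights_in = [elevation_to_height_in(a) for a in assumed_angles_deg]

for table_h, model_h in zip(TABLE_HEIGHTS_IN, assumed_heights_in):
    angle = height_to_elevation_deg(table_h)
    print(f"{table_h:6.2f} in = {angle:6.2f} deg (assumed spacing: {model_h:6.2f} in)")
```

The negative heights are mirror images of these values, and the zero level sits on the horizontal line itself.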

7.4. Procedure

Before proceeding with the experiment, each subject was allowed to become acquainted with the idea of virtual sound by playing with a demonstration of the spatialization system. This demonstration simply consisted of a grid and a mouse pointer. The subjects would click on a spot on the grid, and a sound would be played as if it were coming from a speaker physically placed on the screen at that location.

At the beginning of an experimental session a text box containing instructions appeared on the screen. The subjects were instructed to estimate the height of each sound in inches above or below the horizontal line on the screen. They were to look straight ahead at the center of the screen when they listened to each tone.

When the subjects finished reading the instructions, they were given an opportunity to ask questions. Clicking on a button labeled "Done Reading" cleared the screen and started the experiment.

Each experimental trial began with a button labeled "Play Sound" in the middle of the grid. When the subjects clicked on this button, it disappeared. After a one-second pause a sound was played, followed by another one-second pause before the subjects were asked to enter their height judgment.

Then a set of controls appeared on the screen (see figure 7.2).

The numbers that the subjects entered appeared in a text box. They were allowed to correct errors by clearing and re-entering the values, and illegal values were automatically rejected.

Figure 7.2

Input Controls for Screening Experiment

7.5. Results & Discussion

7.5.1. Analysis

The analysis was complicated by some irregularities in the data. In spite of the fact that the height of the subject's chair was adjusted such that their ears were level with the horizontal line on the screen, the subjective zero point (i.e. constant error) varied from subject to subject, and was rarely at the objective zero point. Perhaps this is an indication that the voluntary head immobilization was not sufficient. I did observe that several subjects preferred to close their eyes while listening. This would make it difficult for them to fix their head position.

Normally constant error would not be a problem, as it could be easily identified by taking the mean of the data. In this case, however, the goal was to count vertical confusions, and there is little reason to expect that they would occur symmetrically.[29] The confusions would also disrupt regression analysis.

I chose to solve this analysis problem by using a robust regression technique. In order to make this work, I wrote my own weighting function which iteratively resolves vertical confusions (the S+ code for this appears in Appendix A).
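To illustrate the general idea of iteratively resolving vertical confusions during a line fit, here is a minimal sketch. It is not the S+ weighting function from Appendix A, which is not reproduced here. It assumes that a confused response looks like a response to the mirror-image height (the expected height with its sign reversed), and it uses plain least squares rather than robust weights. All names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def resolve_confusions(expected, judged, max_iter=20):
    """Alternate between fitting a line and flagging mirrored responses.

    ASSUMPTION: a trial is flagged as a vertical confusion when the fitted
    line predicts the judgment better from -expected than from +expected.
    Returns (intercept, slope, flags), with one boolean flag per trial.
    """
    flags = [False] * len(expected)
    a, b = fit_line(expected, judged)
    for _ in range(max_iter):
        new_flags = [
            abs(y - (a - b * x)) < abs(y - (a + b * x))
            for x, y in zip(expected, judged)
        ]
        if new_flags == flags:  # flags are stable, so the fit has converged
            break
        flags = new_flags
        # Refit with the flagged trials reflected about the horizontal line.
        reflected = [-x if f else x for x, f in zip(expected, flags)]
        a, b = fit_line(reflected, judged)
    return a, b, flags


# Synthetic demonstration (made-up data, not experimental results):
# true intercept 1.0 and slope 0.5, with trials 1 and 8 vertically confused.
demo_expected = [-13.05, -7.31, -4.32, -2.59, -1.57, 1.57, 2.59, 4.32, 7.31, 13.05]
demo_confused = {1, 8}
demo_judged = [
    1.0 + 0.5 * (-x if i in demo_confused else x)
    for i, x in enumerate(demo_expected)
]
demo_a, demo_b, demo_flags = resolve_confusions(demo_expected, demo_judged)
demo_rate = 100.0 * sum(demo_flags) / len(demo_flags)
print(f"intercept={demo_a:.3f} slope={demo_b:.3f} confusion rate={demo_rate:.1f}%")
```

In this scheme the intercept plays the role of the subjective zero point, so the constant error is estimated jointly with the confusions rather than from the raw mean. A simple alternation like this can converge to the wrong solution when confusions are frequent or the slope is near zero.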

Another anomaly in the data was excessive variance near the zero point (see figure 7.3). Wenzel, Wightman, & Kistler [1991] observed that arbitrarily resolving vertical confusions leads to potential biases in regions where the variance is larger than the distance from the zero point. For these reasons, I decided to discard the values with expected distances of less than 1 inch from the zero point.


7.5.2. Results

The robust regression technique appeared to be successful. Many subjects' data showed clear instances of vertical confusions which were identified and accounted for by the analysis (see figure 7.4). The confusion plots for all subjects can be found in Appendix B.

Figure 7.3 [image not available]

Figure 7.4 [image not available]

The vertical confusion rates ranged from 12.5% to 52.5%, with an average rate of 34%. This is high compared to the study by Wenzel, Wightman, and Kistler [1991, 1993], which had a range of 7% to 32% and an average of 18% with the same HRTF. The main difference between their study and mine is that they used white noise. This strongly indicates that the stimulus I used is confusing. If it were not for the fact that I wished to compare my results to another study [Lakatos, 1993a] that used the 12-partial harmonic complex, I would consider using another sound.

Of the 21 subjects, nine had vertical confusion rates of less than or equal to 30%. I picked this as my cut-off point. One of these nine had excessive variance in her data, and was eliminated. This left a pool of eight subjects for the rest of the experiments. The results are summarized in table 7.2 below.

Table 7.2

Screening Experiment Data Summary

Subject    Confusion    Slope          Adjusted Absolute
           Rate (%)                    Mean Residue
RM         12.5         0.24837657     3.52685867
RJ         17.5         0.30750321     3.14275029
KH         20           0.25734472     4.11193305
TM         27.5         0.89000957     3.02234295
SW         27.5         0.39496629     3.72301845
CO         30           0.53811359     2.36984051
CD         30           0.43695416     3.04324691
KS         30           0.39156492     3.94108368
SG[30]     30           0.49905878     5.37504168
SC         32.5         0.318493       4.34736303
PT         32.5         0.08175296     10.423489*
JK         35           0.2245698      3.07086686
HR         37.5         0.37689759     4.88152739
CM         37.5         0.19437143     5.02769816
GS         40           0.83635199     2.67562108
SK         42.5         0.52109905     3.42143489
MH         42.5         -0.634634*     5.46589525
SF         47.5         0.19742101     4.19195675
DM         47.5         0.46161822     5.59563177
DS         47.5         0.35240334     6.18871595
AK         52.5         -0.0410007*    6.43836948

Outliers (boldface in the original) are marked with an asterisk.
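The selection rule described above can be checked against the table with a short sketch. The confusion rates are copied from Table 7.2. Treating SG as the subject removed for excessive variance is an assumption based on the footnote marker on that row, since the text does not name the subject. Variable names are hypothetical.

```python
# (subject, vertical confusion rate in percent), copied from Table 7.2.
CONFUSION_RATES = [
    ("RM", 12.5), ("RJ", 17.5), ("KH", 20.0), ("TM", 27.5), ("SW", 27.5),
    ("CO", 30.0), ("CD", 30.0), ("KS", 30.0), ("SG", 30.0), ("SC", 32.5),
    ("PT", 32.5), ("JK", 35.0), ("HR", 37.5), ("CM", 37.5), ("GS", 40.0),
    ("SK", 42.5), ("MH", 42.5), ("SF", 47.5), ("DM", 47.5), ("DS", 47.5),
    ("AK", 52.5),
]

CUTOFF_PERCENT = 30.0  # cut-off point stated in the text

# ASSUMPTION: SG is the subject eliminated for excessive variance.
EXCLUDED_FOR_VARIANCE = {"SG"}

# Subjects at or below the cut-off, then the final pool after the exclusion.
passed_cutoff = [s for s, rate in CONFUSION_RATES if rate <= CUTOFF_PERCENT]
subject_pool = [s for s in passed_cutoff if s not in EXCLUDED_FOR_VARIANCE]

rates = [rate for _, rate in CONFUSION_RATES]
mean_rate = sum(rates) / len(rates)

print(f"{len(passed_cutoff)} subjects at or below {CUTOFF_PERCENT}%")
print(f"{len(subject_pool)} subjects in the final pool: {', '.join(subject_pool)}")
print(f"rates span {min(rates)}% to {max(rates)}%, mean {mean_rate:.1f}%")
```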