An Exploration of Virtual Auditory Shape Perception



5. Preliminary Experiments

5.1. Preliminary Exploration: Drawing Sounds

When I first became interested in auditory shape perception, I conducted an exploratory experiment on two types of displays. This was an unusual type of experiment in that I asked my subjects to draw pictures of what they heard. Ruff & Perret [1976] tried this as well, but dismissed this technique, as they felt that the subjects needed the visual choices to aid their perceptions. Since I was conducting an exploration, and not necessarily trying to prove anything, I thought that it might be illuminating to get a more direct picture of what subjects heard when different sounds were presented to them.

This was my first attempt at auditory shape display, made before I had done most of the background research. As such, I made some naïve assumptions about auditory shape. Nonetheless, some observations from this exploration are worth relating. The following account is adapted from previous work.

5.1.1. Display Types


Smooth Path Tracing
It is clear that people are sensitive to the motion of sound sources. Lakatos [1993a] and Ruff & Perret [e.g. 1976] have performed many experiments with patterns traced out on speaker arrays. I tried a similar type of display, reducing the granularity as much as possible. I moved sound sources along circular paths that were as smooth as the spatialization hardware could make them.
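As an illustration only (the original software was written in C and is not reproduced here), a smooth circular path can be sketched as a list of positions sampled at the spatializer's update rate. The function name `circular_path`, the two-second revolution period, and the axis labels are assumptions made for this sketch; the axis convention follows Table 5.1 (x wide, y deep, z tall).

```python
import math

UPDATE_RATE_HZ = 50  # position updates per second quoted for the Convolvotron


def circular_path(radius, period_s, axis="z", rate_hz=UPDATE_RATE_HZ):
    """Return one revolution of (x, y, z) positions on a circle normal to `axis`."""
    n = int(round(period_s * rate_hz))  # number of position updates per revolution
    points = []
    for i in range(n):
        angle = 2.0 * math.pi * i / n
        u, v = radius * math.cos(angle), radius * math.sin(angle)
        if axis == "z":      # horizontal loop
            points.append((u, v, 0.0))
        elif axis == "x":    # edge-on loop
            points.append((0.0, u, v))
        elif axis == "y":    # loop facing front/back
            points.append((u, 0.0, v))
        else:
            raise ValueError("axis must be 'x', 'y', or 'z'")
    return points


# Hypothetical usage: a 3-inch horizontal loop traversed once every two seconds.
path = circular_path(radius=3.0, period_s=2.0)
print(len(path))  # 100 positions per revolution
```

A slower revolution yields more positions per loop and hence a finer path, at the cost of a longer presentation.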

Spatial Ambiguity
One possible interpretation for "auditory shape" is a shaped ambiguity as to the position of a sound. Perhaps if the apparent location of a sound source moved rapidly and randomly within a shaped volume, the sound would seem to come from the volume as a whole. Thus the shape of the sound would be the shape of the volume from which it emanated.
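The idea of a source jumping at random within a shaped volume can be sketched as follows. This is a minimal sketch, not the original C implementation: the function name `random_point_in_ellipsoid` is hypothetical, and uniform sampling by rejection is an assumption, since the text does not say how the random positions were drawn.

```python
import random


def random_point_in_ellipsoid(rx, ry, rz, rng=random):
    """Uniform point inside an axis-aligned ellipsoid, by rejection sampling."""
    while True:
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        if x * x + y * y + z * z <= 1.0:      # keep points inside the unit sphere
            return (x * rx, y * ry, z * rz)   # scale to the ellipsoid's semi-axes


# Hypothetical usage: one second of positions (50 updates) in a 3-inch sphere.
rng = random.Random(0)
one_second = [random_point_in_ellipsoid(3.0, 3.0, 3.0, rng) for _ in range(50)]
```

Scaling a uniform sample from the unit sphere keeps it uniform within the ellipsoid, so no region of the volume is favored.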

5.1.2. Apparatus

I used white noise generated by a Sequential Circuits Pro-One analog synthesizer. The noise was then spatialized by a Crystal River Convolvotron installed in an Intel 80486-based IBM-compatible personal computer. The software written expressly for this experiment was written in C. A 512-point head-related transfer function [27] was loaded into the Convolvotron(TM).

5.1.3. Shapes

A total of twenty-six sound-shapes were presented. Depending on the specific stimulus, the virtual sound source was moved either at random within a mathematically defined volume or as smoothly as possible along a circular path. The shapes are listed in Table 5.1.

In the process of developing the software for this experiment, several issues were identified as worthy of future study. The Convolvotron(TM) is capable of updating the position of a sound source up to fifty times per second. There is also an interpolation effect generated by cross-fading between successive positions. While this update rate is generally more than acceptable for localization purposes, it is not at all clear that this rate is fast enough to adequately define a volume.

To get some sense of the problem, I made visual representations of the sound source's position. I noticed that, even in the visual modality, it was difficult to recognize the shape of a volume with fewer than three thousand points. At fifty positions per second, a full minute would be required to visit that number of points. The graphical representation displayed all of the points simultaneously, which would not be the case for the auditory stimuli. The listener would need to rely on memory, or some degree of "persistence of hearing", to build up an image. I have not found any references to persistence of hearing in the literature.

Since volumes seemed to be a problem, some of the sound shapes presented were shells instead of volumes, as a potential fix. Visual inspection of shells indicated that only three seconds' worth of points (150) might suffice.
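The timing argument above is simple arithmetic, and a shell can be sampled in much the same way as a volume. The sketch below assumes the fifty-updates-per-second figure quoted earlier; the names `seconds_to_present` and `random_point_on_sphere` are hypothetical, and drawing shell points from a normalized Gaussian vector is an assumed method, not necessarily the one used in the original software.

```python
import math
import random

UPDATE_RATE_HZ = 50  # position updates per second


def seconds_to_present(n_points, rate_hz=UPDATE_RATE_HZ):
    """Time needed to visit n_points positions, one per update."""
    return n_points / rate_hz


def random_point_on_sphere(radius, rng=random):
    """Uniform point on a spherical shell, from a normalized Gaussian vector."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:  # guard against the (very unlikely) zero vector
            return (radius * x / norm, radius * y / norm, radius * z / norm)


print(seconds_to_present(3000))  # 60.0 seconds for a volume
print(seconds_to_present(150))   # 3.0 seconds for a shell

# Hypothetical usage: three seconds' worth of points on a 3-inch spherical shell.
rng = random.Random(0)
shell = [random_point_on_sphere(3.0, rng) for _ in range(150)]
```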

Table 5.1

Paths and Spatial Ambiguities

 #                           Description                                Interpolation
 0       Tall Ellipsoidal Shell, Major Axis 6", Minor Axes 3"                 No       
 1     Tall (z-axis) Cylindrical Shell, R = 0.75", Length = 6"                No       
 2   Oblate Spherical Shell, R = 3", Compressed Vertically By 1/2             No       
 3                     Spherical Shell, R = 3"                                No       
 4                      Cubic Volume, 6"x6"x6"                                No       
 5                Half Height Cubic Volume, 6"x6"x3"                          No       
 6       Square, Facing Front/Back, (Normal to y-axis), 6"x6"                 No       
 7                Circular Path, Axis Vertical, R=3"                         Yes       
 8                     Spherical Shell, R = 3"                               Yes       
 9        Square, Facing Up/Down, (Normal to z-axis), 6"x6"                   No       
10                            Same as #3                                      No       
11     Wide (x-axis) Cylindrical Shell, R = 0.75", Length = 6"                No       
12     Deep (y-axis) Cylindrical Shell, R = 0.75", Length = 6"                No       
13            Edge-on Square, (Normal to x-axis), 6"x6"                       No       
14     Tall (z-axis) Cylindrical Shell, R = 0.75", Length = 6"               Yes       
15                      Cubic Volume, 6"x6"x6"                               Yes       
16       Square, Facing Front/Back, (Normal to y-axis), 6"x6"                Yes       
17          Square, Horizontal, (Normal to z-axis), 6"x6"                    Yes       
18                            Same as #8                                     Yes       
19                              Point                                         No       
20     Deep (y-axis) Cylindrical Shell, R = 0.75", Length = 6"               Yes       
21            Square, Edge-On, (Normal to x-axis), 6"x6"                     Yes       
22           Horizontal Loop, (Normal to z-axis), R = 3"                     Yes       
23             Edge-On Loop, (Normal to x-axis), R = 3"                      Yes       
24                              Point                                        Yes       
25     Wide (x-axis) Cylindrical Shell, R = 0.75", Length = 6"               Yes       

As stated before, the Convolvotron(TM) interpolates between positions. The effect is similar to connecting successive points by lines. In the visual domain I found the lines confused the graphical image of the shape. In an attempt to alleviate that potential problem, I set some of the stimuli such that there was no interpolation by turning the sound off between successive points. This "fix" unfortunately came at the expense of update rate, because turning the sound off requires a full cycle.

Another question suggested by these considerations is this: Is there any correlation at all between what is confusing to the eyes, and what is confusing to the ears?

5.1.4. Subjects

Eight subjects were recruited from the laboratory and classes at the University of Washington. All subjects reported normal hearing.

5.1.5. Procedure

This experiment was informal in that it took place in a noisy, chaotic environment complete with ringing phones and construction work in the background. Since the nature of the experiment was largely exploratory, the conditions were acceptable, if sub-optimal for current purposes.

The subjects were prompted by the computer; the following message appeared on the screen:

"I am going to play some sounds behind your head.

These sounds are special in that they have shapes.

Please draw a picture of each sound and label the

picture with the number that is given.

You may spend as much time on each sound as you like.

There will be 26 sounds."

The sounds were presented as if from behind the subjects in order to minimize front-back confusion, as invisible sound sources often default to the back [Wenzel et al. 1993]. The sounds were presented in a random order. I allowed the subjects to listen to each shape as long as they desired. They hit a key on the keyboard to start and stop each sound.

Subjects drew their pictures on sheets of paper with boxes for the drawings.

5.1.6. Results & Discussion

This study generated 208 drawings. In the interest of avoiding deforestation, only a few are shown here.

As one might expect, there was a great variety of pictures drawn. Some of these pictures fit the theoretical shapes extremely well (see figure 5.1).

Figure 5.1

"Good" Drawings

The smooth paths frequently inspired drawings that looked like circles, although often the axis did not match that of the presented sound.

There were also quite a few drawings that looked nothing at all like the intended shapes (see figure 5.2).

Figure 5.2

"Bad" Drawings

In a few cases, the perceived sounds also appear to have been collapsed along the depth and height dimensions, so that a shell was reduced to two points on either side of the head. Since the altitude and depth cues in hearing are less pronounced than the cues for the horizontal plane, this effect is not surprising.

The most prominent effect seemed to be the disruption caused by removing interpolation. Turning the sound on and off produced an irregular pattern that seemed to overwhelm the subjects, dominating their drawings.

Although the instructions explicitly stated that the quality of interest was the shape of the sounds, several of the subjects tended to draw pictures of what made the sounds. Many drawings were pictures of white-noise generating objects: rain, sprinklers, clogged shower-heads, steam-driven machinery, or even sneezes. The irregular patterns produced by the de-interpolation process also appeared in many drawings in the form of breaks or blotches. This suggests that the nature of the sound stimulus may be more important to spatial studies than is generally believed. Most studies of spatial hearing use white noise because of its spectral richness. Perhaps this is not the best procedure, because many "white-noise-like" sounds in our natural environment are ambient sounds like rain. It may be worthwhile to attempt spatial hearing studies with more intrinsically object-like stimuli.

The drawings I found most intriguing were those that were highly metaphorical in content. These described both the shape and the apparent nature of the sounds (see figure 5.3).

Figure 5.3

"Ugly" Drawings

5.2. Pilot Experiment: Virtual Auditory Extensity

While in the process of building the system that would be used in a later series of experiments, I briefly attempted a short experiment examining auditory extensity. I wanted to confirm that the frequency difference used by Perrott & Buell [1982] was valid for the system and stimuli that I was using. I also hoped to observe evidence of vertical auditory extent, which has not been reported in the literature.

5.2.1. Apparatus

The equipment I used is described in detail in chapter 6. As this was only a pilot experiment, I will describe it only briefly here.

I used the Crystal River Convolvotron(TM) to virtually spatialize two sound sources which were generated by a 16-bit digital sound card. The stimuli were presented over headphones.

5.2.2. Subjects

I recruited two subjects from the Human Interface Technology Laboratory. Both reported normal hearing, and had some familiarity with virtual sound.

5.2.3. Stimulus Combinations

In each trial, two sounds were played simultaneously at some frequency difference and an angular offset which was either horizontal or vertical. I used 11 frequency differences, from 0 to 100 Hz at 10 Hz intervals, and 6 angular separations, from 5° to 30° in increments of 5°. There were a total of 264 trials.

The sound sources were 12-partial harmonic complexes with fundamental frequencies ranging from 1000 Hz to 1100 Hz.
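A 12-partial harmonic complex can be sketched as a sum of sinusoids at integer multiples of the fundamental. The text does not give the partial amplitudes, phases, sample rate, or stimulus duration, so the equal amplitudes, sine phase, 44.1 kHz rate, and 0.1 s duration below are all assumptions. Pairing a 1000 Hz source with one at 1000 Hz plus the frequency difference is likewise an assumed reading of the 1000 to 1100 Hz range, and the names `harmonic_complex` and `stimulus_pair` are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 44100  # assumed; the text only specifies a 16-bit sound card
N_PARTIALS = 12


def harmonic_complex(f0_hz, duration_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sum of the first 12 harmonics of f0_hz, equal amplitude, scaled to [-1, 1]."""
    n = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n):
        t = i / sample_rate_hz
        s = sum(math.sin(2.0 * math.pi * k * f0_hz * t)
                for k in range(1, N_PARTIALS + 1))
        samples.append(s / N_PARTIALS)  # dividing by 12 bounds the sum by 1
    return samples


def stimulus_pair(delta_hz, duration_s=0.1):
    """Two complexes whose fundamentals differ by delta_hz (assumed pairing)."""
    return (harmonic_complex(1000.0, duration_s),
            harmonic_complex(1000.0 + delta_hz, duration_s))


# Hypothetical usage: the largest frequency difference used, 100 Hz.
low, high = stimulus_pair(100.0)
```

With a 1100 Hz fundamental, the twelfth partial lies at 13.2 kHz, which stays below the Nyquist limit of the assumed sample rate.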

5.2.4. Procedure

The subject was seated facing an easel that held a large white card. Before each experiment I read the following instructions to the subject:

Imagine if you will that there are one or more sound sources on the white card. These sources may be of varying types and may come from specific points or from extended areas. For each sound you hear, your task is to determine if there are one or two sources, and if the sources are compact, coming from a specific position, or extended, spread out over a region.

In each trial I picked a stimulus combination completely randomized across all variables, and manually set the offset parameters. The subject responded: 1 or 2 sources, and "compact" or "extended".
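One way to build such a randomized trial list is to enumerate every stimulus combination and shuffle the result. This is a sketch under assumptions: the trials were set by hand in the experiment, and shuffling a full list is only one reading of "completely randomized" (independent random draws would be another). The names below are hypothetical. The design has 11 x 6 x 2 = 132 distinct combinations; the reported total of 264 trials is twice that, and since the text does not say how the total was divided, the number of repeats is left as a parameter.

```python
import itertools
import random

FREQ_DIFFS_HZ = list(range(0, 101, 10))     # 0, 10, ..., 100 (11 values)
SEPARATIONS_DEG = list(range(5, 31, 5))     # 5, 10, ..., 30 (6 values)
ORIENTATIONS = ("horizontal", "vertical")


def all_combinations():
    """Every (frequency difference, separation, orientation) combination."""
    return list(itertools.product(FREQ_DIFFS_HZ, SEPARATIONS_DEG, ORIENTATIONS))


def randomized_trials(repeats, rng=random):
    """Shuffled trial list containing each combination `repeats` times."""
    trials = all_combinations() * repeats
    rng.shuffle(trials)
    return trials


print(len(all_combinations()))  # 132

# Hypothetical usage: two passes through the design give 264 trials.
trials = randomized_trials(2, random.Random(0))
```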

5.2.5. Results & Discussion

Unfortunately the data from one of the subjects was extremely inconsistent. This subject gave different responses even when the same stimulus was supplied several times in succession.

The other subject fared somewhat better. Unfortunately, the data is still relatively ambiguous. In the horizontal condition, two distinct sources were generally reported when there was a frequency difference (with a small number of exceptions). I can say little about the vertical source data other than that the subject tended to perceive two sources more often at the middle frequency differences (see figure 5.4).
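The responses in figure 5.4 can be tallied directly. The sketch below transcribes the figure (keyed by frequency difference in Hz, with one response per angular separation from 5° to 30°) and counts the exceptions mentioned above. The function names are hypothetical, and this is a simple count, not a statistical analysis.

```python
# Figure 5.4 responses, keyed by frequency difference in Hz; each tuple lists
# the responses at separations of 5, 10, 15, 20, 25 and 30 degrees.
HORIZONTAL = {
    100: (2, 2, 2, 2, 2, 2), 90: (2, 2, 2, 2, 2, 2), 80: (2, 2, 2, 2, 2, 2),
    70: (2, 2, 2, 2, 2, 2), 60: (2, 2, 2, 1, 2, 1), 50: (2, 1, 2, 2, 2, 2),
    40: (2, 2, 2, 2, 2, 2), 30: (2, 2, 2, 2, 2, 2), 20: (2, 2, 2, 2, 2, 2),
    10: (2, 2, 2, 2, 2, 2), 0: (1, 1, 1, 1, 1, 1),
}
VERTICAL = {
    100: (1, 1, 1, 2, 2, 1), 90: (2, 1, 1, 1, 1, 2), 80: (1, 1, 1, 1, 1, 2),
    70: (1, 2, 1, 1, 1, 2), 60: (1, 1, 2, 1, 1, 1), 50: (2, 1, 2, 2, 1, 2),
    40: (2, 2, 1, 2, 2, 1), 30: (1, 1, 1, 1, 1, 1), 20: (1, 1, 1, 1, 2, 1),
    10: (1, 1, 1, 1, 1, 1), 0: (1, 1, 1, 1, 1, 1),
}


def two_source_counts(table):
    """Number of two-source responses at each frequency difference."""
    return {df: sum(1 for r in row if r == 2) for df, row in table.items()}


def single_source_exceptions(table):
    """One-source responses among trials with a nonzero frequency difference."""
    return sum(1 for df, row in table.items() if df > 0 for r in row if r == 1)


print(single_source_exceptions(HORIZONTAL))  # 3 (out of 60 such trials)
print(two_source_counts(VERTICAL))
```

In the vertical data, the 40 Hz and 50 Hz rows each contain four two-source responses out of six, more than any other row.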

The extent data was noisy as well. There might be agreement between the horizontal extent data and the data measured by Perrott & Buell [1982], in that the 20-50 Hz difference range was somewhat more likely to be judged as extended (see figure 5.5).
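The horizontal extent judgments in figure 5.5 can be tallied in the same spirit. The sketch below transcribes the horizontal panel (each string lists the responses at 5° to 30°) and compares the 20-50 Hz band with the remaining frequency differences. The function name is hypothetical, and with a single subject these counts are descriptive only.

```python
# Figure 5.5, horizontal panel, keyed by frequency difference in Hz.
# C = compact, E = extended; one character per angular separation (5 to 30 degrees).
HORIZONTAL_EXTENT = {
    100: "CCCCCC", 90: "CCCECC", 80: "ECCCCC", 70: "CCCECC", 60: "CCCCCC",
    50: "CCCECE", 40: "CCEECC", 30: "CCECCE", 20: "CEEEEC", 10: "ECCCCC",
    0: "CCCCCC",
}


def extended_count(table, diffs):
    """Return (number of 'extended' responses, number of trials) for the given rows."""
    rows = [table[d] for d in diffs]
    return sum(row.count("E") for row in rows), sum(len(row) for row in rows)


in_band = extended_count(HORIZONTAL_EXTENT, [20, 30, 40, 50])
out_of_band = extended_count(HORIZONTAL_EXTENT, [0, 10, 60, 70, 80, 90, 100])
print(in_band)      # (10, 24)
print(out_of_band)  # (4, 42)
```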

In order to draw any solid conclusions, many more trials would need to be performed, and more subjects would need to be tested.

                  Horizontal                                   Vertical
     100    2    2    2    2    2    2           100    1    1    1    2    2    1
      90    2    2    2    2    2    2            90    2    1    1    1    1    2
      80    2    2    2    2    2    2            80    1    1    1    1    1    2
      70    2    2    2    2    2    2            70    1    2    1    1    1    2
      60    2    2    2    1    2    1            60    1    1    2    1    1    1
      50    2    1    2    2    2    2            50    2    1    2    2    1    2
      40    2    2    2    2    2    2            40    2    2    1    2    2    1
      30    2    2    2    2    2    2            30    1    1    1    1    1    1
      20    2    2    2    2    2    2            20    1    1    1    1    2    1
      10    2    2    2    2    2    2            10    1    1    1    1    1    1
       0    1    1    1    1    1    1             0    1    1    1    1    1    1
            5   10   15   20   25   30                  5   10   15   20   25   30

Rows: frequency difference (Hz). Columns: angular separation (degrees).

Figure 5.4

Number of Perceived Sources

                  Horizontal                                   Vertical
     100    C    C    C    C    C    C           100    C    E    C    C    E    C
      90    C    C    C    E    C    C            90    C    E    C    C    C    C
      80    E    C    C    C    C    C            80    C    C    C    C    C    C
      70    C    C    C    E    C    C            70    C    C    C    C    C    C
      60    C    C    C    C    C    C            60    C    C    C    C    C    C
      50    C    C    C    E    C    E            50    E    E    C    C    E    C
      40    C    C    E    E    C    C            40    C    C    C    C    C    C
      30    C    C    E    C    C    E            30    C    C    C    C    C    C
      20    C    E    E    E    E    C            20    E    E    C    C    E    C
      10    E    C    C    C    C    C            10    C    E    E    C    E    C
       0    C    C    C    C    C    C             0    C    C    C    C    C    C
            5   10   15   20   25   30                  5   10   15   20   25   30

Rows: frequency difference (Hz). Columns: angular separation (degrees).

C=Compact, E=Extended

Figure 5.5

Extent of Perceived Sources