An Exploration of Virtual Auditory Shape Perception
This was my first attempt at auditory shape display, made before I had done most of the background research. As such, I made some naïve assumptions about auditory shape. Nonetheless, some observations from this exploration are worth relating. The following account is adapted from previous work.
In the process of developing the software for this experiment, several issues were identified as worthy of future study. The Convolvotron(TM) is capable of updating the position of a sound source up to fifty times per second. There is also an interpolation effect generated by cross-fading between successive positions. While this update rate is generally more than acceptable for localization purposes, it is not at all clear that this rate is fast enough to adequately define a volume.
To get some sense of the problem, I made visual representations of the sound source's position. I noticed that, even in the visual modality, it was difficult to recognize the shape of a volume with fewer than three thousand points. At fifty positions per second, a full minute would be required to visit that number of points. The graphical representation displayed all of the points simultaneously, which would not be the case for the auditory stimuli. The listener would need to rely on memory, or on some degree of "persistence of hearing," to build up an image. I have not found any references to persistence of hearing in the literature.
Because volumes seemed to be a problem, some of the sound shapes were presented as shells instead of volumes. Visual inspection of shells indicated that only about three seconds' worth of points (150) might suffice.
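The timing arithmetic above can be checked with a minimal sketch. The function name and constant are hypothetical, introduced only for illustration; the fifty-updates-per-second figure and the point counts come from the text.

```python
# Maximum position updates per second, as stated for the Convolvotron above.
UPDATE_RATE_HZ = 50


def presentation_time_s(num_points, rate_hz=UPDATE_RATE_HZ):
    """Seconds needed to visit num_points positions at rate_hz updates per second."""
    return num_points / rate_hz


volume_time = presentation_time_s(3000)  # points needed for a recognizable volume
shell_time = presentation_time_s(150)    # points that seemed adequate for a shell

print(f"volume: {volume_time:.0f} s, shell: {shell_time:.0f} s")
```

The twenty-fold difference in point count translates directly into a twenty-fold difference in the time a listener would have to integrate over.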
Paths and Spatial Ambiguities
# Description Interpolation
0 Tall Ellipsoidal Shell, Major Axis 6", Minor Axes 3" No
1 Tall (z-axis) Cylindrical Shell, R = 0.75", Length = 6" No
2 Oblate Spherical Shell, R = 3", Compressed Vertically By 1/2 No
3 Spherical Shell, R = 3" No
4 Cubic Volume, 6"x6"x6" No
5 Half Height Cubic Volume, 6"x6"x3" No
6 Square, Facing Front/Back, (Normal to y-axis), 6"x6" No
7 Circular Path, Axis Vertical, R=3" Yes
8 Spherical Shell, R = 3" Yes
9 Square, Facing Up/Down, (Normal to z-axis), 6"x6" No
10 Same as #3 No
11 Wide (x-axis) Cylindrical Shell, R = 0.75", Length = 6" No
12 Deep (y-axis) Cylindrical Shell, R = 0.75", Length = 6" No
13 Edge-on Square, (Normal to x-axis), 6"x6" No
14 Tall (z-axis) Cylindrical Shell, R = 0.75", Length = 6" Yes
15 Cubic Volume, 6"x6"x6" Yes
16 Square, Facing Front/Back, (Normal to y-axis), 6"x6" Yes
17 Square, Horizontal, (Normal to z-axis), 6"x6" Yes
18 Same as #8 Yes
19 Point No
20 Deep (y-axis) Cylindrical Shell, R = 0.75", Length = 6" Yes
21 Square, Edge-On, (Normal to x-axis), 6"x6" Yes
22 Horizontal Loop, (Normal to z-axis), R = 3" Yes
23 Edge-On Loop, (Normal to x-axis), R = 3" Yes
24 Point Yes
25 Wide (x-axis) Cylindrical Shell, R = 0.75", Length = 6" Yes
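The shell and volume stimuli listed above can be thought of as sets of points visited one position at a time. How the original points were generated is not described here, so the following is only a sketch under assumed methods: uniform random sampling on a sphere (by normalizing Gaussian vectors) and uniform random sampling inside a cube. The function names and the use of random rather than regular sampling are assumptions; dimensions are in inches, matching the list.

```python
import math
import random


def spherical_shell_points(radius, count, seed=0):
    """Return `count` points uniformly distributed on a sphere of given radius."""
    rng = random.Random(seed)
    points = []
    while len(points) < count:
        # A normalized 3-D Gaussian vector is uniformly distributed in direction.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            continue  # extremely unlikely, but avoids division by zero
        scale = radius / norm
        points.append((x * scale, y * scale, z * scale))
    return points


def cubic_volume_points(side, count, seed=0):
    """Return `count` points uniformly distributed inside a centered cube."""
    rng = random.Random(seed)
    half = side / 2.0
    return [tuple(rng.uniform(-half, half) for _ in range(3)) for _ in range(count)]


shell = spherical_shell_points(radius=3.0, count=150)   # e.g. a 3" spherical shell
volume = cubic_volume_points(side=6.0, count=3000)      # e.g. a 6" cubic volume
```

Each list would then be stepped through at the device's update rate, one position per update.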
As stated before, the Convolvotron(TM) interpolates between positions. The effect is similar to connecting successive points by lines. In the visual domain I found the lines confused the graphical image of the shape. In an attempt to alleviate that potential problem, I set some of the stimuli such that there was no interpolation by turning the sound off between successive points. This "fix" unfortunately came at the expense of update rate, because turning the sound off requires a full cycle.
Another question suggested by these considerations is this: Is there any correlation at all between what is confusing to the eyes, and what is confusing to the ears?
The subjects were prompted by the computer with the following on-screen message:
"I am going to play some sounds behind your head.
These sounds are special in that they have shapes.
Please draw a picture of each sound and label the
picture with the number that is given.
You may spend as much time on each sound as you like.
There will be 26 sounds."
The sounds were presented as if from behind the subjects in order to minimize front-back confusion, as invisible sound sources often default to the back [Wenzel et al. 1993]. The sounds were presented in a random order. I allowed the subjects to listen to each shape as long as they desired. They hit a key on the keyboard to start and stop each sound.
Subjects drew their pictures on sheets of paper with boxes for the drawings.
As one might expect, there was a great variety of pictures drawn. Some of these pictures fit the theoretical shapes extremely well (see figure 5.1).
Figure 5.1
"Good" Drawings
The smooth paths frequently inspired drawings that looked like circles, although often the axis did not match that of the presented sound.
There were also quite a few drawings that looked nothing at all like the intended shapes (see figure 5.2).
Figure 5.2
"Bad" Drawings
In a few cases, the perceived sounds appear to have been collapsed along the depth and height dimensions, so that a shell was reduced to two points on either side of the head. Since the altitude and depth cues in hearing are less pronounced than the cues for the horizontal plane, this effect is not surprising.
The most prominent effect seemed to be the disruption caused by the removal of interpolation. Turning the sound on and off produced an irregular pattern that seemed to overwhelm the subjects, dominating their drawings.
Although the instructions explicitly stated that the quality of interest was the shape of the sounds, several of the subjects tended to draw pictures of what made the sounds. Many drawings were pictures of white-noise generating objects: rain, sprinklers, clogged shower-heads, steam-driven machinery, or even sneezes. The irregular patterns produced by the de-interpolation process also appeared in many drawings in the form of breaks or blotches. This suggests that the nature of the sound stimulus may be more important to spatial studies than is generally believed. Most studies of spatial hearing use white noise because of its spectral richness. Perhaps this is not the best procedure, because many "white-noise-like" sounds in our natural environment are ambient sounds like rain. It may be worthwhile to attempt spatial hearing studies with more intrinsically object-like stimuli.
The drawings I found most intriguing were those that were highly metaphorical in content. These described both the shape and the apparent nature of the sounds (see figure 5.3).
Figure 5.3
"Ugly" Drawings
I used the Crystal River Convolvotron(TM) to virtually spatialize two sound sources which were generated by a 16-bit digital sound card. The stimuli were presented over headphones.
The sound sources were 12-partial harmonic complexes with fundamental frequencies ranging from 1000 Hz to 1100 Hz.
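A 12-partial harmonic complex of this kind can be sketched as a sum of sinusoids at integer multiples of the fundamental. The text does not give the partial amplitudes, phases, or sample rate, so equal-amplitude sine-phase partials and a 44100 Hz sample rate are assumptions here, and the function names are hypothetical. The 16-bit conversion mirrors the sound card's stated resolution.

```python
import math


def harmonic_complex(f0_hz, num_partials=12, duration_s=0.01, sample_rate=44100):
    """Sum of equal-amplitude sine partials at f0, 2*f0, ..., num_partials*f0.

    Equal amplitudes, zero starting phase, and the sample rate are assumptions.
    The result is scaled so that every sample lies within [-1, 1].
    """
    num_samples = round(duration_s * sample_rate)
    samples = []
    for i in range(num_samples):
        t = i / sample_rate
        value = sum(math.sin(2.0 * math.pi * f0_hz * k * t)
                    for k in range(1, num_partials + 1))
        samples.append(value / num_partials)
    return samples


def to_int16(samples):
    """Quantize floating-point samples in [-1, 1] to 16-bit signed integers."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]


# Two sources at the extremes of the stated fundamental range.
source_a = to_int16(harmonic_complex(1000.0))
source_b = to_int16(harmonic_complex(1100.0))
```

With a 1100 Hz fundamental the highest partial sits at 13200 Hz, comfortably below half of the assumed sample rate.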
Imagine if you will that there are one or more sound sources on the white card. These sources may be of varying types and may come from specific points or from extended areas. For each sound you hear, your task is to determine if there are one or two sources, and if the sources are compact, coming from a specific position, or extended, spread out over a region.
In each trial I picked a stimulus combination completely randomized across all variables, and manually set the offset parameters. The subject responded: 1 or 2 sources, and "compact" or "extended".
The other subject fared somewhat better. Unfortunately, the data is still relatively ambiguous. In the horizontal condition, two distinct sources were generally reported whenever there was a frequency difference (with a small number of exceptions). I can say little about the vertical source data other than that the subject tended to perceive two sources more often at the middle frequency differences (see figure 5.4).
The extent data was noisy as well. There may be some agreement between the horizontal extent data and the data measured by Perrott & Buell [1982], in that the 20-50 Hz difference range was somewhat more likely to be judged as extended.
In order to draw any solid conclusions, many more trials would need to be performed, and more subjects would need to be tested.
Horizontal Vertical
100 2 2 2 2 2 2 100 1 1 1 2 2 1
90 2 2 2 2 2 2 90 2 1 1 1 1 2
80 2 2 2 2 2 2 80 1 1 1 1 1 2
70 2 2 2 2 2 2 70 1 2 1 1 1 2
60 2 2 2 1 2 1 60 1 1 2 1 1 1
50 2 1 2 2 2 2 50 2 1 2 2 1 2
40 2 2 2 2 2 2 40 2 2 1 2 2 1
30 2 2 2 2 2 2 30 1 1 1 1 1 1
20 2 2 2 2 2 2 20 1 1 1 1 2 1
10 2 2 2 2 2 2 10 1 1 1 1 1 1
0 1 1 1 1 1 1 0 1 1 1 1 1 1
5 10 15 20 25 30 5 10 15 20 25 30
Figure 5.4
Number of Perceived Sources
Horizontal Vertical
100 C C C C C C 100 C E C C E C
90 C C C E C C 90 C E C C C C
80 E C C C C C 80 C C C C C C
70 C C C E C C 70 C C C C C C
60 C C C C C C 60 C C C C C C
50 C C C E C E 50 E E C C E C
40 C C E E C C 40 C C C C C C
30 C C E C C E 30 C C C C C C
20 C E E E E C 20 E E C C E C
10 E C C C C C 10 C E E C E C
0 C C C C C C 0 C C C C C C
5 10 15 20 25 30 5 10 15 20 25 30
C=Compact, E=Extended
Figure 5.5
Extent of Perceived Sources
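As a rough aid to reading figure 5.4, the responses can be tallied per row. The sketch below transcribes the figure's values and counts how many of the six responses in each row reported two sources. It assumes that the row labels (0 to 100) are frequency differences in Hz, which the discussion of the figure suggests but the figure itself does not state, and that of the two rows labeled 60 the first belongs to the horizontal panel and the second to the vertical panel. The names are hypothetical.

```python
# Responses from figure 5.4, keyed by row label (assumed: frequency difference in Hz).
HORIZONTAL = {
    100: [2, 2, 2, 2, 2, 2], 90: [2, 2, 2, 2, 2, 2], 80: [2, 2, 2, 2, 2, 2],
    70: [2, 2, 2, 2, 2, 2], 60: [2, 2, 2, 1, 2, 1], 50: [2, 1, 2, 2, 2, 2],
    40: [2, 2, 2, 2, 2, 2], 30: [2, 2, 2, 2, 2, 2], 20: [2, 2, 2, 2, 2, 2],
    10: [2, 2, 2, 2, 2, 2], 0: [1, 1, 1, 1, 1, 1],
}
VERTICAL = {
    100: [1, 1, 1, 2, 2, 1], 90: [2, 1, 1, 1, 1, 2], 80: [1, 1, 1, 1, 1, 2],
    70: [1, 2, 1, 1, 1, 2], 60: [1, 1, 2, 1, 1, 1], 50: [2, 1, 2, 2, 1, 2],
    40: [2, 2, 1, 2, 2, 1], 30: [1, 1, 1, 1, 1, 1], 20: [1, 1, 1, 1, 2, 1],
    10: [1, 1, 1, 1, 1, 1], 0: [1, 1, 1, 1, 1, 1],
}


def two_source_counts(grid):
    """Number of 'two sources' responses in each row of a response grid."""
    return {row_label: responses.count(2) for row_label, responses in grid.items()}


for name, grid in (("horizontal", HORIZONTAL), ("vertical", VERTICAL)):
    counts = two_source_counts(grid)
    print(name, [(label, counts[label]) for label in sorted(counts)])
```

Under those assumptions, the tally agrees with the description: the horizontal panel reports two sources in nearly every nonzero row, while the vertical panel's two-source responses cluster in the 40 and 50 rows.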