An Exploration of Virtual Auditory Shape Perception



4. Exploration

To enable shape perception in audition, it is necessary to consider competing perceptual factors. The grouping mechanisms in audition sort the complex acoustic environment into discrete objects. As a result, pervasive processes such as streaming and the precedence effect are partially at odds with the idea of spatially extended objects. However, audition also has diverse mechanisms that are sensitive to change, density, and distribution along the various perceptual continua. When sources are smoothly distributed in frequency, Perrott [1984] and von Békésy [1960] have observed that it is possible to merge two sources into one stream while retaining some of the spatial properties of each. When sources are smoothly distributed in time, Ruff & Perret [1976] and Lakatos [1993a] have shown that spatial pattern recognition is possible in audition. What I hoped to show in the course of my exploration was that people can also perform virtual auditory-spatial pattern recognition.

Virtual environment technology provides a fluid mechanism for controlling these perceptual continua. Since auditory shape perception is an almost entirely unexplored domain, and virtual auditory shape perception is even less explored, I elected to examine it by diving in headfirst: I implemented a few likely techniques, and then evaluated their efficacy. In the process, I identified some important issues, and examined a few of them in more detail. This was a pioneering effort. I did not necessarily expect to discover the "answer" (although one can always hope). What I did accomplish was some charting of the territory that will enable future expeditions.

The approach that I have taken in the synthesis of auditory shape perception has been to distribute sounds simultaneously over two perceptual continua: I distribute them in space, obviously, in order to convey the shape information itself, and I distribute them over either time or frequency in order to protect the spatial information from the scourge of grouping phenomena.

To begin my exploration I conducted two preliminary investigations, which I used to try out some ideas and experimental techniques. Next I conducted a screening experiment that helped to ensure I would have subjects who were compatible with the virtual sound system I was using. Finally I ran a series of four experiments exploring two specific types of virtual auditory shape display. These addressed several display techniques and a host of new issues. I will briefly outline these issues and experiments in this chapter before describing each in detail in later chapters.

Figure 4.1. Experiment Space

4.1. Preliminary Investigation: Sketching Sounds

Early in my studies I performed an experiment where I had listeners draw the shape of what they heard. I examined two different presentation techniques this way, and also observed what happens when the mode of response is unconstrained.

4.1.1. Spatial Ambiguity

The first display technique that I examined in this way was "spatial ambiguity". The sound source was moved quickly and randomly within a shaped region. I examined both smooth trajectories and discontinuous motion.
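As a rough illustration of the idea, the sketch below draws random source positions inside a shaped region and, optionally, connects them with straight segments to approximate a smooth trajectory. It is not the procedure used in the investigation: the function names, the azimuth/elevation coordinates in degrees, the rejection sampling, and the disc-shaped example region are all assumptions made for illustration.

```python
import random

def sample_in_region(inside, bounds, n, rng):
    """Draw n random (azimuth, elevation) points that satisfy inside().

    Hypothetical helper: uses rejection sampling over a bounding box,
    so inside() must accept some points within bounds or this never ends.
    """
    (az_min, az_max), (el_min, el_max) = bounds
    points = []
    while len(points) < n:
        az = rng.uniform(az_min, az_max)
        el = rng.uniform(el_min, el_max)
        if inside(az, el):
            points.append((az, el))
    return points

def interpolate_path(waypoints, steps):
    """Connect successive waypoints with straight segments of `steps` points.

    The interpolated points are guaranteed to stay inside the region
    only if the region is convex (an assumption of this sketch).
    """
    path = []
    for (a0, e0), (a1, e1) in zip(waypoints, waypoints[1:]):
        for k in range(steps):
            t = k / steps
            path.append((a0 + t * (a1 - a0), e0 + t * (e1 - e0)))
    path.append(waypoints[-1])
    return path

def in_disc(az, el, radius_deg=20.0):
    """Example region: a disc centred straight ahead (assumed shape)."""
    return az * az + el * el <= radius_deg * radius_deg

if __name__ == "__main__":
    rng = random.Random(1)
    box = ((-20.0, 20.0), (-20.0, 20.0))
    jumps = sample_in_region(in_disc, box, 8, rng)   # discontinuous motion
    smooth = interpolate_path(jumps, 10)             # smooth trajectory
    print(len(jumps), len(smooth))
```

Playing the sampled points directly corresponds to discontinuous motion, while the interpolated path corresponds to a smooth trajectory through the same region.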

4.1.2. Simple Smooth Paths

The second technique that I tested with drawings was the use of simple smooth paths for sound sources. Since other experimenters had tried using auditory apparent motion, I was curious to see what would happen with smoother, more continuous motion.

4.2. Pilot Study: Virtual Auditory Extent

This was an informal test that served as a pilot experiment and a guide to the design of the final experimental series. In this test I briefly examined the trade-off between spatial separation and frequency separation in auditory extent. I also attempted, without success, to observe vertical extents.

4.2.1. Vertical Extents

In the literature on extended audio images, only horizontal extent has been observed. Vertical extents will be necessary in order to build complex shapes.

Perhaps the reason that vertical extent has not been observed is that only sine tones and broadband noise were used. Sine tones are extremely ambiguous in vertical localization. White noise, although good for localization, does not reside on the frequency continuum. Complex tones are more localizable than sine tones and, unlike noise, have fundamental frequencies. The use of complex tones might therefore enhance horizontal extent and enable vertical extent.

4.3. Experiment Series

Using the information that I gathered from the previous experiments, I designed a series of more formal experiments that I ran in two sessions: first a screening experiment with 21 subjects, and then a series of four experiments run in one sitting using seven of those subjects. The apparatus and system I used are described in chapter 6.

4.3.1. Screening Experiment

I would be asking my subjects to perform a task that is known to be quite difficult, using a virtual sound system that was not tailored to them, so I decided it would be necessary to fit my subjects to the system. I ran a screening experiment that weeded out the subjects to whom the system did not provide adequate spatial information.

The largest errors inherent to virtual auditory displays take the form of front-to-back and top-to-bottom confusions. The frequency of these errors appears to vary greatly from person to person. Since the experiments in this series were situated in the forward hemisphere, front-to-back confusions would not be much of an issue. Vertical confusions, however, could present a major problem for shape recognition experiments. Accordingly, I chose the frequency of vertical confusions as the metric for the match between the subjects and the virtual spatialization system. Subjects who exhibited a vertical confusion frequency greater than 30% were not used in later experiments.
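The screening criterion can be sketched as a simple rate computation. Only the 30% cutoff comes from the text; the scoring rule below (a response counts as a vertical confusion when it falls on the opposite side of the horizontal plane from the target), the trial format, and the function names are assumptions made for illustration.

```python
VERTICAL_CONFUSION_CUTOFF = 0.30  # from the text: above 30% means excluded

def vertical_confusion_rate(trials):
    """Fraction of trials scored as top-to-bottom confusions.

    trials: iterable of (target_elevation_deg, response_elevation_deg).
    Assumed scoring rule: a confusion is a response whose elevation has
    the opposite sign to the target's. Targets at 0 degrees are skipped
    because they have no "opposite side".
    """
    scored = [(t, r) for t, r in trials if t != 0]
    if not scored:
        raise ValueError("no trials with a non-zero target elevation")
    confusions = sum(1 for t, r in scored if t * r < 0)
    return confusions / len(scored)

def passes_screening(trials, cutoff=VERTICAL_CONFUSION_CUTOFF):
    """True if the subject's confusion rate does not exceed the cutoff."""
    return vertical_confusion_rate(trials) <= cutoff

if __name__ == "__main__":
    # Hypothetical data: one confusion in four scored trials.
    example = [(30, 25), (30, -20), (-30, -35), (-30, -10)]
    print(vertical_confusion_rate(example), passes_screening(example))
```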

4.3.2. Virtual Concurrent Acuity

To ascertain the level of detail possible with a simultaneous shape display technique (the virtual auditory field display), I would need to know subjects' acuity with concurrently presented sources. In this experiment I attempted to measure the horizontal and vertical concurrent minimum audible angles for the stimuli that I would use in the shape displays.

4.3.2.1. Concurrent Vertical Acuity

To my knowledge, measurements of concurrent vertical acuity have not been published. The published studies of concurrent acuity have generally used sinusoidal stimuli, which are a poor choice for vertical localization. The use of non-sinusoidal stimuli may make this measurement possible.

4.3.3. Virtual Auditory Vector Display

I conducted two experiments on the presentation of auditory shape information by sequential positioning of sounds using a "virtual speaker array". I call these "virtual auditory vector displays" because they are analogous to vector graphic displays.

Speaker arrays have been successfully used in auditory shape identification experiments, but never before have the experiments used virtual sound sources.
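To make the vector display analogy concrete, the sketch below turns a shape, given as an ordered list of positions on a virtual speaker grid, into a timed sequence of source directions. Everything specific here is hypothetical: the 5 x 5 grid, the 10 degree spacing, the 0.2 s onset interval, and the function names are illustrative choices, not the parameters of the experiments.

```python
def grid_position(row, col, rows=5, cols=5, spacing_deg=10.0):
    """Map a grid index to (azimuth, elevation) in degrees.

    The grid is centred straight ahead; row 0 is the top row and
    column 0 is the leftmost column (assumed layout).
    """
    azimuth = (col - (cols - 1) / 2) * spacing_deg
    elevation = ((rows - 1) / 2 - row) * spacing_deg
    return azimuth, elevation

def vector_schedule(path, onset_interval_s=0.2, **grid):
    """Return a list of (onset_time_s, azimuth_deg, elevation_deg).

    path: ordered (row, col) grid positions tracing the shape, visited
    one after another like the beam of a vector graphics display.
    """
    return [
        (i * onset_interval_s, *grid_position(row, col, **grid))
        for i, (row, col) in enumerate(path)
    ]

if __name__ == "__main__":
    # Hypothetical shape: a right angle traced across the top row,
    # then down the right-hand column.
    corner = [(0, 0), (0, 2), (0, 4), (2, 4), (4, 4)]
    for onset, az, el in vector_schedule(corner):
        print(f"{onset:.1f} s  az={az:+.0f}  el={el:+.0f}")
```

Each scheduled entry would then be handed to the spatialization system, which renders one sound at that direction and onset time.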

4.3.3.1. Auditory Shape Perception vs. Auditory Driven Visualization

The speaker matrix experiments of Ruff & Perret [1976] and, later, of Lakatos [1993] occurred on significant time scales and were quite difficult. The Audio Raster display of Karr & Furness required considerable effort from listeners to recognize the letters it displayed. These factors raise the question of whether shapes displayed in this way are an example of auditory shape perception or of some cognitively mediated "auditory driven visualization". I examine this question in the virtual auditory vector experiments.

4.3.3.2. Virtual Validation

Virtual audio spatialization techniques are far from perfect. One issue that I believe is critical is whether virtual audio technology is appropriate for testing auditory shape perception. I also address this issue in these experiments.

4.3.4. Auditory Field Display

In their observations of stereophonic arrays, Perrott and von Békésy used sine tones and white noise. These were successful for horizontal extents, but largely ineffective for vertical extents.