The Influence of Whole-Body Interaction on Wayfinding in Virtual Reality
by Barry Peterson

Chapter 2
Background Review of the Literature

The relevant literature is found in three major fields: human spatial navigation, virtual environment wayfinding, and the defining characteristics of virtual reality itself. Research in human navigation has an extensive history, while the birth of virtual reality and the consequent investigation into virtual wayfinding have emerged from the technological advances of the past decade.

2.1 Human Wayfinding in the Real-World

2.1.1 Components of Spatial Knowledge

When we move about in the real world, most of us tend to pay little special attention to the mental processes that enable our navigation. For decades psychologists have studied the cognitive processing necessary to understand this developed skill. Typically, the development of spatial skills is described as a progression from initial declarative knowledge structures, through the formation of proceduralized structures, to more configurational representations (Golledge, 1991; Lindberg & Gärling, 1983). Most researchers in the field have labeled these representations respectively landmark, route and survey knowledge (Thorndyke & Hayes-Roth, 1982).

2.1.2 Landmark Knowledge

During our initial exposure to a new space, we learn to recognize landmarks or salient features in the environment (Golledge, 1991). Properties such as the texture, shape and orientation of certain objects are stored in declarative knowledge structures, allowing us to access this knowledge (Bliss, Tidwell, & Guest, 1997; Tlauka & Wilson, 1994). For example, when arriving at a new college campus, students may learn how to identify the library, administrative and important classroom buildings.

As our experience in the new environment increases, we may learn how to identify these landmarks from new perspectives, essentially building our ability to mentally rotate them to visualize how we expect them to look from different viewpoints. However, their initial formation is linked to the perspective with which we are most familiar.

2.1.3 Route Knowledge

Declarative landmark knowledge becomes increasingly valuable as we learn to spatially relate individual landmarks to others in the environment. In so doing, we construct distance and orientation relationships that enable us to identify routes connecting landmarks. In essence, we proceduralize and build upon the declarative knowledge as we learn, thereby forming new knowledge structures in Stimulus-Response pairings or Event-Action formats (Golledge, 1991; Thorndyke & Hayes-Roth, 1982). Using our college campus example, suppose a student wants to navigate from the library to a classroom. The procedure she uses will likely be in the form, "From the library exit, I will turn left and walk two-hundred meters to Stevens Way, cross the street, turn right and walk fifty more meters to the entrance to Loew Hall."

Since navigation is usually purposeful, route knowledge is probably more valuable than landmark knowledge because it helps us accomplish desired tasks. Key features of route knowledge representations are: 1) they are learned in the context of accomplishing specific tasks (e.g. getting from the library to the classroom); 2) they are represented from the egocentric perspective (left and right turns are learned with respect to the body’s orientation and direction of travel); and 3) they are perspective-dependent, meaning that they are most useful when employed from the same viewing perspective as they are learned (usually from the ground-plane for pedestrian travel). Finally, when we are faced with the task of finding alternative routes to destinations, we rely on informal algebraic and geometric computations, based upon the directional changes and distances that describe the known routes.

2.1.4 Survey Knowledge

As we continue to gain familiarity with an environment, we can develop a more flexible, configurational representation of it (Golledge, 1991). This new structure spatially relates landmarks independently of the routes that connect them, converting the mathematical, route-defined representations to more globally-defined relationships, based upon a world coordinate system (Golledge, 1991; Lindberg & Gärling, 1983). Breaking the route knowledge dependencies on ground-based and egocentric perspectives, survey representations of a space are typically described from a "bird’s eye" viewpoint, as if the person builds a cognitive map of the environment. Rather than structuring the relationship between the college library and classroom in terms of the connecting legs of the route between them, the student may regard the spatial relationship between the buildings as, "The classroom is located about two-hundred meters, as a crow flies, to the southwest of the library." Since the knowledge is referenced by world coordinates, the ability to use survey knowledge is often referred to as one’s sense of direction.
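
The conversion from a route-defined relationship to a world-referenced one can be illustrated as a simple path integration over the legs of a known route. The Python sketch below is only an illustration under stated assumptions and is not a model of human cognition: the function name integrate_route, the compass-heading convention (0 degrees is north, angles increase clockwise), and the assumption that the student leaves the library facing west are hypothetical choices made for the example. The street crossing is ignored.

```python
import math


def integrate_route(start_heading_deg, legs):
    """Accumulate route legs into a straight-line distance and compass bearing.

    start_heading_deg: assumed initial facing (0 = north, 90 = east).
    legs: list of (turn_deg, distance_m) pairs; a positive turn is to the
          right (clockwise) and is applied before walking the distance.
    Returns (distance_m, bearing_deg) from the start to the end point.
    """
    heading = start_heading_deg
    east = 0.0
    north = 0.0
    for turn_deg, distance_m in legs:
        # Egocentric turn updates the world-referenced heading.
        heading = (heading + turn_deg) % 360.0
        east += distance_m * math.sin(math.radians(heading))
        north += distance_m * math.cos(math.radians(heading))
    distance = math.hypot(east, north)
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    return distance, bearing


# Hypothetical encoding of the library-to-classroom route:
# turn left, walk 200 m, turn right, walk 50 m.
distance_m, bearing_deg = integrate_route(270.0, [(-90.0, 200.0), (90.0, 50.0)])
```

Under the assumed initial heading, the two legs reduce to a straight-line distance of about 206 meters at a bearing of about 194 degrees, that is, somewhat west of due south. The egocentric "left" and "right" of the route description disappear from the result, which is expressed only in world coordinates.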

This kind of representation can be built through two primary methods, differentiated by the perspective used during learning (Darken, 1996; Golledge, 1991). The first method occurs when spatial relationships are learned through map study, where the viewpoint is not ground-based but from an altitude above the environment. The second method is described by the continued exploration and navigation of the space from the pedestrian’s viewpoint. While both methods result in survey representations, when employed to serve wayfinding tasks, the latter method results in more robust, usable knowledge (Tlauka & Wilson, 1996). This is the method that I will refer to in the remainder of this thesis.

Conceptually, survey knowledge is modeled from the bird’s-eye perspective in the form of a cognitive map. However, more practical descriptions involve a sensation of transparency of worldly obstacles (Golledge, 1991). In this representation, spatial relationships are described with respect to the egocentric, ground-based perspective as with route knowledge. The difference is that the person can sense and communicate the direction of landmarks as if they could see through intervening buildings and obstacles.

2.1.5 Survey Knowledge Development is Desirable in Certain Cases

Oftentimes, it seems that the development of survey knowledge is not required for satisfactory and completely efficient travel; route knowledge suffices. Indeed, when the person is travelling in familiar territory, and the learned routes are always accessible, route knowledge may be highly appropriate, making the development of survey representations both less valuable and less likely.

However, because survey knowledge is more flexible than route knowledge, in that its employment is not as rigidly perspective-based, it can be more valuable for certain wayfinding tasks. The practical value of survey representations is evident when the wayfinding task requires the person to find alternative routes through familiar territory, to find primary routes through unfamiliar territory, or to optimize routes through both familiar and unfamiliar territory. In essence, survey representations facilitate spatial inferences that can be quite useful during wayfinding through large spaces (Infield, 1991). A large space is one whose content cannot be viewed from a single viewpoint, while the content of a small space can be. All of the spaces referred to in this thesis are large.

Moreover, learned routes certainly vary in complexity, and as this complexity increases so does the mental effort required to utilize the route knowledge. Route complexity increases as each of the following factors increases: the length of each component leg; the magnitudes of the directional changes required at each decision point; the quantity of connecting route legs; and the cumulative directional changes required along the route (Bliss, et al., 1997). As the route’s mental effort requirement increases, a survey representation of the same space can be more useful.
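
The four factors listed above can be read directly off a route description. The following Python sketch merely tabulates them; the Leg class, the route_descriptors function, and the dictionary keys are hypothetical names introduced for illustration. The descriptors are deliberately not combined into a single complexity score, because the literature cited here does not specify how the factors should be weighted.

```python
from dataclasses import dataclass


@dataclass
class Leg:
    turn_deg: float   # signed directional change at the decision point before the leg
    length_m: float   # length of the leg


def route_descriptors(legs):
    """Tabulate the route properties named in the text for a list of Leg objects."""
    turn_magnitudes = [abs(leg.turn_deg) for leg in legs]
    leg_lengths = [leg.length_m for leg in legs]
    return {
        "leg_lengths_m": leg_lengths,                    # length of each component leg
        "turn_magnitudes_deg": turn_magnitudes,          # change at each decision point
        "leg_count": len(legs),                          # quantity of connecting legs
        "cumulative_turn_deg": sum(turn_magnitudes),     # total directional change
        "max_turn_deg": max(turn_magnitudes, default=0.0),
    }


# Hypothetical two-leg route: turn left, walk 200 m, turn right, walk 50 m.
example = route_descriptors([Leg(-90.0, 200.0), Leg(90.0, 50.0)])
```

An experimenter could use such a tabulation to check that routes intended to differ in only one factor, for example the number of legs, are matched on the others.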

So, in essence, the relative value of survey representations compared to route representations depends upon many factors; still, in many cases, survey development is desirable and will not detract from the use of route representations. The development of survey representations is most worthwhile and therefore likely under the following circumstances: 1) the learned routes are very complex, 2) the learned routes are blocked, and 3) the learned routes are sub-optimal.

2.1.6 The Development of Spatial Representation is Effortful

People do not automatically acquire survey knowledge. Researchers have found that even after years of daily navigation through environments, individuals may not develop survey representations (Moeser, 1988). In cases when the learned routes are open, optimal and simple, there is likely little need for the development of a survey representation.

Still, even in those cases where a survey representation would prove beneficial, it may never develop. One possible explanation asserts that its development is not an automatic process. Rather, it requires mental effort to convert the egocentric representations of route knowledge to either an exocentric, above-ground representation or an egocentric, "translucent" representation (Golledge, 1991; Lindberg & Gärling, 1983; Wickens & Baker, 1995). The ability to make such conversions varies across individuals. Even for an individual who may possess this ability, survey representations may not be formed without the expenditure of sufficient mental effort.

Table 1 summarizes the differences in representation and requisite experience of the three types of spatial knowledge.

2.1.7 Sensory Integration During Navigation

As described previously, the information used to build both route and survey knowledge through map study is primarily visual. However, in everyday navigation through the world, we naturally integrate incoming information from senses other than vision (Presson & Montello, 1994). Vestibular stimulation provides information about changes in directional facing and rate of travel. Likewise, the kinesthetic/proprioceptive modality provides information about level of exertion, distance traveled and body orientation. So, though we are usually not conscious of the source of information, we naturally integrate information from these different modalities (Tlauka & Wilson, 1994).

This consideration of the sensory information sources we use when navigating reveals one more aspect of real-world navigation that, again, we typically take for granted: maneuvering. Equally important to our acquisition of spatial knowledge may be our ability to maneuver our bodies while we explore real places. The visual, auditory, vestibular, and kinesthetic modalities work in concert to provide the necessary feedback to help us maintain desired body orientation, body position, and postural stability with respect to objects and landmarks in the world. However, because these processes are developed by humans at such young ages, as adults we maneuver our bodies under control of automatic processing. As we examine virtual wayfinding, we will find that maneuvering our virtual bodies may require more attention and mental resources than in the physical world.

2.2 Wayfinding in the Virtual World

While interacting in virtual environments, one’s ability to maintain spatial awareness and travel purposefully between locations is both implicit to and fundamental for successful task performance, regardless of the other inherent task requirements (Satalich, 1995; Slater, Usoh, & Steed, 1995; Wickens & Baker, 1995). Although maneuvering and navigation represent core interactions, today’s virtual explorers are usually forced to grapple with techniques and interfaces that make these interactions largely non-intuitive and oftentimes frustrating (Darken, 1996). The term "wayfinding" involves both the navigation and maneuver skills required for purposeful virtual travel. Typically, the input and output properties of the human-computer interface limit successful wayfinding in two basic ways: first, the relatively low quality of the sensory feedback provided by the interface’s outputs (e.g. low resolution visual scenes) is not as rich or complete as that which we rely upon in reality, thereby disrupting the natural processes we use for navigation; second, the interface requires the human to learn and use interaction techniques that are less natural than those we have already developed from birth by maneuvering in the real world.

Although Virtual Reality was born with these limitations, its optimistic parents have held a vision of the infant growing with technological advances into a mature, limitless medium complete with the interfaces that provide sensorially rich environments and natural, intuitive interaction techniques. Across the industry, a relatively large chunk of research resources is devoted to investigating virtual wayfinding. For the most part researchers pursue one of two major research agendas. First, making use of the currently available interfaces, many projects have focused on the assessment of the training transfer of spatial knowledge acquired virtually to performance in the real world. Experiments performed by Regian, Shebilske, and Monk (1992), and Ruddle, Payne, and Jones (1997) are representative of this agenda. Second, some experiments, such as those conducted by Darken (1996) and Satalich (1995), have investigated the merit of introducing different enhancements to the interfaces for improving wayfinding success. Common enhancements are the access to a virtual map of the world, the overlay of orthogonal gridlines onto the virtual ground-plane, and the display of directional arrows in the virtual space. Two plausible reasons may explain why most such enhancements are mediated through the visual modality: 1) of all the human sense modalities, the visual is stimulated most easily and vividly by the current interfaces; and 2) the visual sense is the dominant and most useful one in both real-world and virtual wayfinding.

2.2.1 Experimental Methods

These two dominant research agendas -- transfer of training and visual enhancement -- tend to make use of experimental conditions that share a set of common characteristics. Most experiments use interfaces that are widely available in most Virtual Reality laboratories for two reasons. Such an approach increases the generalizability of experimental results, and the cost of developing unique interface devices can be high. Because the behavior of interest is the aggregated, holistic ability to wayfind, most experimental tasks and performance measures are selected based upon their sensitivity to demonstrated composite performance rather than their sensitivity to the individual contributions of maneuvering, landmark, route, and survey knowledge. For various experimental and practical reasons, the spatial layouts of the experimental virtual environments are relatively simple. Environments, such as those used in experiments conducted by Ruddle, Randall, Payne, and Jones (1996) and Regian et al. (1995), usually model building interiors, where hallways, walls and doors naturally channel travel; intersections are often located so that paths cross one another perpendicularly, so that turns are ninety degrees. These experimental interfaces commonly stimulate the visual, and may stimulate the auditory senses; whereas, mainly due to technological constraints, stimulation of kinesthetic, haptic or vestibular senses is rare.

Although most of the research programs use standard virtual interface devices, there is little standardization of experimental methods. The user’s experience in virtual environments varies greatly depending upon whether the system is desktop or immersive. Ruddle, Randall, et al. differentiate the two experiences based upon the display and devices used:

Desk-top VE’s typically use an abstract interface (e.g., mouse and keyboard), where changes in view direction are produced by movements of the mouse. When using an HMD a user’s physical head and body movements are recorded by tracking devices (e.g. magnetic sensors) and, therefore, users are provided with kinaesthetic feedback for changes in their view direction (1996).

However, the distinction between the two is not always clearly specified by the individual researchers. Equally nebulous is the appropriateness of generalizing results found in a desktop study to an immersive condition. Even within these broad categories, interaction techniques vary widely and are largely not accounted for by the literature.

2.2.2 Performance Measures

Two categories of performance measures have been adopted by most researchers within the field. For the assessment of route knowledge, many experiments, including recent work by Bliss, et al. (1997) and Ruddle, Randall, et al. (1996), use route replication tasks. The protocol and evaluation for these tasks are implemented in standardized ways. After a period of route learning conducted in the virtual environment, participants are instructed to repeat the prescribed route as accurately and quickly as possible. Depending upon the design of the specific experiment, this replication is performed either in the physical environment, upon which the virtual environment was modeled, or in the virtual environment itself. Witmer, Bailey, Knerr and Parsons (1996) have assessed route accuracy using the following measures: (a) the number of correct turns or successful decision points crossed during replication performance, (b) whether the person became lost, (c) verbal recollections, and (d) route traversal times.

The assessment of survey knowledge or sense of direction is often derived from a "point-to-unseen-target" task, as used in real-world spatial experimentation by Sholl (1987) and Infield (1991). The virtual equivalent of this task has recently been used by Tlauka and Wilson (1996), Ruddle, Randall, et al. (1996), and Neale (1997). Following a familiarization period with the virtual environment, participants are tasked with pointing to various targets that are not visible from the current viewpoint. Angular deviations of the pointing estimates from the true azimuth provide data necessary to evaluate the accuracy of the participant’s sense of direction. Neale has shown that environmental conditions of low visual navigational information and high difficulty yield a greater degree of error in task performance (1997). Low visual navigational information means that there are few distinctive landmarks in the world, and no visual navigational aids such as a compass or directional arrows. High difficulty relates to the spatial layout of the virtual world and the complexity of the path to be learned.
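
The angular deviation used to score a point-to-unseen-target trial can be sketched as follows. This is a minimal Python illustration under assumed conventions, not the scoring procedure of any of the cited studies: the function names, the (east, north) coordinate order, and the compass convention (0 degrees is north, angles increase clockwise) are hypothetical. The one detail worth care is the wrap-around at 360 degrees, so that a response of 350 degrees to a target at 10 degrees scores as a 20 degree error rather than 340.

```python
import math


def true_azimuth_deg(observer_xy, target_xy):
    """Compass azimuth from observer to target; points are (east, north)."""
    d_east = target_xy[0] - observer_xy[0]
    d_north = target_xy[1] - observer_xy[1]
    return math.degrees(math.atan2(d_east, d_north)) % 360.0


def signed_pointing_error_deg(pointed_deg, true_deg):
    """Deviation of the response from the true azimuth, wrapped to [-180, 180)."""
    return (pointed_deg - true_deg + 180.0) % 360.0 - 180.0


def mean_absolute_error_deg(trials):
    """Average unsigned error over (pointed_deg, true_deg) pairs."""
    errors = [abs(signed_pointing_error_deg(p, t)) for p, t in trials]
    return sum(errors) / len(errors)
```

A signed error retains whether the participant pointed clockwise or counterclockwise of the target, while the mean absolute error summarizes overall accuracy across trials.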

2.2.3 Wayfinding Task Difficulty

The aggregate difficulty of the wayfinding task is the sum of the contributions made by the input and output properties of the interface, and their influence on each component: maneuver, landmark recognition, route complexity and survey knowledge. If the interface demands a high level of mental effort for each component, then the aggregate difficulty of wayfinding will be extremely high.

For maneuvering, relative difficulty is determined by the amount of mental effort required to translate user intentions into computer inputs. When the mapping of input to action requires conscious mental effort to adequately control one’s virtual position, then maneuvering becomes more difficult.

Likewise, landmark recognition is likely to be impacted by the output properties of the visual display, which are a function of both the display quality and the visibility of the landmarks as designed into the environment itself. For example, if the objects in the world are designed to all share a common appearance, then landmarks will be more difficult to recognize; if objects are distinctive and unique, then this increased salience will make them more useful as landmarks. Similarly, visual displays with poor or low resolution can degrade the distinctiveness of even those landmarks that are designed to be easily recognized.

Route memorization requires an interaction between working and long-term memory. A route with seven legs will be more difficult to remember than one with three. Within a virtual world, there may be numerous paths connecting two given landmarks. Paths that are more direct, in that they require fewer direction changes, each with smaller magnitudes, will be easier to learn and remember than those with greater changes. Therefore, under conditions where the participant is required to learn a prescribed route, the experimenter can vary the task complexity by varying both the length and directness of the given route.

2.3 Multi-Sensory Nature of Virtual Reality

A virtual reality experience is characterized by computer-generated, three-dimensional environments in which humans interact with the virtual objects in ways that are increasingly intuitive and natural (MacKenzie, 1995; Poupyrev, Weghorst, Billinghurst, & Ichikawa, 1997; Vaananen & Bohm, 1993). The computer presents outputs through devices that display the world through mainly visual and auditory modalities, with the goal of exclusively directing the human’s sensory arrays away from the physical and into the synthetic environment. Ideally, the devices accept inputs from the human via interaction techniques that reduce the mental overhead associated with communicating one’s intentions to the simulation engine (MacKenzie, 1995; Steuer, 1992).

By virtual reality’s own definition, the input interfaces are central to its differentiation from predecessor media, such as multimedia. Ideally, the interactions should be intuitive, natural, and easy to learn; once learned, they should require low attentional demands (Wexelblat, 1995). Rather than detract from the experience, the design of the input devices and their respective interaction techniques may assist in task performance (MacKinlay, Card, & Robertson, 1991). One possible way to improve performance is by accepting inputs from the human across multiple psychomotor and sensory modalities.

2.3.1 Natural Interfaces

Jacob and Silbert (1992) have proposed that different interface devices may be best for certain tasks, while other devices are better for other tasks. For example, a mouse and keyboard are appropriate devices for editing a manuscript. However, these same devices are less appropriate for maneuvering your spaceship through an asteroid field while shooting down imperial battle cruisers. So, the property of appropriateness is defined by considering both the interface and the task to be performed. When the task to be performed is one that we are accustomed to performing in the real world, without computer mediation, an appropriate interface is often one that allows us to interact with the computer naturally or as we have learned to in the real world. Thus, a joystick is appropriate for piloting a virtual aircraft because it allows interaction techniques that are similar to those required to fly an actual aircraft.

The term "natural" is often used in the literature to describe virtual interaction techniques that are similar to those used in the normal world. Although the interface itself never actually disappears, ideally it becomes transparent through its naturalization. The human is then unburdened of the need to translate her intentions from the mind to a command sequence, phrased in computer-understandable syntax (Furness & Barfield, 1995; Wexelblat, 1995). In effect, the interface reverses the process so that the computer understands the human’s intentions from natural gestures and behavior (Lampton et al., 1994; Wickens & Baker, 1995).

The consequences of such natural interaction manifest in reduced cognitive load and more complete immersion in the virtual experience. The reduction in mental workload frees cognitive resources for meaningful interaction with the virtual content (Gopher & Donchin, 1986), as shown graphically in Figure 1. Increased immersion also leads to a more convincing illusion that the human is an active participant in the environment rather than a passive observer of it (Furness & Barfield, 1995). Improved virtual task performance is one proposed benefit derived from meaningful interaction and felt presence (Barfield, Zeltzer, Sheridan, & Slater, 1995; Burdea & Coiffet, 1994).

2.3.2 Intersensory Integration

Although integration of multiple sensory/psychomotor modalities has inherent appeal, there are also more formal precedents for the motivation to design harmonious, natural connections between the human and the computer. One such motivation is intersensory integration: the natural human ability to perceive the environment through multiple senses and unconsciously combine this sensory data to form one integrated message. The human sensory apparatus is tuned to integrate information incoming from several sense modalities to create a single, coherent sensation (Welch & Warren, 1986). Intersensory interaction research has found that when environmental information feedback is limited to only one or two sensory modalities, interpretations of environmental conditions can be misleading and inaccurate, thereby increasing the complexity of otherwise simple tasks (Sherrick & Cholewiak, 1986; Welch & Warren, 1986). Even in those situations in which the same environmental status may be channeled across multiple modalities, this redundancy is not wasteful (Sherrick & Cholewiak, 1986). Therefore, when computer-mediated interactions strip away information, providing stimulation of only one or two modalities, they do not provide the "complete," redundant sensory feedback that the human expects, leaving the human handicapped and less able to maintain situational awareness (Durlach & Mavor, 1995). Figure 2 illustrates how sensory redundancy can relieve the burden of monitoring all wayfinding information, which might otherwise be dumped onto the visual system by sensorially limited interfaces.

2.3.3 Kinesthetic and Vestibular Stimulation and Interfaces

The input and output transducers and effectors within the computer interface define which human modalities are involved in the interactions within the virtual environment. Surely the quality of stimulation offered by current interfaces such as a keyboard and mouse -- when the finger feels a depressed button rise or when the wrist senses one hand’s position atop a mouse pad -- is less than that envisioned by researchers or expected by virtual explorers. Output devices may one day provide dynamic force feedback, but current virtual interfaces limit kinesthetic involvement.

Fortunately, position tracking technology has advanced both the quality and quantity of transducing movement inputs for controlling interactions within virtual environments. Wearing a glove with embedded position transducers, participants can grab and manipulate small handheld objects. Similarly, head-tracking sensors measure gaze direction during head movement. A navigational technique that maps this gaze direction to one’s direction of travel in the virtual world is also somewhat natural because in the real world we usually look in the same direction as we walk. However, coupling gaze and travel direction also prevents one from traveling in a given direction while concurrently scanning the virtual environment to the left and right of that direction. Although we may not use a lot of head movement when scanning the real world, we do not expect our direction of travel to be involuntarily coupled to our direction of gaze. When the virtual environment couples gaze and travel direction, it may force us to suppress our head movements.

More generally, interaction techniques that actively involve the kinesthetic and vestibular modalities are not well developed. There are a few exceptions. Burdea has pioneered the development of haptics and force feedback in hand-manipulator interfaces, primarily because the hands contain the highest density of haptic sensors in the body (Burdea, 1996). Furthermore, technology costs may limit the inclusion of other limbs.

Researchers led by Bowman have recognized and are exploring the influence of interaction technique on our ability to travel, or the "…control of user viewpoint motion through a VE…" In the present context, their research focuses on the maneuvering component of wayfinding. One central principle from Bowman’s theory is that a given interaction technique can be described by how well it promotes a set of quality factors: speed, accuracy, spatial awareness, ease of learning, ease of use, information gathering and presence. Certain quality factors are more vital for performance of different virtual tasks, so a given technique can be matched to a task through the application of these quality factors (Bowman, Koller, & Hodges, 1997). Most relevant to this thesis is the fact that Bowman is actively investigating the influence of the interface on maneuverability.
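
The difference between coupling and decoupling gaze and travel direction can be made concrete with a small sketch. The Python code below is a hypothetical illustration, not the update rule of any particular system: the function names, the use of a separately tracked body yaw, and the flat (east, north) ground plane are all assumptions made for the example.

```python
import math


def step_position(position, travel_yaw_deg, speed, dt):
    """Advance an (east, north) position along a compass yaw for one time step."""
    yaw = math.radians(travel_yaw_deg)
    east, north = position
    return (east + speed * dt * math.sin(yaw),
            north + speed * dt * math.cos(yaw))


def gaze_directed_step(position, head_yaw_deg, speed, dt):
    """Coupled technique: the tracked head yaw is also the travel direction."""
    return step_position(position, head_yaw_deg, speed, dt)


def decoupled_step(position, body_yaw_deg, head_yaw_deg, speed, dt):
    """Decoupled technique: travel follows an assumed body yaw.

    head_yaw_deg would set only the rendered view, so it is unused here.
    """
    return step_position(position, body_yaw_deg, speed, dt)
```

With the coupled technique, a traveler heading north who turns the head 90 degrees to the right is carried east; with the decoupled technique, the same head turn changes only the view and travel continues north.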

2.3.4 Locomotion Interfaces

Slater proposes the concept of "body-centered" interaction. Observing that common virtual interfaces overload the human hands (e.g. the hands must be used to "walk" in virtual space), his work seeks to extend kinesthetic involvement to free the hands from navigational control, so that the hands can control other more natural manual manipulations. Walking is of special interest because it is the way we normally move through space (Slater & Usoh, 1994; Slater, et al., 1995). To date, few full-body locomotion devices have been produced (Durlach & Mavor, 1995). Apparatuses such as treadmills promise natural involvement of most of the integrated modalities, and provide the necessary constraint of physical motion to a limited working volume. Traditional treadmills allow travel along one axis of translation; however, recent work has produced treadmill-type designs that allow travel in both the x and y axes (Carmein, 1996; Jones, 1996).

2.3.5 Sufficient Motion

One possible solution for both limiting the physical volume and activating the kinesthetic and vestibular systems is represented by interfaces that provide "sufficient motion" (Wells, Peterson, & Aten, 1996). The concept behind sufficient motion is that it may be possible to involve the kinesthetic and vestibular senses to a degree that self-motion cues are sent to the brain through the initiation of the appropriate muscular contractions and translational and rotational accelerations (Durlach & Mavor, 1995; Lampton et al., 1994).

If individuals produce the locomotory movements that normally propel them through space (but without actually displacing), the associated muscular, joint, and tactile feedback, as well as efferent signals, lead to an experience of self-motion. Even if no locomotor movements are made, patterns of cutaneous or muscular afference that are normally associated with movement through the environment can induce apparent self-motion (Durlach & Mavor, 1995).

Locomotion involves a finite set of behaviors, such as walking forward, backing up, veering to the left and right and pivoting in place (Ko, 1995). If the inputs to control each of these are mapped to behavior that is sufficient to stimulate the kinesthetic and vestibular systems, one may expect more convincing and natural interaction mappings (Durlach & Mavor, 1995). The Virtual Motion Controller (VMC) represents one means of providing sufficient motion; I will describe the VMC in more detail in Chapter 3.

2.4 Summary

There are three themes that emerge from this review of the literature: (a) humans use sensory integration to make sense of the environment, (b) mental workload is affected by the human-computer interface, and (c) ideal virtual interfaces incorporate multiple modalities. It is likely that the acquisition of spatial knowledge will be greatly affected by the nature of the interaction with a virtual environment, especially if such interaction is limited to only visual and/or acoustic modalities. Humans naturally integrate feedback across multiple modalities, and this gestalt is important to wayfinding skill. Additionally, natural interfaces involving multiple modalities are expected to reduce the cognitive overhead associated with these interactions, freeing mental resources to be used more effectively in developing survey knowledge. Finally, our attempts to develop full-sensory interfaces may lead to novel ways to improve wayfinding performance that serve to complement the more common visual enhancements, such as maps, landmark enhancements and directional aids (Satalich, 1995).


Human Interface Technology Laboratory