Exploring the Influence of a Virtual Body on Spatial Awareness

by Mark Draper


CHAPTER 2: LITERATURE REVIEW

The following literature review is composed of four major sections: VBs, spatial awareness, the visual periphery, and ecological psychology. Although every attempt was made to separate the research into these categories, there remain numerous areas of overlap between the sections. VB research details work performed to date in the area of VBs and self-representation. The spatial awareness section defines spatial awareness and relates it to SA. This section also presents and details various measures of spatial awareness. The section on the visual periphery is included since in most instances, the visual image of the VB will fall into this area of the retina and be subject to its associated processing. Lastly a section on ecological psychology is included. This section offers an alternative view of visual space perception to that described in the spatial awareness section (mostly a psychophysical approach) and the visual periphery section (a physiological and psychophysical approach). While framed in many high-level assumptions, the ecological psychologists' view does provide a unique look at the possible influence of one's own body image on visual perception. Also, it is often the case that "the various approaches to visual space perception are as much complementary as they are opposed, and that any given research effort may weave together several lines of approach" (Sedgwick, 1986).

To relate research from these areas to this thesis topic, a `logic' diagram was developed (Figure 1). This diagram captures some of the essential findings from each area that, when combined, create a flow of logic that serves as the foundation for why this thesis was pursued.

Figure 1: Thesis Logic Diagram

2.1 VBs

The difference between VR and computer graphics is that VR is inhabited. Participants can be immersed into a virtual world to explore data, navigate paths, or perform tasks. When entering this new virtual environment, is it necessary to carry a representation of our bodies with us? The answer will be based upon a cost/benefit tradeoff regarding the existence of a VB for the particular task at hand. Below is a sampling of the literature regarding VB representation and potential performance benefits.

2.1.1 What defines a VB?

One researcher (Lasko-Harvill, 1992) looked at the issue from the standpoint of a virtual identity. She found that in multiple-person worlds, the face (and especially the eyes) were important to identity. In single-person worlds, the hand seems to be the primary entity onto which identity is projected. Identification with the hand begins at infancy because it is an exploration tool that can be observed in action. In fact, the hand is observed first and only later does the child learn that it is controllable. Lasko-Harvill theorized that people entering VR for the first time will also first observe their virtual hand. Only after this short observation process of witnessing a 'floating computer hand' will they discover its controllability through experimentation of movements.

Although a virtual hand has been used as the sole means of representing the self in many VR environments, Slater and Usoh (1993b) configured a more complete VB for their studies. The subject's hand would hold a three-dimensional (3D) mouse input device, which allowed this hand to be virtually represented in VR. Thumb and first finger activation of 3D mouse buttons was also reflected in the virtual environment (VE) as equivalent movements by the virtual thumb and first finger. The virtual hand was attached to a virtual arm which could be bent and twisted in response to similar movements of the real arm and wrist. The arm was connected to a body with legs and left arm. These elements came into view when the subject reached for something or looked in the vicinity of the VB. If the subject rotated his/her head by more than 60 degrees, the VB would be reoriented accordingly (this could result in a mismatch with the real body, depending on whether or not the subject's real body rotated as well).

Norman I. Badler and associates at the University of Pennsylvania (Badler, Phillips, & Webber, 1993) developed a commercial human modeling platform entitled Jack. They have used this software to create a fairly realistic VB, closely approximating the human operator's position and posture by using trackers placed on the operator's palms, waist, and base of neck. However, the need to manage the required four or five trackers (the fifth being for gaze control) combined with the computational time for the necessary inverse kinematics routines presents potential serious time-lag problems for certain applications. Pratt, et al. (1994) also used Jack to insert a fully articulated human figure into a networked VE for simulating military situations.

Many other VB configurations currently exist, from simplistic-static to complex-dynamic, and there are many more applications that avoid the inclusion of a VB altogether. Looking to the future, McLellan's (1994) varied descriptions of avatars (virtual agents that represent the user) provide a glimpse of what tomorrow's VBs may entail. However, there has been no identified research quantifying a minimal VB configuration for VR applications.

In some cases, a VB may not even have to be 'virtual'. In their work with the Cave Automatic Virtual Environment (CAVE), Cruz-Neira, Sandin, Defanti, Kenyon, and Hart (1993) have developed a new VR interface that consists of a physical room whose walls, ceiling, and floor surround the viewer with projected images. They claim as a CAVE benefit the ability to implicitly represent the body. It appears physically as it does in the real world; it does not have to be rendered. Drawbacks to this design include the inability to alter the body shape, form, or configuration, and the possible occlusion of virtual objects, even those that are supposed to be 'closer' to the viewer than the viewer's body.

2.1.2 Current VB Research

Very little performance research involving VBs has been done to date. Lasko-Harvill (1992) asserts that a VB may become unnecessary or at least secondary when one is exploring highly abstract virtual worlds (i.e., more than three dimensions) that exist in some scientific visualization applications. Other researchers (Cruz-Neira, et al., 1993) state that interactions in a VE often require a visual representation of the viewer's body, particularly the hands. Slater and Usoh (1993c) claim that a VB is of particular importance in an architectural walk-through context but they did not support this assertion with specific experimental results.

The existence of a VB has been shown to have a measurable effect on a participant's sense of presence. According to Slater and Usoh (1993a), presence reflects a user's perception of "being somewhere"; the extent to which the VE becomes more "real" than physical reality. A factor that contributes to presence is the self-representation of the participant, that is, the participant's VB. The logical connection is as follows: if a body is in a certain location and if a person has a certain association with that body, it is likely that this person will believe that he/she is in that location. Slater has empirical evidence to support this assertion (Slater and Usoh, 1993c; Slater and Usoh, 1994) along with research that presents a more complex relationship between VBs and presence (Slater and Usoh, 1993b). Slater feels that to strengthen its link to presence, the VB should be similar in appearance to the participant's own body, respond correctly, and be seen to correlate with the movements of the participant.

In a study by Slater and Usoh (1994) to test issues regarding presence, subjects were given a VB as described earlier. As a small part of the experiment, Slater and Usoh slaved virtual left-hand movements to mirror the actions of the system-tracked right-hand. Along with finding an increased sense of presence in all conditions involving the VB, they found 4 of 23 subjects actually matched real left-hand movements to the slaved VB left hand/arm actions. This indicates that the association to the VB was so strong for some participants that they felt a need to alter their real body to conform to their VB.

Slater and Usoh (1993c) also discussed the current field-of-view (FOV) limitation of today's VR systems as it relates to the presentation of a VB. Current VR technology constraints create a trade-off between the resolution of a visual image and the FOV of that image. Increases in image resolution require a decrease in FOV and vice-versa, due to computing power and visual-display limitations. Since recognition of one's VB is usually in the extreme FOV through the use of the visual peripheral system, a VR participant using today's technology may not be visually aware of his/her VB. The potential influences of the VB may then be lessened or negated by this restricted FOV. Slater's solution for his studies was to have subjects look all around at first so they could view their VB. He is also researching alternative graphic rendering techniques to expand the instantaneous FOV (IFOV) (Slater and Usoh, 1993c).

2.1.3 VB Interface Devices/Methods

There are several types of interface devices that sense and transform movements from the physical body into corresponding movements of the VB. The most common VB devices are virtual hand controller systems, such as the VPL DataGlove, Mattel's Power Glove, and the Virtex Cyberglove (Kalawsky, 1993). These gloves sense hand and finger movements to allow for corresponding movements by a virtual hand. If gross arm and/or leg movement or orientation is required, sensor/receiver trackers and custom software could be a solution (e.g., using Badler's approach). Sensor/receiver trackers are systems that sense and transmit changes in the physical body position of a VR participant, allowing for a corresponding change to occur in the VE. A complete VB representation could be accomplished by having the participant wear a body suit, such as VPL's DataSuit. This suit contains many sensors to detect and transmit complete body movements in a rather crude fashion.

It is important to remember that for each additional level of movement accuracy and each additional body component sensed, design costs will increase, bandwidth will increase, and the ability to display real-time performance may decrease. This is due to the requirements for additional hardware, software, and computational power to transform those signals into realistic VB action.

2.1.4 Other Body Representation Research

Lastly, it is worth mentioning the tremendous amount of work being performed in the area of 3D human models and VR synthetic actors. These researchers have gone to great lengths to accurately portray virtual human models for VR applications and/or human-factors research. As mentioned earlier, Badler, Phillips, and Webber (1993) developed and refined a commercial human modeling platform entitled Jack that incorporates highly accurate 3D human models for human behavior simulation, humancad, and human-factors research. Other researchers (Thalmann & Thalmann, 1993) are studying human models for the purpose of generating accurate computer driven 'synthetic actors' to populate future VR worlds. Efforts in both of these areas will no doubt lead to improved VB representations in the future.

2.2 Spatial Awareness

Spatial awareness is a general term that includes several aspects of performance including judgments of the participant's position relative to the world and judgments of the relative positions of objects in the world. It involves space perception, i.e., the participant's ability to perceive the 3D layout of an environment. Spatial awareness research is diverse with identified areas including navigation, tracking, cognitive-map development and maintenance, distance perception, and spatial knowledge representation techniques. Spatial awareness has been defined for this thesis as an awareness of the location of objects in the immediate surroundings relative to one's location (egocentric distance). It is a requirement for several tasks, whether in the physical world or a virtual world. Below is a synopsis of the literature regarding the concept of spatial awareness and its potential metrics as related to this effort.

2.2.1 Is Spatial Awareness a Component of SA?

SA is a complex, multidimensional concept that involves complete understanding of one's spatial, status, and temporal situation in the internal and external dynamic environments. Venturino and Kunze (1989) describe spatial awareness as a subset of SA in their work dealing with aircraft systems. These researchers decomposed SA into state awareness (aircraft systems, fuel, weapon status) and spatial awareness (other aircraft locations, intercept geometries, etc.). Fracker and Davis (1990) argue that SA has two key components: spatial awareness (knowing where objects are in space) and identity awareness (knowing what the objects are). Other researchers have also implied that spatial awareness is an important aspect of SA (Endsley, 1988; Harwood, Barnett, & Wickens, 1988). Sarter and Woods (1991) provided an excellent description of the complexities of SA and acknowledged spatial awareness to be one of many contributing factors. Based upon the above studies, it is concluded that spatial awareness is a component of SA.

2.2.2 Spatial Awareness Research

Venturino and Kunze (1989) investigated the ability to acquire and memorize patterns of spatial locations using a helmet-mounted display (HMD). Their work was based on the premise that human spatial cognition (i.e., spatial awareness) can be measured by the ability to locate, memorize, and replace patterns of spatial locations in an area 240 degrees azimuth by 90 degrees elevation. Thus subjects had to spatialize object locations existing all around them rather than only within a display directly in front of them. This work was an extension of the previous research on spatial awareness performed by Wells, Venturino, and Osgood (1988). The researchers manipulated FOV, number of targets, and availability of context in replacement tasks. The task involved a presentation of a multiple-object environment in which the subject attempted to memorize all object locations. After memorization occurred, the objects were removed and subjects were told to replace specific objects into their original locations.

The results (Venturino & Kunze, 1989) indicate that FOV size affects acquisition of spatial information about one's surroundings, as indicated by an increase of `time to memorize' with decreasing FOVs. Small FOVs require more head movements, more sampling time, and more integration effort to build a mental representation of the spatial environment. Replacement error (i.e., the ability to replace an object back into its original location) was affected not by FOV but by memory load (i.e., number of initial objects). In addition, replacement error was large for targets in the extreme azimuth/elevation locations (95% greater error azimuth, 23% greater error in degrees elevation, as compared to next closest zones). When contextual information was presented in the replacement phase of the study, there was a reduced replacement error for the extreme positions only (52% greater error azimuth, 7% greater error in degrees elevation, as compared to next closest zones).

These results indicate that a large FOV will facilitate the development of spatial awareness. A larger view allows for easier integration of environmental elements and their associated relationships. It also seems that a multiple target environment will adversely affect the development and maintenance of spatial awareness. The finding regarding contextual information is especially interesting. It indicates that the availability of contextual information is of little aid in spatial location recall unless this context is provided at the extreme edge of the FOV. It is precisely in this area that the VB exists, for many viewing conditions. This then offers a potential for the VB to be used as a contextual aid for spatial awareness. Boff and Lincoln (1988) support this notion, stating "egocentric judgment depends upon perceived spatial relations among objects. It is referenced to an "object", although the object may be a part or location of the observer's own body". The availability of context for recall of situation awareness information was also discussed by Sarter and Woods (1991).

Dorighi, Ellis, and Grunwald (1993) also focused on spatial awareness aids for aircraft pilots. They felt that spatial awareness may be positively affected by an inside-out (egocentric) frame of reference primary flight display ("the tunnel in the sky") versus a typical attitude director indicator (ADI). These researchers found that pilot orientation estimates of initial target directions exhibited symmetrical azimuth angle undershoot errors, as was found in earlier research, but there was no effect for display type used.

Much effort has gone into the study of classical depth cues as sources for spatial depth information (Ellis, 1991; McGurk & Jahoda, 1974; Surdick, et al., 1994). These cues can be broken down to object-centered and observer-centered cues (Wickens, 1992). Object-centered cues include linear perspective, interposition, height in the plane, light and shadow, relative size, texture gradients, proximity-luminance covariance, aerial perspective, and relative motion parallax. Observer-centered cues (those that are characteristics of the human visual system) include binocular disparity, convergence, and accommodation. Depth cues are necessary for the perception of 3D space, and the relative effectiveness of each has been extensively researched in the real world. Recently these efforts have expanded into computer displays and VEs (e.g., Ellis, 1991; Surdick, et al., 1994).

This highlights an important issue regarding spatial awareness and VEs. How well does one's spatial awareness of a virtual environment match the spatial awareness of a similar real (physical) environment? Research in this area is still in its infancy and the results are often contradictory (Hale & Dittmar, 1994; Henry & Furness, 1993; Lampton, Bliss, & Knerr, 1994). It appears that some spatial distortion may be inevitable, due to the lack of certain depth cues such as accommodation. However, one should be cautioned against relying too much on these initial findings, given that these results strongly depend on the particular system used and there is tremendous fidelity variability in VR technology available today.

2.2.3 Spatial Knowledge Development/Structure

Spatial information gathered by an individual is stored as spatial knowledge. When the spatial environment to be learned is large and can be navigated, this spatial knowledge is often referred to as a cognitive map. However there is some ambiguity in the literature regarding the proper term for spatial information gathered from small-scale, one-room settings (i.e., spatial layout, cognitive map, spatial knowledge). Regardless, studies have shown that environmental context (termed `landmarks' in large-scale environments) aids the development of spatial knowledge, whether that knowledge is of large- or small-scale environments (Venturino & Kunze, 1989; Wickens, 1992).

Wagner (1985) presented a detailed, systematic investigation of the geometric structure of visual space. He administered several psychophysical techniques (magnitude estimation, category estimation, perceptual matching, and mapping) in an attempt to mathematically define how humans represent visual space internally. Previous research had resulted in several conflicting structural descriptions including Euclidean, spherical, and hyperbolic. Wagner found that visual space is severely compressed in the in-depth dimension relative to physical space. He surmised that visual space would be best modeled as an affine-transformed Euclidean space (where perceived depth is compressed from physical depth) for his conditions. Wagner acknowledged that different testing conditions would most likely result in different visual space geometry but believed that the visual world seems to approach a Euclidean ideal of veridical perception as the quality and quantity of perceptual information increases.

2.2.4 Spatial Awareness Metrics

Spatial awareness tasks have involved the use of several different metrics, most of which can be described as being egocentric or exocentric in nature. Venturino and Kunze (1989) used time (seconds) until all object locations were memorized (in search mode) and absolute replacement errors (in degrees error) in a target replacement task, both of which are egocentric. Dorighi, et al. (1993) used body-referenced visual direction to targets in developing mean error in degrees and mean absolute error in degrees. This is also an egocentric measure. Barfield, Rosenberg, and Furness (in press) utilized an alternate metric that followed an egocentric spatial display with a test using an exocentric map. Subjects flew a flight simulation that involved a search for several stimuli. After the flight, spatial knowledge was tested by having the subjects locate each stimulus on an exocentric map of the entire area (evaluating mean horizontal and vertical offset errors). A variation of this technique used by Marshak, Kuperman, Ramsey, and Wilson (1987) involves stopping the simulation at various points to probe subjects on the relative positions of targets. The dependent variable was the absolute percent error for each question summed for all questions in a trial. A final metric, developed by Palmer (1990), adapts standard psychophysical methods to measuring position in a multi-stimulus display. Subjects are presented with an egocentric spatial environment, then with a test condition that is similar except that one target has been displaced in some way. The subject, cued as to the identity of the displaced target, must state how the target has been displaced (up-down, left-right, forward-back). From these data, a psychometric function can be constructed and a difference threshold estimated for position.
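As a rough illustration of the egocentric error measures described above (mean error and mean absolute error, both in degrees), the following Python sketch scores a set of azimuth responses. The function names, the wrap-around handling, and the sample values are assumptions made for illustration; they are not taken from the cited studies.

```python
def signed_error_deg(target_deg, response_deg):
    """Signed azimuth error (response minus target), wrapped to [-180, 180)."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0


def mean_error_deg(pairs):
    """Mean signed error over (target, response) pairs; reveals directional bias."""
    errors = [signed_error_deg(t, r) for t, r in pairs]
    return sum(errors) / len(errors)


def mean_absolute_error_deg(pairs):
    """Mean absolute error over (target, response) pairs; reflects overall accuracy."""
    errors = [abs(signed_error_deg(t, r)) for t, r in pairs]
    return sum(errors) / len(errors)


# Hypothetical trials: (target azimuth, subject's response), in degrees.
trials = [(0.0, 10.0), (0.0, -10.0), (350.0, 10.0)]
print(mean_error_deg(trials))
print(mean_absolute_error_deg(trials))
```

Reporting both values is useful because signed errors of opposite direction cancel in the mean error but not in the mean absolute error.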

McNamara (1986) presented a list of major spatial awareness/spatial representation metric categories. The list is as follows: distance estimation, orientation judgments, map drawing, navigation, and search/replace tasks. Often the choice of metric category depends on the particular aspect of spatial awareness being studied.

Since distance estimation is used in many spatial awareness studies, a brief background of this metric is in order. Distance estimation can be absolute (i.e., distance from the observer to an object) or relative (i.e., distance between two objects or other people). Absolute distance could also be termed egocentric distance and relative distance could be termed exocentric distance.

Under natural, unrestricted viewing conditions, the perception of distance is remarkably consistent (Baum & Jonides, 1979; Boff & Lincoln, 1988). Baird and Biersdorf (1967) as well as others have shown that the relationship between perceived distance and actual distance, on average, can be described by a power function:

J = kD^n

where k and n are constants for that location/orientation, J is the judged distance, and D is the actual distance. The exponent `n' approximates 1.0 overall; it is generally slightly greater than 1.0 with indoor observation and generally less than 1.0 with outdoor observations (Da Silva and Fukusima, 1986).
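As a minimal sketch of how the constants k and n might be estimated from a subject's distance judgments, the power function can be fitted by linear regression on log-transformed values, since log J = log k + n log D. The function names and the sample data below are hypothetical and are not drawn from the cited studies.

```python
import math


def judged_distance(actual, k=1.0, n=1.0):
    """Power function J = k * D**n relating judged to actual distance."""
    return k * actual ** n


def fit_power_function(actual, judged):
    """Least-squares fit of log J = log k + n * log D; returns (k, n)."""
    xs = [math.log(d) for d in actual]
    ys = [math.log(j) for j in judged]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                        # slope is the exponent
    k = math.exp(mean_y - n * mean_x)    # intercept gives log k
    return k, n


# Hypothetical, noise-free judgments generated with k = 1.2 and n = 0.9.
distances = [1.0, 2.0, 4.0, 8.0]
judgments = [judged_distance(d, k=1.2, n=0.9) for d in distances]
k_hat, n_hat = fit_power_function(distances, judgments)
print(round(k_hat, 3), round(n_hat, 3))
```

With real data the judgments would contain noise, so the fitted exponent would only approximate the subject's underlying value.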

Research has shown that large individual differences exist in judgment of apparent distance (Cook, 1978; Da Silva & Fukusima, 1986). However, Da Silva and Fukusima (1986) found that these individual differences, manifest in individual exponents of each fitted power function for magnitude estimation of apparent distance, remain stable regardless of environment (natural indoor or natural outdoor), range of distances estimated, and length of the inter-session interval, for up to 9 months. It is reasonable, then, to use a within-subjects design for distance estimation-type activities. This design style would negate the effects of large individual differences observed in distance estimation while maintaining the observed temporal stability found within each individual's judgments.

Spatial awareness could also be measured indirectly through techniques designed to measure SA. One such technique is Situation Awareness Global Assessment Technique (SAGAT) (Endsley, 1988). SAGAT involves stopping a task at random intervals so that the subject can be asked a question that relates to his/her SA at that time (not unlike Marshak, et al., 1987). Subject answers are then compared to the actual situations that existed at each time interval to determine the subject's overall awareness of the situation. SAGAT has been shown to have face validity, empirical validity, and predictive validity. However, this technique normally measures more than spatial awareness, and some critics believe that this technique does not even directly measure SA but rather what subjects can recall, a criticism that befalls search/replace measures as well (AIAA/ANSI, 1993). Another SA technique is the Situation Awareness Rating Technique (SART) (AIAA/ANSI, 1993; Taylor & Selcon, 1990). SART treats SA as a complex construct and uses several (3 or 10) separate measurement dimensions to get a complete picture of it. Therefore, it is not optimized for pure spatial awareness measurements.

Of all the metrics presented in this section, several can be excluded for use in this effort. Due to the egocentric nature of this thesis, exocentric measures (i.e., map drawing) are of less interest. Since the initial virtual world study will consist of only one room, navigation tasks can be eliminated. SAGAT is designed to be a direct measure of SA and as such involves factors besides spatial awareness. Orientation metrics alone do not appear to be a complete measure for this research but can be considered a subset of the search-and-replace tasks. This leaves distance estimation and search/replace metrics as potential measures.

2.3 The Visual Periphery

The visual periphery is important to this thesis because it is through this system that most VB visual images are processed. This section provides an overview of the human visual system, describes the importance of the peripheral visual system, and reviews major peripheral vision characteristics to obtain general design guidelines in the development of VB configurations.

2.3.1 The Visual System

The human retina is made up of two components: 1) a central foveal section and 2) a surrounding periphery (Levine & Shefner, 1991). The fovea, made up entirely of cone photoreceptors, is the part of the eye that detects fine detail and is specialized for light adapted (photopic) viewing conditions. It is also the area through which the best color vision is obtained. Although cones exist throughout the retina, they are by far most concentrated in the fovea and foveal cones are specialized for finer acuity. Each foveal cone has a dedicated channel to a ganglion cell and, as a result, does not have to share inputs with other receptors. This allows for small receptive fields, providing fine acuity. The periphery of the retina is populated mostly with rod receptors, along with a small proportion of cones. Rods, though absent in the fovea, number approximately 120 million in the retina compared to about 6 million cones. Rods are specialized for viewing in dim illumination (i.e., scotopic viewing) but do not code color or fine detail. Rather, rod inputs link with neighboring rod and cone inputs to one ganglion cell in a process called spatial summation. Summation results in larger receptive fields and a greater possibility for ganglion cells to fire when few photons are present, but it also causes a large decrement in the ability to resolve fine detail. In summary, the fovea is a small section in the center of the retina that is specialized for detailed, color viewing in daylight, and the periphery is a large retinal section surrounding the fovea that offers poor acuity and color vision but is specialized for detecting available light in dim conditions.

2.3.2 Importance of the Visual Periphery

It is well known that the visual periphery plays a major role in how we interact with our environment. An interesting study by Sivak and MacKenzie (1990) detailed the relative contributions of peripheral and foveal vision to a reaching and grasping task. The researchers studied performance of the task by subjects restricted to central (foveal) viewing only (i.e., through the use of special goggles) or peripheral viewing only (i.e., special contact lenses used). The results indicate that subjects require central vision for adequate information of an object's intrinsic qualities (e.g., size, shape, etc.) that affect both reaching and grasping functions. Peripheral vision is required for adequate information of an object's extrinsic qualities (i.e., location) which affect reaching tasks but not grasping tasks. Numerous other studies have shown the importance of the periphery in performing tasks as diverse as walking, balancing, and building cognitive maps (Alfano & Michel, 1990; Assaianti & Amblard, 1992).

Given that humans use their peripheral vision to perform a variety of tasks, the potential exists that the peripheral images of one's own body may be used in spatial awareness activities such as judging distance to nearby objects. Therefore, the following review of peripheral vision characteristics was performed so as to obtain basic design guidance for the development of VB configurations. The major topics of this section (motion, color, size, contrast sensitivity) were chosen after an initial review of the literature revealed these factors as most likely to impact the design of a VB.

2.3.3 Motion

The periphery is more specialized to detect the motion rather than the form of an image. At the extreme visual periphery, however, humans cannot even visualize motion. Rather, the motion stimulates a reflexive rotation of the eye to bring the image into the central field of view. This appears to be an "early warning sensor" to possible approaching dangers.

It is a commonly held belief that peripheral vision is more sensitive to motion than the fovea. This is in fact not the case, as the threshold for motion rises in nearly linear fashion with eccentricity (McKee & Nakayama, 1984; Tynan & Sekuler, 1982). However, the threshold for resolution of static detail rises far more rapidly with eccentricity, making the periphery relatively better at seeing motion than at seeing static form. As an example, a small dull object that is not detected by peripheral vision if it is stationary may be detected if it is moving.

As a person moves through an environment, there is a consistent transformation of the distribution of discernible points in the optic light array that confronts the eye. This phenomenon involves the velocity of points moving through the optic array in a direction opposite of self-motion and imaging on the retina throughout the periphery. These gradients of motion in the visual field are collectively called the optical flow field. Optical flow fields present a great deal of precise information on an individual's path of motion and on the surrounding environment.

Optical flow fields are important because they are used by the periphery as cues to one's direction, speed, and orientation while moving (Taylor, 1988; Wickens, 1992). For instance, symmetrical expansion of object contours outward from the focus of expansion indicates a collision course between the moving person and that object; asymmetrical expansion indicates a miss (Haber & Hershenson, 1973). Optical flow fields may also interact with the visual image of one's own body. The peripheral body image could in fact partially block the optical flow field, decreasing the flow field information content, or it could work with the flow field to provide enhanced self-alignment information.

2.3.4 Color

With increasing eccentricity, observers need larger visual fields to perceive color. According to studies performed by Abramov, Gordan, and Chan (1991), by increasing the stimulus size it is possible to achieve fovea-like color vision up to 20 degrees eccentricity. However, even the largest stimulus size at 40 degrees eccentricity is not enough to produce fully saturated hues. This is most likely due to the decrease of spectrally opponent ganglion cells and the increased rod input with increasing eccentricity. Peripheral presentations of color appear desaturated compared to foveal presentations if field size is held constant. Red and blue colors have the smallest perceptive field sizes everywhere, while green requires the largest field size to maintain fovea-like responses. As eccentricity is increased, there is the potential for substantial hue changes to occur. For example, at 40 degrees eccentricity, the perception of blue could spread to a wavelength of 580 nm, covering nearly all of the foveal spectral range for green and most of that for yellow.

2.3.5 Size/Form of an Image

It is universally accepted that acuity drops drastically with eccentricity (Tynan & Sekuler, 1982). An equation (Olzak & Thomas, 1986) to approximate visual acuity performance up to 20 degrees eccentricity is a linear function:

MA(E)=MA(0)(1+aE)

where `E' = degree of eccentricity, `MA' = minimum angle of acuity, and `a' varies with the task, person, and illumination level. At eccentricities greater than 20 degrees, the equation becomes a positively accelerating function, with the rate of acceleration different for different tasks (Millidot, Johnson, Lamont, & Leibowitz, 1975). The 20 degrees eccentricity cutoff (that also appeared with color) is interesting because it is at that point that rod quantity is at its peak and cone quantity bottoms out from a peak near the foveal center to a consistent low number for increasing eccentricities.
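As a worked illustration of the linear approximation above, the following minimal sketch evaluates MA(E) = MA(0)(1 + aE). The function name is hypothetical, and the values used for MA(0) and `a' are placeholders chosen only to show the shape of the function, since the text notes that `a' varies with the task, person, and illumination level. The sketch refuses eccentricities beyond 20 degrees, where the linear form no longer applies.

```python
def min_angle_of_acuity(eccentricity_deg, ma_foveal, a):
    """Linear approximation MA(E) = MA(0) * (1 + a * E).

    eccentricity_deg : E, eccentricity in degrees (0 to 20 only)
    ma_foveal        : MA(0), minimum angle of acuity at the fovea
    a                : slope; depends on task, person, and illumination
    """
    if not 0 <= eccentricity_deg <= 20:
        raise ValueError("linear approximation applies only from 0 to 20 degrees")
    return ma_foveal * (1 + a * eccentricity_deg)


if __name__ == "__main__":
    # Placeholder parameters, not measured values.
    ma0, slope = 1.0, 0.5
    for e in (0, 5, 10, 20):
        print(e, min_angle_of_acuity(e, ma0, slope))
```

Used as a design guide, a function of this kind would give the smallest detail worth rendering at a given eccentricity once `a' has been estimated for the task at hand.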

Performance in the identification of image form also progressively deteriorates in a linear fashion with increasing target eccentricity (Menzer & Thurmond, 1970). Observers are very poor at identifying absolutely the shape of a form beyond approximately 30 degrees eccentricity, but they perform somewhat better if they are provided with target choices in the center of fixation. Outlined shapes will be perceived better than solid forms up until 20 degrees, after which solid forms are more easily detected than outlined forms (Menzer & Thurmond, 1970). Spatial summation of rod signals may account for the increasing performance in detection of solids with eccentricity, and the lack of spatial summation of cone receptors (allowing finer acuity) may account for the performance of outlined shapes near the center of fixation.

2.3.6 Contrast Sensitivity

Static contrast threshold degrades with eccentricity; however, it also degrades as target size, exposure duration, and background luminance decrease. This drop can be likened to a shifting of the entire contrast-sensitivity function toward lower spatial frequencies. Studies considering target motion have shown that the contrast threshold for moving targets varies little as target distance from fixation increases up to 55 degrees (Rogers, 1972).

It is important to realize here that a peripheral stationary object is likely to disappear even if it is not artificially stabilized on the retina, which is required in foveal vision to achieve the same effect. This occurs because small eye movements are less effective in refreshing information transmitted by peripheral receptors, due to spatial summation of rod signals. If this stationary object has low contrast or is viewed under low illumination, it will likely disappear much more quickly.

2.3.7 Summary Design Guidance

The single most influential factor in peripheral vision for general design purposes appears to be its coding of the dynamic environment. Study after study has indicated that the perception of motion is much more accurate than form perception in the periphery. When required, movement in the peripheral region should be portrayed as accurately as possible. Accurate optical flow fields should also be present during self motion for the maintenance of direction and orientation cues.

Color information should be preserved in the peripheral channel up to 20 degrees eccentricity for the perception of color. At eccentricities greater than 20 to 30 degrees, peripheral color presentation may be more important for the luminance information it provides than for color perception. As objects move out into the periphery, they are more likely to be encoded by rods than by cones. Since rods are univariant and only encode the amount of photons being absorbed, changes in quantum catch by rods would simply result in a signal of change of luminance. This luminance variation must be maintained for a consistent perception of the environment to occur.

Fine detail of VBs can definitely be left out, as long as edges, borders, and contours of images retain contrast with regard to the background. Detail should be kept for VB parts that often appear within 20 degrees from the current fixation point, using the equation described above as a design guide.

The findings of poor performance of the far periphery for absolute identification of shape indicate that a peripheral channel can get by without accurately encoding size or shape at eccentricities beyond 20 to 30 degrees. However, recent findings of improved relative identification of shapes in this area indicate that a human can detect more than he or she can identify through recall techniques. Therefore, a conservative design guide would be to accurately portray all large VB component shapes (or perhaps only low spatial frequency components) that will be peripherally viewed.

Moving VB parts should retain their sharp edges and borders to maintain high contrast and, therefore, increase the likelihood of detection. The interior textures/detail of these moving stimuli could be neglected since fine detail would not be noticed anyway.

The previously identified FOV limitations of current VR systems (along with their generally poor image quality) must be considered when applying the above design guidance. However, as VR systems advance, their optical clarity and FOV will increase, allowing for fuller application of these strategies.

2.4 Ecological Psychology's View

To develop a more complete understanding of this thesis question, the theory and research of ecological psychology were reviewed as they relate to this effort. Much of this research centered around the writings of J.J. Gibson. Gibson (1979) developed an interactionist view of perception and action that focused on information that is available in the environment. He rejected the mainstream assumption of factoring external-physical and internal-mental processes in the study of perception, along with the static, artificial laboratory research methods employed by a majority of visual scientists. According to ecological psychologists, interactions among aspects of cognition and behavior are too subtle and complex for any productive factoring strategies to occur (Greeno, 1994). In Gibson's view, many questions about how information is constructed by people and animals could be considered more appropriately as questions about what sources of information exist in the environment. Humans and animals directly perceive these pieces of information that the environment affords, without this information being mediated by internal knowledge and mental processes.

The concept of affordance is central to ecological psychology. Affordance is defined as the behavioral possibilities of the environmental layout taken with reference to an animal's action capabilities. McLellan (1994) discusses the use of ecological affordances as design tools. However, for affordances to be used, they must be recognized by the individual. This recognition is sensitive to the task being performed and the amount of other related and unrelated context in the environment.

2.4.1 Importance of the Body

Ecological psychologists do not discount the visual representation of the body. In his book The Ecological Approach to Visual Perception, Gibson not only acknowledges the presence of one's own body in the visual image (or ambient optical array) but also holds that it is an important information source for the individual. He calls this information `propriospecific', which includes optical information that specifies the observer along with the concealing of the environment in a way that is unique to that observer. For instance, the IFOV always involves the occluding images of the nose and cheekbone, and can also include the arms, torso, legs, feet, etc. The seeing of oneself is not a complex intellectual experience but a very primitive one. Gibson also reasons that there is information in the normal ambient visual array to `specify' the parts of the self to the point of observation (being the eyes), although he does not define this specification explicitly.

Information to specify the self and information to specify the environment coexist in the real world and could not exist separately, according to Gibson. The distance between `here' and `there' is one of a continuous layout of surfaces with changing gradients from the nose all the way to the horizon. However, in immersive VR a designer can delete all information about the self from the visual image, creating a new situation. Gibson also stated that "An observer perceives the position of here relative to the environment and also his body as being here". In VR, a participant can only perceive the position of here relative to the environment, as he or she may not have a VB and his/her physical body may exist in a completely separate environment. So the question arises, "In immersive VR with no VB, what is the participant's perception of `here' and on what is it based?"

2.4.2 Ecological Distance Research/ Metrics

With regard to distance, Gibson believes it extends along the ground and not through the air. Distance is then projected as a gradient of the decreasing optical size and increasing optical density of the features on the ground. This gradient is anchored at two places: the horizon, where texture density is maximized and all visual solid angles shrink to zero, and the visual body, specifically the nose. The nose represents the maximum of nearness and the horizon represents the maximum of farness. The nose provides the absolute baseline (i.e., the absolute zero of distance from here) for Gibson's three kinds of optical gradient for perceiving depth: size perspective, disparity perspective, and motion perspective. Hands and feet undergo optical minification by moving the extremity away from the eyes and optical magnification by moving the extremity towards the eyes. However, the solid visual angles of the hands or feet cannot be reduced below a certain minimum. This limited range describes the limits of the body and the world: the extremes of `here' and `out there'.
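The optical minification and magnification of the hands can be illustrated with a minimal sketch. It uses the standard planar visual angle formula, 2 * atan(size / (2 * distance)), as a simplification of the solid visual angle Gibson refers to; this formula is not given in the source. The function name and the hand width and viewing distances below are hypothetical values chosen for illustration. Because the hand cannot move farther from the eye than arm's length, its visual angle has a floor.

```python
import math


def visual_angle_deg(size, distance):
    """Planar visual angle (degrees) subtended by an object of a given
    size viewed frontally at a given distance (same length units).

    A simplification: Gibson speaks of solid visual angles.
    """
    if size <= 0 or distance <= 0:
        raise ValueError("size and distance must be positive")
    return math.degrees(2 * math.atan(size / (2 * distance)))


if __name__ == "__main__":
    hand_width = 0.09   # meters, hypothetical
    arm_length = 0.70   # meters, hypothetical maximum hand distance
    for d in (0.2, 0.4, arm_length):
        print(d, round(visual_angle_deg(hand_width, d), 2))
```

The value at arm's length is the smallest angle the hand can subtend for that observer, which corresponds to the minimum described in the text.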

Ecological psychologists, therefore, often study distances as recession along the ground rather than using distances through the air to objects that have a frontal plane as a background. In one set of experiments (Purdy & Gibson, 1955), subjects were asked to bisect a distance along the ground in an open field setting by directing that a mobile field marker be stopped at the halfway point between two end markers. Distances to be bisected included from marker to marker or from the subject to a marker. Subjects could bisect distances without difficulty and with some accuracy. The further stretch could be matched with the nearer stretch with no constant error even though the further stretch had a more compressed visual angle than the nearer one. The conclusion Gibson reached was that observers were not paying attention to visual angles; they were noticing information in the ambient optical array. This may have included the amount of texture in each visual angle and use of the proposed invariant "equal amounts of texture for equal amounts of terrain". From studies like this, Gibson reasoned that both size and distance are perceived directly. There are no separable depth cues that need exist, nor does one have to allow for a distance when perceiving the size of an object. Other open field studies that required absolute distance judgments (in terms of yards) resulted in fairly accurate performance by subjects, but they had to first `see' the distance before applying a number to it (Gibson, 1979).

Instead of measuring distance along the ground, many ecological psychologists capture distance estimation in terms of a human activity, such as reaching for an object or walking toward an object blindfolded after previous viewing. Indeed, even before ecological psychology was considered as a separate approach to visual perception, Smith and Smith (1967) indicated that mean errors were greater when distance judgments were made in artificial units (metrical rod length) than when they were made in natural units (arm length).

A set of studies was undertaken to determine what is reachable by an observer (Carello, Grosofsky, Reichel, Solomon, & Turvey, 1989). The goal was to determine invariant properties in the ambient optical array that allowed for the direct perception of reachable distance by the observer under a variety of conditions. Instead of treating the task as one of seeing the distance to a target, perceiving what is reachable was treated as a problem of perceiving an affordance. In this way, ecological psychologists argue that the distance to the object is not registered in absolute terms with respect to an extrinsic reference frame (such as feet or meters) but rather the information in the environmental layout is scaled in terms of a local or intrinsic reference frame. In other words, a small person's reference frame will differ from a tall person's, but both would use their respective frames of reference to make accurate judgments of reachability.

In these experiments, subjects responded to a series of postural positions and environmental layouts of objects by answering the question "Is the object reachable?" Subjects did not perform the act of reaching during the experiment, nor did they provide verbal estimates of distance. These researchers found that under a variety of conditions, subjects could accurately perceive what is reachable and what is not using only optical information.

The majority of variance in the reaching estimation was `overshoots', that is, estimating that one's reach extends farther than it in fact does. These researchers offered two reasons for this tendency. One, the overshoot errors were natural considering the mobility of the arm. Therefore, by allowing for reasonable adjustment of the activity within the general constraints of the studies, a tolerance region for reach limits was obtained which captured/eliminated the overshoots. A second reason hypothesizes that the errors were caused by the subjects making verbal judgments without actually reaching, whereas in actual reaching tasks the individuals can fine tune their reach during the task.

With regard to the present effort, it is interesting to note that in one of the experimental conditions, the researchers masked the arm closest to the viewing area. They provided no reasoning for this, but perhaps they felt that it provided an aid for direct perception that needed to be `factored out' of the analysis. The primary concern of this thesis is to explore the effects of the existence of these body images on spatial awareness.

Many related studies have shown that subjects tend to overestimate whether an object is within reach (Bootsma, Bakker, Snippenberg, & Tdlohreg, 1992; Carello et al., 1989; Heft, 1993). Heft argued that these overestimates were due to non-perceptual factors intruding on perceptual processes, in other words, intellectualizing what is inherently a nonintellectual process. By actively focusing on the perceptual act, errors are introduced to that act. Heft argued that minimizing the analytical processes associated with the obtaining of a percept will result in a more pure and accurate reporting of that percept. This suggests that perception-action functioning can operate independently of, and not necessarily subordinate to, intellectual functioning (Heft, 1993).

Heft studied reaching judgments under three conditions of decreasing influence of analytical/reflective processes on the perceptual task: as the focus task with no time limit, as the focus task with a 2-second time limit, and as a secondary aspect of a larger task. The results indicate that errors in perceptual estimates are reduced as the individual focuses less on the estimate itself. Heft then provided a hypothesis to account for the direction of error (overshoot or undershoot) found in many perceptual-action studies. When the consequences of misjudging an affordance are not serious and possible errors can be corrected through fine tuning (e.g., stretching the limbs, readjusting weight distribution, etc.), inaccuracies based on `thinking too much' will be in the direction of overestimates. This is the case for most reaching tasks. However, if fine tuning of adjustments is not possible and/or the consequences of misperceiving the affordance are serious, perceptual judgments that include analytical/reflective processes will result in conservative underestimations (undershoots).

An interesting study by Loomis, Da Silva, Fujita, and Fukusima (1992) illustrated a conflict in distance estimates obtained with 1) a direct scaling method and 2) a visually-directed action. The direct scaling method required the subject, standing in an open field, to vary a distance interval in the depth plane to appear objectively equal to a fixed distance interval in the frontal plane. The visually-directed action involved having the subject view a target in an open field, then walk with his/her eyes closed to where he/she thought that object was located. The results indicated that the subjects produced systematic errors when matching a distance in the depth plane to a distance in the frontal plane, with the depth distances needing to be much larger to be equated with the same frontal distance. However, subjects could accurately walk blindly to any previously seen target up to 12 meters away without much systematic error. There seems to be a conflict here, as subjects systematically misperceive visual space in the depth plane (in a way that mimics the affine-transformed Euclidean space that Wagner (1985) found), yet under identical viewing conditions can accurately walk blindly to depth targets after visual perception of these targets (representing a pure Euclidean visual space). Perhaps the additional analytic processes required in the direct scaling method distorted the perceptual task, whereas in the visually-directed walking action visual perception was only a subset of a larger task (locomotion), reducing analytical/reflective distortions.

A study by Caird (1994) offers a bit of closure to this literature review as it involves elements from ecological psychology, VBs, and spatial awareness. Caird studied the effect of virtual hand size on the perception of grasp extent. Using ecological measures, he varied the size of the VB (in this case, a solitary hand) to determine its effect on perceived virtual object size. Caird found that the perception of VE scale is indeed a function of VB size (i.e., the size of one's hand is used as a referent for size estimation). This thesis attempts to extend Caird's focus by investigating the potential that an entire VB may be used as a referent for object spatialization.

2.4.3 Ecological Psychology Summary/Relation to this Effort

Framed in the ecological psychologist's point of view, this thesis could be stated as determining whether the body `propriospecific' information available in the optical array acts as an affordance for the direct perception of distance and location of objects in the vicinity of the body. Perhaps it may combine with other properties in the environmental layout to produce a stronger invariant than would otherwise be available.

The metrics available in ecological psychology would center on perceiving distance along the ground and/or perceiving the reachability of objects. Walking blind to previously viewed objects would not be a viable metric for this effort, as its thrust involves the availability of the VB image in the optical array. Perhaps a modification of the walking task could be used, such as walking visually towards an object until it is just reachable. In consideration of Heft's findings (Heft, 1993), perceptual acts should be structured so as to minimize influences of analytical processes. Any use of direct scaling methods for distance estimation should be done in consideration of the results found by Loomis et al. (1992).

2.5 Summation and Experimental Question

What follows is a brief summary of related findings from the literature review. VB research has shown that VBs can influence one's cognitive state (presence), and there have been assertions that VB representations may be required for some virtual spatial tasks, e.g., architectural walkthroughs. Spatial awareness research has shown the importance of context in spatial tasks and that egocentric judgments depend on perceived relations among objects. PV research has shown that the peripheral visual system is able to detect and transmit visual information on large-area, dynamic images (characteristics that match one's own body image). PV has been shown to be vital to the performance of many tasks, including those that are spatial in nature (e.g., locating an object, developing cognitive maps). In addition, ecological psychologists have long theorized the importance of one's own body image as an information source. Lastly, Caird's research (1994) has demonstrated ecologically that a VB can influence one's perception of virtual world spatial scale.

Given the above findings, the following testable design question is presented:

Does the existence of a VB enhance one's spatial awareness of a new environment by providing an invariant, subtle point of reference for object positioning and scaling? The following three chapters detail studies that were run to answer this question along with the results found.
