An Exploration of Techniques to Improve Relative Distance
Judgments within an Exocentric Display
Building a virtual display interface in order to communicate situation awareness involves an understanding of a number of key issues. In order to examine these issues, we first define what is meant by the concept of situation awareness and how spatial awareness fits into the attainment of such a goal. We then focus on an aspect of spatial awareness which addresses how humans come to understand the 3D geometry of a given space. Conveying an understanding of 3D space involves two issues. One issue deals with determining how a particular 3D space should be represented so that one can gain a cognitive understanding of the space. Since the human user will need to interface with the display to gain the spatial knowledge, an adjunct issue is to determine how that representation will actually be built on the physical display itself.
To address the above issues, we will first review how humans become cognitively familiar with the configuration of a given environment. This information provides insight into how a designer may conceptually represent a 3D space. In order to address how the 3D space should be represented physically through computer hardware and software, we will outline some relevant human perceptual issues. Specifically we will explore what primary cues to depth and distance humans use in the natural world in order to understand any given 3D environment. Following this is a review of how particular depth cues of interest can be recreated within a virtual environment via various computer graphics techniques along with prior, relevant research utilizing those particular depth cues. Other issues covered in this review include recommendations with regard to creating spatial displays. The goal of this literature review was to determine what is known and what is unknown about building spatial displays for the purpose of communicating the 3D geometry of a given space.
2.2 Components of Situation Awareness
There are many related definitions of situation awareness. In one general definition, situation awareness is conceived of as the participant's internal model of the surrounding mission at any point in time (Endsley, 1988). Other researchers divide the concept of situation awareness into spatial and state (Venturino and Kunze, 1989; Barfield, Rosenberg and Furness, 1995), as well as temporal (Sarter and Woods, 1991) components. As a component of situation awareness, spatial awareness is an understanding of the location, in 3D space, of particular objects or participants within a given environment. In a military mission environment, spatial knowledge would allow one to know the relative location of enemy targets, as well as friendly and neutral players within a given scenario (Venturino and Kunze, 1989). In a medical diagnosis scenario, spatial awareness may entail the understanding of the exact location of a tumor within a specific multi-chambered organ such as the lung. These examples imply that spatial awareness can be defined as an understanding of the 3D geometry of a given environment. Such knowledge may help one to develop a plan of attack against an enemy whether that enemy is an opposing military force or a disease.
State awareness, as another component of situation awareness, refers to an understanding of an individual's or object's status in terms of specific measures. For example in a military situation, state awareness may refer to knowledge about the fuel level or the weapon's status of one's own aircraft. In a medical situation it may be the blood pressure or white blood cell count of a particular patient.
Temporal awareness, as the third component of situation awareness, consists of understanding the impact of events as they occur over a period of time. Individual events may not seem important taken in isolation, but if one understands the overall picture and is able to keep track of succeeding events, the impact of individual events in combination may influence one's understanding of the state of the total mission (Sarter and Woods, 1991). Given that situation awareness is a very broad issue which involves many different components, this research will focus on increasing one's spatial awareness as a means of increasing one's situation awareness.
To increase one's spatial awareness, we need to increase one's conceptual understanding of a particular 3D space. People acquire their knowledge and understanding of a 3D space, whether that 3D space is a military mission environment or a city or the human body, via a variety of means. When trying to understand a physical environment such as a city, for example, people often use mechanisms such as maps, navigation experience, or following directions. These mechanisms lead to different types of knowledge about a particular area such as landmark knowledge, route knowledge, and survey knowledge (Wickens, 1992).
In order to gain an understanding of a given area, maps can be an important tool for gaining the requisite knowledge needed in order to make spatial decisions. Map learning has been shown to be superior to route learning for making judgments of relative location and straight-line distances of objects (Thorndyke and Hayes-Roth, 1982). Map learning can also lead one to obtain what is known as survey knowledge, or an internalized "cognitive map" (Wickens, 1992) of a given environment.
An exocentric display provides a "from above" (outside in) view, and it allows one to gain an understanding of the spatial relationships of objects within a given environment (Barfield, et al., 1993); such a display can be considered a sort of god's-eye-view map. An egocentric display provides a more "from within" (inside out) view of a 3D space. In order to capitalize upon the benefit of a map in helping one to gain a spatial understanding of a particular environment, the series of experiments reported here focuses on an exocentric rather than egocentric view of an environment.
2.3 Perceiving Space and Distance in the Natural World
We now address some of the human perceptual issues which need to be considered if one wants to communicate 3D space within a computer display. People use a variety of depth perception cues in order to answer questions related to the shape and distance of objects within a 3D environment. These questions include (Wickens, Todd and Seidler, 1989):
* How far away is that object from my point of view?
* How far apart are those objects from each other?
* What is the true three-dimensional (3D) shape of that object?
Our primary focus is on the second question, which deals with understanding how humans perceive the relative distances of objects within a 3D environment. By understanding how humans perceive relative distances in the natural world, we may be able to better build virtual interfaces which will also provide such distance information.
Distance can be perceived via both monocular and binocular depth cues. Monocular depth cues provide an equivalent percept to both eyes; the cues are equally effective whether using one or both eyes. Examples of monocular cues include linear perspective, texture, shadows and lighting, motion, and size. Binocular cues, on the other hand, take advantage of both eyes by allowing each eye to receive slightly offset views of the same visual scene. Binocular disparity is a binocular cue which creates the phenomenon of stereopsis.
2.3.1 Monoscopic Depth Cues
Linear perspective is an important monocular cue which occurs when the 3D world is projected onto a two-dimensional (2D) surface, such as a photograph, a painting or our retinae. If an object extends out away from the observer, the nearer portions of the object will be imaged larger on the retina than those portions of the object that are farther away (Levine and Shefner, 1991). This phenomenon produces a strong perception of depth.
Texture is also an important monocular cue which occurs from "depth-dependent" distortion occurring in a 2D picture. When 3D objects get projected as images onto our 2D retinas, their surfaces distort. The "texture" occurs when there is a change in the distance between similar objects on the surfaces as those objects are projected onto the retina. Cutting and Millard (1984) made distinctions between three different types of static texture gradients and then performed experiments to determine what type of texture gradient was most important in perceiving different shapes. Texture can also be an important cue for determining the relative distance between objects (Wickens, 1992).
An object's location within the visual field can also signal assumed information about relative distances to the observer. In addition to their experiments on lighting and shadows, Berbaum, Tharp and Mroczek (1983) observed a general pattern of depth assignments assumed in visual processing. One such assumption is that objects in the lower parts of pictures are closer to the observer than objects in the higher parts.
Motion is considered a monocular cue and sufficient motion, that occurs from the movement of the observer and/or movement of an object, can cause the perception of depth and distance. The perception occurs because there is a change in the relative distance and/or a change in the original orientation or shape of an object. Depth cues which involve motion are called kinetic cues. Motion parallax is a particular type of kinetic cue which describes the effect of relative, lateral movement of objects at different distances from the observer. Objects which are closer to an observer appear to be moving faster than those objects which are at a distance, if the observer is fixating on the farther object. An observer can move his head from side to side in order to determine which of two objects at different distances is closer. This previous example shows how motion parallax can provide relative distance information. Another motion-based depth cue relies on a phenomenon known as the kinetic depth effect (KDE). KDE, demonstrated by Wallach and O'Connell (1953), shows that people are able to recover 3D form when viewing 2D projections of rotating objects. This effect is an example of how people can perceive depth and shape when motion causes a change in the orientation of an object.
Size is another monocular cue which helps us judge distance. This cue relates to the fact that the size of the retinal image produced by an object will vary in inverse proportion to the distance. One type of size cue which allows us to judge absolute distance is called familiar sizes. This cue takes place when you know the actual size of an object such as an automobile, for example. You will then make a judgment about how far that particular object, or automobile, is away from you by comparing the size you see with the size you know that car to be. Size can be an effective cue to absolute distance, although it has been known to be problematic when the known sizes of objects change. For example, as the actual size of automobiles decreased, there was an increase in the number of rear-end collisions which resulted when drivers did not realize that a small car in front of them was as close as it was. Apparently drivers assume an average vehicle size and use this assumption as the basis for computing the distance from themselves and the car in front of them (Wickens, 1992). Thus this cue is only effective when subjects are very familiar with the absolute size of an object.
Size can also help us judge relative distances. Objects which create a smaller retinal image are perceived to be farther away. In order to judge the relative distance of different objects you need to compare their relative sizes. If you believe two items are of the same size, then if one appears bigger to you than the other, you will think that the bigger object is closer. Perceived object size is also supported by occlusion, with object size estimated by the number of elementary texture units of a background surface occluded by that object (Gibson, 1966). Thus, if you want to maintain an accurate representation of distance by using a relative size cue, you would need to make sure that the texture of the image behind the objects is consistent in order to maintain the relative size cue.
Interposition, or occlusion, also conveys relative depth information and occurs when two opaque objects are in the same line of sight. The non-occluded object is perceived to be closer to the viewer than the partially drawn object (Levine and Shefner, 1991). Occlusion is a very important cue especially in the absence or confusion of other cues (Wickens et al., 1989).
2.3.2 Binocular Depth Cues
Binocular depth cues are those cues which arise from using two separated eyes in order to view the world. Human eyes are separated, and we receive different and overlapping 2D retinal images of the visual scene. This property is called binocular disparity or binocular parallax and it causes the phenomenon of stereopsis (Levine and Shefner, 1991). Our ability to fuse these disparate images produces strong percepts of depth as well as object form. We will perceive disparate images on our two retinas when objects are placed either in front of or behind the horopter. The horopter defines a locus of all points in space which will give us corresponding retinal images for a particular degree of convergence (Patterson and Martin, 1992). All points which fall on the horopter will be perceived as being the same distance away from the observer at a particular focal point; there will be no retinal disparity, thus there will not be any perceived difference in depth among the points. There is an additional area, which surrounds the horopter, known as Panum's Fusion Area. A stimulus which falls within this fusion area will be seen as a single object that is of a different depth than the fixation point distance. Additionally, the disparate images formed on the retina from an object in this area will be fused together. Images formed from objects outside of this area will not be fused and the observer will see double images, or diplopia (Figure 2.1).
Figure 2.1: The Horopter. Adapted from Levine and Shefner (1991).
In order to produce a stereo image on a display, one needs to generate a disparity between two images and then simultaneously present each image to the appropriate eye of the observer. However, there is a limit to the amount of disparity that can be fused by the observer. The largest disparity at which fusion occurs is called the disparity limit of fusion. The disparity limit of fusion covaries directly with the stimulus size or scale and inversely with spatial frequency (i.e., large disparities can be fused only with large low-frequency stimuli). It also turns out that the disparity limit increases with eccentricity (i.e., larger disparities can be fused at larger eccentricities). For example, in the fovea, the disparity limit for a high-frequency image is 10 arcmin, while the limit for the same image at 6 degrees eccentricity is 15-20 arcmin (Patterson and Martin, 1992).
It has been shown that objects which stimulate disparate retinal areas can produce a strong impression of depth. By using random dot stereograms, Julesz (1971) showed how people perceive depth information even in the absence of other cues such as pictorial or motion cues. Thus, people can see depth without having to see surface properties such as lines or shape. However, it has also been shown that our ability to correctly perceive shape by stereopsis alone varies based on the distance that the observer is from the object. Our ability to perceive shape from stereopsis is best when we are at intermediate viewing distances close to 1 meter (Johnston, Cumming and Parker, 1991).
Oculomotor cues also provide us with critical information necessary to determine distances. Oculomotor cues involve combining visual and proprioceptive information from the eye to derive information related to distance. Accommodation is our ability to change the shape of our lens as we change the focus of our visual attention. The cue comes from, through development, our brain processing how much we are using our muscles in order to change the lens, and then correlating that information with how far a given object is away from us. As a distance cue, accommodation works in conjunction with another action of our eye muscles - that resulting from convergence. Convergence occurs when we turn both eyes inward to maintain the appropriate projection of a stimulus onto both foveae. In fact, the amount that the eyes converge, in conjunction with the distance between our eyes, provides a measure of the absolute distance between the observer and the stimulus.
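The geometric relationship just described can be sketched in a few lines. Assuming symmetric fixation straight ahead, the fixation distance follows from the convergence angle and the interocular separation by simple trigonometry (the function name and the 0.065 m interocular value below are illustrative assumptions, not values from the source):

```python
import math

def convergence_distance(ipd_m, convergence_deg):
    """Estimate the absolute distance to a fixated point from the total
    convergence angle of the two eyes, assuming symmetric fixation.

    ipd_m           -- interocular distance in meters
    convergence_deg -- total angle between the two lines of sight, in degrees
    """
    # Each eye turns inward by half the total convergence angle; the
    # fixation point, eye, and midpoint between the eyes form a right
    # triangle with opposite side ipd/2 and adjacent side the distance.
    half_angle = math.radians(convergence_deg) / 2.0
    return (ipd_m / 2.0) / math.tan(half_angle)

# With a typical 0.065 m interocular distance, a convergence angle of
# roughly 3.7 degrees corresponds to fixation at about 1 meter.
print(convergence_distance(0.065, 3.72))
```

Note how quickly the angle shrinks with distance: beyond a few meters the convergence angle becomes very small, which is consistent with oculomotor cues being most informative at near viewing distances.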
Color, as studied in binocular vision, has been shown to be a cue to depth perception, but the cue can lead to ambiguous results. The effects of brightness, hue and saturation on perceived depth were examined with stereo vision by Egusa (1983). When two achromatic stimuli were shown to subjects, the perceived depth between them increased with increasing brightness differences. However, the side that was judged nearer, either the darker or the brighter side, differed across the subjects. More consistent results were obtained, however, when testing achromatic-chromatic and chromatic-chromatic combinations of stimuli using the colors of red, green and blue with equal saturation. Subjects reported that red appeared to be "nearer" than green or blue, and that green was "nearer" than blue. This effect is often referred to as "chromostereopsis". These results need to be considered within the context of constructing spatial displays. If we are trying to isolate and examine a specific depth cue effect, we must be cautious to not introduce unwanted effects due to these color phenomena. One suggestion may be to use stimuli with less saturated colors, and to control for the effects across variables of interest.
2.4 Computer Graphics Displays
Through a combination of both hardware and software features, computer graphics systems have evolved sufficiently to match and augment human perceptual capabilities. Computer graphics capabilities allow the possibility of creating 3D virtual environments which can appear to be quite realistic and, in addition, can improve specific spatial task performance. The additional realism of such environments derives from a re-creation of the many depth cues outlined above within spatial displays. The following is an overview of some relevant computer graphics display characteristics and of prior research on those cues which most directly concern this research. Several of the depth cues mentioned above, such as relative size and occlusion, are considered to be critical to the accurate perception of depth (Wickens et al., 1989); these cues will be supported throughout all the experiments. Others will be treated as experimental variables.
2.4.2 Perspective Displays
A perspective display recreates the linear perspective depth cue by creating a perspective projection of an object onto a view or projection plane. This view, or projection plane, is the screen of the computer graphics display. The projection transforms an object defined in an n-dimensional coordinate system into a representation in a coordinate system of fewer than n dimensions. The projection of a 3D object can then be created by having straight projection rays, called projectors, emanating from a center of projection, passing through each point of the object and intersecting a projection plane to form the projection. The effect of this projection is to allow a 3D object, such as a cube, to be represented on a 2D computer display surface. In a perspective projection, there is a finite distance between the center of projection and the projection plane (Foley et al., 1993).
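The projection just described can be sketched in a few lines of code, assuming one common convention: the center of projection at the origin and the projection plane at z = d (the function name is ours; Foley et al. describe several equivalent setups):

```python
def perspective_project(point, d):
    """Project a 3D point onto the projection plane z = d, with the
    center of projection at the origin. Each projector is the ray from
    the origin through the point; similar triangles give the mapping.
    """
    x, y, z = point
    if z == 0:
        raise ValueError("point lies in the plane of the center of projection")
    return (d * x / z, d * y / z)

# Two points at the same lateral offset but different depths: the nearer
# point projects farther from the screen center, so equal-sized objects
# image larger when near -- the linear perspective cue described above.
near = perspective_project((1.0, 0.0, 2.0), 1.0)
far = perspective_project((1.0, 0.0, 4.0), 1.0)
```

The finite projection distance d is what distinguishes this from a parallel projection; as d and the object distance both grow large, the foreshortening effect disappears.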
A plan view display is a flat 2D display format. Previous studies have shown that perspective display formats are superior to plan view display formats for several controller type tasks. The advantage of a perspective view over a plan view is particularly significant for tasks which require the understanding of both horizontal and vertical separation between objects, such as aircraft. Ellis, McGreevy and Hitchcock (1987) had airline pilots monitor an Air Traffic Display to determine if an avoidance maneuver was needed. Pilots took more time to decide about an avoidance maneuver with a plan view, when compared to a perspective view display. The pilots were also more likely to choose a maneuver with a vertical component when using a perspective display, thus offering more options in avoidance strategy.
Bemis, Leeds and Winer (1988) asked subjects to detect threats and select the closest interceptor to a specified target. Their results indicated that perspective displays provide a significant reduction in detection and interception error over the plan view displays. Additionally, the response time for selecting interceptors was significantly reduced with perspective displays. Burnett and Barfield (1991) showed that performance for an altitude extraction task was faster for the perspective display than for the plan view format. Consistent with our goal of developing optimal spatial awareness displays, a perspective display was chosen as the god's-eye-view format to be used in this research.
2.4.3 Stereoscopic Displays
Stereoscopic displays provide the binocular disparity depth cue and may prove to be beneficial in making quick and accurate relative distance judgments. To achieve binocular disparity, stereoscopic displays present two slightly different perspective views of the same visual scene for the left and the right eye. The different perspective views come from slightly changing the camera eyepoint of the visual scene. The camera eyepoint change causes each point in the scene to be rotated and translated a specific amount. With stereoscopic displays, the image will appear to have shape and depth and be, in a virtual sense, 3D.
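As a rough sketch of how the two eyepoints might be generated, one can offset a single camera position by half the interocular distance along the camera's horizontal axis. The names and parameters here are illustrative assumptions, and this is a simplification: production stereo rendering typically also uses asymmetric (off-axis) view frusta rather than an offset alone.

```python
def stereo_eyepoints(camera_pos, right_axis, ipd):
    """Return (left, right) camera positions for a stereo pair.

    camera_pos -- the single (cyclopean) eyepoint, as an (x, y, z) tuple
    right_axis -- unit vector along the camera's horizontal axis
    ipd        -- interocular distance, in scene units
    """
    half = ipd / 2.0
    left = tuple(c - half * a for c, a in zip(camera_pos, right_axis))
    right = tuple(c + half * a for c, a in zip(camera_pos, right_axis))
    return left, right

# A camera at the origin looking down -z, with +x as its horizontal axis:
left_eye, right_eye = stereo_eyepoints((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.065)
```

Rendering the scene once from each of these eyepoints, and presenting each image to the appropriate eye, yields the slightly offset views that produce binocular disparity.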
McKenna and Zeltzer (1992) provide an excellent summary and comparison of the key features across a range of stereoscopic displays, including immersive (with head mounted displays), non-immersive (high resolution, stereographic ready monitors with LCD shutter glasses) and autostereoscopic displays (defined as a 3D visual display that does not require viewing aids). Some of the dimensions along which these various displays differ are spatial resolution, field of view and angular resolution. Angular resolution is the visual angle that a pixel subtends from a given viewing distance; it has both horizontal and vertical extents.
The resolution objective of image display technologies is to match the angular resolution of the human eye. Under the absolute best conditions (a stimulus with a high degree of contrast viewed under optimal lighting conditions), the acuity threshold of our visual system, at the fovea, is measured to be 1 minute of arc (Boff and Lincoln, 1988). At this time, high-resolution stereographic monitors with LCD shutter glasses come closest to matching the human visual acuity threshold (McKenna and Zeltzer, 1992).
Studies in human-computer interaction have shown that stereoscopic displays can be effective in providing a salient 3D percept in near field viewing (Surdick, Davis and King, 1994). However, objects which are perceptually placed far away from the viewer are influenced less by stereopsis (Wickens et al., 1989). A number of studies have also compared stereoscopic viewing to monoscopic viewing for a variety of specific tasks. Pepper, Smith and Cole (1981) suggest that stereoscopic viewing offers advantages in interpreting highly complex scenes. When monoscopic cues were degraded, stereoscopic viewing provided significant improvements in performance for a range of remote teleoperational tasks.
Yeh and Silverstein (1992) evaluated response time and accuracy of distance and altitude spatial judgments. They found that the addition of disparity, within a perspective display format, augmented the discrimination of relative depth along the line of sight and hence improved performance in contexts in which discrimination through monocular depth cues was either ambiguous or less effective. The addition of disparity improved subjects' overall response times as well. Kim, Tendrick and Stark (1991) found that stereoscopic displays generally permit superior pick and place performance. Their results show that relatively similar performance can be achieved between monoscopic and stereoscopic viewing if particular viewing enhancements are provided in the monoscopic conditions. The enhancements used in their experiment consisted of reference drop-lines and reference lines with a grid. Stereoscopic viewing did, however, show a significant benefit over monoscopic viewing when these visual-enhancement depth cues were not present. Thus stereo viewing could be advantageous when viewing enhancements such as drop-lines may not be available.
Wickens (1991) suggests that a 3D map representation may help to preserve spatial awareness. He notes that stereopsis would be particularly useful if the display is static or changing only slowly, although he believes the benefit of stereopsis may be reduced if key monocular cues (such as perspective, occlusion and texture) are also available. Sollenberger and Milgram (1993) believe stereoscopic displays may prove to be important (even more so than rotational displays) for tasks involving the visualization of relative spatial locations of objects. Due to the results of these studies, stereoscopic viewing was chosen as an independent variable for the evaluation of relative distance judgments in the present study.
2.4.4 Rotating Displays
Image rotation is a computer graphics feature which may be beneficial for the spatial understanding of a 3D environment. One benefit of rotation is the ability to view a scene along a variety of different axes. This capability becomes important since the projection of a 3D space onto a 2D screen, or our 2D retinae, inherently removes information regarding one of the three axes of the space. Rotation provides access to the missing axis of a perspective display.
Image rotation should be considered in conjunction with other standard geometric transformations such as translation and scaling. In order to provide a convincing and accurate dynamic visual scene, these transformations properly place, size and orient objects. These transformations occur when the coordinates of an object are multiplied by a matrix containing specifically placed values representing the desired transformation. If the object's points are expressed as homogeneous coordinates, all three transformations can be accomplished through one 4 X 4 matrix. A homogeneous coordinate is represented as (x, y, z, W), instead of (x, y, z). The point is homogenized so that W is equal to 1. Thus, the points become (x/W, y/W, z/W, 1) (Foley et al. 1993).
The 3D coordinate system used in most computer graphics systems is right handed. By convention, positive rotations in a right-handed system occur when a 90 degree counterclockwise rotation will transform one positive axis to another. If one is looking down the z axis, for example, a positive 90 degree rotation about the y axis would move the z axis onto the x axis. In order to rotate objects about the x, y, and z axes, different matrix configurations are required in order to accomplish the rotation objective. For example, in order to rotate an object 90 degrees about the x axis, all homogenized points of the object would be multiplied by the following matrix:
Rotatex(90) = | 1    0        0       0 |
              | 0  cos 90  -sin 90    0 |
              | 0  sin 90   cos 90    0 |
              | 0    0        0       1 |

This matrix assumes no other transformations, such as scaling or translation. For 90 degree rotations about the y or z axis, the cos and +sin values would have different positions in the matrix (Foley et al., 1993).
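The construction and application of this matrix can be sketched as code; a minimal version for rotation about the x axis, applied to a homogenized point (the function names are ours):

```python
import math

def rotate_x(deg):
    """4 x 4 homogeneous rotation matrix about the x axis, using the
    right-handed convention described above (positive rotations are
    counterclockwise when looking down the axis toward the origin)."""
    c = math.cos(math.radians(deg))
    s = math.sin(math.radians(deg))
    return [[1, 0, 0, 0],
            [0, c, -s, 0],
            [0, s, c, 0],
            [0, 0, 0, 1]]

def transform(matrix, point):
    """Multiply a 4 x 4 matrix by a homogenized point (x, y, z, 1)."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(4))
                 for i in range(4))

# A positive 90-degree rotation about x carries the +y axis onto +z,
# consistent with the convention that one positive axis maps to another.
rotated = transform(rotate_x(90), (0.0, 1.0, 0.0, 1.0))
```

Because the fourth homogeneous coordinate passes through unchanged, the same 4 x 4 machinery also accommodates translation and scaling, which is the motivation for homogeneous coordinates noted above.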
The effects of rotation on perceptual tasks have received very little attention in the literature. One exception is a study done by Sollenberger and Milgram (1993) in which subjects were asked to recognize the physical connections between objects, as a function of rotation and binocular disparity. Using a 3D path tracing task, they evaluated the effects of a rotating object (either manually or automatically) versus seeing it stereoscopically. In their first experiment, subjects controlled the rotation while performing the 3D path tracing task. They found main effects for disparity and motion with a marginally significant interaction between the two factors. The ability to rotate the display proved to be significantly better for performing the task than having stereoscopic viewing alone.
Their second experiment crossed rotational speed (fast, slow) with disparity (absent, present) and rotation type (optional, continuous). The results revealed that, of all the conditions, subjects did best with slow, continuous rotation while viewing the scene stereoscopically. In this study, they also found main effects for disparity and speed of rotation. Increased rotation speed was found to reduce the advantages of this feature. In a third study, Sollenberger and Milgram (1993) compared continuous motion with providing the participant multiple viewing angles. The results of this third study indicate that motion alone was no better than stereo with multiple viewing angles.
Rotation can also be incorporated into electronic maps. Wickens (1991) suggests that rotation can be an ideal feature for an electronic map. He suggests that axis alignment, through map rotation, is a map representation which may preserve spatial awareness. This suggestion is congruent with experiments that suggest a benefit of track-up maps for specific tasks (Harwood and Wickens, 1991). Such maps were shown to be especially helpful when trying to re-orient oneself in an unfamiliar setting or situation. Also, when one is trying to gain a specific perspective or viewing orientation in order to accomplish a specific task, being able to physically rotate the map would remove the time and cognitive processing needed to do a mental rotation of the terrain (Shepard and Hurwitz, 1984; Aretz, 1989). Rotation may also prove to be beneficial in solving problems of perceptual distortions found in spatial displays as described below in Section 2.5.
2.4.5 Head-Motion Tracked Displays
Motion parallax can be created within a spatial display by tracking the head movements of the observer and then appropriately changing the viewpoint of the computer generated image in real time. When tracking lateral head movements, updates to the viewpoint will cause objects closer to the observer to be displaced by a greater distance than those objects farther away. Objects which are at a similar apparent distance from the observer will be displaced by an approximately equal distance. Head tracking also provides the user access to additional viewing perspectives of the scene. The change in perspective is approximately equal to a rotation whose angle, in degrees, can be calculated as:
Rotation = Atan (Dy / Dx)
where Dy is the distance that the observer has displaced his or her viewpoint, and
Dx is the distance between the observer's eyepoint and the position of the object within the visual scene (Akka, 1993).
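This relationship can be sketched directly (the function name is ours; atan2 is used rather than a plain division for numerical robustness when the object distance is small):

```python
import math

def parallax_rotation_deg(head_displacement, object_distance):
    """Equivalent scene rotation, in degrees, produced by a lateral head
    displacement, following the relation above (Akka, 1993):
    Rotation = atan(Dy / Dx), where Dy is the lateral displacement of
    the viewpoint and Dx the distance from eyepoint to object."""
    return math.degrees(math.atan2(head_displacement, object_distance))

# Moving the head 0.1 m sideways while viewing an object 0.5 m away is
# roughly equivalent to rotating the scene by about 11.3 degrees.
print(parallax_rotation_deg(0.1, 0.5))
```

The same head displacement yields a much smaller equivalent rotation for distant objects, which is exactly the differential displacement that makes motion parallax a relative-distance cue.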
A number of different technologies are used to track spatial position (Meyer, Applewhite and Biocca, 1992). One popular implementation uses electro-magnetic techniques. An example is the 3Space system made by Polhemus Inc. which consists of a magnetic emitter, a small sensor receiver and a control unit. The emitter is composed of three orthogonal copper coils that emit a magnetic field when fed a current. The sensor receiver also contains three orthogonal coils which produce a current when moved through a magnetic field. The magnetic field produced by the emitter induces current in the sensor coils. Sensor position and orientation information is gathered by calculating the changes in current that occur as the sensor changes its location relative to the emitter (Meyer, Applewhite and Biocca, 1992).
Few studies have been reported which assess head tracking's benefit for performing a perceptual task in a non-immersive environment. Ware, Arthur and Booth (1993) found that head tracking was effective in helping subjects perform a 3D path-tracing task. Ikehara, Cole and Merritt (1993) were interested in understanding how motion parallax could affect a subject's performance in a rapid sequential positioning task. Motion parallax was either available or not available in either a monoscopic or a stereoscopic viewing condition. Motion tracking was the only experimental factor which improved performance for tasks which required only visual skills. These tasks were the total time to spot a target and the number of targets not spotted. This led the experimenters to conclude that motion tracking's major contribution is to improve the initial percept of the target. After the initial percept of the target, motion tracking's advantages are less clear: motion can improve path planning, but it could also interfere with the subject's ability to touch a target.
It has also been reported that one of the perceptual distortions of 3D displays is an incorrect understanding of the relative distance between two objects lying in a plane parallel to the front plane of the display (McGreevy and Ellis, 1986). Two such objects may be perceived as having the same Z distance even though they are actually separated in depth. Motion parallax may be useful in determining whether one object is in fact farther away.
Again, given the work cited above, the first study will evaluate the effectiveness of head-motion tracking, both in isolation and in concert with stereoscopic viewing and image rotation, in helping one to make both accurate and rapid relative distance judgments.
2.5 Considerations in Creating and Viewing Spatial Displays
Although computer-generated environments can improve performance on particular tasks, one must be aware of specific geometric distortions that occur, particularly within perspective displays. These distortions in spatial perspective representations can affect the accuracy of spatial judgments made within the computer environment. The research reviewed below indicates that people have particular biases when making spatial judgment responses; thus, human factors engineers may need to compensate for these biases when designing displays.
2.5.1 Geometric Field of View
As mentioned above, a perspective projection is created by projecting each point in the 3D scene toward a single center of projection, or station point. These projectors, which begin at the center of projection and pass through each point in the scene, also intersect the picture plane. As shown in Figure 2.2, the picture plane, by computer graphics convention, is placed between the scene and the center of projection.
Figure 2.2: Viewing Frustum
Each edge of the image or picture plane, together with the station point, defines an edge clipping plane. These edge clipping planes partly determine what is and is not visible within the display. In addition, the near and far clipping planes also determine what points of the image will be shown in the viewport. The visible volume of display space created by the edge clipping planes together with the near and far clipping planes forms the viewing frustum (Figure 2.2). This frustum is a truncated pyramid whose apex is the center of projection. The geometric field of view is defined as the visual angle subtended at the center of projection (apex) between opposite edge clipping planes. The geometric field of view for a display actually consists of two components, a horizontal and a vertical field of view. By selecting a specific horizontal and vertical field of view, one defines the shape of the viewing frustum (McGreevy and Ellis, 1986).
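To make the definition concrete, the geometric field of view of a symmetric frustum can be computed from a picture-plane extent and the distance from the center of projection to the picture plane. The sketch below assumes a symmetric frustum; the extents and viewing distance are illustrative values, not parameters from the studies cited.

```python
import math

def geometric_fov_deg(image_extent, eye_distance):
    """Geometric field of view (degrees) for a symmetric frustum: the
    visual angle subtended at the center of projection by an
    image-plane extent (width for horizontal FOV, height for vertical)."""
    return math.degrees(2.0 * math.atan2(image_extent / 2.0, eye_distance))

# Horizontal and vertical GFOVs for a 0.4 m x 0.3 m picture plane
# viewed from 0.35 m (hypothetical dimensions).
h_fov = geometric_fov_deg(0.4, 0.35)   # roughly 59 degrees
v_fov = geometric_fov_deg(0.3, 0.35)   # roughly 46 degrees
```

Holding the picture plane fixed, moving the center of projection closer widens the geometric field of view (a wide-angle image), while moving it farther away narrows it (a telephoto image), which is the manipulation examined by McGreevy and Ellis.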
McGreevy and Ellis (1986) found differences in task performance in a study to determine the effect of different geometric fields of view on a person's ability to determine the azimuth and elevation angle between a target and an object within a non-immersive display. They studied four different geometric fields of view, holding the vertical and horizontal fields of view equal. A geometric field of view of 30 degrees produced a more telephoto type of image, whereas a 120 degree field of view produced a more wide-angle type of image. Of the four geometric fields of view tested (30, 60, 90, and 120 degrees), a 60 degree geometric field of view was found to produce the least overall azimuth judgment error. This was only an overall estimate of the best field of view to use for a perspective display, for even within the 60 degree condition, subjects' accuracy varied sinusoidally with the angle at which an object was placed in relation to the target. Azimuth judgment errors were lowest when the object appeared along a major meridian axis (0, ±90, 180 degrees) from the target object.
2.5.2 Eyepoint Elevation Angle
The eyepoint elevation angle, EPEA, is described as the vertical angle between the center of projection and the central point of a reference object within the stimulus image (Figure 2.3).
Figure 2.3: Eyepoint Elevation Angle
The EPEA can be calculated as follows:
EPEA = Atan (Dy / Dx)
where Dy is the absolute height difference between the reference object and the eyepoint, and Dx is the absolute horizontal distance between the reference object and the eyepoint (Hendrix and Barfield, 1995).
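The EPEA calculation is a direct application of the arctangent formula above. In the sketch below, the eyepoint height and horizontal distance are hypothetical values chosen so that the result lands near a commonly recommended elevation angle.

```python
import math

def epea_deg(height_diff, horizontal_dist):
    """Eyepoint elevation angle (degrees): the vertical angle between the
    center of projection and the reference object, from the absolute
    height difference and absolute horizontal distance
    (Hendrix and Barfield, 1995)."""
    return math.degrees(math.atan2(abs(height_diff), abs(horizontal_dist)))

# An eyepoint 3 m above a reference object that is about 5.2 m away
# horizontally (hypothetical values) gives an EPEA near 30 degrees.
angle = epea_deg(3.0, 5.2)
```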
Changes in eyepoint elevation have a dramatic effect upon the amount of vertical and depth information that an image can provide. As the eyepoint elevation increases from 0 to 90 degrees (top-down viewing), the amount of vertical compression in the scene also increases and less information is presented with respect to the vertical dimension of the perspective display. Conversely, as the eyepoint elevation decreases from 90 to 0 degrees, information along the depth dimension becomes increasingly compressed (Hendrix and Barfield, 1994).
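This tradeoff can be illustrated with a simplified geometric model. The cosine/sine factors below are my own simplification, not a formula from Hendrix and Barfield: for an orthographic-style approximation, a unit vertical extent projects onto the image with a factor of roughly cos(EPEA) and a unit ground-plane (depth) extent with a factor of roughly sin(EPEA).

```python
import math

def projected_extents(epea_degrees):
    """Simplified model of the vertical/depth tradeoff: at eyepoint
    elevation angle e, a unit vertical extent projects with factor
    cos(e) and a unit depth (ground-plane) extent with factor sin(e).
    This is an assumed approximation for illustration only."""
    e = math.radians(epea_degrees)
    return math.cos(e), math.sin(e)

# At 0 degrees (side view) the vertical dimension is fully visible and
# depth is fully compressed; at 90 degrees (top-down) the reverse holds;
# intermediate angles trade one against the other.
for e in (0, 30, 45, 60, 90):
    vertical, depth = projected_extents(e)
```

Under this model, neither extreme preserves both dimensions, which is consistent with the intermediate elevation angles favored in the studies reviewed below.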
A number of studies have examined the effects of EPEA on a range of perceptual tasks. Kim, Tendick and Stark (1991) evaluated the effects of EPEA on a human operator's pick-and-place performance with a monoscopic perspective display. Their experimental results showed that as the elevation angle approached either 0 or 90 degrees, the mean completion time for the task increased. They attribute this increase in time to the loss of position information along one axis. Task performance was best for elevation angles between 30 and 60 degrees. In another study, Kim et al. (1987) showed that for a 3D tracking task, a monocular perspective format with a 45 degree elevation angle (with vertical reference lines) produced performance as accurate as did a stereoscopic presentation.
To further exemplify the tradeoff between providing either vertical or depth information, Yeh and Silverstein (1992) asked subjects to make both altitude and depth judgments as quickly and accurately as possible while they manipulated eyepoint elevation. For the depth judgment, subjects were asked to determine which of two objects was closer along the z axis; for altitude judgments, subjects determined which of two objects was higher above the ground plane. Their results showed that depth judgments were slightly faster at the 45 degree orientation when compared to 15 degrees, yet altitude judgments were much slower at the 45 degree orientation when compared to the 15 degree orientation. Based on the above studies, a compromise must be made if both depth and altitude cues are to be conveyed within a single static display. A reasonable alternative may be to present the perspective display at a 30 degree eyepoint elevation in order to accommodate judgments along both axes.