2. INTRODUCTION

2.1. Need

While a large body of literature exists on the physiological issues of HMD systems with head-tracking, little empirical work has been published on systems where head-tracking is not performed. This seems to reflect an assumption that the advantages of the HMD depend on head-tracking accompanying it. However, no correlation has been derived to support this notion. This report is intended to provide a first-pass set of empirical evidence on a tracking task using a telerobotic vision system displayed on either an HMD or a panel-mounted visual display. The study's data are then used to review the limitations and benefits of each display method.

2.2. Problem

Head-Mounted Displays (HMDs) have commonly been designed into man-machine systems in which head-tracking contributes to the task of interest. Is this the only benefit of using HMDs? This study removes the tracking aspect from the HMD in order to isolate the effect of display type (HMD vs. panel display) on user performance in a tracking task.

2.3. Review of Related Literature

The apparatus in this experiment uses a joystick as the sole means of input control, and the experiment compares performance in a tracking task displayed on a head-mounted display (HMD) and on a static console display. Aspects of each device have been previously studied to assess their worth in providing human interaction with computers. This section individually analyzes the output devices (HMD and fixed display) and the input device (spring-loaded joystick).

2.3.1. Analysis of Visual Output Devices

Advantages of HMDs

There are potentially many advantages of using HMDs over static displays. Military aircraft have been using HMDs for over 20 years to assist fighter pilots in maintaining visual awareness outside the cockpit and facilitating quick decision-making in air-to-air combat. HMDs may "display information directly in the line of sight of the pilot, focused at infinity and overlaid on the outside world, such that the pilot can acquire and interpret the information without constant shifting of visual accommodation and attention from the outside world to the instruments below the glare shield and back again" [Taylor 1992]. Commercial aviation HUDs will eventually be made smaller and moved to either a helmet or a pair of specially designed lightweight glasses connected via a fiber optic to a symbol generator.

HMD vs. Static Display

Arthur and Booth [1993] conducted experiments comparing HMDs with tracking and fixed displays with head tracking in a tree tracing task. After subjects gained experience viewing 3D graphical objects on both displays, using the head tracking capability to change their perspective, survey data were collected before the experiment was conducted. In the survey, the majority of users preferred head coupling without stereoscopic viewing over stereoscopic viewing without head coupling. It was also noted that the head coupling apparatus was awkward for the subjects to use, and that the actual task would dictate its usability. Although the survey placed little importance on stereoscopic viewing, the experimental results showed that head coupling with stereo was the fastest, while head coupling without stereo was significantly slower. As an addendum to their experiment, the effects of lag and frame rate on performance were considered only for the case of head coupling with stereoscopic viewing. Lag is due to processing the motion input and rendering the output; network lag did not need to be considered, since the workstation was isolated from any network. The results suggest that for similar tree tracing tasks, lag above 210 milliseconds will produce a decline in performance. In our study, lag does not become a distraction since the telerobotic visual system does not suffer from network traffic.

Perspective Views

Our study concerns itself only with tracking in two-dimensional space. Typical HMD applications tend to display three dimensions. If this extra dimension were considered in our tracking task, a number of additional factors would come into play, as exemplified by the following studies.

In an aircraft landing study, binocular cues, such as convergence and stereopsis, were found to be effective only in the last few seconds of landing the aircraft; monocular cues seem to be the important ones for visual space perception during approach to landing [Dorfel 1982]. In addition, near-visual response, particularly lens accommodation, results in a minified retinal image, which makes the runway appear more distant and produces a perception of undershooting [Randle, Roscoe & Petitt 1980]. The most important cues for pilots during final approach and landing are retinal image size (reflected in distance and height judgment), shape of the runway (which affects slope angle judgment), and motion parallax (which gives the time history of these factors). Results so far show that pilot subjects were able to judge distance, height and slope angle from computer generated landing approach scenes with remarkably good accuracy [Dorfel 1982]. Studies have shown that, at large viewing distances, size constancy is the most accurate method of depth perception, with movement parallax becoming preferable at medium range, and the binocular cues taking over at fairly close range [Buffett 1986].

The most prevalent problem normally associated with flat-panel image displays is the two-dimensional viewing of a three-dimensional object. Textures on objects have typically been used to establish depth and to provide anchors. This has mainly been helpful in colorized imagery, but not as effective in gray-scale imagery [Dorfel 1982].

Field of View

Alfano and Michel [1990] reported that restricting the normal field of view leads to perceptual and visuomotor decrements. A series of eye-hand coordination tasks were performed using goggles that restricted the field of view to 9, 14, 22 and 60 degrees. Although Pelli [1986] proposed that a 22 degree restriction would cause no measurable performance decrements, Alfano and Michel confirmed Dolezal's [1982] reports that performance suffers even at these field of view restrictions. While the 60 degree field of view restriction yielded significantly better performance than the others, all of the restrictions chosen caused a sense of disorientation in the subjects' depth and size judgment. Studies conducted by the Naval Training Equipment Center in the early 1980s conclude that the maximum single-channel field-of-view acceptable for flight applications is about 90 degrees when displayed on a virtual or flat screen display [Chambers 1982]. Increasing the amount of peripheral information (by increasing the field of view) allows the subject to construct an overlapping sequence of fixations in memory, which aids visuomotor performance.

Area of Interest

Area of interest processing is the computational method of enhancing an image in the area where a viewer is looking [Chambers 1982]. Developed for field-of-fire trainers by military contractors, and used effectively in the British Army Training Command, this system would allow the viewer to see the detail required near the target, while significantly less computing was performed on the scene outside the central field-of-view, referred to as the area of interest [Chambers 1982]. The area of interest is approximately 60 degrees, and within it is the foveal focus field-of-view, approximately 2 degrees of arc. Reducing the scene content (number of polygons) outside the area of interest reduces the amount of computer power required while providing maximum fidelity to the scene where needed. When the viewer looks in another direction, eye tracking devices would drive the computer image generator to enhance the area of interest on the display where the viewer is looking while reducing the content of the non-viewed area of the display. This reduces computation time, and the excess capacity would be used to preprocess information for the field-of-view being enhanced.

The area of interest in our experiment is the tracking cursor, which is always positioned in the center of the display. No other objects in the foreground of the path distract the subjects from this area of interest, and the cursor is always clearly in view.

Peripheral Information

Displays, however, are limited in representing a wide field of view. If the actual space is scaled proportionally to the monitor's display space, the display resolution may sacrifice detail. If, on the other hand, only a partial view of the actual space is represented in the display, tracking tasks may be subject to the same problems as a small field of view. Nevertheless, Flach, Hagen and O'Brien [1990] reported that the proportional mapping of actual space to display space was a safer choice in tracking tasks than non-linear mappings. In their experiment, comparisons of performance in a positioning task were made between three different mappings of visual display to movement space. In the normal display condition, displayed distance between targets was proportional to the actual distance. In the split screen condition, 66.5% of the initial distance to the target was mapped to half of the visual space and the remaining 33.5% of the distance (containing the target) was mapped to the other half of the visual space. Finally, in the logarithmic condition, there was a logarithmic mapping from actual to visual space. Subjects were instructed to position a cursor on a target using only horizontal movements. The width of the target varied among three defined sizes. The results indicate that performance is not significantly different for the three display mappings when pursuing the largest target. As the size of the target decreased, linear mappings tended to yield better performance.
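The difference between the normal and split screen conditions can be illustrated with a minimal sketch. It assumes that distance is expressed as a fraction of the initial distance to the target, measured from the starting position, and that each half of the display is mapped linearly within itself; the function names and the break_fraction parameter are hypothetical and are not taken from Flach, Hagen and O'Brien. The logarithmic condition is omitted because its exact form is not given here.

```python
def linear_mapping(fraction):
    """Normal condition: displayed distance is proportional to actual distance."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return fraction


def split_screen_mapping(fraction, break_fraction=0.665):
    """Split screen condition (assumed piecewise linear).

    The first break_fraction of the actual distance fills the first half
    of the display; the remainder, which contains the target, fills the
    second half.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction <= break_fraction:
        return 0.5 * fraction / break_fraction
    return 0.5 + 0.5 * (fraction - break_fraction) / (1.0 - break_fraction)


if __name__ == "__main__":
    # Compare where a few actual positions land on the display.
    for f in (0.0, 0.25, 0.665, 0.9, 1.0):
        print(f, linear_mapping(f), round(split_screen_mapping(f), 3))
```

Under these assumptions the region near the target is magnified by roughly 1.5 times relative to the proportional mapping, at the cost of compressing the early part of the movement.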

Color

Eye fatigue has been a major problem in using HMDs and can partially be attributed to poor color choices. Cyan, white, and green have been proposed as the best colors to use in a display. These colors provided easily distinguishable sharp images. Magenta was found to be the most uncomfortable color, being somewhat harsh, and at first sight "de-focused" [Dorfel 1982]. Strongman [1982] conducted similar studies using seven colors in the display and found the same results, adding that other colors sampled were acceptable. These additional colors were sufficient for a pictorial format. For long-term viewing, the colors chosen to present normal flight information must be balanced to prevent one color appearing more "harsh and demanding than the others" [Strongman 1982]. Our study avoided this issue by utilizing a black path on a white background in sufficient and consistent lighting.

2.3.2. Tracking Task via Joystick Control

In many of the studies involving tracking, horizontal movement along a single, discrete axis was analyzed. The motivation for one-dimensional movement is Fitts' Law, which requires it. Fitts' Law states that the time t to acquire a target is logarithmically related to the ratio of the distance A to the target width W: t = a + b log2(2A / W), where a and b are empirical constants determined through linear regression, and log2 is the log base 2 function [Fitts 1954]. Although the method of input is not incorporated in Fitts' Law, the "feel" of an input device is extremely important in determining a device's appropriateness and acceptance in the context of a particular application. In constrained linear motions, the mouse, trackball and joystick are the most appropriate choices for input in tracking tasks [Baecker and Buxton 1987]. In addition, Buxton [1987] reported that panning is easier with a trackball than with a spring-loaded joystick, because the speed of panning is proportionally mapped to the speed of cursor display; however, if the tracking must be constrained along a specified path or line, some unnecessary wrist movements may make the linear motions difficult.
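As an illustration of the formula, the following minimal sketch evaluates Fitts' Law for a single movement. The function name and the numeric values of a and b are hypothetical; in practice the constants would come from a regression on measured data for a particular device and subject pool.

```python
import math


def fitts_time(distance, width, a, b):
    """Predicted acquisition time t = a + b * log2(2A / W).

    distance is A, width is W; a and b are empirical constants.
    """
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return a + b * math.log2(2.0 * distance / width)


if __name__ == "__main__":
    # Hypothetical constants, chosen only to show the shape of the law.
    a, b = 0.1, 0.2
    for width in (4.0, 2.0, 1.0):
        print(width, fitts_time(distance=8.0, width=width, a=a, b=b))
```

Halving the target width at a fixed distance adds a constant b to the predicted time, which is the logarithmic relationship described above.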

Although MacKenzie and Buxton [1992] unsuccessfully attempted to apply Fitts' Law to two-dimensional space, the track in our study can be viewed as a sequence of horizontal and vertical segments, whose vertices form the starting points and targets in a set of single-axis tracking tasks. The target width would simply be the width of the line. Therefore, an approximation of the time to complete the task could be computed by summing the times to reach each corner from the previous corner. It should be noted that tracing a horizontal line may be faster than tracing a vertical line of the same length when using the joystick method of input. Yamishita and Matsuura [1987] discovered a superiority of performance in tracking a target in the x-dimension over one in the y-dimension. They reasoned that the muscles involved in activating the joystick would have an impact on the comfort of movement. Motion in the x-dimension of the joystick uses the radial and ulnar muscles in the wrist, whereas the y-dimension involves mainly dorsiflexion and palmar flexion. The effects of these motor limitations accompany mental and cognitive processes in the tracking performance.
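The segment-summing approximation described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in our study: the function names, the example track, and the constants a and b are hypothetical, and a single pair of constants is applied to every segment.

```python
import math


def fitts_time(distance, width, a, b):
    """Predicted time for one single-axis movement: a + b * log2(2A / W)."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return a + b * math.log2(2.0 * distance / width)


def path_time(vertices, line_width, a, b):
    """Sum the Fitts' Law estimates over consecutive corners of a track.

    vertices is a list of (x, y) corners; every segment must be purely
    horizontal or purely vertical. The target width is the line width.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        if x0 != x1 and y0 != y1:
            raise ValueError("segments must be horizontal or vertical")
        length = abs(x1 - x0) + abs(y1 - y0)
        total += fitts_time(length, line_width, a, b)
    return total


if __name__ == "__main__":
    # Hypothetical track with three segments and hypothetical constants.
    track = [(0, 0), (4, 0), (4, 4), (8, 4)]
    print(path_time(track, line_width=1.0, a=0.1, b=0.2))
```

A refinement, not shown, would use separate constants for horizontal and vertical segments, given the asymmetry between the x- and y-dimensions noted above.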

Despite the motor limitations, tracking can be improved with practice to achieve a level of stability [Card, English and Burr]. Sufficient practice, such that learning is complete (i.e., stability is reached), separates the subject's ability component from the subject's learning component [Alvares and Hulin 1972]. Bliss, Kennedy, Turnage, and Dunlap [1991] attribute a possible correlation between video game performance and tracking tasks to the practice gained in using the input device. The choice of video game influences the amount of practice needed; some video games will have no significant effect on tracking performance. In their study, several video games were rejected during practice because performance never reached stability.

2.4. Statement of Hypothesis

In a remotely controlled two-dimensional tracking task with a fixed field of view, the frequency and severity of course deviation errors are not affected significantly by the type of display system, whether panel-mounted or helmet-mounted.
