A Virtual Retinal Display For Augmenting Ambient Visual Environments

by Michael Tidwell


Chapter 3: Characteristics of an Ideal Augmented Vision Display

3.1 Introduction

The augmented vision display visually augments physical reality. It is therefore desirable that the display be able to mimic reality, or substitute for it, on demand. The ideal display thus matches the visual acuity of the human eye, and its field of view corresponds to the field of vision of the human visual system. Furthermore, the display should be comfortable when worn by the viewer, which makes form factor and other ergonomic considerations important.

3.2 Field of View

The field of vision of a single static human eye is approximately 140 [deg.] from the nasal to the temporal limits of vision [8]. The vertical field of vision is approximately 90 [deg.]. The highest resolution of the visual system occurs over a region only 2-4 [deg.] in extent. This region, called the fovea (also called the "focal" visual system), contains mostly cone receptors, which are responsible for discerning detail and color in daylight vision. The remaining receptor area (sometimes called the "ambient" visual system) contains mostly rod receptors and detects motion and other spatial information. The rods cover a region from a few degrees off the axis of the eye to 70 [deg.] away from the axis.

One might mistakenly assume that a display could be designed so that the center portion of the display is "high" resolution and the periphery is "lower" resolution. However, resolution in the peripheral field of view is necessary for accurate detection of motion [8] and for control of saccadic eye movement. Some manufacturers offer displays with higher resolution inset areas and lower resolution backgrounds, in which the high resolution inset moves according to eye position. Registering the inset with the actual eye position can be difficult, and the lower resolution periphery degrades the viewer's motion detection and visual search ability. The ideal display instead allows for the fact that the eye rotates about its axis, or gimbals, as a person looks around a scene: it matches the resolution of the eye over the entire field of vision (i.e. one minute of arc resolution over the 140 [deg.] horizontal monocular field given above).

3.3 Resolution

Under ideal illumination and contrast conditions, the angular resolution of the focal (or foveal) vision system is approximately one arc minute. This means that the human eye can detect objects separated by approximately one arc minute in bright light of around 10^4 [cd/m^2]. For example, normal, or "20/20", vision can discern objects separated by about 1.5 [mm] at a distance of 5 [m] from the viewer. To achieve roughly one minute of arc resolution over 140 [deg.] horizontal by 90 [deg.] vertical, the ideal display should contain 8400 horizontal and 5400 vertical equivalent "pixel" points [8].
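The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative and not part of the original text; the only inputs are the one arc minute resolution and the field and distance values quoted above.

```python
import math

ARCMIN_PER_DEG = 60

def pixels_for_field(field_deg, resolution_arcmin=1.0):
    """Number of resolution elements needed to span a field of view."""
    return round(field_deg * ARCMIN_PER_DEG / resolution_arcmin)

def separation_mm(distance_m, angle_arcmin=1.0):
    """Linear separation (mm) that subtends the given angle at the given distance."""
    angle_rad = math.radians(angle_arcmin / ARCMIN_PER_DEG)
    return distance_m * 1000.0 * math.tan(angle_rad)

horizontal_pixels = pixels_for_field(140)  # 140 deg x 60 arcmin/deg = 8400
vertical_pixels = pixels_for_field(90)     # 90 deg x 60 arcmin/deg = 5400
sep = separation_mm(5.0)                   # roughly 1.45 mm at 5 m

print(horizontal_pixels, vertical_pixels, round(sep, 2))
```

One arc minute at 5 [m] works out to slightly under 1.5 [mm], consistent with the rounded figure quoted for 20/20 vision.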

3.4 Estimated Retinal Illuminance and Contrast

3.4.1 Overview

The interrelated issues of estimated retinal illuminance and contrast are treated together in this section. The ideal display luminance and modulation contrast relative to the outside environment are derived. Some background information is presented to facilitate the understanding of the relationship between modulation contrast and estimated retinal illuminance.

3.4.2 Ambient and Display Estimated Retinal Illuminance

For a see-through display with total estimated retinal illuminance IT:

IT = ID + IA

where ID = estimated retinal illuminance contributed by the video display and IA = estimated retinal illuminance contributed by the ambient light (from outside environment). A graphical representation of the previous relationship is shown in Figure 3.1.

Figure 3.1: Graphical representation of estimated retinal illuminance from ambient light and display light vs. arbitrary display coordinates.

In Figure 3.1, IA is represented as the average ambient estimated retinal illuminance.

An average is assumed for purposes of analysis in this section, but the most general scene will have non-constant luminance across its field and will contribute a non-constant estimated retinal illuminance across the retina. The case of non-constant estimated retinal illuminance is more difficult to analyze and is left as a separate exercise. Also,

IA = R x pupil area (mm^2) x scene luminance (cd/m^2) x TC

where R is the Stiles-Crawford effectivity ratio as in Section 2.3.1 and TC is the transmittance of the combiner element in the display. Also, in the case that the eye's pupil is smaller than the exit pupil of the display,

ID = R x pupil area (mm^2) x display luminance (cd/m^2) x (RC)eff

where (RC)eff = the effective reflectance of the combiner element in the display. Extending this concept to a retinal scanning display, the optical power per unit steradian can be calculated as,

(ID)VRD = pupil area [mm^2] x display radiance [W/sr-m^2] x Vλ

where Vλ = the photopic relative luminosity value for conversion from radiometric quantities to photometric quantities, and the display area is the area of the exit pupil.

For example, if the radiometric power measured in the exit pupil by a photodetector is 200 [nW] at 650 [nm] wavelength and the horizontal and vertical fields of view are 40 [deg.] and 30 [deg.] respectively, then,

ID = 200 x 10^-3 x (2π) x (30/180) x (40/180) x (0.11)

= 5.1 x 10^-3 [trolands].
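As a rough numerical sketch of the relations in this section, the following evaluates IA, ID and IT for a see-through display and then repeats the VRD example. The pupil diameter, luminances, combiner transmittance and reflectance, and the value R = 1 are hypothetical placeholders chosen only for illustration; they are not values from this chapter. The VRD line simply multiplies the factors as they appear in the example, with 0.11 taken as the luminosity value used there for 650 [nm].

```python
import math

def pupil_area_mm2(diameter_mm):
    """Area of a circular pupil in mm^2."""
    return math.pi * (diameter_mm / 2.0) ** 2

def ambient_illuminance(r, area_mm2, scene_luminance, t_c):
    """IA = R x pupil area x scene luminance x TC, in trolands."""
    return r * area_mm2 * scene_luminance * t_c

def display_illuminance(r, area_mm2, display_luminance, r_c_eff):
    """ID = R x pupil area x display luminance x (RC)eff, in trolands."""
    return r * area_mm2 * display_luminance * r_c_eff

# Hypothetical inputs (assumptions, not taken from the text).
R = 1.0                      # placeholder for the Stiles-Crawford effectivity ratio
area = pupil_area_mm2(2.0)   # assumed 2 mm pupil diameter
i_a = ambient_illuminance(R, area, 1.0e4, 0.5)  # 10^4 cd/m^2 scene, TC = 0.5
i_d = display_illuminance(R, area, 1.0e3, 0.5)  # 10^3 cd/m^2 display, (RC)eff = 0.5
i_t = i_d + i_a                                 # IT = ID + IA

# VRD example: the factors are multiplied in the order given in the text.
i_d_vrd = 200e-3 * (2 * math.pi) * (30 / 180) * (40 / 180) * 0.11

print(f"IA = {i_a:.0f}, ID = {i_d:.0f}, IT = {i_t:.0f} trolands")
print(f"(ID)VRD = {i_d_vrd:.2e} trolands")
```

With these placeholder inputs the ambient term dominates the total, which is the situation the contrast analysis in the next section addresses.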

3.4.3 Contrast, Contrast Ratio, and Estimated Retinal Illuminance

The ideal augmented vision display will have a contrast, C, between the display and the ambient scene above a certain critical value, Cc. In other terms,

(ID - IA)/ID > Cc.

To see the dimmest portions of the video display satisfactorily, the previous expression should be:

(IDmin - IA)/IDmin > Cc,

or, rearranging,

(1 - Cc) IDmin > IA.

Usually Cc is about 0.2 for text and alphanumeric information and about 0.96 for high information content images [50]. In the latter case,

IDmin > 25 IA

and in the former,

IDmin > 1.25 IA.

Furthermore, if the display contrast ratio is CR, and Cc = 0.96 then,

IDmax = (CR) IDmin > 25 (CR) IA

and for the case of CR = 100,

IDmax > 2500 IA.

In a bright daylight scene of 10^4 [cd/m^2],

IDmax > 2.5 x 10^7 [cd/m^2].

A luminance of 4.9 x 10^7 [cd/m^2] is near possibly damaging levels for the eye. It becomes apparent that some ambient light must be filtered to view high information content images in bright ambient light with a see-through display. The fundamental reason is that bright daylight is at the upper operating range of the eye in terms of brightness, and any high information content image must be much brighter than the ambient light. Viewing text and alphanumeric data satisfactorily is significantly less of a problem, as is viewing images under controlled lighting conditions (i.e. indoors).
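The bounds derived in this section follow from rearranging the contrast condition to IDmin > IA / (1 - Cc). A minimal sketch, with illustrative function names that do not appear in the original, reproduces the factors of 1.25, 25 and 2500. Following the text, the final ratio is applied directly to the scene luminance.

```python
def min_display_over_ambient(c_c):
    """Lower bound on IDmin / IA implied by (IDmin - IA) / IDmin > Cc."""
    if not 0.0 <= c_c < 1.0:
        raise ValueError("critical contrast must lie in [0, 1)")
    return 1.0 / (1.0 - c_c)

def max_display_over_ambient(c_c, contrast_ratio):
    """Lower bound on IDmax / IA when IDmax = CR x IDmin."""
    return contrast_ratio * min_display_over_ambient(c_c)

text_ratio = min_display_over_ambient(0.2)         # about 1.25
image_ratio = min_display_over_ambient(0.96)       # about 25
peak_ratio = max_display_over_ambient(0.96, 100)   # about 2500

daylight_luminance = 1.0e4                         # cd/m^2, bright daylight scene
peak_luminance = peak_ratio * daylight_luminance   # about 2.5e7 cd/m^2

print(f"{text_ratio:.2f} {image_ratio:.1f} {peak_ratio:.0f} {peak_luminance:.2e}")
```

Because the bound grows as 1 / (1 - Cc), small increases in the critical contrast near 1 demand very large increases in display brightness, which is why filtering the ambient light is the more practical remedy.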

3.5 Color

The color characteristics of the ideal see-through display correspond to the CIE chromaticity primary wavelengths of 650 [nm] for red, 530 [nm] for green, and 460 [nm] for blue. These wavelengths for the red, green, and blue channels respectively allow for the greatest color saturation and the best-balanced white.

3.6 Binocular Stereo Overlap

Ideally, a see-through display has binocular overlap matching the 120 [deg.] overlap between the human eyes [25]. The display built for this thesis is, however, monocular, and binocular related issues are not discussed in depth.

3.7 Ergonomics

Any head mounted display such as an augmented vision VRD must be worn by the user. Weight is an important consideration, as fatigue may become an issue after extended use of a system that is too heavy. Also, the moment, or torque, in all directions about the center of inertia of the head-spine system is important. For example, a one pound display produces half the moment if the distance from the display's center of inertia to the head-spine system's center of inertia is six inches rather than twelve inches. Some applications, medical applications for example, will have more stringent weight requirements than others. More scientific analysis by qualified engineers and scientists is required to fully understand the ergonomics of augmented vision head mounted displays as they relate to individual applications.
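The moment comparison above is a simple product of weight and lever arm. The sketch below treats the display as a point load at a perpendicular distance from the head-spine center of inertia, which is a simplifying assumption; the function name is illustrative.

```python
def moment_lb_in(weight_lb, lever_arm_in):
    """Static moment (pound-inches) of a point load at a perpendicular lever arm."""
    return weight_lb * lever_arm_in

near = moment_lb_in(1.0, 6.0)    # one pound display at six inches
far = moment_lb_in(1.0, 12.0)    # the same display at twelve inches

print(near, far, near / far)
```

Halving the lever arm halves the moment for the same weight, so moving mass closer to the head can matter as much as reducing it.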

3.8 Degree of Augmentation

A unique characteristic of the augmented vision display is that it could be switched from completely transmissive (all real environment and no virtual environment) to completely opaque (no real environment and all virtual environment). The ideal display has the capability of switching fully from one state to the other and all points in between.

3.9 Variable Accommodation

The ideal see-through display focuses each resolution element independently, so that the display has variable accommodation. People have both convergence and accommodation cues that inform depth perception. Discrepancy between the two in a display can cause disorientation and even illness, as demonstrated in flight simulators [47]. A display with variable accommodation removes the discrepancy by matching the accommodation to the convergence cue at each resolution element location.

3.10 Summary of Performance Requirements

A summary of system requirements for the ideal augmented vision display would be as follows (Table III.1):

Table III.1. Performance characteristics for an ideal augmented vision display.

Performance Characteristic               Value
Horizontal Monocular Field of View       140 [deg.]
Vertical Monocular Field of View         90 [deg.]
Horizontal Binocular Field of View       180 [deg.]
Vertical Binocular Field of View         90 [deg.]
Binocular Stereo Overlap                 120 [deg.]
Angular Resolution                       1 [arc min.]
Horizontal Pixel Elements (Monocular)    8400
Vertical Pixel Elements (Monocular)      5400
Estimated Retinal Illuminance            0 - 10^5 [trolands]
Color                                    Red = 650 [nm], Green = 530 [nm], Blue = 460 [nm]
Degree of Augmentation                   Variable from 0-100%

The display designer must decide what is useful versus what is possible. A treatment of potential applications for an augmented reality VRD in Chapter 5 sheds light on what may be useful. Fortunately, acceptable performance in many applications is far less demanding than the ideal. In fact, in many applications the performance requirements for an augmented vision system are less demanding than those for an inclusive, or non-see-through, system where the entire scene is computer generated.

Human Interface Technology Laboratory