Depth Cues in the Human Visual System

Author: Marko Teittinen

The human visual system interprets depth in sensed images using both physiological and psychological cues. Some physiological cues require both eyes to be open (binocular); others are also available when looking at images with only one eye open (monocular). All psychological cues are monocular. In the real world, the human visual system automatically uses all available depth cues to determine distances between objects. To make all of these depth cues available in a VR system, some kind of stereo display is required so that the binocular depth cues can be used. Monocular depth cues can also be used without a stereo display.

The physiological depth cues are accommodation, convergence, binocular parallax, and monocular movement parallax. Convergence and binocular parallax are the only binocular depth cues; all others are monocular. The psychological depth cues are retinal image size, linear perspective, texture gradient, overlapping, aerial perspective, and shades and shadows.

Accommodation

Accommodation is the tension of the muscle that changes the focal length of the lens of the eye, bringing objects at different distances into focus. This depth cue is quite weak, and it is effective only at short viewing distances (less than 2 meters) and in combination with other cues.
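One way to see why this cue fades with distance is to express the focusing demand in diopters, which is the reciprocal of the viewing distance in meters. The sketch below is only an illustration under that convention; the function name is hypothetical and it does not model the optics of the eye itself.

```python
def accommodation_demand(distance_m: float) -> float:
    """Focusing demand in diopters for an object at distance_m meters."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance_m


# Print the demand at a few sample viewing distances.
for d in (0.25, 0.5, 1.0, 2.0, 10.0):
    print(f"{d:5.2f} m -> {accommodation_demand(d):.2f} D")
```

Moving an object from 0.25 m to 0.5 m changes the demand by 2 diopters, while moving it from 2 m to any greater distance changes it by at most 0.5 diopters, so little focusing information is left beyond that range.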

Convergence

When watching an object close to us, our eyes point slightly inward. This difference in the direction of the eyes is called convergence. This depth cue is effective only at short distances (less than 10 meters).
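The geometry behind this limit can be sketched with simple trigonometry. The example assumes a point straight ahead of the viewer and an eye separation of 0.065 m; both the default value and the function name are assumptions made for illustration, not figures from this article.

```python
import math


def vergence_angle_deg(distance_m: float, eye_sep_m: float = 0.065) -> float:
    """Angle between the two lines of sight when both eyes fixate a point
    straight ahead at distance_m (eye_sep_m is an assumed eye separation)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return math.degrees(2.0 * math.atan(eye_sep_m / (2.0 * distance_m)))


# The angle shrinks quickly as the fixation point moves away.
for d in (0.25, 0.5, 1.0, 2.0, 10.0, 100.0):
    print(f"{d:6.2f} m -> {vergence_angle_deg(d):.3f} degrees")
```

Under these assumptions the angle is roughly 7.4 degrees at 0.5 m but less than 0.4 degrees at 10 m, so the eyes are nearly parallel for anything farther away.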

Binocular Parallax

As our eyes see the world from slightly different locations, the images sensed by the eyes are slightly different. This difference in the sensed images is called binocular parallax. The human visual system is very sensitive to these differences, and binocular parallax is the most important depth cue for medium viewing distances. The sense of depth can be achieved using binocular parallax even if all other depth cues are removed.
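On a stereo display, this cue is reproduced by drawing each point at slightly different horizontal positions for the two eyes. The sketch below derives that offset from similar triangles for a point straight ahead of the viewer; the function name, the 0.065 m eye separation, and the sign convention (positive means the right-eye image lies to the right of the left-eye image) are assumptions chosen for illustration.

```python
def screen_parallax(point_depth_m: float, screen_depth_m: float,
                    eye_sep_m: float = 0.065) -> float:
    """Horizontal offset on the screen (right-eye image minus left-eye image),
    in meters, for a point at point_depth_m viewed on a screen at screen_depth_m."""
    if point_depth_m <= 0 or screen_depth_m <= 0:
        raise ValueError("depths must be positive")
    return eye_sep_m * (1.0 - screen_depth_m / point_depth_m)


# Screen at 1 m: points behind it get positive parallax, points in front negative.
for depth in (0.5, 1.0, 2.0, 100.0):
    print(f"point at {depth:6.1f} m -> parallax {screen_parallax(depth, 1.0):+.4f} m")
```

A point at the screen depth has zero parallax, a very distant point approaches the full eye separation, and a point in front of the screen has negative (crossed) parallax.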

Monocular Movement Parallax

If we close one of our eyes, we can still perceive depth by moving our head. This happens because the human visual system can extract depth information from two similar images sensed one after the other, in the same way it can combine two images from different eyes.
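A rough sketch of why head movement carries depth information: when the eye moves sideways, a nearby point changes direction much more than a distant one. The example assumes a point that starts straight ahead and a sideways head movement of 0.10 m; the function name and the default value are hypothetical.

```python
import math


def parallax_shift_deg(distance_m: float, head_shift_m: float = 0.10) -> float:
    """Change in viewing direction, in degrees, of a point that was straight
    ahead at distance_m after the eye moves sideways by head_shift_m."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return math.degrees(math.atan2(head_shift_m, distance_m))


# Near points sweep through a large angle, far points barely move.
for d in (0.5, 1.0, 5.0, 50.0):
    print(f"{d:5.1f} m -> {parallax_shift_deg(d):.3f} degrees")
```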

Retinal Image Size

When the real size of an object is known, our brain compares the sensed size of the object to this real size, and thus acquires information about the distance of the object.
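The relationship between known size, sensed size, and distance can be sketched with the visual angle an object subtends. This is a geometric illustration only, assuming the object is viewed face-on and centered on the line of sight; the function names are hypothetical.

```python
import math


def visual_angle_deg(real_size_m: float, distance_m: float) -> float:
    """Angle subtended at the eye by an object of real_size_m at distance_m."""
    return math.degrees(2.0 * math.atan(real_size_m / (2.0 * distance_m)))


def distance_from_size(real_size_m: float, angle_deg: float) -> float:
    """Distance implied by a known real size and the angle it subtends."""
    if not 0.0 < angle_deg < 180.0:
        raise ValueError("angle must be between 0 and 180 degrees")
    return real_size_m / (2.0 * math.tan(math.radians(angle_deg) / 2.0))


# A 1.8 m tall person seen from 10 m, and the distance recovered from that angle.
angle = visual_angle_deg(1.8, 10.0)
print(f"subtended angle: {angle:.2f} degrees")
print(f"recovered distance: {distance_from_size(1.8, angle):.2f} m")
```

The second function only works when the real size is known, which matches the condition stated above: the same visual angle could come from a small object nearby or a large one far away.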

Linear Perspective

When looking down a straight, level road, we see the parallel sides of the road appear to meet at the horizon. This effect is often visible in photos and it is an important depth cue. It is called linear perspective.
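The road example can be reproduced with a simple pinhole projection, in which image coordinates are the 3-D coordinates divided by the distance ahead. The road width, eye height, and focal length below are made-up values for illustration, and the function name is hypothetical.

```python
def project(x: float, y: float, z: float, focal: float = 1.0) -> tuple:
    """Pinhole projection of a 3-D point onto the image plane.
    x is sideways, y is up, z is the distance ahead of the eye (must be > 0)."""
    if z <= 0:
        raise ValueError("point must be in front of the eye")
    return (focal * x / z, focal * y / z)


# Road edges 3 m to either side, eye 1.5 m above the road surface (y = -1.5).
for z in (5.0, 10.0, 50.0, 500.0):
    left = project(-3.0, -1.5, z)
    right = project(3.0, -1.5, z)
    print(f"z={z:6.1f}  left edge={left}  right edge={right}")
```

As z grows, both edges approach the image point (0, 0), which plays the role of the vanishing point on the horizon.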

Texture Gradient

The closer we are to an object, the more detail we can see in its surface texture. Objects with smooth-looking textures are therefore usually interpreted as being farther away. This is especially true if the surface texture spans the whole distance from near to far.

Overlapping

When one object partly blocks another from our sight, we know that the object doing the blocking is closer to us. The object whose outline pattern looks more continuous is felt to lie closer.

Aerial Perspective

Mountains on the horizon always look slightly bluish or hazy. The reason for this is the small water and dust particles in the air between the eye and the mountains. The farther away the mountains are, the hazier they look.
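In computer graphics this cue is often imitated by blending an object's color toward a haze color as distance grows. The sketch below assumes an exponential falloff, which is a common fog model rather than anything stated in this article; the haze color, the density value, and the function name are all hypothetical.

```python
import math


def apply_haze(color, distance_m, haze_color=(0.7, 0.8, 1.0), density=0.00005):
    """Blend an RGB color (components in 0..1) toward haze_color.
    t is the fraction of the original color that survives the distance."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    t = math.exp(-density * distance_m)
    return tuple(t * c + (1.0 - t) * h for c, h in zip(color, haze_color))


# A dark green hillside seen from increasing distances.
hill = (0.2, 0.4, 0.1)
for d in (0.0, 1000.0, 10000.0, 50000.0):
    r, g, b = apply_haze(hill, d)
    print(f"{d:8.0f} m -> ({r:.2f}, {g:.2f}, {b:.2f})")
```

With a bluish haze color, distant objects drift toward blue and lose contrast, which is the effect described above.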

Shades and Shadows

When we know the location of a light source and see objects casting shadows on other objects, we learn that the object shadowing the other is closer to the light source. As most illumination comes from above, we tend to resolve ambiguities using this information. Three-dimensional-looking computer user interfaces are a nice example of this. Also, bright objects seem to be closer to the observer than dark ones.

Further Information

Okoshi, T., Three-Dimensional Imaging Techniques, Academic Press, New York, 1976.

Human Interface Technology Laboratory