Depth Cues in the Human Visual System
Author: Marko Teittinen
The human visual system interprets depth in sensed images using both
physiological and psychological cues. Some physiological cues require
both eyes to be open (binocular); others are also available when
looking at images with only one eye open (monocular). All
psychological cues are monocular. In the real world, the human visual
system automatically uses all available depth cues to determine
distances between objects. To make all of these depth cues available
in a VR system, some kind of stereo display is required so that the
binocular depth cues can be used. Monocular depth cues can also be
used without a stereo display.
The physiological depth cues are accommodation, convergence,
binocular parallax, and monocular movement parallax. Convergence
and binocular parallax are the only binocular depth cues; all others
are monocular. The psychological depth cues are retinal image size,
linear perspective, texture gradient, overlapping, aerial
perspective, and shades and shadows.
Accommodation
Accommodation is the tension of the muscle that changes the focal
length of the lens of the eye, bringing objects at different
distances into focus. This depth cue is quite weak, and it is
effective only at short viewing distances (less than 2 meters) and
in combination with other cues.
Convergence
When watching an object close to us, our eyes point slightly inward.
This difference in the direction of the eyes is called convergence.
This depth cue is effective only at short distances (less than 10
meters).
Binocular Parallax
As our eyes see the world from slightly different locations, the
images sensed by the eyes are slightly different. This difference in
the sensed images is called binocular parallax. The human visual
system is very sensitive to these differences, and binocular parallax
is the most important depth cue for medium viewing distances. A sense
of depth can be achieved using binocular parallax even if all other
depth cues are removed.
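The geometry behind convergence and binocular parallax can be
sketched numerically. The example below is a minimal sketch, not a
model of the visual system: it assumes an interpupillary distance of
6.5 cm (a value not given in the text) and a point straight ahead of
the viewer. The function names are hypothetical. It shows how quickly
the convergence angle, and the angular difference between a point and
one 10% farther away, shrink as the viewing distance grows.

```python
import math

# Assumption: 6.5 cm between the eyes; the text gives no value.
IPD_M = 0.065


def vergence_angle_deg(distance_m, ipd_m=IPD_M):
    """Angle between the two lines of sight when both eyes fixate a
    point straight ahead at the given distance."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


def relative_disparity_deg(near_m, far_m, ipd_m=IPD_M):
    """Difference in vergence angle between two points on the midline.
    This is one simple measure of binocular parallax."""
    return vergence_angle_deg(near_m, ipd_m) - vergence_angle_deg(far_m, ipd_m)


for d in (0.5, 2.0, 10.0, 50.0):
    print(f"{d:5.1f} m: vergence {vergence_angle_deg(d):.3f} deg, "
          f"disparity to a point 10% farther "
          f"{relative_disparity_deg(d, d * 1.1):.4f} deg")
```

Both angles fall off roughly in inverse proportion to distance, which
is consistent with these cues being most useful at short and medium
range.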
Monocular Movement Parallax
If we close one of our eyes, we can perceive depth by moving our head.
This happens because the human visual system can extract depth
information from two similar images sensed one after the other, in
the same way it can combine two images from different eyes.
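As a rough illustration of why head movement carries depth
information, the sketch below computes how much the direction toward
a stationary point changes when the head moves sideways. The 10 cm
head shift and the function name are assumptions made for the
example, and the point is taken to be straight ahead before the
movement.

```python
import math


def parallax_shift_deg(distance_m, head_shift_m):
    """Change in viewing direction toward a point that was straight
    ahead before the head moved sideways by head_shift_m."""
    return math.degrees(math.atan2(head_shift_m, distance_m))


# Assumption: the head moves 10 cm to the side.
HEAD_SHIFT_M = 0.10

for d in (0.5, 1.0, 5.0, 20.0):
    shift = parallax_shift_deg(d, HEAD_SHIFT_M)
    print(f"{d:5.1f} m: direction changes by {shift:.2f} deg")
```

Near points sweep across the visual field much faster than far
points, and that difference is what the two successive images encode.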
Retinal Image Size
When the real size of the object is known, our brain compares the
sensed size of the object to this real size, and thus acquires
information about the distance of the object.
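The comparison between known size and sensed size can be written as a
small calculation. This is a hedged sketch: the brain does not
literally evaluate a formula, and the 1.8 m object height and the
function names are assumptions chosen for illustration. It relates
the visual angle an object subtends to its distance.

```python
import math


def visual_angle_deg(size_m, distance_m):
    """Angle subtended at the eye by an object of the given size,
    viewed face-on at the given distance."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))


def distance_from_angle_m(size_m, angle_deg):
    """Invert visual_angle_deg: distance implied by a known size and
    a measured visual angle."""
    return size_m / (2.0 * math.tan(math.radians(angle_deg) / 2.0))


# Assumption: a person about 1.8 m tall serves as the familiar object.
PERSON_HEIGHT_M = 1.8

for angle in (20.0, 5.0, 1.0):
    d = distance_from_angle_m(PERSON_HEIGHT_M, angle)
    print(f"subtends {angle:4.1f} deg -> about {d:.1f} m away")
```

The smaller the angle for an object of known size, the farther away
it must be.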
Linear Perspective
When looking down a straight, level road we see the parallel sides of
the road meet at the horizon. This effect is often visible in photos
and it is an important depth cue. It is called linear perspective.
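The converging road edges follow directly from perspective
projection. The sketch below uses a simple pinhole camera model,
which is an assumption brought in for illustration and is not
described in the text. The road width, eye height, and focal length
are likewise made-up values. It projects points on the left and right
road edges at increasing distances and shows them approaching a
single image point.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x right, y up, z forward)
    onto an image plane at the given focal length."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer")
    return (focal_length * x / z, focal_length * y / z)


# Assumptions: road edges 2 m to either side, eyes 1.5 m above the road.
ROAD_HALF_WIDTH_M = 2.0
EYE_HEIGHT_M = 1.5

for z in (5.0, 20.0, 100.0, 1000.0):
    left = project((-ROAD_HALF_WIDTH_M, -EYE_HEIGHT_M, z))
    right = project((ROAD_HALF_WIDTH_M, -EYE_HEIGHT_M, z))
    print(f"z = {z:6.1f} m: left edge x = {left[0]:+.4f}, "
          f"right edge x = {right[0]:+.4f}, y = {left[1]:+.4f}")
```

As z grows, both edges tend toward the image point (0, 0), which is
the vanishing point on the horizon in this setup.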
Texture Gradient
The closer we are to an object the more detail we can see of its
surface texture. So objects with smooth textures are usually
interpreted as being farther away. This is especially true if the
surface texture spans the whole distance from near to far.
Overlapping
When one object blocks another from our sight, we know that the
blocking object is closer to us. The object whose outline pattern
looks more continuous is perceived to lie closer.
Aerial Perspective
Mountains on the horizon always look slightly bluish or hazy. The
reason is the small water and dust particles in the air between
the eye and the mountains. The farther the mountains, the hazier they
look.
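In a rendered scene, this cue is often approximated by blending an
object's color toward a haze color as distance increases. The sketch
below uses an exponential falloff, a common graphics approximation
that is not taken from the text. The colors, the extinction
coefficient, and the function name are all assumptions made for the
example.

```python
import math


def apply_haze(object_rgb, haze_rgb, distance_m, extinction_per_m):
    """Blend an object's color toward the haze color. The fraction of
    the object's own light that survives falls off exponentially with
    distance."""
    transmittance = math.exp(-extinction_per_m * distance_m)
    return tuple(
        transmittance * c + (1.0 - transmittance) * h
        for c, h in zip(object_rgb, haze_rgb)
    )


# Assumptions: a dark green hillside, a pale bluish haze, and an
# arbitrary extinction coefficient picked to make the effect visible.
HILL_RGB = (0.20, 0.40, 0.10)
HAZE_RGB = (0.70, 0.80, 0.90)
EXTINCTION_PER_M = 0.0001

for d in (100.0, 1000.0, 10000.0, 50000.0):
    r, g, b = apply_haze(HILL_RGB, HAZE_RGB, d, EXTINCTION_PER_M)
    print(f"{d:8.0f} m: rgb = ({r:.2f}, {g:.2f}, {b:.2f})")
```

At short range the hillside keeps its own color; at long range it
takes on the color of the haze.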
Shades and Shadows
When we know the location of a light source and see objects casting
shadows on other objects, we learn that the object shadowing the other
is closer to the light source. As most illumination comes from above,
we tend to resolve ambiguities using this information. Computer user
interfaces with a three-dimensional look are a nice example of
this. Also, bright objects seem to be closer to the observer than
dim ones.
Okoshi, T., Three-Dimensional Imaging Techniques, Academic Press, New
Human Interface Technology Laboratory