Most people also have two ears, and this is the main reason stereophonic sound is appealing. Just as two visual perspectives make a 3D view, two audio perspectives can make a 3D soundscape. With free-standing stereo speakers, however, the left and right sounds are mixed: both ears hear sound from both speakers. By using headphones to present the correct acoustical perspective to each ear, many of the spatial aspects of sounds can be preserved. HMDs often have headphones built into them.
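One cue that headphones can deliver cleanly, and that free-standing speakers blur, is the interaural time difference (ITD): a sound off to one side reaches the nearer ear slightly earlier. The sketch below uses Woodworth's spherical-head approximation, ITD = (r / c)(theta + sin theta). The head radius and speed of sound are assumed typical values, and the function name is hypothetical.

```python
import math

# Assumed typical values; real heads and listening conditions vary.
HEAD_RADIUS_M = 0.0875       # approximate adult head radius in metres
SPEED_OF_SOUND_M_S = 343.0   # speed of sound in air at roughly 20 °C


def interaural_time_difference(azimuth_rad: float,
                               head_radius: float = HEAD_RADIUS_M,
                               speed_of_sound: float = SPEED_OF_SOUND_M_S) -> float:
    """Woodworth's spherical-head ITD approximation, in seconds.

    azimuth_rad: 0 means straight ahead, positive values are toward the right
    ear; this sketch only accepts -pi/2 <= azimuth <= pi/2.
    A positive result means the sound reaches the right ear first.
    """
    if abs(azimuth_rad) > math.pi / 2:
        raise ValueError("this sketch only handles sources in front of the listener")
    # theta + sin(theta) is an odd function, so sources on the left
    # automatically give negative delays.
    return (head_radius / speed_of_sound) * (azimuth_rad + math.sin(azimuth_rad))


if __name__ == "__main__":
    for degrees in (0, 30, 60, 90):
        itd = interaural_time_difference(math.radians(degrees))
        print(f"{degrees:2d} degrees: {itd * 1000:.3f} ms")
```

For a source directly to one side this model gives a delay of only about 0.66 ms, which is why each ear has to receive its own signal for the cue to survive.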
Additional displays can be used to engage other senses in VR. Since there is little demand for devices such as smell or taste generators, you generally have to be creative and find your own way of catering to senses other than vision and hearing.
Most 3D graphics is based on building objects out of triangles or other simple polygons. A convenient measure of rendering speed is the number of polygons your computer system can draw in one second. Since the computer must draw a separate view for each of the two eyes at least 20 times each second, you must divide the number of polygons per second by 40 to determine the maximum number of polygons that may be simultaneously visible in your virtual world. Thus, a computer that can draw 50,000 polygons per second will be able to support a virtual environment containing a maximum of:
$$\frac{50{,}000 \text{ polygons per second}}{2 \text{ eyes} \times 20 \text{ views per second}} = 1{,}250 \text{ polygons}$$
Since 1,250 polygons is not much to build a whole world from, and since the polygon drawing speeds quoted by hardware and software manufacturers are often optimistic, you generally need to either design very simple worlds or add extra graphics hardware to help your computer.
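The polygon-budget arithmetic can be wrapped in a small helper. The `efficiency` parameter is a hypothetical addition that derates a manufacturer's quoted drawing speed, since those figures are often optimistic; leaving it at 1.0 reproduces the calculation in the text.

```python
def max_visible_polygons(polygons_per_second: float,
                         eyes: int = 2,
                         views_per_second: float = 20.0,
                         efficiency: float = 1.0) -> int:
    """Upper bound on the number of polygons visible at any one time.

    Each eye needs its own view, redrawn views_per_second times, so the
    drawing rate is shared among eyes * views_per_second images per second.
    efficiency (hypothetical) is the fraction of the quoted rate actually
    achieved in practice.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    images_per_second = eyes * views_per_second
    return int(polygons_per_second * efficiency // images_per_second)


print(max_visible_polygons(50_000))                  # 1250, as in the text
print(max_visible_polygons(50_000, efficiency=0.5))  # 625 if only half the quoted speed is reached
```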
Several manufacturers also produce 3D sound cards. These sound cards can give a moderately good sense of position to a small number (1 to 4) of independent sound sources.
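As a rough illustration of giving each source a position, the sketch below mixes up to four mono sources into two channels using constant-power panning and inverse-distance attenuation. It is a deliberately crude stand-in, not a description of how any particular card works: two gains cannot convey elevation or front/back position. The `Source`, `stereo_gains`, and `mix` names are hypothetical.

```python
import math
from dataclasses import dataclass

MAX_SOURCES = 4  # the small number of sources mentioned in the text


@dataclass
class Source:
    x: float  # metres to the listener's right (negative = left)
    z: float  # metres in front of the listener (negative = behind)


def stereo_gains(src: Source, ref_distance: float = 1.0) -> tuple[float, float]:
    """Left and right gains: constant-power pan plus 1/r distance attenuation."""
    azimuth = math.atan2(src.x, src.z)  # 0 = ahead, +pi/2 = hard right
    # Two gains cannot tell front from back, so fold rear sources forward.
    if azimuth > math.pi / 2:
        azimuth = math.pi - azimuth
    elif azimuth < -math.pi / 2:
        azimuth = -math.pi - azimuth
    pan = azimuth / (math.pi / 2)        # -1 = hard left, +1 = hard right
    angle = (pan + 1.0) * math.pi / 4    # 0 .. pi/2
    distance = math.hypot(src.x, src.z)
    attenuation = ref_distance / max(distance, ref_distance)
    return math.cos(angle) * attenuation, math.sin(angle) * attenuation


def mix(tracks: list[tuple[Source, list[float]]]) -> tuple[list[float], list[float]]:
    """Mix up to MAX_SOURCES mono tracks into left and right channels."""
    if len(tracks) > MAX_SOURCES:
        raise ValueError(f"this sketch handles at most {MAX_SOURCES} sources")
    length = max((len(samples) for _, samples in tracks), default=0)
    left, right = [0.0] * length, [0.0] * length
    for source, samples in tracks:
        gain_left, gain_right = stereo_gains(source)
        for i, value in enumerate(samples):
            left[i] += gain_left * value
            right[i] += gain_right * value
    return left, right
```

Constant-power panning keeps the sum of the squared gains equal to the squared attenuation, so a source does not seem to get louder or quieter as it moves across the stereo field.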
The head tracker must be able to measure position and orientation at least 20 times every second. There must also be no more than 1/20th of a second (50 ms) of delay between when a measurement is taken and when the visual display is updated. Any slower than this, and the eyes and inner ear give your brain conflicting information about which direction your head is pointing. This is similar to what happens on a small boat in rough water: it can make you seasick or, in VR terms, simulator-sick.
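The two timing requirements can be checked from logged timestamps. This sketch assumes you can record, for each tracker reading, when it was taken and when the display first showed a view based on it; the record type and function names are hypothetical. It uses the worst gap between readings rather than the average, so that one slow interval is not hidden by an otherwise fast tracker.

```python
from dataclasses import dataclass

MIN_UPDATE_RATE_HZ = 20.0    # at least 20 measurements every second
MAX_LATENCY_S = 1.0 / 20.0   # at most 1/20 s from measurement to display


@dataclass
class TrackerSample:
    measured_at: float   # time the tracker took the reading, in seconds
    displayed_at: float  # time the display showed a view based on it, in seconds


def check_tracker(samples: list[TrackerSample]) -> tuple[bool, float, float]:
    """Return (meets_requirements, worst-case update rate in Hz, worst latency in s)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to measure an update rate")
    gaps = [b.measured_at - a.measured_at for a, b in zip(samples, samples[1:])]
    if min(gaps) <= 0:
        raise ValueError("measurement timestamps must strictly increase")
    worst_rate = 1.0 / max(gaps)
    worst_latency = max(s.displayed_at - s.measured_at for s in samples)
    ok = worst_rate >= MIN_UPDATE_RATE_HZ and worst_latency <= MAX_LATENCY_S
    return ok, worst_rate, worst_latency
```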
A wand is basically a hand-held joystick with a number of buttons on it. Wands often include a tracker, which allows you to pick up and rotate objects in the virtual world. You can use a wand in VR in much the same way you use a mouse on a computer desktop: moving the wand in space moves a 3D pointer in the virtual environment. You can also click and drag virtual objects, but instead of just moving them vertically and horizontally, you can also move them in depth and rotate them about all three axes. Wands that can move objects in the X, Y, and Z directions and also rotate them about the X, Y, and Z axes are therefore sometimes called six-dimensional or 6D controllers.
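One common way to implement click and drag with a tracked wand (assumed here, not prescribed by the text) is to record the object's pose relative to the wand when the button is pressed and reapply that offset every frame, so the object inherits both the wand's movement along X, Y, and Z and its rotation about those axes. The `Pose`, `grab`, and `drag` names and the X-then-Y-then-Z rotation order are hypothetical choices.

```python
import math
from dataclasses import dataclass

Matrix = list[list[float]]
Vector = list[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def rotate(m: Matrix, v: Vector) -> Vector:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def rotation(rx: float, ry: float, rz: float) -> Matrix:
    """Rotation about X, then Y, then Z (radians): three of the six dimensions."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    rot_y = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rot_z = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rot_z, matmul(rot_y, rot_x))


@dataclass
class Pose:
    position: Vector   # X, Y, Z: the other three dimensions
    rotation: Matrix


def grab(wand: Pose, obj: Pose) -> Pose:
    """Express the object's pose in the wand's frame when the button is pressed."""
    inverse = transpose(wand.rotation)  # a rotation matrix's inverse is its transpose
    offset = [o - w for o, w in zip(obj.position, wand.position)]
    return Pose(rotate(inverse, offset), matmul(inverse, obj.rotation))


def drag(wand: Pose, held: Pose) -> Pose:
    """While the button is held, the object follows every wand motion."""
    moved = rotate(wand.rotation, held.position)
    position = [w + m for w, m in zip(wand.position, moved)]
    return Pose(position, matmul(wand.rotation, held.rotation))
```

For example, with the object grabbed at (0, 0, -1) relative to an unrotated wand at the origin, turning the wand 90 degrees about the Y axis swings the object to about (-1, 0, 0), while moving the wand to (1, 2, 3) without turning it carries the object to (1, 2, 2).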
Since distances can be arbitrarily large in a virtual environment (and trackers have a limited range), it is not usually practical to travel through VR on foot. Some of the wand buttons are often used for "flying": you point your wand or your head in the direction you wish to travel, and then press the "fly" button. In VR, there is no speed limit.
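Flying can be sketched as a per-frame update: while the fly button is held, the viewpoint moves a distance of speed times frame time along the direction the wand (or head) is pointing. Because there is no speed limit, `speed` is simply a value the application chooses; the function and parameter names are hypothetical.

```python
import math

Vector = tuple[float, float, float]


def fly_step(position: Vector, pointing: Vector, speed: float,
             dt: float, fly_pressed: bool) -> Vector:
    """Move along the pointing direction at `speed` units/s for `dt` seconds."""
    if not fly_pressed:
        return position
    length = math.sqrt(sum(c * c for c in pointing))
    if length == 0.0:
        return position  # no usable direction to fly in
    step = speed * dt / length  # normalise the direction, then scale it
    x, y, z = (p + c * step for p, c in zip(position, pointing))
    return (x, y, z)
```

It would be called once per frame with the latest tracker direction, for example `fly_step(pos, wand_dir, speed=50.0, dt=1 / 20, fly_pressed=True)`.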
VR gloves are like wands, only more complex. They consist of a tracker to sense the position and orientation of your hand, and some kind of flex sensors to measure the bend of your fingers. Gloves tend to be expensive, and they are tricky to use partly because the computer must recognize intricate hand gestures rather than simple button presses.
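To see why gloves are harder to use than buttons, consider turning raw flex readings into a few discrete gestures. The sketch assumes one normalised reading per finger (0.0 straight, 1.0 fully bent) and uses made-up thresholds and gesture names; real gloves and recognisers differ. Readings between the thresholds deliberately return `None`, an ambiguity a button press never has.

```python
from typing import Optional, Sequence

BENT = 0.7      # hypothetical threshold: at or above this a finger counts as bent
STRAIGHT = 0.3  # hypothetical threshold: at or below this a finger counts as straight


def classify(flex: Sequence[float]) -> Optional[str]:
    """flex: thumb, index, middle, ring, little; 0.0 = straight, 1.0 = fully bent."""
    if len(flex) != 5:
        raise ValueError("expected one reading per finger")
    bent = [f >= BENT for f in flex]
    straight = [f <= STRAIGHT for f in flex]
    if all(bent):
        return "fist"   # e.g. grab
    if all(straight):
        return "open"   # e.g. release
    if straight[1] and all(bent[2:]):
        return "point"  # index out, middle/ring/little curled; thumb ignored
    return None         # in-between readings: no confident gesture
```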