One of the astonishing and distinguishing features of virtual interface technology is participants' response to it as a place. While experiencing virtual worlds, participants can be heard saying "I wonder what's in here" or "Where am I now?", and afterwards their comments about the experience often begin with the words "when I was there". "Here", "there" and "where" are responses to a place. People who experience virtual worlds feel as if they are really there, in the virtual models. Few other presentation media, be they cinematography, television, books, music or architectural renderings, can generate such compelling impressions of being in a place.
There are at least six fundamental components of virtual interfaces that are responsible for creating this effect of being in the model. In general, a virtual environment consists of a stereoscopic display, a device that tracks the position of the observer's head, a device for interacting with the model, a very powerful image-rendering computer, a computer model, and the computer program that makes the interaction possible. These components can take on very different forms depending on the make of the product, and they can be used in many different combinations.
Virtual interfaces are not limited to the visual modality. Very effective forms of three-dimensional sound exist, as well as more rudimentary forms of tactile feedback. These modalities can be of considerable help in acquiring an accurate perception of space; however, their complexity is such that they cannot be included in this study.
Several very different stereoscopic display solutions exist today. They range from the more public "retro-projection" techniques to more personal techniques such as the Head-Mounted Display (H.M.D.). In a retro-projection display, the observers are surrounded by several large screens. Two images of the same scene (one for each eye) are projected from behind the screens. By wearing special stereoscopic glasses, observers have the illusion of seeing the objects in depth.
The H.M.D., often referred to as "eyephones", is another common form of display; it is the display medium which will be used in this study. The eyephones consist of two small color C.R.T. monitors (one for each eye) and optics for magnification, set inside a head-mounted apparatus which looks like diving goggles (see Appendix D: Interface Hardware). Each eye has a field of view of 75°, and the combined field of view is 90°. The overlap between the two eyes is 60°. This overlap allows for the convergence of the eyes required for a good sense of depth. The peripheral field of view beyond the displayed image is prevented from receiving any visual stimulus from the exterior by the black rubber siding of the goggles.
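The combined field follows directly from the two monocular fields and the region they share; as a simple check using the figures quoted above:

\[ 2 \times 75^{\circ} \;-\; 60^{\circ} \;=\; 90^{\circ}. \]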
The resolution of the eyephones is one quarter that of an N.T.S.C. television monitor. At that resolution, one can clearly see the individual screen cells. When the eyephones are put on, the eyes are within about one inch of the optics. Fortunately, this proximity is not a serious problem because the light which comes through the eyephones is collimated; that is to say, the rays are parallel and the image appears to come from an infinite distance. As a result, one tends to focus on the model rather than on the screen itself.
The position tracker is a device which measures the rotation (yaw, pitch and roll) and translation (x, y and z) of any object to which it is attached, sampling this information many times per second. The tracker is usually attached to the observer's head, but it can also be used to track the position of other parts of the body, such as the hands (V.P.L.'s Dataglove(TM)) or even the whole body (the Bodysuit(TM)). Tracking technologies range widely, from more "mechanical" ones such as the Boom(TM) to more "electrical" solutions using ultrasound, infrared light, electromagnetic fields, or multiple cameras. What distinguishes these approaches is their tracking range, their tolerance of interference, their precision and their cost.
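As a rough illustration of the kind of data a tracker delivers, the following sketch shows one six-degree-of-freedom sample. The field names are hypothetical and do not correspond to any particular tracker's driver interface:

    from dataclasses import dataclass

    @dataclass
    class PoseSample:
        # One tracker reading: three rotations and three translations,
        # delivered many times per second.
        yaw: float     # rotation about the vertical axis, in degrees
        pitch: float   # rotation about the left-right axis, in degrees
        roll: float    # rotation about the front-back axis, in degrees
        x: float       # translation along x, in model units
        y: float       # translation along y, in model units
        z: float       # translation along z, in model units
        time: float    # time of the sample, in seconds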
For this study, I will be using V.P.L.'s Polhemus(TM) tracker. In this configuration, a "parent" source emits an electromagnetic field a few feet in circumference. A "receptor" located on the eyephones, well within the magnetic field, reports the exact location and rotation of the observer's head. Because the Polhemus has such a short working range, it can only accommodate small translations of the body; for traveling larger distances, the observer has to use an interaction device.
After head tracking, movement is probably the form of interaction most important for conveying a sense of presence in virtual environments. Ideally, participants would physically walk through the modeled environment. At the University of North Carolina at Chapel Hill, researchers have developed a treadmill which allows participants to "physically" walk through the virtual environment. In this way, users always stay within reach of the tracking sensor, yet they receive the kinesthetic feedback of physically walking: each real step moves their viewpoint one virtual step. Researchers are also developing tracking systems (optical trackers) which would allow participants to move through the virtual space by actually walking in a large real space. Both of these techniques enhance the perception of space because, in addition to the visual (and potentially auditory) depth cues, users also get the kinesthetic feedback so essential for making estimates of distance.
Since the tracking devices at the H.I.T.Lab have a limited range, a movement interaction device is required to replace the act of walking. These devices simulate movement by moving the position of the viewpoint through the model while the observer stands still. A number of devices can do this, including the Dataglove, the joystick and the Spaceball(TM). In the case of the Dataglove, participants "gesture" a pointing finger in the direction they wish to move. This action is interpreted by the program to advance the location of the participant's viewpoint, thereby giving the illusion of motion. The joystick and Spaceball are variations which work in essentially the same way, as sketched below.
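A minimal sketch of how such a movement device can be interpreted follows. The routine name, its parameters and the fixed travel speed are illustrative assumptions, not the actual H.I.T.Lab code:

    import numpy as np

    def advance_viewpoint(position, direction, device_engaged, speed, dt):
        # Move the viewpoint along the indicated direction while the
        # pointing gesture (or joystick/Spaceball deflection) is held.
        #   position       - current viewpoint position, numpy array (3,)
        #   direction      - direction indicated by the device, numpy array (3,)
        #   device_engaged - True while the gesture or deflection is active
        #   speed          - travel speed, in model units per second
        #   dt             - time since the previous update, in seconds
        if not device_engaged:
            return position
        unit = direction / np.linalg.norm(direction)
        return position + speed * dt * unit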
For this study, the interaction device that will be used for movement is the Spaceball. This device is generally used to control the rotation and translation of objects in three-dimensional models; in this study, it will be used to control the observer's viewpoint. The Spaceball is an uncommon device for controlling the movement of the viewpoint in virtual environments, and other devices are better suited to the task. However, only this device can logically function in all the test conditions of this study (for more information, refer to Appendix D: Interface Hardware).
Along with the quality of the graphics, the most important criterion for computers used to display virtual environments is the speed at which they render the images. This is referred to as the update rate: the speed with which the computer can calculate and render each new view. Although there has been no definitive study of the minimum update rate needed to sustain the illusion of presence, most researchers would agree that 7 frames per second is an important threshold, below which the intervention of conscious awareness is often inevitable.
Since the databases of these virtual environments are geometric models, the unit of measure for the speed of a computer is the number of polygons it can render per second. In this context, a polygon is a surface defined by three vertices, that is, a triangle. A six-sided cube is therefore made up of 12 polygons, since each square face is split into two triangles. The greater the complexity of the model, the higher the polygon count. Small computers can render models made up of a few polygons many times per second, whereas faster ones can render millions of polygons per second.
The computer at the H.I.T.Lab is the Silicon Graphics 320 VGX workstation, rated at 1,000,000 polygons per second. Since it needs to render two different scenes, one for each eye, the effective rate is 500,000 polygons per second per eye. And if a minimum of 10 updates per second is required, the workstation can calculate at best about 50,000 polygons per frame for each eye. These figures are for ideal conditions. In actual practice, however, because the same computer has to run the interaction program and interpret the tracking data, the update rate is much lower. Under the particular conditions of this study, it was found to be able to render 4,000 polygons at 8 frames per second per eye.
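The per-frame polygon budget follows directly from these figures:

\[ \frac{1{,}000{,}000 \ \text{polygons/s}}{2 \ \text{eyes}} = 500{,}000 \ \text{polygons/s per eye}, \qquad \frac{500{,}000 \ \text{polygons/s}}{10 \ \text{frames/s}} = 50{,}000 \ \text{polygons per frame}. \]

The measured performance of 4,000 polygons at 8 frames per second for each of the two eyes corresponds to roughly \(4{,}000 \times 8 \times 2 = 64{,}000\) polygons per second, well below the rated maximum.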
Many computer modeling programs can be used to build models for virtual interfaces. There are some limitations, however, as to the kinds of objects which can be created. At the time of this study, spline and Bézier curves are not supported by the rendering software, nor are higher-order rendering techniques (ray tracing, Phong shading) or sophisticated lighting conditions. The only forms available for building a model are wireframe polygons or flat-shaded, simply colored polygons.
The virtual interface computer program continually interprets the position-tracking data and the movement interaction data, together with the model database, in order to render and display the appropriate view of the model to the eyephones. There are several commercially available packages (V.P.L. Microcosm, Sense8), as well as many locally developed software programs in various university laboratories. Because of the specificity of the study at hand, a separate rendering program was designed by Marc Cygnus at the H.I.T.Lab. It is a "hard coded" program, that is to say, it can only work for this particular study (V.E.O.S., the operating system designed at the H.I.T.Lab, was not far enough along to be used reliably at the time of this study). The complete virtual interface configuration for this study is summarized in Table 2.1.
Table 2.1 - Virtual Interface System at the H.I.T.Lab.
Eyephones: VPL stereoscopic color CRT displays
Resolution: 1/4 NTSC
Field of view: 75° per eye, 60° overlap, 90° overall
Computer: Silicon Graphics 320 VGX workstation (rated at 1,000,000 polygons/second)
Modeling software: Alias; objects made of polygons only
Interaction software: designed by Marc Cygnus; renders flat-shaded polygons with two directional light sources and one ambient light source
Tracking device: VPL DataBox for head tracking (30 samples/second)
Interaction device: Spaceball, used in conjunction with the rotation direction of the tracking device in the "Tracked" condition
Update rate: ~8-10 frames per second for a 3,800-polygon model (including the multiple light sources, the model rendering, the tracking data and the interaction device data)
When participants put on the eyephones, they see the interior of the computer model. Because the eyephones are equipped with position trackers, the computer is continually updated on the exact location of the participant's head. With this information, it can render the newly appropriate view of the model. For example, when participants flex their knees, the tracker signals to the computer that they have lowered the position of their head; simultaneously, the computer renders a viewpoint into the model which appears to be lower, thereby replicating the visual change the participant would expect in a physical setting. Similarly, as participants turn their body completely around (180°), the view they have of the model is simultaneously updated so that what was behind them in the model is now in front of them. For every movement of the head, there is an equal change in the viewpoint of the computer model. As a result, everywhere they turn, participants find themselves "surrounded" by the model. This very tight coupling between their physical movements and the updated view of the model gives participants a sense of presence within the model: it makes them feel as if they were immersed within the virtual environment.
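This coupling can be summarized as a loop of roughly the following form. The sketch is a simplification: the tracker, spaceball, model and eyephones objects and their methods are hypothetical stand-ins for the actual H.I.T.Lab software, and the eye separation is an assumed value:

    import numpy as np

    def view_matrix(eye_position, head_rotation):
        # The view transform is the inverse of the tracked head pose, so
        # every movement of the head produces an equal change of viewpoint.
        view = np.eye(4)
        view[:3, :3] = head_rotation.T
        view[:3, 3] = -head_rotation.T @ eye_position
        return view

    def render_loop(tracker, spaceball, model, eyephones, half_ipd=0.03):
        while True:
            position, rotation = tracker.sample()      # head pose, sampled many times per second
            position = position + spaceball.travel()   # Spaceball displaces the viewpoint
            for eye, sign in (("left", -1.0), ("right", 1.0)):
                # Offset each eye by half the interocular distance (assumed 6 cm).
                eye_position = position + rotation @ np.array([sign * half_ipd, 0.0, 0.0])
                eyephones.display(eye, model.render(view_matrix(eye_position, rotation)))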
Everything about virtual interfaces seems to indicate that they would be perfect tools for simulating architectural spaces. They seem to satisfy all of the criteria for the perception of motion and space defined by Appleyard (Appleyard, 1964):
"A. Apparent self-motion : speed, direction, and their changes (stop-go, accelerate-decelerate, up-down, right-left).
B. Apparent motion of the visual field : passing alongside, overhead, or underneath; rotation; translation ; spreading or shrinking of outline or texture ; general stability or instability ; apparent velocity or lack of it.
C. Spatial characteristics:
(1) Presence and position of enclosing objects or surfaces, their solidity and degree of enclosure,
(2) General proportions of the space enclosed; scale with respect to the observer; position of the observer,
(3) Quality of the light which makes the space apparent; intensity and direction,
(4) Relationship of spaces in sequence: jointing and overlapping,
(5) Direction of principal views, which draw the eye toward different aspects of the spatial enclosure."
This convincing sense of presence, of being in the model, is exactly the quality required for assessing the feel of architectural spaces. It is also the attribute missing from the other forms of representation.
The problems plaguing the previous forms of spatial representation were scale and depth cues. Because the models were viewed on a two-dimensional screen, the third dimension (depth) of the spatial information was diminished. The stereoscopic eyephones used in virtual interfaces recreate the third dimension of space. Scale is no longer a problem either, because the position tracking of the body provides continual confirmation of one's presence and scale relative to the model.
As they exist today, virtual environments could help designers and their clients make more accurate evaluations of the basic spatial characteristics of their projects, because they would perceive virtual spatial information in the same way as they do real spatial information. They could walk about and explore the virtual space as they would the real space.
Virtual environments would seem to be the perfect representation tools to replace the scale model and the monitor-based computer walkthroughs because those, at best, require users to imagine being "there" by mentally projecting themselves into the space. In virtual environments, participants are intuitively "there". No conscious effort, transformations, or projections are required to "get a feel" for the modeled spaces - users interact with the virtual environments almost as naturally as they do in real environments.
While it is clear that virtual interfaces can create the illusion of being present in a computer model, very little is known about how representative virtual spaces are of real ones. The technological components are all designed so that simulated spaces are perceived in the same way as real spaces: the eyephones display the model in 3D, the position-tracking devices map the viewpoint of the observer one-to-one with the movements of the head, and the model is at human scale. Nonetheless, it is unknown whether similar spaces would actually be perceived to be the same in both virtual and real environments.
While the potential for using virtual interfaces as representations of architectural spaces is clear, their risk of leading to misperceptions must not be overlooked. Indeed, if people use them with the conviction that their perception of the virtual space is the same as it would be of the real space, when in actuality it is quite different, errors of judgment could result during the decision-making process. If distance were misperceived, spaces might be judged too big when in fact, in real life, they are just the right size. If the layout is misperceived, people could be very upset when they encounter the finalized project.
Clearly, here as with any new simulation technology, it is important that virtual environments be evaluated for their representativeness. More specifically, for virtual environments to become a viable alternative for representing architectural spaces, people must be at least as accurate in their spatial perception as when viewing a real-time walkthrough.