This section will provide an overview of 3D input devices, outline theoretical models of interaction techniques and finally, examine the effectiveness of different input devices in 3D environments. The goal of this section is to give the reader a broad overview of input technologies and to establish an understanding of the reasons why use of alternative input devices is crucial in 3D spaces. The latter goal will be accomplished by exploring both theoretical and empirical research.
2.1.1 Overview of 3D Input Technologies
There are a number of input devices for 3D interaction. This section gives an overview of the current and most widely used devices.
Adapted 2D Input Devices (Keyboard, Mouse and Trackball)
At present, the mouse and keyboard are the most widely used input devices for 3D interaction. These devices are inherently 1D (the keyboard, although it can be considered multidimensional depending on how degrees of freedom are defined) or 2D (the mouse and trackball), and must be adapted for 3D interaction. With the keyboard, the user navigates and interacts with the world through sequences of key-presses. With the mouse, 3D interaction is typically controlled through 2D widgets, which usually take the form of slider bars or spin wheels.
Position Trackers
Position trackers determine the position and orientation of the user. Typically, position trackers use six coordinates, or degrees of freedom, to describe position and orientation. Three degrees of freedom (X, Y, Z) are required to determine the user’s position, and three more (pitch, yaw, roll), each describing an angle of rotation about one of the axes, are required to determine the user’s orientation.
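The six degrees of freedom described above can be pictured as a single record per tracker sample. The following is a minimal sketch only: the class name `Pose`, the helper `translation_between`, the use of radians, and the sample values are all assumptions made for illustration, not part of any tracker's actual interface.

```python
from dataclasses import dataclass, astuple
import math


@dataclass(frozen=True)
class Pose:
    """One tracker sample (hypothetical layout): position plus orientation."""
    x: float      # position along the X axis
    y: float      # position along the Y axis
    z: float      # position along the Z axis
    pitch: float  # orientation angles, assumed here to be in radians
    yaw: float
    roll: float

    def position(self):
        return (self.x, self.y, self.z)

    def orientation(self):
        return (self.pitch, self.yaw, self.roll)


def translation_between(a, b):
    """Straight-line distance between the positions of two poses."""
    return math.dist(a.position(), b.position())


# Two made-up samples: the user walks without turning the head.
head = Pose(0.0, 1.7, 0.0, 0.0, math.radians(90), 0.0)
moved = Pose(3.0, 1.7, 4.0, 0.0, math.radians(90), 0.0)

dof = len(astuple(head))                      # three position + three orientation values
travelled = translation_between(head, moved)
print(dof, travelled)
```

Keeping position and orientation in one record reflects the fact that a tracker reports both together; an application that needs only one of them can ignore the other three values.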
In order for a position tracking system to be successful, it needs to determine the user’s position quickly and accurately. A variety of technologies are used in position sensing:
Mechanical: Mechanical trackers physically connect the user to a point of reference. The user’s position is then measured mechanically from different positions and compared to the initial reference point. Mechanical trackers can be both quick and accurate; however, they severely restrict the user’s range of motion. The most commonly used mechanical tracker is the ‘Boom’ by Fake Space Labs.
Magnetic: Magnetic tracking is the most commonly used tracking technology in virtual environments. Magnetic trackers have an emitter, located at a fixed position in space, and a sensor, which is attached to the user. Both the sensor and the emitter consist of three orthogonal electromagnetic coils. The emitter generates a magnetic field, which induces a current in the sensor coils as the sensor moves through the field (Stuart, 1996). The induced currents are then used to determine the user’s position. Overall, magnetic trackers have a wide range of motion and are quick and accurate. Their largest drawbacks are that ferromagnetic and conductive metal surfaces distort the field, and that accuracy diminishes as the user moves further from the emitter. The most popular magnetic trackers are from Polhemus and Ascension.
Acoustic: There are several variations of acoustic tracking technology. Typically, ultrasonic signals (which are outside the range of human hearing) are released from three or more emitters. The time it takes for the sound wave to travel from each emitter to the sensor is used to compute the distance between that emitter and the sensor. The position of the sensor is then determined via triangulation. Acoustic tracking is susceptible to ultrasonic noise interference and echoes, and it can be unreliable when environmental conditions change the speed at which sound travels through the air. Logitech offers a variety of popular acoustic tracking systems.
Optical: Optical tracking technology can use several different components, ranging from LEDs to ordinary video cameras. In most optical tracking systems, light is either emitted or reflected from particular points on the object, and the object’s position is then calculated. In other optical tracking systems, no points of light are used. These systems use known points on the object and take advantage of advanced image processing techniques to determine the user’s location. Optical tracking can be both quick and accurate; however, it can be expensive and may require extensive computational power.
Inertial: Inertial trackers use gravity, inertia, and the earth's magnetic field to track orientation. These trackers are used as an alternative to magnetic trackers. The InterSense IS-300 trackers use a combination of micro-accelerometers and a compass to determine the orientation of the sensor. These sensors have a very fast response time; however, they suffer from orientation drift.
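The acoustic time-of-flight calculation described above can be illustrated with a short sketch. It assumes, purely for illustration, that exactly three emitters lie in the plane z = 0, that the sensor is on the positive-z side of that plane, and that sound travels at a constant 343 m/s. The function names are hypothetical and do not correspond to any commercial tracker's interface. Recovering a position from distances alone, as done here, is often called trilateration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 C


def distance_from_time(seconds, speed=SPEED_OF_SOUND):
    """Convert a time of flight into an emitter-to-sensor distance."""
    return speed * seconds


def locate(emitters, times, speed=SPEED_OF_SOUND):
    """Estimate the sensor position from three times of flight.

    emitters: three (x, y) positions, all assumed to lie in the plane z = 0.
    times:    time of flight from each emitter to the sensor, in seconds.
    Returns (x, y, z) with z >= 0 (sensor assumed to be above the plane).
    """
    (x1, y1), (x2, y2), (x3, y3) = emitters
    d1, d2, d3 = (distance_from_time(t, speed) for t in times)

    # Subtracting the sphere equation of emitter 1 from those of
    # emitters 2 and 3 leaves two linear equations in x and y.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("emitters must not be collinear")

    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    # Height above the emitter plane follows from the first sphere equation.
    z_squared = d1**2 - (x - x1)**2 - (y - y1)**2
    z = math.sqrt(max(z_squared, 0.0))
    return (x, y, z)


# Simulated measurement: a sensor at a known point, so the result can be checked.
emitters = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
true_position = (0.3, 0.4, 1.2)
times = [math.dist((ex, ey, 0.0), true_position) / SPEED_OF_SOUND
         for ex, ey in emitters]
estimate = locate(emitters, times)
print(estimate)
```

Because every distance is obtained by multiplying a measured time by an assumed speed of sound, any change in the actual speed (for example with air temperature) scales all three distances and shifts the estimate, which is one reason acoustic tracking can be unreliable under changing environmental conditions.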
Locomotive Devices
Locomotive devices translate the user’s movements in real space to similar movements in virtual space. For example, stationary bicycles, stair-climbers, and treadmills have all been used as locomotive input devices (Stuart, 1996). Locomotive devices provide the user with both a natural way to navigate through the virtual environment and the sense that they are actually moving through space. While locomotive devices can be extremely intuitive input methods, the user is typically limited by the amount of space available and the direction of movement allowed. Different research groups have been trying to solve these deficiencies in a variety of ways, and devices like circular treadmills have been developed (Durlach and Mavor, 1995). The Human Interface Technology Laboratory is currently developing the Virtual Motion Controller (Wells, 1996), which allows users to simply shift their weight to move, rather than taking full steps, eliminating the need for a moving platform and a large space. Overall, locomotive devices are most successful for inputting navigational information rather than other types of information.
Eye Tracking
Eye tracking allows the system to determine the user’s eye position through a variety of techniques: corneal reflectance, which measures the amount of light reflected from the cornea; electro-oculography, which measures potential differences caused by cornea-retina potential; optical pupil tracking, which processes a video image using the pupil as a point of reference; and various other techniques (Stuart, 1996). Once the position of the eye has been determined, this information can be used to select objects or commands, move objects, display and control menus, scroll through text, etc. (Jacob, 1991).
Haptic Devices
Haptic devices can be considered both input and output devices. These devices use actuators to provide force feedback to the user. Force feedback can help a user perform tasks that require a high degree of precision. Haptic devices have been used primarily for medical simulations and remote robot operation.
Gesture Recognition
Gesture recognition offers a natural and intuitive way for users to input information. Typically, the user wears a glove or body suit that enables the position of individual joints, or the flexion of fingers or limbs, to be determined. By combining information on the amount of flex, or the position of joints, with information from a 3D tracker, the position of the hand, or body, is calculated. Technologies that have been used to determine joint position and amount of flexion include fiber optics (DataGlove by VPL), camera-based systems with LED-studded gloves (LED Glove by MIT), resistive-strip sensors (CyberGlove by Virtual Technologies), and strain gauges (Power Glove by Mattel). Image processing, which does not require the user to wear a glove or body suit, can also be used for gesture recognition. Two considerations in using gesture recognition are the instability of the hand, which can decrease precision, and the need for a "clutching" mechanism to discriminate between purposeful gestures and general hand movements.
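The "clutching" idea mentioned above can be sketched as follows. This is only one possible reading, under the assumption that the clutch is a simple on/off signal (for example a button or a closed-fist posture) and that hand motion should move a virtual object only while the clutch is engaged. The function name and the sample data are hypothetical.

```python
def apply_clutched_motion(samples, start=(0.0, 0.0, 0.0)):
    """Move an object by the hand's displacement, but only while clutched.

    samples: sequence of (clutch_engaged, (x, y, z)) hand readings.
    Returns the final object position.
    """
    obj = list(start)
    previous = None  # last hand position seen while the clutch was engaged
    for engaged, hand in samples:
        if engaged and previous is not None:
            # Purposeful gesture: apply the hand's displacement to the object.
            for axis in range(3):
                obj[axis] += hand[axis] - previous[axis]
        # Releasing the clutch forgets the reference point, so the hand
        # can be repositioned freely without moving the object.
        previous = hand if engaged else None
    return tuple(obj)


readings = [
    (False, (0.0, 0.0, 0.0)),  # general hand movement, ignored
    (True,  (0.0, 0.0, 0.0)),  # clutch engaged, reference point set
    (True,  (1.0, 0.0, 0.0)),  # object follows the hand along X
    (False, (5.0, 5.0, 5.0)),  # hand repositioned, object stays put
    (True,  (5.0, 5.0, 5.0)),  # clutch re-engaged at the new location
    (True,  (5.0, 6.0, 5.0)),  # object follows the hand along Y
]
final_position = apply_clutched_motion(readings)
print(final_position)
```

Using displacements rather than absolute hand positions means the object does not jump when the clutch is re-engaged at a different hand location.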
Speech Recognition
Speech recognition uses the human voice as an input device. Using either hardware or software (or a combination of the two), words and phrases are matched against a set of known words or phrases. Speech recognition systems take two forms: user-dependent and user-independent. A user-dependent system requires that a formal training session be performed prior to running the recognition system. A user-independent system attempts to determine the word or phrase without requiring previous training. The "hit rate" (the speech recognition success rate) tends to be higher for user-dependent systems since the system is trained for each user individually.
2.1.2 Theoretical models of Input Devices
Since the early 1970s, scientists have been trying to establish a theoretical framework for thinking about input devices and interactive techniques, and for determining when particular devices should be used.
In 1974, Foley and Wallace drew a separation between the input task and the input device. In theory, the input task was independent of the input device, and Foley and Wallace developed an input model that was device-independent. They outlined four virtual devices (the pick, button, locator and valuator) that existed independent of the type of input device. Standards for device-independent graphics packages, such as the ACM’s Core Graphics System (GSPC, 1977), were developed that separated the input device from the program code, allowing flexibility and rapid prototyping. In a later graphics standard, Foley and Wallace’s original model was refined by Enderle, Kansy and Pfaff (1984), who describe pick, choice, locator, valuator, stroke and string as part of their GKS standard. While device-independent theories provided a starting point for models of input devices, later studies and general experience showed that systems considered equivalent to one another by the device-independent theory differed dramatically depending on the input device used (Jacob and Sibert, 1992).
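The separation between input task and input device can be illustrated with a small sketch of one virtual device, the locator. The class and function names below are hypothetical and are not taken from the Core Graphics System or GKS; the sketch only shows the general idea of application code that depends on a virtual device rather than on a physical one.

```python
from abc import ABC, abstractmethod


class Locator(ABC):
    """Virtual device (hypothetical interface): reports a 3D position."""

    @abstractmethod
    def read(self):
        """Return the current position as an (x, y, z) tuple."""


class MouseLocator(Locator):
    """2D device adapted to the locator task; depth is fixed at zero."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def read(self):
        return (self.x, self.y, 0.0)


class TrackerLocator(Locator):
    """3D tracker that supplies all three coordinates directly."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def read(self):
        return (self.x, self.y, self.z)


def place_object(locator):
    """Application code: it knows only the virtual device, not the hardware."""
    return {"position": locator.read()}


from_mouse = place_object(MouseLocator(2.0, 3.0))
from_tracker = place_object(TrackerLocator(2.0, 3.0, 1.5))
print(from_mouse, from_tracker)
```

The application function is identical for both devices, which is what allows flexibility and rapid prototyping. The same abstraction also hides how differently the two devices behave in the user's hands, which is the limitation that later studies brought out.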
In the early 1980s, more emphasis was placed on input devices, and several taxonomies of input devices were created. Buxton (1983) developed a taxonomy of input devices based on the degrees of freedom and properties sensed (position, motion, and pressure) by each device. In this way, he identified the ‘pragmatic attributes’ of each device. Mackinlay, Card and Robertson (1990) further expanded Buxton’s taxonomy by including continuous and discrete properties, considering the human element of the system, and developing evaluation techniques for devices.
Much like Buxton’s taxonomy, Foley, Wallace and Chan (1984) also established a taxonomy that considered the ‘pragmatic attributes’ of different input devices. They mapped basic interaction tasks to input devices capable of completing those tasks, and proposed that only a limited set of input devices could be used for any particular task, based on the task requirements. Bleser and Sibert (1990) went one step further and developed an interactive design tool to guide the selection of input devices based on pattern matching and heuristic rules.
While the theoretical models and taxonomies helped establish a framework for understanding and selecting input devices, they were incomplete. Both cognitive and perceptual issues had been ignored until Jacob and Sibert (1992) developed a model based on Garner’s (1970, 1974) theory of the processing of perceptual structure in multidimensional spaces. Garner’s theory describes how objects in multidimensional spaces have different ‘perceptual structures’, and how observers perceive objects differently based on their perceptual structure. Attributes which are integrally related to one another form a single composite perception, and attributes which are separably related do not. Jacob and Sibert’s model showed that input devices and interaction techniques that had the same perceptual structure as the task were most successful.
In summary, the trend in theoretical models of input devices has moved from the device-independent models of the early 1970s to models which rely heavily on the task, device and space. It is clear that input devices need to be selected based on these factors.
2.1.3 Empirical studies of the Effectiveness of Input Devices in 3D Spaces
Virtual environments offer an intuitive means of displaying information and the opportunity to input information in natural, intuitive ways. Jacob (1996) explains that:
Computer input and output are becoming more like interacting with the real world. For Input, this means attempting to make the user’s input actions as close as possible to the thoughts that motivated those actions … doing so exploits skills humans have acquired through evolution and experience.
While Jacob’s predictions appear to be true, there are several 3D systems that have successfully mapped traditional 2D devices (mouse and keyboard) to 3D spaces (Chen et al., 1988; Bier, 1990; Mackinlay et al., 1990; Houde, 1992). If traditional 2D input devices can be successfully mapped to 3D space, is there a need for alternative, 3D input devices? This section will attempt to show that there is a definite need for 3D input devices by investigating empirical studies that compare traditional 2D input devices to 3D input devices.
Hinckley et al. (1997) compared four input devices on a 3D rotation task. Two 2D input devices (Virtual Sphere and Arcball) and two 3D input devices (3D Ball and Tracker) were used to rotate an object until its orientation matched that of a control object. Hinckley et al. found that, using the 3D input devices, subjects were able to complete the rotation task up to 36 percent faster. Subjects also reported a preference for using the 3D Ball over both of the 2D devices. In addition, Hinckley et al. found that there was no difference in accuracy between the devices.
Ware and Jessome (1988) compared a 3D bat with a traditional mouse for manipulating 3D objects. Subjects found 3D object manipulation easier with the 3D bat than with the mouse. Ware and Jessome suggest that this difference may exist because subjects are required to mentally break down the 3D task into 2D parts when using the mouse.
The research cited above illustrates the need for alternative 3D input devices in virtual spaces, and shows that 3D input can be more successful than traditional input devices for 3D tasks. However, Jacob and Sibert (1992) show that 3D input devices are not always the best devices for performing 3D tasks, and that the best-suited input device for a 3D task may depend on the "perceptual structure" of the task.
Jacob and Sibert compared performance on two tasks (each requiring subjects to manipulate three dimensions) using a 3D tracker or a mouse. In the first task, subjects were asked to match the location (x and y coordinates) and size of two squares. In the second task, subjects were asked to match the location (x and y coordinates) and the grayscale of the squares. Subjects performed the first task faster using the 3D tracker, and the second task faster using the mouse. Jacob and Sibert hypothesize that the difference in success of the input devices is related to the differing "perceptual structures" of the tasks. In the first task (x, y, size) the quantities are related, or "integral attributes". In the second task (x, y, grayscale) the quantities are independent, or "separable attributes". Thus, in the first task, using a 3D input device is more intuitive, and in the second task, using a 2D device is more intuitive.
Shaw (1997) compared two 3D input techniques (one-handed and two-handed) and a 2D mouse. Both the one-handed and two-handed (THRED, Two-Handed Refining EDitor) input devices used 3D trackers. Shaw compared several facets of the three devices, including performance, user preference, and fatigue, on two tasks: modeling a 2D rectangle and modeling a 3D box. Overall, users preferred the 3D THRED device. When the tasks were broken down into individual components, however, Shaw found that users preferred the mouse for menu and vertex selection tasks. Shaw suggests that this may be because the mouse has a higher degree of accuracy and precision. There was no significant difference in the amount of fatigue users experienced with the different input devices.
In summary, the empirical research shows that alternative 3D input devices can enhance user performance in 3D environments. In addition, Jacob and Sibert's research shows that it is important for applications that allow users to perform a variety of tasks to incorporate a variety of input devices, both traditional and alternative.
2.1.4 Summary of 3D Input Devices
A variety of input devices exist that can be used for 3D interaction. Ongoing research has shown that alternative input devices are desired by users and enhance the completion of tasks in 3D environments. Alternative input devices should be supported in interactive applications in order to enhance the user’s performance and experience.
2.2 VRML
VRML (Virtual Reality Modeling Language) was conceived in 1994, at the first annual World Wide Web Conference in Geneva. At the conference, Tim Berners-Lee and Dave Raggett organized a session focussing on Virtual Reality interfaces for the Web and found that several individual groups were already working on 3D graphical tools for the Web. The need for one, standard 3D language was recognized, and the name Virtual Reality Modeling Language was coined.
Silicon Graphics’ Open Inventor ASCII File Format was chosen as the basis of VRML, as it supported polygonally rendered 3D graphics, lighting, materials, and user added features. Silicon Graphics agreed to release Open Inventor into the public domain and the format was adapted for VRML and the Web.
VRML 1.0 was introduced in October of 1994 at the Second International Conference on the World Wide Web and officially launched April 3, 1995. VRML 1.0 was designed to be platform independent, extensible and able to work well over low-bandwidth connections. For practical purposes, VRML 1.0 was not designed to be interactive (except for hyperlinks).
VRML 2.0 was launched at SIGGRAPH 1996 with a variety of new functionality, the most important of which is interactivity. This interactivity was supported by placing "hooks" in the VRML that allow the Java programming language to interact with the world. This created a highly flexible and interactive platform for the creation of 3D interactive applications.
2.3 Java
History
The Internet grew from its original roots of the ARPAnet, and quickly expanded into the public and commercial markets. At its origin, the Internet was entirely text based, and detailed technical knowledge was required of its users. In the early eighties, as applications such as e-mail and newsgroups were developed, non-technical users also began using the Internet. With more users on-line, demand for easier, more useful programs increased. In 1990, a common protocol for delivering different kinds of files over the Internet was developed at the European particle research laboratory CERN, and the World Wide Web emerged. Although the Web still had a primarily text-based interface, its popularity continued to grow slowly. Use of the Web did not sky-rocket until the National Center for Supercomputing Applications (NCSA) in Illinois developed Mosaic, the first graphical interface for the web.
The introduction of Mosaic changed the face of the Internet. Text-based commands were no longer necessary, and it was possible to share graphics and formatted text. Use of the Web continued to grow at a staggering pace and other graphical interfaces, such as Netscape, appeared. The major limitation of the Web was that it was passive in nature. The content of Web pages was static and could not be changed ‘on-the-fly’. Interaction between the user and the page was severely restricted.
In 1990, James Gosling of Sun Microsystems began developing a new programming language designed to work with consumer electronics. The programming language, now known as Java, was designed to overcome obstacles of traditional programming languages like C or C++ and run independent of the platform. In 1993, as graphical interfaces were appearing on the Web, the creators of Java realized a platform-neutral language, such as Java, would be ideal for creating Web applications. Not only was Java platform-neutral, but it would facilitate interactivity on the Web, and in 1995 Netscape 2.0 began supporting Java applets.
Security
Hackers have made the Internet an unsafe place not only for large corporate and government computer installations, but also for the home PC user. Hackers no longer need to rely on viruses or worms being passed from user to user via diskettes; they can access people's machines while they are on-line. This gives them a huge new playground with millions of computers, and the introduction of Internet programs that run whenever a user visits a Web site makes this a major problem. Java has become popular as an Internet language for this very reason: it isolates the end user from possible harm by malicious programmers.
Java was designed from the ground up to be a safe and secure language. It accomplishes this in three ways: through the compiler, the interpreter, and the applet. The Java compiler removes pointers and defers memory allocation until run-time. This prevents programmers from creating programs that contain pointers to restricted spaces. However, Java does not rely on the compiler alone to safeguard the application, since compilers could be written that do not enforce these rules. The Java interpreter therefore checks the byte-code at run-time.
Even so, a program could still be written that corrupts a system by writing bad data to the hard drive. This is where the applet becomes important. The applet is a Web program that is downloaded and run on the user's local machine. Applets are embedded into an HTML Web page and provide the user with interesting graphics and user interaction. The user typically does not even know that an applet is running. The interesting feature of the applet is that, in addition to the program being checked for bad pointers into system memory, the program is forced to run in a "shell" that completely isolates it from the local computer's components (e.g. the hard drive and the communications ports). Because of this, no external devices can "talk" to the applet running on the computer.
Since VRML and Java run together in an applet to provide the graphical worlds that you see and interact with, there is no way to directly control the application with any device other than a mouse. This "side-effect" of Java will be revisited in the following chapter.