Using established principles from the field of Computer Supported Collaborative Work (CSCW), we show how the affordances of wearable computers make them ideal platforms for three-dimensional CSCW. We also describe two pilot studies which imply that wearables may be able to support three-dimensional collaboration and that users will perform better with these interfaces than with immersive collaborative environments.
Augmented Reality, CSCW, Wearable Computing
Rapid advances in technology are changing the computer interface and the types of tasks computers are used for. As groupware and collaborative tools become more common, the Human-Computer Interface is giving way to a Human-Human Interface mediated by computers. Interfaces for two-dimensional Computer Supported Collaborative Work are relatively common, while three-dimensional CSCW tools are still rare. We advance the notion that wearable computers are ideal platforms for three-dimensional CSCW and show how the affordances of wearable computing address two major issues in CSCW: seamlessness and enhancing reality. In the following sections we describe previous three-dimensional CSCW interfaces, the affordances of wearable computers, and some preliminary experimental results showing that users may collaborate effectively in a wearable computing setting.
It is important to note that we are focusing on three-dimensional CSCW. Siegel et al. [1] have described a wearable system for collaboration with a remote expert using shared video and hypertext, but until now wearables have not been proposed for three-dimensional CSCW. This paper presents a theoretical framework which describes underlying CSCW principles and shows how wearables satisfy these principles. Although our preliminary results are from tethered experiments we are in the process of developing a wearable system to further validate our approach.
There are several different approaches for facilitating three-dimensional collaborative work. The most obvious is adding collaborative capability to existing screen-based three-dimensional packages. However a two-dimensional interface for three-dimensional collaboration can have severe limitations, such as users finding it difficult to visualize the different viewpoints of their collaborators, or to interact with the environment [2].
Alternative techniques include using large projection screens to project a three-dimensional virtual image into space. The CAVE [3] and Responsive Workbench [4] allow a number of users to view stereoscopic 3D images by wearing LCD-shutter glasses. Unfortunately, in both cases the images can be rendered from only a single user's viewpoint, so only one person will see true stereo. The devices also require bulky hardware such as a projection screen or large beam splitter, and require expensive optics.
Mechanical devices can be used to create volumetric displays, such as scanning lasers onto a rotating helix [5] or a rotating phosphor coated plate activated with electron guns [6]. However, they do not allow direct interaction with the images because of the rotating display surface and have a limited viewing volume.
In contrast, multi-user immersive virtual environments provide an extremely natural medium for three dimensional CSCW; in this setting computers can provide the same experience that people have in face-to-face interactions, such as communication by object manipulation and gesture [7]. Work on the DIVE [8] and GreenSpace projects [9] has shown that collaboration is indeed intuitive in such surroundings. However most current multi-user VR systems are fully immersive, separating the user from the real world.
An alternative approach is to use augmented reality (AR), the overlay of computer graphics on the real world. This is particularly relevant for wearable computers, since most current wearable systems use see-through or field-multiplexed head mounted displays. Successful single user AR interfaces have been developed for computer aided instruction [10], manufacturing [11], and medical visualization [12] among others.
However, there are few examples of multi-user augmented reality systems and none in a wearable computing setting. Rekimoto [13] has explored the use of tracked hand-held LCD displays in a multi-user environment. He attaches small cameras to LCD panels to allow virtual objects to be composited on video images of the real world. These displays have the advantage that they are small, lightweight and portable. Unfortunately they do not support a true stereoscopic view, and are not hands-free. In addition, users must hold the LCD panel in front of their face, obscuring their facial expressions from other participants.
Klaus et al. [14] also use video compositing techniques to superimpose virtual images over a real world view. Their multi-user system is monitor-based so users get the impression that the virtual objects are superimposed on a remote real environment. However, it is difficult for users to change the real camera position and their architecture is designed to support distributed users viewing the same remote real environment rather than local users interacting in the same real environment.
We have developed a new approach which uses see-through head mounted displays with head tracking in a collaborative interface [15]. We call this the Shared Space technique because it allows multiple users in the same physical location to work in both the real and virtual world simultaneously, facilitating CSCW in a seamless manner. Figure 1.0 shows the user's view of a Shared Space interface, in this case a collaborative web browser. The users can see each other and virtual web pages floating around them in space. They can quite easily interact with the real and virtual world simultaneously, and groupware support can be mostly left to social protocols. With body worn trackers and portable see-through displays, the Shared Space method could easily be applied in a wearable setting.
Wearable computing is a promising application area for collaborative augmented reality interfaces. Small belt pack PCs are beginning to be developed for commercial and industrial applications where mobility and hands-free operation are important [16]. Wearables are also used in areas with high potential for collaboration and a need for enhancement of the real environment, and head mounted displays are the most commonly used display devices.

Schmalsteig et al. [19] identify five affordances of collaborative AR environments:
In addition, real objects can be used to interact with virtual objects to facilitate natural object manipulation, and AR images need not be as graphically complex as immersive virtual environments.
Fickas et al. [20] describe the following affordances for wearable systems:
They suggest that augmented reality and the computer's ability to perceive aspects of its physical environment are the most novel affordances of wearable systems.
The affordances of wearable systems can be further divided into those offered by the computer, the display, and input/output devices. Thus different classes of wearables afford different actions. For example, table 1.0 shows the affordances possible with different visual display types, and table 2.0 shows some unique types of information display possible with different head tracking technology. One of the challenges in using wearables for CSCW is matching the requirements for collaboration with technology that provides at least the minimum relevant affordances.
| Display Type | Image Display Affordance |
| LCD Panel | Image viewing |
| Monoscopic field-sequential HMD | Image overlay on, or inset into, the real world |
| Stereoscopic see-through HMD | Three dimensional image overlay on the real world |

| Tracking Type | Information Display Affordance |
| No Head Tracking | Head stabilized |
| Orientation Tracking | Body stabilized |
| Position and Orientation Tracking | World stabilized |
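The mapping in table 2.0 can be read as a capability lookup: the tracking hardware available determines the richest kind of information display a wearable can offer. The following is a minimal sketch of that reading, not an implementation from this work. The names `Tracker`, `best_stabilization` and `body_stabilized_azimuth` are hypothetical, the handling of a position-only tracker is an assumption (the table does not cover it), and the azimuth helper reduces body stabilization to a single horizontal angle for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Stabilization(Enum):
    HEAD = "head stabilized"    # information fixed relative to the display
    BODY = "body stabilized"    # information fixed relative to the user's body
    WORLD = "world stabilized"  # information fixed to real-world locations


@dataclass(frozen=True)
class Tracker:
    """Hypothetical description of a head tracker's capabilities."""
    tracks_orientation: bool
    tracks_position: bool


def best_stabilization(tracker: Tracker) -> Stabilization:
    """Return the richest display affordance from table 2.0 for this tracker."""
    if tracker.tracks_orientation and tracker.tracks_position:
        return Stabilization.WORLD
    if tracker.tracks_orientation:
        return Stabilization.BODY
    # Assumption: without orientation data (including the position-only case,
    # which the table does not list) only head-stabilized display is possible.
    return Stabilization.HEAD


def body_stabilized_azimuth(anchor_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Horizontal angle at which a body-stabilized item appears in the view.

    The item stays at a fixed azimuth around the body, so the head yaw is
    subtracted and the result is wrapped into the range [-180, 180).
    """
    relative = anchor_azimuth_deg - head_yaw_deg
    return (relative + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    orientation_only = Tracker(tracks_orientation=True, tracks_position=False)
    print(best_stabilization(orientation_only).value)
    print(body_stabilized_azimuth(30.0, 90.0))
```

A head-stabilized item would ignore `head_yaw_deg` entirely, while a world-stabilized item would also need the tracked head position, which is why it requires the most capable tracker.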
In general, the two dominant affordances for wearable augmented reality systems are seamlessness between the real and virtual world, and the facility for perceiving and enhancing reality. Next, we show how these two characteristics are vital for collaborative interfaces, making wearable computers ideally suited for 3D CSCW.
Most existing CSCW tools introduce seams and discontinuities into the collaborative workspace. Ishii et al. [21] define seams as spatial, temporal and functional constraints that force shifting among a variety of spaces or modes of operation. Examples include the discontinuity between computer-based word processing and traditional pen and paper, or between face-to-face and distributed meetings. Seams can be of two types: Functional Seams between two different functional workspaces, and Cognitive Seams between existing and new work practices.
One of the most important functional seams is between shared and interpersonal workspaces. In face-to-face conversation there is a dynamic and easy change of focus between shared and interpersonal spaces because of a variety of non-verbal cues. However, most CSCW systems have an arbitrary seam between the shared workspace and interpersonal space; for example that between a shared white board and a video window showing a collaborator. This prevents users who are looking at the shared white board from maintaining eye contact, an important non-verbal cue for maintaining smooth conversation flow [22].
A common cognitive seam is that between the computer-based and traditional desktop tools. Grudin [23] points out that CSCW tools are generally rejected when they force users to change the way they work, yet this is exactly what happens when computer-based collaborative interfaces make it difficult for users to use pen and paper in conjunction with the computer-based tools.
Ishii et al. introduced the idea of an "Open Shared Workspace" which uses seamless design to remove the discontinuities in collaborative interfaces [21]. They built the TeamWorkStation [24] and ClearBoard [25] interfaces following this principle. TeamWorkStation removes the seam between the real world and collaborative workspace by combining video- and computer-based tools. Video overlay on a collaborative white board supports use of real-world and computer-based tools.
ClearBoard addresses the seam between the individual and shared workspace. By using work surfaces made of large mirrors and applying video projection techniques, users can look directly at their work space and see a projection of their collaborator behind it. Users can effectively and easily change focus, maintain eye contact, and use gaze awareness in collaboration. The result was an increased feeling of intimacy and copresence.
In order for a CSCW interface to minimize functional and cognitive seams it must have the following affordances:
Although collaboration in an immersive VR environment may be intuitive, there is a seam between the real world and virtual world which makes it difficult to use real-world objects and non-verbal body language. In contrast, the affordances of wearable augmented reality match the requirements above, supporting seamless collaboration with the real world, and reducing the functional and cognitive discontinuities between participants.
A key question is how much seams affect communication and collaboration. In general, technologically mediated remote collaboration produces different communication behaviors than face-to-face collaboration. Sellen [26] suggests that what makes the biggest difference is not the communication medium but whether the conversation is mediated or not. Comparing communication between audio-only, video-only and face-to-face collaboration, Sellen found no difference in conversation structure between the audio- and video-only conditions. However, conversation structure in these conditions both differed significantly from face-to-face conversation. Even with no video delay, video mediated conversation didn't produce the same conversation style as face-to-face speakers [27]. This occurs because video cannot adequately convey the non-verbal signals so vital in face-to-face communication [28].
It is clear that seams introduced by technological mediation change the nature of collaboration. Sellen [26] suggests that there is something about sharing the same physical space that positively affects conversation; it remains an open question whether the same benefits can be achieved by sharing the same immersive virtual space. Using augmented reality is an easy way to get the benefits of virtual reality while sharing the same physical space, and wearables increase the ease with which users can share the same space.
Removing the seams in a collaborative interface is not enough. As Hollan and Stornetta [29] point out, CSCW interfaces will not be used if they merely provide the same experience as face-to-face communication; they must provide affordances that enable users to go "beyond being there".
Most collaborative interfaces use computer equipment to provide a sense of remote presence. Measures of social presence [30] and information richness [31] have been developed to characterize how closely telecommunication tools capture the essence of face-to-face communication. The hope is that collaborative interfaces will eventually be indistinguishable from actually being there.
Hollan and Stornetta suggest this is the wrong approach. By considering face-to-face interaction as another medium, it becomes apparent that this method requires one medium to adapt to another, pitting the strengths of face-to-face collaboration against new CSCW interfaces. Mechanisms which may be effective in face-to-face interactions may be awkward if they are replicated in an electronic medium, making users reluctant to use the new medium. For example, the Cruiser video conferencing system was developed to replace face-to-face meetings and support remote awareness. However most users used the system for brief conversations and to set up face-to-face meetings rather than replacing face-to-face collaboration [32]. In fact, it may be impossible for mediated collaborations to provide the same degree of presence as face-to-face collaboration because of the nature of the medium [33].
Hollan and Stornetta argue that rather than using new media to imitate the face-to-face medium, researchers should be considering what new affordances the media offers that satisfy the needs of communication so well that people will use it regardless of physical proximity. A better way to develop interfaces for telecommunication is to focus on the communication part, not the tele- part. The main motivation should be identifying needs which are not met in unmediated face-to-face collaboration and evolving mechanisms which use new media affordances to meet those needs, developing tools which go beyond being there.
Wearable computers are ideally suited for this approach. They allow normal face-to-face collaboration but enhance it with affordances that satisfy previously unmet needs. Some of the limitations of normal face-to-face collaboration include the difficulty of archiving and retrieving conversations, and accessing relevant external data. Starner et al. [34] present single-user wearable applications that could be expanded to meet these needs, including:
INITIAL EXPLORATIONS
In the previous sections we have shown how wearable computers afford seamless collaboration and can enhance the real world, both key elements for effective CSCW interfaces. However there are many questions that must be answered in developing collaborative wearable systems. In this section we describe pilot studies conducted to address two of these questions: does the seamlessness between the real and virtual worlds really benefit performance, and can collaboration in a wearable environment produce performance similar to face-to-face collaboration?
One area of interest is possible differences in task performance on the same collaborative task performed in an immersive virtual environment versus an augmented reality wearable configuration. A difference in results could imply that the seamlessness inherent in wearable AR interfaces does indeed affect task performance.
To explore this we performed a simple pilot study, complete details of which can be found in [35]. We developed a two-player game which involved moving randomly distributed colored cubes or balls around a virtual space and placing them in a target configuration. Each of the players had a different role. One was the "spotter" and could see all the virtual objects. Her role was to search the space, find the objects needed to complete the target configuration and make them visible using voice commands. The second player was the "picker". His role was to find the objects that were made visible by the spotter, pick them up, and drop them over the target configuration. The role division between the players forced them to collaborate. Both players were in the same room, wore stereoscopic Virtual i-O head mounted displays which could either be see-through or occluded, and had their head and hand positions tracked by 6D electro-magnetic trackers. When the displays were used in see-through mode the virtual targets appeared superimposed over the real world.
Four experimental conditions were tested:
When a real or virtual body was present, participants could use both voice and pointing gestures to show where objects were located. Without bodies they could use voice only. Real or virtual world reference objects could be used to aid target object location and communication.
Eighteen pairs of college students (20 women, 16 men) served as subjects, each playing four games for each condition. Some used real or virtual body cues to aid their performance, while others used alternative strategies, such as specifying object location by clock direction. Subjects who used body cues completed the game significantly faster in the real world/real body condition than in the virtual world/virtual body condition (table 3.0), with quicker real world/real body times than in all other conditions (figure 3.0).
Figure 3.0 Average Task Performance Times
Players were also given a post-game survey to determine their subjective evaluation of the conditions. For each condition, they were asked to rate how good they thought they were at playing the game (on a scale from one to seven). Users felt they were best at playing the game in either the real world/real body case or virtual world/virtual body case (figure 4.0). There was a significant difference between responses across conditions; a one factor repeated measures ANOVA gave [F(4,114)=7.65, p<0.0001].

Users were also asked to rank the conditions on a scale from one to five according to how well they thought their pair performed in each condition. The best condition was ranked first and the worst last. The average rankings for each condition are shown in figure 5.0. Using a Friedman two-way ANOVA we again find a significant difference between rankings [Chi-Square=31.89, df = (4,21), p< 0.0001]. Users again thought they performed best in the real world/real body condition or virtual world/virtual body case, and all of the users who relied on body cues thought they performed best in the real world/real body case.
The significant performance difference among subjects that used body cues implies that the increased communications bandwidth facilitated by seeing the real world and a real collaborator may indeed aid task performance. The subject rankings imply that users may prefer collaboration in a setting where they can see their collaborators face-to-face, such as that provided by a wearable computing platform.
A second, smaller pilot study compared three-dimensional collaboration with different display conditions to see if the displays typically used in a wearable computer would have any effect on performance. Three subjects (all women, aged 19 to 23 years) placed cylinders and cones on a 3x4 grid under the guidance of an instructor. For each display condition there were seven target configurations which subjects experienced in a random order. Total task completion time was measured and subjects were videoed to analyze conversational styles. The display conditions were:
Real-World: Instructor and collaborator sat side by side, both with identical grids.
Monitor-HMD: The instructor and subject were separated. The subject wore a portable head-mounted display with video camera attached while the instructor observed their view on a monitor. A second video camera pointing at the monitor allowed the subject to see the instructor's graphical notations on their own video image.
HMD-HMD: The instructor and subject sat facing away from each other, both wearing portable head-mounted displays with attached video cameras. The signal from each camera was fed to the opposite user's display, giving a view of what their collaborator was doing (figure 6.0).

The head-mounted displays used were battery powered Virtual Vision Sport field-multiplexed monoscopic displays, which are common in wearable applications. They have a single LCD panel reflected off a light collimating mirror in the user's peripheral view. Each display was modified by adding a small CCD black and white camera (figure 7.0).

We were also interested in exploring the effects of voice and gesture on collaboration. In the real-world case there were three test configurations: Voice Only, Voice and Gesture, and Example.
In the Monitor-HMD case the following configurations were used: Voice Only, Voice and Gesture, and Voice and Graphical Overlay. In the voice and graphical overlay condition the instructor used graphical annotations on the monitor to show his collaborator where to place objects. In the HMD-HMD condition the instructor only used voice and gesture to communicate.
The average performance times for each of seven conditions are shown in table 4.0. As expected, the subjects performed worst in the two voice-only conditions. All the other conditions took nearly the same amount of time.
| Display Condition | Communication | Average Time |
| Real World | Voice Only | 35.4 sec |
| Real World | Voice and Gesture | 18.9 sec |
| Real World | Example | 19.9 sec |
| Monitor-HMD | Voice Only | 28.1 sec |
| Monitor-HMD | Voice and Gesture | 21.9 sec |
| Monitor-HMD | Voice and Graphics | 21.8 sec |
| HMD-HMD | Voice and Gesture | 21.6 sec |
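To make the comparison in table 4.0 concrete, the sketch below expresses each average time as a percentage slowdown relative to one reference condition. The times are those reported in the table; the choice of the real-world voice and gesture condition as the reference and the helper name `slowdown_percent` are assumptions made for illustration, and with three subjects the figures are descriptive only.

```python
# Average task completion times in seconds, as reported in table 4.0.
times_sec = {
    ("Real World", "Voice Only"): 35.4,
    ("Real World", "Voice and Gesture"): 18.9,
    ("Real World", "Example"): 19.9,
    ("Monitor-HMD", "Voice Only"): 28.1,
    ("Monitor-HMD", "Voice and Gesture"): 21.9,
    ("Monitor-HMD", "Voice and Graphics"): 21.8,
    ("HMD-HMD", "Voice and Gesture"): 21.6,
}


def slowdown_percent(time_sec: float, reference_sec: float) -> float:
    """Percentage by which time_sec exceeds reference_sec."""
    return 100.0 * (time_sec - reference_sec) / reference_sec


if __name__ == "__main__":
    # Assumption: unmediated voice-and-gesture collaboration is the reference.
    reference = times_sec[("Real World", "Voice and Gesture")]
    for (display, communication), t in times_sec.items():
        pct = slowdown_percent(t, reference)
        print(f"{display:12s} {communication:20s} {t:5.1f} sec  {pct:+6.1f}%")
```

Read this way, the two voice-only conditions stand well apart from the rest, while the remaining conditions fall within a few seconds of the reference.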
The conversation between the instructor and student was also analyzed by counting the average number of words the instructor used for each condition. The subject was generally silent, listening for verbal instructions. However the number of words used by the instructor varied according to display condition, as shown in table 5.0.
| Display Condition | Communication | Average Words |
| Real World | Voice Only | 78.4 |
| Real World | Voice and Gesture | 44.0 |
| Real World | Example | 32.1 |
| Monitor-HMD | Voice and Gesture | 45.3 |
| Monitor-HMD | Voice and Graphics | 49.6 |
| HMD-HMD | Voice and Gesture | 40.7 |
There was a significant difference in the number of words spoken between conditions; a one factor ANOVA gave [F(5,41)=67.37, p<0.0001, Fcrit = 2.48]. There was also a significant difference in the number of words used between the technology conditions (HMD-HMD, Monitor-HMD), with the HMD-HMD condition using the fewest words [F(2,18) = 7.448, p<0.001, Fcrit = 3.55]. A one tailed t-test comparing the Real World Example results with the HMD-HMD results found that they were significantly different [t-val = 3.1, p<0.01, tcrit = 1.78].
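For readers unfamiliar with the statistic, a one factor ANOVA compares the variance between condition means with the variance within conditions. The sketch below computes the F ratio and its degrees of freedom from first principles. It assumes the standard between-groups formulation, in which k groups and N observations give (k - 1, N - k) degrees of freedom, so the reported F(5,41) would correspond to six conditions and 47 observations. The function name `one_way_anova` and the word counts in the usage example are hypothetical and are not the study data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Hypothetical instructor word counts for three conditions (not study data).
    word_counts = [
        [80, 76, 79, 81],
        [44, 47, 42, 45],
        [33, 30, 34, 31],
    ]
    f_ratio, df_b, df_w = one_way_anova(word_counts)
    print(f"F({df_b},{df_w}) = {f_ratio:.2f}")
```

The resulting F ratio is then compared against the critical value for the same degrees of freedom, which is the role Fcrit plays in the results above.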
With such a small sample it is difficult to arrive at any conclusions, but it is interesting that out of the technology mediated conditions, the HMD-HMD condition produced performance times and conversational style closest to the real world example case, even though the conversational measure was still significantly different. This implies that for some tasks, three dimensional collaboration between users with wearable displays may not detrimentally affect task performance. We are currently conducting further user studies to verify this result.
In this paper we have proposed that wearable computers are ideal for 3D Computer Supported Collaborative Work because of the unique affordances they offer. The combination of augmented reality, mobility and computer enhanced perception enables wearables to overcome two major challenges of CSCW: seams and the need to enhance reality.
We have also presented preliminary results from pilot studies that support this proposal. In the first, subjects performed better when they could see each other and the real world, and preferred collaborating in this condition. In the second, when both subjects used wearable displays there were no detrimental effects on task performance and they communicated almost the same as in face-to-face collaboration.
However, considerably more work needs to be completed to establish the usefulness of wearables for 3D CSCW. Empirical studies need to be conducted comparing collaboration in a wearable setting to other competing technologies, establishing which of the affordances of wearables contribute most to facilitating 3D CSCW, and the types of collaborative applications that wearables are most suited for. These studies should also provide the requirement specifications for technology development, such as sourceless position and orientation trackers, and improved displays. Finally, developers must use this knowledge and technology to build wearable interfaces which overcome the limitations of current 3D CSCW tools.
Thanks to Edward Miller for helping with paper revisions. The work was partially supported under the Shared Space: Collaborative Augmented Reality research grant from the Washington Technology Center.