A Mixed Reality 3D Conferencing Application

Hirokazu Kato, Mark Billinghurst, Suzanne Weghorst, Tom Furness

Human Interface Technology Laboratory

University of Washington

Box 352-142

Seattle, WA 98195, USA

{kato,grof,weghorst,tfurness}@hitl.washington.edu

ABSTRACT

We describe a Mixed Reality conferencing application which uses the overlay of virtual images on the real world to support three dimensional remote computer supported collaborative work. Remote collaborators are represented on Virtual Monitors which can be freely positioned about a user in space. Users can collaboratively view and interact with virtual objects using a shared virtual whiteboard. This is possible through precise virtual image registration using fast and accurate computer vision techniques also described in the paper.

Keywords

Mixed Reality, Augmented Reality, CSCW

INTRODUCTION

Computers are increasingly used to enhance collaboration between people. As collaborative tools become more common, the Human-Computer Interface is giving way to a Human-Human Interface mediated by computers. This emphasis adds new technical challenges to the design of Human-Computer Interfaces. These challenges are compounded for attempts to support three-dimensional Computer Supported Collaborative Work (CSCW). Although spatial cues and three-dimensional object manipulation are common in face-to-face collaboration, tools for three-dimensional CSCW are still rare. However, new 3D interface metaphors such as virtual reality may overcome this limitation.

Virtual Reality (VR) appears a natural medium for 3D CSCW; in this setting computers can provide the same type of collaborative information that people have in face-to-face interactions, such as communication by object manipulation, voice and gesture [12]. Work on the DIVE project [3], GreenSpace [6] and other fully immersive multi-participant virtual environments has shown that collaborative work is indeed intuitive in such surroundings. However most current multi-user VR systems are fully immersive, separating the user from the real world and their traditional tools.

As Grudin [4] points out, CSCW tools are generally rejected when they force users to change the way they work. This is because of the introduction of seams or discontinuities between the way people usually work and the way they are forced to work because of the computer interface. Ishii describes in detail the advantages of seamless CSCW interfaces [5]. Obviously immersive VR interfaces introduce a huge discontinuity between the real and virtual worlds.

An alternative approach is Mixed Reality (MR), the overlaying of virtual objects onto the real world. In the past, researchers have explored the use of MR approaches to support face-to-face collaboration. Projects such as Studierstube [10], Transvision [9], and AR2 Hockey [7] allow users to see each other as well as 3D virtual objects in the space between them. Users can interact with the real world at the same time as the virtual images, bringing the benefits of VR interfaces into the real world and facilitating very natural collaboration. In a previous paper we found that users collaborate better on a task in a face-to-face MR setting than on the same task in a fully immersive Virtual Environment [2].

In this paper we report on an application of MR techniques for supporting remote collaboration. We have developed an MR conferencing system that allows virtual images (Virtual Monitors) of remote collaborators to be overlaid on the user's real environment. Our Mixed Reality conferencing system tries to overcome some of the limitations of current desktop video conferencing, including the lack of spatial cues [11], the difficulty of interacting with shared 3D data, and the need to be physically present at a desktop machine to conference. While using this system, users can easily change the arrangement of Virtual Monitors, placing the virtual images of remote participants about them in the real world, and they can collaboratively interact with 2D and 3D information using a Virtual Shared Whiteboard. The virtual images are shown in a lightweight head mounted display, so with a wearable computer our system could be made portable, enabling collaboration anywhere in the workplace.

In developing a multi-user mixed reality video conferencing system, precise registration of the virtual images with the real world is one of the greatest challenges. In our work we use computer vision techniques and have developed some optimized algorithms for fast, accurate real time registration. In this paper we first describe our conferencing application, and then the video-based registration methods used.

SYSTEM OVERVIEW

Our prototype system supports collaboration between users wearing see-through head mounted displays and those on more traditional desktop interfaces. This simulates the situation that could occur in collaboration between a desk bound expert and a remote field worker. In this section we first describe the MR head mounted interface and then the desktop interface. Due to hardware limitations we are currently only able to support a single MR user collaborating with an arbitrary number of desktop users. However with additional displays and cameras we could increase this number.

Mixed Reality Interface

The user with the MR interface wears a pair of Virtual i-O iglasses, a head mounted display (HMD) that has been modified by adding a small color camera. The iglasses are full color, can be used in either a see-through or an occluded mode, and have a resolution of 263x234 pixels. The camera output is connected to an SGI O2 (R5000SC 180MHz CPU) computer, and the video output of the SGI is connected back into the head mounted display. The O2 is used both for image processing of video from the head mounted camera and for virtual image generation for the HMD. The full version runs at 7-10 frames per second, and the system runs at 10-15 frames per second without the Virtual Shared Whiteboard.

The MR user also has a set of small marked cards and a larger piece of paper with six letters on it around the outside. There is one small marked card for each remote collaborator, with their name written on it. These cards are placeholders (user ID cards) for the Virtual Monitors showing the remote collaborators, while the larger piece of paper is a placeholder for the shared whiteboard. To write and interact with virtual objects on the shared whiteboard, the user has a simple light pen consisting of an LED, a switch and a battery mounted on a pen. When the LED touches a surface, the switch is tripped and the LED turns on. Figure 1 shows an observer's view of the MR user using the interface.

Figure 1. Using the MR Interface

The software components of the interface consist of two parts: the Virtual Monitors shown on the user ID cards, and the Virtual Shared Whiteboard. When the system is running, computer vision techniques are used to identify specific user ID cards (using the user name on the card) and display live video or a 3D avatar of the remote user that corresponds to the ID card. Vision techniques are also used to calculate head position and orientation relative to the cards, so the virtual images are precisely registered with the ID cards. Figure 2 shows an example of a Virtual Monitor; in this case the user is holding an ID card to which live video from a remote collaborator is attached. This approach is similar to that of Rekimoto, who uses vision techniques to identify 2D matrix markers in non-collaborative Mixed Reality applications [8].

Figure 2. Remote user representation in the MR interface.

If the remote system has a camera to capture a live video image of the user, this live video image is displayed using video texture mapping techniques. Otherwise, a 3D virtual avatar of the remote user is displayed. The face angle of the avatar can be operated by the remote participant. The use of virtual images of remote collaborators attached to physical cards means that the local user can arrange the cards about them in space to create a spatial conferencing space. The cards are also small enough to be easily carried, ensuring portability.

Shared whiteboards are commonly used in collaborative applications to enable people to share notes and diagrams. In our application we use a Virtual Shared Whiteboard, as seen in figure 3. This is shown on a larger paper board with six registration markings similar to those on the user ID cards. Virtual annotations written by remote participants are displayed on it, exactly aligned with the plane of the physical card. The local participant can use the light pen to draw on the card and add their own annotations, which are in turn displayed and transferred to the remote desktops. The MR user can erase their own annotations by touching one corner of the card. Currently our application only supports virtual annotations aligned with the surface of the card, but we are working on adding support for shared 3D objects, similar to what we have with the remote avatar representations.

Figure 3. Virtual Shared White Board

The position and pose of this paper board can be estimated using the same vision methods used for the Virtual Monitors. However, since the user’s hands often occlude the registration markers, the estimation has to be done using only the visible markers. We can reliably estimate the card position using only one of the six markers. The LED light pen is on while it touches the paper board. When this happens, the system estimates the position of the pen tip relative to the paper board from the 2D position of the LED in the camera image and the knowledge that the tip of the pen is in contact with the board. Users can pick up the card for a closer look at the images on the virtual whiteboard, and can position it freely within their real workspace.

Desktop Interface

The MR user collaborates with remote desktop users who have a more traditional interface. The desktop users are on networked SGI computers, some with video input and some without. Users with video cameras on their computer see a window showing the video image that their camera is sending, the remote video from the MR user's head mounted camera, and a shared whiteboard application, as shown in figure 4. The video from the MR user's head mounted camera enables the desktop user to collaborate more effectively with them on real world tasks. Desktop users can freely draw on the shared whiteboard using the mouse, and whiteboard annotations and video frames from their camera are sent to the MR user over UDP network sockets.

Figure 4. Desktop User Interface

Users without a video camera see the same whiteboard application and the view from the MR user’s head mounted camera, but also see their own 3D graphic avatar. Clicking on the avatar and moving the mouse causes it to change its orientation, both in the desktop view and in the view seen by the MR user. Figure 5 shows a view of the avatar interface.

Figure 5. Avatar Desktop User Interface

Users can also communicate with each other using audio. We use VAT, a freely available program that uses multicast audio to enable audio communication between groups of remote machines.

In the future we plan to add another desktop application which will enable sharing and interaction with 3D objects on the Shared Virtual Whiteboard.

Initial User Impressions

Our 3D conferencing application has been informally tested by dozens of users in our lab and visitors to our lab. We are in the process of conducting formal user studies to see how conferencing with our MR interface affects collaboration differently from traditional video conferencing. In general, users have found the interface to be very engaging and natural, particularly in how easily the remote collaborators can be positioned about them in space and in the freedom of movement they have about the real world while still conferencing. Users also liked being able to interact with the real world at the same time as the conference participants, and they found that the remote head mounted camera view allowed them to collaborate effectively with the MR user.

The whiteboard was not quite as successful. Although desktop users could easily draw on their whiteboards, the MR user sometimes found using the light pen difficult. Accurate tracking of the light pen required good lighting conditions and didn’t provide the same level of resolution as mouse input. However, the MR user was able to easily see and comment about the annotations being made by the desktop users, and contribute when the vision routines were working well. This is an area for future improvement, especially as we start to use the whiteboard for interacting with shared 3D virtual objects.

Interestingly, we found that users naturally exhibited behaviors impossible in traditional video conferencing, such as picking up remote user representations to talk to them one on one, or positioning users in groups simulating side conversations. Typically users position the virtual monitors around the virtual whiteboard, just as if they were having a face-to-face meeting. This suggests that many face-to-face behaviors impossible to capture in desktop conferencing may be supported by our interface.

These informal trials have also revealed some directions for future improvements. Many MR users assumed that the remote users could actually see a video image from the viewpoint of their avatar representation. They would sometimes say "have a look at this" while picking up the avatar representation and pointing it toward some real object. We plan to support this natural action by adding small cameras to the user ID cards. This has the additional benefit of allowing desktop users to have a view of the MR user's face. Currently the interface is asymmetric: the MR user sees all the other users' faces, but they only see the MR user's view of the real world.

Another improvement that can be made is through the use of spatial audio. When a user leans into an avatar representation their audio should get louder just as in face to face collaboration. Similarly when multiple people are speaking at once, spatial cues should be used to enable users to easily discriminate between them.

COMPUTER VISION TECHNIQUES

Our MR conferencing interface relies heavily on computer vision techniques for ID recognition and user head position and pose determination. In the remainder of the paper we outline the underlying computer vision methods we have developed to accomplish this. These methods are general enough to be applicable for a wide range of mixed reality applications.

Mixed Reality Systems using HMDs can be classified into two groups according to the display method used:

Type A: Video See-through Mixed Reality

Type B: Optical See-through Mixed Reality

In type A, virtual objects are superimposed on a live video image of the real world captured by the camera attached to the HMD. The resulting composite video image is displayed back to both eyes of the user. In this case, interaction with the real world is a little unnatural because the camera viewpoint shown in the HMD is offset from that of the user’s own eyes, and the image is not stereographic. Performance can also be significantly affected as the video frame rate drops. However, this type of system can be realized easily, because good image registration only requires that the relationship between 2D screen coordinates on the image and 3D coordinates in the real world is known.

In type B, virtual objects are shown directly on the real world by using a see-through display. In this case, the user can see the real world directly and stereoscopic virtual images can be generated so the interaction is very natural. However, the image registration requirements are a lot more challenging because it requires the relationships between the camera, the HMD screens and the eyes to be known in addition to the relationships used by type A systems. The calibration of the system is therefore very important for precise registration. A good review of the issues faced in Mixed Reality registration and calibration is found in the work of Azuma [1].

We have developed a precise registration method for the optical see-through mixed reality system. Our method addresses two primary problems: calibration of the HMD and camera, and accurate estimation of the position and pose of fiducial markers. It can also be used for video see-through systems, and we can run our conferencing application in either optical or video see-through configurations.

HMD and Camera Calibration

We calibrate the system using the calibration tool shown in Figure 6. This is a simple cardboard frame with a ruled grid of lines on it that is attached to the front of the HMD as shown. By attaching the calibration tool to the HMD, the head position doesn’t change relative to the grid of lines, so we can find the relationship between the camera, the HMD screens and the eyes. There are two critical transformations we need to find; that between the HMD screen coordinates and the calibration tool coordinates (Tst), and that between the camera coordinates and the calibration tool coordinates (Tct). Figure 7 shows these transformations and the various coordinate frames used.

Figure 6. Calibration tool used in our calibration method.

Figure 7. Coordinate frames in our calibration procedure.

HMD Calibration – Finding Tst

In a see-through HMD, a ray from a physical object reaches the focal point of the eye through the HMD screen. Hence, a 3D position represented in the eye coordinates whose origin is the focal point of the eye can be projected on the HMD screen coordinates by the perspective projection model. This assumes that the Z axis perpendicularly crosses the HMD screen, and the X and Y axes are parallel to X and Y axes of the HMD screen coordinates frame respectively, as shown in figure 7. The transformation Tst between the calibration tool coordinates and the HMD screen coordinates is therefore represented by the following equation.

(eq. 1)

where (xs, ys) are the HMD screen coordinates, f is the focal length, sx is the scale factor [pixel/mm] in the direction of the x axis, sy is the scale factor in the direction of the y axis, (x0, y0) is the point at which the Z axis of the eye coordinate frame passes through the screen, (Xe, Ye, Ze) are the coordinates of the 3D position in the eye coordinate frame and (Xt, Yt, Zt) are the coordinates of the same position in the calibration tool coordinate frame. R and T represent the rotation and translation from the calibration tool coordinates to the eye coordinates respectively. Combining the two transformation matrices simplifies this to the equation shown in eq. 2.

(eq. 2)
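The perspective model described above can be illustrated with a short Python sketch that maps a point in calibration tool coordinates to HMD screen coordinates: a rigid transform (R, T) into eye coordinates, followed by a perspective division and scaling. The function name, the argument layout and the numeric values are assumptions made for illustration only; they are not taken from the system described in this paper.

```python
def project_to_screen(point_t, R, T, f, sx, sy, x0, y0):
    """Project a 3D point in calibration tool coordinates to HMD screen pixels.

    R is a 3x3 rotation (list of rows) and T a 3-vector, both mapping
    calibration tool coordinates to eye coordinates (illustrative layout).
    """
    # Rigid transform into eye coordinates: Xe = R * Xt + T
    Xe, Ye, Ze = (
        sum(R[i][j] * point_t[j] for j in range(3)) + T[i] for i in range(3)
    )
    if Ze <= 0:
        raise ValueError("point is not in front of the eye")
    # Perspective division, pixel scaling and principal point offset
    xs = sx * f * Xe / Ze + x0
    ys = sy * f * Ye / Ze + y0
    return xs, ys


# Hypothetical values: tool frame 100 mm in front of the eye, no rotation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
xs, ys = project_to_screen(
    (10.0, -5.0, 0.0), IDENTITY, (0.0, 0.0, 100.0),
    f=20.0, sx=10.0, sy=10.0, x0=160.0, y0=120.0,
)
```

With these made-up values the point lands 20 pixels to the right of and 10 pixels above the principal point (taking y as increasing downward).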

The calibration tool is used to find the exact values of the matrix Tst. When wearing this tool, a user sees a grid of virtual lines overlaid on the cardboard frame, as shown in figure 8. To calibrate Tst, the user fits the virtual lines drawn on the HMD screen to the corresponding line segments on the physical calibration tool. The virtual lines can be moved and rotated using the keyboard. The positions of all of the intersections of the real line segments are known in the calibration tool coordinate frame, and this user operation finds the corresponding positions in the HMD screen coordinate frame. From these correspondences the transformation matrix Tst can be estimated. The user carries out this process for each eye, generating two matrices. The resultant Tst matrices are used as the transformation between the virtual 3D coordinate system and the HMD screen coordinate system.

Figure 8. The image the user sees during calibration.

Camera Calibration – Finding Tct

In a similar way the relationships among the camera screen coordinates, the camera coordinates and the calibration tool coordinates can be represented as:

(eq. 3)

where P represents the perspective transformation, Tct represents the translation and rotation transformation from the calibration tool coordinates to the camera coordinates and C is the transformation matrix obtained by combining P and Tct. To find matrix C, the same process as mentioned above is used, however in this case video from the camera is displayed on a computer monitor and the user aligns the virtual lines on-screen.

The main purpose of this camera calibration process is to find the matrices P and Tct. However, the matrix C cannot in general be decomposed into P and Tct, because C has 11 independent variables whereas P and Tct have 4 and 6 respectively, giving only 10 in total. A scalar variable k is therefore added to P to make these numbers equal, as follows:

(eq. 4)

As a result, the matrix C can be decomposed into P and Tct. The variable k should be zero ideally but it may be a small noise value. The transformation matrix P is used for the estimation of the relationship between the marker coordinates and the camera coordinates by image analysis. Details are given in the next section. Tct is used with Tst for determination of the position in the HMD screen coordinate frame corresponding to the position represented in the camera coordinate frame as described in a later section.
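One way to carry out such a decomposition is sketched below in Python. The sketch assumes that P is written as a 3x3 upper-triangular matrix and that k occupies the off-diagonal (skew) position in its first row; this placement is an assumption, since it cannot be confirmed from the text alone. It also assumes that C is scaled with a positive sign. The function and variable names are illustrative and are not taken from our implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decompose_projection(C):
    """Split a 3x4 matrix C = P [R | t] into P's entries, R and t.

    Assumes P = [[fx, k, x0], [0, fy, y0], [0, 0, 1]] (k as a skew term).
    Returns ((fx, fy, k, x0, y0), R as a list of rows, t).
    """
    # C is only defined up to scale: normalise so that the third row of
    # the left 3x3 block (the third row of R) has unit length.
    scale = math.sqrt(dot(C[2][:3], C[2][:3]))
    C = [[v / scale for v in row] for row in C]
    m1, m2, m3 = (row[:3] for row in C)

    # Peel off the rows of R from the bottom up (Gram-Schmidt on rows).
    r3 = m3
    y0 = dot(m2, r3)
    rest = [a - y0 * b for a, b in zip(m2, r3)]
    fy = math.sqrt(dot(rest, rest))
    r2 = [v / fy for v in rest]
    x0 = dot(m1, r3)
    k = dot(m1, r2)
    rest = [a - x0 * b - k * c for a, b, c in zip(m1, r3, r2)]
    fx = math.sqrt(dot(rest, rest))
    r1 = [v / fx for v in rest]

    # Back-substitute P * t = fourth column of C for the translation.
    tz = C[2][3]
    ty = (C[1][3] - y0 * tz) / fy
    tx = (C[0][3] - k * ty - x0 * tz) / fx
    return (fx, fy, k, x0, y0), [r1, r2, r3], [tx, ty, tz]


def matmul(A, B):
    return [[sum(A[i][n] * B[n][j] for n in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Synthetic check with made-up values: build C, rescale it, decompose it.
angle = math.radians(20.0)
P_true = [[800.0, 1.5, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]]
Rt_true = [[math.cos(angle), 0.0, math.sin(angle), 10.0],
           [0.0, 1.0, 0.0, -20.0],
           [-math.sin(angle), 0.0, math.cos(angle), 500.0]]
C = [[0.5 * v for v in row] for row in matmul(P_true, Rt_true)]
intrinsics, R, t = decompose_projection(C)
```

In this synthetic case the recovered k equals the value used to build C. With a C estimated from real correspondences, k absorbs the residual that a four-parameter P cannot explain, which matches the observation that it should ideally be zero.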

POSITION AND POSE ESTIMATION OF MARKERS

Estimation of the Transformation Matrix

Square markers of known size are used as the base of the coordinate frame in which Virtual Monitors are represented (Figure 9). The transformation matrices from these marker coordinates to the camera coordinates (Tcm), represented in eq. 5, are estimated by image analysis.

 

(eq. 5)

Figure 9. The relationship between marker coordinates and the camera coordinates is estimated by image analysis.

After thresholding of the input image, regions whose outline contour can be fitted by four line segments are extracted. The parameters of these four line segments, and the coordinates of the four vertices of the regions found from the intersections of the line segments, are stored for later processing. The regions are normalized, and the sub-image within each region is compared by template matching with patterns that were given to the system beforehand, in order to identify specific user ID markers. Currently our system matches on user names and characters, but image patterns could also be used.
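The identification step can be illustrated with the following Python sketch, which compares a normalized marker sub-image with stored templates. Normalized cross-correlation, the testing of four 90 degree rotations, the 0.7 acceptance threshold and all names are assumptions chosen for illustration; the text above only states that template matching is used.

```python
import math


def rotate90(img):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def correlation(a, b):
    """Normalized cross-correlation of two equally sized grey images."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa)
                    * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0


def identify(patch, templates, threshold=0.7):
    """Return (name, quarter_turns) of the best matching template, or None.

    The card may be seen in any of four orientations, so each template
    is tested at 0, 90, 180 and 270 degrees.
    """
    best_score, best_name, best_turns = threshold, None, None
    for name, template in templates.items():
        rotated = template
        for turns in range(4):
            score = correlation(patch, rotated)
            if score > best_score:
                best_score, best_name, best_turns = score, name, turns
            rotated = rotate90(rotated)
    return None if best_name is None else (best_name, best_turns)


# Tiny 3x3 stand-ins for normalized marker interiors (hypothetical data).
templates = {
    "A": [[255, 0, 0], [255, 0, 0], [255, 255, 255]],
    "B": [[255, 255, 255], [0, 255, 0], [0, 255, 0]],
}
patch = rotate90(templates["A"])   # card "A" seen after a quarter turn
match = identify(patch, templates)
```

The number of quarter turns returned alongside the name indicates which vertex of the detected square corresponds to the first corner of the template, which is needed to fix the orientation of the marker coordinate frame.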

When two parallel sides of a square marker are projected on the image, the equations of those line segments in the camera screen coordinates are the following:

(eq. 6)

For each marker, the values of these parameters have already been obtained in the line-fitting process. Given the perspective projection matrix P obtained by camera calibration (eq. 7), the equations of the planes that contain these two sides can be expressed in the camera coordinate frame as eq. 8, by substituting xc and yc from eq. 7 for x and y in eq. 6.

(eq. 7)

 

(eq. 8)

Given that the normal vectors of these planes are n1 and n2 respectively, the direction vector of the two parallel sides of the square is given by the cross product n1 × n2. Let u1 and u2 be the two unit direction vectors obtained from the two pairs of parallel sides of the square. These vectors should be perpendicular, but image processing errors mean that they will not be exactly perpendicular. To compensate for this, two perpendicular unit direction vectors v1 and v2 are defined in the plane that contains u1 and u2, as shown in figure 10. Given that the unit direction vector perpendicular to both v1 and v2 is v3, the rotation component V3x3 in the transformation matrix Tcm from marker coordinates to camera coordinates specified in eq. 5 is [V1t V2t V3t].
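The construction of the rotation component can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the text: image lines are written as a·x + b·y + c = 0, P is treated as a 3x3 matrix so that the plane through the camera center and an image line has normal Pᵀ·(a, b, c), and v1 and v2 are placed symmetrically about the bisector of u1 and u2. All function names are illustrative.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def plane_normal(line, P):
    """Normal of the plane through the camera center and an image line.

    line = (a, b, c) with a*x + b*y + c = 0; P is 3x3 (assumed form).
    """
    return [sum(P[r][col] * line[r] for r in range(3)) for col in range(3)]


def side_direction(line1, line2, P):
    """3D direction shared by two parallel marker sides.

    The sign of the result is arbitrary and must be fixed by the caller.
    """
    return unit(cross(plane_normal(line1, P), plane_normal(line2, P)))


def rotation_from_directions(u1, u2):
    """Replace nearly perpendicular u1, u2 by exactly perpendicular v1, v2.

    v1 and v2 stay in the plane of u1 and u2 and share the error equally;
    v3 completes the right-handed frame.
    """
    bisector = unit([a + b for a, b in zip(u1, u2)])
    difference = unit([a - b for a, b in zip(u1, u2)])
    s = 1.0 / math.sqrt(2.0)
    v1 = [s * (b + d) for b, d in zip(bisector, difference)]
    v2 = [s * (b - d) for b, d in zip(bisector, difference)]
    return v1, v2, cross(v1, v2)


# Two horizontal image lines (y = 1 and y = -1) seen with an identity P
# come from marker sides that run along the camera X axis.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
direction = side_direction((0.0, 1.0, -1.0), (0.0, 1.0, 1.0), IDENTITY)

# Slightly non-perpendicular measurements are pulled back to a frame.
u1 = unit([1.0, 0.05, 0.0])
u2 = unit([0.05, 1.0, 0.0])
v1, v2, v3 = rotation_from_directions(u1, u2)
```

Because u1 and u2 are unit vectors, their sum and difference are always perpendicular, so v1 and v2 are exactly orthogonal regardless of the measurement error.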

Figure 10. Two perpendicular unit direction vectors: v1, v2 are calculated from u1 and u2.

Once the rotation component V3x3 of the transformation matrix is known, eq. 5 and eq. 7, together with the coordinates of the four marker vertices in the marker coordinate frame and in the camera screen coordinate frame, yield eight equations in the translation components Wx, Wy and Wz, from which their values can be obtained.

The transformation matrix found by the method described above may include error. However, this can be reduced through the following process. The vertex coordinates of the markers in the marker coordinate frame are transformed to coordinates in the camera screen coordinate frame using the transformation matrix obtained. The transformation matrix is then optimized so that the sum of the differences between these transformed coordinates and the coordinates measured from the image is minimized. Although there are six independent variables in the transformation matrix, only the rotation components are optimized, and the translation components are then re-estimated using the method described above. Iterating this process a number of times gives a more accurate transformation matrix.
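One plausible way to write the quantity being minimized is shown below. It assumes that the difference is measured as a squared distance in the camera screen frame, that P is written as a 3x4 matrix, and that the four marker vertices lie in the Z = 0 plane of the marker coordinate frame; the symbols with hats and the index i are introduced here for illustration.

```latex
\mathrm{err} = \frac{1}{4} \sum_{i=1}^{4}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right],
\qquad
\begin{pmatrix} h\,\hat{x}_i \\ h\,\hat{y}_i \\ h \end{pmatrix}
  = P \, T_{cm}
\begin{pmatrix} X_i \\ Y_i \\ 0 \\ 1 \end{pmatrix}
```

Here (x_i, y_i) are the vertex positions measured in the image, (X_i, Y_i, 0) are the known vertex positions in the marker coordinate frame, and (x̂_i, ŷ_i) are the positions predicted by the current estimate of Tcm.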

An Extension for the Virtual Shared White Board

The method described for tracking user ID cards is extended for tracking the shared whiteboard card. There are six markers in the Virtual Shared White Board, aligned around the outside of the board as shown in figure 11. The orientation of the White Board is found by fitting lines around the fiducial markers and using an extension of the technique described for tracking user ID cards.

Figure 11. Layout of markers on the Shared White Board.

Using all six markers to find the board orientation and align virtual images in the interior produces very good registration results. However, when a user draws a virtual annotation, some markers may be occluded by the user’s hands, or the user may move their head so that only a subset of the markers is in view. The transformation matrix for the Virtual Shared White Board has to be estimated from the visible markers, so errors are introduced when fewer markers are available. The magnitude of the error increases the further the virtual object is drawn from the visible markers, and as the whiteboard is rotated. To reduce errors, the line-fitting equations are found by considering both individual markers and sets of aligned markers. Each marker has a unique letter in its interior that enables the system to identify markers which should be horizontally or vertically aligned, and so estimate the board rotation. Though line equations in the camera screen coordinate frame are generated independently for each marker, the alignment of the six markers on the Virtual Shared White Board means that some line equations are identical. Therefore, by extracting all aligned sides from the visible markers for the line fitting, each line equation is calculated using all the contour information on the extracted sides. Furthermore, by using all the equations of the detected parallel lines, the direction vectors are estimated and the board orientation found.
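The benefit of pooling contour points from aligned sides can be illustrated with a small Python sketch. It fits a single line to the contour points of several marker sides that are known to be collinear on the board. The total least-squares formulation and the function names are assumptions; the text does not specify the fitting method.

```python
import math


def fit_line(points):
    """Total least-squares fit of a*x + b*y + c = 0 with a^2 + b^2 = 1."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Angle of the direction of largest scatter; the line runs along it,
    # so the line normal (a, b) is perpendicular to that direction.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    a, b = -math.sin(theta), math.cos(theta)
    c = -(a * mx + b * my)
    return a, b, c


def fit_aligned_sides(sides):
    """Fit one line to the contour points of several aligned marker sides."""
    pooled = [p for side in sides for p in side]
    return fit_line(pooled)


# Contour points from the top edges of two markers that share one line on
# the board (hypothetical pixel coordinates on the line y = 0.5 x + 1).
side_a = [(0.0, 1.0), (2.0, 2.0)]
side_b = [(10.0, 6.0), (12.0, 7.0)]
a, b, c = fit_aligned_sides([side_a, side_b])
```

Pooling gives a longer baseline than any single marker side, so the fitted direction is less sensitive to pixel noise on the contour.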

Pen Detection

The light pen is on while it touches the shared whiteboard. The pen tip location is estimated in the following way. First, the brightest region in the image is extracted and its center of gravity is detected. Since the pen position (Xw, Yw, Zw) is expressed relative to the Virtual Shared Whiteboard, it is computed in the whiteboard coordinate frame. The relationship between the camera screen coordinates and the whiteboard coordinates is given by eq. 9, where (xc, yc) is the position of the center of gravity detected by image processing. Zw is equal to zero since the pen is on the board. Substituting these values into eq. 9 gives two equations in the two unknowns Xw and Yw, which are easily solved.

(eq. 9)
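The pen detection steps above can be sketched in Python as follows. The sketch assumes that the relationship between whiteboard and camera screen coordinates is available as a single 3x4 matrix M with (h·xc, h·yc, h) = M·(Xw, Yw, Zw, 1); the matrix name, the brightness threshold and the function names are illustrative assumptions.

```python
def bright_centroid(image, threshold=240):
    """Center of gravity (x, y) of pixels at or above threshold, or None."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value >= threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None          # pen is not touching the board
    return sum_x / count, sum_y / count


def pen_position_on_board(M, xc, yc):
    """Solve for (Xw, Yw) on the board plane Zw = 0.

    M is a 3x4 matrix mapping homogeneous whiteboard coordinates to
    homogeneous camera screen coordinates (assumed form).
    """
    # Eliminating h leaves two linear equations in Xw and Yw.
    a11 = M[0][0] - xc * M[2][0]
    a12 = M[0][1] - xc * M[2][1]
    b1 = xc * M[2][3] - M[0][3]
    a21 = M[1][0] - yc * M[2][0]
    a22 = M[1][1] - yc * M[2][1]
    b2 = yc * M[2][3] - M[1][3]
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("board plane is seen edge-on")
    # Cramer's rule for the 2x2 system
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# Hypothetical setup: board facing the camera at a distance of 100 units.
M = [[800.0, 0.0, 320.0, 32000.0],
     [0.0, 800.0, 240.0, 24000.0],
     [0.0, 0.0, 1.0, 100.0]]
Xw, Yw = pen_position_on_board(M, 360.0, 216.0)

frame = [[0, 0, 0, 0, 0],
         [0, 0, 250, 255, 0],
         [0, 0, 0, 0, 0]]
centroid = bright_centroid(frame)
```

The constraint Zw = 0 is what makes the problem solvable from a single camera: without it, one image point only determines a ray, not a position.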

Rendering of Virtual Objects

Virtual objects such as the Virtual Monitors are rendered on the corresponding marker in the marker coordinate frame. The transformation from marker coordinates to HMD screen coordinates is important so that virtual objects appear at the correct position on the marker in the real world. The transformation matrix Tst from the calibration tool coordinates to the HMD screen coordinates shown in eq. 2, and Tct from the calibration tool coordinates to the camera coordinates shown in eq. 3, are obtained by our calibration method. The transformation matrix Tcm from the marker coordinates to the camera coordinates shown in eq. 5 is obtained by the image processing described in the previous section. The final transformation matrix Tsm from the marker coordinates to the HMD screen coordinates is obtained from these matrices as shown in eq. 10. This matrix is used in rendering the virtual objects to ensure they appear fixed relative to the fiducial markers.

(eq. 10)

The above matrix representation is suitable for the OpenGL graphics libraries we use for developing our applications. OpenGL uses both a modelview matrix and a projection matrix to render 3D graphics, so Tst Tct^-1 can be regarded as the projection matrix and Tcm as the modelview matrix. The matrix Tst Tct^-1 does not change after calibration, so it only needs to be calculated once. However, the matrix Tcm changes each time the user moves their head (and therefore the camera). A virtual object can thus be registered in the marker coordinate frame by continuously updating the modelview matrix with Tcm.
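The split into a fixed part and a per-frame part can be illustrated with a Python sketch that uses plain nested lists in place of OpenGL calls. It assumes that Tst is a 3x4 homogeneous matrix and that Tct and Tcm are 4x4 rigid transforms; a real OpenGL projection matrix would additionally need a depth row, which is omitted here. The class and function names are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][n] * B[n][j] for n in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def rigid_inverse(T):
    """Invert a 4x4 rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    inv = [R_t[i] + [-sum(R_t[i][j] * t[j] for j in range(3))]
           for i in range(3)]
    inv.append([0.0, 0.0, 0.0, 1.0])
    return inv


class MarkerRenderer:
    def __init__(self, Tst, Tct):
        # Computed once after calibration: camera coordinates -> screen.
        self.projection = matmul(Tst, rigid_inverse(Tct))

    def screen_point(self, Tcm, point_m):
        """Screen position of a marker-frame point for the current Tcm."""
        column = [[c] for c in list(point_m) + [1.0]]
        p = matmul(self.projection, matmul(Tcm, column))
        h = p[2][0]
        return p[0][0] / h, p[1][0] / h


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


# Hypothetical calibration: camera frame offset 10 units from tool frame.
Tst = [[200.0, 0.0, 160.0, 0.0],
       [0.0, 200.0, 120.0, 0.0],
       [0.0, 0.0, 1.0, 0.0]]
renderer = MarkerRenderer(Tst, translation(10.0, 0.0, 0.0))

# Per frame: only Tcm changes as the head (and camera) moves.
x, y = renderer.screen_point(translation(0.0, 0.0, 500.0), (0.0, 0.0, 0.0))
```

Keeping the calibration product separate means that the per-frame cost is limited to estimating Tcm and one matrix product per marker.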

 

CONCLUSIONS

In this paper we have described a new Mixed Reality conferencing application and the computer vision techniques used in the application. The interesting aspect of this system is that it is the opposite of traditional video conferencing. Our goal is to put a virtual representation of the remote user into the local user’s real world location, enabling them to have a videoconference regardless of where they are. In contrast, current video conferencing requires the user to move to a desktop computer or videoconferencing suite, often removing them from their workplace. This system also restores the spatial cues lost in traditional videoconferencing. Having remote users represented as objects in the physical environment means that their virtual avatars can also gesture and interact visually with other objects in the user’s space. This is a potentially powerful new interaction technique for collaborative MR interfaces.

However, we have only just begun to explore potential new interaction techniques using this interface. The light pen used in our interface has limitations that may be overcome through other input devices. Other input metaphors will also need to be developed when the users are collaboratively interacting with shared virtual 3D objects, rather than just 2D annotations.

Our computer vision methods give good results when the markers are close to the user, but accuracy decreases the further the cards are from the camera. For our MR conferencing application this has not proved to be a problem. However, we need to measure the registration errors precisely so that we can understand which types of applications our approach is best suited to, and find methods for increasing the registration accuracy.

Another area we want to develop is moving our interface over to a wearable computer to take advantage of its portability. In the future we envision a scenario in which a user in the field can initiate a videoconferencing session with a remote collaborator simply by pulling a fiducial marker out of their back pocket and looking at it. The current generation of wearable computers has almost enough CPU power to make this possible.

REFERENCES

  1. Azuma, R. SIGGRAPH '95 Course Notes: A Survey of Augmented Reality. Los Angeles, Association for Computing Machinery, 1995.
  2. Billinghurst, M., Weghorst, S., Furness, T. Shared Space: An Augmented Reality Approach for Computer Supported Cooperative Work. Virtual Reality Vol. 3(1), 1998, pp. 25-36.
  3. Carlson, C., and Hagsand, O. DIVE - A Platform for Multi-User Virtual Environments. Computers and Graphics, Vol. 17(6), Nov/Dec 1993, pp. 663-669.
  4. Grudin, J. Why CSCW applications fail: Problems in the design and evaluation of organizational interfaces. In Proceedings of CSCW ’88, Portland, Oregon, 1988, New York: ACM Press, pp. 85-93.
  5. Ishii, H., Kobayashi, M., Arita, K. Iterative Design of Seamless Collaboration Media. Communications of the ACM, Vol. 37(8), August 1994, pp. 83-97.
  6. Mandeville, J., Davidson, J., Campbell, D., Dahl, A., Schwartz, P., and Furness, T. A Shared Virtual Environment for Architectural Design Review. In CVE ‘96 Workshop Proceedings, 19-20th September 1996, Nottingham, Great Britain.
  7. Ohshima, T., Sato, K., Yamamoto, H., Tamura, H. AR2Hockey: A case study of collaborative augmented reality. In Proceedings of VRAIS '98, 1998, pp. 268-295.
  8. Rekimoto, J. Matrix: A Realtime Object Identification and Registration Method for Augmented Reality. In Proceedings of Asia Pacific Computer Human Interaction 1998 (APCHI'98), Japan, Jul. 15-17, 1998.
  9. Rekimoto, J. Transvision: A Hand-held Augmented Reality System for Collaborative Design. In Proceedings of Virtual Systems and Multimedia ‘96 (VSMM ‘96), Gifu, Japan, 18-20 Sept., 1996.
  10. Schmalstieg, D., Fuhrmann, A., Szalavari, Z., Gervautz, M. Studierstube - An Environment for Collaboration in Augmented Reality. In CVE ’96 Workshop Proceedings, 19-20th September 1996, Nottingham, Great Britain.
  11. Sellen, A. Speech Patterns in Video-Mediated Conversations. In Proceedings CHI ‘92, May 3-7, 1992, ACM: New York , pp. 49-59.
  12. Wexelblat, A. The Reality of Cooperation: Virtual Reality and CSCW, in Virtual Reality: Applications and Explorations. Edited by A. Wexelblat. Boston, Academic Publishers, 1993.