CHAPTER 3

3D COLLABORATION ON THE INTERNET

 

3D, multi-user collaboration on the internet is currently enabled through five component technologies: 3D world modeling technology, communications technology, Web browser technology, server technology, and client/server communications technology. Each plays its part in providing the multi-user experience to each collaborator. The world model defines which visual objects are part of the collaboration and defines each object’s geometry, appearance, location, orientation, and scale. Communications technology defines how collaborators communicate (via text, voice, or email) while participating in a multi-user world. Browser technology defines how each collaborator interacts with the world model and how the other communications technologies are integrated with the world model interaction. Server technology coordinates the collaboration to make the collaboration process sensible to each collaborator. Client/server communications technology enables the browser to communicate with the server(s). This chapter provides a component-by-component review of the current state of 3D world-based collaboration over the internet.

The World Model

An author typically uses a 3D modeling tool to create the world model. The model is created visually using a computer program with a special graphical user interface that makes it easy to point and click on points in 3D space to insert objects and define their geometry, appearance, location, orientation, and scale. VRML models can be created quite effectively using the 3D Studio Max modeling package from Kinetix or the Alias modeling package from Alias|wavefront. Both of these packages use four simultaneous views to model objects: top view, front view, side view, and perspective view. Figure 3.1 shows an example of a world modeling package. The state of the art today in virtual world modeling allows world model developers to attach object behaviors to the objects being modeled. Modeling tools are also used to create each collaborator’s personal representation, called an avatar, which enables non-verbal communications within the multi-user world.

                   

Figure 3.1 The Alias 3D Modeling Interface

Once the model is created, it must be delivered to each participant so that each collaborator has access to the same images on his or her monitor while collaborating. The internet has improved world model delivery significantly, as a collaborator today only needs access to the internet in order to gain access to the latest world model. VRML is a standard that defines a world model and stores it in a simple ASCII or UTF-8 text-based file [12]. VRML files can be delivered using standard HTTP Web server delivery strategies. Because of the World Wide Web craze of the early 1990s, Web servers have become optimized for delivery of HyperText Markup Language (HTML) documents over the internet. HTML documents are simple text-based documents which define how a Web page is viewed by a Web user in a Web browser [13]. HTML documents can obtain components of a Web page from other Web servers, allowing efficient use and reuse of text and graphical information. Within the VRML standard, object geometry, appearance, location, orientation, and scale are defined for each object in the virtual world. VRML then puts the objects together to make a 3D scene. VRML uses the same Web servers that HTML documents have used so successfully. No additional enhancements are needed to deliver 3D models anywhere on the Web. VRML also has the same facility that HTML has with text and 2D graphics: it can obtain world components from anywhere on the Web and include them in the current 3D world.
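To make the text-based nature of a VRML world concrete, the following minimal sketch writes out a small VRML 2 file from Python. The Transform, Shape, Appearance, Material, Box, and Inline nodes are standard VRML 2 nodes; the BoxObject class, the helper function names, and the example.com URL are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Rotation = Tuple[float, float, float, float]  # rotation axis (x, y, z) plus angle in radians


@dataclass
class BoxObject:
    """Hypothetical description of one box-shaped object in the world."""
    name: str                                   # DEF name used in the file
    size: Vec3                                  # geometry
    color: Vec3                                 # appearance (diffuse RGB, 0..1)
    translation: Vec3 = (0.0, 0.0, 0.0)         # location
    rotation: Rotation = (0.0, 1.0, 0.0, 0.0)   # orientation
    scale: Vec3 = (1.0, 1.0, 1.0)               # scale


def _fmt(values: Sequence[float]) -> str:
    """Format numbers compactly, e.g. 1.0 -> '1'."""
    return " ".join(f"{v:g}" for v in values)


def box_to_vrml(box: BoxObject) -> str:
    """Emit one Transform node that places a colored Box in the scene."""
    return (
        f"DEF {box.name} Transform {{\n"
        f"  translation {_fmt(box.translation)}\n"
        f"  rotation {_fmt(box.rotation)}\n"
        f"  scale {_fmt(box.scale)}\n"
        f"  children Shape {{\n"
        f"    appearance Appearance {{ material Material {{ diffuseColor {_fmt(box.color)} }} }}\n"
        f"    geometry Box {{ size {_fmt(box.size)} }}\n"
        f"  }}\n"
        f"}}"
    )


def world_to_vrml(boxes: Iterable[BoxObject], inline_urls: Iterable[str] = ()) -> str:
    """Assemble a whole world; Inline nodes pull components from other Web servers."""
    parts = ["#VRML V2.0 utf8"]  # required first line of a VRML 2 file
    parts += [box_to_vrml(b) for b in boxes]
    parts += [f'Inline {{ url "{u}" }}' for u in inline_urls]
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    table = BoxObject("Table", size=(2, 1, 1), color=(0.6, 0.4, 0.2), translation=(0, 1, 0))
    print(world_to_vrml([table], ["http://example.com/chair.wrl"]))
```

Because the result is plain text, it can be saved with a .wrl extension and served by an ordinary Web server just like an HTML page.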

VRML objects can also be stored on and loaded from a local hard drive or CD-ROM drive. Other shared world model providers use other modeling technologies and deliver the content locally before a collaborator begins to use the technology. Video game technologies have traditionally required local storage of the game world before a player connects to a shared experience. The technology is downloaded using the internet or is purchased in a physical store in diskette or CD-ROM format. VRML has been designed specifically for Web server delivery which is more efficient for rapidly changing world content. The collaborator always accesses the latest world model because she obtains it from a Web server that only houses the latest world. Web server delivery continues to evolve and it seems sure that other technologies will provide on-line, server-based 3D world model delivery similar to VRML.

The world model author develops the model while considering its final file size and its complexity. The file size is considered relative to the expected storage capacity and available Random Access Memory (RAM) of a user’s computer as well as estimated download times for internet-accessed files. The world complexity is considered relative to the 3D multi-user client’s capability to manage the complexity within a reasonable frame loop, as explained in the Browser section of this chapter.
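The download-time estimate is simple arithmetic, sketched below. The 100,000-byte file size is a hypothetical figure chosen for illustration, and the calculation assumes the full nominal modem rate is available, ignoring protocol overhead and modem compression.

```python
def download_seconds(file_bytes: int, link_bits_per_second: float) -> float:
    """Idealized transfer time: total bits divided by the nominal link rate."""
    return file_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    world_file_bytes = 100_000  # hypothetical world model size
    for kbps in (14.4, 28.8):
        seconds = download_seconds(world_file_bytes, kbps * 1000)
        print(f"{kbps} kbps modem: about {seconds:.0f} seconds")
```

Under these assumptions the same file takes roughly twice as long on the slower modem, which is why an author budgets file size against the slowest connection he or she intends to support.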

Communications

As seen in Chapter 2, collaborators have been using different technologies to enable communications during the collaborative process. These communications technologies are all becoming available in 3D shared multi-user worlds. As to their coordination with the 3D world itself, many voice, text, or email implementations are best considered as separate, parallel computing processes. Telephony, voice-enabled or text-based chat, and email can be used in separate windows or frames in order for collaborators to discuss the 3D world they are sharing. Also, collaborators can choose to use separate tools and share information through a separate communications server that does not rely on any coordination with the shared world.

There is some benefit to integrating the communications within the 3D world. If a user is represented by an avatar modeled as a 3D object, others are aware of his or her location in 3D space. If the chat environment is aware of the user’s location, it can take that information into account to modify communication messages. An obvious example of this is the voice-enabled chat environment provided in OnLive! Technologies Inc.’s virtual world technology. In that environment, the volume of a collaborator’s voice depends on his or her distance from another collaborator. This set-up is a natural interface that makes one-on-one communications easy. Two collaborators need only find a location away from other users and they will hear only each other. If six users stand in a circle, they all hear each other at the same time with similar volume.
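Distance-dependent volume can be sketched in a few lines. The linear falloff and both radius values below are assumptions for illustration only; the actual attenuation curve used by any particular product is not described here.

```python
import math
from typing import Sequence


def voice_gain(
    listener: Sequence[float],
    speaker: Sequence[float],
    full_volume_radius: float = 2.0,   # hypothetical: within this distance, full volume
    silence_radius: float = 20.0,      # hypothetical: beyond this distance, inaudible
) -> float:
    """Return a playback gain in [0, 1] that falls off linearly with distance."""
    distance = math.dist(listener, speaker)
    if distance <= full_volume_radius:
        return 1.0
    if distance >= silence_radius:
        return 0.0
    return (silence_radius - distance) / (silence_radius - full_volume_radius)


if __name__ == "__main__":
    me = (0.0, 0.0, 0.0)
    for other in [(1.0, 0.0, 0.0), (11.0, 0.0, 0.0), (30.0, 0.0, 0.0)]:
        print(other, voice_gain(me, other))
```

With a rule like this, two collaborators who walk away from a crowd hear each other at full volume while the crowd fades out, and users standing at similar distances from one another hear each other at similar volume.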

Even text-based communications can take advantage of the 3D locations of the collaborators. Distance from others can dictate which chat channels a user is privy to. A new visitor on the scene can quickly determine who is available for a chat session based on their location in 3D space. For example, blaxxun interactive’s Passport client provides a beam feature that allows one collaborator to quickly move to a location in the world that is directly in front of another collaborator identified by name.  His or her avatar is also automatically oriented to face toward the other collaborator. At that point, it is clear that a one-on-one chat session is being pursued and the collaborators can negotiate to establish it. For integrated communications scalability, the communication process can run on a separate server or as another process on the same server.
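The geometry behind a beam-style feature can be sketched as follows. This is not the Passport implementation; the Pose class, the heading convention (facing direction is (cos heading, sin heading) on the ground plane), and the 1.5 unit gap are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical avatar pose on the ground plane."""
    x: float
    z: float
    heading: float  # radians; the avatar faces (cos heading, sin heading)


def beam_to(target: Pose, gap: float = 1.5) -> Pose:
    """Return a pose directly in front of the target, turned to face the target."""
    # Step 'gap' units along the direction the target is facing.
    x = target.x + gap * math.cos(target.heading)
    z = target.z + gap * math.sin(target.heading)
    # Face back toward the target.
    heading = math.atan2(target.z - z, target.x - x)
    return Pose(x, z, heading)


if __name__ == "__main__":
    other = Pose(x=0.0, z=0.0, heading=0.0)
    print(beam_to(other))
```

The beamed avatar ends up facing the opposite direction from the target, so the two avatars look at each other.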

The collaborator communications decision is a difficult one for a 3D multi-user developer. The more communication bits coming over the internet to a collaborator, the more bandwidth he or she needs to be able to manage them in real time. To develop for a 14.4 or 28.8 kbps modem connection, trade-off decisions have to be made. For example, OnLive! Technologies Inc.’s Traveler viewer does not provide a body for a collaborator’s avatar, since the bits needed for mouth animation and voice occupy a significant portion of the bit budget supported on a slow internet connection with limited bandwidth.

The Browser

Most clients used by 3D world collaborators use two separate applications. General Web navigation is provided by a Web browser originally designed for HTML document presentation. 3D model-specific navigation is provided by another application that communicates with the Web browser through a defined specification, Netscape Communications Corporation’s plug-in Application Programming Interface (API) being the most popular. Almost all popular VRML viewers, for example, connect to Netscape Communications Corporation’s Navigator and Microsoft Corporation’s Internet Explorer Web browsers through the plug-in technology originally provided by Netscape Communications Corporation. The Web browser communicates with Web servers to request and send information on the VRML viewer’s behalf. The VRML viewer then uses the data received to create the 3D world seen by the user. Yet stand-alone 3D VRML viewers do exist and provide refreshing, creative clients. OZ Interactive Inc.’s OZ Virtual browser is an example of a VRML browser that handles basic Web navigation without the help of another application.

The client is responsible for parsing the world model as it is delivered from a Web server, determining a beginning viewpoint, rendering the scene based on that viewpoint, and then maintaining changes that occur based on the user’s interaction with the scene or server-based messages sent to the client. Incoming bits represent incoming text or voice communications, avatar location changes, and behaviors in the shared world initiated by any collaborator’s actions or by timers embedded in the world. The local collaborator’s viewpoint is managed locally within his or her client. Outgoing bits include changes to the local avatar position, behaviors activated by the local collaborator, and text or voice messages sent for communication with other collaborators.

Most of the obvious differences between browsers are a result of different choices in the look and feel of the user interface. This chapter will take a look at the basic capabilities of most 3D multi-user enabled browsers. blaxxun interactive’s Passport [14] (Figure 4.1), OnLive! Technologies Inc.’s Traveler [15] (Figure 4.2), Sony Corporation’s CyberPassage [16] (Figure 4.3), and OZ Interactive Inc.’s OZ Virtual [17] (Figure 4.4) are all popular 3D multi-user world browsers. New versions appear approximately every month and, as a result, any description of their capabilities is outdated soon after the words are put on paper. The following comments are as of February 1997.

Figure 4.1 The Passport Viewer (courtesy of blaxxun interactive)

Figure 4.2 The Traveler Viewer (courtesy of OnLive! Technologies Inc.)

Figure 4.3 The CyberPassage Viewer (courtesy of Sony Corporation)

Figure 4.4 The OZ Virtual Viewer (courtesy of OZ Interactive Inc.)

All four browsers handle voice and/or text chat while simultaneously providing a shared world model to each connected user. Each provides an avatar which can be seen by others as a representation of each user. All are working to incrementally incorporate the VRML 2 standard. The VRML 2 standard provides object behaviors such as changes in location, orientation, and scale; changes in color and lighting; and appearance and disappearance. The user has control of a navigation mode such as walk, fly, or examine mode, the ability to turn on or off a default headlight attached to his or her virtual head, the ability to bookmark an exact location and orientation in a world, and the ability to enforce collision with other objects or disable it. The user moves around in the world using the arrow keys on the keyboard or by way of mouse movement within the world itself or relative to a control panel provided within the interface.

These clients let each collaborator choose an avatar from an avatar collection. Some allow each avatar child object (such as a hat, shirt, or shoes) to be changed or colored separately, and some allow users to provide their own avatar as long as it follows some guidelines and a VRML 1 or 2 design.

The client is carefully engineered so that a certain minimum frame rate is maintained if a collaborator uses the minimum recommended CPU, RAM, video board, and internet connectivity. The frame rate, or number of times per second the world is re-rendered to the screen, is a critical success factor for most users. Internally, the client makes constant tradeoffs among the changes requested by incoming bits, user movements, and mouse clicks. The higher the frame rate, the better the experience. In an ideal situation, all state changes are easily handled within the client frame loop with time left over. Then, the client can increase the frame rate above the minimum rate used internally, perform some other function, or just wait for the next frame loop to begin. When all state changes can’t be handled within the minimum frame rate loop, the client can throw out some of the changes or queue them for later processing. Each client developer creatively programs these trade-off decisions, which become more important as the world complexity, the number of collaborators, and the number of active behaviors increase.
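One way to picture the frame loop trade-off is a per-frame time budget: state changes are applied until the budget runs out, and whatever remains stays queued for a later frame. The sketch below is only an illustration under that assumption; the function and parameter names are hypothetical, and a real client would also reserve part of the budget for rendering itself.

```python
import time
from collections import deque
from typing import Callable, Deque


def run_frame(
    pending: Deque,                              # queued state changes, oldest first
    apply_change: Callable[[object], None],      # applies one change to the local world
    render: Callable[[], None],                  # redraws the scene
    frame_budget_s: float,                       # time allowed for state changes this frame
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Apply as many queued changes as fit in the budget, then render.

    Changes that do not fit remain in 'pending' for the next frame.
    Returns the number of changes applied.
    """
    deadline = clock() + frame_budget_s
    applied = 0
    while pending and clock() < deadline:
        apply_change(pending.popleft())
        applied += 1
    render()
    return applied


if __name__ == "__main__":
    queue = deque(f"avatar-move-{i}" for i in range(1000))
    handled = run_frame(queue, lambda change: None, lambda: None, frame_budget_s=0.001)
    print(f"applied {handled} changes, {len(queue)} left for later frames")
```

A client that prefers to throw out stale changes instead of queuing them could, for example, keep only the most recent position update per avatar before calling such a loop.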

The Server

In its most basic form, the server simply connects users together in order to send changes from one collaborator’s world model to the other collaborators’ world models. A collaborator connects to a server over the internet, is delivered a current model of the world from the server or another collaborator, and then interacts with the world by sending updates of his or her actions and receiving updates from others’ actions.
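The most basic server role, forwarding one collaborator's updates to everyone else, can be sketched without any networking. The RelayServer class and its method names are hypothetical, and delivery callbacks stand in for real internet connections.

```python
from typing import Callable, Dict

# A delivery callback receives (sender name, update) for one connected collaborator.
Deliver = Callable[[str, dict], None]


class RelayServer:
    """Hypothetical in-memory pass-through server: it keeps no world model of its own."""

    def __init__(self) -> None:
        self._clients: Dict[str, Deliver] = {}

    def connect(self, name: str, deliver: Deliver) -> None:
        self._clients[name] = deliver

    def disconnect(self, name: str) -> None:
        self._clients.pop(name, None)

    def publish(self, sender: str, update: dict) -> None:
        """Forward an update to every collaborator except the one who sent it."""
        for name, deliver in self._clients.items():
            if name != sender:
                deliver(sender, update)


if __name__ == "__main__":
    server = RelayServer()
    for who in ("alice", "bob", "carol"):
        server.connect(who, lambda sender, update, who=who: print(f"{who} got {update} from {sender}"))
    server.publish("alice", {"avatar_position": (1.0, 0.0, 4.0)})
```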

In its most complicated form, the server can be transforming the world itself and communicating its changes along with changes from other collaborators. The server can contain logic that monitors each collaborator’s actions and regulates them in any way. For example, blaxxun interactive’s CyberHub server can mute a collaborator at another collaborator’s request. Today, most of the server capabilities are tied to a specific client technology. Connecting to a server without that specific client technology makes little sense.

It is possible to architect a 3D shared world without a server if each client knows how to communicate with all other peers. Such server-less multi-user worlds, called distributed worlds, usually are enabled with a broadcast or multicast communications environment. Greenspace is an example of such a multi-user world delivery environment [18]. Greenspace clients are connected over a dedicated network such that changes within one collaborator’s world are communicated only to certain peers. Since the internet IP multicast communications protocol is still in its infancy and internet broadcasting is a tremendous waste of messaging, centralized servers have been used extensively to both deliver 3D virtual worlds and maintain communications between users for internet implementations.

Client/Server Communications

In a client/server architecture, each function point is strategically placed at the server or on the client based on the ability and capacity of each technology. Technologists take advantage of client/server architecture to split the development effort among mutually exclusive programming efforts. Client specialists work to make the client more user friendly and capable. Server specialists work to make the server more secure, fast, and capable. As long as the client/server communications piece is defined ahead of time, each group can work on an independent timeline because the latest server will work with the latest client and vice versa. Client/server development strategies are seen everywhere internet-enabled multi-user worlds are being created.

Sony Corporation, Silicon Graphics, Inc., Chaco™, Intel Corporation, blaxxun interactive, and Netscape Communications Corporation all focus on either a multi-user client or server or both. The clients continue to improve to add new features and become more efficient. The servers continue to become more secure, fast and functional. And, often, the clients are upgraded monthly while server releases appear every six months. The most troubling constant is the latency brought on by an internet connection which requires developers to respect a certain time lag between a server sending bits and a client receiving them.

The design of the client/server communications piece is critical to the technology’s success. The VRML community continues to extend VRML to handle client/server communications. Yet, VRML viewer developers are creating client application programming interfaces (APIs) that let the browser communicate with a server written in any programming language. Both approaches show much promise for dynamic multi-user virtual world development. The next two sections contrast the two approaches to client/server communications.

Extending the VRML standard

The Living Worlds standard extends the VRML 2 standard through the PROTO and EXTERNPROTO node keywords in order to add multi-user capabilities to a VRML 2 scene [20].  The VRML 2 PROTO node allows an author to encapsulate all characteristics and behaviors of a VRML 2 object and make that prototype available to all other VRML scenes by way of the Web. Another author can use the same PROTO node in his or her scene by referring to it in an EXTERNPROTO node. The EXTERNPROTO node includes a field that points to the original PROTO node on the Web. The Living Worlds standard prototypes new nodes necessary for multi-user communication and object shared behaviors and makes them a standard interface for multi-user server developers to develop server software that is able to communicate with the multi-user worlds loaded in each visitor’s Web browser.
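The PROTO and EXTERNPROTO mechanism itself can be illustrated with a small generic prototype. The sketch below does not reproduce any actual Living Worlds node; ColoredBox, its fields, and the example.com URL are hypothetical, while the PROTO, EXTERNPROTO, and IS keywords follow VRML 2 syntax.

```python
from typing import Iterable, Tuple

# A prototype as its author would publish it on a Web server (hypothetical example).
PROTO_FILE = """#VRML V2.0 utf8
PROTO ColoredBox [
  field SFColor boxColor 1 0 0
  field SFVec3f boxSize 1 1 1
] {
  Shape {
    appearance Appearance { material Material { diffuseColor IS boxColor } }
    geometry Box { size IS boxSize }
  }
}
"""


def externproto_for(name: str, fields: Iterable[Tuple[str, str]], url: str) -> str:
    """Build the EXTERNPROTO declaration another author would place in his or her scene.

    The declaration repeats the field types and names (without default values)
    and ends with a URL that points to the original PROTO on the Web.
    """
    lines = [f"EXTERNPROTO {name} ["]
    lines += [f"  field {field_type} {field_name}" for field_type, field_name in fields]
    lines.append(f'] "{url}#{name}"')
    return "\n".join(lines)


if __name__ == "__main__":
    declaration = externproto_for(
        "ColoredBox",
        [("SFColor", "boxColor"), ("SFVec3f", "boxSize")],
        "http://example.com/protos.wrl",
    )
    # The second author's scene: declare the external prototype, then use it like any node.
    print("#VRML V2.0 utf8")
    print(declaration)
    print("ColoredBox { boxColor 0 0 1 }")
```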

Within the Living Worlds architecture, VRML 2 authors can use a prototype node called Zone to include VRML objects in a multi-user area. The Zone node is a grouping node which tells the world server where multi-user behavior is to be enabled. VRML objects can be added to and removed from the Zone nodes on the fly using ROUTE statements, which are an integral part of the VRML 2 standard. The most interesting nodes to add to the Zone group are SharedObject nodes because a SharedObject demonstrates its behaviors to everyone within view of the object. A good example of a SharedObject is a pair of dice that can be rolled in a multi-user game. Those dice would be added to a Zone that contained all the game pieces, game board, and game table. The game table would be a simple VRML object with no shared behaviors. Any object that is not a child of a SharedObject node can at best only demonstrate behaviors to the local collaborator who initiates them, unless a local timer is provided to each collaborator during initial world acquisition. These local timers can enact behaviors in each collaborator’s world without the server.

A SharedObject can demonstrate the standard behaviors defined in VRML 2. If a multi-user server developer wants to create new technologies that can be enabled in a Zone, the Living Worlds standard provides a PrivateZone node, which can contain a MuTechZone node and many PrivateSharedObject nodes, each of which can contain a MuTechSharedObject node. The word MuTech is short for multi-user technician. These four nodes are all made into prototypes using VRML 2 syntax and as such only require a standard VRML 2 .wrl file to enable multi-user world interaction on each client, unless, of course, the multi-user technician requires additional executable files to reside at the client in order to participate in that technician’s unique technology. Those files are downloaded once over the internet from the server provider and then accessed by all subsequent VRML 2 scenes.

The Living Worlds standard-setters realize that some of the features of multi-user technology are better provided by the browser. Still, until the browser developers make those features available, an alternative way of providing rich multi-user experiences is provided through the Living Worlds standard. A Living Worlds-like methodology could be provided to any internet multi-user world syntax where the world model itself includes the logic to initiate the server routines referenced. Such a methodology requires the server to be on the more sophisticated end of possible server types (versus a simple message pass-through server) until the world generation syntax matures. VRML is just one standard world syntax that is getting the most publicity today.

Client based APIs

An alternative approach to providing multi-user capabilities on the internet is to open up a world to other processes that call basic functions available within the client [21]. Considering this approach, the VRML 2 standard becomes important only for its ability to define the world objects’ appearance, geometry, location, orientation, and scale and the ability to change those parameters on the fly. Outside programs can determine the changes to be made and then call the world functions available in the world viewer client that then make the changes, including adding new objects and removing existing objects.

The processes that request changes can be written in any programming language and reside on the client or a server to which the client is connected. This approach allows much flexibility for the software developer. The VRML 2 external interfaces usually work as follows:

1.   The programmer creates variables in his or her program(s) that point to nodes in the VRML 2 scene graph.

2.   The programmer uses the variables in his or her program(s) to change node states based on programming logic or event processing (available events are triggered by timers, mouse clicks, and proximity to objects).

3.   Periodically, the programmer requests the VRML 2 scene graph to update its state (and render the scene) based on the changes taking place within his or her program(s).
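The three steps above can be sketched with a toy, in-memory stand-in for a scene graph. This is not the API of any actual VRML viewer; the Scene and NodeRef classes, their method names, and the "Door" node are all hypothetical and exist only to show the order of operations.

```python
from typing import Dict, List, Tuple


class Scene:
    """Hypothetical stand-in for the scene graph held by a VRML viewer."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Dict[str, object]] = {}
        self._pending: List[Tuple[str, str, object]] = []
        self.frames_rendered = 0

    def add(self, name: str, **fields: object) -> None:
        self._nodes[name] = dict(fields)

    def get_node(self, name: str) -> "NodeRef":
        return NodeRef(self, name)

    def read(self, name: str, field: str) -> object:
        return self._nodes[name][field]

    def queue(self, name: str, field: str, value: object) -> None:
        self._pending.append((name, field, value))

    def update(self) -> None:
        """Apply all queued changes, then count one re-render of the scene."""
        for name, field, value in self._pending:
            self._nodes[name][field] = value
        self._pending.clear()
        self.frames_rendered += 1


class NodeRef:
    """A program variable that points to one node in the scene graph."""

    def __init__(self, scene: Scene, name: str) -> None:
        self._scene, self._name = scene, name

    def get(self, field: str) -> object:
        return self._scene.read(self._name, field)

    def set(self, field: str, value: object) -> None:
        self._scene.queue(self._name, field, value)


if __name__ == "__main__":
    scene = Scene()
    scene.add("Door", translation=(0.0, 0.0, 0.0))

    door = scene.get_node("Door")              # step 1: a variable pointing to a node
    door.set("translation", (0.0, 2.0, 0.0))   # step 2: program logic (say, a mouse click) changes state
    scene.update()                             # step 3: ask the scene graph to update and render
    print(door.get("translation"))
```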

 

In this case, each event is communicated to a server and sent to all connected clients that can see or are otherwise affected by the event processing. The server is written in an appropriate language and receives event notifications from a client when a client validly changes its copy of the world. With this architecture, the server can be simple or complex as behavior generating processes can be placed at the server or the client. The server need not even keep its own version of the world if it is designed using a simple message pass-through architecture.


Considerations

The benefit of having a standard such as VRML 2 is that content authors can create content which is viewable by an audience using all kinds of different browsers. If the browser is standard-compliant and the author’s work is standard-compliant, the content should work on the browser without ever being specifically tested by the author on that browser. Many authors and browser developers have subscribed to the VRML 2 standard in order to reap these expected benefits.

The Living Worlds standard is an extension of the mentality of the VRML 2 standard creators. The Living Worlds task force is developing a standard that encompasses how an author can identify shared behaviors and multi-user ability within the VRML scene graph file itself. Then, using the standard, multi-user server developers can create a standard-compliant server, and the world author can be assured of multi-user functionality without specifically testing the world on each server. In the long run, if the standard is written well, Living Worlds could easily obtain the success that VRML 2 is currently enjoying. In fact, Living Worlds may become the cornerstone for VRML 3. Such a standard is useful if a united, connected cyberspace is to be provided by many different interests.

In the short run, though, using an external interface from the VRML 2 browser to other programs appears to be gaining a groundswell of interest. The Java programming language is meeting the needs of other Web-based technologies and is being used for creating object behaviors in a VRML scene [19]. The Web is such a dynamic and ever-changing medium that developers are bound to keep pushing the technology through their own creative client programs and server technologies. An external interface in the VRML viewer client lets this rapid development process happen. Multi-user 3D world technology can improve rapidly because the five component technologies can be worked on separately by specialists, and an improvement in any one of the world model builder, text/voice communications, Web browser, server, or client/server communications technologies improves the whole process.