Graphics Subsystems

Author: Michael Dennehy


The graphics subsystem is the part of a VR system's reality engine that handles image generation. This article describes the purpose, general architecture, capabilities, and functionality of a typical graphics subsystem and discusses example subsystems. The computer science foundation areas of computer architecture, operating systems, and algorithms, along with the specialized area of computer graphics, provide the background knowledge needed to design the subsystem's processing hardware and interfaces.

The graphics subsystem (GS) receives graphics commands from the reality engine's CPU over a bus, builds the image specified by the commands, and outputs the resulting image to display hardware. Separating graphics processing from the CPU has the distinct advantage of allowing hardware specialized for graphics to work in parallel with the CPU. However, the communication itself has a cost, so the bus between the CPU and the GS must have enough bandwidth to make the advantage worthwhile. Thus, when selecting or designing a GS, it is important to be aware of the capabilities of the bus interface.

Key GS features for VR include 3D polygon support, lighting, flat and smooth (Gouraud) shading, antialiasing, alpha transparency, Z-buffering, and texture mapping. Lighting can support specular (highlight), diffuse (directional), ambient (indirect), and emission (self-luminous) parameters. Alpha values specify the degree of transparency from sheer to opaque. For accurate hidden surface removal, especially for distant objects, the Z-buffer should be at least 24 bits deep. A key texture mapping feature is mipmapping, which generates prefiltered, lower-resolution versions of a texture that closely fit the varying sizes of the polygons to which they are applied. (The use of these features and more details on their functions are discussed in the rendering article of this publication.)
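
As a concrete point of reference, the sketch below shows how these features might be enabled through the OpenGL 1.x C API (OpenGL is discussed further below). The light values and texture parameters are illustrative assumptions, not tuned recommendations.

    #include <GL/gl.h>
    #include <GL/glu.h>

    /* Sketch: enable the VR rendering features named above.  The     */
    /* texels argument is assumed to hold an RGB image of the given   */
    /* width and height.                                              */
    void setup_vr_features(int width, int height,
                           const unsigned char *texels)
    {
        /* Illustrative light parameters (RGBA). */
        GLfloat ambient[]  = { 0.2f, 0.2f, 0.2f, 1.0f };  /* indirect    */
        GLfloat diffuse[]  = { 0.8f, 0.8f, 0.8f, 1.0f };  /* directional */
        GLfloat specular[] = { 1.0f, 1.0f, 1.0f, 1.0f };  /* highlights  */

        glEnable(GL_LIGHTING);
        glEnable(GL_LIGHT0);
        glLightfv(GL_LIGHT0, GL_AMBIENT,  ambient);
        glLightfv(GL_LIGHT0, GL_DIFFUSE,  diffuse);
        glLightfv(GL_LIGHT0, GL_SPECULAR, specular);

        glShadeModel(GL_SMOOTH);               /* Gouraud shading      */
        glEnable(GL_DEPTH_TEST);               /* Z-buffering          */
        glEnable(GL_BLEND);                    /* alpha transparency   */
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

        /* Mipmapping: build the prefiltered lower-resolution         */
        /* versions of the texture and select among them by filter.   */
        gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB, width, height,
                          GL_RGB, GL_UNSIGNED_BYTE, texels);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                        GL_LINEAR_MIPMAP_LINEAR);
        glEnable(GL_TEXTURE_2D);
    }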

Other important considerations include available object memory and the application programming interface. A GS with object memory allows an application to download a complex object once and thereafter reference it through some tag (usually an index number). Transformations can be performed on the cached objects to position them in the scene. This can improve performance by cutting down traffic on the CPU-to-GS bus. Another important consideration is the graphics language available to application developers. Each GS tends to recognize its own proprietary set of commands, requiring translation or redevelopment of software when switching platforms. An emerging graphics programming language interface standard in the PC and workstation arena is OpenGL. OpenGL is based on Silicon Graphics Inc.'s (SGI) Iris GL (graphics library) and has been endorsed by many leading PC and workstation vendors. Selecting a GS with OpenGL support reduces the chance of being locked into a proprietary system with a dead-end growth path. Keep in mind that GSs that do not provide direct OpenGL support, but instead translate the OpenGL interface to their own in software, will exhibit poorer performance.
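
To make the object-memory idea concrete, the following sketch uses OpenGL display lists, which behave much like the tagged object cache described above; draw_object here stands in for a hypothetical application routine that issues an object's polygons.

    #include <GL/gl.h>

    /* Compile an object's commands into GS-resident memory once;     */
    /* the returned list name serves as the tag.                      */
    GLuint cache_object(void (*draw_object)(void))
    {
        GLuint tag = glGenLists(1);   /* allocate one list name       */
        glNewList(tag, GL_COMPILE);   /* record commands, don't draw  */
        draw_object();                /* hypothetical drawing routine */
        glEndList();
        return tag;
    }

    /* Per frame, reference the cached object rather than resend it.  */
    void draw_cached(GLuint tag, float x, float y, float z)
    {
        glPushMatrix();
        glTranslatef(x, y, z);        /* position it in the scene     */
        glCallList(tag);              /* no geometry recrosses the bus*/
        glPopMatrix();
    }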

Once graphics commands arrive over the CPU-to-GS bus, a GS performs four basic functions: geometric processing, scan conversion, raster management, and display generation. During geometric processing, input vector and polygonal data is received in 3D world space coordinates (X, Y, & Z). Scale, rotation, and translation transformations are then performed to size, position, and orient the input objects with respect to the viewer. Lighting and clipping (eliminating whatever lies outside the bounds of the viewing surface) operations are performed as well. The output of geometric processing is a stream of objects in 2D screen coordinates.
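
A minimal sketch of the scale, rotation, and translation step, again in OpenGL terms (the scale factor and rotation axis are placeholders):

    #include <GL/gl.h>

    /* Position, orient, and size one object by composing             */
    /* transformations on the modelview matrix stack.                 */
    void place_object(float x, float y, float z, float heading_deg)
    {
        glMatrixMode(GL_MODELVIEW);
        glPushMatrix();
        glTranslatef(x, y, z);                 /* translate: position */
        glRotatef(heading_deg, 0.0f, 1.0f, 0.0f); /* rotate: orient   */
        glScalef(2.0f, 2.0f, 2.0f);            /* scale: illustrative */
        /* ...issue the object's 3D vertices here; the GS applies the */
        /* combined matrix, then performs lighting and clipping...    */
        glPopMatrix();
    }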

The scan conversion process transforms the 2D vertices and polygons into pixel data. The pixel data is organized into span lines representing either each row or each column of screen pixels. Assuming columnar spans, the scan conversion process walks along the top and bottom edges of each polygon looking for pixels in the same columns. The portion of the column between these pixels is a span. Each pixel in the span has its screen coordinates (X & Y), a depth calculated from the endpoints' original Z coordinates, and color, translucency, and texture values. Edges are also antialiased in this phase to reduce their jagged appearance on raster displays (those based on a grid of discrete picture elements).
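
The following sketch suggests how the pixels of one columnar span might be generated, assuming simple linear interpolation of depth between the span's endpoints; a real scan converter would interpolate color, translucency, and texture coordinates the same way and apply antialiasing as well.

    /* One pixel of a span: screen position, depth, and color.        */
    typedef struct { int x, y; float z, r, g, b; } Pixel;

    /* Walk one columnar span from its top pixel to its bottom pixel, */
    /* linearly interpolating depth; emit() receives each result.     */
    void scan_span(int x, int y_top, float z_top,
                          int y_bot, float z_bot,
                   void (*emit)(Pixel))
    {
        int y;
        for (y = y_top; y <= y_bot; y++) {
            float t = (y_bot == y_top) ? 0.0f
                    : (float)(y - y_top) / (float)(y_bot - y_top);
            Pixel p = { x, y, z_top + t * (z_bot - z_top),
                        0.0f, 0.0f, 0.0f };  /* color omitted here    */
            emit(p);
        }
    }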

The raster management process accepts a stream of spans for each polygon and vector and coalesces them into a single memory-based grid of pixel values called the frame buffer. Color interpolation for lighting and shading, color blending for transparencies, and hidden surface removal are all done at this stage. Hidden surface removal is generally done through a memory-based grid of depth values called a Z-buffer. As each span is processed, each pixel's depth is compared to the corresponding element in the Z-buffer. If the new pixel is closer to the viewer than the stored depth, the span is assumed to occlude the previous spans at that pixel: the pixel's color and texture values are written into the frame buffer, and its depth is written into the Z-buffer. Many higher-end GSs do not support simultaneous Z-buffering and transparency because, when the Z-buffer is on, the top pixel is written into the frame buffer without being blended with the underlying values.
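
A sketch of the per-pixel comparison just described, assuming the common convention that a smaller depth value means closer to the viewer (zbuf and fbuf are assumed to be screen-sized depth and frame buffers):

    /* Depth-test one span pixel against the Z-buffer and, if it      */
    /* wins, write its color into the frame buffer.                   */
    void raster_pixel(float *zbuf, unsigned int *fbuf, int width,
                      int x, int y, float z, unsigned int color)
    {
        int i = y * width + x;
        if (z < zbuf[i]) {        /* nearer than what is stored...    */
            zbuf[i] = z;          /* ...so record the new depth       */
            fbuf[i] = color;      /* overwrite with no blending: the  */
                                  /* source of the Z-buffer versus    */
                                  /* transparency conflict noted above*/
        }
    }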

Finally, during display generation the digital frame buffer is scanned and converted to video signals. Digital-to-analog converters combined with other circuitry support the generation of the video signals. Most GSs support a variety of video signals. To prevent the annoying flicker of horizontal lines seen on interlaced displays, a non-interlaced video signal with a refresh rate of at least 60 Hz should be supported.

GSs range greatly in capabilities and price. At the extreme high end are million-dollar subsystems included with highly specialized systems (e.g. flight simulators). For VR, GSs that are part of a general development platform are usually more suitable. Among GSs of this type, the SGI Reality Engine 2 and the Evans & Sutherland (E&S) Freedom Series 3000 sit at the high end. In the midrange, SGI's Elan GS and E&S's Freedom 1000 are representative offerings. SGI's Indy and Intel's ActionMedia are low-end examples.

In addition to price, the most useful metric is the number of shaded, textured, antialiased polygons that can be drawn in a second. When combined with the desired frame rate of the application, this metric defines the maximum complexity of a scene in the virtual environment. Typically, vendors quote figures obtained using optimal input data (e.g. triangular meshes of 25 vertices). An application will usually be unable to provide optimal input data and will therefore see significantly poorer performance, sometimes by as much as 50%. The Reality Engine 2 and Freedom 3300 systems can produce over 200,000 textured, shaded, antialiased polygons per second and around 600,000 shaded (not textured) polygons per second for around $80,000. SGI's Elan and the Freedom 1100 can output about 80,000 textured polygons per second and 200,000 shaded polygons per second for around $25,000. At the lower end, an Indy GS can produce 24,000 shaded polygons per second for around $5000. Intel's ActionMedia GS outputs 3,000 shaded polygons per second for about $3000.
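
For example, a GS rated at 200,000 shaded polygons per second driving an application at 30 frames per second allows at most 200,000 / 30, or roughly 6,600 polygons per frame; derated by 50% for non-optimal input data, the practical scene budget falls to about 3,300 polygons.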

These are only representative examples. There are many other GSs available at many levels in between the example systems in terms of performance and price. When developing a VE, it is important to know the features, performance, and limitations of a GS in order to make an intelligent selection decision.




Human Interface Technology Laboratory