Title
=====

Recognition of Signals for Combat Formations and Battle Drills

Credits
=======

Dan Searles (searles@vnet.ibm.com)
  * Original JOVE project concept.
  * Initial background information survey.
  * Worked with the physical Polhemus tracking devices and recorded gesture
    performances for the group.
  * Developed the template matching gesture recognition program.
  * Wrote the template matching sections of the report.

Jerry Smith (jsmith@eng.iac.honeywell.com)
  * Researched necessary background information.
  * Researched existing gesture recognition studies and selected a suitable
    study for comparison purposes.
  * Wrote the comparison and analysis sections of the report.

Gregory Baratoff (baratoff@cfar.umd.edu)
  * Developed the trajectory matching gesture recognition program.
  * Wrote the trajectory matching sections of the report.

Brian Bohmueller (bomuller@nadc.nadc.navy.mil)
  * Coordinated integration of the written report.
  * Edited the final report.
  * Co-wrote the Introduction and Conclusions sections of the report.

Introduction
============

A computerized gesture recognition system has been developed for infantry
command signals. These command gestures use hand and arm location and
movement to coordinate combat formations and battle drills. Tracking data
from three subject-mounted position sensors provided dynamic loci of
gestural motion. Two alternative algorithmic approaches were investigated
for their ability to discriminate these gesture sequences in "real time."
A template matching algorithm exploited static characteristics of the
gesture set, while trajectory matching and region tracking analyzed dynamic
characteristics to interpret specific gestures. Development and testing of
this system are continuing toward an effective final system design. This
report provides preliminary data and analysis of the current design's
performance.

[A detailed abstract is provided in Jove/Articles/dsgbjsbb-abstract.html on
the ftp server.]

Gesture Set Description
=======================

The seventeen signals initially selected were chosen from a standard Army
gesture set used for the coordination of combat formations and battle
drills. The gestures are executed using one or both arms, in either a
static pose or a moving gesture. They are based on arm and hand
location/movement, in contrast with the predominantly finger and hand
location/movement used in data glove applications. Ten of the chosen
gestures are dynamic in nature, while the remaining seven culminate in a
static pose. Each gesture classified as static uses both arms and both
hands. The dynamic subset of gestures includes both single- and dual-arm
signals. See Appendix A for detailed descriptions of the gestures.

Applications
============

The applications for this system fall into two related areas. The original
intent of signal use was for infantry field coordination, where a soldier
would sign a series of gestures to command troops (real or automated) in a
virtual battlefield environment. Additionally, as an instructive tool, the
system could be used for the training and testing of infantrymen in the
execution of these command gestures within a non-virtual or virtually
simulated environment.

Description of Setup
====================

Tracking data was provided by three Polhemus tracking devices. The tracking
devices were attached to the posterior of the subject's wrists and to the
back, midway between the shoulders.
The sensors reported their positions and orientations relative to the
Polhemus transmitter device, located behind the user at approximately
shoulder height. An IBM RISC system, programmed in C++, was used for data
recording and processing during the development and testing of the gesture
recognition logic.

The data provided by the sensors was normalized so that the tracking data
from the right wrist sensor referenced the center of the right shoulder
socket as its origin. Similarly, the left tracker data was normalized to
the center of the left shoulder socket. The back-mounted tracker was used
to locate these origins, owing to its stable relative position. Three
rotations were possible, only one of which needed to be accounted for in
the calculation of the origins: the subject's twisting (or facing) right or
left relative to the transmitter frame of reference. The other two
rotations, leaning forward or backward and leaning right or left, did not
need to be corrected for, since the gestures are performed in a normal
standing position. Thus, little if any data interpolation was required in
calculating the shoulder origins, since a normal standing position was
inherent in the gesture set.

Template Matching Approach
==========================

Concept
-------

The central concept behind the template matching algorithm is reference
points. Reference points are points at the center of spatial regions in
3-D space. For this particular system, each region was defined by an
x, y, z center point and three distance values, one for each axis. By
adding and subtracting the distance values along the appropriate axes about
the center point, a rectangular box-shaped region is formed. A possible
alternative would be to define a point and a radius, and thereby describe a
sphere as the region.

The origin of the coordinate system is defined to be the center of the
subject's right shoulder socket. Since the data from the wrist tracking
sensors is normalized to this origin, it is relatively easy to determine
which reference point region a sensor is in at a given moment. Reference
point regions are defined to correspond to the locations of the sensors
when gestures are executed. Also, by negating the right/left axis value of
each center point, a symmetric set of regions is defined for the left hand
sensor.

Two straightforward calibration methods are possible. The first has the
subject assume one of the static poses while the computer records the
location samples generated by the tracking devices; this can be repeated
for every reference point concerned. Alternatively, the reference points
can be computed analytically from the length of the subject's arm. This
approach models the arm as two segments connected by a joint, so that the
desired reference points and their regions can be calculated with simple
geometry.

The recognition of static poses is accomplished by defining reference
points and regions which map to the static poses of the gesture set. The
algorithm checks which reference point regions the right and left wrist
sensors are in, and then compares them against the system's database of
pose definitions. When the position data maps back to the definition of a
pose, and the sensors remain within the gesture's unique regions for a
critical amount of time, the static gesture is recognized.
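
To illustrate the region test just described, the following is a minimal
C++ sketch. It is illustrative only: the type and function names, and the
treatment of the null region, are assumptions made for this example, not
the project's actual code.

    // Minimal sketch of the reference-point region test described above.
    // Names and structure are illustrative assumptions, not the project's code.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Vec3 { double x, y, z; };

    // A rectangular region: a center point plus one half-width per axis.
    struct Region {
        Vec3 center;
        Vec3 halfExtent;   // distance from the center to each face
    };

    // True if a shoulder-relative sensor position lies inside the region.
    bool inRegion(const Region& r, const Vec3& p) {
        return std::fabs(p.x - r.center.x) <= r.halfExtent.x &&
               std::fabs(p.y - r.center.y) <= r.halfExtent.y &&
               std::fabs(p.z - r.center.z) <= r.halfExtent.z;
    }

    // Search the region list in a fixed order (the order resolves overlaps).
    // Returns the index of the first matching region, or -1 for the null
    // region that lies outside all defined regions.
    int classifyPoint(const std::vector<Region>& regions, const Vec3& p) {
        for (std::size_t i = 0; i < regions.size(); ++i)
            if (inRegion(regions[i], p)) return static_cast<int>(i);
        return -1;
    }

    // A static pose is recognized when both wrist sensors report the regions
    // required by the pose and remain there for a critical amount of time
    // (the timing bookkeeping is omitted in this sketch).
    bool matchesStaticPose(int rightRegion, int leftRegion,
                           int poseRightRegion, int poseLeftRegion) {
        return rightRegion == poseRightRegion && leftRegion == poseLeftRegion;
    }

As described above, the left-hand region set could be generated from the
right-hand set simply by negating the right/left axis component of each
center point.
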
The recognition of dynamic gestures uses a similar technique, except that
the sensors are tracked from one region to another within a critical
amount of time, and the sequence of reference points must match a
predetermined sequence defining a specific dynamic gesture. A simple state
machine is defined such that it arrives at a terminal state at the end of
any defined gesture sequence. The state machine also considers the amount
of time spent in each region and resets to its initial state when a
predefined time limit is exceeded. Two gestures (2-43 STAGGERED_COLUMN and
2-45 HERRINGBONE) require the coordination of the right and left hands.
For these gestures to be identified, both the right hand sequence and the
left hand sequence must arrive at their final states within a critical
time interval. (A sketch of such a sequence-matching state machine is
given below, following the test results.)

One of the issues encountered during the design of this approach was how
to define the reference points and regions. Should every point a sensor
may reach lie in a defined region? What region shape would be most
effective? Should regions be allowed to overlap? For our initial
implementation, a sparse set of rectangular regions was used, with a null
region defined to exist outside all other defined regions. Rectangular
regions were used because they were simple to implement and performed
effectively for most of the gestures. Gestures for which the rectangular
regions did not work well included 2-40 COIL and 2-30 ASSEMBLE; these
gestures are very similar and also exhibit greater variation over repeated
executions than the other gestures. Reference point regions were allowed
to overlap slightly where needed, but the search order was controllable,
so preference could be given to specific regions over others.
Unfortunately, even this allowance could not completely correct for
variations in gesture execution and the noise inherent in the tracker data
in every case.

Results and Evaluation
----------------------

At the time of writing, a subset of gestures had been selected for
implementation in the template matching system described above. The
gestures selected were:

                              S/D   Hands
    2-33  INCREASE_SPEED       D      R
    2-34  QUICK_TIME           D      R
    2-36  TAKE_COVER           D      R
    2-37  WEDGE                S      B
    2-38  VEE                  S      B
    2-39  LINE                 S      B
    2-41  ECHELON_LEFT         S      B
    2-42  ECHELON_RIGHT        S      B
    2-43  STAGGERED_COLUMN     D      B
    2-44  COLUMN               D      R
    2-45  HERRINGBONE          D      B

    S=Static, D=Dynamic, R=Right, B=Both Left and Right

Eight reference points were required for gesture discrimination; they are
shown in Figure A (figa.gif in the Jove/Gestures directory). For example,
the dynamic definition for gesture 2-36 (TAKE_COVER) is F -> C -> D -> E.

For the development and testing of the template matching reference point
logic, an IBM RISC system running C++ code was used. Recordings were made
of the data generated by the tracking devices during the performance of the
gestures; two sets of recordings were made at different times. The system
was set up to recognize gestures in real time, as well as to run gesture
sequences from previously recorded data. Some time was spent tuning the
location and size of the reference point regions after the recording of
data.

The results of testing were encouraging. There were 3 executions of each of
5 static gestures with 1 miss, for a successful recognition rate of 93%.
There were 6 executions of each of 6 dynamic gestures with 7 misses, for a
successful recognition rate of 80%. There were also 4 identifications of
the wrong dynamic gesture (none for the static gestures).
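
To make the region-sequence idea from the Concept subsection concrete, here
is a minimal C++ sketch of the kind of sequence-matching state machine
described above. It is illustrative only: the class name, the time units,
and the reset policy are assumptions, not the project's code.

    // Sketch of a region-sequence state machine for one hand.
    // Illustrative assumptions only; not the project's actual code.
    #include <cstddef>
    #include <vector>

    struct SequenceMatcher {
        std::vector<int> sequence;   // required region indices, in order
        std::size_t state;           // number of regions matched so far
        double startTime;            // time the first region was entered
        double timeLimit;            // whole sequence must finish within this

        SequenceMatcher(const std::vector<int>& seq, double limit)
            : sequence(seq), state(0), startTime(0.0), timeLimit(limit) {}

        // Feed one sample: the region the wrist sensor currently occupies
        // (-1 for the null region). Returns true when the terminal state is
        // reached, i.e. the dynamic gesture is recognized.
        bool update(int region, double now) {
            if (state > 0 && now - startTime > timeLimit)
                state = 0;                        // took too long: reset
            if (state < sequence.size() && region == sequence[state]) {
                if (state == 0) startTime = now;  // entering the first region
                ++state;
            }
            if (state == sequence.size()) {       // terminal state reached
                state = 0;
                return true;
            }
            return false;
        }
    };

A gesture such as 2-36 (TAKE_COVER) would be described by the region
sequence F -> C -> D -> E; for two-hand gestures such as 2-43
(STAGGERED_COLUMN) and 2-45 (HERRINGBONE), one matcher per hand would run,
and the gesture would be reported only if both reach their terminal states
within a critical interval of each other.
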
Of the four wrong identifications, the three executions of gesture 2-45
(HERRINGBONE), all from the second recording period, were identified as
gesture 2-36 (TAKE_COVER), and one execution of gesture 2-43
(STAGGERED_COLUMN) was identified as gesture 2-45 (HERRINGBONE). The test
sample set was small, so further testing and development of this algorithm
are warranted. Two areas of difficulty deserve greater attention:
(1) maintaining consistent placement of the tracking sensors, especially
the back-mounted sensor, and (2) reducing the noise inherent in the data,
which is compounded by a low sampling rate. The tracking devices are
capable of generating data at 120 Hz (40 Hz for each of the three sensors),
but several factors reduce the effective bandwidth realized. Additionally,
feedback from the recognition system to the subject regarding intermediate
results contributed to a higher recognition rate.

Trajectory Matching Approach
============================

Concept
-------

The central idea of the trajectory matching algorithm is to view dynamic
gestures as space curves. A space curve is uniquely determined by a
starting point, an initial direction, and its curvature and torsion
functions, which describe its shape in a position- and
orientation-invariant manner. Like the template-based approach, the
trajectory matching algorithm requires the use of reference points to
identify the starting and ending positions of a gesture. The intermediate
points of a gesture are, however, not matched against reference points.
Instead, a representation of the shape of the trajectory, based on its
curvature and torsion, is computed and matched against the shapes of known
gestures.

Curvature and torsion: Curvature describes the variation of the tangent of
a curve. If one takes two infinitesimally close points on a curve and
constructs the tangent vectors to the curve at these points, the angle
between these two tangent vectors is called the "turn". The curvature is
defined as the turn divided by the arc length (i.e., the distance between
the two points measured along the curve). Since a straight line has the
same tangent direction at every point, it has zero curvature everywhere. A
circle is the geometrical object with constant curvature; its radius is
given by the inverse of its curvature. The two tangent vectors that define
the curvature also define a plane, called the "osculating plane", in which
lies the "osculating circle". This circle has radius 1/K, where K is the
curvature, and has second order contact with the curve (i.e., it touches
the curve and matches its tangent direction and curvature there). Just as
the tangent vector represents the first order approximation to the curve,
the osculating circle is a second order approximation.

In order to describe non-planar curves, an additional parameter is required
to describe the deviation of a curve from its osculating plane. Similar to
the definition of the "turn" angle, the "dihedral" angle is defined as the
angle between the osculating planes at two infinitesimally close points on
a curve. The torsion is defined as the dihedral angle divided by the arc
length. A planar curve has the same osculating plane at every point, and
therefore has zero torsion.

Curvature and torsion are only defined for curves that are respectively
twice and three times differentiable. Many gestures, however, have
"turning points" at which the corresponding space curve does not meet these
requirements. This problem can be handled by segmenting the curve into
pieces that are sufficiently smooth.
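
In standard differential-geometry notation (not used elsewhere in this
report, and given here only to summarize the verbal definitions above), for
a curve parameterized by arc length s, with unit tangent T, principal
normal N, and binormal B = T x N:

    \kappa(s) = \lim_{\Delta s \to 0} \frac{\Delta\theta}{\Delta s}
              = \left\| \frac{d\mathbf{T}}{ds} \right\| ,
    \qquad
    \tau(s)   = \lim_{\Delta s \to 0} \frac{\Delta\phi}{\Delta s}
              = -\,\frac{d\mathbf{B}}{ds} \cdot \mathbf{N} ,

where the limits are taken over the turn angle and the dihedral angle
between neighboring points. A straight line has zero curvature everywhere,
a planar curve has zero torsion, and the osculating circle has radius 1/K,
as noted above.
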
In practice, of course, we only have a discrete sampling of the space
curve, and the samples are affected by measurement noise. The trajectory
matching algorithm is therefore preceded by a preprocessing phase in which
the trajectory is segmented and smoothed to reduce this measurement noise.

Motivation
----------

What is gained by representing the gesture trajectories in terms of
curvature and torsion? Some useful classes of curves have a simple
representation: planar curves have zero torsion, circles have zero torsion
and constant curvature, and straight lines have zero torsion and zero
curvature. Looking at our gesture set, one notices that all dynamic
gestures are defined by planar trajectories, and that many of them are made
up of circular arcs or straight lines. In particular, circular arcs are
generated by rotations around the shoulder (gestures 2-29 (DISPERSE),
2-30 (ASSEMBLE), 2-34 (QUICK_TIME), 2-36 (TAKE_COVER), 2-40 (COIL), and
2-44 (COLUMN)), around the elbow (gestures 2-31 (JOIN_FOLLOW),
2-44 (COLUMN), and 2-45 (HERRINGBONE)), or around a combination of both
(gestures 2-29 (DISPERSE) and 2-44 (COLUMN)).

The actual execution of each gesture is preceded by a preparative, or
approaching, phase and followed by a retractive phase. During both of these
phases the trajectory is not contained in the plane of the gesture, so we
can expect the torsion to be non-zero just before, as well as just after,
the gesture.

Algorithm
---------

The trajectory matching algorithm consists of six stages. They all run
concurrently, analyzing the data from the lower level as soon as it is made
available.

1) Smoothing
   The data samples are modified to reduce false high-frequency peaks. This
   has to be done without smoothing over sharp discontinuities that could
   correspond, say, to transitions from a resting state to motion.

2) Segmentation
   The trajectory is segmented into pieces at "turning points", and at
   points where the torsion drops to zero or departs from zero. The speed
   of the motion is zero at a turning point, so this criterion can be used
   to detect such points in the trajectory.

3) Circular arc and line recognition
   For each segment, three hypotheses are tested: that the segment is a
   circular arc, that it is a line segment, or that it is neither (the null
   hypothesis). During the course of a segment, the following values are
   estimated: the mean curvature, the mean direction of the axis of
   rotation, and the mean direction of the axis of translation. Since the
   axis of rotation points in the direction of the normal to the plane in
   which the circular arc lies, it can be used, together with the
   curvature, to estimate the position of the center of this circle. In
   order to determine whether these estimates are representative, the
   variances of the curvature, the direction of the rotation axis, and the
   direction of the translation axis are computed. Based on this
   information, the most likely of the three hypotheses is selected in
   stage 5.

4) Mapping to user-relative axes and reference points
   Once a circular arc or a line segment has been detected, its starting
   point, end point, and center of rotation are mapped to reference points
   defined relative to the user. A translation axis is mapped to one of the
   following six movement directions: Up, Down, Right, Left, Forward,
   Backward. A rotation axis is mapped onto the same set of axes. The
   correspondence between these axes and some rotations is shown in the
   following table (for the right arm):

      Up       <-->  horiz. front to right
      Down     <-->  horiz. right to front
      Right    <-->  vert. front up
      Left     <-->  vert. front down
      Forward  <-->  vert. side down
      Backward <-->  vert. side up

5) Recognition of gesture segments
   The dynamic gestures in our gesture set are comprised of one or two
   sub-gestures. Each of these sub-gestures is defined in terms of the
   following attributes:
      * start and end point
      * motion type: translation or rotation
      * in the case of a translation: the direction of the axis of
        translation (Up, Down, ...)
      * in the case of a rotation: the direction of the axis of rotation
        (Up, Down, ...), the center of rotation, the radius of the circular
        arc, and the angle traversed
   A specification of the ranges of admissible variation of these
   attributes forms the definition of a sub-gesture. The recognition
   algorithm selects the sub-gesture whose specification best matches the
   segment description extracted in the previous stages.

6) Recognition of entire gestures
   Most of the gestures are defined as repetitions of a small number of
   primitive gesture segments, possibly synchronized between the left and
   right arms. In this last stage, the algorithm brings sub-gestures of the
   left and right arms into correspondence, and groups consecutive segments
   over time into entire gestures.

Implementation
--------------

Smoothing of each data point was based only on local information: only the
positions of the immediately preceding and following points were used. It
consisted of moving the central point closer to the midpoint of the line
segment between its right and left neighbors. The amount by which the
central point was moved was proportional to two factors. The first was
determined by the angle formed at the central point, and was larger for
acute angles. The second was based on the ratio of the distances from the
central point to the left and right points; its effect was to prevent sharp
discontinuities from disappearing, which is necessary for the subsequent
segmentation.

In the segmentation stage, only the detection of turning points was used.
The zero-torsion criterion for segmentation turned out to be too
unreliable, because torsion is third order information about the curve, and
numerical differentiation is known to be unstable for higher order
derivatives. Because this criterion could not be used, gestures
2-30 (ASSEMBLE) and 2-40 (COIL) could not be recognized. A very simple
method was used for the detection of turning points, based on the size of
the displacements between consecutive data samples: a new segment is
started at a transition from below-threshold to above-threshold
displacements, and the current segment is terminated at a transition in the
opposite sense. (A sketch of the smoothing and segmentation steps is given
below, following the list of test gestures.)

The implementation covered stages 1 through 5. Stage 6 could be implemented
fairly easily: the individual segments have starting and ending times
attached to them, so synchronization between left arm and right arm
sub-gestures can be implemented with simple range checks, and since our
gestures have at most two sub-gestures each, grouping over time can be
achieved by a simple comparison.

Results
-------

Among the 11 dynamic gestures in the gesture set, 7 were selected for
testing:

    2-31  JOIN_FOLLOW
    2-33  INCREASE_SPEED
    2-34  QUICK_TIME
    2-36  TAKE_COVER
    2-43  STAGGERED_COLUMN
    2-44  COLUMN
    2-45  HERRINGBONE

The four remaining ones were rejected because of an unclear definition
(2-29 DISPERSE), dependence on the unreliable measurement of torsion
(2-30 ASSEMBLE, 2-40 COIL), or very small motion (2-32 FIX_BAYONETS). The
same set of recordings as for the template-based method was used; six
recordings were available for each gesture.
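
As referenced above, the following is a minimal C++ sketch of the two
preprocessing steps described under Implementation: local smoothing and
displacement-threshold segmentation. The function names, the fixed
smoothing strength, and the threshold parameter are assumptions made for
this example, not the project's code; in the actual implementation the
smoothing amount depended on the angle at the point and on the
neighbor-distance ratio.

    // Sketch of the preprocessing steps: local smoothing and segmentation at
    // turning points. Illustrative assumptions only; not the project's code.
    #include <cmath>
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct Vec3 { double x, y, z; };

    static double dist(const Vec3& a, const Vec3& b) {
        return std::sqrt((a.x - b.x) * (a.x - b.x) +
                         (a.y - b.y) * (a.y - b.y) +
                         (a.z - b.z) * (a.z - b.z));
    }

    // Pull each interior point toward the midpoint of its two neighbors.
    // Here 'strength' is a fixed factor in [0,1]; the project's version
    // scaled it by the angle at the point and by the neighbor-distance
    // ratio, so that sharp turning points were preserved.
    std::vector<Vec3> smooth(const std::vector<Vec3>& pts, double strength) {
        std::vector<Vec3> out = pts;
        for (std::size_t i = 1; i + 1 < pts.size(); ++i) {
            Vec3 mid = { 0.5 * (pts[i - 1].x + pts[i + 1].x),
                         0.5 * (pts[i - 1].y + pts[i + 1].y),
                         0.5 * (pts[i - 1].z + pts[i + 1].z) };
            out[i].x += strength * (mid.x - pts[i].x);
            out[i].y += strength * (mid.y - pts[i].y);
            out[i].z += strength * (mid.z - pts[i].z);
        }
        return out;
    }

    // Segment at turning points: a segment starts when the displacement
    // between consecutive samples rises above the threshold, and ends when
    // it falls back below it. Returns (first, last) sample index pairs.
    std::vector< std::pair<std::size_t, std::size_t> >
    segment(const std::vector<Vec3>& pts, double threshold) {
        std::vector< std::pair<std::size_t, std::size_t> > segs;
        bool moving = false;
        std::size_t start = 0;
        for (std::size_t i = 1; i < pts.size(); ++i) {
            bool above = dist(pts[i - 1], pts[i]) > threshold;
            if (above && !moving) {
                moving = true;
                start = i - 1;
            } else if (!above && moving) {
                moving = false;
                segs.push_back(std::make_pair(start, i - 1));
            }
        }
        if (moving) segs.push_back(std::make_pair(start, pts.size() - 1));
        return segs;
    }

As noted in the Evaluation below, expressing the threshold as a speed
(displacement divided by the sampling interval) would make the segmentation
less dependent on the sensors' sampling rate.
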
All of these recordings were run through stages 1 through 5 of the
trajectory matching algorithm. It was therefore the recognition of
sub-gestures that was measured, not the recognition of entire two-hand
gestures. The data was first run through a normalization program in which a
coordinate transformation was applied to the trajectories of the left and
right arms. This corrected for the rotation of the body with respect to the
world coordinate system, and moved the origin to the right shoulder for the
right hand measurements and to the left shoulder for the left hand
measurements. The normalized trajectories were then split into separate
files. For the results reported here, only data from the right hand
trajectories was used.

In order to determine whether the smoothing stage was effective, or
necessary at all, the test suite was run twice: once without smoothing and
once with smoothing. Each entry in the table below is therefore a pair
"raw/smooth", where "raw" is the percentage of successful recognitions on
the original data and "smooth" is the figure for the smoothed data. Another
note about the data: gestures 2-31 and 2-36 contained only one occurrence
of a sub-gesture per recording, so their recognition rates are based on
only 6 occurrences each. For the other five gestures, four occurrences per
recording were available, giving a total of 24 occurrences. The table shows
the recognition rates by sub-gesture:

    Gesture    2-31     2-33    2-34    2-36     2-43    2-44    2-45    Total
    ---------------------------------------------------------------------------
    raw/smooth 100/100 100/100  88/96  67/100  100/100  79/88  100/96   92/96

Smoothing is effective in most cases, but can sometimes lead to less
reliable results, as happened in one instance of gesture 2-45. Overall, the
trajectory matching algorithm achieved a satisfactory recognition rate of
96%. However, most of the recognition errors appear to be idiosyncratic; a
closer analysis of the results at each stage might help uncover their
causes.

Evaluation
----------

The results show that a representation of a gesture trajectory in terms of
its curvature can be used to distinguish circular arcs from straight lines,
and that both can, to some degree, be told apart from other motions. The
system is able to recognize the following gestures fairly reliably:
2-31 (JOIN_FOLLOW), 2-33 (INCREASE_SPEED), 2-34 (QUICK_TIME),
2-43 (STAGGERED_COLUMN), 2-44 (COLUMN), and 2-45 (HERRINGBONE). On the
other hand, we were not able to recognize gestures requiring torsion
information or the detection of changes in curvature. For these gestures,
the simple model of zero torsion and constant curvature is not adequate,
because it requires the center of rotation to be stationary for the
duration of the gesture. This is clearly too strong a restriction for many
gestures, as it is for gestures 2-30 (ASSEMBLE), 2-31 (JOIN_FOLLOW),
2-40 (COIL), and 2-44 (COLUMN) of our gesture set.

The shortcomings of the chosen representation are due to its locality.
Curvature and torsion are descriptions of the differential geometry of
curves (i.e., they deal with local variations), and are thus not able to
describe more global properties. In addition, the torsion information, and
to some degree also the curvature information, computed by our
implementation was unstable due to the small size of the neighborhood (only
the two closest neighbors) used. Lastly, our turning point detection method
is very simplistic.
The threshold used to segment the trajectory depends on the sampling rate
of the sensors. A more robust detection could be achieved if it were based
on the speed of the motion, and not just on the distance between two
consecutive samples.

Comparison and Analysis
=======================

Most current research in gesture recognition uses neural network or fuzzy
logic algorithms. Neural network approaches generally start with a generic
learning algorithm that begins by recognizing all data as only one gesture.
By "training" the neural network, repeating both correct and incorrect
versions of the gesture and providing feedback on how well the network
predicted the gesture, the researchers allow the network to find the
distinctions between motions that should be recognized as the target
gesture and those that should not. As the network learns the first gesture,
the researchers add gestures to the network's data set until the network
matches all the desired gestures.

Fuzzy logic algorithms use the mathematical concept of fuzzy numbers. For
example, the number 6 is more like the number 5 than the number 10;
therefore, in this example, 6 is a fuzzy 5 (or at least a fuzzier 5). Fuzzy
logic is basically a mathematical codification of such inequalities. Fuzzy
logic algorithms would recognize gestures in much the same way as template
matching. The code would contain logic such as: IF the right arm is 20
degrees (a fuzzy number) above horizontal, and so on, THEN this is gesture
number seven. The user would raise the right arm approximately twenty
degrees above horizontal to satisfy the rule. This is unlike the template
matching scheme in that fuzzy logic does not necessarily specify whether 22
degrees is good enough to recognize gesture number seven; that depends on
the context of the rule. Template matching, however, does not use numbers
whose meaning may change: twenty degrees always means 20 degrees, never
more, never less. Neural networks and fuzzy logic show potential in many
diverse research areas; however, the less complicated method of matching
motion or position is apt to perform more effectively. A template matching
algorithm performs gesture recognition tasks as well (or nearly as well) as
any other known method.

Of the existing studies and projects published in the field of gesture
recognition, the Glove-Talk Pilot Study [Fels 90] most closely parallels
the Army Gesture Recognition Study (this JOVE project) in several key
respects. First, the Glove-Talk Pilot Study contains only 203 hand
gestures. Although this is many more than the Army Gesture Recognition
Study, it is one of the smaller data sets in its field; most natural
gesture sets contain more than 500 gestures (e.g., American Sign Language).
Second, the Glove-Talk hand gestures are static and distinct. Unlike hand
gestures such as pointing at an object, waving, or other gestures where the
orientation of the hand may change the meaning of the gesture, the
Glove-Talk gesture set contains gestures that represent commands,
instructions, warnings, etc. (like the command TAKE_COVER, 2-36). Third,
Glove-Talk obtained its data using a VPL DataGlove. Larger, better funded
studies generally use hardware with many more sensors than exist in the VPL
DataGlove. The Glove-Talk data bandwidth, therefore, is similar to, though
larger than, the Army Gesture Recognition data bandwidth.

 -----------------------------------------------------------------------------
             | The Glove-Talk Pilot Study | The Army Gesture Recognition
             |                            | Pilot Study
 ------------|----------------------------|-----------------------------------
 Hardware    | VPL DataGlove              | 3 Polhemus tracking devices
 Software    | Multi-Layer Neural Net     | Template/Trajectory Matching
 Data Set    | 203 hand gestures          | 5 static arm positions and
             |                            | 6 dynamic arm gestures
 Generality  | Individual user            | Multiple users
 % Correct   | 92%                        | 82% (93% static, 78% dynamic)
 % No Guess  |  7%                        | 14% ( 7% static, 11% dynamic)
 % Wrong     |  1%                        |  8% ( 0% static, 11% dynamic)
             |                            | (all wrong guesses involved
             |                            | gesture 2-45, HERRINGBONE)
 -----------------------------------------------------------------------------
   Comparison of the neural-net solution to the template/trajectory matching
   solution.

As the above chart demonstrates, our pilot study performed comparably to
the Glove-Talk Pilot Study at recognizing static gestures. Further, the
Glove-Talk Pilot Study used several layers of networked analysis to find
its solution in near real time, whereas the Army Gesture Recognition Pilot
Study uses a one-pass analysis, making it faster. Also, the Glove-Talk
system requires training the network with gestures from a specific
individual, who may then use the system; the templates in the Army Gesture
algorithms are static and general enough for many users, without the need
for a training period. The Glove-Talk system does, however, contain almost
19 times as many gestures and can be more easily adapted to a different set
of recognizable gestures.

Conclusions
===========

Preliminary implementations of this recognition system have provided
encouraging results using the three Polhemus 3-D tracking devices. The
template and trajectory algorithms have recognized several of the target
gestures quite well. Further integration of the template matching and
trajectory matching methods into a unified recognition system appears to be
a worthwhile approach to pursue. These two simple algorithms complement
each other, together providing potentially effective recognition of both
static and dynamic gestures. A recognition system using these two
interpreting techniques promises to be fast, accurate, and efficient. A
simple tracking system based on the investigated system could eventually be
developed for field use in the virtual coordination of future troop
operations. As these are preliminary results, further development of this
gesture recognition system will require an ongoing effort.

Appendices
==========

Appendix files are on the ftp site in directory vrtp/Jove/Gestures/*.*

A: Gesture Set (mp1a.tif, mp2a.tif, mp2b.tif, mp3a.tif, mp3b.tif, mp4a.tif)
B: Sensor Position Data (record*.tar, records.fmt)
C: Programs Used For Data Analysis/Manipulation (test.c, gesture.c, gesture.h)

Acknowledgements/Bibliography
=============================

We would like to thank Dr. Ben Shneiderman for inspiring this report's
construction, coordinated virtually in its entirety from four remote sites.
We would also like to express our appreciation to Jack Hsu, Cheryl
Eslinger, Bruce Chih-lung Lin, and Cindy Tonnesen for their efforts in
reviewing this paper.

[Baudel 93] Baudel, T. and Beaudouin-Lafon, M. CHARADE: Remote Control of
Objects Using Free-Hand Gestures. Communications of the ACM 36(7), July
1993, pp. 28-35.

[Murakami 91] Murakami, K. and Taguchi, H. Gesture Recognition Using
Recurrent Neural Networks. Proceedings of ACM CHI'91 Conference on Human
Factors in Computing Systems, 1991, pp. 237-242.

[Bolt 84] Bolt, R. Put-That-There: Voice and Gesture at the Graphics
Interface. Computer Graphics 14(3), July 1980; in Proceedings of ACM
SIGGRAPH 1980.

[Fels 90] Fels, S. Sidney and Hinton, Geoffrey E. Building Adaptive
Interfaces with Neural Networks: The Glove-Talk Pilot Study. Proceedings of
IFIP INTERACT'90: Human-Computer Interaction, 1990, pp. 683-688.

[Weber 90] Weber, Gerhard. FINGER -- A Language for Gesture Recognition.
Proceedings of IFIP INTERACT'90: Human-Computer Interaction, 1990,
pp. 689-694.

[Hauptmann 89] Hauptmann, Alexander G. Speech and Gestures for Graphic
Image Manipulation. Proceedings of ACM CHI'89 Conference on Human Factors
in Computing Systems, 1989, pp. 241-245.

[Weimer 89] Weimer, David and Ganapathy, S. K. A Synthetic Visual
Environment with Hand Gesturing and Voice Input. Proceedings of ACM CHI'89
Conference on Human Factors in Computing Systems, 1989, pp. 235-240.

[Goldberg 91] Goldberg, David and Goodisman, Aaron. Stylus User Interfaces
for Manipulating Text. Proceedings of the ACM SIGGRAPH Symposium on User
Interface Software and Technology, 1991, pp. 127-135.

[Higgins 84] Higgins, C. A. and Whitrow, R. On-Line Cursive Script
Recognition. Proceedings of IFIP INTERACT'84: Human-Computer Interaction,
1984, pp. 139-143.

[Thiel 91] Thiel, David D. The Cue Ball as Part of a Gestural Interface.
Proceedings of ACM CHI'91 Conference on Human Factors in Computing Systems,
1991, p. 463.

[Pittman 91] Pittman, James A. Recognizing Handwritten Text. Proceedings of
ACM CHI'91 Conference on Human Factors in Computing Systems, 1991,
pp. 271-275.