Title
=====

Recognition of Signals for Combat Formations and Battle Drills

Credits
=======

Dan Searles (searles@vnet.ibm.com)
  * Original JOVE project concept.
  * Initial background information survey.
  * Worked with the physical Polhemus tracking devices and recorded gesture
    performances for the group.
  * Developed the template matching gesture recognition program.
  * Wrote the template matching sections of the report.

Jerry Smith (jsmith@eng.iac.honeywell.com)
  * Researched necessary background information.
  * Researched existing gesture recognition studies and selected a suitable
    study for comparison purposes.
  * Wrote the comparison and analysis sections of the report.

Gregory Baratoff (baratoff@cfar.umd.edu)
  * Developed the trajectory matching gesture recognition program.
  * Wrote the trajectory matching sections of the report.

Brian Bohmueller (bomuller@nadc.nadc.navy.mil)
  * Coordinated integration of the written report.
  * Edited the final report.
  * Co-wrote the Introduction and Conclusions sections of the report.

Introduction
============

A computerized gesture recognition system has been developed for infantry
command signals. These command gestures use hand and arm location and
movement to coordinate combat formations and battle drills. Tracking data
from three subject-mounted position sensors provided dynamic loci of
gestural motion. Two alternative algorithmic approaches were investigated
for their ability to discriminate these gesture sequences in "real time."
A template matching algorithm exploited static characteristics of the
gesture set, while trajectory matching and region tracking analyzed dynamic
characteristics to interpret specific gestures. Development and testing of
this system are continuing toward an effective final system design. This
report provides preliminary data and analysis of the current design's
performance.

[A detailed abstract is provided in Jove/Articles/dsgbjsbb-abstract.html on
the ftp server.]

Gesture Set Description
=======================

The seventeen signals initially selected were chosen from a standard Army
gesture set used for the coordination of combat formations and battle
drills. The gestures are executed using one or both arms, in either a
static pose or a moving gesture. They are based on arm and hand
location/movement, in contrast with the predominantly finger and hand
location/movement used in data glove applications. Ten of the chosen
gestures are dynamic in nature, while the remaining seven culminate in a
static pose. Each gesture classified as static uses both arms and both
hands. The dynamic subset of gestures includes both single- and dual-arm
signals. See Appendix A for detailed descriptions of the gestures.

Applications
============

The applications for this system fall into two related areas. The original
intent of signal use was for infantry field coordination, where a soldier
would sign a series of gestures to command troops (real or automated) in a
virtual battlefield environment. Additionally, as an instructive tool, the
system could be used for the training and testing of infantrymen in the
execution of these command gestures within a non-virtual or virtually
simulated environment.

Description of Setup
====================

Tracking data was provided by three Polhemus tracking devices. The tracking
devices were attached to the posterior of the subject's wrists and to the
back, midway between the shoulders.
The sensors reported their positions and orientations relative to the
Polhemus transmitter device, located behind the user at approximately
shoulder height. An IBM RISC system, programmed in C++, was used for data
recording and processing during the development and testing of the gesture
recognition logic.

The data provided by the sensors was normalized so that the tracking data
from the right wrist sensor referenced the center of the right shoulder
socket as its origin. Similarly, the left tracker data was normalized to
the center of the left shoulder socket. The back-mounted tracker was used
to locate these origins, owing to its stable relative position. Three
rotations were possible, only one of which needed to be accounted for in
the calculation of the origins: the subject's twisting (or facing) right or
left relative to the transmitter frame of reference. The other two
rotations, leaning forward or backward and leaning right or left, did not
need to be corrected for, since the gestures are performed in a normal
standing position. Thus, little if any data interpolation was required in
calculating the shoulder origins, since a normal standing position was
inherent in the gesture set.

Template Matching Approach
==========================

Concept
-------

The central concept behind the template matching algorithm is reference
points. Reference points are points at the center of spatial regions in
3-D space. For this particular system, each region was defined by an
x, y, z center point and three distance values, one for each axis. By
adding and subtracting the distance values along the appropriate axes about
the center point, a rectangular box-shaped region is formed. A possible
alternative would be to define a point and a radius, and thereby describe a
sphere as the region.

The origin of the coordinate system is defined to be the center of the
subject's right shoulder socket. Since the data from the wrist tracking
sensors is normalized to this origin, it is relatively easy to determine
which reference point region a sensor is in at a given moment. Reference
point regions are defined to correspond to the locations of the sensors
when gestures are executed. Also, by negating the right/left axis value of
each center point, a symmetric set of regions is defined for the left hand
sensor.

Two straightforward calibration methods are possible. The first has the
subject assume one of the static poses while the computer records the
location samples generated by the tracking devices; this can be repeated
for every reference point concerned. Alternatively, the reference points
can be computed analytically from the length of the subject's arm. This
approach models the arm as two segments connected by a joint, so that the
desired reference points and their regions can be calculated with simple
geometry.

The recognition of static poses is accomplished by defining reference
points and regions which map to the static poses of the gesture set. The
algorithm checks which reference point regions the right and left wrist
sensors are in, and then compares them against the system's database of
pose definitions. When the position data maps back to the definition of a
pose, and the sensors remain within the gesture's unique regions for a
critical amount of time, the static gesture is recognized.
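
To illustrate the region test just described, the following is a minimal
C++ sketch. It is illustrative only: the type and function names, and the
treatment of the null region, are assumptions made for this example, not
the project's actual code.

    // Minimal sketch of the reference-point region test described above.
    // Names and structure are illustrative assumptions, not the project's code.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Vec3 { double x, y, z; };

    // A rectangular region: a center point plus one half-width per axis.
    struct Region {
        Vec3 center;
        Vec3 halfExtent;   // distance from the center to each face
    };

    // True if a shoulder-relative sensor position lies inside the region.
    bool inRegion(const Region& r, const Vec3& p) {
        return std::fabs(p.x - r.center.x) <= r.halfExtent.x &&
               std::fabs(p.y - r.center.y) <= r.halfExtent.y &&
               std::fabs(p.z - r.center.z) <= r.halfExtent.z;
    }

    // Search the region list in a fixed order (the order resolves overlaps).
    // Returns the index of the first matching region, or -1 for the null
    // region that lies outside all defined regions.
    int classifyPoint(const std::vector<Region>& regions, const Vec3& p) {
        for (std::size_t i = 0; i < regions.size(); ++i)
            if (inRegion(regions[i], p)) return static_cast<int>(i);
        return -1;
    }

    // A static pose is recognized when both wrist sensors report the regions
    // required by the pose and remain there for a critical amount of time
    // (the timing bookkeeping is omitted in this sketch).
    bool matchesStaticPose(int rightRegion, int leftRegion,
                           int poseRightRegion, int poseLeftRegion) {
        return rightRegion == poseRightRegion && leftRegion == poseLeftRegion;
    }

As described above, the left-hand region set could be generated from the
right-hand set simply by negating the right/left axis component of each
center point.
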
The recognition of dynamic gestures uses a similar technique, except that
the sensors are tracked from one region to another within a critical
amount of time, and the sequence of reference points must match a
predetermined sequence defining a specific dynamic gesture. A simple state
machine is defined such that it arrives at a terminal state at the end of
any defined gesture sequence. The state machine also considers the amount
of time spent in each region and resets to its initial state when a
predefined time limit is exceeded. Two gestures (2-43 STAGGERED_COLUMN and
2-45 HERRINGBONE) require the coordination of the right and left hands.
For these gestures to be identified, both the right hand sequence and the
left hand sequence must arrive at their final states within a critical
time interval. (A sketch of such a sequence-matching state machine is
given below, following the test results.)

One of the issues encountered during the design of this approach was how
to define the reference points and regions. Should every point a sensor
may reach lie in a defined region? What region shape would be most
effective? Should regions be allowed to overlap? For our initial
implementation, a sparse set of rectangular regions was used, with a null
region defined to exist outside all other defined regions. Rectangular
regions were used because they were simple to implement and performed
effectively for most of the gestures. Gestures for which the rectangular
regions did not work well included 2-40 COIL and 2-30 ASSEMBLE; these
gestures are very similar and also exhibit greater variation over repeated
executions than the other gestures. Reference point regions were allowed
to overlap slightly where needed, but the search order was controllable,
so preference could be given to specific regions over others.
Unfortunately, even this allowance could not completely correct for
variations in gesture execution and the noise inherent in the tracker data
in every case.

Results and Evaluation
----------------------

At the time of writing, a subset of gestures had been selected for
implementation in the template matching system described above. The
gestures selected were:

                              S/D   Hands
    2-33  INCREASE_SPEED       D      R
    2-34  QUICK_TIME           D      R
    2-36  TAKE_COVER           D      R
    2-37  WEDGE                S      B
    2-38  VEE                  S      B
    2-39  LINE                 S      B
    2-41  ECHELON_LEFT         S      B
    2-42  ECHELON_RIGHT        S      B
    2-43  STAGGERED_COLUMN     D      B
    2-44  COLUMN               D      R
    2-45  HERRINGBONE          D      B

    S=Static, D=Dynamic, R=Right, B=Both Left and Right

Eight reference points were required for gesture discrimination; they are
shown in Figure A (figa.gif in the Jove/Gestures directory). For example,
the dynamic definition for gesture 2-36 (TAKE_COVER) is F -> C -> D -> E.

For the development and testing of the template matching reference point
logic, an IBM RISC system running C++ code was used. Recordings were made
of the data generated by the tracking devices during the performance of the
gestures; two sets of recordings were made at different times. The system
was set up to recognize gestures in real time, as well as to run gesture
sequences from previously recorded data. Some time was spent tuning the
location and size of the reference point regions after the recording of
data.

The results of testing were encouraging. There were 3 executions of each of
5 static gestures with 1 miss, for a successful recognition rate of 93%.
There were 6 executions of each of 6 dynamic gestures with 7 misses, for a
successful recognition rate of 80%. There were also 4 identifications of
the wrong dynamic gesture (none for the static gestures).
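
To make the region-sequence idea from the Concept subsection concrete, here
is a minimal C++ sketch of the kind of sequence-matching state machine
described above. It is illustrative only: the class name, the time units,
and the reset policy are assumptions, not the project's code.

    // Sketch of a region-sequence state machine for one hand.
    // Illustrative assumptions only; not the project's actual code.
    #include <cstddef>
    #include <vector>

    struct SequenceMatcher {
        std::vector<int> sequence;   // required region indices, in order
        std::size_t state;           // number of regions matched so far
        double startTime;            // time the first region was entered
        double timeLimit;            // whole sequence must finish within this

        SequenceMatcher(const std::vector<int>& seq, double limit)
            : sequence(seq), state(0), startTime(0.0), timeLimit(limit) {}

        // Feed one sample: the region the wrist sensor currently occupies
        // (-1 for the null region). Returns true when the terminal state is
        // reached, i.e. the dynamic gesture is recognized.
        bool update(int region, double now) {
            if (state > 0 && now - startTime > timeLimit)
                state = 0;                        // took too long: reset
            if (state < sequence.size() && region == sequence[state]) {
                if (state == 0) startTime = now;  // entering the first region
                ++state;
            }
            if (state == sequence.size()) {       // terminal state reached
                state = 0;
                return true;
            }
            return false;
        }
    };

A gesture such as 2-36 (TAKE_COVER) would be described by the region
sequence F -> C -> D -> E; for two-hand gestures such as 2-43
(STAGGERED_COLUMN) and 2-45 (HERRINGBONE), one matcher per hand would run,
and the gesture would be reported only if both reach their terminal states
within a critical interval of each other.
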
Of the four wrong identifications, the three executions of gesture 2-45
(HERRINGBONE), all from the second recording period, were identified as
gesture 2-36 (TAKE_COVER), and one execution of gesture 2-43
(STAGGERED_COLUMN) was identified as gesture 2-45 (HERRINGBONE). The test
sample set was small, so further testing and development of this algorithm
are warranted. Two areas of difficulty deserve greater attention:
(1) maintaining consistent placement of the tracking sensors, especially
the back-mounted sensor, and (2) reducing the noise inherent in the data,
which is compounded by a low sampling rate. The tracking devices are
capable of generating data at 120 Hz (40 Hz for each of the three sensors),
but several factors reduce the effective bandwidth realized. Additionally,
feedback from the recognition system to the subject regarding intermediate
results contributed to a higher recognition rate.

Trajectory Matching Approach
============================

Concept
-------

The central idea of the trajectory matching algorithm is to view dynamic
gestures as space curves. A space curve is uniquely determined by a
starting point, an initial direction, and its curvature and torsion
functions, which describe its shape in a position- and
orientation-invariant manner. Like the template-based approach, the
trajectory matching algorithm requires the use of reference points to
identify the starting and ending positions of a gesture. The intermediate
points of a gesture are, however, not matched against reference points.
Instead, a representation of the shape of the trajectory, based on its
curvature and torsion, is computed and matched against the shapes of known
gestures.

Curvature and torsion: Curvature describes the variation of the tangent of
a curve. If one takes two infinitesimally close points on a curve and
constructs the tangent vectors to the curve at these points, the angle
between these two tangent vectors is called the "turn". The curvature is
defined as the turn divided by the arc length (i.e., the distance between
the two points measured along the curve). Since a straight line has the
same tangent direction at every point, it has zero curvature everywhere. A
circle is the geometrical object with constant curvature; its radius is
given by the inverse of its curvature. The two tangent vectors that define
the curvature also define a plane, called the "osculating plane", in which
lies the "osculating circle". This circle has radius 1/K, where K is the
curvature, and has second order contact with the curve (i.e., it touches
the curve and matches its tangent direction and curvature there). Just as
the tangent vector represents the first order approximation to the curve,
the osculating circle is a second order approximation.

In order to describe non-planar curves, an additional parameter is required
to describe the deviation of a curve from its osculating plane. Similar to
the definition of the "turn" angle, the "dihedral" angle is defined as the
angle between the osculating planes at two infinitesimally close points on
a curve. The torsion is defined as the dihedral angle divided by the arc
length. A planar curve has the same osculating plane at every point, and
therefore has zero torsion.

Curvature and torsion are only defined for curves that are respectively
twice and three times differentiable. Many gestures, however, have
"turning points" at which the corresponding space curve does not meet these
requirements. This problem can be handled by segmenting the curve into
pieces that are sufficiently smooth.
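
In standard differential-geometry notation (not used elsewhere in this
report, and given here only to summarize the verbal definitions above), for
a curve parameterized by arc length s, with unit tangent T, principal
normal N, and binormal B = T x N:

    \kappa(s) = \lim_{\Delta s \to 0} \frac{\Delta\theta}{\Delta s}
              = \left\| \frac{d\mathbf{T}}{ds} \right\| ,
    \qquad
    \tau(s)   = \lim_{\Delta s \to 0} \frac{\Delta\phi}{\Delta s}
              = -\,\frac{d\mathbf{B}}{ds} \cdot \mathbf{N} ,

where the limits are taken over the turn angle and the dihedral angle
between neighboring points. A straight line has zero curvature everywhere,
a planar curve has zero torsion, and the osculating circle has radius 1/K,
as noted above.
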
In practice, of course, we only have a discrete sampling of the space
curve, and the samples are affected by measurement noise. The trajectory
matching algorithm is therefore preceded by a preprocessing phase in which
the trajectory is segmented and smoothed to reduce this measurement noise.

Motivation
----------

What is gained by representing the gesture trajectories in terms of
curvature and torsion? Some useful classes of curves have a simple
representation: planar curves have zero torsion, circles have zero torsion
and constant curvature, and straight lines have zero torsion and zero
curvature. Looking at our gesture set, one notices that all dynamic
gestures are defined by planar trajectories, and that many of them are made
up of circular arcs or straight lines. In particular, circular arcs are
generated by rotations around the shoulder (gestures 2-29 (DISPERSE),
2-30 (ASSEMBLE), 2-34 (QUICK_TIME), 2-36 (TAKE_COVER), 2-40 (COIL), and
2-44 (COLUMN)), around the elbow (gestures 2-31 (JOIN_FOLLOW),
2-44 (COLUMN), and 2-45 (HERRINGBONE)), or around a combination of both
(gestures 2-29 (DISPERSE) and 2-44 (COLUMN)).

The actual execution of each gesture is preceded by a preparative, or
approaching, phase and followed by a retractive phase. During both of these
phases the trajectory is not contained in the plane of the gesture, so we
can expect the torsion to be non-zero just before, as well as just after,
the gesture.

Algorithm
---------

The trajectory matching algorithm consists of six stages. They all run
concurrently, analyzing the data from the lower level as soon as it is made
available.

1) Smoothing
   The data samples are modified to reduce false high-frequency peaks. This
   has to be done without smoothing over sharp discontinuities that could
   correspond, say, to transitions from a resting state to motion.

2) Segmentation
   The trajectory is segmented into pieces at "turning points", and at
   points where the torsion drops to zero or departs from zero. The speed
   of the motion is zero at a turning point, so this criterion can be used
   to detect such points in the trajectory.

3) Circular arc and line recognition
   For each segment, three hypotheses are tested: that the segment is a
   circular arc, that it is a line segment, or that it is neither (the null
   hypothesis). During the course of a segment, the following values are
   estimated: the mean curvature, the mean direction of the axis of
   rotation, and the mean direction of the axis of translation. Since the
   axis of rotation points in the direction of the normal to the plane in
   which the circular arc lies, it can be used, together with the
   curvature, to estimate the position of the center of this circle. In
   order to determine whether these estimates are representative, the
   variances of the curvature, the direction of the rotation axis, and the
   direction of the translation axis are computed. Based on this
   information, the most likely of the three hypotheses is selected in
   stage 5.

4) Mapping to user-relative axes and reference points
   Once a circular arc or a line segment has been detected, its starting
   point, end point, and center of rotation are mapped to reference points
   defined relative to the user. A translation axis is mapped to one of the
   following six movement directions: Up, Down, Right, Left, Forward,
   Backward. A rotation axis is mapped onto the same set of axes. The
   correspondence between these axes and some rotations is shown in the
   following table (for the right arm):

      Up       <-->  horiz. front to right
      Down     <-->  horiz. right to front
      Right    <-->  vert. front up
      Left     <-->  vert. front down
      Forward  <-->  vert. side down
      Backward <-->  vert. side up

5) Recognition of gesture segments
   The dynamic gestures in our gesture set are comprised of one or two
   sub-gestures. Each of these sub-gestures is defined in terms of the
   following attributes:
      * start and end point
      * motion type: translation or rotation
      * in the case of a translation: the direction of the axis of
        translation (Up, Down, ...)
      * in the case of a rotation: the direction of the axis of rotation
        (Up, Down, ...), the center of rotation, the radius of the circular
        arc, and the angle traversed
   A specification of the ranges of admissible variation of these
   attributes forms the definition of a sub-gesture. The recognition
   algorithm selects the sub-gesture whose specification best matches the
   segment description extracted in the previous stages.

6) Recognition of entire gestures
   Most of the gestures are defined as repetitions of a small number of
   primitive gesture segments, possibly synchronized between the left and
   right arms. In this last stage, the algorithm brings sub-gestures of the
   left and right arms into correspondence, and groups consecutive segments
   over time into entire gestures.

Implementation
--------------

Smoothing of each data point was based only on local information: only the
positions of the immediately preceding and following points were used. It
consisted of moving the central point closer to the midpoint of the line
segment between its right and left neighbors. The amount by which the
central point was moved was proportional to two factors. The first was
determined by the angle formed at the central point, and was larger for
acute angles. The second was based on the ratio of the distances from the
central point to the left and right points; its effect was to prevent sharp
discontinuities from disappearing, which is necessary for the subsequent
segmentation.

In the segmentation stage, only the detection of turning points was used.
The zero-torsion criterion for segmentation turned out to be too
unreliable, because torsion is third order information about the curve, and
numerical differentiation is known to be unstable for higher order
derivatives. Because this criterion could not be used, gestures
2-30 (ASSEMBLE) and 2-40 (COIL) could not be recognized. A very simple
method was used for the detection of turning points, based on the size of
the displacements between consecutive data samples: a new segment is
started at a transition from below-threshold to above-threshold
displacements, and the current segment is terminated at a transition in the
opposite sense. (A sketch of the smoothing and segmentation steps is given
below, following the list of test gestures.)

The implementation covered stages 1 through 5. Stage 6 could be implemented
fairly easily: the individual segments have starting and ending times
attached to them, so synchronization between left arm and right arm
sub-gestures can be implemented with simple range checks, and since our
gestures have at most two sub-gestures each, grouping over time can be
achieved by a simple comparison.

Results
-------

Among the 11 dynamic gestures in the gesture set, 7 were selected for
testing:

    2-31  JOIN_FOLLOW
    2-33  INCREASE_SPEED
    2-34  QUICK_TIME
    2-36  TAKE_COVER
    2-43  STAGGERED_COLUMN
    2-44  COLUMN
    2-45  HERRINGBONE

The four remaining ones were rejected because of an unclear definition
(2-29 DISPERSE), dependence on the unreliable measurement of torsion
(2-30 ASSEMBLE, 2-40 COIL), or very small motion (2-32 FIX_BAYONETS). The
same set of recordings as for the template-based method was used; six
recordings were available for each gesture.
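
As referenced above, the following is a minimal C++ sketch of the two
preprocessing steps described under Implementation: local smoothing and
displacement-threshold segmentation. The function names, the fixed
smoothing strength, and the threshold parameter are assumptions made for
this example, not the project's code; in the actual implementation the
smoothing amount depended on the angle at the point and on the
neighbor-distance ratio.

    // Sketch of the preprocessing steps: local smoothing and segmentation at
    // turning points. Illustrative assumptions only; not the project's code.
    #include <cmath>
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct Vec3 { double x, y, z; };

    static double dist(const Vec3& a, const Vec3& b) {
        return std::sqrt((a.x - b.x) * (a.x - b.x) +
                         (a.y - b.y) * (a.y - b.y) +
                         (a.z - b.z) * (a.z - b.z));
    }

    // Pull each interior point toward the midpoint of its two neighbors.
    // Here 'strength' is a fixed factor in [0,1]; the project's version
    // scaled it by the angle at the point and by the neighbor-distance
    // ratio, so that sharp turning points were preserved.
    std::vector<Vec3> smooth(const std::vector<Vec3>& pts, double strength) {
        std::vector<Vec3> out = pts;
        for (std::size_t i = 1; i + 1 < pts.size(); ++i) {
            Vec3 mid = { 0.5 * (pts[i - 1].x + pts[i + 1].x),
                         0.5 * (pts[i - 1].y + pts[i + 1].y),
                         0.5 * (pts[i - 1].z + pts[i + 1].z) };
            out[i].x += strength * (mid.x - pts[i].x);
            out[i].y += strength * (mid.y - pts[i].y);
            out[i].z += strength * (mid.z - pts[i].z);
        }
        return out;
    }

    // Segment at turning points: a segment starts when the displacement
    // between consecutive samples rises above the threshold, and ends when
    // it falls back below it. Returns (first, last) sample index pairs.
    std::vector< std::pair<std::size_t, std::size_t> >
    segment(const std::vector<Vec3>& pts, double threshold) {
        std::vector< std::pair<std::size_t, std::size_t> > segs;
        bool moving = false;
        std::size_t start = 0;
        for (std::size_t i = 1; i < pts.size(); ++i) {
            bool above = dist(pts[i - 1], pts[i]) > threshold;
            if (above && !moving) {
                moving = true;
                start = i - 1;
            } else if (!above && moving) {
                moving = false;
                segs.push_back(std::make_pair(start, i - 1));
            }
        }
        if (moving) segs.push_back(std::make_pair(start, pts.size() - 1));
        return segs;
    }

As noted in the Evaluation below, expressing the threshold as a speed
(displacement divided by the sampling interval) would make the segmentation
less dependent on the sensors' sampling rate.
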
All of these recordings were run through stages 1 through 5 of the
trajectory matching algorithm. It was therefore the recognition of
sub-gestures that was measured, not the recognition of entire two-hand
gestures. The data was first run through a normalization program in which a
coordinate transformation was applied to the trajectories of the left and
right arms. This corrected for the rotation of the body with respect to the
world coordinate system, and moved the origin to the right shoulder for the
right hand measurements and to the left shoulder for the left hand
measurements. The normalized trajectories were then split into separate
files. For the results reported here, only data from the right hand
trajectories was used.

In order to determine whether the smoothing stage was effective, or
necessary at all, the test suite was run twice: once without smoothing and
once with smoothing. Each entry in the table below is therefore a pair
"raw/smooth", where "raw" is the percentage of successful recognitions on
the original data and "smooth" is the figure for the smoothed data. Another
note about the data: gestures 2-31 and 2-36 contained only one occurrence
of a sub-gesture per recording, so their recognition rates are based on
only 6 occurrences each. For the other five gestures, four occurrences per
recording were available, giving a total of 24 occurrences. The table shows
the recognition rates by sub-gesture:

    Gesture    2-31     2-33    2-34    2-36     2-43    2-44    2-45    Total
    ---------------------------------------------------------------------------
    raw/smooth 100/100 100/100  88/96  67/100  100/100  79/88  100/96   92/96

Smoothing is effective in most cases, but can sometimes lead to less
reliable results, as happened in one instance of gesture 2-45. Overall, the
trajectory matching algorithm achieved a satisfactory recognition rate of
96%. However, most of the recognition errors appear to be idiosyncratic; a
closer analysis of the results at each stage might help uncover their
causes.

Evaluation
----------

The results show that a representation of a gesture trajectory in terms of
its curvature can be used to distinguish circular arcs from straight lines,
and that both can, to some degree, be told apart from other motions. The
system is able to recognize the following gestures fairly reliably:
2-31 (JOIN_FOLLOW), 2-33 (INCREASE_SPEED), 2-34 (QUICK_TIME),
2-43 (STAGGERED_COLUMN), 2-44 (COLUMN), and 2-45 (HERRINGBONE). On the
other hand, we were not able to recognize gestures requiring torsion
information or the detection of changes in curvature. For these gestures,
the simple model of zero torsion and constant curvature is not adequate,
because it requires the center of rotation to be stationary for the
duration of the gesture. This is clearly too strong a restriction for many
gestures, as it is for gestures 2-30 (ASSEMBLE), 2-31 (JOIN_FOLLOW),
2-40 (COIL), and 2-44 (COLUMN) of our gesture set.

The shortcomings of the chosen representation are due to its locality.
Curvature and torsion are descriptions of the differential geometry of
curves (i.e., they deal with local variations), and are thus not able to
describe more global properties. In addition, the torsion information, and
to some degree also the curvature information, computed by our
implementation was unstable due to the small size of the neighborhood (only
the two closest neighbors) used. Lastly, our turning point detection method
is very simplistic.
The threshold used to segment the trajectory depends on the sampling rate
of the sensors. A more robust detection could be achieved if it were based
on the speed of the motion, and not just on the distance between two
consecutive samples.

Comparison and Analysis
=======================

Most current research in gesture recognition uses neural network or fuzzy
logic algorithms. Neural network approaches generally start with a generic
learning algorithm that begins by recognizing all data as only one gesture.
By "training" the neural network, repeating both correct and incorrect
versions of the gesture and providing feedback on how well the network
predicted the gesture, the researchers allow the network to find the
distinctions between motions that should be recognized as the target
gesture and those that should not. As the network learns the first gesture,
the researchers add gestures to the network's data set until the network
matches all the desired gestures.

Fuzzy logic algorithms use the mathematical concept of fuzzy numbers. For
example, the number 6 is more like the number 5 than the number 10;
therefore, in this example, 6 is a fuzzy 5 (or at least a fuzzier 5). Fuzzy
logic is basically a mathematical codification of such inequalities. Fuzzy
logic algorithms would recognize gestures in much the same way as template
matching. The code would contain logic such as: IF the right arm is 20
degrees (a fuzzy number) above horizontal, and so on, THEN this is gesture
number seven. The user would raise the right arm approximately twenty
degrees above horizontal to satisfy the rule. This is unlike the template
matching scheme in that fuzzy logic does not necessarily specify whether 22
degrees is good enough to recognize gesture number seven; that depends on
the context of the rule. Template matching, however, does not use numbers
whose meaning may change: twenty degrees always means 20 degrees, never
more, never less. Neural networks and fuzzy logic show potential in many
diverse research areas; however, the less complicated method of matching
motion or position is apt to perform more effectively. A template matching
algorithm performs gesture recognition tasks as well (or nearly as well) as
any other known method.

Of the existing studies and projects published in the field of gesture
recognition, the Glove-Talk Pilot Study [Fels 90] most closely parallels
the Army Gesture Recognition Study (this JOVE project) in several key
respects. First, the Glove-Talk Pilot Study contains only 203 hand
gestures. Although this is many more than the Army Gesture Recognition
Study, it is one of the smaller data sets in its field; most natural
gesture sets contain more than 500 gestures (e.g., American Sign Language).
Second, the Glove-Talk hand gestures are static and distinct. Unlike hand
gestures such as pointing at an object, waving, or other gestures where the
orientation of the hand may change the meaning of the gesture, the
Glove-Talk gesture set contains gestures that represent commands,
instructions, warnings, etc. (like the command TAKE_COVER, 2-36). Third,
Glove-Talk obtained its data using a VPL DataGlove. Larger, better funded
studies generally use hardware with many more sensors than exist in the VPL
DataGlove. The Glove-Talk data bandwidth, therefore, is similar to, though
larger than, the Army Gesture Recognition data bandwidth.

 -----------------------------------------------------------------------------
             | The Glove-Talk Pilot Study | The Army Gesture Recognition
             |                            | Pilot Study
 ------------|----------------------------|-----------------------------------
 Hardware    | VPL DataGlove              | 3 Polhemus tracking devices
 Software    | Multi-Layer Neural Net     | Template/Trajectory Matching
 Data Set    | 203 hand gestures          | 5 static arm positions and
             |                            | 6 dynamic arm gestures
 Generality  | Individual user            | Multiple users
 % Correct   | 92%                        | 82% (93% static, 78% dynamic)
 % No Guess  |  7%                        | 14% ( 7% static, 11% dynamic)
 % Wrong     |  1%                        |  8% ( 0% static, 11% dynamic)
             |                            | (all wrong guesses involved
             |                            | gesture 2-45, HERRINGBONE)
 -----------------------------------------------------------------------------
   Comparison of the neural-net solution to the template/trajectory matching
   solution.

As the above chart demonstrates, our pilot study performed comparably to
the Glove-Talk Pilot Study at recognizing static gestures. Further, the
Glove-Talk Pilot Study used several layers of networked analysis to find
its solution in near real time, whereas the Army Gesture Recognition Pilot
Study uses a one-pass analysis, making it faster. Also, the Glove-Talk
system requires training the network with gestures from a specific
individual, who may then use the system; the templates in the Army Gesture
algorithms are static and general enough for many users, without the need
for a training period. The Glove-Talk system does, however, contain almost
19 times as many gestures and can be more easily adapted to a different set
of recognizable gestures.

Conclusions
===========

Preliminary implementations of this recognition system have provided
encouraging results using the three Polhemus 3-D tracking devices. The
template and trajectory algorithms have recognized several of the target
gestures quite well. Further integration of the template matching and
trajectory matching methods into a unified recognition system appears to be
a worthwhile approach to pursue. These two simple algorithms complement
each other, together providing potentially effective recognition of both
static and dynamic gestures. A recognition system using these two
interpreting techniques promises to be fast, accurate, and efficient. A
simple tracking system based on the investigated system could eventually be
developed for field use in the virtual coordination of future troop
operations. As these are preliminary results, further development of this
gesture recognition system will require an ongoing effort.

Appendices
==========

Appendix files are on the ftp site in directory vrtp/Jove/Gestures/*.*

A: Gesture Set (mp1a.tif, mp2a.tif, mp2b.tif, mp3a.tif, mp3b.tif, mp4a.tif)
B: Sensor Position Data (record*.tar, records.fmt)
C: Programs Used For Data Analysis/Manipulation (test.c, gesture.c, gesture.h)

Acknowledgements/Bibliography
=============================

We would like to thank Dr. Ben Shneiderman for inspiring this report's
construction, coordinated virtually in its entirety from four remote sites.
We would also like to express our appreciation to Jack Hsu, Cheryl
Eslinger, Bruce Chih-lung Lin, and Cindy Tonnesen for their efforts in
reviewing this paper.

[Baudel 93] Baudel, T. and Beaudouin-Lafon, M. CHARADE: Remote Control of
Objects Using Free-Hand Gestures. Communications of the ACM 36(7), July
1993, pp. 28-35.

[Murakami 91] Murakami, K. and Taguchi, H. Gesture Recognition Using
Recurrent Neural Networks. Proceedings of ACM CHI'91 Conference on Human
Factors in Computing Systems, 1991, pp. 237-242.

[Bolt 84] Bolt, R. Put-That-There: Voice and Gesture at the Graphics
Interface. Computer Graphics 14(3), July 1980; in Proceedings of ACM
SIGGRAPH 1980.

[Fels 90] Fels, S. Sidney and Hinton, Geoffrey E. Building Adaptive
Interfaces with Neural Networks: The Glove-Talk Pilot Study. Proceedings of
IFIP INTERACT'90: Human-Computer Interaction, 1990, pp. 683-688.

[Weber 90] Weber, Gerhard. FINGER -- A Language for Gesture Recognition.
Proceedings of IFIP INTERACT'90: Human-Computer Interaction, 1990,
pp. 689-694.

[Hauptmann 89] Hauptmann, Alexander G. Speech and Gestures for Graphic
Image Manipulation. Proceedings of ACM CHI'89 Conference on Human Factors
in Computing Systems, 1989, pp. 241-245.

[Weimer 89] Weimer, David and Ganapathy, S. K. A Synthetic Visual
Environment with Hand Gesturing and Voice Input. Proceedings of ACM CHI'89
Conference on Human Factors in Computing Systems, 1989, pp. 235-240.

[Goldberg 91] Goldberg, David and Goodisman, Aaron. Stylus User Interfaces
for Manipulating Text. Proceedings of the ACM SIGGRAPH Symposium on User
Interface Software and Technology, 1991, pp. 127-135.

[Higgins 84] Higgins, C. A. and Whitrow, R. On-Line Cursive Script
Recognition. Proceedings of IFIP INTERACT'84: Human-Computer Interaction,
1984, pp. 139-143.

[Thiel 91] Thiel, David D. The Cue Ball as Part of a Gestural Interface.
Proceedings of ACM CHI'91 Conference on Human Factors in Computing Systems,
1991, p. 463.

[Pittman 91] Pittman, James A. Recognizing Handwritten Text. Proceedings of
ACM CHI'91 Conference on Human Factors in Computing Systems, 1991,
pp. 271-275.