From: waleed@cse.unsw.edu.au (Mohammed Waleed Kadous)
Subject: Re: APPS: Apple Newton, Disabilities and Sign-language Recognition.
Date: 9 Apr 1997 01:50:08 GMT
Message-ID: <5ieskg$765$1@mirv.unsw.edu.au>
Organization: University of New South Wales


Andrew Plumb (Tekmage@io.com) wrote:

: Text-to-Speech.  This part is easy.  When Dragon Systems gets their
: Speech-to-Text applications working on the Newton Platform, all of a sudden
: a deaf person could (again in theory) carry on a "normal" conversation on a
: telephone... or with a blind person, though for this a simple IrDA-equipped
: Braille keypad could close the loop.

I really don't think that's as easy as you make it sound. There are *very*
significant practical problems:

1. Both text-to-speech and speech-to-text are very CPU- and memory-
intensive (although if you only want poor-quality TTS then it's a little
easier). These systems have trouble running on desktop machines, and
you're going to try to do it on a Newton?

2. Dragon Systems' speech recognition system doesn't just work out of the
box with anyone you care to mention. Usually, you need to give it some
initial samples (about three hours) and its performance improves with
time. 

: Who is familiar with the dynamics of sign-language?  I'm not, but I have
: an idea to toss out there.  General Reality Company 
: <http://www.genreality.com/> sells data-gloves, normally used in VR 
: environments. 

No offence, but it's not a new idea at all. James Kramer's PhD thesis in
1991 discussed the idea of using gloves to recognise sign language. I have
a Web page devoted to researchers who are looking at just this and similar
issues (URL: http://www.cse.unsw.edu.au/~waleed/gsl-rec), although some of
them are using different ways of capturing data (e.g. normal cameras). 

A couple of years ago I looked at using PowerGloves to recognise Australian
Sign Language signs (just discrete signs, not continuous signing), with
moderate success: about 80 per cent recognition with 95 signs (if you've
ever used a PowerGlove, you'll understand why it was only moderate).

Thad Starner has also done more extensive work at the MIT Media Lab along
similar but more advanced lines. He was recognising about 40 signs in the
context of grammatically constrained sentences and was getting about 90 per
cent accuracy (per sign) using video cameras.

Right now, I'm using the 5th Gloves you mention above, but with Ascension
Flock-of-Birds trackers to provide 6D position/orientation info. BTW, the
5th Glove on its own only provides 2D orientation information (roll and
pitch, no yaw) and it would be near impossible to recognise signs using
this info alone.

: The latest incarnation communicates over a regular serial
: (RS-232) port and tracks 3D glove pointing direction (not position).
: The MP2000 has two serial ports available on the interconnect port, thus
: able to support both a left and right glove right out of the box.  What
: would it take to implement even simple gesture recognition/sign-language
: translation on the Newton? 

I don't think you can. A very small notebook, perhaps, but I just don't
think the Newton has the grunt or memory required. If you have a large set
of gestures (say 100), then (i) you need an algorithm to recognise the
gestures which can run fast enough on a Newton, and (ii) you need the data
required by the algorithm to be stored in memory (whether it be templates,
HMM weights, whatever).
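
To make (i) and (ii) a bit more concrete, here is a rough sketch in Python
of the simplest recogniser I can think of: one stored template per gesture,
and a nearest-template match. It is only an illustration, not what any of
the systems mentioned in this post actually do. Every number in it (frames
per gesture, channels per frame, bytes per value) is an assumption picked
for the example, not a measurement from any glove or from the Newton.

```python
import math


def template_bytes(num_gestures, frames_per_gesture, channels_per_frame,
                   bytes_per_value):
    """Storage needed to hold one fixed-length template per gesture."""
    return (num_gestures * frames_per_gesture
            * channels_per_frame * bytes_per_value)


def frame_distance(a, b):
    """Euclidean distance between two frames (equal-length sequences)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequence_distance(sample, template):
    """Sum of frame distances; assumes both sequences have the same length."""
    return sum(frame_distance(fa, fb) for fa, fb in zip(sample, template))


def classify(sample, templates):
    """Return the label of the template closest to the sample.

    Cost grows linearly with the number of templates, which is point (i).
    """
    return min(templates,
               key=lambda label: sequence_distance(sample, templates[label]))


# Hypothetical sizes: 100 gestures, 50 frames each, 16 sensor channels,
# 2 bytes per value. These are made-up figures for illustration only.
print(template_bytes(100, 50, 16, 2))  # 160000 bytes

# Toy two-gesture example with 2 frames of 2 channels each.
toy_templates = {
    "hello": [(0.0, 0.0), (1.0, 1.0)],
    "thanks": [(5.0, 5.0), (6.0, 6.0)],
}
print(classify([(0.1, 0.0), (0.9, 1.1)], toy_templates))  # hello
```

Real signing varies in speed, so a practical matcher would need something
like time warping or HMMs rather than fixed-length templates, and that only
pushes the CPU and memory costs up.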

In addition, people think of sign language as a signed version of English.
It's not. It's a language of its own. Translating sign languages to
spoken languages makes translating from spoken languages to spoken
languages (say, Russian to English) look positively easy. Either way,
both are unsolved problems in artificial intelligence.

Plus there are practical problems as well. The gloves need 9V supplies and
my guess is that they would kill the Newton batteries in a couple of
hours. You also don't have position tracking, which is pretty much
necessary for reliable recognition.

In short, for now, forget about it. I don't mean to rain on your parade,
but current off-the-shelf technology just can't do it. But there are
people working on it. In particular, ASEL (Applied Science and Engineering
Labs) at the University of Delaware is looking into it. Even being very
optimistic, I wouldn't expect anything for five or six years.

One last point: Has anyone asked the Deaf if they're really interested in
this sort of technology? My enquiries indicate that the Deaf have mixed
feelings about these technologies, especially with regard to the cost. So
maybe there's no market for them. Virtual Technologies have had a working
fingerspelling recognition system for more than a year now, but it
certainly hasn't taken over the world (well, maybe that has something to
do with the fact that a working system costs something like US$11,000).

Regards,


Waleed.




+----------------------------------------------------------------------------+
|>  Waleed Kadous. RAVE Lab, AI Dept, Computer Science & Engineering, UNSW  <|
|>  e-mail: waleed@cse.unsw.edu.au URL: http://www.cse.unsw.edu.au/~waleed  <|
+----------------------------------------------------------------------------+
