
Hand Centered Studies of Human Movement Project
Technical Report 96-1
Hand Gestures for HCI

Research on human movement behaviour reviewed in the context of hand centred input.

Prepared by: Axel Mulder, School of Kinesiology, Simon Fraser University, February 1996

Acknowledgement: This work was supported in part by a strategic grant from the Natural Sciences and Engineering Research Council of Canada.
© Copyright 1996 Simon Fraser University. All rights reserved.
Contents
Summary

Classifying Hand Movement

Examples
Definitions
Verbal Synonyms
Classifications
Semiotic Hand Movements
Ergotic Hand Movements
Gestural Communication
Evolution of Human Communication
Linguistic Aspects of Gesture
Gesticulation
Sign Language
Ergotic versus Semiotic and Emotions
Handedness
Other Modalities
Hand Gestures for HCI
Definitions
Prior Art
Hand Gesture Interface Design
Standard Hand Gestures
Appendix A: Verbal Synonyms for Hand Movements
References

Summary

This paper focusses on the design issues involved in implementing human computer communication by means of full hand movements, i.e. based on hand position and shape.

A variety of terms for describing and defining hand movements are examined, to gain a better insight into possible ways for classification. Hand movements can be grouped according to function as semiotic, ergotic or epistemic (Cadoz, 1994). Semiotic hand movements can be classified as iconic, metaphoric, deictic, beat-like (McNeill, 1992) and, according to their linguisticity, as gesticulation, language-like, pantomime, emblematic or as sign language (Kendon, 1988). Human communication comes in many modalities. They include speech, gestures, facial and bodily expressions, which appear to implement in close cooperation parts or all of the aspects of the expression, such as temporo-spatial, visual, structural and emotional aspects. Thus, human communication is not only symbolic. Emotional aspects of an expression modulate other aspects of the expression.

Research efforts in the design of gestural interfaces and other types of input devices which capture hand shape and position are reviewed. The incorporation of research results from human movement behaviour as listed above is gradually taking place, although there are still tendencies to ignore the importance of these findings. Difficulties in the design and development of gestural interfaces are discussed taking some of these findings into account. Issues discussed include hand movement tracking needs, context detection, gesture segmentation, feature description and gesture identification. The identification of a method to define a set of standard gestures is addressed.

Classifying Hand Movement

This paper is a reflection of an ongoing effort to examine results of research into human communication through movement to benefit the design and development of computer interfaces that more adequately capture such forms of human communication. Human communication comes in many modalities, including speech, gestures, facial and bodily expressions. A variety of forms of expression, such as poetry, sign language, mimicry, music and dance, exploit specific capacities of one or more of these modalities. This paper focusses on the design issues involved in incorporating human communication by means of hand movements into human computer interaction.

Examples

To refresh the mind, let us look at a random list of examples of hand movements:

praying (two flat hands up together)
begging (flat hand)
expressing anger (raising a fist)
derogation (middle finger up)
accusation (index pointing)
life-or-death decisions in the Roman amphitheater (thumb up/down)
hitch hiking (thumb up, hand moving sideways)
legal and business transactions (handshake, judge hammering)
waving and saluting
counting (fingers and/or hand)
pointing to real and abstract objects and concepts (index, hand)
conducting of an orchestra (variety of both gestures with arms and body)
traffic control of cars and airplanes (hands flat pointing or moving)
shaping of imagined objects (hands tracing out curves and shapes)
martial arts, fighting (variety of movements of arms and body)
dance (Balinese dancing)
gesturing by singers (hand and body movements)
stock exchange operations (various hand shapes)
affective gestures (hand touching)
rejective (index up moving left & right) / appreciative (hand clapping) gestures
game playing (hand signs to communicate with partner in card games)
game scoring (cricket, basketball, soccer, rugby, football)
dinner table actions (commanding waiter to refill wine glass)
positioning of real (remote or close) and abstract objects
control panel operations (mousing, steering a vehicle)
moving, touching and interacting with objects
silent and non-verbal communication (shrugging, holding one's own earlobe, scratching)
"italianate" gestures (two hands open shaking)
mimicry and pantomime (actions and objects are depicted with hand/body movements)
sign language (a complete linguistic communication system)

Definitions

A brief discussion of the word gesture and its possible meanings is appropriate. Gesture has been used in place of posture and vice versa. The tendency, however, is to see gesture as dynamic and posture as static. In prosaic and poetic literature, gesture is often used to mean an initiation or conclusion of some human interaction, where no human movement may be involved. The notion of a musical gesture without actual human movement is quite common. Obviously, musical expression is intimately connected with human movement, hence the existence of such an idiom.

In this paper, a hand gesture and hand movement are both defined as the motions of fingers, hands and arms. Hand posture is defined as the position of the hand and fingers at one instant in time. However, hand posture and gesture describe situations where hands are used as a means to communicate to either machine or human. Empty-handed gestures and free-hand gestures are generally used to indicate use of the hands for communication purposes without physical manipulation of any object.

The motivation for these definitions will become apparent in the course of this paper, while slight nuances will also be added.

Verbal Synonyms

Spoken English language has over the ages incorporated a number of expressions and words that signify hand actions and gestures. The mere fact that these words and expressions exist indicates that the goals they identify, not necessarily the corresponding hand actions and gestures, are common in daily life. McNeill (1992) pointed out that gestures are not equivalent to speech, but that gestures and speech complement each other (this will be further discussed below). In other words, the speech modality may have developed such that certain communication can only be expressed using gestures. Consequently, the available verbal language may not represent a number of common and/or important gestures. In Appendix A a list of words describing hand movements is given. The list can be divided in a number of groups:

goal directed manipulation
changing position of an object
changing orientation of an object
changing shape of an object
contact with the object
joining objects
indirect manipulation (i.e. via other objects)
empty-handed gestures
haptic exploration (cf. Lederman & Klatzky, 1987)

Classifications

It is almost immediately clear from the above discussions that hand movements can be divided into two major groups, one involving communication (such as empty handed gestures), the other involving manipulation and prehension. Somewhat in between lie the hand movements identified as haptic exploration actions. Similar considerations must have led Cadoz (1994) to classify hand movements according to their function:

semiotic: to communicate meaningful information and results from shared cultural experience
ergotic: associated with the notion of work and the capacity of humans to manipulate the physical world, create artefacts
epistemic: allows humans to learn from the environment through tactile experience or haptic exploration
All three functions may be augmented using an instrument, e.g. a handkerchief for a semiotic good-bye movement, a pair of scissors for the ergotic cutting movement, a stick for an epistemic poking movement. For the purpose of this paper we can by definition substitute gestures for semiotic hand movements. Hand actions commonly identified as prehension are a subset of ergotic hand movements.
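As an illustration only, the following sketch (in Python; the names are ours, not Cadoz's) shows how this functional classification might be encoded in the data model of a hand movement interface, with the optional instrument as an attribute:

    from dataclasses import dataclass
    from enum import Enum, auto
    from typing import Optional

    class HandMovementFunction(Enum):
        SEMIOTIC = auto()   # communicates meaningful information
        ERGOTIC = auto()    # manipulates the physical world
        EPISTEMIC = auto()  # learns from the environment through touch

    @dataclass
    class HandMovement:
        function: HandMovementFunction
        # Any of the three functions may be augmented with an instrument:
        instrument: Optional[str] = None

    # A semiotic good-bye movement augmented with a handkerchief:
    goodbye = HandMovement(HandMovementFunction.SEMIOTIC, instrument="handkerchief")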
Similarly, Thieffry, in Malek et al (1981), classifies hand movements as:

transitive
intransitive
Transitive hand movements are part of an uninterrupted sequence of interconnected structured hand movements that are adapted in time and space, with the aim of completing a program, such as prehension. These hand movements could be equally classified as Cadoz's ergotic hand movements.
Intransitive hand movements or gestures have a universal language value, especially for the expression of affective and aesthetic ideas. Such gestures can be indicative, exhortative, imperative, rejective, among others. The gesture alone expresses fully the intention and motivation of its author. These gestures could be equally classified as Cadoz's semiotic hand movements.

We can further classify Cadoz's semiotic hand movements or gestures and ergotic hand movements.

 

Semiotic Hand Movements

Many researchers consider gestures, or semiotic hand movements, as intimately connected with speech, and some conclude that speech is complementary to gesture.

McNeill (1992) compares his classification scheme with those of a number of other researchers (Efron, Freedman, Hoffman, Ekman and Friesen) and concludes that all are using very similar categories. He classifies gestures as follows:

iconics: gestures depicting a concrete object or event and bearing a close formal relationship to the semantic content of speech
metaphorics: as iconics but depicting an abstract idea
deictics: gestures pointing to something or somebody either concrete or abstract
beats: gestures with only two phases (up/down, in/out) indexing the word or phrase it accompanies as being significant
Any of these gestures can be cohesive gestures or gestures that tie together thematically related but temporally separated parts of the discourse.
Kendon (1988) classifies gestures along a continuum, discussed more in depth below:

1. gesticulation: idiosyncratic spontaneous movements of the hands and arms during speech
2. language-like gestures: like gesticulation, but grammatically integrated in the utterance
3. pantomime: gestures without speech used in theater to communicate a story
4. emblems: "italianate" gestures (e.g. insults and praises)
5. sign language: a set of gestures and postures for a full fledged linguistic communication system

Nespoulos & Lecours (1986) take a more detailed approach and suggest a three level scheme for classification:
arbitrariness: classifies gestures with respect to their universality
  arbitrary gestures: uncommon gestures, which need to be learned
  mimetic gestures: more common gestures, within a culture
  deictic gestures: as McNeill's deictics
    specific: pointing to a particular object
    generic: pointing to a whole class of objects
    function indication: pointing to an object and implying an action
function: classifies gestures with respect to their use
  quasilinguistic expression: gestures in absence of any verbal behaviour
  coverbal expression: gestures in presence of verbal behaviour
    illustrative: depict concepts which are verbally expressed
    expressive: expressing emotions during verbal behaviour
    paraverbal: emphasizing verbal elements within a discourse
  social interaction: constitute pragmatic elements of interaction strategies
    phatic: involving gestural and/or visual activity
    regulatory: to indicate attention to others, for example
  metacommunication: modulating the speaker's own verbal behaviour (e.g. negation)
extracommunication: gestures without semiotic value
This approach seems of limited use due to its lack of clear distinctions. While it summarizes a number of gestural behaviours, it does not suggest some form of underlying structure or model for the processes involving gestural expression. Nevertheless, the concept of arbitrariness of gestures is duly noted. It refers to the fact that gestures may be somewhat formalized and generally recognizable by others, but such formalization exists only within a culture. There is no such thing as a universal gesture. Kendon's continuum brings forward an apparent principle, that of different levels of linguisticity of gestures. McNeill's classification is clearly recognizable and, like Kendon's, emphasizes the strong connection between speech and gesture.


Ergotic Hand Movements
It seems likely that the oldest purpose of our hands is to manipulate the physical world, such that it better suits our needs. In terms of objects of a size of the order of our hands, we can change the object's position, orientation and shape. Objects can be solid, fluid or gaseous. Therefore ergotic hand movements can be classified according to physical characteristics:

object type: solid, fluid, gaseous
change effectuated: position, orientation, shape
how many hands are involved: one or two
indirection level: direct manipulation or through another object or tool
The value of such a classification is limited due to the fact that no reference is made to the task at hand, although the indirection level bears some relation to the notion of a task. Basically, the model of ergotic hand movements suggested by this classification omits the importance of cognitive processes in such hand movements.
It is more common to classify ergotic hand movements according to their function, i.e. as either prehensile or non-prehensile. Non-prehensile movements include pushing, lifting, tapping and punching. Mackenzie (1994) defines prehension as the application of functionally effective forces by the hand to an object for a task, given numerous constraints. While various taxonomies exist, one readily recognizable classification scheme (Napier, 1993) identifies a prehensile movement as either a:

precision grip
power grip
hook grip
scissor grip
The type of grip used in any given activity is a function of the activity itself and does not depend on the shape or size of the object to be gripped, although in extreme cases this does not always hold. While this classification relates to the musculo-skeletal properties of the hand, notably opposition, it incorporates the notion of a task, such actions requiring precision or power. However, neither the scissor grip nor the hook grip can be related in a similar way to the notion of a task. These grips merely refer to a frequently used hand movement. The classification is therefore somewhat ambiguous.
Pressing (1991) lists some more ways to classify ergotic hand movements:


1. use of control effect: modulation (parametric change), selection (discrete change), or excitation (input energy)
2. use of kinetic images: scrape, slide, ruffle, crunch, glide, caress etc.
3. use of spatial trajectory: up, down, left, right, in, out, circular, sinusoidal, spiral, etc.
Classification 1 is based on a control task taxonomy and is, for specific purposes, useful. Classification 2 puts the emphasis on the observer's point of view and extracts the semiotic function of the hand movement, although the movement may be purely ergotic from the executer's point of view. It demonstrates that the semiotic function of hand movements is always present. Classification 3 omits hand shape as a parameter and has limitations similar to the first classification discussed above.


--------------------------------------------------------------------------------


Gestural Communication

 

Evolution of Human Communication

Kimura (1993) and others have pointed out that there is evidence that (hand) gestures preceded speech in the evolution of communication systems amongst hominids. This finding supports the modeling of gesture and speech as forms of expression generated by a system where formalized linguistic representation is not the main form from which gestures are derived. Instead, it is conjectured by McNeill (1992) that gestures and speech are an integrated form of expression of utterances where speech and gestures are complementary.

 

Linguistic Aspects of Gesture

Many have investigated the relation between human gestures and speech. Kendon (1988) ordered gestures of varying nature along a continuum of "linguisticity":

Gesticulation - Language-like gestures - Pantomimes - Emblems - Sign languages

Observe that while going from gesticulation to sign languages:

the obligatory presence of speech declines
the presence of language properties increases
idiosyncratic gestures are replaced by socially regulated signs
In other words, the formalized, linguistic component of the expression present in speech is replaced by signs going from gesticulation to sign languages. This supports the idea that gesture and speech are generated by one integral system as suggested above.
In an effort to further define the underlying structure of gestures, McNeill, Levy and Pedelty (1990) propose a diagram (based upon Kendon's work) that clarifies the relations between the units at each level of the speaker's gestural discourse. Each unit consists of one or more of the units of a (higher numbered) level:


1. consistent arm use and body posture
2. consistent head movement
3. gesture unit
4. gesture phrase
5. preparation, optional hold, stroke, optional hold, retraction
From this structure it can be seen that quite often the word gesture is used to identify the stroke. Perhaps this simplification is due to the fact that most of the linguistic aspects of the expression are communicated in the stroke. In a similar vein, it should be noted that a hand posture often involves a preparation and retraction phase (cf. the OK sign). Therefore the definition of a hand posture as solely the position of hand and fingers at one particular point in time is somewhat misleading. Sequences of postures occur with other hand movements in between.
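A minimal sketch (Python; names are ours) of the lowest level of this hierarchy: a gesture phrase as a phase sequence matching the pattern preparation, optional hold, stroke, optional hold, retraction:

    import re
    from enum import Enum

    class Phase(Enum):
        PREPARATION = "p"
        HOLD = "h"
        STROKE = "s"
        RETRACTION = "r"

    def is_wellformed_phrase(phases):
        """True if the phases follow the pattern:
        preparation, optional hold, stroke, optional hold, retraction."""
        return re.fullmatch("ph?sh?r", "".join(p.value for p in phases)) is not None

    # The stroke is the phase most often referred to as "the gesture":
    assert is_wellformed_phrase([Phase.PREPARATION, Phase.STROKE, Phase.RETRACTION])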


Gesticulation

McNeill (1992) concluded that there is no body "language", but that instead gestures complement spoken language. In Kendon's (1980) words: the phrases of gesticulation that co-occur with speech are not to be thought of either as mere embellishments of expression or as by-products of the speech process. They are, rather, an alternate manifestation of the process by which ideas are encoded into patterns of behaviour which can be apprehended by others as reportive of those ideas. Such hand movements voluntarily but also involuntarily convey extra information, besides speech, about the internal mental processes of the speaker. Obviously, McNeill is concerned with gestures similar to gesticulation as defined in Kendon's continuum. McNeill supports his conclusion above by finding that gesticulation-type gestures have the following non-linguistic properties:

the meaning of a gesture is determined by its global appearance (i.e. the gesture as a whole) and synthetic qualities (i.e. gesture segments only convey meaning when they appear together as one single gesture)
gestures are non-combinatoric, i.e. do not form larger, hierarchically structured gestures
they are context-sensitive, i.e. different gestures may refer to the same entity and they may address only the salient and relevant aspects of the context
no standards of form, i.e. different speakers display the same meaning in idiosyncratic ways
timing of gestures is synchronized with speech; they anticipate speech in their preparation phase and synchronize with it in the stroke phase
Due to the lack of linguistic features of gesticulation, such gestures cannot be analysed with the tools developed for studying spoken language, and caution must be taken when summarizing the pragmatics, semantics and syntax of such gestures. McNeill's arguments supporting the hypothesis that gesture and spoken language are a single system are based on the following findings:
gesticulation occurs only during speech
gesticulation and speech are semantically and pragmatically =o-expressive
gesticulation and speech are synchronous
gesticulation and speech develop together in children
gesticulation and speech break down together in aphasia

Sign Language

Examples of sign languages are the American Sign Language (ASL) and the Deaf and Dumb Language. ASL is an amalgam with French Sign Language. Other systems of formally coded hand and arm signals are pidgin or creole languages, and a gesture language used by the women of the Warlpiri, an Aboriginal people living in the north central Australian desert.

In ASL, the prevalent form of signing consists of unilateral or bilateral series of movements usually involving the whole arm. Typically, a particular hand shape is moved through a pattern in a location specified with respect to the body. Each sign roughly corresponds to a concept such as a thing or an event, but there is not necessarily an exact equivalence with English words or with words of any spoken language. A native sign language like ASL is therefore quite different from a manual depiction of spoken language, such as signed English, or co-verbal gesticulation.

A manual sign is claimed to be distinguished from other signs by 4 features (Stokoe, 1980):

hand posture
hand/arm orientation with respect to the body
location on the body (2D collapsed view of the signer as seen by an observer)
movement
Sign languages exhibit language-like properties since they are used by people who have to communicate symbolically encoded messages without the use of the speech channel. They exhibit the following language-like properties (McNeill, 1992):
segmentation and combination, i.e. meaning complexes are either analyzed into segments or constructed by combining segments
lexicon formation, i.e. segments recur in the same form in different contexts
syntax including paradigmatic oppositions, i.e. combinations of segments adhere to standard patterns, including organization into contrasting sets
distinctiveness, i.e. details are added to the form of segments solely to distinguish segments from other segments
arbitrariness, i.e. segments are used to refer to entities and events in contexts where their iconicity is ruled out
standards of well formedness, i.e. signs and/or combinations of signs are held to standards of form
utterance-like timing, i.e. segments are produced with a timing that reveals them to be the final temporal stage of the process of utterance construction, rather than the utterance's primitive form
sociolinguistically embedded, i.e. a community exists that understands the signs and sign combinations without perpetual metalinguistic explanation
In sign languages, meaning can be modulated (e.g. the addition of emotional expression) by varying the following parameters (personal communication with sign language instructors and Klima & Bellugi, 1979):
speed of gesturing
size of gesture space
facial expressions
number of repetitions (not more than 4 or 5) or duration
tension of gesturing
hold-time of a posture while signing
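As a sketch only (Python; the field names and value types are our assumptions, not Stokoe's or Klima & Bellugi's), the four distinguishing features and the modulation parameters above might be represented as:

    from dataclasses import dataclass

    @dataclass
    class ManualSign:
        # Stokoe's (1980) four distinguishing features:
        hand_posture: str   # e.g. "flat hand"
        orientation: str    # hand/arm orientation with respect to the body
        location: str       # location on the body, as seen by an observer
        movement: str       # movement pattern

    @dataclass
    class SignModulation:
        # Parameters that modulate meaning while signing:
        speed: float = 1.0            # relative speed of gesturing
        space_size: float = 1.0       # relative size of the gesture space
        facial_expression: str = "neutral"
        repetitions: int = 1          # typically not more than 4 or 5
        tension: float = 0.0          # tension of gesturing (0 = relaxed)
        hold_time: float = 0.0        # seconds a posture is held

    sign = ManualSign("flat hand", "palm outward", "shoulder", "side to side")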

Ergotic versus Semiotic and Emotions

The relation between the semiotic and ergotic function of hand movements, how they differ, and their possible (mutual) dependencies in terms of neuro-motor systems have barely been researched. Kimura (1993) suggests that manual praxis is essential for signing; however, manual praxis and signing are not identical. This suggests a model where gestural communication is a higher level in the hierarchy of systems involved in the creation of hand movements. As discussed in the section on ergotic hand movements, it appears that the semiotic function is always present, whether consciously intended or not. This can be explained by remembering that for communication a sender and receiver are needed. The receiver can always decide to interpret a signal that the sender unintentionally has sent. Emotion functions in human communication as a means for modulating the semiotic content, such as emphasis. The expression of emotions during ergotic movements adds some semiotic content to the actions so that the hand movements are differently interpreted by an observer. This procedure may be used, either by the sender (by amplifying the emotional content, such as the dropping of items to draw attention to an issue or a problem) or the receiver (by amplifying the focus on emotional content, such as the initial remarks in conversation as in "you seem rather tense today... anything wrong?") to initiate a communication with more semiotic content, usually of verbal nature. Perhaps emotions could therefore be seen as a bridge between ergotic and semiotic movements.

 

Handedness

As far as cortico-spinal systems are concerned, arm, hand and finger movements are controlled contralaterally, while arm and shoulder movements may also be controlled ipsilaterally, i.e. proximal movements can be controlled contra- as well as ipsilaterally, while distal movements are only controlled contralaterally. The left hemisphere is specialized for complex movement programming, i.e. manual praxis. Consequently, movements involving identical commands to the two limbs, i.e. motor commands resulting in mirror image movements, whether they are temporally coinciding or not, are more frequent in natural gestural communication. In contrast, different, simultaneously expressed hand postures occur frequently due to the more disparate control of the distal musculature. The fact that the manual praxis system is left hemisphere based is also thought to be the origin of the prevalent right-handedness and is supported by findings such as the preference of signers to use the right hand if only one hand can be used and the correlation between sign language aphasia, manual apraxia and left hemisphere damage. Although the left hemisphere is essential for selecting movements, the left hand is thought to be better in executing independent finger movements than the right hand (Kimura, 1993).

Since the human brain exhibits different capabilities for the right and the left hemisphere, differences can be expected between the capabilities of the left and right hand. This is not as noticeable at the physical level, i.e. in ergotic hand movements, but more apparent in communicating with the hands. Our left hand, it can be speculated, may perform better in expressing holistic concepts and dealing with spatial features of the environment, while our right hand may perform better in communicating symbolic information.

Napier (1993) discusses handedness and suggests that the dominance of right handedness (in the order of 90% are right handed) gradually evolved from a slight left handedness in nonhuman primates, perhaps under the influence of social and cultural pressures, and not so much due to capabilities of the right hand superseding those of the left hand. While an exception, some humans are known to be almost perfectly ambidextrous.

 

Other Modalities

From the above, a clear relation can be seen between gestures and speech as well as body posture. Napier (1993) explicitly includes facial expressions and bodily movements when examining the use of gestures as:

accompaniment to normal speech
a substitute for a foreign language
a substitute for normal speech
a substitute where normal speech becomes inaudible, disadvantageous or dangerous
an accompaniment to certain professional activities, e.g. by actors, dancers, and political speakers, to supplement or to replace the spoken word
It is well known that sign language can be almost entirely replaced by facial expressions. In musical conducting, the integral body posture and dynamics are by many considered an integral part of the expression of the conductor. A more formalized conducting method (Saito, 199?) prescribes that only gestures of the upper body are allowed.
Human communication consists of a number of perceptually distinguishable channels which operate through different modalities. It appears that a single system underlies this communication which can direct aspects of the expression through one modality while using other modalities for other aspects of the expression. Each modality has its intrinsic limitations due to constraints of musculo-skeletal and neuro-motor nature, for example. In co-verbal gesticulation, for instance, these aspects are the structured content of the expression as present in linguistic forms. Both hand gestures and speech can be used to express such linguistic forms. Further research is needed to establish more detail in these aspects of human communication and how they are distributed amongst the various modalities and under which conditions. It can be easily observed that, in humans with speaking ability, speech is the most efficient channel for structural aspects (relating concepts and identifying how) of expression, while temporo-spatial and visual representation aspects (identification of where, when, which and what) are more easily conveyed through hand gestures. Perhaps posture, facial expression as well as gestures are best applied for communication of emotional aspects (modulation of concepts).

Hand Gestures for HCI

Definitions

In the HCI literature the word gesture has been used to identify many types of hand movements for control of computer processes. Perhaps to avoid confusion, Sturman (1993) defines whole hand input as the full and direct use of the hand's capabilities for the control of computer-mediated tasks, thereby making a more precise indication of which type of human movement is involved (i.e. not just positioning but also hand shape) as well as for which purpose they are applied. By using the word input the association is made with information theory, conceiving of hands as devices which output information to be received and interpreted by another device. Although the link is made with the semiotic function of hand movements, the hand is really a communication device, i.e. it can both receive and send information. The limitation to capabilities of the hand only, however, excludes the suggestion that semiotic hand movements should be considered part of an integrated system for human expression. In fact, capabilities of the hand strongly suggests a focus on musculo-skeletal capabilities since those are literally the capabilities of the hand. As such the definition overlooks the neuro-motor systems and cognitive abilities involved in hand movements. Somewhat more appropriate, hand centred input, as a shorthand for human computer interaction involving hand movements, emphasizes the context as created by other modalities. However, it is not specific as to the type of hand movements; gesturing with a mouse, empty-handed gestures or movements with a joystick are all included.

None of these terms address the distinction between semiotic and ergotic hand movements, let alone refer to the existence of epistemic hand movements. In the following the focus will be on the use of hand movements to their full extent, i.e. as few limitations as permitted by the current state of the art movement tracking technology will be imposed on the hand movements, in computer-mediated tasks. Such applications of hand movements will be called whole hand centred input. To include a reference to epistemic hand movements, input could be replaced by communication. A hand gesture interface will mean an HCI system employing whole hand centred input which specifically exploits the semiotic function of hand movements. Certain applications may however not exploit the full capabilities of the hand. Mouse gesturing and joystick controlling amongst others will not be examined.

 

Prior Art
The following applications that mainly exploit the ergotic function of hand movements have been found in the literature:

Sculpturing and design of 3D surfaces (Kramer, 1995)
Generic object manipulation (Bergamasco, 1994)
Robot control (Brooks, 1988; Sturman, 1993; Katkere et al, 1994)
Virtual panel control (Augustine Su et al, 1994)
Molecule manipulation (Brooks, 1988)
3D interaction in (scientific) visualization (Bryson, 19??; Brooks, 1988)
Financial data manipulation (Feiner & Beshers, 1990)
Musical instrument performance (Machover & Chung, 1989; Mulder, 1994)
Hand impairment measurement (Zimmerman, 1987)
Other possible applications not found in the literature include sound design, stage sound mixing and game playing.
The following applications that mainly exploit the semiotic function of hand movements have been found in the literature:

Navigation (Bolt, 19?? and many others)
Articulated figure animation (Sturman, 1993)
Music conducting (Machover & Chung, 1989; Morita et al, 1991)
Sign language interpretation (Kramer, 1995; Fels, 1994; Starner, 1995)
Robot control (Pook, 1995; Papper, 1993; Katkere et al, 1994)
Audiovisual device control (Baudel, 1993)
CAD design (Harrison, 1993)
Other possible applications not found in the literature include airplane and other traffic control, game scoring and playing as well as stock trading and legal transactions (Hibbits, 1995).
Often applications only implement the use of hand signs or postures. In the case of Pook (1995), the use of hand signs which indicate to the robot to execute a movement pattern to fulfill a task is questionable, since such a mapping could be more easily implemented using simple function keys on a keyboard. The value of the approach is that it allows the user to act more naturally since no cognitive effort is required in mapping function keys to robotic hand actions. Applications involving navigation essentially implement deictic gestures. The applications involving sign language are technically impressive, but, since they basically implement a nicely formalized system of human communication of use for a relatively small (but important) group of people only, they do not provide us with many clues as to the interpretation, definition or modeling of gestures in computer-mediated tasks.

The research into the use of hand gestures with speech (or speech with gestures) has gained special attention. Cavazza (1995), Wexelblatt (1995), Hauptmann & McAvinney (1993), Sparrell (1993) and Cassell et al (1994) all implement gestural communication theory as developed by McNeill, Kendon, Birdwhistell and others. Sparrell (1993) implemented a system for the interpretation of co-verbal iconic gestures as defined by McNeill (see above).

Multimodal interaction, involving not only hand gestures and speech, but also facial expressions and body posture, is another distinguishable subject, researched by Gao (1995), Maggioni (1995), Hataoka et al (1995) and Bohm (1995) amongst others. These research efforts are generally rather poor from a human behaviour researcher's point of view, since they simply build a system which is able to detect each modality; the information is then checked against that of the other modalities, so that each modality is basically interpreted as communicating the same information. Little or no effort is made to integrate the system's abilities using a somewhat sophisticated model of human expression, e.g. where each modality is deemed to express specific aspects of the expression, which cannot simply be used to confirm the correct interpretation of another modality, such as discussed above.

 

Hand Gesture Interface Design

Using the above classification of and investigation into hand movements, we can now proceed with evaluating and analysing how hand movements have been used in processes mediated through computers. Such an evaluation is of interest since the following problems in the design and implementation of whole hand centred input applications have remained unsolved:

tracking needs: for most applications it is still not known what these needs are (e.g. how much accuracy is needed and how much latency is acceptable) when measuring hand motions
occlusion: since fingers can be easily hidden by other body parts, sensors have to be placed nearby in order to capture their movements accurately
size of fingers: fingers are relatively small, so that sensors, if they are placed nearby, need to be small too, or, if sensors are placed remotely, need a lot of detail
context detection: at what level of abstraction should the movement data be interpreted, how to detect whether a hand movement is semiotic or ergotic
gesture segmentation: how to detect when a gesture starts/ends, how to detect the difference between a (dynamic) gesture, where the path of each limb is relevant, and a (static) posture, where only one particular position of the limbs is relevant
feature description: which features optimally distinguish the variety of gestures and postures from each other and make recognition of similar gestures and postures simpler
gesture identification: what do certain gestures mean, how can they be reliably interpreted so that the correct actions are undertaken
In general, many applications have been technology driven, and not based on knowledge of human behaviour and/or a proper task analysis. In many cases there is no consideration given to the different functions of hand movements, particularly the semiotic and ergotic functions.
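The open problems above can be read as the stages of a recognition pipeline. The following skeleton (Python; all stage functions are hypothetical stubs of our own, not an existing system) fixes only the order of the stages:

    def track_hand(samples):
        # Tracking needs, occlusion and finger size live here.
        return list(samples)

    def is_semiotic(frames):
        # Context detection: semiotic versus ergotic movement.
        return True

    def segment(frames):
        # Gesture segmentation: one segment per gesture or posture.
        return [frames] if frames else []

    def describe(segment_frames):
        # Feature description: features that distinguish gestures.
        return {"n_frames": len(segment_frames)}

    def identify(features):
        # Gesture identification: map features to a meaning.
        return "unknown"

    def recognize(samples):
        frames = track_hand(samples)
        if not is_semiotic(frames):
            return []  # ergotic movement: no gestural meaning to extract
        return [identify(describe(s)) for s in segment(frames)]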
From the many aspects discussed in the foregoing, some basic comments can be made for better design of hand gesture interfaces:

a better analysis of the functionality (ergotic, semiotic, or both) that is to be exploited of the hand movements will help to establish the minimal tracking needs; if only the semiotic function is exploited, requirements for the resolution may be much less than necessary for ergotic functions, which quite often require the ability of the hand to perform accurate continuously changing positioning movements, whereas, for example, hand postures may require only the detection of a few discrete positions
hand gestures must be interpreted in the context of other modalities of expression; the monitoring of activity of each modality will provide valuable information for the detection of the context, i.e. which functions of the hand movements are exploited, e.g. the mere presence of speech combined with absence of prehension points to expression in the form of co-verbal gesticulation
the inclusion of the speech channel in the gestural analysis will simplify gesture segmentation and feature detection, since these processes can then address the fact that hand gestures contain a varying amount of linguistic features depending on the presence of speech, e.g. the implementation of the various levels of gesturing during discourse as initially proposed by Kendon may provide a method for approaching the segmentation problem in cases of co-verbal gesticulation
hand gestures must be interpreted in the context of the culture of the gesturing person; gesture segmentation and feature detection will be simplified since there are fewer possible alternatives for the interpretation of the gestures
if the emotional aspect in movement is defined as the deviation from normal movement patterns (compared against either personal, social or cultural standards), such as larger amplitudes, abnormal accelerations and velocities, this information could be used to detect context switching between ergotic and semiotic functions of hand movements; Fels (1993) has used acceleration for segment detection
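As a sketch of the last point (our own minimal construction, not a reproduction of Fels' method; the threshold and sampling interval are arbitrary illustrative values), segment boundaries could be placed wherever the finite-difference acceleration of a position signal exceeds a threshold:

    def segment_by_acceleration(positions, dt=0.01, threshold=5.0):
        """Split a sequence of 1D hand positions into segments at samples
        where the acceleration magnitude exceeds a threshold."""
        segments, current = [], [positions[0]]
        for i in range(1, len(positions) - 1):
            # Central finite difference: a[i] = (x[i+1] - 2 x[i] + x[i-1]) / dt^2
            accel = (positions[i + 1] - 2 * positions[i] + positions[i - 1]) / dt ** 2
            current.append(positions[i])
            if abs(accel) > threshold:  # abrupt change: close the current segment
                segments.append(current)
                current = []
        current.append(positions[-1])
        segments.append(current)
        return segments

    # A smooth drift with one sudden jump yields more than one segment:
    xs = [0.0, 0.01, 0.02, 0.03, 0.5, 0.51, 0.52]
    assert len(segment_by_acceleration(xs, dt=1.0, threshold=0.1)) > 1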

Standard Hand Gestures

Many applications can be criticized for their idiosyncratic choice of hand gestures or postures to control or direct the computer-mediated task (Baudel, Harrison). However, the choice was probably perfectly natural for the developer of the application! This shows the dependence of gestures on their cultural and social environment. For specialized, frequent tasks, where the learning of a particular set of gestures and postures is worth the investment, such applications may have a value. In everyday life, however, it is quite unlikely that users will be interested in a device for which they have to learn some specific set of gestures and postures, unless there is an obvious increase in efficiency or ease of use over existing methods of hand centred input in the adoption of such a gestural protocol. On the other hand, the economics of the marketplace may dictate such a set independent of its compatibility with existing cultural and/or social standards, just like the keyboard and mouse have set a standard. Especially when users are allowed to expand or create their own sets, such a protocol may gain some acceptance.

The problem in defining a possible standard for a gesture set is that it is very easy to let the graphical user interface dictate which type of hand movements are required to complete certain computer-mediated tasks. However, the computer can be programmed to present tasks in a variety of graphical and auditory ways. The aim is to make the computer and its peripherals transparent, meaning that the tasks are presented in a way that execution of these tasks is most natural. Unfortunately the concept of naturalness may apply for ergotic hand movements, but semiotic hand movements prohibit such naturalness for all of humankind, unless the system knows which culture the user belongs to.

Zimmerman, in a personal communication, suggested only seven parameters are needed to describe hand movements for the bulk of the applications: hand position and orientation and hand grip aperture, where the last parameter could be just binary, i.e. open/close. Such a standard seems to put a lot of emphasis on the ergotic function only of hand movements and will severely restrict access to the semiotic function. Augustine Su (1994) suggests touching, pointing and gripping as a minimal set of gestures that need to be distinguished. At least they include a reference to the tasks involved, but they do not make any reference to the semiotic function of hand movements. An analysis of sign language is needed to extract the very basic gestures and postures that minimize the amount of learning required of the user. An initial parametrization is proposed by Stokoe (1980), as discussed above. Interestingly, his 4 features are mostly body-centred, which would suggest that for the semiotic function a body-centred coordinate system is more appropriate when analysing gestures. On the contrary, for ergotic functions, a world-based coordinate system seems more obvious.
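A minimal sketch of Zimmerman's suggested seven-parameter description (Python; the field names and the Euler-angle choice for orientation are our assumptions):

    from dataclasses import dataclass

    @dataclass
    class HandState:
        # Hand position (three parameters):
        x: float
        y: float
        z: float
        # Hand orientation (three parameters):
        roll: float
        pitch: float
        yaw: float
        # Grip aperture (one parameter); could be reduced to binary open/close:
        aperture: float

    open_hand = HandState(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, aperture=1.0)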

These suggestions for standards may be combined if some way is found to recognize the relevance of either the ergotic or semiotic function or both during a specific hand movement. Perhaps the analysis of emotional behaviour can provide clues. For a gesture set to gain major acceptance in the marketplace, it is advisable to examine the tasks and semiotic functions most frequently executed and then choose a hand gesture set that seems to appear natural, at least to a number of different people within a social group or even a culture, when executing those tasks and functions. Simple market economics will then do the rest.

References

Augustine Su, S. and Richard Furuta (1994). A Logical Hand Device in Virtual Environments. Virtual Reality Software & Technology: Proceedings of the ACM VRST'94 Conference, edited by Gurminder Singh, Steven K. Feiner, & Daniel Thalmann, pages 33-42, World Scientific Publishing Co., Singapore, 1994 (Singapore, August 23-26, 1994)

Augustine Su, S. and Richard Furuta (1994). A Specification of 3D Manipulations in Virtual Environments. ISMCR'94: Topical Workshop on Virtual Reality, Proceedings of the Fourth International Symposium on Measurement and Control in Robotics, pages 64-68, NASA Conference Publication 10163, November 1994 (Houston, Texas, November 30 - December 3, 1994)

Augustine Su, S. (1993). Hand Modeling in Virtual Environment. Master's Scholarly Paper (no presentation), Department of Computer Science, University of Maryland, College Park, Maryland, 1993

Augustine Su, S. and Richard Furuta (1993). The Virtual Panel Architecture: A 3D Gesture Framework. Proceedings of the 1993 IEEE Virtual Reality Annual International Symposium (VRAIS'93), pages 387-393, IEEE, 1993 (Seattle, Washington, September 18-22, 1993)

Baudel, T., Beaudouin-Lafon, M. (1993). Charade: Remote control of objects using free-hand gestures. Communications of the ACM 36(7), p29-35.

Baudel, Thomas (1991). Spécificités de l'interaction gestuelle dans un environnement multimodal. IHM'91, p. 11-16, 1991

Baudel, Thomas (1994). A Mark-Based Interaction Paradigm for Free-Hand Drawing. ACM-SIGGRAPH & SIGCHI, Proc. ACM Symposium on User Interface Software and Technology (UIST), 1994

Baudel, Thomas and Annelies Braffort (1993). Reconnaissance de gestes de la main en environnement réel. EC2, L'interface des mondes réels et virtuels, Montpellier, France.

Baudel, Thomas and Beaudouin-Lafon, Michel and Annelies Braffort and Daniel Teil (1992). An Interaction Model Designed for Hand Gesture Input. Technical report no. 772, LRI, Université de Paris-Sud. Available through anonymous ftp.

Baudel, Thomas and Yacine Bellik and Jean Caelen and Stéphane Chatty and Joelle Coutaz and Francis Jambon and Solange Karsenty and Daniel Teil (1993). Systèmes d'analyse des interactions Homme-Ordinateur. IHM'93.

Bergamasco, M. (1994). Manipulation and exploration of virtual objects. In: Magnenat-Thalmann, N., Thalmann, D., Artificial life and virtual reality. Wiley.

Bohm, K.; V. Kuehn, J. Zedler (1995). Multimodal interaction in virtual environments. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Bohm, K.; Vaananen, K. (1993). Gesture driven interaction as a human factor in virtual environments - an approach with neural networks. In: Earnshaw, R. (ed), Virtual reality systems, p 93-107. London, UK: Academic Press.

Brooks, F. (1988). Grasping reality through illusion: Interactive graphics serving science. Proceedings CHI '88 Conference - Human Factors in Computing Systems, p 1-11. New York, USA: ACM.

Cadoz, C. (1994). Les réalités virtuelles. Dominos, Flammarion.

Cassell, J.; M. Steedman, N.I. Badler, C. Pelachaud, M. Stone, B. Douville, S. Prevost, B. Achorn (1994). "Modeling the Interaction between Speech and Gesture", Proceedings of the 16th Annual Conference of the Cognitive Science Society, Georgia Institute of Technology, Atlanta, USA.

Cavazza, M. (1995). Integrated semantic processing of speech and gestures. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Coutaz, J., Crowley, J. (1995). Interpreting human gesture with computer vision. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Feiner, S.; Beshers, C. (1990). Worlds within worlds: metaphors for exploring n-dimensional virtual worlds. Proceedings User Interface Software and Technology '90, p 76-83. New York, USA: ACM.

Fels, S. Sidney (1990). Building adaptive interfaces with neural networks: the glove-talk pilot study. Technical report CRG-TRG-90-1. University of Toronto, Toronto, Canada.

Fels, S. Sidney; Hinton, Geoffrey E. (1990). Building Adaptive Interfaces with Neural Networks: the Glove-Talk Pilot Study. Human Computer Interaction - INTERACT '90, D. Diaper et al (editors), IFIP, pp 683-688. Elsevier Science Publishers, Amsterdam, NL.

Fels, S.S. (1994). Glove-Talk II: Mapping hand gestures to speech using neural networks - An approach to building adaptive interfaces. PhD thesis, University of Toronto, Toronto, Canada. ftp cs.toronto.edu in pub/ssfels/phdthesis.short.ps.Z

Fels, S.S., Hinton, G.E. (1993). Glove-Talk: a neural network interface between a data-glove and a speech synthesizer. IEEE Transactions on Neural Networks, 4 (1): 2-8.

Fels, S.S., Hinton, G.E. (1994). Glove-Talk II: Mapping hand gestures to speech using neural networks. Proceedings of the Conference on Neural Information Processing Systems (NIPS).

Gao, W. (1995). Enhancement of human-computer interaction by hand gesture recognition. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Gao, W. (??) On human body language understanding. ???

Gao, W., Brooks, R. (??). Hand gesture recognition for enhanced human computer interaction. ???

Harrison, D., Jaques, M., Strickland, P. (1993). Design by manufacture simulation using a glove input. In: Warwick, K., Gray, J. and Roberts, Virtual reality in engineering. UK: The Institution of Electrical Engineers.

Hataoka, N., Ando, H. (1995). Prototype development of multimodal interfaces using speech and pointing gestures. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Hauptmann, A.G.; McAvinney, P. (1993). Gestures with speech for graphic manipulation. International Journal of Man-Machine Studies, 38, p. 231-49.

Hibbits, B.J. (1995). (no title) Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Katkere, Arun, Hunter, Edward, Kuramura, Don, Schlenzig, Jennifer, Moezzi, Saied, Jain, Ramesh (1994). ROBOGEST: Telepresence using Hand Gestures. Technical Report VCL-94-104, Visual Computing Laboratory, University of California, San Diego, December 1994.

Kendon, A. (1988). How gestures can become like words. In: Poyatos, F. (ed), Crosscultural perspectives in nonverbal communication, p 131-141. Toronto, Canada: Hogrefe.

Kendon, A. (1980). Gesticulation and speech: two aspects of the process of utterance. In: Key, M.R., The relation of verbal and nonverbal communication. The Hague, The Netherlands: Mouton.

Kimura, D. (1993). Neuromotor mechanisms in human communication. Oxford, UK: Oxford University Press.

Klima, E. & Bellugi, U. (1979). The signs of language. Cambridge, MA, USA: Harvard University Press.

Kramer, J., Leifer, L. (1987). The "Talking Glove": An expressive and receptive "verbal" communication aid for the deaf, deaf-blind, and nonvocal. In: Murphy, H.J., Proceedings of the third annual conference "computer technology / special education / rehabilitation", California State University, Northridge, October 15-17, 1987, p335-340.

Kramer, J., Leifer, L. (1989). The "Talking Glove": A speaking aid for nonvocal deaf and deaf-blind individuals. RESNA 12th Annual conference, New Orleans, Louisiana, USA, p471-472.

Kramer, J., Leifer, L. (1990). A "talking glove" for nonverbal deaf individuals. Technical report CDR-19900312, Center for Design Research, Stanford University, CA, USA.

Kramer, J.F. (1995). The CyberGlove and its many uses as a gestural input device. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Lederman, S.J. & Klatzky, R.L. (1987). Hand movements: A window into haptic object recognition. Cognitive Psychology, 19, 342-368.

Machover, T. & Chung, J. (1989). Hyperinstruments: Musically intelligent and interactive performance and creativity systems. Proceedings International Computer Music Conference, Columbus, Ohio, USA. San Francisco, CA, USA: International Computer Music Association.

Maggioni, C. (1995). Gesture computer. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Malek, R., Harrison, S. and Thieffry, S. (1981). Prehension and gestures. In: Tubiana, R., The hand. Philadelphia, USA: Saunders.

McNeill, David; E.T. Levy, L.L. Pedelty (1990). Speech and gesture. In: Cerebral control of speech and limb movements, Hammond, G.E. (editor), pp 203-256. Elsevier Science Publishers, Amsterdam, The Netherlands.

McNeill, D. (1992). Hand and mind: what gestures reveal about thought. Chicago, USA: University of Chicago Press.

Morita, H., Hashimoto, S. & Ohteru, S. (1991). A computer music system that follows a human conductor. IEEE Computer, July, p 44-53.

Mulder, A.G.E. (1994). Virtual Musical Instruments: Accessing the sound synthesis universe as a performer.

Mulder, A.G.E. (1994). Human Movement Tracking Technology.

Napier, J.R. (1993). Hands. Princeton, N.J.: Princeton University Press.

Nespoulos, J.L., Roch Lecours, A. (1986). Gestures: nature and function. In: Nespoulos, J.L., Perron, P., Roch Lecours, A., The biological foundations of gestures: motor and semiotic aspects, p 49-62. Hillsdale, New Jersey, USA: Lawrence Erlbaum Associates.

Papper, M.J., Gigante, M.A. (1993). Using gestures to control a virtual arm. In: Earnshaw, R., Virtual reality systems. UK: Academic Press.

Pook, P.K.; D.H. Ballard (1995). Teleassistance: A gestural sign language for teleoperation. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Pressing, J. (1991). Synthesizer performance and real-time techniques. Madison, WI, USA: A-R Editions.

Saito (). The Saito conducting method.

Sparrell, C.J. (1993). Coverbal iconic gesture in human-computer interaction. MSc. thesis, Media arts and sciences, MIT. Available through anonymous ftp.

Starner, T.E. (1995). Visual recognition of American Sign Language using hidden Markov models. MSc. thesis, MIT Media Lab. Available through anonymous ftp.

Stokoe, W.C. (1980). Sign language structure. Annual review of anthropology, 9, 365-390.

Sturman, D.J. (1992). Whole Hand Input. Ph.D. Thesis. [Available via anonymous ftp at media-lab.mit.edu, ./pub/sturman/WholeHandInput]. Cambridge, MA: Massachusetts Institute of Technology.

Sturman, D.J., and Zeltzer, D. (1994). A Survey of Glove-Based Input. IEEE Computer Graphics and Applications, 14 (1) (January), 30-39.

Sturman, D.J., Zeltzer, D. (1993). A design method for "whole-hand human computer interaction". ACM transactions on information systems, 11(3), 219-238.

Sturman, D.J., Zeltzer, D., Pieper, S. (1989). Hands-on interaction with virtual environments. Proceedings ACM SIGGRAPH symposium on user interface software and technology, Williamsburg, VA, USA, 13-15 November 1989, p19-24.

Watson, Richard (1993). A Survey of Gesture Recognition Techniques. Technical Report TCD-CS-93-11, Department of Computer Science, University of Dublin, Trinity College, July 1993.

Weimer, D., Ganapathy, S.K. (1992). Interaction techniques using hand tracking and speech recognition. In: Blattner, M.M., Dannenberg, R.B. (eds), Multimedia interface design. New York, NY, USA.

Appendix A: Verbal Synonyms for Hand Movements

Goal directed manipulation

Changing position: lift, move, heave, raise, translate, push, pull, draw, tug, haul, jerk, toss, throw, cast, fling, hurl, pitch, depress, ram, thrust, shake, shove, shift, shuffle, jumble, crank, drag, drop, pick up, slip
Changing orientation: turn, spin, rotate, revolve, twist
Changing shape: mold, squeeze, pinch, wrench, stretch, extend, twitch, smash, thrash, break, crack, bend, bow, curve, deflect, tweak, cut, spread, stab, crumble, rumple, crumple up, smooth, fold, wrinkle, wave, fracture, rupture
Contact with the object: grasp, seize, grab, catch, embrace, grip, lay hold of, hold, snatch, clutch, take, hug, cuddle, cling, support, uphold
Joining objects: tie, pinion, nail, sew, button up, shackle, buckle, hook, rivet, fasten, chain up, bind, attach, stick, fit, tighten, wriggle, pin, wrap, envelop
Indirect manipulation: whet, set, strop
Empty-handed gestures

twiddle, wave, snap, point, hand over, give, take, urge, show, size, count, wring, draw, tickle, fondle, nod, wriggle, shrug

Haptic exploration

touch, stroke, strum, thrum, twang, knock, throb, tickle, strike, beat, hit, slam, tap, nudge, jog, clink, bump, brush, kick, prick, poke, pat, flick, rap, whip, slap, caress, pluck, drub, wallop, whop, thwack, rub, swathe