Gesture Modeling and Recognition using Finite State Machines. Hong, P., Turk, M., & Huang, T. S. In IEEE International Conference on Automatic Face and Gesture Recognition, pages 410--415, 2000.
@InProceedings{Hong2000,
  author    = {Hong, P. and Turk, M. and Huang, T. S.},
  title     = {Gesture Modeling and Recognition using Finite State Machines},
  booktitle = {IEEE International Conference on Automatic Face and Gesture Recognition},
  year      = {2000},
  pages     = {410--415},
  abstract  = {We propose a state-based approach to gesture learning and recognition. Using spatial clustering and temporal alignment, each gesture is defined to be an ordered sequence of states in spatial-temporal space. The 2D image positions of the centers of the head and both hands of the user are used as features; these are located by a color-based tracking method. From training data of a given gesture, we first learn the spatial information and then group the data into segments that are automatically aligned temporally. The temporal information is further integrated to build a finite state machine (FSM) recognizer. Each gesture has a FSM corresponding to it. The computational efficiency of the FSM recognizers allows us to achieve real-time on-line performance. We apply this technique to build an experimental system that plays a game of ``Simon Says'' with the user.},
  doi       = {10.1109/AFGR.2000.840667},
  groups    = {Lit Review 2013-09},
  keywords  = {feature extraction;finite state machines;gesture recognition;image colour analysis;learning (artificial intelligence);real-time systems;sequences;tracking;2D image positions;FSM;color-based tracking method;feature location;finite state machines;gesture learning;gesture modeling;gesture recognition;hands;head;ordered state sequence;real-time on-line performance;spatial clustering;state-based approach;temporal alignment;Automata;Decision support systems;Fiber reinforced plastics},
  review    = {Goal: recognize and learn gestures from video sequences. Learning from 2D video is hard because of movement variation and tracking error. We assume that the trajectories of a gesture are a set of points distributed spatially. The distribution of the data can be represented by a set of Gaussian spatial regions. A threshold is selected to represent the spatial variance allowed for each state. These thresholds determine the spatial variance of the gesture. The number of states and their coarse spatial parameters are calculated by dynamic k-means clustering on the training data of the gesture, without temporal information. Training is done offline.

Sounds like they're using the Forward algorithm with an HMM as the FSM, but I'm not sure. One FSM is trained for each gesture.

Input is video frames with the face and hands tracked; the features are their 2D positions. A "gesture" is stored as a 2D mean and covariance, a distance threshold (not sure what this is, though it is calculated from the state mean and standard deviation), and the length of the action. Training is separated into its spatial and temporal components to try to reduce the impact of spatial-temporal variations. The Mahalanobis distance is used to measure the distance to windowed data, and k-means is used to determine the states. Temporal alignment is done by counting how many samples fall in a given state and lining them up along the state transitions. The FSM is then built from this state information.

When data comes in, state transitions are triggered by checking the FSM's state thresholds.

Tested with a "Simon Says" program, but no segmentation accuracy is reported.},
  timestamp = {2013.09.30},
}
