Temporal task segmentation

To analyze the sensory data recorded over an entire task, we first divide the sequence into constituent parts that can be analyzed individually. The basic phases that compose a grasping task are the pregrasp, grasp, and manipulation phases.

The sensory data available from our observation module are the human finger joint angles and the 3D pose (position and orientation) of the hand relative to a global coordinate system. Two questions follow: which features can be used as discriminants in segmenting the task, and how can these features be used to segment it?

To answer these questions, we refer to the literature on human hand motion, primarily from psychology. Many of these studies concentrate on the reaching action (i.e., the pregrasp phase) of the human hand and the effect of differing object sizes and visual conditions (occluded or unoccluded view). The two most frequently used features are the hand speed and the grip aperture (the distance between the tips of the thumb and the index finger). Typical profiles of these features are shown below.

As can be seen, both the speed and grip aperture profiles have the characteristic inverted bell shapes. This is not too surprising, since the hand undergoes an acceleration phase followed by a deceleration phase in reaching for an object; in addition, the fingers open in anticipation of the grasp, and the aperture at its widest should be greater than the width of the object at the intended grasp positions. It is interesting to note that the peak of the grip aperture profile normally occurs after that of the speed profile.
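Both features are simple to compute once per-frame positions are available. The following is a minimal sketch, not the observation module's actual code. It assumes that the hand position and the thumb and index fingertip positions are already expressed as 3D points in the global coordinate system (deriving fingertip positions from the joint angles and hand pose is not shown) and that frames are sampled at a fixed interval `dt`. The function names are hypothetical.

```python
import math


def hand_speed(positions, dt):
    """Finite-difference hand speed between consecutive 3D positions.

    positions: list of (x, y, z) tuples, one per frame (assumed input format).
    dt: sampling interval between frames, assumed constant.
    Returns one speed value per pair of consecutive frames.
    """
    return [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]


def grip_aperture(thumb_tips, index_tips):
    """Distance between the thumb tip and the index fingertip in each frame."""
    return [math.dist(t, i) for t, i in zip(thumb_tips, index_tips)]


def peak_index(profile):
    """Index of the largest sample in a profile."""
    return max(range(len(profile)), key=profile.__getitem__)
```

Comparing `peak_index` of the two profiles for a single reach is one way to check the ordering of the peaks noted above.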

Features for task segmentation

In light of the evidence offered by studies on human hand motion, we chose the following features:

The segmentation algorithm

The assumptions made are:

The segmentation algorithm is relatively simple. It comprises the following steps:

  1. Hypothesize the task breakpoints separating the phases.
    The task breakpoints are initially hypothesized from the local minima of the hand speed profile.
  2. Calculate the RMS error of fitting parabolas to hypothesized pregrasp curves (using the volume sweep rate profile).
    For each set of breakpoints, the mean of this RMS fit error is calculated.
  3. Find the combination of the breakpoints which yields the minimum mean RMS fit error.
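The three steps above can be sketched as a brute-force search. This is an illustration under stated assumptions, not the authors' implementation. The text does not specify how the selected breakpoints map to phases, so the sketch assumes that the number of pregrasp phases is known, that the first and last frames are also candidate breakpoints, and that the selected breakpoints, taken in pairs, delimit the hypothesized pregrasp phases. The names `local_minima`, `parabola_rms`, `best_breakpoints` and the `min_len` guard are hypothetical.

```python
from itertools import combinations
import math


def local_minima(profile):
    """Indices of interior samples strictly lower than both neighbours."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]


def parabola_rms(values):
    """RMS residual of a least-squares parabola fitted to equally spaced samples.

    Needs at least 3 samples; with exactly 3 the fit is exact and the error is 0.
    """
    n = len(values)
    ts = [i - (n - 1) / 2 for i in range(n)]  # centred time: odd moments vanish
    s2 = sum(t ** 2 for t in ts)
    s4 = sum(t ** 4 for t in ts)
    y0 = sum(values)
    y1 = sum(t * v for t, v in zip(ts, values))
    y2 = sum(t * t * v for t, v in zip(ts, values))
    det = s4 * n - s2 * s2
    a = (y2 * n - s2 * y0) / det   # quadratic coefficient
    b = y1 / s2                    # linear coefficient
    c = (s4 * y0 - s2 * y2) / det  # constant term
    sq = sum((v - (a * t * t + b * t + c)) ** 2 for t, v in zip(ts, values))
    return math.sqrt(sq / n)


def best_breakpoints(speed, sweep_rate, n_pregrasp, min_len=5):
    """Return (mean RMS error, pregrasp spans) for the best breakpoint choice.

    Assumption: selected breakpoints (b0, b1, b2, b3, ...) pair up so that
    [b0, b1], [b2, b3], ... are the hypothesized pregrasp phases.
    min_len (hypothetical guard) rejects spans too short for a meaningful fit.
    Returns None if no combination satisfies the guard.
    """
    # Step 1: hypothesize breakpoints from local minima of the hand speed.
    candidates = [0] + local_minima(speed) + [len(speed) - 1]
    best = None
    for chosen in combinations(candidates, 2 * n_pregrasp):
        spans = [(chosen[i], chosen[i + 1]) for i in range(0, len(chosen), 2)]
        if any(end - start + 1 < min_len for start, end in spans):
            continue
        # Step 2: mean RMS error of parabolic fits to the volume sweep rate.
        errors = [parabola_rms(sweep_rate[s:e + 1]) for s, e in spans]
        mean_err = sum(errors) / len(errors)
        # Step 3: keep the combination with the minimum mean error.
        if best is None or mean_err < best[0]:
            best = (mean_err, spans)
    return best
```

The exhaustive search is exponential in the number of candidate breakpoints, which is acceptable only for short sequences. Note also that minimizing the fit error alone tends to favour short spans, which is why the sketch includes the `min_len` guard.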

To illustrate, consider the two sets of hypothesized breakpoints below. The first one yields a good fit, as can be seen (the e's are the RMS errors of parabolic fit to the volume sweep rate profile during the hypothesized pregrasp phase). The result of a bad choice of breakpoints can also be seen in the second example.
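For reference, the errors can be written out as follows. The notation is an assumption introduced here, not taken from the original: $v_{k,i}$ is the $i$-th volume sweep rate sample at time $t_{k,i}$ in the $k$-th hypothesized pregrasp phase, $n_k$ is the number of samples in that phase, $a_k$, $b_k$, $c_k$ are the coefficients of the least-squares parabola for that phase, and $K$ is the number of hypothesized pregrasp phases.

```latex
e_k = \sqrt{\frac{1}{n_k} \sum_{i=1}^{n_k}
      \left( v_{k,i} - \left( a_k t_{k,i}^2 + b_k t_{k,i} + c_k \right) \right)^2},
\qquad
\bar{e} = \frac{1}{K} \sum_{k=1}^{K} e_k
```

A good set of breakpoints gives small $e_k$ for every phase and hence a small mean $\bar{e}$; a bad set leaves at least one phase whose profile is poorly described by a parabola, which raises the mean.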


  1. S.B. Kang and K. Ikeuchi, "Determination of motion breakpoints in a task sequence from human hand motion," to appear in Proc. IEEE Int'l Conf. on Robotics and Automation, San Diego, CA, May 1994.
  2. S.B. Kang and K. Ikeuchi, "Temporal segmentation of tasks from human hand motion," Tech. Rep. CMU-CS-93-150, Carnegie Mellon University, April 1993.
