Segmenting Motion Capture Data into Distinct Behaviors

Segmenting Motion Capture Data into Distinct Behaviors. Barbič, J., Safonova, A., Pan, J., Faloutsos, C., Hodgins, J. K., & Pollard, N. S. In Proceedings of Graphics Interface, pages 185--194, 2004.

Much of the motion capture data used in animations, commercials, and video games is carefully segmented into distinct motions either at the time of capture or by hand after the capture session. As we move toward collecting more and longer motion sequences, however, automatic segmentation techniques will become important for processing the results in a reasonable time frame. We have found that straightforward, easy to implement segmentation techniques can be very effective for segmenting motion sequences into distinct behaviors. In this paper, we present three approaches for automatic segmentation. The first two approaches are online, meaning that the algorithm traverses the motion from beginning to end, creating the segmentation as it proceeds. The first assigns a cut when the intrinsic dimensionality of a local model of the motion suddenly increases. The second places a cut when the distribution of poses is observed to change. The third approach is a batch process and segments the sequence where consecutive frames belong to different elements of a Gaussian mixture model. We assess these three methods on fourteen motion sequences and compare the performance of the automatic methods to that of transitions selected manually.

@InProceedings{Barbic2004,
  author    = {Barbi\v{c}, J. and Safonova, A. and Pan, J.-Y. and Faloutsos, C. and Hodgins, J. K. and Pollard, N. S.},
  title     = {Segmenting Motion Capture Data into Distinct Behaviors},
  booktitle = {Proceedings of Graphics Interface},
  year      = {2004},
  pages     = {185--194},
  abstract  = {Much of the motion capture data used in animations, commercials, and video games is carefully segmented into distinct motions either at the time of capture or by hand after the capture session. As we move toward collecting more and longer motion sequences, however, automatic segmentation techniques will become important for processing the results in a reasonable time frame. We have found that straightforward, easy to implement segmentation techniques can be very effective for segmenting motion sequences into distinct behaviors. In this paper, we present three approaches for automatic segmentation. The first two approaches are online, meaning that the algorithm traverses the motion from beginning to end, creating the segmentation as it proceeds. The first assigns a cut when the intrinsic dimensionality of a local model of the motion suddenly increases. The second places a cut when the distribution of poses is observed to change. The third approach is a batch process and segments the sequence where consecutive frames belong to different elements of a Gaussian mixture model. We assess these three methods on fourteen motion sequences and compare the performance of the automatic methods to that of transitions selected manually.},
  acmid     = {1006081},
  groups    = {Lit Review 2013-09},
  isbn      = {1-56881-227-2},
  keywords  = {PCA, human motion, motion capture, motion segmentation},
  location  = {London, Ontario, Canada},
  numpages  = {10},
  review    = {Mocap data currently needs a lot of hand segmentation, which is not time efficient. Proposes 3 unsupervised learning algorithms. First two online. Third offline (batch).

Alg 1 - segment when intrinsic dimensionality of a window increases. Uses PCA.
Alg 2 - segment when distribution of poses is observed to change. Uses probabilistic PCA.
Alg 3 - segment when consecutive frames fall into different GMM clusters.

Alg 1 looks at the joint angles of the body, and puts a window of data into SVD/PCA. They look at the error (denote e_i) between x_i (data frame) and x_i' (data frame reconstructed from PCA, up to the r^th dim), where r is the smallest dimension for which a ratio (denote E_r) between the first r singular values (diagonal terms of Sigma in the SVD) and all of them exceeds some threshold (tau). First, the system is simplified by keeping the top singular values (E_r > tau = 0.9) over 2 seconds of data, and discarding the rest of the dims. Then they calculate the SVD and e_i of each frame. If the motion is not changing, the error will increase steadily, but not significantly. If a new motion occurs, the recon is not able to capture it sufficiently, and the error will jump greatly. This jump is detected by thresholding the derivative of the e_i term: if it exceeds 3 stddev from avg(diff(e_i)), then a segment is declared. The 2 seconds is reset there. -> though, it seems like their method would be very sensitive to the thresholds: the 2 seconds, the e_i deriv threshold, etc.
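
A minimal sketch of the two thresholding steps in the Alg 1 notes above, not the authors' implementation. Assumptions: E_r is taken over squared singular values (the notes do not spell this out), the derivative is a one-frame difference, and a jump is an increase above mean + k * stddev of the earlier increases. The names choose_r, first_jump and min_history are hypothetical.

```python
from statistics import mean, pstdev

def choose_r(singular_values, tau=0.9):
    # Smallest r whose leading squared singular values hold more than tau
    # of the total (assumed energy-based definition of E_r).
    energy = [s * s for s in singular_values]
    total = sum(energy)
    running = 0.0
    for r, e in enumerate(energy, start=1):
        running += e
        if running / total > tau:
            return r
    return len(energy)

def first_jump(errors, k=3.0, min_history=5):
    # Index of the first frame whose error increase exceeds
    # mean + k * stddev of all earlier increases; None if there is no jump.
    diffs = [b - a for a, b in zip(errors, errors[1:])]
    for i in range(min_history, len(diffs)):
        past = diffs[:i]
        if diffs[i] > mean(past) + k * pstdev(past):
            return i + 1  # diffs[i] is the step into errors[i + 1]
    return None

# Made-up singular values and per-frame reconstruction errors.
r = choose_r([3.0, 1.0, 0.5, 0.1])
cut = first_jump([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 9.0, 9.5])
print(r, cut)
```

Both tau and k are exactly the knobs the note flags as sensitive; changing either moves the cut.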

Alg 2 uses probabilistic PCA (PPCA), which is normal PCA, but with the discarded PCA dims modeled as noise. The mean and the stddev of the discarded singular values can be calculated, which gives the Gaussian noise distribution. The first K frames are modeled with PPCA, then the Mahalanobis distance is calculated for datapoints beyond K (ie K+1 to K+T, for some T), and K is incremented until a segment is found. The Mah distance curve will form valleys and peaks. At the top of a peak, it can be assumed that the first K frames all belong to the same motion, and that the subsequent frames belong to a different one. A segment can be declared here. -> however, how well this works depends on how big these windows are set, and on the Gaussian. If two subsequent motions have similar Gaussians (ie the subj is doing the same motion twice), this may not work out well.

Alg 3 uses GMM. The entire obs is modeled as a GMM, and the sequence is separated wherever consecutive frames belong to different Gaussian clusters in the GMM. This assumes that the Gaussians are separable. You would also need to know how many clusters are needed for the GMM.

Tested this on 14 sequences, with 8000 frames each, consisting of 10 simple motions (walking, running, standing, exercising, sweeping, etc). PPCA performed the best, at 92% precision and 95% recall. GMM performed the worst. They were not able to determine a single frame of transition since the transition was smooth, so a transition window was allowed instead of a single frame. Tuning was done on a subset of the database, then applied to the whole thing.
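
A rough sketch of how precision and recall could be scored when ground truth is a transition window rather than a single frame, as described above. The matching rule is an assumption, not taken from the paper: a detected cut counts as correct if it falls inside any window, and a window counts as found if at least one cut lands in it. The name precision_recall and the frame numbers are made up for illustration.

```python
def precision_recall(cuts, windows):
    # cuts: detected cut frames.
    # windows: (start, end) ground-truth transition ranges, inclusive.
    correct_cuts = [c for c in cuts if any(s <= c <= e for s, e in windows)]
    hit_windows = [w for w in windows if any(w[0] <= c <= w[1] for c in cuts)]
    precision = len(correct_cuts) / len(cuts) if cuts else 0.0
    recall = len(hit_windows) / len(windows) if windows else 0.0
    return precision, recall

# Three detected cuts scored against four hand-labelled transition windows.
cuts = [100, 250, 400]
windows = [(90, 110), (240, 260), (500, 520), (700, 720)]
precision, recall = precision_recall(cuts, windows)
print(precision, recall)
```

Wider windows make both numbers look better, so the reported figures are only comparable across methods scored with the same windows.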




Barbi\v{c} \etal \cite{Barbic2004} calculate the principal component analysis (PCA) transformation matrix from an initial window of data frames, and only the top \emph{r} dimensions are retained. The subsequent frames are transformed using the original PCA transformation matrix, and the reconstruction error between the original data and its PCA reconstruction is calculated. If the underlying motion changes, the reconstruction will differ greatly from the original data, and a segment is declared. However, this method is very sensitive to the \emph{r} and error threshold used. This algorithm was tested on 14 different motion sequences, consisting of 10 simple motions such as walking, running, sweeping and standing. Video playback was used to generate ground truth data, but a range of data frames was declared to be a manual segment, instead of a single point, which effectively means that $Ver_{temporalInv}$ was employed, but with a variable $t_{error}$. $Acc_{precision}$ of 79\% and $Acc_{recall}$ of 88\% were reported. \todo{would this more be a distance metric or a variance one? }

Barbi\v{c} \etal \cite{Barbic2004} propose using probabilistic PCA (pPCA) to observe changes in the pose distribution. The pPCA is PCA, but with the discarded dimensions modeled as Gaussian noise. The first \emph{K} data frames are modeled using pPCA, then the Mahalanobis distance between the pPCA model and the frames beyond \emph{K} is calculated. When the Mahalanobis distance forms a local maximum, implying that the difference between the modeled frames and the subsequent frames has peaked, a segment is declared. This method may not work well if the subsequent motions are all similar in nature. When tested on 14 sequences of 10 different motions performed sequentially, pPCA achieved $Acc_{precision}$ of 92\% and $Acc_{recall}$ of 95\%, using $Ver_{temporalInv}$. Ground truth was hand labelled, but the specific mechanism of labelling was unspecified.},
  timestamp = {2011.06.11},
}
