TSC-DL: Unsupervised trajectory segmentation of multi-modal surgical demonstrations with Deep Learning. Murali, A., Garg, A., Krishnan, S., Pokorny, F. T., Abbeel, P., Darrell, T., & Goldberg, K. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 4150--4157, 2016.
The growth of robot-assisted minimally invasive surgery has led to sizable datasets of fixed-camera video and kinematic recordings of surgical subtasks. Segmentation of these trajectories into locally-similar contiguous sections can facilitate learning from demonstrations, skill assessment, and salvaging good segments from otherwise inconsistent demonstrations. Manual, or supervised, segmentation can be prone to error and impractical for large datasets. We present Transition State Clustering with Deep Learning (TSC-DL), a new unsupervised algorithm that leverages video and kinematic data for task-level segmentation, and finds regions of the visual feature space that correlate with transition events using features constructed from layers of pre-trained image classification Deep Convolutional Neural Networks (CNNs). We report results on three datasets comparing Deep Learning architectures (AlexNet and VGG), choice of convolutional layer, dimensionality reduction techniques, visual encoding, and the use of Scale Invariant Feature Transforms (SIFT). We find that the deep architectures extract features that result in up to a 30.4% improvement in Silhouette Score (a measure of cluster tightness) over traditional "shallow" features from SIFT. We also present cases where TSC-DL discovers human annotator omissions. Supplementary material, data, and code are available at: http://berkeleyautomation.github.io/tsc-dl/.
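To make the pipeline in the abstract concrete, here is a minimal sketch of the loop it describes: conv-layer activations from a pre-trained network, dimensionality reduction, fusion with kinematics, clustering, and the Silhouette Score. The synthetic frames and kinematics, the layer index, the PCA dimensionality, and the plain GMM (standing in for the paper's hierarchical clustering of transition states) are all illustrative assumptions, not the authors' exact settings.

import numpy as np
import torch
from torchvision import models
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

# Pre-trained VGG-16 as a fixed feature extractor (one of the two
# architectures compared in the paper, alongside AlexNet).
extractor = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()

# Stand-in for T video frames of one demonstration (224x224 RGB).
T = 40
frames = torch.rand(T, 3, 224, 224)

with torch.no_grad():
    # Activations of an intermediate conv block; the paper compares
    # several choices of layer, so index 17 is only an example.
    acts = extractor[:17](frames)
visual = acts.flatten(start_dim=1).numpy()  # T x D visual features

# Dimensionality reduction on the high-dimensional conv features.
visual = PCA(n_components=10).fit_transform(visual)

# Stand-in kinematics (e.g. end-effector poses), fused with the reduced
# visual features into one multi-modal trajectory.
kinematics = np.random.rand(T, 6)
traj = np.hstack([visual, kinematics])

# Cluster the states; a plain GMM here, where the paper clusters
# candidate transition states hierarchically.
labels = GaussianMixture(n_components=4, random_state=0).fit_predict(traj)

# Silhouette Score: the cluster-tightness measure reported in the paper.
print("silhouette:", silhouette_score(traj, labels))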
@InProceedings{Murali2016,
  Title                    = {TSC-DL: Unsupervised trajectory segmentation of multi-modal surgical demonstrations with Deep Learning},
  Author                   = {A. Murali and A. Garg and S. Krishnan and F. T. Pokorny and P. Abbeel and T. Darrell and K. Goldberg},
  Booktitle                = {Proceedings of the IEEE International Conference on Robotics and Automation},
  Year                     = {2016},
  Pages                    = {4150--4157},

  Abstract                 = {The growth of robot-assisted minimally invasive surgery has led to sizable datasets of fixed-camera video and kinematic recordings of surgical subtasks. Segmentation of these trajectories into locally-similar contiguous sections can facilitate learning from demonstrations, skill assessment, and salvaging good segments from otherwise inconsistent demonstrations. Manual, or supervised, segmentation can be prone to error and impractical for large datasets. We present Transition State Clustering with Deep Learning (TSC-DL), a new unsupervised algorithm that leverages video and kinematic data for task-level segmentation, and finds regions of the visual feature space that correlate with transition events using features constructed from layers of pre-trained image classification Deep Convolutional Neural Networks (CNNs). We report results on three datasets comparing Deep Learning architectures (AlexNet and VGG), choice of convolutional layer, dimensionality reduction techniques, visual encoding, and the use of Scale Invariant Feature Transforms (SIFT). We find that the deep architectures extract features that result in up to a 30.4% improvement in Silhouette Score (a measure of cluster tightness) over traditional ``shallow'' features from SIFT. We also present cases where TSC-DL discovers human annotator omissions. Supplementary material, data, and code are available at: http://berkeleyautomation.github.io/tsc-dl/.},
  Doi                      = {10.1109/ICRA.2016.7487607},
  Keywords                 = {control engineering computing;convolution;feature extraction;image classification;image segmentation;learning (artificial intelligence);medical image processing;medical robotics;neural nets;pattern clustering;robot vision;surgery;transforms;video signal processing;TSC-DL unsupervised algorithm;dimensionality reduction;feature extraction;fixed-camera video;image classification deep convolutional neural networks;learning from demonstrations;multimodal surgical demonstrations;robot-assisted minimally invasive surgery;scale invariant feature transforms;surgical subtask kinematic recordings;task-level segmentation;transition state clustering with deep learning;unsupervised trajectory segmentation;visual encoding;visual feature space;Feature extraction;Hidden Markov models;Kinematics;Machine learning;Motion segmentation;Visualization},
  Timestamp                = {2017.01.02}
}