Performance evaluation of deep feature learning for RGB-D image/video classification. Shao, L., Cai, Z., Liu, L., & Lu, K. Information Sciences, 385-386:266--283, April, 2017.
Deep Neural Networks for image/video classification have obtained much success in various computer vision applications. Existing deep learning algorithms are widely used on RGB images or video data. Meanwhile, with the development of low-cost RGB-D sensors (such as Microsoft Kinect and Xtion Pro-Live), high-quality RGB-D data can be easily acquired and used to enhance computer vision algorithms [14]. It would be interesting to investigate how deep learning can be employed for extracting and fusing features from RGB-D data. In this paper, after briefly reviewing the basic concepts of RGB-D information and four prevalent deep learning models (i.e., Deep Belief Networks (DBNs), Stacked Denoising Auto-Encoders (SDAE), Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) Neural Networks), we conduct extensive experiments on five popular RGB-D datasets including three image datasets and two video datasets. We then present a detailed analysis about the comparison between the learned feature representations from the four deep learning models. In addition, a few suggestions on how to adjust hyper-parameters for learning deep neural networks are made in this paper. According to the extensive experimental results, we believe that this evaluation will provide insights and a deeper understanding of different deep learning algorithms for RGB-D feature extraction and fusion.
@article{shao_performance_2017,
	title = {Performance evaluation of deep feature learning for {RGB}-{D} image/video classification},
	volume = {385-386},
	issn = {0020-0255},
	url = {http://www.sciencedirect.com/science/article/pii/S0020025517300191},
	doi = {10.1016/j.ins.2017.01.013},
	abstract = {Deep Neural Networks for image/video classification have obtained much success in various computer vision applications. Existing deep learning algorithms are widely used on RGB images or video data. Meanwhile, with the development of low-cost RGB-D sensors (such as Microsoft Kinect and Xtion Pro-Live), high-quality RGB-D data can be easily acquired and used to enhance computer vision algorithms [14]. It would be interesting to investigate how deep learning can be employed for extracting and fusing features from RGB-D data. In this paper, after briefly reviewing the basic concepts of RGB-D information and four prevalent deep learning models (i.e., Deep Belief Networks (DBNs), Stacked Denoising Auto-Encoders (SDAE), Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) Neural Networks), we conduct extensive experiments on five popular RGB-D datasets including three image datasets and two video datasets. We then present a detailed analysis about the comparison between the learned feature representations from the four deep learning models. In addition, a few suggestions on how to adjust hyper-parameters for learning deep neural networks are made in this paper. According to the extensive experimental results, we believe that this evaluation will provide insights and a deeper understanding of different deep learning algorithms for RGB-D feature extraction and fusion.},
	urldate = {2018-03-25},
	journal = {Information Sciences},
	author = {Shao, Ling and Cai, Ziyun and Liu, Li and Lu, Ke},
	month = apr,
	year = {2017},
	keywords = {Deep neural networks, Feature learning, Performance evaluation, RGB-D data},
	pages = {266--283}
}
