Deep learning for Vietnamese Sign Language recognition in video sequence

Deep learning for Vietnamese Sign Language recognition in video sequence. Vo, A. H., Pham, V. H., & Nguyen, B. T. International Journal of Machine Learning and Computing, 9(4):440–445, August, 2019. Publisher: International Association of Computer Science and Information Technology

For most hearing-impaired people in Vietnam, Vietnamese Sign Language (VSL) is the only means of communication. As a result, a growing number of studies address the automatic translation of VSL to bridge the gap between hearing-impaired and hearing people. However, automatic VSL recognition in video poses many challenges, including camera orientation, hand position and movement, and inter-hand relations. In this paper, we present several feature extraction approaches for VSL recognition, including spatial and scene-based features. Instead of relying on a static image, we specifically capture motion information between frames in a video sequence. For the recognition task, besides traditional sign language recognition methods such as SVM, we propose using a deep learning technique to model the dependence between frames in a video sequence. We collected two VSL datasets of words on the family-relations topic (VSL-WRF), such as father, mother, uncle, and aunt. The first includes 12 Vietnamese words whose signs change only slightly between frames, while the second contains 15 words whose gestures involve the relative position of body parts and the orientation of the motion. Moreover, a data augmentation technique is proposed to capture more information about hand movement and hand position. The experiments achieved satisfactory results, with accuracies of 88.5% (traditional SVM) and 95.83% (deep learning). This indicates that deep learning combined with data augmentation provides more information about the orientation or movement of the hand and can improve the performance of a VSL recognition system.
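
The abstract does not say how motion information between frames is computed. As a rough illustration only, the sketch below uses plain frame differencing, which is an assumed stand-in and not the authors' feature extractor. The names `frame_difference` and `motion_energy` and the toy 2x2 "video" are hypothetical.

```python
from typing import List

# A grayscale frame as rows of pixel intensities (0-255).
Frame = List[List[int]]


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Absolute per-pixel difference between two equally sized frames."""
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_energy(frames: List[Frame]) -> List[float]:
    """Mean absolute difference for each pair of consecutive frames."""
    energies = []
    for prev, curr in zip(frames, frames[1:]):
        diff = frame_difference(prev, curr)
        total = sum(sum(row) for row in diff)
        n_pixels = len(diff) * len(diff[0])
        energies.append(total / n_pixels)
    return energies


# Toy 2x2 "video": a bright pixel moves from top-left to top-right, then stays.
video = [
    [[255, 0], [0, 0]],
    [[0, 255], [0, 0]],
    [[0, 255], [0, 0]],
]
print(motion_energy(video))  # [127.5, 0.0]
```

A sequence of such difference maps, or statistics derived from them, is one simple way to give a classifier information that a single static image lacks. A real pipeline would more likely rely on a video or array library than on nested lists.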

@article{Vo2019a,
	title = {Deep learning for {Vietnamese} {Sign} {Language} recognition in video sequence},
	volume = {9},
	issn = {2010-3700},
	url = {http://www.scopus.com/inward/record.url?eid=2-s2.0-85071329211&partnerID=MN8TOARS},
	doi = {10.18178/ijmlc.2019.9.4.823},
	number = {4},
	journal = {International Journal of Machine Learning and Computing},
	author = {Vo, Anh H. and Pham, Van Huy and Nguyen, Bao T.},
	month = aug,
	year = {2019},
	note = {Publisher: International Association of Computer Science and Information Technology},
	keywords = {Deep learning, Local descriptors, Motion-based feature, Scene-based feature, Spatial feature, VSL recognition, Vietnamese sign language (VSL)},
	pages = {440--445},
}
