In 2019 27th European Signal Processing Conference (EUSIPCO), pages 1-5, Sep. 2019.
The development of Automatic Lip-Reading (ALR) systems is currently dominated by Deep Learning (DL) approaches. However, DL systems generally face two main issues: the amount of data they require and the complexity of the model. To balance the amount of available training data against the number of parameters of the model, in this work we introduce an end-to-end ALR system that combines CNNs and LSTMs and can be trained without large-scale databases. To this end, we propose to split the training by modules, automatically generating weak labels per frame, termed visual units. These weak visual units are representative enough to guide the CNN to extract meaningful features that, when combined with the context provided by the temporal module, are sufficiently informative to train an ALR system in a very short time and with no need for manual labeling. The system is evaluated on the well-known OuluVS2 database for sentence-level classification. We obtain an accuracy of 91.38%, which is comparable to state-of-the-art results but, unlike most previous approaches, does not require external training data.
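The CNN-plus-LSTM pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the number of visual units (20), and the frame resolution (32x32) are all assumptions chosen for brevity. The per-frame head stands in for the module trained against the automatically generated weak visual-unit labels, while the LSTM provides the temporal context for sentence-level classification.

```python
import torch
import torch.nn as nn

class LipReaderSketch(nn.Module):
    """Hypothetical CNN+LSTM lip-reading sketch (sizes are illustrative)."""
    def __init__(self, n_sentences=10, n_visual_units=20, feat_dim=64):
        super().__init__()
        # Per-frame CNN feature extractor (shared across time steps).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Frame-level head: would be trained against the weak visual-unit
        # labels, guiding the CNN toward meaningful features.
        self.unit_head = nn.Linear(feat_dim, n_visual_units)
        # Temporal module: LSTM over the per-frame features.
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        # Sentence-level classifier on the final hidden state.
        self.classifier = nn.Linear(64, n_sentences)

    def forward(self, clip):
        # clip: (batch, time, 1, H, W) grayscale mouth-region frames
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        unit_logits = self.unit_head(feats)      # per-frame weak supervision
        _, (h, _) = self.lstm(feats)
        sent_logits = self.classifier(h[-1])     # sentence-level prediction
        return sent_logits, unit_logits

model = LipReaderSketch()
clip = torch.randn(2, 15, 1, 32, 32)   # 2 clips of 15 frames each
sent_logits, unit_logits = model(clip)
print(sent_logits.shape)   # (2, 10): one sentence score vector per clip
print(unit_logits.shape)   # (2, 15, 20): one visual-unit score per frame
```

Training by modules would mean first optimizing the CNN via `unit_head` on the weak per-frame labels, then training the LSTM and classifier on the sentence labels.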