Event-Driven Video Frame Synthesis. Wang, Z. W., Jiang, W., He, K., Shi, B., Katsaggelos, A., & Cossairt, O. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 4320–4329, October 2019. IEEE.
Temporal Video Frame Synthesis (TVFS) aims at synthesizing novel frames at timestamps different from existing frames, which has wide applications in video codec, editing and analysis. In this paper, we propose a high frame-rate TVFS framework which takes hybrid input data from a low-speed frame-based sensor and a high-speed event-based sensor. Compared to frame-based sensors, event-based sensors report brightness changes at very high speed, which may well provide useful spatio-temporal information for high frame-rate TVFS. Therefore, we first introduce a differentiable fusion model to approximate the dual-modal physical sensing process, unifying a variety of TVFS scenarios, e.g., interpolation, prediction and motion deblur. Our differentiable model enables iterative optimization of the latent video tensor via autodifferentiation, which propagates the gradients of a loss function defined on the measured data. Our differentiable model-based reconstruction does not involve training, yet is parallelizable and can be implemented on machine learning platforms (such as TensorFlow). Second, we develop a deep learning strategy to enhance the results from the first step, which we refer to as a residual 'denoising' process. Our trained 'denoiser' is beyond Gaussian denoising and shows properties such as contrast enhancement and motion awareness. We show that our framework is capable of handling challenging scenes including both fast motion and strong occlusions.
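
The first stage described in the abstract is a training-free, gradient-based reconstruction: a latent video tensor is optimized so that differentiable approximations of the frame and event sensing processes reproduce the measured data. The sketch below (not the authors' code) illustrates this idea in TensorFlow; the forward models, tensor shapes, threshold, loss weighting, and optimizer settings are illustrative assumptions, not details taken from the paper.

import tensorflow as tf

T, H, W = 16, 64, 64                      # assumed latent video size: T frames of H x W
video = tf.Variable(tf.zeros([T, H, W]))  # latent video tensor to be recovered

def frame_model(v):
    # Assumed low-speed frame sensor: temporal average of the latent video
    # stands in for one long-exposure (possibly blurry) frame.
    return tf.reduce_mean(v, axis=0)

def event_model(v, threshold=0.1):
    # Assumed event sensor: soft-thresholded inter-frame brightness changes
    # stand in for the event stream's polarity measurements.
    diff = v[1:] - v[:-1]
    return tf.tanh(diff / threshold)

# Hypothetical measurements; in practice these come from the two sensors.
measured_frame = tf.random.uniform([H, W])
measured_events = tf.random.uniform([T - 1, H, W], minval=-1.0, maxval=1.0)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)
for step in range(200):
    with tf.GradientTape() as tape:
        # Loss defined on the measured data; autodifferentiation propagates
        # its gradients back to the latent video tensor.
        loss = (tf.reduce_mean(tf.square(frame_model(video) - measured_frame))
                + tf.reduce_mean(tf.square(event_model(video) - measured_events)))
    grads = tape.gradient(loss, [video])
    optimizer.apply_gradients(zip(grads, [video]))

In the paper's second stage, the optimized tensor would then be passed through a trained residual 'denoising' network; that learned component is not sketched here.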
@inproceedings{Zihao,
abstract = {Temporal Video Frame Synthesis (TVFS) aims at synthesizing novel frames at timestamps different from existing frames, which has wide applications in video codec, editing and analysis. In this paper, we propose a high frame-rate TVFS framework which takes hybrid input data from a low-speed frame-based sensor and a high-speed event-based sensor. Compared to frame-based sensors, event-based sensors report brightness changes at very high speed, which may well provide useful spatio-temporal information for high frame-rate TVFS. Therefore, we first introduce a differentiable fusion model to approximate the dual-modal physical sensing process, unifying a variety of TVFS scenarios, e.g., interpolation, prediction and motion deblur. Our differentiable model enables iterative optimization of the latent video tensor via autodifferentiation, which propagates the gradients of a loss function defined on the measured data. Our differentiable model-based reconstruction does not involve training, yet is parallelizable and can be implemented on machine learning platforms (such as TensorFlow). Second, we develop a deep learning strategy to enhance the results from the first step, which we refer to as a residual 'denoising' process. Our trained 'denoiser' is beyond Gaussian denoising and shows properties such as contrast enhancement and motion awareness. We show that our framework is capable of handling challenging scenes including both fast motion and strong occlusions.},
archivePrefix = {arXiv},
arxivId = {1902.09680},
author = {Wang, Zihao W. and Jiang, Weixin and He, Kuan and Shi, Boxin and Katsaggelos, Aggelos and Cossairt, Oliver},
booktitle = {2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)},
doi = {10.1109/ICCVW.2019.00532},
eprint = {1902.09680},
isbn = {978-1-7281-5023-9},
keywords = {Event based vision,Motion deblur,Multi modal sensor fusion,Video frame interpolation,Video frame prediction},
month = {oct},
pages = {4320--4329},
publisher = {IEEE},
title = {{Event-Driven Video Frame Synthesis}},
url = {https://ieeexplore.ieee.org/document/9022389/},
year = {2019}
}