Paper abstract bibtex

The exponential growth of devices and data at the edges of the Internet is rising scalability and privacy concerns on approaches based exclusively on remote cloud platforms. Data gravity, a fundamental concept in Fog Computing, points towards decentralisation of computation for data analysis, as a viable alternative to address those concerns. Decentralising AI tasks on several cooperative devices means identifying the optimal set of locations or Collection Points (CP for short) to use, in the continuum between full centralisation (i.e., all data on a single device) and full decentralisation (i.e., data on source locations). We propose an analytical framework able to find the optimal operating point in this continuum, linking the accuracy of the learning task with the corresponding \textbackslashemph\network\ and \textbackslashemph\computational\ cost for moving data and running the distributed training at the CPs. We show through simulations that the model accurately predicts the optimal trade-off, quite often an \textbackslashemph\intermediate\ point between full centralisation and full decentralisation, showing also a significant cost saving w.r.t. both of them. Finally, the analytical model admits closed-form or numeric solutions, making it not only a performance evaluation instrument but also a design tool to configure a given distributed learning task optimally before its deployment.

@article{valerio_optimising_2020, title = {Optimising cost vs accuracy of decentralised analytics in fog computing environments}, url = {http://arxiv.org/abs/2012.05266}, abstract = {The exponential growth of devices and data at the edges of the Internet is rising scalability and privacy concerns on approaches based exclusively on remote cloud platforms. Data gravity, a fundamental concept in Fog Computing, points towards decentralisation of computation for data analysis, as a viable alternative to address those concerns. Decentralising AI tasks on several cooperative devices means identifying the optimal set of locations or Collection Points (CP for short) to use, in the continuum between full centralisation (i.e., all data on a single device) and full decentralisation (i.e., data on source locations). We propose an analytical framework able to find the optimal operating point in this continuum, linking the accuracy of the learning task with the corresponding {\textbackslash}emph\{network\} and {\textbackslash}emph\{computational\} cost for moving data and running the distributed training at the CPs. We show through simulations that the model accurately predicts the optimal trade-off, quite often an {\textbackslash}emph\{intermediate\} point between full centralisation and full decentralisation, showing also a significant cost saving w.r.t. both of them. Finally, the analytical model admits closed-form or numeric solutions, making it not only a performance evaluation instrument but also a design tool to configure a given distributed learning task optimally before its deployment.}, urldate = {2020-12-15}, journal = {arXiv:2012.05266 [cs]}, author = {Valerio, Lorenzo and Passarella, Andrea and Conti, Marco}, month = dec, year = {2020}, note = {arXiv: 2012.05266}, keywords = {computer science, machine learning, mentions sympy, networks}, }

Downloads: 0