Statistical and Machine Learning Forecasting Methods: Concerns and Ways Forward. Makridakis, S., Spiliotis, E., & Assimakopoulos, V. PLOS ONE, 13(3):e0194889, 2018.
Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.

[Excerpt: Conclusions: The state of the art and ways forward] [...] A major innovation that has distinguished forecasting from other fields has been the large number of empirical studies aimed at both the academic community and the practitioners interested in utilizing the most accurate methods for their various applications and reducing cost or maximizing benefits by doing so. These studies contributed to establishing two major changes in attitudes towards forecasting:
[::] First, it was established that the methods or models that best fitted the available data did not necessarily result in more accurate post-sample predictions (a common belief until then).
[::] Second, the post-sample predictions of simple statistical methods were found to be at least as accurate as those of the sophisticated ones.
[\n] This finding was furiously objected to by theoretical statisticians, who claimed that a simple method, being a special case of, e.g., ARIMA models, could not be more accurate than ARIMA itself, refusing to accept the empirical evidence proving the opposite. These two findings have fundamentally changed the field of forecasting and are also evident in this paper, both in Fig 3, showing post-sample versus in-sample accuracy, and in Fig 2, displaying the accuracy levels of various statistical and ML methods, with the latter being much more sophisticated and computationally demanding than the former.
[\n] Knowing that a certain sophisticated method is not as accurate as a much simpler one is upsetting from a scientific point of view, as the former requires a great deal of academic expertise and ample computer time to be applied. At the same time, understanding the reasons for their underperformance is the only way to improve them. [...]
[\n] ML theorists working on forecasting applications need to do something to improve the accuracy of their methods. For instance, the only thing exponential smoothing methods do is smooth the most recent errors exponentially and then extrapolate the latest pattern in order to forecast. Given their ability to learn, ML methods should do better than simple benchmarks like exponential smoothing. Accepting the problem is the first step in devising workable solutions, and we hope that those in the field of AI and ML will accept the empirical findings and work to improve the forecasting accuracy of their methods.
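[Annotation] To make the exponential smoothing benchmark concrete, here is a minimal sketch of simple exponential smoothing (SES) in plain NumPy. The function name, the smoothing parameter alpha, and the toy series are our own illustration, not code from the paper; the 18-step horizon matches the monthly M3 setting. The level is an exponentially weighted average of past observations, and the multi-step forecast is flat at the last level:

```python
import numpy as np

def ses_forecast(y, alpha=0.3, horizon=18):
    """Simple exponential smoothing (SES): maintain an exponentially
    weighted level and extrapolate it flat over the horizon."""
    level = y[0]
    for obs in y[1:]:
        # Equivalent to level += alpha * (obs - level): each new
        # one-step error is smoothed into the level exponentially.
        level = alpha * obs + (1 - alpha) * level
    return np.repeat(level, horizon)

# Toy usage on a noisy level series, forecasting 18 months ahead.
rng = np.random.default_rng(0)
y = 100 + rng.normal(0, 5, size=120)
print(ses_forecast(y)[:3])   # flat forecast at the smoothed level
```

This is the bar the authors describe: a method this simple, fitted in microseconds, is what an ML model should be expected to beat.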
[\n] A problem with the academic ML forecasting literature is that the majority of published studies provide forecasts and claim satisfactory accuracies without comparing them with simple statistical methods or even naive benchmarks. Doing so raises expectations that ML methods provide accurate predictions, but without any empirical proof that this is the case. In our view, this situation mirrors what was happening in the statistical literature in the late 1970s and 1980s. At that time, it was thought that forecasting methods were of superior accuracy simply because of their sophistication and mathematical elegance. Now it is obvious that their value must be empirically proven in an objective, indisputable manner through large-scale competitions. Thus, when it comes to papers proposing new ML methods, or effective ways to use them, academic journals must demand comparisons with alternative methods, or at least with benchmarks, and require that the data of the articles being published be made available to those who want to replicate the results. In our experience, this is not the case at present, making replication practically impossible and allowing conclusions that may not hold.
[\n] In addition to empirical testing, research work is needed to help users understand how the forecasts of ML methods are generated (this is the same problem with all AI models whose output cannot be explained). Obtaining numbers from a black box is not acceptable to practitioners, who need to know how forecasts arise and how they can be influenced or adjusted to arrive at workable predictions.
[\n] A final, equally important concern is that, in addition to point forecasts, ML methods must also be capable of specifying the uncertainty around them or, alternatively, of providing confidence intervals. At present, the issue of uncertainty has not been included in the research agenda of the ML field, leaving a huge vacuum that must be filled, as estimating the uncertainty in future predictions is as important as the forecasts themselves. To overcome this issue, many researchers propose simulating the intervals by iteratively generating multiple future sample paths. Yet, even in that case, the forecast distribution of the methods is empirically rather than analytically derived, raising many doubts about its quality.
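[Annotation] The sample-path idea the authors mention can be sketched in a few lines: extend the series one step at a time with a point forecast plus a resampled past error, repeat for many paths, and read empirical prediction intervals off the simulated distribution. This is our own minimal illustration using a naive (random-walk) forecaster and a bootstrap of past one-step changes; it is not the paper's code:

```python
import numpy as np

def simulate_paths(y, horizon=18, n_paths=1000, seed=1):
    """Iteratively generate future sample paths for a naive
    (random-walk) forecaster: each step adds a bootstrapped past
    one-step change to the running path value."""
    rng = np.random.default_rng(seed)
    changes = np.diff(y)                  # in-sample one-step errors of the naive method
    paths = np.empty((n_paths, horizon))
    for i in range(n_paths):
        value = y[-1]
        for h in range(horizon):
            value += rng.choice(changes)  # point forecast + resampled noise
            paths[i, h] = value
    return paths

# Empirical 95% interval per horizon, read off the simulated
# distribution rather than derived analytically.
rng = np.random.default_rng(0)
y = 100 + np.cumsum(rng.normal(0, 2, size=120))
paths = simulate_paths(y)
lo, hi = np.percentile(paths, [2.5, 97.5], axis=0)
print(lo[0], hi[0])
```

As the excerpt notes, intervals obtained this way are empirical rather than analytic, so their quality depends entirely on how representative the resampled errors are of future ones.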
[\n] To summarize, according to the results of this study, ML methods need to become more accurate, require less computer time, and be less of a black box. A major contribution of this paper is in showing that traditional statistical methods are more accurate than ML ones, in pointing out the need to discover the reasons involved, and in devising ways to reverse the situation. However, it must be made clear that, in the comparisons of the statistical and ML methods reported in this paper, the results may be related to the specific data set being used. [...]
[\n] At this point, the following suggestions/speculations, which must be empirically verified, can be made about the way forward for ML methods; these can be enriched by the future research topics proposed in related surveys [80]:
[::] Obtain more information about the unknown future values of the data rather than their past ones and base the optimization/learning on such future values as much as possible.
[::] Deseasonalize the data before using ML methods. This will result in a simpler series, reducing the computational time required to arrive at optimal weights and therefore allowing faster learning (see the decomposition sketch after this excerpt).
[::] Use a sliding simulation approach to gain as much information as possible about future values and the resulting uncertainty, and learn more effectively how to minimize them (see the rolling-origin sketch after this excerpt).
[::] Cluster the series into various homogeneous categories and/or types of data and develop ML methods that optimally extrapolate them.
[::] Avoid over-fitting, as it is not clear whether ML models can correctly distinguish the noise from the pattern of the data.
[::] Automate preprocessing and avoid the extra decisions required on the part of the user.
[::] Allow the estimation of uncertainty for the point forecasts and provide information for the construction of confidence intervals around such forecasts.
[\n] Although the conclusion of our paper that the forecasting accuracy of ML models is lower than that of statistical methods may seem disappointing, we are extremely positive about the great potential of ML methods for forecasting applications. Clearly, more work is needed to improve such methods, but the same has been the case with all new techniques, including the complex forecasting methods that have improved their accuracy considerably over time. [...]
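[Annotation] The deseasonalization bullet refers to a standard preprocessing step; below is a sketch of classical multiplicative decomposition, assuming monthly data: divide each observation by a centred moving average, average the ratios per month to obtain seasonal indices, feed the seasonally adjusted series to the ML method, and multiply its forecasts back by the indices afterwards. The code is our own illustration of that technique, not the paper's implementation:

```python
import numpy as np

def seasonal_indices(y, period=12):
    """Seasonal indices via classical multiplicative decomposition:
    divide each observation by a centred moving average, then average
    the ratios per season and normalise them to mean 1."""
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period  # 2x12 centred MA
    trend = np.convolve(y, weights, mode="valid")
    offset = period // 2                  # trend[0] is centred on y[offset]
    ratio = y[offset : offset + len(trend)] / trend
    idx = np.array([ratio[(m - offset) % period :: period].mean()
                    for m in range(period)])
    return idx / idx.mean()

# Deseasonalize, fit any ML method on the simpler series, and
# reseasonalize its forecasts by multiplying the indices back in.
rng = np.random.default_rng(0)
t = np.arange(120)
y = (100 + 0.3 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)) + rng.normal(0, 1, 120)
idx = seasonal_indices(y)
y_adjusted = y / idx[t % 12]              # seasonally adjusted input for the ML model
```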
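[Annotation] For the sliding-simulation bullet, here is a minimal rolling-origin evaluation sketch: forecasts are repeatedly made from successive origins and scored on held-out data with sMAPE, one of the accuracy measures used in the paper. The naive benchmark, the function names, and the window settings are our own illustrative assumptions; the point is that any proposed ML method should at least beat this benchmark's score under the same protocol:

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE (percent), as used in the M competitions."""
    return 100 * np.mean(2 * np.abs(forecast - actual)
                         / (np.abs(actual) + np.abs(forecast)))

def naive(train, horizon):
    """Naive benchmark: repeat the last observation."""
    return np.repeat(train[-1], horizon)

def rolling_origin(y, forecast_fn, horizon=18, min_train=48):
    """Sliding simulation: forecast `horizon` steps from each
    successive origin and average the out-of-sample sMAPE."""
    scores = []
    for origin in range(min_train, len(y) - horizon + 1):
        train, test = y[:origin], y[origin : origin + horizon]
        scores.append(smape(test, forecast_fn(train, horizon)))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
y = 100 + np.cumsum(rng.normal(0, 2, size=120))
print(f"Naive sMAPE over rolling origins: {rolling_origin(y, naive):.2f}%")
```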
@article{makridakisStatisticalMachineLearning2018,
  title = {Statistical and Machine Learning Forecasting Methods: Concerns and Ways Forward},
  shorttitle = {Statistical and {{Machine Learning}} Forecasting Methods},
  author = {Makridakis, Spyros and Spiliotis, Evangelos and Assimakopoulos, Vassilios},
  date = {2018-03-27},
  journaltitle = {PLOS ONE},
  shortjournal = {PLOS ONE},
  volume = {13},
  pages = {e0194889},
  issn = {1932-6203},
  doi = {10.1371/journal.pone.0194889},
  url = {https://doi.org/10.1371/journal.pone.0194889},
  urldate = {2020-01-09},
  abstract = {Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.

[Excerpt: Conclusions: The state of the art and ways forward] [...] A major innovation that has distinguished forecasting from other fields has been the large number of empirical studies aimed at both the academic community and the practitioners interested in utilizing the most accurate methods for their various applications and reducing cost or maximizing benefits by doing so. These studies contributed to establishing two major changes in attitudes towards forecasting: 
[::] First, it was established that the methods or models that best fitted the available data did not necessarily result in more accurate post-sample predictions (a common belief until then). 
[::] Second, the post-sample predictions of simple statistical methods were found to be at least as accurate as those of the sophisticated ones. 
[\textbackslash n] This finding was furiously objected to by theoretical statisticians, who claimed that a simple method, being a special case of, e.g., ARIMA models, could not be more accurate than ARIMA itself, refusing to accept the empirical evidence proving the opposite. These two findings have fundamentally changed the field of forecasting and are also evident in this paper, both in Fig 3, showing post-sample versus in-sample accuracy, and in Fig 2, displaying the accuracy levels of various statistical and ML methods, with the latter being much more sophisticated and computationally demanding than the former.

[\textbackslash n] Knowing that a certain sophisticated method is not as accurate as a much simpler one is upsetting from a scientific point of view, as the former requires a great deal of academic expertise and ample computer time to be applied. At the same time, understanding the reasons for their underperformance is the only way to improve them. [...]

[\textbackslash n] ML theorists working on forecasting applications need to do something to improve the accuracy of their methods. For instance, the only thing exponential smoothing methods do is smooth the most recent errors exponentially and then extrapolate the latest pattern in order to forecast. Given their ability to learn, ML methods should do better than simple benchmarks like exponential smoothing. Accepting the problem is the first step in devising workable solutions, and we hope that those in the field of AI and ML will accept the empirical findings and work to improve the forecasting accuracy of their methods.

[\textbackslash n] A problem with the academic ML forecasting literature is that the majority of published studies provide forecasts and claim satisfactory accuracies without comparing them with simple statistical methods or even naive benchmarks. Doing so raises expectations that ML methods provide accurate predictions, but without any empirical proof that this is the case. In our view, this situation mirrors what was happening in the statistical literature in the late 1970s and 1980s. At that time, it was thought that forecasting methods were of superior accuracy simply because of their sophistication and mathematical elegance. Now it is obvious that their value must be empirically proven in an objective, indisputable manner through large-scale competitions. Thus, when it comes to papers proposing new ML methods, or effective ways to use them, academic journals must demand comparisons with alternative methods, or at least with benchmarks, and require that the data of the articles being published be made available to those who want to replicate the results. In our experience, this is not the case at present, making replication practically impossible and allowing conclusions that may not hold.

[\textbackslash n] In addition to empirical testing, research work is needed to help users understand how the forecasts of ML methods are generated (this is the same problem with all AI models whose output cannot be explained). Obtaining numbers from a black box is not acceptable to practitioners, who need to know how forecasts arise and how they can be influenced or adjusted to arrive at workable predictions.

[\textbackslash n] A final, equally important concern is that, in addition to point forecasts, ML methods must also be capable of specifying the uncertainty around them or, alternatively, of providing confidence intervals. At present, the issue of uncertainty has not been included in the research agenda of the ML field, leaving a huge vacuum that must be filled, as estimating the uncertainty in future predictions is as important as the forecasts themselves. To overcome this issue, many researchers propose simulating the intervals by iteratively generating multiple future sample paths. Yet, even in that case, the forecast distribution of the methods is empirically rather than analytically derived, raising many doubts about its quality.

[\textbackslash n] To summarize, according to the results of this study, ML methods need to become more accurate, require less computer time, and be less of a black box. A major contribution of this paper is in showing that traditional statistical methods are more accurate than ML ones, in pointing out the need to discover the reasons involved, and in devising ways to reverse the situation. However, it must be made clear that, in the comparisons of the statistical and ML methods reported in this paper, the results may be related to the specific data set being used. [...]

[\textbackslash n] At this point, the following suggestions/speculations, which must be empirically verified, can be made about the way forward for ML methods; these can be enriched by the future research topics proposed in related surveys [80]:

[::] Obtain more information about the unknown future values of the data rather than their past ones and base the optimization/learning on such future values as much as possible.
[::] Deseasonalize the data before using ML methods. This will result in a simpler series, reducing the computational time required to arrive at optimal weights and therefore allowing faster learning.
[::] Use a sliding simulation approach to gain as much information as possible about future values and the resulting uncertainty, and learn more effectively how to minimize them.
[::] Cluster the series into various homogeneous categories and/or types of data and develop ML methods that optimally extrapolate them.
[::] Avoid over-fitting, as it is not clear whether ML models can correctly distinguish the noise from the pattern of the data.
[::] Automate preprocessing and avoid the extra decisions required on the part of the user.
[::] Allow the estimation of uncertainty for the point forecasts and provide information for the construction of confidence intervals around such forecasts.

[\textbackslash n] Although the conclusion of our paper that the forecasting accuracy of ML models is lower than that of statistical methods may seem disappointing, we are extremely positive about the great potential of ML methods for forecasting applications. Clearly, more work is needed to improve such methods, but the same has been the case with all new techniques, including the complex forecasting methods that have improved their accuracy considerably over time. [...]},
  keywords = {~INRMM-MiD:z-9X3LLKRY,bias-toward-primacy-of-theory-over-reality,cognitive-biases,epistemology,local-over-complication,machine-learning,model-assessment,model-comparison,modelling-uncertainty,statistics,uncertainty,unexpected-effect},
  langid = {english},
  number = {3}
}
