Data Mining in Time Series Databases

Segmenting Time Series: A Survey and Novel Approach. Keogh, E. J., Chu, S., Hart, D., & Pazzani, M. Volume 57. Data Mining in Time Series Databases, pages 1--22. World Scientific Publishing, 2004.

In recent years, there has been an explosion of interest in mining time series databases. As with most computer science problems, representation of the data is the key to efficient and effective solutions. One of the most commonly used representations is piecewise linear approximation. This representation has been used by various researchers to support clustering, classification, indexing and association rule mining of time series data. A variety of algorithms have been proposed to obtain this representation, with several algorithms having been independently rediscovered several times. In this paper, we undertake the first extensive review and empirical comparison of all proposed techniques. We show that all these algorithms have fatal flaws from a data mining perspective. We introduce a novel algorithm that we empirically show to be superior to all others in the literature.

@InBook{Keogh2004,
Title = {Data Mining in Time Series Databases},
Author = {Keogh, E. J. and Chu, S. and Hart, D. and Pazzani, M.},
Chapter = {Segmenting Time Series: A Survey and Novel Approach},
Pages = {1--22},
Publisher = {World Scientific Publishing},
Year = {2004},
Volume = {57},

Abstract = {In recent years, there has been an explosion of interest in mining time series databases. As with most
computer science problems, representation of the data is the key to efficient and effective solutions. One of
the most commonly used representations is piecewise linear approximation. This representation has been
used by various researchers to support clustering, classification, indexing and association rule mining of
time series data. A variety of algorithms have been proposed to obtain this representation, with several
algorithms having been independently rediscovered several times. In this paper, we undertake the first
extensive review and empirical comparison of all proposed techniques. We show that all these algorithms
have fatal flaws from a data mining perspective. We introduce a novel algorithm that we empirically show
to be superior to all others in the literature.},
Journal = {Data mining in time series databases},
Review = {Keogh \etal \cite{Keogh2004} compared different ways of segmenting time series using a piecewise linear representation, which approximates the observed data with \emph{K} straight lines. The observed data is examined with a windowing technique: a sliding window if on-line operation is required, or top-down or bottom-up partitioning otherwise. Each candidate segment is approximated by linear interpolation or linear regression, and a segment boundary is declared when the approximation error of the fitted line exceeds some threshold. Keogh \etal report that the sliding window and top-down approaches tend to over-segment noisy data, making them difficult to use. The sliding window and bottom-up approaches can also become computationally expensive, depending on the window size selected. They propose a combined Sliding Window and Bottom-up (SWAB) approach, in which a large sliding window is moved over the data and a bottom-up approach is applied inside the window. This allows the system to retain its on-line nature while remaining computationally light. Segmentation accuracy was given as the MSE between the interpolated lines and the actual data, expressed as a percentage of the poorest algorithm tested, so no temporal accuracy is reported. This algorithm may be difficult to generalize to rehabilitation motions.},
Timestamp = {2013.09.18}
}
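
The bottom-up strategy mentioned in the review can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `fit_error` and `bottom_up`, the choice of least-squares regression with the sum of squared residuals as the merge cost, and the `max_error` threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def fit_error(y: List[float], start: int, end: int) -> float:
    """Sum of squared residuals of a least-squares line fitted to y[start..end]."""
    n = end - start + 1
    if n <= 2:
        return 0.0  # one or two points are always fitted exactly by a line
    xs = range(start, end + 1)
    mean_x = sum(xs) / n
    mean_y = sum(y[start:end + 1]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y[x] - mean_y) for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y[x] - (slope * x + intercept)) ** 2 for x in xs)


def bottom_up(y: List[float], max_error: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of the merged segments."""
    # Start from the finest segmentation: pairs of adjacent points
    # (the last segment holds a single point when len(y) is odd).
    segments = [(i, min(i + 1, len(y) - 1)) for i in range(0, len(y), 2)]
    while len(segments) > 1:
        # Cost of merging each segment with its right-hand neighbour.
        costs = [
            fit_error(y, segments[i][0], segments[i + 1][1])
            for i in range(len(segments) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break  # even the cheapest merge would exceed the threshold
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments


if __name__ == "__main__":
    # A rising ramp followed by a falling ramp.
    series = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
    print(bottom_up(series, max_error=0.01))
```

In this sketch, segment boundaries can only fall between the initial point pairs, and every merge cost is recomputed on each pass for clarity rather than speed.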
