Segmenting Time Series: A Survey and Novel Approach. Keogh, E. J., Chu, S., Hart, D., & Pazzani, M. Volume 57. Data Mining in Time Series Databases, pages 1--22. World Scientific Publishing, 2004.
abstract   bibtex   
In recent years, there has been an explosion of interest in mining time series databases. As with most computer science problems, representation of the data is the key to efficient and effective solutions. One of the most commonly used representations is piecewise linear approximation. This representation has been used by various researchers to support clustering, classification, indexing and association rule mining of time series data. A variety of algorithms have been proposed to obtain this representation, with several algorithms having been independently rediscovered several times. In this paper, we undertake the first extensive review and empirical comparison of all proposed techniques. We show that all these algorithms have fatal flaws from a data mining perspective. We introduce a novel algorithm that we empirically show to be superior to all others in the literature.
@InBook{Keogh2004,
  Title                    = {Data Mining in Time Series Databases},
  Author                   = {Keogh, E. J. and Chu, S. and Hart, D. and Pazzani, M.},
  Chapter                  = {Segmenting Time Series: A Survey and Novel Approach},
  Pages                    = {1--22},
  Publisher                = {World Scientific Publishing},
  Year                     = {2004},
  Volume                   = {57},

  Abstract                 = {In recent years, there has been an explosion of interest in mining time series databases. As with most 
computer science problems, representation of the data is the key to efficient and effective solutions. One of 
the most commonly used representations is piecewise linear approximation. This representation has been 
used by various researchers to support clustering, classification, indexing and association rule mining of 
time series data. A variety of algorithms have been proposed to obtain this representation, with several 
algorithms having been independently rediscovered several times. In this paper, we undertake the first 
extensive review and empirical comparison of all proposed techniques. We show that all these algorithms 
have fatal flaws from a data mining perspective. We introduce a novel algorithm that we empirically show 
to be superior to all others in the literature.},
  Journal                  = {Data mining in time series databases},
  Review                   = {Keogh \etal \cite{Keogh2004} looked at different ways of segmenting by employing piecewise linear representation, which consists of representing the observed data with \emph{K} straight lines. The observation data is examined by a windowing technique, such as a sliding window, if on-line constraints are required, or by top-down or bottom-up window partitioning. The windowed data is estimated by linear interpolation or regression, and a segment is declared when the line produced by the linear interpolation exceeds some threshold. Keogh reports that sliding window and top-down approaches tend to over-segment any data with noise, making it difficult to use. Sliding window and bottom-up approaches can also become computationally expensive, based on the window sizing selected. They propose a sliding window bottom-up approach, where a large sliding window is used, and bottom-up approach is used inside the sliding window. This allows the system to retain its on-line nature, and also be computationally light. The segmentation accuracy was given as the MSE between the interpolated lines and the actual data, as a percentage of the poorest algorithm tested, so no temporal accuracy is reported. This algorithm may be difficult to generalize into rehabilitation motions},
  Timestamp                = {2013.09.18}
}

Downloads: 0