A Probabilistic Approach to Fast Pattern Matching in Time Series Databases. Keogh, E. & Smyth, P.
The problem of efficiently and accurately locating patterns of interest in massive time series data sets is an important and non-trivial problem in a wide variety of applications, including diagnosis and monitoring of complex systems, biomedical data analysis, and exploratory data analysis in scientific and business time series. In this paper a probabilistic approach is taken to this problem. Using piecewise linear segmentations as the underlying representation, local features (such as peaks, troughs, and plateaus) are defined using a prior distribution on expected deformations from a basic template. Global shape information is represented using another prior on the relative locations of the individual features. An appropriately defined probabilistic model integrates the local and global information and directly leads to an overall distance measure between sequence patterns based on prior knowledge. A search algorithm using this distance measure is shown to efficiently and accurately find matches for a variety of patterns on a number of data sets, including engineering sensor data from Space Shuttle mission archives. The proposed approach provides a natural framework to support user-customizable "query by content" on time series data, taking prior domain information into account in a principled manner.
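The abstract names piecewise linear segmentation as the underlying representation but does not say how the segmentation is computed. As an illustration only, the sketch below uses a generic bottom-up merging scheme; this is an assumption, not necessarily the authors' procedure, and the names `fit_error`, `bottom_up_segment`, and the `max_error` threshold are hypothetical.

```python
import numpy as np


def fit_error(y):
    """Sum of squared residuals of a least-squares line fitted to y."""
    if len(y) < 3:
        return 0.0  # one or two points are fitted exactly by a line
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))


def bottom_up_segment(y, max_error):
    """Hypothetical bottom-up piecewise linear segmentation.

    Starts from pairs of adjacent points and repeatedly merges the neighbouring
    pair of segments whose merged line fit is cheapest, stopping once the
    cheapest merge would exceed max_error. Returns (start, end) index pairs,
    with end exclusive.
    """
    y = np.asarray(y, dtype=float)
    # Initial segments: consecutive pairs of points (the last may be a single point).
    segs = [(i, min(i + 2, len(y))) for i in range(0, len(y), 2)]
    while len(segs) > 1:
        # Cost of merging each segment with its right-hand neighbour.
        costs = [fit_error(y[segs[k][0]:segs[k + 1][1]]) for k in range(len(segs) - 1)]
        k = int(np.argmin(costs))
        if costs[k] > max_error:
            break  # every remaining merge would distort the shape too much
        segs[k] = (segs[k][0], segs[k + 1][1])
        del segs[k + 1]
    return segs
```

For example, `bottom_up_segment([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], max_error=1e-6)` returns `[(0, 6), (6, 11)]`: one rising and one falling segment that together form a peak. This version recomputes every merge cost on each pass, so it favours simplicity over speed.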
@article{keogh_probabilistic_nodate,
	title = {A {Probabilistic} {Approach} to {Fast} {Pattern} {Matching} in {Time} {Series} {Databases}},
	abstract = {The problem of efficiently and accurately locating patterns of interest in massive time series data sets is an important and non-trivial problem in a wide variety of applications, including diagnosis and monitoring of complex systems, biomedical data analysis, and exploratory data analysis in scientific and business time series. In this paper a probabilistic approach is taken to this problem. Using piecewise linear segmentations as the underlying representation, local features (such as peaks, troughs, and plateaus) are defined using a prior distribution on expected deformations from a basic template. Global shape information is represented using another prior on the relative locations of the individual features. An appropriately defined probabilistic model integrates the local and global information and directly leads to an overall distance measure between sequence patterns based on prior knowledge. A search algorithm using this distance measure is shown to efficiently and accurately find matches for a variety of patterns on a number of data sets, including engineering sensor data from Space Shuttle mission archives. The proposed approach provides a natural framework to support user-customizable "query by content" on time series data, taking prior domain information into account in a principled manner.},
	language = {en},
	author = {Keogh, Eamonn and Smyth, Padhraic},
	pages = {7}
}
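The abstract describes local features as deformations of a template governed by a prior, a second prior on the relative locations of the features, and a distance measure derived from the combined probabilistic model. One plausible reading, sketched here purely as an assumption, treats every prior as an independent Gaussian and uses the negative log-likelihood of a candidate match as the distance, so smaller values mean a closer match. The classes `Segment` and `FeaturePrior`, the choice of slope and length as deformation parameters, and the Gaussian form are all hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # x position of the segment's left end
    length: float  # horizontal extent of the segment
    slope: float   # slope of the fitted line


@dataclass
class FeaturePrior:
    # Hypothetical Gaussian prior on deformations of one template segment.
    mean_slope: float
    sd_slope: float
    mean_length: float
    sd_length: float


def gauss_nll(x, mean, sd):
    """Negative log density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return 0.5 * z * z + math.log(sd * math.sqrt(2 * math.pi))


def pattern_distance(segments, feature_priors, gap_priors):
    """Hypothetical distance: negative log-likelihood of a candidate match.

    segments[i] is matched to feature_priors[i]; gap_priors[i] is a (mean, sd)
    pair for the offset between the starts of segments i and i + 1.
    """
    if len(segments) != len(feature_priors) or len(gap_priors) != len(segments) - 1:
        raise ValueError("mismatched numbers of segments and priors")
    d = 0.0
    # Local term: how far each segment deviates from its template feature.
    for seg, p in zip(segments, feature_priors):
        d += gauss_nll(seg.slope, p.mean_slope, p.sd_slope)
        d += gauss_nll(seg.length, p.mean_length, p.sd_length)
    # Global term: how far the relative feature locations deviate from the prior.
    for (a, b), (mu, sd) in zip(zip(segments, segments[1:]), gap_priors):
        d += gauss_nll(b.start - a.start, mu, sd)
    return d
```

Under these assumptions, a candidate that reproduces the template exactly scores lower than one whose second segment is flattened and shifted. Because the value is a negative log-density, it can be negative; only comparisons between candidates are meaningful.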