In Mathematical Challenges of the 21st Century, pages 1–32. American Mathematical Society.


The coming century is surely the century of data. A combination of blind faith and serious purpose makes our society invest massively in the collection and processing of data of all kinds, on scales unimaginable until recently. Hyperspectral Imagery, Internet Portals, Financial tick-by-tick data, and DNA Microarrays are just a few of the better-known sources, feeding data in torrential streams into scientific and business databases worldwide. In traditional statistical data analysis, we think of observations of instances of particular phenomena, these observations being a vector of values we measured on several variables (e.g. blood pressure, weight, height, ...). In traditional statistical methodology, we assumed many observations and a few, well-chosen variables. The trend today is towards more observations but even more so, to radically larger numbers of variables: voracious, automatic, systematic collection of hyper-informative detail about each observed instance. We are seeing examples where the observations gathered on individual instances are curves, or spectra, or images, or even movies, so that a single observation has dimensions in the thousands or billions, while there are only tens or hundreds of instances available for study. Classical methods are simply not designed to cope with this kind of explosive growth of dimensionality of the observation vector. We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet. Mathematicians are ideally prepared for appreciating the abstract issues involved in finding patterns in such high-dimensional data. Two of the most influential principles in the coming century will be principles originally discovered and cultivated by mathematicians: the blessings of dimensionality and the curse of dimensionality.
The curse of dimensionality is a phrase used by several subfields in the mathematical sciences; I use it here to refer to the apparent intractability of systematically searching through a high-dimensional space, the apparent intractability of accurately approximating a general high-dimensional function, the apparent intractability of integrating a high-dimensional function. The blessings of dimensionality are less widely noted, but they include the concentration of measure phenomenon (so-called in the geometry of Banach spaces), which means that certain random fluctuations are very well controlled in high dimensions, and the success of asymptotic methods, used widely in mathematical statistics and statistical physics, which suggest that statements about very high-dimensional settings may be made where moderate dimensions would be too complicated. There is a large body of interesting work going on in the mathematical sciences, both to attack the curse of dimensionality in specific ways, and to extend the benefits of dimensionality. I will mention work in high-dimensional approximation theory, in probability theory, and in mathematical statistics. I expect to see in the coming decades many further mathematical elaborations to our inventory of Blessings and Curses, and I expect such contributions to have a broad impact on society's ability to extract meaning from the massive datasets it has decided to compile. In my talk, I will also draw on my personal research experiences which suggest to me (1) there are substantial chances that by interpreting ongoing development in high-dimensional data analysis, mathematicians can become aware of new problems in harmonic analysis; and (2) that many of the problems of data analysis even in fairly low dimensions are unsolved and are similar to problems in mathematics which have only recently been attacked, and for which only the merest beginnings have been made. Both fields can progress together.

@inproceedings{donohoHighdimensionalDataAnalysis2000,
  title = {High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality},
  author = {Donoho, David},
  date = {2000-01-01},
  pages = {1--32},
  publisher = {{American Mathematical Society}},
  abstract = {The coming century is surely the century of data. A combination of blind faith and serious purpose makes our society invest massively in the collection and processing of data of all kinds, on scales unimaginable until recently. Hyperspectral Imagery, Internet Portals, Financial tick-by-tick data, and DNA Microarrays are just a few of the better-known sources, feeding data in torrential streams into scientific and business databases worldwide. In traditional statistical data analysis, we think of observations of instances of particular phenomena, these observations being a vector of values we measured on several variables (e.g. blood pressure, weight, height, ...). In traditional statistical methodology, we assumed many observations and a few, well-chosen variables. The trend today is towards more observations but even more so, to radically larger numbers of variables: voracious, automatic, systematic collection of hyper-informative detail about each observed instance. We are seeing examples where the observations gathered on individual instances are curves, or spectra, or images, or even movies, so that a single observation has dimensions in the thousands or billions, while there are only tens or hundreds of instances available for study. Classical methods are simply not designed to cope with this kind of explosive growth of dimensionality of the observation vector. We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet.
Mathematicians are ideally prepared for appreciating the abstract issues involved in finding patterns in such high-dimensional data. Two of the most influential principles in the coming century will be principles originally discovered and cultivated by mathematicians: the blessings of dimensionality and the curse of dimensionality. The curse of dimensionality is a phrase used by several subfields in the mathematical sciences; I use it here to refer to the apparent intractability of systematically searching through a high-dimensional space, the apparent intractability of accurately approximating a general high-dimensional function, the apparent intractability of integrating a high-dimensional function. The blessings of dimensionality are less widely noted, but they include the concentration of measure phenomenon (so-called in the geometry of Banach spaces), which means that certain random fluctuations are very well controlled in high dimensions, and the success of asymptotic methods, used widely in mathematical statistics and statistical physics, which suggest that statements about very high-dimensional settings may be made where moderate dimensions would be too complicated. There is a large body of interesting work going on in the mathematical sciences, both to attack the curse of dimensionality in specific ways, and to extend the benefits of dimensionality. I will mention work in high-dimensional approximation theory, in probability theory, and in mathematical statistics. I expect to see in the coming decades many further mathematical elaborations to our inventory of Blessings and Curses, and I expect such contributions to have a broad impact on society's ability to extract meaning from the massive datasets it has decided to compile.
In my talk, I will also draw on my personal research experiences which suggest to me (1) there are substantial chances that by interpreting ongoing development in high-dimensional data analysis, mathematicians can become aware of new problems in harmonic analysis; and (2) that many of the problems of data analysis even in fairly low dimensions are unsolved and are similar to problems in mathematics which have only recently been attacked, and for which only the merest beginnings have been made. Both fields can progress together.},
  eventtitle = {Mathematical {{Challenges}} of the 21st {{Century}}},
  keywords = {~INRMM-MiD:z-QFLQ4B6U,blessing-of-dimensionality,computational-science,curse-of-dimensionality,data-transformation-modelling,dimensionality-reduction,high-dimensionality,machine-learning,statistics}
}
