Physics is the New Data. Kalinin, S. V., Ziatdinov, M., Sumpter, B. G., & White, A. D. April, 2022. arXiv:2204.05095 [physics]
The rapid development of machine learning (ML) methods has fundamentally affected numerous applications ranging from computer vision, biology, and medicine to accounting and text analytics. Until now, it was the availability of large and often labeled data sets that enabled significant breakthroughs. However, the adoption of these methods in classical physical disciplines has been relatively slow, a tendency that can be traced to the intrinsic differences between correlative approaches of purely data-based ML and the causal hypothesis-driven nature of physical sciences. Furthermore, anomalous behaviors of classical ML necessitate addressing issues such as explainability and fairness of ML. We also note the sequence in which deep learning became mainstream in different scientific disciplines - starting from medicine and biology and then towards theoretical chemistry, and only after that, physics - is rooted in the progressively more complex level of descriptors, constraints, and causal structures available for incorporation in ML architectures. Here we put forth that over the next decade, physics will become a new data, and this will continue the transition from dot-coms and scientific computing concepts of the 90ies to big data of 2000-2010 to deep learning of 2010-2020 to physics-enabled scientific ML.
@misc{kalinin_physics_2022,
	title = {Physics is the {New} {Data}},
	url = {http://arxiv.org/abs/2204.05095},
	doi = {10.48550/arXiv.2204.05095},
	abstract = {The rapid development of machine learning (ML) methods has fundamentally affected numerous applications ranging from computer vision, biology, and medicine to accounting and text analytics. Until now, it was the availability of large and often labeled data sets that enabled significant breakthroughs. However, the adoption of these methods in classical physical disciplines has been relatively slow, a tendency that can be traced to the intrinsic differences between correlative approaches of purely data-based ML and the causal hypothesis-driven nature of physical sciences. Furthermore, anomalous behaviors of classical ML necessitate addressing issues such as explainability and fairness of ML. We also note the sequence in which deep learning became mainstream in different scientific disciplines - starting from medicine and biology and then towards theoretical chemistry, and only after that, physics - is rooted in the progressively more complex level of descriptors, constraints, and causal structures available for incorporation in ML architectures. Here we put forth that over the next decade, physics will become a new data, and this will continue the transition from dot-coms and scientific computing concepts of the 90ies to big data of 2000-2010 to deep learning of 2010-2020 to physics-enabled scientific ML.},
	urldate = {2023-03-02},
	publisher = {arXiv},
	author = {Kalinin, Sergei V. and Ziatdinov, Maxim and Sumpter, Bobby G. and White, Andrew D.},
	month = apr,
	year = {2022},
	note = {arXiv:2204.05095 [physics]},
}
