Simulation-Based Inference via Regression Projection and Batched Discrepancies. Farahi, A., Rose, J., & Torrey, P. February, 2026. arXiv:2602.03613 [stat]
@misc{farahi_simulation-based_2026,
	title = {Simulation-{Based} {Inference} via {Regression} {Projection} and {Batched} {Discrepancies}},
	url = {http://arxiv.org/abs/2602.03613},
	doi = {10.48550/arXiv.2602.03613},
	abstract = {We analyze a lightweight simulation-based inference method that infers simulator parameters using only a regression-based projection of the observed data. After fitting a surrogate linear regression once, the procedure simulates small batches at proposed parameter values and assigns kernel weights based on the resulting batch residual discrepancy, producing a self-normalized pseudo-posterior that is simple, parallelizable, and requires access only to the fitted regression coefficients rather than raw observations. We formalize the construction as an importance-sampling approximation to a population target that averages over simulator randomness, prove consistency as the number of parameter draws grows, and establish stability to estimating the surrogate regression from finite samples. We then characterize asymptotic concentration as batch size increases and bandwidth shrinks, showing that the pseudo-posterior concentrates on an identified set determined by the chosen projection, thereby clarifying when the method yields point versus set identification. Experiments in a tractable nonlinear model and a cosmological calibration task using the DREAMS simulation suite illustrate the computational advantages of regression-based projections and the identifiability limitations that arise from low-information summaries.},
	language = {en},
	urldate = {2026-02-17},
	publisher = {arXiv},
	author = {Farahi, Arya and Rose, Jonah and Torrey, Paul},
	month = feb,
	year = {2026},
	note = {arXiv:2602.03613 [stat]},
	keywords = {Computer Science - Machine Learning, Statistics - Machine Learning, Statistics - Methodology},
}
