A First Look at Creating Mock Catalogs with Machine Learning Techniques

A First Look at Creating Mock Catalogs with Machine Learning Techniques. Xu, X., Ho, S., Trac, H., Schneider, J., Poczos, B., & Ntampaka, M. The Astrophysical Journal, 772(2):147, 2013.


We investigate machine learning (ML) techniques for predicting the number of galaxies ($N_{\rm gal}$) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and $N_{\rm gal}$. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and $k$-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict $N_{\rm gal}$ by training our algorithms on the following six halo properties: number of particles, $M_{200}$, $\sigma_v$, $v_{\rm max}$, half-mass radius, and spin. For Millennium, our predicted $N_{\rm gal}$ values have a mean-squared error (MSE) of 0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to 5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and $N_{\rm gal}$. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high $M_{\rm star}$, low $M_{\rm star}$). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.
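As a rough illustration of the kNN regression idea mentioned in the abstract above, the following sketch predicts a halo's galaxy count as the mean count of its k nearest neighbors in feature space and scores the result with a mean-squared error. It is not the authors' pipeline: the data are synthetic, only two made-up features stand in for the six halo properties, and the names `knn_predict`, `mean_squared_error`, and `make_toy_halos` are hypothetical helpers introduced here.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean over the k nearest training points."""
    # Euclidean distance from the query to every training point, sorted.
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def make_toy_halos(n, rng):
    """Hypothetical data: two scaled 'halo features' and a toy galaxy count."""
    X, y = [], []
    for _ in range(n):
        log_mass = rng.uniform(0.0, 1.0)  # stand-in for a mass-like feature
        spin = rng.uniform(0.0, 1.0)      # stand-in for a spin-like feature
        # Toy rule (an assumption, not from the paper): count grows with mass.
        n_gal = float(round(5.0 * log_mass ** 2))
        X.append((log_mass, spin))
        y.append(n_gal)
    return X, y


rng = random.Random(0)
train_X, train_y = make_toy_halos(500, rng)
test_X, test_y = make_toy_halos(100, rng)

# Predict the toy galaxy count for each held-out halo and score the result.
pred = [knn_predict(train_X, train_y, q, k=5) for q in test_X]
mse = mean_squared_error(test_y, pred)
print(f"toy MSE: {mse:.3f}")
```

The toy MSE printed here has no relation to the 0.16 reported in the abstract. With real halo properties, the features would normally be rescaled to comparable ranges before computing distances, since kNN is sensitive to feature scale.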

@article{xu_first_2013,
	title = {A {First} {Look} at {Creating} {Mock} {Catalogs} with {Machine} {Learning} {Techniques}},
	volume = {772},
	issn = {0004-637X},
	url = {http://stacks.iop.org/0004-637X/772/i=2/a=147},
	doi = {10.1088/0004-637X/772/2/147},
	abstract = {We investigate machine learning (ML) techniques for predicting the number of galaxies ($N_{\rm gal}$) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and $N_{\rm gal}$. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and $k$-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict $N_{\rm gal}$ by training our algorithms on the following six halo properties: number of particles, $M_{200}$, $\sigma_v$, $v_{\rm max}$, half-mass radius, and spin. For Millennium, our predicted $N_{\rm gal}$ values have a mean-squared error (MSE) of 0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to 5\%-10\%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and $N_{\rm gal}$. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high $M_{\rm star}$, low $M_{\rm star}$). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.},
	language = {en},
	number = {2},
	urldate = {2016-08-24},
	journal = {The Astrophysical Journal},
	author = {Xu, Xiaoying and Ho, Shirley and Trac, Hy and Schneider, Jeff and Poczos, Barnabas and Ntampaka, Michelle},
	year = {2013},
	pages = {147},
}
