Name-ethnicity classification from open sources

Name-ethnicity classification from open sources. Ambekar, A., Ward, C., Mohammed, J., Male, S., & Skiena, S. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '09, pages 49–58, New York, NY, USA, June 2009. Association for Computing Machinery.


The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier where all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups with individual group accuracy comparable to earlier binary (e.g., Spanish/non-Spanish) classifiers. We have applied this classifier to over 20 million names from a large-scale news corpus, identifying interesting temporal and spatial trends on the representation of particular cultural/ethnic groups.

@inproceedings{ambekar_name-ethnicity_2009,
	address = {New York, NY, USA},
	series = {{KDD} '09},
	title = {Name-ethnicity classification from open sources},
	isbn = {978-1-60558-495-9},
	url = {https://doi.org/10.1145/1557019.1557032},
	doi = {10.1145/1557019.1557032},
	abstract = {The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier where all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups with individual group accuracy comparable to earlier binary (e.g., Spanish/non-Spanish) classifiers. We have applied this classifier to over 20 million names from a large-scale news corpus, identifying interesting temporal and spatial trends on the representation of particular cultural/ethnic groups.},
	urldate = {2021-03-22},
	booktitle = {Proceedings of the 15th {ACM} {SIGKDD} international conference on {Knowledge} discovery and data mining},
	publisher = {Association for Computing Machinery},
	author = {Ambekar, Anurag and Ward, Charles and Mohammed, Jahangir and Male, Swapna and Skiena, Steven},
	month = jun,
	year = {2009},
	keywords = {ethnicity detection, name classification, news analysis, social science research},
	pages = {49--58},
}

