The impact of using different annotation schemes on named entity recognition. Alshammari, N. & Alanazi, S. Egyptian Informatics Journal, 22(3):295–302, 2021.
Named entity recognition (NER) is a subfield of information extraction, which aims to detect and classify predefined named entities (e.g., people, locations, organizations, etc.) in a body of text. In the literature, many researchers have studied the application of different machine learning models and features to NER. However, few research efforts have been devoted to studying annotation schemes used to label multi-token named entities. In this research, we studied seven annotation schemes (IO, IOB, IOE, IOBES, BI, IE, and BIES) and their impact on the task of NER using five different classifiers. Our experiment was conducted on an in-house dataset that consists of 27 medical Arabic articles with more than 62,000 tokens. The IO annotation scheme outperformed other schemes with an F-measure score of 84.44%. The closest competitor is the BIES scheme, which scored 72.78%. The rest of the schemes’ scores ranged from 60.38% to 69.18%. Although the IO scheme achieved the best results, comparing it to the other schemes is not reasonable because it cannot identify consecutive entities, which the other schemes can do. Therefore, we also investigated the ability of recognizing consecutive entities and provided an analysis of the running-time complexity.
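The abstract's point that IO cannot separate consecutive entities can be illustrated with a minimal sketch. It is not the authors' code: the helper names `encode` and `decode`, the `DISEASE` label, and the five-token example are hypothetical, and IOB is assumed here to be the common variant in which every entity starts with a `B-` tag.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def encode(n_tokens: int, spans: List[Span], scheme: str) -> List[str]:
    """Turn entity spans into per-token tags under the IO or IOB scheme."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            # Only IOB marks the first token of an entity with a B- prefix.
            if scheme == "IOB" and i == start:
                tags[i] = "B-" + label
            else:
                tags[i] = "I-" + label
    return tags


def decode(tags: List[str]) -> List[Span]:
    """Recover entity spans from a tag sequence (IO or IOB)."""
    spans: List[Span] = []
    start = None
    label = ""
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, typ = tag.partition("-")
        # Close the open span on "O", on a new B- tag, or on a type change.
        if start is not None and (tag == "O" or prefix == "B" or typ != label):
            spans.append((start, i, label))
            start = None
        if tag != "O" and start is None:
            start, label = i, typ
    return spans


# Two adjacent entities of the same type in a five-token sentence.
gold = [(0, 2, "DISEASE"), (2, 4, "DISEASE")]

io_tags = encode(5, gold, "IO")
iob_tags = encode(5, gold, "IOB")

print(io_tags, decode(io_tags))    # IO: the two entities decode as one span
print(iob_tags, decode(iob_tags))  # IOB: the boundary is preserved
```

Under IO, both entities receive identical `I-DISEASE` tags, so decoding returns a single four-token span; the `B-` tag in IOB is what keeps the boundary between them recoverable.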
@article{alshammari_impact_2021,
	title = {The impact of using different annotation schemes on named entity recognition},
	volume = {22},
	issn = {1110-8665},
	url = {https://www.sciencedirect.com/science/article/pii/S1110866520301596},
	doi = {10.1016/j.eij.2020.10.004},
	abstract = {Named entity recognition (NER) is a subfield of information extraction, which aims to detect and classify predefined named entities (e.g., people, locations, organizations, etc.) in a body of text. In the literature, many researchers have studied the application of different machine learning models and features to NER. However, few research efforts have been devoted to studying annotation schemes used to label multi-token named entities. In this research, we studied seven annotation schemes (IO, IOB, IOE, IOBES, BI, IE, and BIES) and their impact on the task of NER using five different classifiers. Our experiment was conducted on an in-house dataset that consists of 27 medical Arabic articles with more than 62,000 tokens. The IO annotation scheme outperformed other schemes with an F-measure score of 84.44\%. The closest competitor is the BIES scheme, which scored 72.78\%. The rest of the schemes’ scores ranged from 60.38\% to 69.18\%. Although the IO scheme achieved the best results, comparing it to the other schemes is not reasonable because it cannot identify consecutive entities, which the other schemes can do. Therefore, we also investigated the ability of recognizing consecutive entities and provided an analysis of the running-time complexity.},
	number = {3},
	journal = {Egyptian Informatics Journal},
	author = {Alshammari, Nasser and Alanazi, Saad},
	year = {2021},
	keywords = {Annotation schemes, Named entity recognition, Natural language processing, Segment representation},
	pages = {295--302},
}
