Identifying discourse connectives in biomedical text

Identifying discourse connectives in biomedical text. Ramesh, B. P. & Yu, H. AMIA ... Annual Symposium proceedings. AMIA Symposium, 2010:657–661, November, 2010.


Discourse connectives are words or phrases that connect or relate two coherent sentences or phrases and indicate the presence of discourse relations. Automatic recognition of discourse connectives may benefit many natural language processing applications. In this pilot study, we report the development of supervised machine-learning classifiers based on conditional random fields (CRFs) for automatically identifying discourse connectives in full-text biomedical articles. Our first classifier was trained on the open-domain, 1-million-token Penn Discourse Tree Bank (PDTB). We also performed cross-validation on biomedical articles (approximately 100K word tokens) that we annotated. The classifier trained on PDTB data attained a 0.55 F1-score for identifying discourse connectives in biomedical text, whereas cross-validation on the annotated biomedical articles attained a 0.69 F1-score: much better performance despite a training set roughly one-tenth the size. Our preliminary analysis suggests the existence of domain-specific features, and we speculate that domain-adaptation approaches may further improve performance.
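
The abstract reports F1-scores but does not say how connectives were encoded or matched. As a hedged illustration only, the sketch below assumes connectives are labeled token by token with B/I/O tags (B = first token of a connective, I = continuation, O = outside), the usual output of a linear-chain CRF tagger, and that a predicted connective counts as correct only if its token span matches a gold span exactly. The tag scheme, the exact-match criterion, the example sentence, and the names `bio_to_spans` and `span_f1` are all hypothetical, not the authors' code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect connective spans from a hypothetical B/I/O tag sequence.

    A span opens at 'B' (or at a stray 'I' with no open span) and
    extends over any following 'I' tags; 'O' closes it.
    """
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:      # a new 'B' closes the previous span
                spans.add((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.add((start, i))
            start = None
        # 'I' with an open span simply extends it
    if start is not None:              # span running to the end of the sentence
        spans.add((start, len(tags)))
    return spans


def span_f1(gold: List[List[str]],
            pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 over a list of sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        if len(g_tags) != len(p_tags):
            raise ValueError("tag sequences must align token by token")
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)   # predicted spans that exactly match a gold span
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans the tagger missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentence: "However , expression fell ; as a result , cells died ."
    gold = ["B", "O", "O", "O", "O", "B", "I", "I", "O", "O", "O", "O"]
    pred = ["B", "O", "O", "O", "O", "O", "B", "I", "O", "O", "O", "O"]
    print(span_f1([gold], [pred]))  # (0.5, 0.5, 0.5)
```

Under these assumptions, tagging only "a result" inside the gold connective "as a result" earns no credit; a lenient token-overlap criterion would score that case higher, so an F1 value is only comparable across studies that use the same matching criterion.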

@article{ramesh_identifying_2010,
	title = {Identifying discourse connectives in biomedical text},
	volume = {2010},
	issn = {1942-597X},
	url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041460/},
	abstract = {Discourse connectives are words or phrases that connect or relate two coherent sentences or phrases and indicate the presence of discourse relations. Automatic recognition of discourse connectives may benefit many natural language processing applications. In this pilot study, we report the development of supervised machine-learning classifiers based on conditional random fields (CRFs) for automatically identifying discourse connectives in full-text biomedical articles. Our first classifier was trained on the open-domain, 1-million-token Penn Discourse Tree Bank (PDTB). We also performed cross-validation on biomedical articles (approximately 100K word tokens) that we annotated. The classifier trained on PDTB data attained a 0.55 F1-score for identifying discourse connectives in biomedical text, whereas cross-validation on the annotated biomedical articles attained a 0.69 F1-score: much better performance despite a training set roughly one-tenth the size. Our preliminary analysis suggests the existence of domain-specific features, and we speculate that domain-adaptation approaches may further improve performance.},
	language = {ENG},
	journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium},
	author = {Ramesh, Balaji Polepalli and Yu, Hong},
	month = nov,
	year = {2010},
	pmid = {21347060},
	pmcid = {PMC3041460},
	keywords = {Algorithms, Artificial Intelligence, Databases, Factual, Humans, Pilot Projects, Supervised Machine Learning, natural language processing},
	pages = {657--661},
}
