Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language. Efrat, A., Shaham, U., Kilman, D., & Levy, O. arXiv:2103.01242 [cs, stat], November 2021.
Current NLP datasets targeting ambiguity can be solved by a native speaker with relative ease. We present Cryptonite, a large-scale dataset based on cryptic crosswords, which is both linguistically complex and naturally sourced. Each example in Cryptonite is a cryptic clue, a short phrase or sentence with a misleading surface reading, whose solving requires disambiguating semantic, syntactic, and phonetic wordplays, as well as world knowledge. Cryptic clues pose a challenge even for experienced solvers, though top-tier experts can solve them with almost 100% accuracy. Cryptonite is a challenging task for current models; fine-tuning T5-Large on 470k cryptic clues achieves only 7.6% accuracy, on par with the accuracy of a rule-based clue solver (8.6%).
@article{efrat_cryptonite_2021,
	title = {Cryptonite: {A} {Cryptic} {Crossword} {Benchmark} for {Extreme} {Ambiguity} in {Language}},
	shorttitle = {Cryptonite},
	url = {http://arxiv.org/abs/2103.01242},
	abstract = {Current NLP datasets targeting ambiguity can be solved by a native speaker with relative ease. We present Cryptonite, a large-scale dataset based on cryptic crosswords, which is both linguistically complex and naturally sourced. Each example in Cryptonite is a cryptic clue, a short phrase or sentence with a misleading surface reading, whose solving requires disambiguating semantic, syntactic, and phonetic wordplays, as well as world knowledge. Cryptic clues pose a challenge even for experienced solvers, though top-tier experts can solve them with almost 100\% accuracy. Cryptonite is a challenging task for current models; fine-tuning T5-Large on 470k cryptic clues achieves only 7.6\% accuracy, on par with the accuracy of a rule-based clue solver (8.6\%).},
	urldate = {2022-02-14},
	journal = {arXiv:2103.01242 [cs, stat]},
	author = {Efrat, Avia and Shaham, Uri and Kilman, Dan and Levy, Omer},
	month = nov,
	year = {2021},
	note = {arXiv: 2103.01242},
	keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning},
}
