Building a Lexicon of Formulaic Language for Language Learners

Building a Lexicon of Formulaic Language for Language Learners. Brooke, J., Hammond, A., Jacob, D., Tsang, V., Hirst, G., & Shein, F. In Proceedings, 11th Workshop on Multiword Expressions, pages 96–104, Denver, Colorado, June 2015.

Though the multiword lexicon has long been of interest in computational linguistics, most relevant work is targeted at only a small portion of it. Our work is motivated by the needs of learners for more comprehensive resources reflecting formulaic language that goes beyond what is likely to be codified in a dictionary. Working from an initial sequential segmentation approach, we present two enhancements: the use of a new measure to promote the identification of lexicalized sequences, and an expansion to include sequences with gaps. We evaluate using a novel method that allows us to calculate an estimate of recall without a reference lexicon, showing that good performance in the second enhancement depends crucially on the first, and that our lexicon conforms much more with human judgment of formulaic language than alternatives.

@inproceedings{Brookeetal2015MWE,
   author = {Julian Brooke and Adam Hammond and David Jacob and Vivian
                  Tsang and Graeme Hirst and Fraser Shein},
   title = {Building a Lexicon of Formulaic Language for Language Learners},
   address = {Denver, Colorado},
   booktitle = {Proceedings, 11th Workshop on Multiword Expressions},
   pages = {96--104},
   year = {2015},
   month = {June},
   download = {http://ftp.cs.toronto.edu/pub/gh/Brooke-etal-2015-MWE.pdf},
   abstract = { Though the multiword lexicon has long been of interest
                  in computational linguistics, most relevant work is
                  targeted at only a small portion of it. Our work is
                  motivated by the needs of learners for more
                  comprehensive resources reflecting formulaic
                  language that goes beyond what is likely to be
                  codified in a dictionary. Working from an initial
                  sequential segmentation approach, we present two
                  enhancements: the use of a new measure to promote
                  the identification of lexicalized sequences, and an
                  expansion to include sequences with gaps. We
                  evaluate using a novel method that allows us to
                  calculate an estimate of recall without a reference
                  lexicon, showing that good performance in the second
                  enhancement depends crucially on the first, and that
                  our lexicon conforms much more with human judgment
                  of formulaic language than alternatives.}
}
