ChatGPT may Pass the Bar Exam soon, but has a Long Way to Go for the LexGLUE benchmark. Chalkidis, I. March, 2023. arXiv:2304.12202 [cs]
Following the hype around OpenAI's ChatGPT conversational agent, the last straw in the recent development of Large Language Models (LLMs) that demonstrate emergent unprecedented zero-shot capabilities, we audit the latest OpenAI's GPT-3.5 model, `gpt-3.5-turbo', the first available ChatGPT model, in the LexGLUE benchmark in a zero-shot fashion providing examples in a templated instruction-following format. The results indicate that ChatGPT achieves an average micro-F1 score of 47.6% across LexGLUE tasks, surpassing the baseline guessing rates. Notably, the model performs exceptionally well in some datasets, achieving micro-F1 scores of 62.8% and 70.2% in the ECtHR B and LEDGAR datasets, respectively. The code base and model predictions are available for review on https://github.com/coastalcph/zeroshot_lexglue.
@misc{chalkidisChatGPTMayPass2023,
title = {{ChatGPT} may {Pass} the {Bar} {Exam} soon, but has a {Long} {Way} to {Go} for the {LexGLUE} benchmark},
url = {http://arxiv.org/abs/2304.12202},
doi = {10.48550/arXiv.2304.12202},
abstract = {Following the hype around OpenAI's ChatGPT conversational agent, the last straw in the recent development of Large Language Models (LLMs) that demonstrate emergent unprecedented zero-shot capabilities, we audit the latest OpenAI's GPT-3.5 model, `gpt-3.5-turbo', the first available ChatGPT model, in the LexGLUE benchmark in a zero-shot fashion providing examples in a templated instruction-following format. The results indicate that ChatGPT achieves an average micro-F1 score of 47.6\% across LexGLUE tasks, surpassing the baseline guessing rates. Notably, the model performs exceptionally well in some datasets, achieving micro-F1 scores of 62.8\% and 70.2\% in the ECtHR B and LEDGAR datasets, respectively. The code base and model predictions are available for review on https://github.com/coastalcph/zeroshot\_lexglue.},
urldate = {2023-06-12},
publisher = {arXiv},
author = {Chalkidis, Ilias},
month = mar,
year = {2023},
note = {arXiv:2304.12202 [cs]},
keywords = {Computer Science - Computation and Language},
annote = {Comment: Working paper},
}