PaLM 2 Technical Report. Anil, R., Dai, A. M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P., Chen, Z., Chu, E., Clark, J. H., Shafey, L. E., Huang, Y., Meier-Hellstern, K., Mishra, G., Moreira, E., Omernick, M., Robinson, K., Ruder, S., Tay, Y., Xiao, K., Xu, Y., Zhang, Y., Abrego, G. H., Ahn, J., Austin, J., Barham, P., Botha, J., Bradbury, J., Brahma, S., Brooks, K., Catasta, M., Cheng, Y., Cherry, C., Choquette-Choo, C. A., Chowdhery, A., Crepy, C., Dave, S., Dehghani, M., Dev, S., Devlin, J., Díaz, M., Du, N., Dyer, E., Feinberg, V., Feng, F., Fienber, V., Freitag, M., Garcia, X., Gehrmann, S., Gonzalez, L., Gur-Ari, G., Hand, S., Hashemi, H., Hou, L., Howland, J., Hu, A., Hui, J., Hurwitz, J., Isard, M., Ittycheriah, A., Jagielski, M., Jia, W., Kenealy, K., Krikun, M., Kudugunta, S., Lan, C., Lee, K., Lee, B., Li, E., Li, M., Li, W., Li, Y., Li, J., Lim, H., Lin, H., Liu, Z., Liu, F., Maggioni, M., Mahendru, A., Maynez, J., Misra, V., Moussalem, M., Nado, Z., Nham, J., Ni, E., Nystrom, A., Parrish, A., Pellat, M., Polacek, M., Polozov, A., Pope, R., Qiao, S., Reif, E., Richter, B., Riley, P., Ros, A. C., Roy, A., Saeta, B., Samuel, R., Shelby, R., Slone, A., Smilkov, D., So, D. R., Sohn, D., Tokumine, S., Valter, D., Vasudevan, V., Vodrahalli, K., Wang, X., Wang, P., Wang, Z., Wang, T., Wieting, J., Wu, Y., Xu, K., Xu, Y., Xue, L., Yin, P., Yu, J., Zhang, Q., Zheng, S., Zheng, C., Zhou, W., Zhou, D., Petrov, S., & Wu, Y. September, 2023. arXiv:2305.10403 [cs]
Paper: http://arxiv.org/abs/2305.10403
Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.
@misc{anil_palm_2023,
  title = {{PaLM} 2 {Technical} {Report}},
  url = {http://arxiv.org/abs/2305.10403},
  abstract = {We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.},
  urldate = {2024-11-15},
  publisher = {arXiv},
  author = {Anil, Rohan and Dai, Andrew M. and Firat, Orhan and Johnson, Melvin and Lepikhin, Dmitry and Passos, Alexandre and Shakeri, Siamak and Taropa, Emanuel and Bailey, Paige and Chen, Zhifeng and Chu, Eric and Clark, Jonathan H. and Shafey, Laurent El and Huang, Yanping and Meier-Hellstern, Kathy and Mishra, Gaurav and Moreira, Erica and Omernick, Mark and Robinson, Kevin and Ruder, Sebastian and Tay, Yi and Xiao, Kefan and Xu, Yuanzhong and Zhang, Yujing and Abrego, Gustavo Hernandez and Ahn, Junwhan and Austin, Jacob and Barham, Paul and Botha, Jan and Bradbury, James and Brahma, Siddhartha and Brooks, Kevin and Catasta, Michele and Cheng, Yong and Cherry, Colin and Choquette-Choo, Christopher A. and Chowdhery, Aakanksha and Crepy, Clément and Dave, Shachi and Dehghani, Mostafa and Dev, Sunipa and Devlin, Jacob and Díaz, Mark and Du, Nan and Dyer, Ethan and Feinberg, Vlad and Feng, Fangxiaoyu and Fienber, Vlad and Freitag, Markus and Garcia, Xavier and Gehrmann, Sebastian and Gonzalez, Lucas and Gur-Ari, Guy and Hand, Steven and Hashemi, Hadi and Hou, Le and Howland, Joshua and Hu, Andrea and Hui, Jeffrey and Hurwitz, Jeremy and Isard, Michael and Ittycheriah, Abe and Jagielski, Matthew and Jia, Wenhao and Kenealy, Kathleen and Krikun, Maxim and Kudugunta, Sneha and Lan, Chang and Lee, Katherine and Lee, Benjamin and Li, Eric and Li, Music and Li, Wei and Li, YaGuang and Li, Jian and Lim, Hyeontaek and Lin, Hanzhao and Liu, Zhongtao and Liu, Frederick and Maggioni, Marcello and Mahendru, Aroma and Maynez, Joshua and Misra, Vedant and Moussalem, Maysam and Nado, Zachary and Nham, John and Ni, Eric and Nystrom, Andrew and Parrish, Alicia and Pellat, Marie and Polacek, Martin and Polozov, Alex and Pope, Reiner and Qiao, Siyuan and Reif, Emily and Richter, Bryan and Riley, Parker and Ros, Alex Castro and Roy, Aurko and Saeta, Brennan and Samuel, Rajkumar and Shelby, Renee and Slone, Ambrose and Smilkov, Daniel and So, David R. and Sohn, Daniel and Tokumine, Simon and Valter, Dasha and Vasudevan, Vijay and Vodrahalli, Kiran and Wang, Xuezhi and Wang, Pidong and Wang, Zirui and Wang, Tao and Wieting, John and Wu, Yuhuai and Xu, Kelvin and Xu, Yunhan and Xue, Linting and Yin, Pengcheng and Yu, Jiahui and Zhang, Qiao and Zheng, Steven and Zheng, Ce and Zhou, Weikang and Zhou, Denny and Petrov, Slav and Wu, Yonghui},
  month = sep,
  year = {2023},
  note = {arXiv:2305.10403 [cs]},
  keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},
}
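To cite this report, save the entry above to a .bib file and reference it by its key, anil_palm_2023. A minimal LaTeX sketch follows; the file names main.tex and references.bib are placeholders, and note that the standard plain bibliography style ignores nonstandard fields such as url and urldate.

% main.tex -- minimal usage sketch; assumes the @misc entry above is saved as references.bib
\documentclass{article}
\begin{document}
PaLM 2 \cite{anil_palm_2023} is a Transformer-based model trained using a mixture of objectives.
\bibliographystyle{plain}  % plain.bst renders the note field but drops nonstandard fields like url
\bibliography{references}  % build with: pdflatex main; bibtex main; pdflatex main (twice)
\end{document}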
{"_id":"zKBjq8xpG4YpMrD2P","bibbaseid":"anil-dai-firat-johnson-lepikhin-passos-shakeri-taropa-etal-palm2technicalreport-2023","author_short":["Anil, R.","Dai, A. M.","Firat, O.","Johnson, M.","Lepikhin, D.","Passos, A.","Shakeri, S.","Taropa, E.","Bailey, P.","Chen, Z.","Chu, E.","Clark, J. H.","Shafey, L. E.","Huang, Y.","Meier-Hellstern, K.","Mishra, G.","Moreira, E.","Omernick, M.","Robinson, K.","Ruder, S.","Tay, Y.","Xiao, K.","Xu, Y.","Zhang, Y.","Abrego, G. H.","Ahn, J.","Austin, J.","Barham, P.","Botha, J.","Bradbury, J.","Brahma, S.","Brooks, K.","Catasta, M.","Cheng, Y.","Cherry, C.","Choquette-Choo, C. A.","Chowdhery, A.","Crepy, C.","Dave, S.","Dehghani, M.","Dev, S.","Devlin, J.","Díaz, M.","Du, N.","Dyer, E.","Feinberg, V.","Feng, F.","Fienber, V.","Freitag, M.","Garcia, X.","Gehrmann, S.","Gonzalez, L.","Gur-Ari, G.","Hand, S.","Hashemi, H.","Hou, L.","Howland, J.","Hu, A.","Hui, J.","Hurwitz, J.","Isard, M.","Ittycheriah, A.","Jagielski, M.","Jia, W.","Kenealy, K.","Krikun, M.","Kudugunta, S.","Lan, C.","Lee, K.","Lee, B.","Li, E.","Li, M.","Li, W.","Li, Y.","Li, J.","Lim, H.","Lin, H.","Liu, Z.","Liu, F.","Maggioni, M.","Mahendru, A.","Maynez, J.","Misra, V.","Moussalem, M.","Nado, Z.","Nham, J.","Ni, E.","Nystrom, A.","Parrish, A.","Pellat, M.","Polacek, M.","Polozov, A.","Pope, R.","Qiao, S.","Reif, E.","Richter, B.","Riley, P.","Ros, A. C.","Roy, A.","Saeta, B.","Samuel, R.","Shelby, R.","Slone, A.","Smilkov, D.","So, D. R.","Sohn, D.","Tokumine, S.","Valter, D.","Vasudevan, V.","Vodrahalli, K.","Wang, X.","Wang, P.","Wang, Z.","Wang, T.","Wieting, J.","Wu, Y.","Xu, K.","Xu, Y.","Xue, L.","Yin, P.","Yu, J.","Zhang, Q.","Zheng, S.","Zheng, C.","Zhou, W.","Zhou, D.","Petrov, S.","Wu, Y."],"bibdata":{"bibtype":"misc","type":"misc","title":"PaLM 2 Technical Report","url":"http://arxiv.org/abs/2305.10403","abstract":"We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. 
Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.","urldate":"2024-11-15","publisher":"arXiv","author":[{"propositions":[],"lastnames":["Anil"],"firstnames":["Rohan"],"suffixes":[]},{"propositions":[],"lastnames":["Dai"],"firstnames":["Andrew","M."],"suffixes":[]},{"propositions":[],"lastnames":["Firat"],"firstnames":["Orhan"],"suffixes":[]},{"propositions":[],"lastnames":["Johnson"],"firstnames":["Melvin"],"suffixes":[]},{"propositions":[],"lastnames":["Lepikhin"],"firstnames":["Dmitry"],"suffixes":[]},{"propositions":[],"lastnames":["Passos"],"firstnames":["Alexandre"],"suffixes":[]},{"propositions":[],"lastnames":["Shakeri"],"firstnames":["Siamak"],"suffixes":[]},{"propositions":[],"lastnames":["Taropa"],"firstnames":["Emanuel"],"suffixes":[]},{"propositions":[],"lastnames":["Bailey"],"firstnames":["Paige"],"suffixes":[]},{"propositions":[],"lastnames":["Chen"],"firstnames":["Zhifeng"],"suffixes":[]},{"propositions":[],"lastnames":["Chu"],"firstnames":["Eric"],"suffixes":[]},{"propositions":[],"lastnames":["Clark"],"firstnames":["Jonathan","H."],"suffixes":[]},{"propositions":[],"lastnames":["Shafey"],"firstnames":["Laurent","El"],"suffixes":[]},{"propositions":[],"lastnames":["Huang"],"firstnames":["Yanping"],"suffixes":[]},{"propositions":[],"lastnames":["Meier-Hellstern"],"firstnames":["Kathy"],"suffixes":[]},{"propositions":[],"lastnames":["Mishra"],"firstnames":["Gaurav"],"suffixes":[]},{"propositions":[],"lastnames":["Moreira"],"firstnames":["Erica"],"suffixes":[]},{"propositions":[],"lastnames":["Omernick"],"firstnames":["Mark"],"suffixes":[]},{"propositions":[],"lastnames":["Robinson"],"firstnames":["Kevin"],"suffixes":[]},{"propositions":[],"lastnames":["Ruder"],"firstnames":["Sebastian"],"suffixes":[]},{"propositions":[],"lastnames":["Tay"],"firstnames":["Yi"],"suffixes":[]},{"propositions":[],"lastnames":["Xiao"],"firstnames":["Kefan"],"suffixes":[]},{"propositions":[],"lastnames":["Xu"],"firstnames":["Yuanzhong"],"suffixes":[]},{"propositions":[],"lastnames":["Zhang"],"firstnames":["Yujing"],"suffixes":[]},{"propositions":[],"lastnames":["Abrego"],"firstnames":["Gustavo","Hernandez"],"suffixes":[]},{"propositions":[],"lastnames":["Ahn"],"firstnames":["Junwhan"],"suffixes":[]},{"propositions":[],"lastnames":["Austin"],"firstnames":["Jacob"],"suffixes":[]},{"propositions":[],"lastnames":["Barham"],"firstnames":["Paul"],"suffixes":[]},{"propositions":[],"lastnames":["Botha"],"firstnames":["Jan"],"suffixes":[]},{"propositions":[],"lastnames":["Bradbury"],"firstnames":["James"],"suffixes":[]},{"propositions":[],"lastnames":["Brahma"],"firstnames":["Siddhartha"],"suffixes":[]},{"propositions":[],"lastnames":["Brooks"],"firstnames":["Kevin"],"suffixes":[]},{"propositions":[],"lastnames":["Catasta"],"firstnames":["Michele"],"suffixes":[]},{"propositions":[],"lastnames":["Cheng"],"firstnames":["Yong"],"suffixes":[]},{"propositions":[],"lastnames":["Cherry"],"firstnames":["Colin"],"suffixes":[]},{"propositions":[],"lastnames":["Choquette-Choo"],"firstnames":["Christopher","A."],"suffixes":[]},{"propositions":[],"lastnames":["Chowdhery"],"firstnames":["Aakanksha"],"suffixes":[]},{"propositions":[],"lastnames":["Crepy"],"firstnames":["Clément"],"suffixes":[]},{"propositions":[],"lastnames":["Dave"],"firstnames":["Shachi"],"suffixes":[]},{"propositions":[],"lastnames":["Dehghani"],"firstnames":["Mostafa"],"suffixes":[]},{"propositions":[],"lastnames":["Dev"],"firstnames":["Sunipa"],"suffixes":[]},{"propositions":
[],"lastnames":["Devlin"],"firstnames":["Jacob"],"suffixes":[]},{"propositions":[],"lastnames":["Díaz"],"firstnames":["Mark"],"suffixes":[]},{"propositions":[],"lastnames":["Du"],"firstnames":["Nan"],"suffixes":[]},{"propositions":[],"lastnames":["Dyer"],"firstnames":["Ethan"],"suffixes":[]},{"propositions":[],"lastnames":["Feinberg"],"firstnames":["Vlad"],"suffixes":[]},{"propositions":[],"lastnames":["Feng"],"firstnames":["Fangxiaoyu"],"suffixes":[]},{"propositions":[],"lastnames":["Fienber"],"firstnames":["Vlad"],"suffixes":[]},{"propositions":[],"lastnames":["Freitag"],"firstnames":["Markus"],"suffixes":[]},{"propositions":[],"lastnames":["Garcia"],"firstnames":["Xavier"],"suffixes":[]},{"propositions":[],"lastnames":["Gehrmann"],"firstnames":["Sebastian"],"suffixes":[]},{"propositions":[],"lastnames":["Gonzalez"],"firstnames":["Lucas"],"suffixes":[]},{"propositions":[],"lastnames":["Gur-Ari"],"firstnames":["Guy"],"suffixes":[]},{"propositions":[],"lastnames":["Hand"],"firstnames":["Steven"],"suffixes":[]},{"propositions":[],"lastnames":["Hashemi"],"firstnames":["Hadi"],"suffixes":[]},{"propositions":[],"lastnames":["Hou"],"firstnames":["Le"],"suffixes":[]},{"propositions":[],"lastnames":["Howland"],"firstnames":["Joshua"],"suffixes":[]},{"propositions":[],"lastnames":["Hu"],"firstnames":["Andrea"],"suffixes":[]},{"propositions":[],"lastnames":["Hui"],"firstnames":["Jeffrey"],"suffixes":[]},{"propositions":[],"lastnames":["Hurwitz"],"firstnames":["Jeremy"],"suffixes":[]},{"propositions":[],"lastnames":["Isard"],"firstnames":["Michael"],"suffixes":[]},{"propositions":[],"lastnames":["Ittycheriah"],"firstnames":["Abe"],"suffixes":[]},{"propositions":[],"lastnames":["Jagielski"],"firstnames":["Matthew"],"suffixes":[]},{"propositions":[],"lastnames":["Jia"],"firstnames":["Wenhao"],"suffixes":[]},{"propositions":[],"lastnames":["Kenealy"],"firstnames":["Kathleen"],"suffixes":[]},{"propositions":[],"lastnames":["Krikun"],"firstnames":["Maxim"],"suffixes":[]},{"propositions":[],"lastnames":["Kudugunta"],"firstnames":["Sneha"],"suffixes":[]},{"propositions":[],"lastnames":["Lan"],"firstnames":["Chang"],"suffixes":[]},{"propositions":[],"lastnames":["Lee"],"firstnames":["Katherine"],"suffixes":[]},{"propositions":[],"lastnames":["Lee"],"firstnames":["Benjamin"],"suffixes":[]},{"propositions":[],"lastnames":["Li"],"firstnames":["Eric"],"suffixes":[]},{"propositions":[],"lastnames":["Li"],"firstnames":["Music"],"suffixes":[]},{"propositions":[],"lastnames":["Li"],"firstnames":["Wei"],"suffixes":[]},{"propositions":[],"lastnames":["Li"],"firstnames":["YaGuang"],"suffixes":[]},{"propositions":[],"lastnames":["Li"],"firstnames":["Jian"],"suffixes":[]},{"propositions":[],"lastnames":["Lim"],"firstnames":["Hyeontaek"],"suffixes":[]},{"propositions":[],"lastnames":["Lin"],"firstnames":["Hanzhao"],"suffixes":[]},{"propositions":[],"lastnames":["Liu"],"firstnames":["Zhongtao"],"suffixes":[]},{"propositions":[],"lastnames":["Liu"],"firstnames":["Frederick"],"suffixes":[]},{"propositions":[],"lastnames":["Maggioni"],"firstnames":["Marcello"],"suffixes":[]},{"propositions":[],"lastnames":["Mahendru"],"firstnames":["Aroma"],"suffixes":[]},{"propositions":[],"lastnames":["Maynez"],"firstnames":["Joshua"],"suffixes":[]},{"propositions":[],"lastnames":["Misra"],"firstnames":["Vedant"],"suffixes":[]},{"propositions":[],"lastnames":["Moussalem"],"firstnames":["Maysam"],"suffixes":[]},{"propositions":[],"lastnames":["Nado"],"firstnames":["Zachary"],"suffixes":[]},{"propositions":[],"lastnames":["Nham"],"firstnames"
:["John"],"suffixes":[]},{"propositions":[],"lastnames":["Ni"],"firstnames":["Eric"],"suffixes":[]},{"propositions":[],"lastnames":["Nystrom"],"firstnames":["Andrew"],"suffixes":[]},{"propositions":[],"lastnames":["Parrish"],"firstnames":["Alicia"],"suffixes":[]},{"propositions":[],"lastnames":["Pellat"],"firstnames":["Marie"],"suffixes":[]},{"propositions":[],"lastnames":["Polacek"],"firstnames":["Martin"],"suffixes":[]},{"propositions":[],"lastnames":["Polozov"],"firstnames":["Alex"],"suffixes":[]},{"propositions":[],"lastnames":["Pope"],"firstnames":["Reiner"],"suffixes":[]},{"propositions":[],"lastnames":["Qiao"],"firstnames":["Siyuan"],"suffixes":[]},{"propositions":[],"lastnames":["Reif"],"firstnames":["Emily"],"suffixes":[]},{"propositions":[],"lastnames":["Richter"],"firstnames":["Bryan"],"suffixes":[]},{"propositions":[],"lastnames":["Riley"],"firstnames":["Parker"],"suffixes":[]},{"propositions":[],"lastnames":["Ros"],"firstnames":["Alex","Castro"],"suffixes":[]},{"propositions":[],"lastnames":["Roy"],"firstnames":["Aurko"],"suffixes":[]},{"propositions":[],"lastnames":["Saeta"],"firstnames":["Brennan"],"suffixes":[]},{"propositions":[],"lastnames":["Samuel"],"firstnames":["Rajkumar"],"suffixes":[]},{"propositions":[],"lastnames":["Shelby"],"firstnames":["Renee"],"suffixes":[]},{"propositions":[],"lastnames":["Slone"],"firstnames":["Ambrose"],"suffixes":[]},{"propositions":[],"lastnames":["Smilkov"],"firstnames":["Daniel"],"suffixes":[]},{"propositions":[],"lastnames":["So"],"firstnames":["David","R."],"suffixes":[]},{"propositions":[],"lastnames":["Sohn"],"firstnames":["Daniel"],"suffixes":[]},{"propositions":[],"lastnames":["Tokumine"],"firstnames":["Simon"],"suffixes":[]},{"propositions":[],"lastnames":["Valter"],"firstnames":["Dasha"],"suffixes":[]},{"propositions":[],"lastnames":["Vasudevan"],"firstnames":["Vijay"],"suffixes":[]},{"propositions":[],"lastnames":["Vodrahalli"],"firstnames":["Kiran"],"suffixes":[]},{"propositions":[],"lastnames":["Wang"],"firstnames":["Xuezhi"],"suffixes":[]},{"propositions":[],"lastnames":["Wang"],"firstnames":["Pidong"],"suffixes":[]},{"propositions":[],"lastnames":["Wang"],"firstnames":["Zirui"],"suffixes":[]},{"propositions":[],"lastnames":["Wang"],"firstnames":["Tao"],"suffixes":[]},{"propositions":[],"lastnames":["Wieting"],"firstnames":["John"],"suffixes":[]},{"propositions":[],"lastnames":["Wu"],"firstnames":["Yuhuai"],"suffixes":[]},{"propositions":[],"lastnames":["Xu"],"firstnames":["Kelvin"],"suffixes":[]},{"propositions":[],"lastnames":["Xu"],"firstnames":["Yunhan"],"suffixes":[]},{"propositions":[],"lastnames":["Xue"],"firstnames":["Linting"],"suffixes":[]},{"propositions":[],"lastnames":["Yin"],"firstnames":["Pengcheng"],"suffixes":[]},{"propositions":[],"lastnames":["Yu"],"firstnames":["Jiahui"],"suffixes":[]},{"propositions":[],"lastnames":["Zhang"],"firstnames":["Qiao"],"suffixes":[]},{"propositions":[],"lastnames":["Zheng"],"firstnames":["Steven"],"suffixes":[]},{"propositions":[],"lastnames":["Zheng"],"firstnames":["Ce"],"suffixes":[]},{"propositions":[],"lastnames":["Zhou"],"firstnames":["Weikang"],"suffixes":[]},{"propositions":[],"lastnames":["Zhou"],"firstnames":["Denny"],"suffixes":[]},{"propositions":[],"lastnames":["Petrov"],"firstnames":["Slav"],"suffixes":[]},{"propositions":[],"lastnames":["Wu"],"firstnames":["Yonghui"],"suffixes":[]}],"month":"September","year":"2023","note":"arXiv:2305.10403 [cs]","keywords":"Computer Science - Artificial Intelligence, Computer Science - Computation and 
Language","bibtex":"@misc{anil_palm_2023,\n\ttitle = {{PaLM} 2 {Technical} {Report}},\n\turl = {http://arxiv.org/abs/2305.10403},\n\tabstract = {We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.},\n\turldate = {2024-11-15},\n\tpublisher = {arXiv},\n\tauthor = {Anil, Rohan and Dai, Andrew M. and Firat, Orhan and Johnson, Melvin and Lepikhin, Dmitry and Passos, Alexandre and Shakeri, Siamak and Taropa, Emanuel and Bailey, Paige and Chen, Zhifeng and Chu, Eric and Clark, Jonathan H. and Shafey, Laurent El and Huang, Yanping and Meier-Hellstern, Kathy and Mishra, Gaurav and Moreira, Erica and Omernick, Mark and Robinson, Kevin and Ruder, Sebastian and Tay, Yi and Xiao, Kefan and Xu, Yuanzhong and Zhang, Yujing and Abrego, Gustavo Hernandez and Ahn, Junwhan and Austin, Jacob and Barham, Paul and Botha, Jan and Bradbury, James and Brahma, Siddhartha and Brooks, Kevin and Catasta, Michele and Cheng, Yong and Cherry, Colin and Choquette-Choo, Christopher A. 
and Chowdhery, Aakanksha and Crepy, Clément and Dave, Shachi and Dehghani, Mostafa and Dev, Sunipa and Devlin, Jacob and Díaz, Mark and Du, Nan and Dyer, Ethan and Feinberg, Vlad and Feng, Fangxiaoyu and Fienber, Vlad and Freitag, Markus and Garcia, Xavier and Gehrmann, Sebastian and Gonzalez, Lucas and Gur-Ari, Guy and Hand, Steven and Hashemi, Hadi and Hou, Le and Howland, Joshua and Hu, Andrea and Hui, Jeffrey and Hurwitz, Jeremy and Isard, Michael and Ittycheriah, Abe and Jagielski, Matthew and Jia, Wenhao and Kenealy, Kathleen and Krikun, Maxim and Kudugunta, Sneha and Lan, Chang and Lee, Katherine and Lee, Benjamin and Li, Eric and Li, Music and Li, Wei and Li, YaGuang and Li, Jian and Lim, Hyeontaek and Lin, Hanzhao and Liu, Zhongtao and Liu, Frederick and Maggioni, Marcello and Mahendru, Aroma and Maynez, Joshua and Misra, Vedant and Moussalem, Maysam and Nado, Zachary and Nham, John and Ni, Eric and Nystrom, Andrew and Parrish, Alicia and Pellat, Marie and Polacek, Martin and Polozov, Alex and Pope, Reiner and Qiao, Siyuan and Reif, Emily and Richter, Bryan and Riley, Parker and Ros, Alex Castro and Roy, Aurko and Saeta, Brennan and Samuel, Rajkumar and Shelby, Renee and Slone, Ambrose and Smilkov, Daniel and So, David R. and Sohn, Daniel and Tokumine, Simon and Valter, Dasha and Vasudevan, Vijay and Vodrahalli, Kiran and Wang, Xuezhi and Wang, Pidong and Wang, Zirui and Wang, Tao and Wieting, John and Wu, Yuhuai and Xu, Kelvin and Xu, Yunhan and Xue, Linting and Yin, Pengcheng and Yu, Jiahui and Zhang, Qiao and Zheng, Steven and Zheng, Ce and Zhou, Weikang and Zhou, Denny and Petrov, Slav and Wu, Yonghui},\n\tmonth = sep,\n\tyear = {2023},\n\tnote = {arXiv:2305.10403 [cs]},\n\tkeywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},\n}\n\n\n\n\n\n\n\n\n\n\n\n","author_short":["Anil, R.","Dai, A. M.","Firat, O.","Johnson, M.","Lepikhin, D.","Passos, A.","Shakeri, S.","Taropa, E.","Bailey, P.","Chen, Z.","Chu, E.","Clark, J. H.","Shafey, L. E.","Huang, Y.","Meier-Hellstern, K.","Mishra, G.","Moreira, E.","Omernick, M.","Robinson, K.","Ruder, S.","Tay, Y.","Xiao, K.","Xu, Y.","Zhang, Y.","Abrego, G. H.","Ahn, J.","Austin, J.","Barham, P.","Botha, J.","Bradbury, J.","Brahma, S.","Brooks, K.","Catasta, M.","Cheng, Y.","Cherry, C.","Choquette-Choo, C. A.","Chowdhery, A.","Crepy, C.","Dave, S.","Dehghani, M.","Dev, S.","Devlin, J.","Díaz, M.","Du, N.","Dyer, E.","Feinberg, V.","Feng, F.","Fienber, V.","Freitag, M.","Garcia, X.","Gehrmann, S.","Gonzalez, L.","Gur-Ari, G.","Hand, S.","Hashemi, H.","Hou, L.","Howland, J.","Hu, A.","Hui, J.","Hurwitz, J.","Isard, M.","Ittycheriah, A.","Jagielski, M.","Jia, W.","Kenealy, K.","Krikun, M.","Kudugunta, S.","Lan, C.","Lee, K.","Lee, B.","Li, E.","Li, M.","Li, W.","Li, Y.","Li, J.","Lim, H.","Lin, H.","Liu, Z.","Liu, F.","Maggioni, M.","Mahendru, A.","Maynez, J.","Misra, V.","Moussalem, M.","Nado, Z.","Nham, J.","Ni, E.","Nystrom, A.","Parrish, A.","Pellat, M.","Polacek, M.","Polozov, A.","Pope, R.","Qiao, S.","Reif, E.","Richter, B.","Riley, P.","Ros, A. C.","Roy, A.","Saeta, B.","Samuel, R.","Shelby, R.","Slone, A.","Smilkov, D.","So, D. 
R.","Sohn, D.","Tokumine, S.","Valter, D.","Vasudevan, V.","Vodrahalli, K.","Wang, X.","Wang, P.","Wang, Z.","Wang, T.","Wieting, J.","Wu, Y.","Xu, K.","Xu, Y.","Xue, L.","Yin, P.","Yu, J.","Zhang, Q.","Zheng, S.","Zheng, C.","Zhou, W.","Zhou, D.","Petrov, S.","Wu, Y."],"key":"anil_palm_2023","id":"anil_palm_2023","bibbaseid":"anil-dai-firat-johnson-lepikhin-passos-shakeri-taropa-etal-palm2technicalreport-2023","role":"author","urls":{"Paper":"http://arxiv.org/abs/2305.10403"},"keyword":["Computer Science - Artificial Intelligence","Computer Science - Computation and Language"],"metadata":{"authorlinks":{}},"html":""},"bibtype":"misc","biburl":"https://bibbase.org/zotero/andreasmartin","dataSources":["YtBDXPDiQEyhyEDZC","tpWeaaCgFjPTYCjg3","jurZeGzSpYdkQ8rm4"],"keywords":["computer science - artificial intelligence","computer science - computation and language"],"search_terms":["palm","technical","report","anil","dai","firat","johnson","lepikhin","passos","shakeri","taropa","bailey","chen","chu","clark","shafey","huang","meier-hellstern","mishra","moreira","omernick","robinson","ruder","tay","xiao","xu","zhang","abrego","ahn","austin","barham","botha","bradbury","brahma","brooks","catasta","cheng","cherry","choquette-choo","chowdhery","crepy","dave","dehghani","dev","devlin","díaz","du","dyer","feinberg","feng","fienber","freitag","garcia","gehrmann","gonzalez","gur-ari","hand","hashemi","hou","howland","hu","hui","hurwitz","isard","ittycheriah","jagielski","jia","kenealy","krikun","kudugunta","lan","lee","lee","li","li","li","li","li","lim","lin","liu","liu","maggioni","mahendru","maynez","misra","moussalem","nado","nham","ni","nystrom","parrish","pellat","polacek","polozov","pope","qiao","reif","richter","riley","ros","roy","saeta","samuel","shelby","slone","smilkov","so","sohn","tokumine","valter","vasudevan","vodrahalli","wang","wang","wang","wang","wieting","wu","xu","xu","xue","yin","yu","zhang","zheng","zheng","zhou","zhou","petrov","wu"],"title":"PaLM 2 Technical Report","year":2023}