Reward Learning from Human Demonstration Improves SFT for LLM Alignment

Reward Learning from Human Demonstration Improves SFT for LLM Alignment. Li, J., Zeng, S., Wai, H., Li, C., Garcia, A., & Hong, M. In NeurIPS, 2024.

@inproceedings{li2024getting,
	author = {Li, Jiaxiang and Zeng, Siliang and Wai, Hoi-To and Li, Chenliang and Garcia, Alfredo and Hong, Mingyi},
	date-added = {2024-06-01 00:29:40 +0800},
	date-modified = {2024-06-01 00:29:49 +0800},
	booktitle = {NeurIPS},
	year = {2024},
	title = {Reward Learning from Human Demonstration Improves SFT for LLM Alignment},
	url_paper = {https://arxiv.org/abs/2405.17888}}