Reward Learning from Human Demonstration Improves SFT for LLM Alignment. Li, J., Zeng, S., Wai, H., Li, C., Garcia, A., & Hong, M. In NeurIPS, 2024.
@inproceedings{li2024getting,
	author = {Li, Jiaxiang and Zeng, Siliang and Wai, Hoi-To and Li, Chenliang and Garcia, Alfredo and Hong, Mingyi},
	date-added = {2024-06-01 00:29:40 +0800},
	date-modified = {2024-06-01 00:29:49 +0800},
	booktitle = {NeurIPS},
	year = {2024},
	title = {Reward Learning from Human Demonstration Improves SFT for LLM Alignment},
	url_paper = {https://arxiv.org/abs/2405.17888}}