From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence

From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence. Yang, J., Liu, X., Lv, W., Deng, K., Guo, S., Jing, L., Li, Y., Liu, S., Luo, X., Luo, Y., Pan, C., Shi, E., Tan, Y., Tao, R., Wu, J., Wu, X., Wu, Z., Zan, D., Zhang, C., Zhang, W., Zhu, H., Zhuo, T. Y., Cao, K., Cheng, X., Dong, J., Fang, S., Fei, Z., Guan, X., Guo, Q., Han, Z., James, J., Luo, T., Li, R., Li, Y., Liang, Y., Liu, C., Liu, J., Liu, Q., Liu, R., Loakman, T., Meng, X., Peng, C., Peng, T., Shi, J., Tang, M., Wang, B., Wang, H., Wang, Y., Xu, F., Xu, Z., Yuan, F., Zhang, G., Zhang, J., Zhang, X., Zhou, W., Zhu, H., Zhu, K., Dai, B., Liu, A., Li, Z., Lin, C., Liu, T., Peng, C., Shen, K., Qin, L., Song, S., Zhan, Z., Zhang, J., Zhang, J., Zhang, Z., & Zheng, B. December, 2025. arXiv:2511.18538 [cs]


Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving commercial adoption through tools like GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic). The field has evolved dramatically from rule-based systems to Transformer-based architectures, with success rates on benchmarks like HumanEval improving from single digits to over 95%. In this work, we provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs, systematically examining the complete model life cycle from data curation to post-training through advanced prompting paradigms, code pre-training, supervised fine-tuning, reinforcement learning, and autonomous coding agents. We analyze the code capabilities of general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder), critically examining the techniques, design decisions, and trade-offs. Further, we articulate the research-practice gap between academic research (e.g., benchmarks and tasks) and real-world deployment (e.g., software-related code tasks), including code correctness, security, contextual awareness of large codebases, and integration with development workflows, and map promising research directions to practical needs. Last, we conduct a series of experiments to provide a comprehensive analysis of code pre-training, supervised fine-tuning, and reinforcement learning, covering scaling laws, framework selection, hyperparameter sensitivity, model architectures, and dataset comparisons.

@misc{yang_code_2025,
	title = {From {Code} {Foundation} {Models} to {Agents} and {Applications}: {A} {Comprehensive} {Survey} and {Practical} {Guide} to {Code} {Intelligence}},
	shorttitle = {From {Code} {Foundation} {Models} to {Agents} and {Applications}},
	url = {http://arxiv.org/abs/2511.18538},
	doi = {10.48550/arXiv.2511.18538},
	abstract = {Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving commercial adoption through tools like GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic). The field has evolved dramatically from rule-based systems to Transformer-based architectures, with success rates on benchmarks like HumanEval improving from single digits to over 95\%. In this work, we provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs, systematically examining the complete model life cycle from data curation to post-training through advanced prompting paradigms, code pre-training, supervised fine-tuning, reinforcement learning, and autonomous coding agents. We analyze the code capabilities of general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder), critically examining the techniques, design decisions, and trade-offs. Further, we articulate the research-practice gap between academic research (e.g., benchmarks and tasks) and real-world deployment (e.g., software-related code tasks), including code correctness, security, contextual awareness of large codebases, and integration with development workflows, and map promising research directions to practical needs. Last, we conduct a series of experiments to provide a comprehensive analysis of code pre-training, supervised fine-tuning, and reinforcement learning, covering scaling laws, framework selection, hyperparameter sensitivity, model architectures, and dataset comparisons.},
	urldate = {2025-12-26},
	publisher = {arXiv},
	author = {Yang, Jian and Liu, Xianglong and Lv, Weifeng and Deng, Ken and Guo, Shawn and Jing, Lin and Li, Yizhi and Liu, Shark and Luo, Xianzhen and Luo, Yuyu and Pan, Changzai and Shi, Ensheng and Tan, Yingshui and Tao, Renshuai and Wu, Jiajun and Wu, Xianjie and Wu, Zhenhe and Zan, Daoguang and Zhang, Chenchen and Zhang, Wei and Zhu, He and Zhuo, Terry Yue and Cao, Kerui and Cheng, Xianfu and Dong, Jun and Fang, Shengjie and Fei, Zhiwei and Guan, Xiangyuan and Guo, Qipeng and Han, Zhiguang and James, Joseph and Luo, Tianqi and Li, Renyuan and Li, Yuhang and Liang, Yiming and Liu, Congnan and Liu, Jiaheng and Liu, Qian and Liu, Ruitong and Loakman, Tyler and Meng, Xiangxin and Peng, Chuang and Peng, Tianhao and Shi, Jiajun and Tang, Mingjie and Wang, Boyang and Wang, Haowen and Wang, Yunli and Xu, Fanglin and Xu, Zihan and Yuan, Fei and Zhang, Ge and Zhang, Jiayi and Zhang, Xinhao and Zhou, Wangchunshu and Zhu, Hualei and Zhu, King and Dai, Bryan and Liu, Aishan and Li, Zhoujun and Lin, Chenghua and Liu, Tianyu and Peng, Chao and Shen, Kai and Qin, Libo and Song, Shuangyong and Zhan, Zizheng and Zhang, Jiajun and Zhang, Jie and Zhang, Zhaoxiang and Zheng, Bo},
	month = dec,
	year = {2025},
	note = {arXiv:2511.18538 [cs]},
	keywords = {Computer Science - Computation and Language, Computer Science - Software Engineering},
}
