Polyglot: An Extensible Framework to Benchmark Code Translation with LLMs. Vieira, M., Shah, P. A., Shah, B., & Krasniqi, R. In 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 2363–2375, 2025. ISSN: 2643-1572
Large Language Models (LLMs) show great potential for automating code-related tasks. However, sound assessments are necessary to understand their true capabilities, particularly in code translation, where reliability is crucial. We introduce Polyglot, an automated, multi-language framework for evaluating the translation quality of LLMs between different programming languages. Leveraging the IBM CodeNet Project, an extensive collection of coding problems in multiple languages, we assess translation quality using syntactic correctness, execution reliability, semantic preservation, and static code metrics. Our evaluation focuses on translating C to Java, Python, and Rust, languages that follow distinct paradigms and represent alternatives to modernize C-based systems. We evaluate open-source LLMs using three prompting strategies to understand the impact on translation performance. Our findings highlight that while LLMs show promising results for simple code translation, their limitations regarding complex logic and distinct language paradigms require further analysis.
@inproceedings{vieira_polyglot_2025,
	title = {Polyglot: {An} {Extensible} {Framework} to {Benchmark} {Code} {Translation} with {LLMs}},
	shorttitle = {Polyglot},
	url = {https://ieeexplore.ieee.org/document/11334550},
	doi = {10.1109/ASE63991.2025.00195},
	abstract = {Large Language Models (LLMs) show great potential for automating code-related tasks. However, sound assessments are necessary to understand their true capabilities, particularly in code translation, where reliability is crucial. We introduce Polyglot, an automated, multi-language framework for evaluating the translation quality of LLMs between different programming languages. Leveraging the IBM CodeNet Project, an extensive collection of coding problems in multiple languages, we assess translation quality using syntactic correctness, execution reliability, semantic preservation, and static code metrics. Our evaluation focuses on translating C to Java, Python, and Rust, languages that follow distinct paradigms and represent alternatives to modernize C-based systems. We evaluate open-source LLMs using three prompting strategies to understand the impact on translation performance. Our findings highlight that while LLMs show promising results for simple code translation, their limitations regarding complex logic and distinct language paradigms require further analysis.},
	urldate = {2026-02-02},
	booktitle = {2025 40th {IEEE}/{ACM} {International} {Conference} on {Automated} {Software} {Engineering} ({ASE})},
	author = {Vieira, Marco and Shah, Priyam Ashish and Shah, Bhavain and Krasniqi, Rrezarta},
	year = {2025},
	note = {ISSN: 2643-1572},
	keywords = {Conference Full Papers},
	pages = {2363--2375},
}