MALLM: Multi-Agent Large Language Models Framework. Becker, J., Kaesberg, L. B., Bauer, N., Wahle, J. P., Ruas, T., & Gipp, B. In Habernal, I., Schulam, P., & Tiedemann, J., editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 418–439, Suzhou, China, November, 2025. Association for Computational Linguistics.
Abstract: Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for MAD are often designed towards tool use, lack integrated evaluation, or provide limited configurability of agent personas, response generators, discussion paradigms, and decision protocols. We introduce MALLM (Multi-Agent Large Language Models), an open-source framework that enables systematic analysis of MAD components. MALLM offers more than 144 unique configurations of MAD, including (1) agent personas (e.g., Expert, Personality), (2) response generators (e.g., Critical, Reasoning), (3) discussion paradigms (e.g., Memory, Relay), and (4) decision protocols (e.g., Voting, Consensus). MALLM uses simple configuration files to define a debate. Furthermore, MALLM can load any textual Hugging Face dataset (e.g., MMLU-Pro, WinoGrande) and provides an evaluation pipeline for easy comparison of MAD configurations. MALLM enables researchers to systematically configure, run, and evaluate debates for their problems, facilitating the understanding of the components and their interplay.
@inproceedings{becker_mallm_2025,
  address   = {Suzhou, China},
  title     = {{MALLM}: Multi-Agent Large Language Models Framework},
  isbn      = {979-8-89176-334-0},
  shorttitle = {{MALLM}},
  url       = {https://aclanthology.org/2025.emnlp-demos.29/},
  doi       = {10.18653/v1/2025.emnlp-demos.29},
  abstract  = {Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for MAD are often designed towards tool use, lack integrated evaluation, or provide limited configurability of agent personas, response generators, discussion paradigms, and decision protocols. We introduce MALLM (Multi-Agent Large Language Models), an open-source framework that enables systematic analysis of MAD components. MALLM offers more than 144 unique configurations of MAD, including (1) agent personas (e.g., Expert, Personality), (2) response generators (e.g., Critical, Reasoning), (3) discussion paradigms (e.g., Memory, Relay), and (4) decision protocols (e.g., Voting, Consensus). MALLM uses simple configuration files to define a debate. Furthermore, MALLM can load any textual Hugging Face dataset (e.g., MMLU-Pro, WinoGrande) and provides an evaluation pipeline for easy comparison of MAD configurations. MALLM enables researchers to systematically configure, run, and evaluate debates for their problems, facilitating the understanding of the components and their interplay.},
  urldate   = {2026-02-03},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  publisher = {Association for Computational Linguistics},
  author    = {Becker, Jonas and Kaesberg, Lars Benedikt and Bauer, Niklas and Wahle, Jan Philip and Ruas, Terry and Gipp, Bela},
  editor    = {Habernal, Ivan and Schulam, Peter and Tiedemann, Jörg},
  month     = nov,
  year      = {2025},
  pages     = {418--439},
}
Downloads: 1
{"_id":"eBiZbeunW9zW8F8Kv","bibbaseid":"becker-kaesberg-bauer-wahle-ruas-gipp-mallmmultiagentlargelanguagemodelsframework-2025","author_short":["Becker, J.","Kaesberg, L. B.","Bauer, N.","Wahle, J. P.","Ruas, T.","Gipp, B."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","address":"Suzhou, China","title":"MALLM: Multi-Agent Large Language Models Framework","isbn":"979-8-89176-334-0","shorttitle":"MALLM","url":"https://aclanthology.org/2025.emnlp-demos.29/","doi":"10.18653/v1/2025.emnlp-demos.29","abstract":"Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for MAD are often designed towards tool use, lack integrated evaluation, or provide limited configurability of agent personas, response generators, discussion paradigms, and decision protocols. We introduce MALLM (Multi-Agent Large Language Models), an open-source framework that enables systematic analysis of MAD components. MALLM offers more than 144 unique configurations of MAD, including (1) agent personas (e.g., Expert, Personality), (2) response generators (e.g., Critical, Reasoning), (3) discussion paradigms (e.g., Memory, Relay), and (4) decision protocols (e.g., Voting, Consensus). MALLM uses simple configuration files to define a debate. Furthermore, MALLM can load any textual Hugging Face dataset (e.g., MMLU-Pro, WinoGrande) and provides an evaluation pipeline for easy comparison of MAD configurations. 
MALLM enables researchers to systematically configure, run, and evaluate debates for their problems, facilitating the understanding of the components and their interplay.","urldate":"2026-02-03","booktitle":"Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","publisher":"Association for Computational Linguistics","author":[{"propositions":[],"lastnames":["Becker"],"firstnames":["Jonas"],"suffixes":[]},{"propositions":[],"lastnames":["Kaesberg"],"firstnames":["Lars","Benedikt"],"suffixes":[]},{"propositions":[],"lastnames":["Bauer"],"firstnames":["Niklas"],"suffixes":[]},{"propositions":[],"lastnames":["Wahle"],"firstnames":["Jan","Philip"],"suffixes":[]},{"propositions":[],"lastnames":["Ruas"],"firstnames":["Terry"],"suffixes":[]},{"propositions":[],"lastnames":["Gipp"],"firstnames":["Bela"],"suffixes":[]}],"editor":[{"propositions":[],"lastnames":["Habernal"],"firstnames":["Ivan"],"suffixes":[]},{"propositions":[],"lastnames":["Schulam"],"firstnames":["Peter"],"suffixes":[]},{"propositions":[],"lastnames":["Tiedemann"],"firstnames":["Jörg"],"suffixes":[]}],"month":"November","year":"2025","pages":"418–439","bibtex":"@inproceedings{becker_mallm_2025,\n\taddress = {Suzhou, China},\n\ttitle = {{MALLM}: {Multi}-{Agent} {Large} {Language} {Models} {Framework}},\n\tisbn = {979-8-89176-334-0},\n\tshorttitle = {{MALLM}},\n\turl = {https://aclanthology.org/2025.emnlp-demos.29/},\n\tdoi = {10.18653/v1/2025.emnlp-demos.29},\n\tabstract = {Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for MAD are often designed towards tool use, lack integrated evaluation, or provide limited configurability of agent personas, response generators, discussion paradigms, and decision protocols. 
We introduce MALLM (Multi-Agent Large Language Models), an open-source framework that enables systematic analysis of MAD components. MALLM offers more than 144 unique configurations of MAD, including (1) agent personas (e.g., Expert, Personality), (2) response generators (e.g., Critical, Reasoning), (3) discussion paradigms (e.g., Memory, Relay), and (4) decision protocols (e.g., Voting, Consensus). MALLM uses simple configuration files to define a debate. Furthermore, MALLM can load any textual Hugging Face dataset (e.g., MMLU-Pro, WinoGrande) and provides an evaluation pipeline for easy comparison of MAD configurations. MALLM enables researchers to systematically configure, run, and evaluate debates for their problems, facilitating the understanding of the components and their interplay.},\n\turldate = {2026-02-03},\n\tbooktitle = {Proceedings of the 2025 {Conference} on {Empirical} {Methods} in {Natural} {Language} {Processing}: {System} {Demonstrations}},\n\tpublisher = {Association for Computational Linguistics},\n\tauthor = {Becker, Jonas and Kaesberg, Lars Benedikt and Bauer, Niklas and Wahle, Jan Philip and Ruas, Terry and Gipp, Bela},\n\teditor = {Habernal, Ivan and Schulam, Peter and Tiedemann, Jörg},\n\tmonth = nov,\n\tyear = {2025},\n\tpages = {418--439},\n}\n\n","author_short":["Becker, J.","Kaesberg, L. B.","Bauer, N.","Wahle, J. 
P.","Ruas, T.","Gipp, B."],"editor_short":["Habernal, I.","Schulam, P.","Tiedemann, J."],"key":"becker_mallm_2025","id":"becker_mallm_2025","bibbaseid":"becker-kaesberg-bauer-wahle-ruas-gipp-mallmmultiagentlargelanguagemodelsframework-2025","role":"author","urls":{"Paper":"https://aclanthology.org/2025.emnlp-demos.29/"},"metadata":{"authorlinks":{}},"downloads":1},"bibtype":"inproceedings","biburl":"https://api.zotero.org/users/11945993/collections/ZFRN5S7H/items?key=vDp6ir3hxPv6qnr9sd8LWFWG&format=bibtex&limit=100","dataSources":["Ntor4bPwHAu6xbP9D","Zp98Nuv7ftsXLefzT","kHqqD8pzLteJJWS2X","rWKnJXWNHJNeBcwtc","vt2kLcQ9HA5XKwuhk"],"keywords":[],"search_terms":["mallm","multi","agent","large","language","models","framework","becker","kaesberg","bauer","wahle","ruas","gipp"],"title":"MALLM: Multi-Agent Large Language Models Framework","year":2025,"downloads":1}