TechMB: Exploring the Potential of Vision Language Models for Interpreting Technical Drawings

TechMB: Exploring the Potential of Vision Language Models for Interpreting Technical Drawings. Kunz, L., Klostermeier, M., Thanabalan, K., Legler, T., & Ruskowski, M. In Krause, D., Paetzold, K., & Wartzack, S., editors, DS 140: Proceedings of the 36th Symposium Design for X (DFX2025). Design for X Symposium (DFX-2025), September 11-12, Hamburg, Germany, pages 179–188, 2025. The Design Society.


Vision Language Models (VLMs) have gained widespread adoption among end users. Their versatility has also sparked interest in applying them to more domain-specific challenges. This paper investigates the basic suitability of small-scale VLMs for evaluating the manufacturability of parts based on a technical drawing by providing the Technical drawings for Manufacturability Benchmark (TechMB). A selection of small-scale VLMs is then tested using this benchmark. The results indicate that the models show potential for text extraction and interpretation of domain-specific terminology. However, they struggle with reasoning about the manufacturing of the depicted parts and, in part, even with delivering the concise and precise answers necessary for the targeted task.

@inproceedings{KunzEtAl2025Dfx,
    author = {Leonhard Kunz and Mario Klostermeier and Kokulan Thanabalan and Tatjana Legler and Martin Ruskowski},
    editor = {Dieter Krause and Kristin Paetzold and Sandro Wartzack},
    title = {TechMB: Exploring the Potential of Vision Language Models for Interpreting Technical Drawings},
    booktitle = {DS 140: Proceedings of the 36th Symposium Design for X (DFX2025). Design for X Symposium (DFX-2025), September 11-12, Hamburg, Germany},
    abstract = {Vision Language Models (VLMs) have gained widespread adoption among end users. Their versatility also sparked interest in applying them to more domain-specific challenges. This paper investigates the principal suitability of small-scale VLMs in the task of evaluating the manufacturability of parts based on a technical drawing by providing the Technical drawings for Manufacturability Benchmark (TechMB). A selection of small-scale VLMs is then tested using this benchmark. The results indicate that the models show potential for text extraction and interpretation of domain-specific terminology. However, they struggle with the reasoning about the manufacturing of the depicted parts and partly even with the delivery of concise and precise answers necessary for the targeted task.},
    keywords = {Benchmark, Visual Question Answering, CAD, Design for Manufacturability},
    year = {2025},
    pages = {179--188},
    doi = {10.35199/dfx2025.19},
    publisher = {The Design Society},
    url = {https://www.wi2.uni-trier.de/shared/publications/DFX_Symposium_2025_XDP-Opt.pdf}
}

