Cybernetics and Systems Analysis (C&SA)

Volume 62, № 3, May-June 2026


UDC 004.89

O. Pastukh
Ternopil Ivan Puluj National Technical University, Ternopil, Ukraine,
ol_pas@tntu.edu.ua

V. Yatsyshyn
Ternopil Ivan Puluj National Technical University, Ternopil, Ukraine,
yacyshyn@tntu.edu.ua

O. Zadvornyi
Ternopil Ivan Puluj National Technical University, Ternopil, Ukraine,
zadvornyi.alex16@gmail.com

A TECHNOLOGY FOR GENERATION AND QUALITY EVALUATION
OF LLM-GENERATED SOURCE CODE

Abstract. This paper proposes a zero-shot prompting-based technology for generating program code and introduces an integral metric for assessing its overall quality. The integral metric combines four groups of indicators: full model confidence, semantic quality, structural integrity, and dynamic code execution. Two types of prompt structure are compared in an empirical study. The proposed metric supports a reproducible, comprehensive assessment that combines functional correctness with key maintainability and reliability attributes of the code. The results provide a basis for further standardization of prompt engineering and for the development of objective methods for evaluating LLM-generated program code in real-world information systems.
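The abstract does not state how the four indicator groups are aggregated into the integral metric. As an illustration only, the sketch below assumes each group has already been normalized to [0, 1] and combines them as a weighted arithmetic mean. The equal default weights, the names `IndicatorGroups`, `confidence_from_logprobs` and `integral_quality`, and the use of the geometric-mean token probability as a proxy for model confidence are all hypothetical choices, not the authors' method.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class IndicatorGroups:
    """Hypothetical normalized scores in [0, 1] for the four indicator groups."""
    confidence: float  # full model confidence
    semantic: float    # semantic quality
    structural: float  # structural integrity
    dynamic: float     # dynamic code execution (e.g. share of tests passed; assumed)


def confidence_from_logprobs(token_logprobs: list[float]) -> float:
    """Assumed confidence proxy: geometric mean of token probabilities, exp(mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def integral_quality(
    groups: IndicatorGroups,
    weights: tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25),
) -> float:
    """Weighted arithmetic mean of the four group scores (hypothetical aggregation)."""
    scores = (groups.confidence, groups.semantic, groups.structural, groups.dynamic)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score {s} is outside [0, 1]")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Dividing by the weight sum keeps the result in [0, 1] for any valid weights.
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    conf = confidence_from_logprobs([-0.1, -0.2, -0.3])
    print(integral_quality(IndicatorGroups(conf, 0.8, 0.9, 1.0)))
```

A weighted mean is used here because it keeps each group's contribution explicit and tunable; a study could just as well use a geometric mean or thresholds per group, so the weights should be treated as a free parameter to be calibrated.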

Keywords: large language models (LLMs), source code generation, code quality assessment, zero-shot prompting, prompt engineering, integrated quality metric.


