KIBERNETYKA TA SYSTEMNYI ANALIZ
International Theoretical Science Journal


DOI 10.34229/KCA2522-9664.26.3.4
UDC 004.8

D. Yuvzhenko
National Technical University “Igor Sikorsky Kyiv Polytechnic Institute,”
Kyiv, Ukraine, d.yuvzhenko@kpi.ua

S. Stirenko
National Technical University “Igor Sikorsky Kyiv Polytechnic Institute,”
Kyiv, Ukraine, s.stirenko@kpi.ua


A COMPARATIVE STUDY OF CHUNKING STRATEGIES
FOR RETRIEVAL-AUGMENTED GENERATION

Abstract. An empirical comparative study of four document segmentation strategies is presented: fixed windows of 256, 512, and 1024 tokens, and semantic segmentation based on a large language model. Experiments were conducted on long, semantically coherent texts from the SQuALITY dataset. The evaluation was performed on 225 question–answer pairs using Precision@5 and Recall@5 (top-5 retrieval metrics), answer quality metrics (Exact Match and token-level F1), and average retrieval latency. The results reveal a clear trade-off between retrieval precision and recall driven by granularity: smaller fragments provide higher precision, whereas larger fragments substantially increase recall and improve answer quality in terms of F1. Within this experimental setting, semantic segmentation demonstrates competitive results but does not show a consistent advantage over fixed windows of 512–1024 tokens. A reduction in retrieval latency is observed when using larger segments, which can be explained by lower vector-index density. A reproducible evaluation procedure and practical recommendations for selecting a segmentation strategy for efficient RAG systems are provided.
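The fixed-window segmentation and the metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. Whitespace tokenization, non-overlapping windows, lowercasing as the only answer normalization, and Precision@k computed as hits divided by k are all assumptions made here for illustration, and every function name is hypothetical.

```python
from collections import Counter


def fixed_window_chunks(tokens, window):
    """Split a token list into consecutive, non-overlapping windows.

    Assumption: no overlap between windows; the last chunk may be shorter.
    """
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def precision_recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Precision@k and Recall@k for a single query.

    retrieved_ids: ranked list of chunk ids returned by the retriever.
    relevant_ids:  set of chunk ids judged relevant for the query.
    Assumption: precision divides by k, recall by the number of relevant chunks.
    """
    top_k = retrieved_ids[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def exact_match(prediction, reference):
    """1.0 if the lowercased, whitespace-split answers are identical, else 0.0."""
    return float(prediction.lower().split() == reference.lower().split())


def token_f1(prediction, reference):
    """Token-level F1 based on the multiset overlap of answer tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a larger window yields fewer chunks per document. With a fixed k this makes it easier for the top-5 results to cover all relevant passages (higher recall), while each retrieved chunk carries more unrelated text, which is consistent with the precision and recall trade-off reported above.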

Keywords: Retrieval-Augmented Generation, RAG, chunking, semantic search, long-document question answering, chunking strategies.






© 2026 Kibernetika.org. All rights reserved.