Abstract:
The study presents a framework for developing a question-answering (QA) system
for the Kazakh language and positions it within low-resource language
(LRL) text processing. The work aims to narrow the resource gap for languages that lack substantial digital
tools. Specifically, the project focuses on geographical questions about Kazakhstan, aiming to make information
about the nation’s geography more accessible. The challenges associated with LRL text processing are
addressed through the creation of a question-answer corpus, training a Bidirectional Encoder Representations
from Transformers (BERT)-based model, and evaluating the system using Bilingual Evaluation Understudy
(BLEU) metrics. In the first phase, a corpus of 50,000 questions is compiled
to support the subsequent development phases. In the
second phase, a BERT model with 91,821,056 parameters is trained
to capture the linguistic nuances of the Kazakh language. In the final phase, the
system is evaluated using BLEU metrics and achieves an average score of 0.9576.
This score indicates a high level of agreement between the system-generated answers and the reference
answers, demonstrating the system’s effectiveness at interpreting and responding to queries about Kazakh
geography. The study contributes a systematic approach
to QA system development for a low-resource language and supports the model’s effectiveness through evaluation and
comparative analysis.
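
To make the evaluation step concrete, the following is a minimal sketch of how an average BLEU score over question-answer pairs could be computed. It is not the study's evaluation code: whitespace tokenization, uniform weights over 1- to 4-grams, add-one smoothing on the higher n-gram orders, and a single reference answer per question are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """BLEU for one candidate answer against one reference answer.

    Assumptions (not taken from the study): whitespace tokenization,
    uniform weights over orders 1..max_n, add-one smoothing for n > 1.
    """
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if n > 1:
            # Add-one smoothing keeps short answers from scoring zero.
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0
        log_precision += math.log(matches / total) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision)


def average_bleu(pairs, max_n=4):
    """Mean sentence-level BLEU over (reference, candidate) pairs."""
    scores = [sentence_bleu(ref, cand, max_n) for ref, cand in pairs]
    return sum(scores) / len(scores)
```

Under these assumptions, a generated answer identical to its reference scores 1.0 and an answer sharing no words with it scores 0.0, so an average close to 1 means that most generated answers nearly match their references word for word.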