Abstract:
This study presents a hybrid system for the automatic annotation of scientific texts that processes
multilingual publications using state-of-the-art natural language processing (NLP) technologies. The system integrates
classical algorithms (Gensim, NLTK) with transformer-based models via the Cohere API to achieve high semantic
consistency and accuracy in annotations. The system architecture comprises modules for data acquisition, preprocessing,
manual and automatic annotation, data storage, and quality control. The performance of the proposed model was
benchmarked against established methods such as BERTSUM, TF-IDF + LSA, and GPT-3.5-turbo using evaluation metrics
including ROUGE, BLEU, and METEOR. The hybrid model outperformed the other automated systems, scoring 0.52 on ROUGE-1,
0.41 on BLEU, and 0.39 on METEOR, which indicates that it produces concise and semantically accurate summaries.
The system also achieved 100% language detection accuracy and 90% accuracy in capturing semantic word
relationships with Word2Vec. The integration of traditional statistical methods with advanced transformer
models enables the proposed system to deliver high-quality annotations suitable for diverse scientific domains. The results
validate the model’s ability to process and summarize complex scientific texts effectively. This system provides a scalable,
secure, and user-friendly platform for researchers, institutions, and developers. It supports multilingual annotation, seamless
API integration, and potential deployment in cloud environments, offering significant benefits for academic, biomedical, and
information-intensive sectors.
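
As an illustration of the ROUGE-1 metric reported above, the following minimal Python sketch computes unigram-overlap precision, recall, and F1 between a candidate summary and a reference summary. It is not the evaluation code used in this study: the function name `rouge1` and the lowercase whitespace tokenization are assumptions made for the example, and established ROUGE implementations typically apply further normalization such as stemming.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 (hypothetical helper, simplified tokenization)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1("the cat sat", "the cat sat on the mat")
    print(scores)
```

For the toy pair in the usage example, all three candidate unigrams appear in the six-word reference, so precision is 1.0, recall is 0.5, and F1 is about 0.667.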