Repository of the L.N. Gumilyov Eurasian National University

Analysis of Short Texts Using Intelligent Clustering Methods

Authors
Tussupov, Jamalbek
Kassymova, Akmaral
Mukhanova, Ayagoz
Bissengaliyeva, Assyl
Azhibekova, Zhanar
Yessenova, Moldir
Abuova, Zhanargul
Date
2025
Journal
Algorithms
ISSN
1999-4893
Citation
Tussupov, J.; Kassymova, A.; Mukhanova, A.; Bissengaliyeva, A.; Azhibekova, Z.; Yessenova, M.; Abuova, Z. Analysis of Short Texts Using Intelligent Clustering Methods. Algorithms 2025, 18, 289. https://doi.org/10.3390/a18050289
Abstract
This article presents a comprehensive review of short text clustering using state-of-the-art methods: Bidirectional Encoder Representations from Transformers (BERT), Term Frequency-Inverse Document Frequency (TF-IDF), and the novel hybrid method Latent Dirichlet Allocation + BERT + Autoencoder (LDA + BERT + AE). The article begins by outlining the theoretical foundation of each technique and its merits and limitations. BERT is critiqued for its ability to understand word dependence in text, while TF-IDF is lauded for its applicability in terms of importance assessment. The experimental section compares the efficacy of these methods in clustering short texts, with a specific focus on the hybrid LDA + BERT + AE approach. A detailed examination of the LDA-BERT model’s training and validation loss over 200 epochs shows that the loss values start above 1.2 and quickly decrease to around 0.8 within the first 25 epochs, eventually stabilizing at approximately 0.4. The close alignment of these curves suggests the model’s practical learning and generalization capabilities, with minimal overfitting. The study demonstrates that the hybrid LDA + BERT + AE method significantly enhances text clustering quality compared to individual methods. Based on the findings, the study recommends the optimum choice and use of clustering methods for different short texts and natural language processing operations. The applications of these methods in industrial and educational settings, where successful text handling and categorization are critical, are also addressed. The study ends by emphasizing the importance of the holistic handling of short texts for deeper semantic comprehension and effective information retrieval.
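
As an illustration of the TF-IDF weighting named in the abstract, the following is a minimal sketch and not the authors' pipeline. The sample texts, the function names `tfidf_vectors` and `cosine`, and the smoothed IDF formula are all assumptions made for this example. It shows how term weights let short texts on the same topic score as more similar than texts on different topics, which is the signal a clustering step would then group on.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text.

    Assumption: whitespace tokenization and a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1, chosen for illustration only.
    """
    docs = [text.lower().split() for text in texts]
    n_docs = len(docs)
    # Document frequency: number of texts that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical short texts: two about travel, two about programming.
texts = [
    "cheap flights to paris",
    "paris flights discount",
    "python list sort error",
    "sort error in python code",
]
vecs = tfidf_vectors(texts)
sim_same = cosine(vecs[0], vecs[1])  # texts sharing "flights" and "paris"
sim_diff = cosine(vecs[0], vecs[2])  # texts with no shared terms
```

Because the two travel texts share terms and the travel and programming texts share none, `sim_same` is positive while `sim_diff` is zero. Sparse lexical overlap of this kind is one reason short texts are often also represented with contextual embeddings such as BERT.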
URI
http://repository.enu.kz/handle/enu/30658
Open
Analysis-of-Short-Texts-Using-Intelligent-Clustering-Methods_2025_Multidisciplinary-Digital-Publishing-Institute-MDPI.pdf (2.003Mb)
Collections
  • Mathematics[236]