Abstract:
This article presents a comparative analysis of two hybrid machine learning
approaches to keyword extraction: bidirectional encoder representations from
transformers (BERT) combined with an autoencoder (AE), and term
frequency-inverse document frequency (TF-IDF) combined with an autoencoder.
The study addresses semantic analysis of text data and evaluates how well
each method covers the relevant keywords across diverse text corpora. It
examines the architecture and operating principles of each method, focusing
on how integration with an autoencoder improves the semantic integrity and
relevance of the extracted keywords. The experimental section analyzes the
performance of both methods on several text datasets and shows how the
structure and semantic richness of the source data affect the results.
Performance is evaluated with precision, recall, and F1-score. The paper
discusses the advantages and disadvantages of each approach and identifies
the keyword extraction tasks for which each is best suited. The findings are
intended to help researchers choose an appropriate text processing method for
applications that require deep semantic understanding and accurate
information extraction.
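To make the TF-IDF baseline and the evaluation metrics named above concrete, here is a minimal Python sketch under stated assumptions: lowercase whitespace tokenization, unsmoothed IDF computed as log(N / df), and exact-string set comparison against a gold keyword list. The autoencoder stage of the hybrid method is omitted, and the helper names `tfidf_keywords` and `precision_recall_f1` are hypothetical. This is an illustration, not the authors' implementation.

```python
import math
from collections import Counter

# Minimal sketch (hypothetical helpers, not the authors' code): TF-IDF keyword
# ranking plus set-based evaluation. The autoencoder stage is deliberately omitted.


def tfidf_keywords(docs, top_k=3):
    """Return the top_k terms of each document ranked by TF-IDF.

    Assumes lowercase whitespace tokenization and unsmoothed IDF = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        if not tokens:
            results.append([])  # empty document yields no keywords
            continue
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_k])
    return results


def precision_recall_f1(predicted, gold):
    """Compare predicted keywords with a gold list as sets of exact strings."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # keywords present in both lists
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    docs = [
        "autoencoder compresses text embeddings",
        "tfidf weights rare text terms",
        "bert encodes text with transformers",
    ]
    keywords = tfidf_keywords(docs)
    print(keywords[0])  # ['autoencoder', 'compresses', 'embeddings']
    print(precision_recall_f1(keywords[0], ["autoencoder", "embeddings", "keywords"]))
```

In the usage example, "text" occurs in every document, so its IDF is zero and it is never ranked as a keyword; two of the three predicted terms match the gold list, giving precision, recall, and F1 of 2/3 each. In a hybrid setup, an autoencoder would sit between the term vectors and the final ranking, which this sketch does not attempt to model.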