Speaker recognition by ultrashort utterances

Medetov, B.; Nurlankyzy, A.; Namazbayev, T.; Akhmediyarova, A.; Zhetpisbayev, K.; Zhetpisbayeva, A.; Kargulova, A.

dc.contributor.author	Medetov, B.
dc.contributor.author	Nurlankyzy, A.
dc.contributor.author	Namazbayev, T.
dc.contributor.author	Akhmediyarova, A.
dc.contributor.author	Zhetpisbayev, K.
dc.contributor.author	Zhetpisbayeva, A.
dc.contributor.author	Kargulova, A.
dc.date.accessioned	2026-02-23T11:25:49Z
dc.date.available	2026-02-23T11:25:49Z
dc.date.issued	2025
dc.identifier.citation	Medetov, B., Nurlankyzy, A., Namazbayev, T., Akhmediyarova, A., Zhetpisbayev, K., Zhetpisbayeva, A., Kargulova, A. (2025). Speaker recognition by ultrashort utterances. Eastern-European Journal of Enterprise Technologies, 2 (9 (134)), 62–69. https://doi.org/10.15587/1729-4061.2025.327907	ru
dc.identifier.issn	1729-3774
dc.identifier.other	doi.org/10.15587/1729-4061.2025.327907
dc.identifier.uri	http://repository.enu.kz/handle/enu/29351
dc.description.abstract	The object of this study is the accuracy of speaker identification based on short utterances. To address speaker identification from ultrashort speech utterances, the study proposes a phoneme-by-phoneme approach to constructing voice models. The approach is motivated by the fact that short utterances usually contain a limited number of phonemes. It was therefore hypothesized that increasing speaker identification accuracy on short utterances requires analyzing how specific phonemes sound when spoken by different speakers. The experiments used speech recordings of monosyllabic words containing the corresponding phonemes, from which speaker voice models were constructed using the ECAPA-TDNN neural network architecture. The experiments showed that voice models built from the sounds of only one phoneme provide higher speaker identification accuracy than generalized models built from all speech sounds. It was also found that different phonemes provide different identification accuracy. For example, with a speech signal duration of 2–3 seconds, the generalized model identified speakers with 75 % accuracy, whereas a model built on only the phoneme "E", with the same input data, reached 85 %, which is 10 percentage points higher.	ru
dc.language.iso	en	ru
dc.publisher	Eastern-European Journal of Enterprise Technologies	ru
dc.relation.ispartofseries	2 (9 (134)), 62–69.;
dc.subject	announcer recognition	ru
dc.subject	ultra-short utterances	ru
dc.subject	phoneme-by-phoneme recognition	ru
dc.subject	ECAPA-TDNN	ru
dc.subject	phonemes of the Kazakh language	ru
dc.title	Speaker recognition by ultrashort utterances	ru
dc.type	Article	ru