Abstract:
Today, the best quality and performance of modern speech technologies are made possible by the widespread use of machine learning methods. The aim of this project is to study
and implement an end-to-end automatic speech recognition (ASR) system based on machine learning methods, and to develop new mathematical models and algorithms for automatic speech recognition of agglutinative (Turkic) languages.
Many studies have shown that deep learning methods make it easier to train automatic speech recognition systems that use an end-to-end approach. Such systems can be trained directly,
that is, without manual processing of the raw signal. Despite their good recognition quality, end-to-end models have a drawback: they require a large amount of training data.
This is a serious problem for low-resource languages, especially Turkic languages such as Kazakh and Azerbaijani, and various methods have to be applied to address it.
Some of these methods are used for end-to-end speech recognition of languages belonging to the same language family (agglutinative languages): transfer learning is suited to low-resource languages, and multi-task learning to high-resource ones.
To increase efficiency and quickly overcome the limited-resource problem, transfer learning was applied to the end-to-end model. Transfer learning made it possible to adapt a model trained on the Kazakh dataset to the Azerbaijani dataset, so that both language corpora were used for training simultaneously.
Experiments with the two corpora show that transfer learning can reduce the symbol error rate, measured as the phoneme error rate (PER), by 14.23% compared to the baseline models (DNN+HMM, WaveNet, and CNC+LM).
Therefore, the implemented model with transfer learning can be used to recognize other low-resource languages.
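The abstract reports results in terms of the phoneme error rate but does not spell out how it is computed. As a minimal sketch, assuming the standard definition PER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment and N is the number of reference phonemes, the metric can be computed as below. The function names `edit_distance` and `phoneme_error_rate` and the toy phoneme sequences are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: the minimum number of substitutions,
    deletions and insertions that turn `ref` into `hyp`."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)]


def phoneme_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Corpus-level PER (assumed standard definition):
    total edits divided by the total number of reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcripts are empty")
    return total_edits / total_ref


if __name__ == "__main__":
    # Hypothetical toy transcripts: one substitution in five reference phonemes.
    ref = ["k", "i", "t", "a", "p"]
    hyp = ["k", "i", "t", "a", "b"]
    print(phoneme_error_rate([ref], [hyp]))
```

Pooling edits and reference lengths over the whole corpus, rather than averaging per-utterance rates, keeps long utterances from being under-weighted; whether the reported 14.23% is an absolute or a relative reduction is not stated in the abstract.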