Abstract:
This study focused on predicting the spatial distribution of environmental risk indicators
using mathematical modeling methods, including machine learning. The northern industrial zone
of Pavlodar City in Kazakhstan served as the case-study territory. Nine models based on
k-nearest neighbors (kNN), gradient boosting, artificial neural networks, Kriging, and
multilevel B-spline (MLBS) interpolation were applied to the pollution data, and their
effectiveness in predicting pollution levels was assessed. Each model treated the problem
as a regression task, aiming to estimate the pollution
load index (PLI) values for specific locations. The maximum PLI values were located
mainly to the southwest of the thermal power plants (TPPs), at some distance from their
territories, consistent with the average wind rose for Pavlodar City. Another area of high
PLI was located in the northern part
of the studied region, near the Hg-accumulating ponds. The high PLI level is generally attributed
to the high concentration of Hg. Each of the studied interpolation methods can be used for
spatial distribution analysis; however, a comparison with the scientific literature showed that
Kriging and MLBS interpolation can be used without additional calculations to produce non-linear,
empirically consistent, and smooth maps.
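
To make the regression setup concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the widely used definition of the PLI as the geometric mean of contamination factors (measured concentration divided by a background value); the abstract does not state which formula or background values were used. The kNN step uses inverse-distance weighting, which is also an assumption. The function names, coordinates, and sample values are hypothetical and serve only as illustration.

```python
import math


def pollution_load_index(concentrations, backgrounds):
    """PLI as the geometric mean of contamination factors CF_i = C_i / B_i.

    Assumed definition; all concentrations and backgrounds must be positive.
    """
    if not concentrations or len(concentrations) != len(backgrounds):
        raise ValueError("need equally sized, non-empty inputs")
    # Sum of logs avoids overflow when many factors are multiplied.
    log_sum = sum(math.log(c / b) for c, b in zip(concentrations, backgrounds))
    return math.exp(log_sum / len(concentrations))


def knn_predict(samples, query, k=3):
    """Estimate PLI at `query` from the k nearest sampled points.

    `samples` is a list of ((x, y), pli) pairs. The inverse-distance
    weighting used here is an illustrative choice, not taken from the study.
    """
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    weighted = []
    for xy, value in nearest:
        d = math.dist(xy, query)
        if d == 0.0:
            return value  # query coincides with a sampled point
        weighted.append((1.0 / d, value))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total_weight


# Hypothetical sampling points: ((x, y) in arbitrary units, PLI value).
samples = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0), ((10.0, 10.0), 100.0)]
estimate = knn_predict(samples, (1.0, 0.0), k=2)
```

Evaluating `knn_predict` on a regular grid of query points would give a map of estimated PLI values, which is the kind of output the interpolation methods are compared on.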