Abstract:
The paper presents a hybrid machine learning model for the spatial segmentation
of soils by salinity using Sentinel-2 multispectral satellite data and climate parameters
from the ERA5-Land reanalysis. The proposed method addresses the problem of accurate
soil cover segmentation under climate change and high spatial heterogeneity of the data.
The approach includes the sequential application of unsupervised learning algorithms
(K-Means, hierarchical clustering, DBSCAN), the XGBoost model, and a multi-task
neural network that performs classification and regression simultaneously. In the first
stage, pseudo-labels are formed using K-Means; then the probability that each object
belongs to each class is estimated, and the clustering algorithms are combined by ensemble voting. The
final model is trained on an extended feature space and demonstrates improved results
compared to traditional approaches. Experiments on a sample of 33,624 observations
(23,536 for training and 10,088 for testing) showed an increase in the Silhouette Score
value from 0.7840 to 0.8156 and a decrease in the Davies–Bouldin Score from 0.3567 to
0.3022. The classification accuracy was 99.99%, with only one error among the 10,088 test
objects. The results confirmed the proposed method’s high efficiency and applicability for
remote monitoring, environmental analysis, and sustainable land management.
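
The first stage described above (K-Means pseudo-labels followed by ensemble voting of the clustering algorithms) might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: synthetic blobs stand in for the Sentinel-2 and ERA5-Land features, cluster ids are aligned to the K-Means labels by Hungarian matching, vote shares serve as a crude stand-in for the probabilistic membership assessment, and all parameter values (number of classes, `eps`, `min_samples`) are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def align_to_reference(reference, labels, n_classes):
    """Rename cluster ids in `labels` so they best overlap with `reference`."""
    ids = np.unique(labels[labels >= 0])  # skip DBSCAN noise (-1)
    overlap = np.zeros((len(ids), n_classes), dtype=int)
    for row, cid in enumerate(ids):
        overlap[row] = np.bincount(reference[labels == cid], minlength=n_classes)
    rows, cols = linear_sum_assignment(-overlap)  # maximise total overlap
    aligned = np.full_like(labels, -1)  # unmatched clusters and noise abstain
    for row, col in zip(rows, cols):
        aligned[labels == ids[row]] = col
    return aligned


def majority_vote(label_sets, n_classes):
    """Return the majority label and the per-class vote shares for each sample."""
    votes = np.zeros((len(label_sets[0]), n_classes))
    for labels in label_sets:
        valid = np.flatnonzero(labels >= 0)  # abstaining samples cast no vote
        votes[valid, labels[valid]] += 1
    shares = votes / votes.sum(axis=1, keepdims=True)
    return votes.argmax(axis=1), shares  # ties resolve to the lowest class id


# Synthetic stand-in for the real feature matrix (assumption: 3 classes).
centers = [[-5, -5], [0, 5], [5, -5]]
X, _ = make_blobs(n_samples=600, centers=centers, cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X)
n_classes = 3

# Stage 1: K-Means pseudo-labels, used as the reference labelling.
pseudo = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(X)

# Other clusterers, relabelled to agree with the K-Means ids where possible.
hier = AgglomerativeClustering(n_clusters=n_classes).fit_predict(X)
dens = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
label_sets = [pseudo] + [
    align_to_reference(pseudo, other, n_classes) for other in (hier, dens)
]

ensemble, shares = majority_vote(label_sets, n_classes)

# Internal validity metrics named in the abstract.
for name, labels in (("K-Means", pseudo), ("ensemble", ensemble)):
    print(
        f"{name}: silhouette={silhouette_score(X, labels):.4f}, "
        f"Davies-Bouldin={davies_bouldin_score(X, labels):.4f}"
    )
```

Here a higher Silhouette Score and a lower Davies–Bouldin Score both indicate more compact, better-separated clusters. On real data the matrix `X` would hold the per-pixel spectral and climate features, and the vote shares in `shares` could be appended to the feature space for a downstream supervised model; how the paper actually does this is not specified in the abstract.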