FOLIA GEOGRAPHICA

Folia Geographica 2020, 62/1, pp. 112 - 126

SOLVING OF CLASSIFICATION PROBLEM IN SPATIAL ANALYSIS APPLYING THE TECHNOLOGY OF GRADIENT BOOSTING CATBOOST

Ruslan Z. SAFAROV (A), Zhanat K. SHOMANOVA (B*), Yuriy G. NOSSENKO (C), Zharas G. BERDENOV (D), Zhuldyz B. BEXEITOVA (E), Adai S. SHOMANOV (F), Madina MANSUROVA (G)

Received: January 19, 2019 | Revised: May 1, 2020 | Accepted: May 7, 2020

Paper No. 20-62/1-558

(A) L.N. Gumilyov Eurasian National University, Nur-Sultan, Kazakhstan
ruslanbox@yandex.ru

(B*) Pavlodar State Pedagogical University, Pavlodar, Kazakhstan
zshoman@yandex.ru (corresponding author)

(C) Innovative University of Eurasia, Pavlodar, Kazakhstan
nosenko1980@yandex.ru

(D) L.N. Gumilyov Eurasian National University, Nur-Sultan, Kazakhstan
berdenov-z@mail.ru

(E) Eurasian Center of Innovative Development DARA, Nur-Sultan, Kazakhstan
femida.pvl@gmail.com

(F) Nazarbayev University, Nur-Sultan, Kazakhstan
adai.shomanov@nu.edu.kz

(G) Al-Farabi Kazakh National University, Almaty, Kazakhstan
mansurova.madina@gmail.com

Abstract
This paper considers two spatial analysis models designed to analyze the spatial distribution of ecological factors, such as the concentration of contaminants across a studied territory. Both models were built with a machine learning method, gradient boosting, using the efficient open-source library CatBoost. The AUC and Accuracy metrics were calculated for each model, and MultiClass, the loss function built into the CatBoost library, was minimized during training. The task was to assign each point of the test dataset to one of four classes, which makes it a classification problem, more precisely a multiclass classification problem. The study produced an effective model that forecasts with sufficient accuracy the spatial distribution of the factor at points and regions of the studied field where the gradient value of this factor is unknown. The model performs adequately when the training dataset contains only 0.5% of all the analyzed information about the object.
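The abstract names the main ingredients of the setup: a four-class target, CatBoost's MultiClass loss, the AUC and Accuracy metrics, and a training set of 0.5% of the data. The sketch below, in plain Python with no CatBoost dependency, illustrates what these quantities compute. It assumes that MultiClass is the softmax cross-entropy over per-class raw scores (our reading of the CatBoost documentation) and that AUC is taken for one class against all the others. The function names, the toy scores, and the random split are hypothetical illustrations, not the authors' code or data.

```python
import math
import random
from typing import List, Sequence, Tuple

Scores = Sequence[Sequence[float]]  # one row of raw per-class scores per point


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one point's raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_loss(raw: Scores, labels: Sequence[int]) -> float:
    """Mean softmax cross-entropy with unit weights (assumed MultiClass form)."""
    return -sum(math.log(softmax(s)[y]) for s, y in zip(raw, labels)) / len(labels)


def accuracy(raw: Scores, labels: Sequence[int]) -> float:
    """Share of points whose highest-scoring class equals the true class."""
    hits = sum(
        1 for s, y in zip(raw, labels)
        if max(range(len(s)), key=lambda k: s[k]) == y
    )
    return hits / len(labels)


def one_vs_rest_auc(raw: Scores, labels: Sequence[int], cls: int) -> float:
    """AUC of class `cls` against all other classes (pairwise Mann-Whitney form)."""
    probs = [softmax(s)[cls] for s in raw]
    pos = [p for p, y in zip(probs, labels) if y == cls]
    neg = [p for p, y in zip(probs, labels) if y != cls]
    if not pos or not neg:
        raise ValueError("class needs both positive and negative examples")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def split_indices(n: int, train_frac: float = 0.005,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Random train/test split of n points; 0.005 mirrors the 0.5% figure."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    k = max(1, round(n * train_frac))
    return sorted(idx[:k]), sorted(idx[k:])


if __name__ == "__main__":
    # Toy raw scores for four points and four classes (illustrative only).
    raw = [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0], [0, 0, 0, 3]]
    labels = [0, 1, 2, 0]
    print("MultiClass loss:", multiclass_loss(raw, labels))
    print("Accuracy:", accuracy(raw, labels))
    train, test = split_indices(10_000)
    print(len(train), "training points,", len(test), "test points")
```

With a real model, each row of `raw` would hold one point's per-class raw scores. In CatBoost's Python package, `CatBoostClassifier(loss_function='MultiClass')` trains on this kind of loss, and `predict(X, prediction_type='RawFormulaVal')` returns such scores. The authors' exact configuration is not stated here, and a macro average of the per-class AUC values is only one possible way to summarize them.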

Key words
Spatial analysis, gradient boosting, CatBoost, machine learning, neural networks, computer modeling, geoecological maps

