Digital Soil Mapping by Machine Learning Techniques

Authors

1 Department of Soil Science, Faculty of Agriculture, Urmia University, Urmia, Iran

2 Department of Soil Science, Faculty of Agriculture, Urmia University, Iran

3 Department of Agricultural Extension and Landscape Architecture, Shahid Bakeri High Education Center of Miandoab, Urmia University, Iran

4 Department of Soil Science, Faculty of Agriculture, Shahrekord University, Iran

Abstract

Background and Objectives
The use of geospatial techniques for mapping soils is broadly covered by the term digital soil mapping (DSM). Soil maps serve as essential base maps in many environmental and natural resources studies. Digital soil maps are built on the relationship between environmental variables and soil properties. Advances in computing and technology have made digital, quantitative approaches to soil mapping possible. Continuous use of agricultural land without regard to land suitability has caused soil degradation. In addition, the shortcomings of conventional mapping methods, together with the advent of geographic information systems (GIS) and remote sensing (RS) techniques, have led to the emergence and adoption of digital soil mapping.
 
Methodology
The study area covers approximately 5000 ha in the west of the Heris region, East Azerbaijan province, Iran. In the first study, the potential of different models to predict soil classes at different taxonomic levels was investigated. Following a semi-detailed soil survey and a stratified random sampling scheme, 50 pedons and 50 auger holes, spaced approximately 1000 m apart, were excavated and described, and soil samples were taken from the different genetic horizons. Based on the pedon descriptions and soil analytical data, the pedons were classified down to the family level. Four machine learning techniques, namely boosted regression trees (BRT), random forest (RF), artificial neural networks (ANNs), and multinomial logistic regression (MNLR), were tested for their power to predict and map the soil classes. After the soil property maps had been prepared and their accuracy checked, these maps were used together with auxiliary parameters to estimate soil classes with an artificial neural network model in the R software. Finally, the accuracy and uncertainty of the model were evaluated using the overall accuracy and the confusion index, respectively.
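The two evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' R workflow. It assumes the commonly used definition of the confusion index, one minus the difference between the two highest predicted class probabilities at a site, and the class labels and probabilities below are hypothetical toy values, not data from the study.

```python
from typing import Sequence


def overall_accuracy(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of validation sites whose predicted class equals the observed class."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


def confusion_index(probabilities: Sequence[float]) -> float:
    """Assumed definition: 1 - (highest - second-highest class probability).

    Values near 0 mean one class clearly dominates; values near 1 mean the
    two most likely classes are almost tied, i.e. the prediction is uncertain.
    """
    if len(probabilities) < 2:
        raise ValueError("need probabilities for at least two classes")
    top, second = sorted(probabilities, reverse=True)[:2]
    return 1.0 - (top - second)


# Hypothetical toy data: three classes, four validation sites.
classes = ["A", "B", "C"]
observed = ["A", "A", "B", "C"]
site_probs = [
    [0.70, 0.20, 0.10],
    [0.40, 0.45, 0.15],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
]

# The predicted class at each site is the one with the highest probability.
predicted = [classes[p.index(max(p))] for p in site_probs]

oa = overall_accuracy(observed, predicted)
mean_ci = sum(confusion_index(p) for p in site_probs) / len(site_probs)

print(f"overall accuracy = {oa:.2f}")
print(f"mean confusion index = {mean_ci:.4f}")
```

In this toy case three of the four sites are classified correctly, yet two of them are assigned with nearly tied probabilities, so the mean confusion index is high. This shows how a model can reach an acceptable overall accuracy while still carrying considerable uncertainty.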
 
Results
The results showed that the different models had similar ability to predict the soil classes across all taxonomic levels, but their accuracy decreased considerably at the subgroup and family levels. Terrain attributes were the most important auxiliary information for predicting the soil classes down to the family level. The main goal of the second study was to predict soil surface properties (pH, electrical conductivity, gypsum, organic carbon, calcium carbonate equivalent, coarse fragments, and particle size distribution) using ANNs, BRT, a generalized linear model (GLM), and multiple linear regression (MLR). Among the studied models, GLM showed the highest performance for most soil properties, although even the best-performing model did not necessarily produce accurate estimates. Terrain attributes were again the most important environmental covariates for predicting the soil classes at all taxonomic levels, but they could not fully capture the soil variation. This indicates that the unexplained variation is controlled by unobserved environmental factors, possibly related to management over time. The results suggest that DSM approaches do not achieve sufficient prediction accuracy for soil classes at the lower taxonomic levels, which focus on the soil properties affecting land use and management. Adding more detail to the soil classification at the lower levels of the Soil Taxonomy system increases the number of classes, which in turn decreases the overall accuracy and increases the uncertainty. Notably, the ANN model reached an acceptable overall accuracy (i.e., 75%) up to the great group level, yet it also showed a high degree of uncertainty. Therefore, accuracy alone is not a sufficient basis for model selection; the uncertainty of the model should be considered alongside its error.
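One structural reason why accuracy cannot improve at lower taxonomic levels is that the classes are nested: a prediction that is correct at the subgroup level is also correct at every higher level, but not the reverse. The Python sketch below illustrates this by scoring the same set of predictions at successive levels. It is only an illustration under stated assumptions. The labels are hypothetical examples written in Soil Taxonomy style, not the classes mapped in the study, and scoring one set of predictions by truncating labels is a simplification; the study may have fitted a separate model at each level.

```python
from typing import Sequence, Tuple

# Each label lists the class from the order down to the subgroup.
Label = Tuple[str, str, str, str]
LEVELS = ["order", "suborder", "great group", "subgroup"]


def accuracy_at_level(observed: Sequence[Label],
                      predicted: Sequence[Label],
                      depth: int) -> float:
    """Share of sites whose labels agree on the first `depth` levels."""
    hits = sum(o[:depth] == p[:depth] for o, p in zip(observed, predicted))
    return hits / len(observed)


# Hypothetical observed and predicted classes at four validation sites.
observed = [
    ("Aridisols", "Calcids", "Haplocalcids", "Typic Haplocalcids"),
    ("Aridisols", "Calcids", "Haplocalcids", "Typic Haplocalcids"),
    ("Aridisols", "Cambids", "Haplocambids", "Typic Haplocambids"),
    ("Entisols", "Orthents", "Torriorthents", "Typic Torriorthents"),
]
predicted = [
    ("Aridisols", "Calcids", "Haplocalcids", "Typic Haplocalcids"),
    ("Aridisols", "Calcids", "Haplocalcids", "Xeric Haplocalcids"),
    ("Aridisols", "Calcids", "Haplocalcids", "Typic Haplocalcids"),
    ("Entisols", "Orthents", "Torriorthents", "Typic Torriorthents"),
]

# Accuracy at each level, from the order (depth 1) to the subgroup (depth 4).
accuracies = [accuracy_at_level(observed, predicted, d)
              for d in range(1, len(LEVELS) + 1)]

for name, acc in zip(LEVELS, accuracies):
    print(f"{name:12s} accuracy = {acc:.2f}")
```

With these toy labels the accuracy falls from 1.00 at the order level to 0.50 at the subgroup level, and it can never rise from one level to the next because each additional level only adds ways for a prediction to be wrong.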
 
Conclusion
Terrain attributes were the main predictors among the auxiliary information studied. Evaluating the accuracy of the estimations with a larger number of observations is recommended to give a better understanding of the performance of the DSM approach over low-relief areas. Further studies may still be required to identify new environmental covariates and to introduce new tools that capture the complex nature of soils. Accordingly, we suggest using other soft computing methods for modeling in plains and low-relief regions. Finally, the use of DSM methods is increasing over time, and they will eventually be regarded as distinct and novel techniques.
 
 
Data Availability Statement
Data are available from the authors upon reasonable request.
 
Acknowledgements
This paper is published as part of a Master's thesis supported by the Vice Chancellor for Research and Technology of Urmia University, Iran. The authors thank Urmia University for its financial support.
 
Conflict of interest
The authors declare no conflict of interest.
 
Ethical considerations
The authors avoided data fabrication, falsification, plagiarism, and misconduct.

Keywords

