Capability of Regression and Random Forest Methods to Estimate Soil Water Retention Curve by Developing Pseudo-Continuous Pedotransfer Functions

Authors

Department of Soil Sciences and Engineering, Faculty of Agriculture, Bu-Ali Sina University, Hamedan, Iran

Abstract

Background and Objectives
Direct measurement of the soil water retention curve (SWRC) is time-consuming and expensive, so it is not easily applicable at large scales. Researchers therefore use pedotransfer functions (PTFs) to estimate it. Various point and parametric pedotransfer functions, developed with numerous methods, have been used to estimate the SWRC, and each has its drawbacks. However, few methods have been used to develop pseudo-continuous pedotransfer functions. To date, the random forest (RF) method has not been used in any study to create pseudo-continuous pedotransfer functions, and some variables have never been used as predictors in such functions. The objectives of this article are therefore to investigate the potential of the RF method for creating pseudo-continuous pedotransfer functions, to compare its performance with that of linear regression, and to examine whether the performance of these functions can be improved by using the geometric mean and geometric standard deviation of particle diameter, field capacity (FC), and permanent wilting point (PWP) as predictors.
 
Methodology
A total of 120 disturbed and undisturbed soil samples were collected from Tehran and Hamedan provinces. Soil texture, bulk density, and the soil water retention curve over the matric suction range of 0 to 15000 hPa were measured. Pseudo-continuous pedotransfer functions were then created using two methods, linear regression and random forest. Soil water matric suction, soil texture, percentage of clay and sand, bulk density, geometric mean and geometric standard deviation of particle diameter, and moisture content at FC and PWP were used in various combinations as predictors of the soil water retention curve. The accuracy and reliability of the resulting functions were compared between the two methods and within each method.
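The pseudo-continuous approach described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The sample values are hypothetical, the log10 transform of suction is an assumption (the abstract does not state how suction was scaled, and a suction of 0 hPa would need separate handling), and the helper names `to_pseudo_continuous_rows`, `fit_simple_linear`, and `predict_theta` are invented for illustration. Each measured (suction, water content) pair becomes one training row, so suction acts as an ordinary predictor and the fitted model can be queried at any suction. The fit shown uses suction as the only predictor.

```python
import math

# Hypothetical samples: basic properties plus measured SWRC points given as
# (matric suction in hPa, volumetric water content in cm3/cm3).
samples = [
    {"clay": 30.0, "sand": 25.0, "bd": 1.35,
     "swrc": [(10, 0.46), (100, 0.40), (330, 0.34), (1000, 0.28), (15000, 0.18)]},
    {"clay": 12.0, "sand": 60.0, "bd": 1.55,
     "swrc": [(10, 0.38), (100, 0.27), (330, 0.20), (1000, 0.15), (15000, 0.08)]},
]


def to_pseudo_continuous_rows(samples):
    """Expand every (suction, water content) pair into its own training row."""
    rows = []
    for s in samples:
        for suction_hpa, theta in s["swrc"]:
            rows.append({
                # Assumption: suction enters the model on a log10 scale.
                "log_suction": math.log10(suction_hpa),
                "clay": s["clay"], "sand": s["sand"], "bd": s["bd"],
                "theta": theta,
            })
    return rows


def fit_simple_linear(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


rows = to_pseudo_continuous_rows(samples)
a, b, r2 = fit_simple_linear([r["log_suction"] for r in rows],
                             [r["theta"] for r in rows])


def predict_theta(suction_hpa):
    """Estimate water content at any suction, not only the measured ones."""
    return a + b * math.log10(suction_hpa)


print(f"theta = {a:.3f} + {b:.3f} * log10(h), R2 = {r2:.3f}")
print(f"estimated theta at 500 hPa: {predict_theta(500):.3f}")
```

The same row layout, with the texture and bulk density columns included as additional predictors, could be passed to any multiple regression or random forest implementation.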
 
Results
Using soil water matric suction as the only input variable for estimating moisture at different matric suctions was not effective in the RF method, and no model was created. In the linear regression method, however, a model with acceptable results was developed (R² values of 0.675 and 0.674 for the training and validation stages, respectively), which can be used where no additional information is available. Including soil texture in the linear regression method improved the accuracy of the estimates by 5.4% and 5.3% in the training and validation stages, respectively. In the third function, incorporating the percentage of clay and sand alongside soil water matric suction as predictors improved SWRC estimation by 1.5% to 25.0% in the training and validation stages for both RF and linear regression, compared with the second function. In the fourth function, adding bulk density as a predictor significantly improved accuracy, by 6.9% to 13.1%, because bulk density is an indicator of soil structure. Using FC improved estimation accuracy by 3.5% to 24.4%, because FC is a point on the SWRC and supplies direct information to the models. However, using the PWP as a predictor did not significantly improve estimation accuracy. Using the geometric mean (dg) and geometric standard deviation (Sg) of particle diameter instead of the percentage of clay and sand in pseudo-continuous pedotransfer functions did not lead to noticeable improvements. In the linear regression method, the error distribution across the soil texture triangle showed no consistent dependence on soil texture: in pedotransfer functions 1, 2, 4, 7, and 8 the highest errors occurred in coarse-textured soils, whereas in pedotransfer functions 5, 6, 9, and 10 the lowest errors were associated with coarse-textured soils.
Instead, the error distribution across the soil texture triangle depended on the type of input variables and the method used to create the pedotransfer functions. For all pseudo-continuous pedotransfer functions, the accuracy of the estimates in both the training and validation stages was significantly higher with the RF method than with linear regression, by 22% to 46%.
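As an illustration of the dg and Sg predictors mentioned above, the following Python sketch computes both from clay, silt, and sand percentages. The abstract does not give the formulas, so the code assumes the commonly used moment formulation, dg = exp(a) and Sg = exp(b), with a = Σ f_i ln M_i and b² = Σ f_i (ln M_i)² − a², where f_i are mass fractions. The representative class diameters (0.001, 0.026, and 1.025 mm) are likewise an assumption, and the function name `geometric_texture_stats` is hypothetical.

```python
import math

# Assumed representative (arithmetic mean) diameters of each class, in mm.
CLASS_DIAMETER_MM = {"clay": 0.001, "silt": 0.026, "sand": 1.025}


def geometric_texture_stats(clay_pct, silt_pct, sand_pct):
    """Return (dg in mm, Sg dimensionless) from textural percentages."""
    total = clay_pct + silt_pct + sand_pct
    if abs(total - 100.0) > 0.5:
        raise ValueError("clay + silt + sand should sum to about 100%")

    # Convert percentages to mass fractions that sum to exactly 1.
    fractions = {
        "clay": clay_pct / total,
        "silt": silt_pct / total,
        "sand": sand_pct / total,
    }

    # First and second moments of ln(diameter), weighted by mass fraction.
    first = 0.0
    second = 0.0
    for name, f in fractions.items():
        ln_d = math.log(CLASS_DIAMETER_MM[name])
        first += f * ln_d
        second += f * ln_d * ln_d

    # Guard against tiny negative values caused by floating-point rounding.
    variance = max(second - first * first, 0.0)
    return math.exp(first), math.exp(math.sqrt(variance))


dg, sg = geometric_texture_stats(clay_pct=30.0, silt_pct=45.0, sand_pct=25.0)
print(f"dg = {dg:.4f} mm, Sg = {sg:.2f}")
```

Under these assumptions, dg and Sg are functions of the three textural fractions alone and add no independent measurement, which is consistent with the finding that substituting them for clay and sand percentages brought no noticeable improvement.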
 
Conclusion
Using linear regression with soil water matric suction as the sole predictor, an acceptable pseudo-continuous pedotransfer function was developed. Investigating whether a similar relationship can be established with state-of-the-art estimation methods may remove the need to rely on numerous soil water retention curve models. Using more detailed information, such as particle size distribution and FC, to estimate the SWRC through pseudo-continuous pedotransfer functions is recommended. The error distribution across the soil texture triangle depended on the type of input variables and the method used to create the pedotransfer functions, which underscores the importance of selecting an appropriate combination of input variables and method when creating pseudo-continuous pedotransfer functions for estimating the SWRC. The random forest method was significantly superior to linear regression, and the best SWRC estimates were obtained with the RF method using soil water matric suction, percentage of clay and sand, bulk density, and FC as predictors.
 
Data Availability Statement
Data are available from the authors on reasonable request.
 
Acknowledgements
This paper is published as part of a Ph.D. thesis supported by the Vice Chancellor for Research and Technology of Bu-Ali Sina University, Iran. The authors thank Bu-Ali Sina University for its financial support.
 
Conflict of interest
The authors declare no conflict of interest.
 
Ethical considerations
The authors avoided data fabrication, falsification, plagiarism, and misconduct.

Keywords

