Publication : USDA ARS

Publication #209091

Title: Software to use the non-parametric k-nearest neighbor approach to estimate soil water retention

Author

	NEMES, ATTILA - UNIV OF MD
	ROBERTS, RALPH
	RAWLS, WALTER
	PACHEPSKY, YAKOV
	VAN GENUCHTEN, MARTINUS

Submitted to: BARC Poster Day
Publication Type: Abstract Only
Publication Acceptance Date: 3/26/2007
Publication Date: 4/25/2007
Citation: Nemes, A., Roberts, R.T., Rawls, W.J., Pachepsky, Y.A., Van Genuchten, M.T. 2007. Software to use the non-parametric k-nearest neighbor approach to estimate soil water retention [abstract]. BARC Poster Day. Abs. 28.

Interpretive Summary:

Technical Abstract: Non-parametric approaches are used in various fields to address classification problems as well as to estimate continuous variables. One type of non-parametric lazy learning algorithm, the k-Nearest Neighbor (k-NN) algorithm, has been applied as a pedotransfer technique to estimate water retention at -33 and -1500 kPa matric potentials. The algorithm was combined with the bootstrap data-subset selection technique to allow the development of model ensembles that can be used to estimate the uncertainty of the final model output. Performance of the algorithm was subsequently tested against estimations made by neural network (NNet) models developed using the same data and input soil attributes. We used a hierarchical set of inputs based on soil texture, bulk density and organic matter content to avoid possible bias towards one set of inputs, and varied the size of the data set used to develop the NNet models and to run the k-NN estimation algorithms. Different 'design-parameter' settings, analogous to model parameters, were optimized. The k-NN technique showed little sensitivity to potentially sub-optimal settings in terms of how many nearest soils were selected and how those were weighted while formulating the output of the algorithm, as long as extremes were avoided. The optimal settings were, however, dependent on the size of the reference (development) data set. The non-parametric k-NN technique performed, in most cases, as well as the NNet models in terms of root-mean-squared residuals and mean residuals. Gradual reduction of the data set size from 1600 to 100 (in 4 steps) resulted in only a slight loss of accuracy for both the k-NN and NNet approaches.
We also characterized the sensitivity of this technique to (1) estimations made for soils with differing distributions of properties; (2) the choice between different sample weighting methods; (3) the number of ensembles needed to obtain stable output; (4) the presence of outliers in the reference data set; (5) the unequal weighting of input attributes; (6) the addition of new, locally specific data to the reference data set; and (7) the influence of local data density. The k-NN technique is a competitive alternative to other techniques for developing pedotransfer functions (PTFs), especially since re-development of this PTF is not needed as new data become available. We developed a publicly available computer tool that uses the above algorithm to estimate soil water retention at -33 and -1500 kPa matric potentials. We describe the rationale behind the k-NN estimation of soil water retention, provide insight into the sensitivity analyses that were performed and give an account of the most important features of this computer tool. A software demonstration will also be available upon request.
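
The general idea described above, a distance-weighted k-NN estimate repeated over bootstrap resamples of the reference data set, can be illustrated with a minimal Python sketch. This is not the authors' software and does not reproduce its settings. The function names, the Euclidean distance measure, the inverse-distance weighting, the default values of k and the ensemble size, and the toy reference values are all assumptions made for illustration only.

```python
import math
import random
from statistics import mean, stdev


def knn_estimate(reference, target, k=3, power=1.0):
    """Distance-weighted k-NN estimate for one target soil.

    reference: list of (inputs, output) pairs; inputs are tuples of
    attributes already scaled to comparable ranges (assumed here).
    """
    # Rank reference soils by Euclidean distance to the target
    # (the distance measure is an assumption of this sketch).
    ranked = sorted(
        ((math.dist(x, target), y) for x, y in reference),
        key=lambda pair: pair[0],
    )[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (d + 1e-9) ** power for d, _ in ranked]
    return sum(w * y for w, (_, y) in zip(weights, ranked)) / sum(weights)


def ensemble_estimate(reference, target, n_members=50, k=3, seed=0):
    """Mean and spread of k-NN estimates over bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_members):
        # Bootstrap: draw a same-sized sample with replacement.
        resample = rng.choices(reference, k=len(reference))
        estimates.append(knn_estimate(resample, target, k=k))
    return mean(estimates), stdev(estimates)


# Made-up toy values, NOT measurements: (sand fraction, clay fraction)
# mapped to volumetric water content at -33 kPa.
REFERENCE = [
    ((0.80, 0.05), 0.12),
    ((0.65, 0.10), 0.18),
    ((0.40, 0.20), 0.27),
    ((0.30, 0.30), 0.32),
    ((0.20, 0.45), 0.38),
    ((0.10, 0.55), 0.42),
]

if __name__ == "__main__":
    est, spread = ensemble_estimate(REFERENCE, target=(0.35, 0.25))
    print(f"estimate = {est:.3f}, ensemble spread = {spread:.3f}")
```

In this sketch the spread across ensemble members serves as a rough indicator of estimation uncertainty. Because the estimate is computed directly from the reference list, adding new local data only requires appending to that list, with no model re-fitting.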