
Title: Information loss in approximately Bayesian data assimilation: a comparison of generative and discriminative approaches to estimating agricultural yield

Author
NEARING, G - University of Arizona
GUPTA, H - University of Arizona
Crow, Wade

Submitted to: Journal of Hydrology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/20/2013
Publication Date: 12/1/2013
Publication URL: http://handle.nal.usda.gov/10113/60028
Citation: Nearing, G.S., Gupta, H.V., Crow, W.T. 2013. Information loss in approximately Bayesian data assimilation: A comparison of generative and discriminative approaches to estimating agricultural yield. Journal of Hydrology. 507:163-173.

Interpretive Summary: The best possible estimates of critical agricultural variables (e.g., crop yield and soil water availability) are typically based on merging independent information acquired from models, remote sensing retrievals, and/or ground-based observations. In the geosciences, merging information acquired from models with information acquired from observations is generally referred to as "data assimilation." For most agricultural applications, there exists a range of possible data assimilation strategies. Techniques are needed to evaluate these approaches and determine which one integrates information most efficiently (and thus yields, for example, the best possible estimate of crop yield and/or soil water availability for an agricultural drought monitoring system). This paper describes and applies a novel entropy-based tool for evaluating the performance of a data assimilation system and makes specific recommendations that will eventually improve our ability to accurately estimate agriculturally relevant variables using remote sensing observations.

Technical Abstract: Data assimilation and regression are two commonly used methods for predicting agricultural yield from remote sensing observations. Data assimilation is a generative approach because it requires explicit approximations of the Bayesian prior and likelihood to compute the probability density function of yield, conditional on observations; regression is discriminative because it models the conditional yield density function directly. Here, synthetic experiments were used to evaluate the abilities of two methods, the ensemble Kalman filter (EnKF) and Gaussian process regression (GPR), to extract information from observations. The amount of information in an observation was formally quantified as the mutual information between that observation and end-of-season biomass. We formally define missing information, used information, and bad information as partial divergences from the true Bayesian posterior (yield conditional on the observations). Our results suggest that the simpler discriminative GPR approach can be as efficient as the more complex generative EnKF at extracting information from observations, and may therefore be better suited to dealing with the practical problems associated with remotely sensed data (e.g., sub-footprint scale heterogeneity, since GPR does not require specification of an appropriate observation function). This is important for several practical and theoretical reasons, chief among them that discriminative methods can be applied without the need for a physical or conceptual simulator. Our method for analyzing information use has many potential applications: approximations of Bayes’ law are used regularly in predictive models of environmental systems of all kinds, and the efficiency of such approximations has not been formally addressed.
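
The contrast between the generative and discriminative routes, and the use of mutual information to measure what an observation can tell us about biomass, can be illustrated with a toy linear-Gaussian case. The sketch below is not the experiment from the paper: it swaps the EnKF for a scalar Kalman gain and GPR for ordinary least-squares regression, and it uses the closed-form mutual information for jointly Gaussian variables. All function names and numerical values (prior mean, standard deviations, sample size, seed) are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def simulate(n, prior_mean, prior_sd, obs_sd, seed=0):
    """Draw n synthetic (biomass, observation) pairs; observation = biomass + noise."""
    rng = random.Random(seed)
    biomass = [rng.gauss(prior_mean, prior_sd) for _ in range(n)]
    obs = [b + rng.gauss(0.0, obs_sd) for b in biomass]
    return biomass, obs


def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def gaussian_mutual_information(xs, ys):
    """Mutual information in nats; valid only if (x, y) are jointly Gaussian."""
    rho_sq = covariance(xs, ys) ** 2 / (covariance(xs, xs) * covariance(ys, ys))
    return -0.5 * math.log(1.0 - rho_sq)


def kalman_gain(prior_var, obs_var):
    """Generative route: gain computed from an explicit prior and likelihood."""
    return prior_var / (prior_var + obs_var)


def regression_slope(biomass, obs):
    """Discriminative route: least-squares slope of biomass on observation."""
    return covariance(biomass, obs) / covariance(obs, obs)


if __name__ == "__main__":
    prior_mean, prior_sd, obs_sd = 10.0, 2.0, 1.0  # illustrative values only
    biomass, obs = simulate(20000, prior_mean, prior_sd, obs_sd, seed=1)
    exact_mi = 0.5 * math.log(1.0 + prior_sd ** 2 / obs_sd ** 2)
    print("mutual information (sample):", gaussian_mutual_information(biomass, obs))
    print("mutual information (exact): ", exact_mi)
    print("Kalman gain (generative):   ", kalman_gain(prior_sd ** 2, obs_sd ** 2))
    print("regression slope (discrim.):", regression_slope(biomass, obs))
```

In this idealized setting the regression slope converges to the Kalman gain, so both routes recover the same posterior mean and neither loses information. The regression needs only paired samples, whereas the Kalman gain needs the prior and observation-error variances to be specified explicitly. Differences in information use of the kind the abstract describes arise only once the Gaussian or linear assumptions are violated, which this sketch does not attempt to reproduce.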