Global Change Biology (2010) 16, 2737–2749, doi: 10.1111/j.1365-2486.2010.02171.x
Characterization of ecosystem responses to climatic controls using artificial neural networks
Antje M. Moffat*, Clemens Beckstein†, Galina Churkina‡, Martina Mund* and Martin Heimann*
*Max Planck Institute for Biogeochemistry, Hans-Knöll-Str. 10, 07745 Jena, Germany, †Department of Mathematics and Computer Science, Friedrich Schiller University, Ernst-Abbe-Platz 1-4, 07743 Jena, Germany, ‡Leibniz-Centre for Agricultural Landscape Research (ZALF), Eberswalder Strasse 84, 15374 Müncheberg, Germany
Abstract Understanding and modeling ecosystem responses to their climatic controls is one of the major challenges for predicting the effects of global change. Usually, the responses are implemented in models as parameterized functional relationships of a fixed type. In contrast, the inductive approach presented here based on artificial neural networks (ANNs) allows the relationships to be extracted directly from the data. It has been developed to explore large, fragmentary, noisy, and multidimensional datasets, such as the carbon fluxes measured at the ecosystem level with the eddy covariance technique. To illustrate this, our approach has been systematically applied to the daytime carbon flux dataset of the deciduous broadleaf forest Hainich in Germany. The total explainable variability of the half-hourly carbon fluxes from the driving climatic variables was 93.1%, showing the excellent data mining capability of the ANNs. Total photosynthetic photon flux density was identified as the dominant control of the daytime response, followed by the diffuse radiation. The vapor pressure deficit was the most important nonradiative control. From the ANNs, we were also able to deduce and visualize the dependencies and sensitivities of the response to its climatic controls. With respect to diffuse radiation, the daytime carbon response showed no saturation and the light use efficiency was three times greater for diffuse compared with direct radiation. However, with less potential radiation reaching the forest, the overall effect of diffuse radiation was slightly negative. The optimum uptake of carbon occurred at diffuse fractions between 30% and 40%. By identifying the hierarchy of the climatic controls of the ecosystem response as well as their multidimensional functional relationships, our inductive approach offers a direct interface to the data. 
This provides instant insight into the underlying ecosystem physiology and links the observational relationships to their representation in the modeling world.
Keywords: artificial neural networks (ANNs), climatic controls, ecological data mining, ecosystem physiology, eddy covariance carbon flux, FLUXNET, Hainich forest, inductive modeling
Received 20 August 2009 and accepted 8 November 2009
Correspondence: Antje M. Moffat, tel. +49 3641 576220, fax +49 3641 577200, e-mail: [email protected]
© 2010 Blackwell Publishing Ltd
Introduction
The change of the earth’s climate strongly affects terrestrial biological ecosystems (IPCC, 2007a), but the response of the ecosystems to the changing environmental conditions is largely unknown. Even basic phenomena are still under debate: the observed net uptake of CO₂ by the land biosphere implies an unexplained large, increasing land sink, also called missing sink or residual land sink (Burgermeister, 2007; IPCC, 2007b). For the Northern Hemisphere, the average estimate of the land carbon sink from atmospheric inversions is almost a factor of two larger than the bottom-up estimate, and the longitudinal partitioning of the northern sink is subject to large uncertainties (IPCC, 2007b). Furthermore, it is now recognized that biological processes influence the climate of the earth system significantly (Heimann & Reichstein, 2008). Therefore, understanding the climatic controls of the ecosystem response is fundamental and essential in the context of global change. To tackle this question, towers equipped with the eddy covariance technique have been established, and these are measuring the carbon flux in a wide range of vegetation types and climate zones all over the world (Baldocchi, 2008). The flux measurements have a high temporal resolution of half-hourly to hourly, but, due to the limitations of the eddy covariance technique, they are fragmentary and noisy (Papale et al., 2006). Main limitations are the theoretical requirement of stationarity of the flow, turbulent atmospheric conditions, and no residual vertical wind speed or horizontal advection,
but also varying source areas of the fluxes in heterogeneous environments (Goeckede et al., 2004). In contrast to controlled lab experiments, the ecosystem response is driven by external weather conditions. To capture the climatic controls, there are concurrent measurements of a wide range of meteorological variables, such as radiation, temperature, and humidity. This results in large, complex, and multidimensional datasets from which the causalities cannot be obtained just by visual evaluation of the measurements. Therefore, additional modeling is required.
Two basic modeling approaches can be distinguished: the hypothetic-deductive and the inductive (Hempel & Oppenheim, 1948; Young & Jarvis, 2002). The hypothetic-deductive approach (Fig. 1, top) begins with hypotheses about how the controls in the ecosystem work. The controlling processes are then implemented in an ecosystem model as parameterized equations (deduction). The carbon flux datasets are used to constrain the parameters and to test the validity of the model. A good agreement of the model’s predictions with the measurements is assumed to corroborate the hypotheses.
This paper presents a fully inductive approach (Fig. 1, bottom), where a priori assumptions are avoided as much as possible. It is based on a purely empirical model with a very general function class, here artificial neural networks (ANNs). The functional relationships of the carbon fluxes to the climatic controls are inferred solely and directly from the observations. These purely empirical relationships are then used to characterize the ecosystem response to its climatic drivers, e.g., the
hierarchy of the controls, the multivariate dependencies, and the sensitivities of the response. Only at the last step are the results put in the context of current hypotheses. Hence, the inductive approach described below can be used to answer the question of what controls the carbon flux in terrestrial ecosystems directly from the observations.
Method
ANN modeling framework
In the past, purely empirical models have been used broadly, e.g., for the spatial or temporal interpolation of the carbon fluxes (Papale & Valentini, 2003; Gove & Hollinger, 2006; Stauch & Jarvis, 2006). These models are generally used as a black box. Only a few of them are used in an inductive manner that also aims to provide a physiological interpretation, such as data-based mechanistic modeling (Young & Jarvis, 2002). Our inductive approach is based on statistical multivariate modeling with ANNs (Bishop, 1995; Rojas, 1996). It exploits their ability to recognize the underlying patterns even in large, noisy observational datasets. Owing to their outstanding data-mining ability, the ANNs often outperform classical semi-empirical methods (e.g., Abramowitz, 2005; Moffat et al., 2007) and can thus be used as a benchmark for process-based model descriptions (Abramowitz, 2005). The modeling framework used in this paper is based on feed-forward ANNs with a sigmoid activation function trained with the backpropagation algorithm (Bishop, 1995; Rojas, 1996). A feed-forward ANN consists of nodes, interconnected by weighted links. Information moves only in a forward direction, from the input node layer through the hidden node layer(s) to
the output node layer. This type of ANN provides the features required by our inductive approach: a very general function class with a closed-form expression representing the network response, trained by a supervised learning algorithm that is suited for nonlinear regression tasks. Cybenko (1989) proved that a single hidden layer, feed-forward ANN is capable of approximating any continuous, multivariate function to any desired degree of precision. This means that a complex enough feed-forward ANN has the built-in flexibility to map the individual conditions without needing prior assumptions about the shape of the response. The training procedure is used to constrain the weights of this purely empirical model to the functional relationships present in the data. The weights can be viewed as nonlinear regression parameters. Since the dataset is presented as snapshots, one tuple at a time, the ANN model picks up correlations of the responding output variable to the controlling input variables (drivers) at the presented time scale, here half-hourly. An optimally trained ANN model is able to map the temporal correlations in the data while maintaining the ability to generalize beyond the training dataset. To yield a good generalization of the data, the ANN model is systematically varied in each training run. The training begins with a large, usually single-layer network with many nodes and randomly chosen initial values of the weights. Then the network is trained in online or batch mode and pruned to reach an optimum size. The final structure and weight parameters of the network depend on the initial number of nodes at each layer, the initial values of the weights, and the training progression (pattern shuffling, online or batch mode, pruning, early stopping). Further details on the technical implementation can be found in A. M. Moffat, 2010.
Fig. 1 Conceptual flow of the two different modeling approaches. The shaded areas depict the special features of the inductive approach presented in this paper to characterize the underlying functional relationships of the ecosystem response. [Top panel, hypothetic-deductive modeling approach: data (X, Y) and hypotheses; ecosystem model with fixed equations; parameterization and evaluation; constrained set of parameters. Bottom panel, inductive modeling approach: data (X, Y); purely empirical model; extraction of the functional relationships and evaluation; characterization (∂f/∂x, ∂f/∂y); hypotheses.]
The parameterized ANN model with fixed weights after training will be referred to as the ANN model. For the mapping to be robust, two ANNs trained on the same dataset should result in the same functional relationships, even though their final node structure and weight parameters differ due to the systematically varied training procedure. To get a measure of the robustness, each ANN training scenario in this paper has been repeated 10 times.
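The closed-form character of such a network can be illustrated with a minimal Python/NumPy sketch. It is not the implementation used in this study: it assumes one hidden layer with a logistic sigmoid, a linear output node, and plain batch gradient descent, and it omits pruning, early stopping, and pattern shuffling. The function names (`init_network`, `forward`, `train`) and the synthetic saturating data are hypothetical and serve only as an illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_network(n_inputs, n_hidden, rng):
    """Random initial weights for one hidden layer and a linear output node."""
    return {
        "W1": rng.normal(0.0, 1.0, size=(n_hidden, n_inputs)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 1.0, size=n_hidden),
        "b2": 0.0,
    }

def forward(net, X):
    """Network function f(d) = b2 + sum_k w2[k] * sigmoid(W1[k] . d + b1[k])."""
    hidden = sigmoid(X @ net["W1"].T + net["b1"])
    return hidden @ net["w2"] + net["b2"], hidden

def train(net, X, y, learning_rate=0.05, epochs=3000):
    """Batch gradient descent on the mean squared error (plain backpropagation)."""
    n = len(y)
    for _ in range(epochs):
        pred, hidden = forward(net, X)
        err = pred - y
        # Error signal propagated back to the hidden nodes.
        delta = np.outer(err, net["w2"]) * hidden * (1.0 - hidden)
        net["w2"] -= learning_rate * 2.0 / n * (hidden.T @ err)
        net["b2"] -= learning_rate * 2.0 / n * err.sum()
        net["W1"] -= learning_rate * 2.0 / n * (delta.T @ X)
        net["b1"] -= learning_rate * 2.0 / n * delta.sum(axis=0)
    return net

# Synthetic, noisy, saturating response (illustration only, not Hainich data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 1))
y = X[:, 0] / (X[:, 0] + 0.2) + rng.normal(0.0, 0.05, size=300)

net = init_network(n_inputs=1, n_hidden=4, rng=rng)
mse_before = np.mean((forward(net, X)[0] - y) ** 2)
train(net, X, y)
mse_after = np.mean((forward(net, X)[0] - y) ** 2)
```

Repeating the last four lines with different seeds would give a rough analogue of the 10 repeated training scenarios: the weights differ from run to run, while the mapped relationship should stay similar if the mapping is robust.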
Analysis tools
Mapping performance. The quality, or performance, of the ANN model can be used to estimate how much of the response can be mapped (explained) with the input drivers provided. During the ANN training, the sum of squared errors (SSE_err) is optimized:
$$ \mathrm{SSE}_{\mathrm{err}} = \frac{1}{N} \sum_{i} (p_i - o_i)^2 = \mathrm{RMSE}^2, \tag{1} $$
where $o_i$ are the individual observed data, $p_i$ are the values predicted by the ANN, and $N$ is the number of data points. The SSE_err is equal to the squared root mean square error (RMSE²). The coefficient of determination (R²) is directly related to SSE_err normalized by the total variance SSE_tot of the dataset:
$$ R^2 = 1 - \frac{\mathrm{SSE}_{\mathrm{err}}}{\mathrm{SSE}_{\mathrm{tot}}} = 1 - \frac{\sum_{i} (p_i - o_i)^2}{\sum_{i} (o_i - \bar{o})^2}, \tag{2} $$
where $\bar{o}$ is the mean of the observed values. For the model residuals, the standard deviation (SD) will be used according to Richardson et al. (2006):
$$ \mathrm{SD} = \sqrt{2}\,\mathrm{MAE} = \sqrt{2}\,\frac{1}{N} \sum_{i} |p_i - o_i|, \tag{3} $$
where MAE is the mean absolute error. If the ANN model is to be used for predictions, the mean bias error should also be considered (Moffat et al., 2007).
Driver relevance. If a climatic variable $d_1$ is more strongly correlated with the responding output variable than another climatic variable $d_2$, the mapping performance $P_1$ of the ANN with $d_1$ as the single input will be higher than the performance $P_2$ of the ANN with $d_2$ as the single input:
$$ P_1 > P_2. \tag{4} $$
The ANN mapping performance with single inputs can thus be used to quantify their importance as primary input drivers. In the same manner, the improvement in ANN performance with a new driver added to an existing network can be used as a measure of importance as an additional driver. The more new information the additional driver $d_A$ adds, the greater is the improvement in the network performance and the more relevant is this climatic variable $d_A$ for the response (van de Laar et al., 1999). The performance improvement $\Delta P_A$ can be calculated as:
$$ \Delta P_A = P_{+A} - P, \tag{5} $$
where $P$ is the performance without $d_A$ and $P_{+A}$ the performance with $d_A$ added. When using the performance improvement as a measure of relevance, attention has to be paid to correlations between the input drivers. If a new driver adds little information to the system, it might mean that it is irrelevant or that the information is already present in the existing inputs. The latter fact can be used to detect correlations by first training the networks separately on two drivers of interest, and then together. If the two drivers show high relevance when trained separately but only little added performance when both are used for the training, the two are closely correlated.
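The performance measures of Eqs 1 to 3 and the improvement of Eq. 5 can be written down in a few lines. The following Python/NumPy sketch is only an illustration; the function names `mapping_performance` and `performance_improvement` and the four example values are hypothetical.

```python
import numpy as np

def mapping_performance(observed, predicted):
    """RMSE, R^2, SD = sqrt(2) * MAE and mean bias error of a model."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    sse_err = np.mean((p - o) ** 2)            # Eq. 1, equal to RMSE^2
    sse_tot = np.mean((o - o.mean()) ** 2)     # total variance, same 1/N factor
    return {
        "rmse": np.sqrt(sse_err),
        "r2": 1.0 - sse_err / sse_tot,         # Eq. 2
        "sd": np.sqrt(2.0) * np.mean(np.abs(p - o)),   # Eq. 3
        "bias": np.mean(p - o),                # mean bias error
    }

def performance_improvement(p_without, p_with):
    """Eq. 5: gain in performance when an additional driver is included."""
    return p_with - p_without

# Made-up example values.
scores = mapping_performance([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

For these four made-up points the squared errors average to 0.125 and the total variance is 1.25, so `scores["r2"]` evaluates to 0.9.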
Network function. The ANN model maps the response of the dependent output variable to the input driver(s) as present in the data. In a feed-forward network, the input drivers $d_1$ to $d_n$ are mapped unidirectionally, layer by layer, onto the predicted output. This yields a unique, continuous analytical network function $f$ describing the response:
$$ f(d_1, \ldots, d_n), \quad \text{where } f\colon D \to \mathbb{R} \text{ and } D \subseteq \mathbb{R}^n. \tag{6} $$
If the input drivers are mapped on multiple outputs $m$, then $f$ is a vector of $\mathbb{R}^m$. Each element of this vector is a closed-form expression of the input space, describing one aspect of the response.
Numerical partial derivatives. The numerical partial derivative PaD of $f$ with respect to each input driver $d_i$ characterizes the
change in the ecosystem response for each individual dataset tuple $t_j$:
$$ \mathrm{PaD}_{i,j} = \left. \frac{\partial f}{\partial d_i} \right|_{t_j}. \tag{7} $$
To calculate the numerical partial derivatives, the standard ANN backpropagation algorithm was extended to save the composition of the derivatives at each node. Now the ANN can be used not only for function approximation but additionally for the calculation of the numerical partial derivative (after Rojas, 1996). The numerical partial derivative $\mathrm{PaD}_{i,j}$ describes the change of the ecosystem response per measured physical unit. Since the interest of this study is in the overall response of the ecosystem, the numerical partial derivatives of each input driver $d_i$ are transformed from the dynamic range, estimated from its yearly absolute minimum $d_{i,\min}$ and maximum $d_{i,\max}$, to unit range:
$$ [d_{i,\min}, d_{i,\max}] \mapsto [0, 1]. \tag{8} $$
The normalized numerical partial derivative is calculated as:
$$ \mathrm{nor.PaD}_{i,j} = (d_{i,\max} - d_{i,\min}) \cdot \mathrm{PaD}_{i,j}. \tag{9} $$
The normalized PaD has the same scale for each of the climatic drivers, namely in units of ecosystem response per unit-normalized dynamic range. To get an estimate of the mean absolute change of the response, the absolute numerical partial derivatives for an input variable $d_i$ are averaged over all $N$ tuples $t_j$:
$$ \mathrm{abs.PaD}_i = \frac{1}{N} \sum_{j=1}^{N} \left| \mathrm{nor.PaD}_{i,j} \right|. \tag{10} $$
The negative and positive fractions of this sum provide information on negative and positive changes in the response:
$$ \mathrm{neg.PaD}_i = \frac{1}{N} \sum_{\mathrm{PaD}_{i,j} < 0} \left( -\mathrm{nor.PaD}_{i,j} \right), \tag{11} $$
and
$$ \mathrm{pos.PaD}_i = \frac{1}{N} \sum_{\mathrm{PaD}_{i,j} > 0} \mathrm{nor.PaD}_{i,j}. \tag{12} $$
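The sensitivity measures defined above can be sketched for a network with a single sigmoid hidden layer, where the partial derivatives follow directly from the chain rule. This Python/NumPy sketch is not the extended backpropagation routine of the study. The weights are random rather than trained, the dynamic range defaults to the minimum and maximum of the supplied tuples instead of the yearly absolute extremes, and the negative fraction is returned as a magnitude so that the absolute sum equals the negative plus the positive fraction. All of these are assumptions, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(X, W1, b1, w2, b2):
    """Single-hidden-layer network function f(d_1, ..., d_n)."""
    return sigmoid(X @ W1.T + b1) @ w2 + b2

def partial_derivatives(X, W1, b1, w2):
    """PaD[j, i] = df/dd_i evaluated at tuple t_j (Eq. 7), via the chain rule."""
    hidden = sigmoid(X @ W1.T + b1)                    # shape (N, K)
    return (hidden * (1.0 - hidden) * w2) @ W1         # shape (N, n)

def sensitivities(X, pad, d_min=None, d_max=None):
    """Normalized (Eq. 9), absolute (Eq. 10), negative and positive sums."""
    d_min = X.min(axis=0) if d_min is None else d_min
    d_max = X.max(axis=0) if d_max is None else d_max
    nor = (d_max - d_min) * pad
    n = len(X)
    abs_pad = np.abs(nor).sum(axis=0) / n
    neg_pad = -np.where(nor < 0, nor, 0.0).sum(axis=0) / n   # magnitude
    pos_pad = np.where(nor > 0, nor, 0.0).sum(axis=0) / n
    return abs_pad, neg_pad, pos_pad

# Random stand-in weights and tuples (illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 2))
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
w2, b2 = rng.normal(size=3), 0.1

pad = partial_derivatives(X, W1, b1, w2)
abs_pad, neg_pad, pos_pad = sensitivities(X, pad)
```

A finite-difference comparison of `pad` against `network` is a cheap way to check the analytic derivative before interpreting the sums.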
Methodological stages
Our inductive approach encompasses six stages to characterize the ecosystem response directly from large and multidimensional, even fragmented observational datasets with as few prior assumptions as possible. The stages are based on the highly flexible, purely empirical modeling framework and on the analysis tools described above. An extension of this inductive approach to a full methodology (a body of methods generally applicable for the exploration of ecological datasets) is presented in A. M. Moffat, 2010.
1. Preconsiderations. The observational dataset consists of a responding variable to be induced, such as the net carbon flux, and driving variables, such as the meteorological data. To ensure that the dataset is representative for the characterization of the ecosystem response, the following preconsiderations should be taken into account:
Quality: Since an empirical model will map the functional relationships as present in the datasets, it is important to use accurate and consistent measurements. The quality of the dataset does not depend on the quantitative amount of data but on the enclosed information.
Reducing complexity: The more explicit the information in the dataset, the more concise will be the mapped relationships. To reduce the complexity, only data relevant for the queried ecosystem response should be considered. For example, if the photosynthetic response of the ecosystem is of interest, the dataset should be restricted to the daytime data of the active period.
Candidates for input drivers: The dataset should contain all climatic variables that are assumed to have an effect on the ecosystem response.
Data coverage: A purely empirical model can only map functional relationships properly within the scope of the training dataset. Interpolation between underrepresented regions or extrapolation might lead to physiologically implausible mapping. Therefore, the measurements used for training should have good data coverage over the full range of interest.
2. Benchmarking. The ANN training is first performed with all available climatic variables in the dataset as input drivers. Assuming an optimally trained ANN, this gives a benchmark measure of the maximum mapping between the responding variable and all the provided meteorological observations. If the determination coefficient R² is used as the performance measure, then the benchmark describes the total explainable variability in the dataset. If all relevant climatic controls and, if necessary, all information about the state of the ecosystem, such as the phenology, are included in the benchmark dataset, the remaining unexplained variability can be attributed to the noise in the measurement. The model residuals can then be used to give an estimate of the uncertainty (random error) in the flux measurements (Richardson et al., 2008).
3. Hierarchy of the climatic controls. After determining the total benchmark, the ANNs are trained with a single input driver at a time. This ANN mapping exercise determines the relevance of a climatic variable as a primary driver. The hierarchy of the climatic controls of the response can be obtained by ranking the drivers by their relevance. If the response is highly modulated by a certain driver (e.g., the response of photosynthesis by light), it acts like a carrier signal for the response to the minor driver (e.g., temperature). In this case, the ANN might not be able to pick up the underlying minor correlation directly. To overcome this problem, the dominating input driver is identified using the primary driver performance from above. The networks are then trained with the dominant primary driver, plus each of the other climatic variables in turn as secondary drivers. The performance improvement is used to identify the relevance of the climatic variables as secondary drivers.
4. Function analysis of single to multivariate ANN models. The hierarchy of the climatic controls allows the function analysis to be confined to the relevant climatic drivers. First, the single ANN model trained on the primary climatic control is examined, and then the ANN models of the primary control plus secondary driver(s) are analyzed for their multivariate dependencies. In order to get plausible functional relationships, the choice of input drivers is subject to the following conditions:
The drivers should be physiologically meaningful.
The drivers often have obvious or hidden correlations that may distort the dependencies. Therefore, it is important to be aware of cross-dependencies and to keep the drivers as independent as possible. Confounding drivers should be included in the driver set in order to obtain robust relationships.
The degrees of freedom of the empirical model increase with each added driving variable and may lead to a physiologically implausible mapping of the response. Consequently, the number of input drivers should be kept as low as possible.
During the training, the general ANN network function is constrained by the measurements. Afterwards the ANN network function represents the ecosystem response to its climatic controls as present in the data. This network function can be used to characterize the physiological properties of the ecosystem as present in the data:
Its form (e.g., the basic shape, the offsets at the origin, or the saturation) shows the functional dependency and can be used to derive the physiologically relevant parameters. Plotting of the network function helps to visualize the functional dependencies on the climatic controls.
Its partial derivatives give information about the changes in the response with respect to the input driver(s): positive and negative derivatives or potential turning points of the response may provide insight into the underlying processes.
The absolute sums of the partial derivatives and their positive and negative fractions reveal the sensitivities of the ecosystem response to the climatic drivers.
The derived physiological properties are then compared with the existing hypotheses. This comparison may corroborate the hypotheses or indicate new or different features present in the data. The function analysis of the purely empirical ANN models can thus serve as a link between the observations and their semi-empirical representations in the modeling world.
5. Data stratification. Our inductive approach works on fragmented data as long as there are enough representative samples in the dataset for training the ANN models. With binning and grouping, different phases of the response can be examined:
Data binning: To analyze certain aspects of the overall response query, the representative dataset can be binned to certain variable ranges (e.g., flux magnitudes). The ANN models are trained on the complete dataset, but the analysis is performed on the individual bins.
Data grouping: To investigate differences in the response, the representative dataset can be grouped into subsets (e.g., each month). The ANN models are trained separately for each subset and the differences between them give insight into the variability of the response. The setup of the grouping can be varied to test for diurnal to seasonal to interannual variability.
6. Theoretical driver variables. The dataset can be extended from observable to theoretical driver variables. For example, the phenological state of the ecosystem can be described with a fuzzy variable for the course of the season (Papale & Valentini, 2003) or with the latent variable of the weekly mean temperature. Other latent variables, such as the fraction of diffuse light, might expose a different aspect of the response. Time lag effects can be included by providing information about preceding events, e.g., previous productivity rates. The relevance of the theoretical variable as an additional input driver gives a measure of its importance to modeling the ecosystem response. The six stages described are independent of a specific domain and can be generally applied to characterize ecosystem responses to climatic controls hidden in complex observational datasets. Their capability will be demonstrated for the daytime carbon flux measurements of the Hainich forest in the following.
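The benchmarking and hierarchy stages can be outlined as a short ranking procedure. In the following Python/NumPy sketch a cubic polynomial least-squares fit stands in for a trained ANN purely to keep the example small and fast. The fit is scored in-sample, the data are synthetic, and only the variable names are borrowed from this study. The function `r2_of_fit` is hypothetical.

```python
import itertools
import numpy as np

def r2_of_fit(X, y, degree=3):
    """Stand-in for one ANN training run: in-sample R^2 of a polynomial fit."""
    columns = [np.ones(len(y))]
    for deg in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(X.shape[1]), deg):
            columns.append(np.prod(X[:, list(combo)], axis=1))
    A = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic drivers scaled to [0, 1]; the response depends strongly on "PPFD",
# weakly on "fdif", and not at all on "Ta" (illustration only).
rng = np.random.default_rng(1)
n = 500
drivers = {name: rng.uniform(0.0, 1.0, n) for name in ("PPFD", "fdif", "Ta")}
y = (30.0 * drivers["PPFD"] / (drivers["PPFD"] + 0.3)
     + 5.0 * drivers["fdif"] + rng.normal(0.0, 1.0, n))

# Benchmark with all drivers, then single drivers as primary controls (Eq. 4).
benchmark = r2_of_fit(np.column_stack(list(drivers.values())), y)
primary = {name: r2_of_fit(x[:, None], y) for name, x in drivers.items()}
dominant = max(primary, key=primary.get)

# Dominant driver plus one secondary driver at a time (Eq. 5).
secondary = {
    name: r2_of_fit(np.column_stack([drivers[dominant], x]), y) - primary[dominant]
    for name, x in drivers.items() if name != dominant
}
```

With a real ANN, each call to `r2_of_fit` would be replaced by repeated training runs, and the spread over the runs would provide the error bar on each performance value.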
Domain of the study
The domain of this study is the characterization of the daytime carbon fluxes of the deciduous broadleaf forest Hainich in Germany obtained with eddy covariance measurements (Knohl et al., 2003). Hainich is a mature beech forest in a temperate, continental climate with the flux tower located at 51.07°N and 10.45°E. The turbulent exchange of CO₂ is measured above the canopy with the eddy covariance technique. Detailed stand characteristics of the Hainich forest within the main footprint of the measurements can be found in Table 1 of Kutsch et al. (2008). The carbon flux measured is the net ecosystem exchange (NEE) between the atmosphere and the terrestrial ecosystem. Throughout this paper, the term net ecosystem productivity (NEP) will be used to describe the negative of NEE (NEP = −NEE). NEP of the forest is the carbon uptake by photosynthesis minus the release by autotrophic and heterotrophic respiration. Since the focus of this study is on the daytime response during the active period, the response of the
forest is dominated by photosynthesis, while respiration plays only a minor role. The quality-checked (level 3) observational datasets of the three nondrought years 2000, 2001, and 2002 were obtained from the standardized Carboeurope IP database (Papale et al., 2006). The climatic variables used for the characterization are listed in Table 1. In addition to the provided variables, the diffuse fraction fdif was calculated from the ratio of diffuse global radiation Rdif to total global radiation Rg: fdif = Rdif/Rg. The variable fdif ranges from only direct light, 0%, to only diffuse light, 100%. Since PPFDdif and PPFDdir were not measured directly, they were calculated from photosynthetic photon flux density (PPFD) and fdif. Only best quality data (flag = 0) with complete input data during daytime (PPFD > 10 μmol photon m⁻² s⁻¹) of the summer period (June–September, to avoid phenology effects) were selected. Additionally, five outlier data points of an exceptionally dry day [vapor pressure deficit (VPD) > 18 hPa] and 27 unclean diffuse fractions (fdif = 0% and fdif > 100%) were removed from the dataset. The total number of half-hourly data analyzed was 3015.

Table 1 List of variables used for the characterization
Net carbon flux
  NEP        Net ecosystem productivity (μmol CO₂ m⁻² s⁻¹)
Radiative variables
  PPFD       (Total) photosynthetic photon flux density (μmol photon m⁻² s⁻¹)
  PPFDdir    Direct PPFD (μmol photon m⁻² s⁻¹)
  PPFDdif    Diffuse PPFD (μmol photon m⁻² s⁻¹)
Meteorological variables
  VPD        Vapor pressure deficit (hPa)
  Rh         Relative humidity (%)
  SWC        Soil water content (%)
  Ta         Air temperature (°C)
  Ts1, Ts2   Soil temperature at 5 and 30 cm depth (°C)
  WD         Wind direction (°)
  u*         Friction velocity (m s⁻¹)
Theoretical variables
  Rpot       Potential radiation at the top of atmosphere (W m⁻²)
  fdif       Diffuse fraction (%)
  NEPhh      NEP measurement of the previous half-hour

Fig. 2 Daytime NEP response of the Hainich forest plotted vs. PPFD. The response modeled with all 14 climatic drivers (black circles) captures 93.1% of the variability of the half-hourly measurements (gray circles). For variable descriptions see Table 1. [Axes: NEP (μmol CO₂ m⁻² s⁻¹) vs. PPFD (μmol photons m⁻² s⁻¹); series: measured, modeled.]

Fig. 3 The standard deviation SD of the model residuals binned by the NEP flux magnitude in steps of 5 μmol CO₂ m⁻² s⁻¹. The SD ranging from 1.1 to 3.4 μmol CO₂ m⁻² s⁻¹ indicates low noise in the measurements as well as a good ANN model performance. (The dotted line is the average SD of the whole dataset. The error bars show the standard deviation of 10 ANN training scenarios.) For variable descriptions see Table 1. [Axes: SD (μmol CO₂ m⁻² s⁻¹) vs. NEP (μmol CO₂ m⁻² s⁻¹).]
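The data selection described for the Hainich dataset can be sketched as a boolean mask over the half-hourly records. The following Python/NumPy sketch assumes that the records are held in a dictionary of equally long arrays with hypothetical keys (`flag`, `month`, `Rdif`, `Rg`, and so on). It also assumes that diffuse PPFD is obtained as PPFD multiplied by the diffuse fraction and direct PPFD as the remainder, which the text does not spell out. The seven example records are made up.

```python
import numpy as np

def select_daytime_summer(data):
    """Apply the selection criteria to a dict of equally long 1-D arrays."""
    rg = np.asarray(data["Rg"], dtype=float)
    fdif = np.full(rg.shape, np.nan)
    # Diffuse fraction in percent; left as NaN where Rg is zero.
    np.divide(100.0 * np.asarray(data["Rdif"], dtype=float), rg,
              out=fdif, where=rg > 0)
    keep = (
        (data["flag"] == 0)                            # best quality only
        & (data["PPFD"] > 10.0)                        # daytime
        & (data["month"] >= 6) & (data["month"] <= 9)  # June to September
        & (data["VPD"] <= 18.0)                        # drop dry-day outliers
        & (fdif > 0.0) & (fdif <= 100.0)               # drop unclean fractions
    )
    selected = {name: np.asarray(values)[keep] for name, values in data.items()}
    selected["fdif"] = fdif[keep]
    # Assumed partitioning of total PPFD into diffuse and direct parts.
    selected["PPFDdif"] = selected["PPFD"] * selected["fdif"] / 100.0
    selected["PPFDdir"] = selected["PPFD"] - selected["PPFDdif"]
    return selected

# Made-up records: only the first one passes every criterion.
example = {
    "flag":  np.array([0, 1, 0, 0, 0, 0, 0]),
    "PPFD":  np.array([500.0, 500.0, 5.0, 500.0, 500.0, 500.0, 500.0]),
    "month": np.array([7, 7, 7, 5, 7, 7, 7]),
    "VPD":   np.array([10.0, 10.0, 10.0, 10.0, 20.0, 10.0, 10.0]),
    "Rdif":  np.array([100.0, 100.0, 100.0, 100.0, 100.0, 0.0, 100.0]),
    "Rg":    np.array([400.0, 400.0, 400.0, 400.0, 400.0, 400.0, 0.0]),
}
selected = select_daytime_summer(example)
```

Each of the other six records fails exactly one criterion, which makes the mask easy to check by hand.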
Results
How much information about the NEP response is present in the dataset?
The ANNs trained with all 14 climatic drivers yielded an R² of 93.1 (±0.1)%, where the value in brackets is the SD over 10 ANN training scenarios. This benchmark means that 93.1% of the total variability of NEP in the half-hourly dataset can be explained with these 14 climatic drivers (Fig. 2). The high R² also attests to the excellent data mining capability of the ANNs. The residuals of the benchmark ANN models can be used to obtain an estimate of the remaining error. The SD of the model residuals, binned by the NEP flux magnitude in steps of 5 μmol CO₂ m⁻² s⁻¹, varied between 1.1 and 3.4 μmol CO₂ m⁻² s⁻¹ (Fig. 3).
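The binning of the residual SD by flux magnitude can be sketched as follows. This Python/NumPy example assumes that the bins are defined on the observed NEP and uses SD = √2·MAE within each bin. A straight line is then fitted to the positive bins with `np.polyfit`. The data are synthetic, with Laplace-distributed residuals whose scale grows linearly with the flux, and the offset of 1.0 and slope of 0.1 are arbitrary illustration values, not results of this study.

```python
import numpy as np

def binned_residual_sd(nep_obs, nep_pred, step=5.0):
    """SD = sqrt(2) * MAE of the residuals in bins of observed NEP."""
    o = np.asarray(nep_obs, dtype=float)
    p = np.asarray(nep_pred, dtype=float)
    bin_index = np.floor(o / step).astype(int)
    centers, sds = [], []
    for b in np.unique(bin_index):
        in_bin = bin_index == b
        centers.append((b + 0.5) * step)
        sds.append(np.sqrt(2.0) * np.mean(np.abs(p[in_bin] - o[in_bin])))
    return np.array(centers), np.array(sds)

# Synthetic "observations" around a known "model" (illustration only).
rng = np.random.default_rng(2)
nep_true = rng.uniform(-5.0, 35.0, size=20000)
true_sd = 1.0 + 0.1 * np.clip(nep_true, 0.0, None)
# A Laplace distribution with scale b has a standard deviation of sqrt(2) * b.
nep_obs = nep_true + rng.laplace(0.0, true_sd / np.sqrt(2.0))

# Bin by the noise-free flux here so that the known relation can be recovered.
centers, sds = binned_residual_sd(nep_true, nep_obs)
positive = centers > 0
slope, offset = np.polyfit(centers[positive], sds[positive], 1)
```

In this synthetic setup the fitted `offset` and `slope` should come out close to the values used to generate the noise.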
To compare our results to previous work using paired observations or model residuals from Richardson et al. (2008), the linear relationship between SD and the magnitude of NEP was calculated for the positive bins:
$$ \mathrm{SD} = 1.32\,(\pm 0.05) + 0.079\,(\pm 0.003)\,\mathrm{NEP}. \tag{13} $$
The relationship obtained has an offset similar to the one previously reported for Hainich, but with only half the slope. This means that the random error estimated from the ANN benchmark models increases only half as fast with increasing flux magnitude. ANN training setups with different sets of input variables showed that the smaller increase can be attributed to including the diffuse radiation, which was not included in the analysis of Richardson et al. (2008): a minimal configuration with only PPFD, PPFDdif, VPD, and Ta as input drivers and only three to five nodes in the hidden layer of the ANN resulted in almost half the slope. The fact that the standard deviation of the ANN residuals is even below the paired observation estimates from Richardson et al. (2008) corroborates the assumption that the remaining error, and thus the unexplained variability, can be mostly attributed to noise in the measurements. Moreover, it shows that the relevant climatic drivers were included in the training dataset, and that the ANNs were able to pick up the underlying correlations and fully capture the ecosystem response. The mapped correlations permit the reconstruction of missing NEP measurements from the associated meteorological data. Hence, the benchmark ANNs can also be used as a so-called gap-filling technique (A. M. Moffat, 2010).
What are the climatic controls of the measured NEP flux?
ANNs trained with only one climatic variable at a time showed the best performance with total PPFD as the input variable (Fig. 4, top). The coefficient of determination for modeling the half-hourly daytime NEP at the Hainich forest with the main driver total PPFD was 82.2%. The diffuse radiation PPFDdif exhibited a higher relevance than direct radiation PPFDdir as the single input driver. It is interesting to note that the NEP measurement for the previous half-hour explained 76.6% of the total variability. This indicates the persistence of the meteorological conditions between successive half-hours and can be taken as a measure of the lower performance limit of models used for predictions. The SD of R² over 10 ANN training scenarios was generally small, demonstrating the robustness of this inductive approach; for these two primary drivers, the formal numerical SD was <0.01%. Since the response of NEP is dominated by light, the relevance of the other climatic controls was determined
0.85
3 7
4
7 6
4
5
0.80 0.75 0.70
f dir D h f t r FD di P R SWC Ta Ts1 Ts2 WDusta _po f_di P_hh R PP PFD PFD V NE P P
Secondary driver Fig. 4 Primary R2 performance of the ANN models trained with a single climatic driver at a time (top). Total PPFD is the dominating climatic control of the half-hourly daytime NEP response at Hainich. Then the ANNs were trained with PPFD plus a secondary climatic driver (bottom) and the improvement in the performance indicates the relevance. The proportion of diffuse to direct radiation (provided as PPFDdif, PPFDdir, or diffuse fraction fdif) is the most important secondary climatic driver. (The dotted line is the total explainable variability benchmarked with all 14 drivers. The error bars indicate the standard deviation of 10 ANN training scenarios; for most drivers this error bar is so small that it is not visible on the graph. Please note the different scale of the y-axis for the bottom graph.) For variable descriptions see Table 1.
© 2010 Blackwell Publishing Ltd, Global Change Biology, 16, 2737–2749
by training the ANNs with PPFD plus one secondary climatic driver at a time. The highest improvement in performance, thus the most relevant secondary control for the daytime NEP response, was the proportion of diffuse radiation (Fig. 4, bottom). Each combination, total PPFD plus PPFDdir, total PPFD plus PPFDdif, total PPFD plus fdif, or PPFDdir plus PPFDdif (shown in the next section), yielded the same R² network performance. This means that each of these input driver combinations carries the same amount of information. In addition, it demonstrates the outstanding ability of the ANNs to extract this information even when the input drivers change units, change magnitude, or are (non)linearly transformed. The independence from the representation of the input drivers also shows the reliability of the network performance as a measure of the relevance of the climatic drivers.
The sensitivity of these results to input uncertainty was tested by introducing an artificial uncertainty of 5%, the maximum relative error according to the specifications of the instruments. None of the tested scenarios (positive or negative offset and uncorrelated or correlated random noise) had an impact on the network performances or on the functional relationships derived below.
The R² of the ANN models with PPFD plus one of the three diffuse proportion drivers was 89.6 (±0.1)%, and the average SD was reduced by over 20%, from 3.7 to 2.9 μmol CO2 m−2 s−1. In other words, the diffuse proportion explains an extra 7% of the variability, and both drivers together explain almost all of the explainable variability (93.1%) in the dataset.
The next most relevant secondary control was the amount of air moisture represented either as VPD, or as relative humidity (Rh); this was followed by air temperature (Ta), and wind direction (WD). As a single driver (Fig. 4, top), the information of WD was concealed, since the mean NEP per degree was fairly constant. However, with the modulation of NEP by PPFD included, the effect of WD as a secondary driver revealed a difference in the NEP uptake. The uptake was higher for winds from the SW than from the NE (not shown). This is probably mostly related to regional weather patterns (prevailing dry and cold air masses from NE vs. humid and warm air from SW), but can also be associated with changes in the footprint.
The friction velocity u*, which is a measure of the turbulent mixing in the atmosphere, should be of little relevance, since a correlation of NEP with u* indicates a systematic bias error in the eddy covariance measurements. The soil temperature Ts2 at the greater depth of 30 cm added slightly more new information than Ts1 to the daytime NEP response modeled with PPFD as the primary driver. As a slower changing variable, Ts2 might provide some information on the ecosystem state. The soil water content, SWC, had little effect during these nondrought summers.
Fig. 5 Daytime NEP response (top) and its numerical derivative (bottom) modeled with PPFD as a single climatic driver. The light response curve shows the expected behavior: a steep, almost linear initial increase leveling off to saturation for high PPFD. For variable descriptions see Table 1.
What are the characteristics of the NEP response to light? PPFD was identified as the dominant primary control of the half-hourly daytime NEP response. The ANN models trained on the Hainich dataset with PPFD as the only climatic control showed the expected functional form: a steep, almost linear initial increase that levels off to saturation for high PPFD (Fig. 5). This form is the result of the projection of the input PPFD via the nodes in the hidden layer onto the output f(PPFD). One of the ANNs with four hidden nodes had the following analytical network function:

$$
\mathrm{NEP_{ANN}} = f(\mathrm{PPFD}) = -38.4 + \frac{105.4}{1 + 0.934\,\exp\!\left( \dfrac{0.440}{1 + 14.6\,e^{-0.00173\,\mathrm{PPFD}}} + \dfrac{0.289}{-1 - 5.53\,e^{-0.00166\,\mathrm{PPFD}}} + \dfrac{0.380}{-1 - 5.08\,e^{-0.00152\,\mathrm{PPFD}}} + \dfrac{2.599}{1 + 2.15\,e^{0.00360\,\mathrm{PPFD}}} \right)}. \tag{14}
$$

Although the regression parameters of Eqn (14) (the weights and offsets of the logistic sigmoid functions)
have no direct physiological meaning, the curve progression of this function and of its derivative can be used to derive the physiological characteristics: The derivative starts off almost constant at the onset of light, corresponding to a linear initial slope. This initial slope of 0.050 μmol CO2/μmol photons is the initial quantum yield α, the maximum light use efficiency of the ecosystem. The offset of NEP at zero light is the daytime respiration and has a value of 2.9 μmol CO2 m−2 s−1. Towards high PPFD values, the derivative approaches zero, denoting the saturation of the NEP response. The optimum (saturated) NEP at the highest irradiance of 1750 μmol photons m−2 s−1 is 22.5 μmol CO2 m−2 s−1. The obtained properties, the initial linear increase, the leveling off to saturation, and the magnitude of physiological parameters, meet the behavior expected for the light response of a deciduous broadleaf forest (e.g., Larcher, 2003). This agreement demonstrates that our inductive approach is able to extract the underlying functional relationship directly from the data. Since the relationships were derived solely from the observations without a priori assumptions, the agreement also provides an independent corroboration of the light response hypotheses at the ecosystem level. Although the need for such corroboration might not be obvious, an assessment of all commonly used semiempirical light response curves showed that some curves (e.g., the rectangular hyperbola) do not reflect the required physiological characteristics (A. M. Moffat, 2010). Their incorrect behavior at the edges, right where the physiological parameters are derived, leads to large differences in the estimates of the physiological parameters, despite a comparable overall performance.
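As a rough numerical check, the network function of Eqn (14) can be evaluated directly. The sketch below is illustrative only. The signs of the coefficients are an assumption, chosen because they reproduce the offset at zero light, the initial slope, and the saturation level quoted above. The function and variable names are hypothetical.

```python
import math


def nep_ann(ppfd: float) -> float:
    """Eqn (14) with assumed coefficient signs.

    ppfd is in umol photons m-2 s-1; the result is in umol CO2 m-2 s-1.
    """
    # Weighted sum of the four logistic hidden nodes (signs are assumptions).
    hidden = (
        0.440 / (1.0 + 14.6 * math.exp(-0.00173 * ppfd))
        + 0.289 / (-1.0 - 5.53 * math.exp(-0.00166 * ppfd))
        + 0.380 / (-1.0 - 5.08 * math.exp(-0.00152 * ppfd))
        + 2.599 / (1.0 + 2.15 * math.exp(0.00360 * ppfd))
    )
    # Logistic output node, rescaled to flux units.
    return -38.4 + 105.4 / (1.0 + 0.934 * math.exp(hidden))


def numerical_derivative(f, x: float, h: float = 1.0) -> float:
    """Central finite difference of f at x with step h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


offset = nep_ann(0.0)                               # NEP at zero light
initial_slope = numerical_derivative(nep_ann, 1.0)  # slope near the onset of light
saturated = nep_ann(1750.0)                         # NEP at the highest irradiance
slope_at_saturation = numerical_derivative(nep_ann, 1749.0)
```

Under these assumptions the offset comes out close to −2.9, the initial slope close to 0.050, and the saturated value between 22 and 23, in line with the characteristics described in the text.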
What is the effect of diffuse radiation? The dependency of the daytime NEP response on the diffuse light was extracted from the dataset by training the ANN models with the diffuse and direct PPFD as inputs (Fig. 6). The simplicity of these ANN models (see Fig. 6), their high R² of 89.6% and their low SD of 2.9 μmol CO2 m−2 s−1 (see section ‘What are the climatic controls of the measured NEP flux?’) demonstrate that the extracted functional relationship NEPANN(PPFDdif, PPFDdir) is well suited to display and quantitatively characterize the response. The numerical partial derivatives reveal a significant difference in the functional relationship to diffuse radiation compared with direct radiation (Fig. 7, bottom): The initial quantum yield of PPFDdif is almost three times higher, its light use efficiency (magnitude of the derivative) is enhanced throughout the response, and the response shows no saturation even for high PPFDdif. These results are in full agreement with Gu et al. (2002), who found similarly enhanced light use efficiencies and weakened tendencies to cause canopy saturation for the diffuse radiation. Since the response to PPFDdif does not saturate, this effect is even more pronounced for high values of total PPFD.
The high input relevance and the enhanced light use efficiency and sensitivity of PPFDdif compared with PPFDdir stress the importance of the diffuse radiation for the ecosystem response. As the dominant secondary control of the half-hourly daytime NEP response, it should be included in ecosystem models trying to predict the carbon flux at half-hourly or hourly timescales (see also Roderick et al., 2001). The hypotheses needed for the implementation can be based on the functional relationships derived by the ANNs. The presented study shows the dependencies of the NEP response to diffuse radiation at the Hainich forest (Figs 6 and 7). Herein lies the strength of our inductive approach: in addition to the detection and quantification of the impact of diffuse radiation, it provides an explicit characterization of the functional relationship.
The enhanced light use efficiency of diffuse light leads to an increase in the NEP response of the Hainich forest. However, less of the potential radiation Rpot is received at the surface for high diffuse fractions due to the absorption and reflection by clouds and aerosols, and less light leads to a decrease in the NEP response. Therefore, the question arises whether the overall effect is positive or negative. To provide insight into this aspect, the ANN model was trained with the following three climatic input
Fig. 6 Closed symbolic representation of the ANN modeling the half-hourly daytime NEP response to the climatic controls diffuse PPFDdif and direct PPFDdir. The simplicity of the ANN model is well suited to display and characterize the functional relationship NEPANN(PPFDdif, PPFDdir). For variable descriptions see Table 1.
Fig. 7 ANN model predictions (black circles) and half-hourly measurements (gray circles) of the daytime NEP response plotted vs. the two climatic drivers: (a) diffuse PPFDdif and (b) direct PPFDdir. The ANN model captures 89.5% of the variability of the half-hourly measurements. The numerical partial derivatives (c, d) correspond to the light use efficiency, which is about three times higher for diffuse compared with direct radiation. For variable descriptions see Table 1.
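The numerical partial derivatives referred to in this caption can be approximated by central finite differences of the model output with respect to one input at a time. The following is a minimal sketch under stated assumptions: `toy_response` is a hypothetical stand-in for NEPANN(PPFDdif, PPFDdir) with made-up coefficients, not the trained network, and the step size is an arbitrary choice.

```python
import math
from typing import Callable, List, Sequence


def partial_derivative(
    f: Callable[[Sequence[float]], float],
    point: Sequence[float],
    index: int,
    step: float = 1.0,
) -> float:
    """Central finite difference of f with respect to input `index`,
    holding all other inputs fixed at their values in `point`."""
    up: List[float] = list(point)
    down: List[float] = list(point)
    up[index] += step
    down[index] -= step
    return (f(up) - f(down)) / (2.0 * step)


def toy_response(x: Sequence[float]) -> float:
    """Hypothetical stand-in for NEP_ANN(PPFDdif, PPFDdir) (assumption):
    a non-saturating diffuse term and a saturating direct term."""
    ppfd_dif, ppfd_dir = x
    return -3.0 + 0.045 * ppfd_dif + 12.0 * (1.0 - math.exp(-ppfd_dir / 800.0))


# Evaluate both partial derivatives at one (made-up) data point.
data_point = (200.0, 400.0)
d_dif = partial_derivative(toy_response, data_point, index=0)
d_dir = partial_derivative(toy_response, data_point, index=1)
```

Applied to every half-hourly data point, such derivatives give point clouds of the kind shown in the lower panels.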
Fig. 8 Numerical partial derivatives of the daytime NEP response to the diffuse fraction fdif for each half-hourly data point. The small sketch depicts the functional relationship of NEP to fdif. The net effect of fdif reaches its optimum between 28% and 44%. For variable descriptions see Table 1.
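An optimum of this kind is found where the partial derivative changes sign from positive to negative. The sketch below locates such a zero crossing by linear interpolation between neighbouring samples. It is an illustration only: the derivative values are synthetic, generated from a hypothetical linear relation with a crossing at 0.36, and real half-hourly derivatives are scattered and would first need to be binned or smoothed.

```python
from typing import List, Optional, Sequence


def zero_crossing(xs: Sequence[float], ds: Sequence[float]) -> Optional[float]:
    """Return the x at which the derivative ds changes from positive to
    non-positive, linearly interpolated between neighbouring samples.
    Returns None if no such sign change exists."""
    pairs = sorted(zip(xs, ds))
    for (x0, d0), (x1, d1) in zip(pairs, pairs[1:]):
        if d0 > 0.0 >= d1:
            return x0 + (x1 - x0) * d0 / (d0 - d1)
    return None


# Synthetic example (assumption): derivative falls linearly through zero.
fdif: List[float] = [0.025 + 0.05 * i for i in range(20)]
dnep_dfdif: List[float] = [-60.0 * (f - 0.36) for f in fdif]
optimum_fdif = zero_crossing(fdif, dnep_dfdif)
```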
drivers: the potential radiation Rpot and the diffuse fraction fdif, plus the vapor pressure deficit VPD to include confounding effects with diffuse radiation. Since the daytime NEP response is now modeled with three inputs, the analytical function has too many dimensions to be directly visualized. For these multidimensional relationships, the partial derivatives are of great value to examine their behavior. Figure 8 shows the numerical partial derivatives of the modeled response with respect to fdif: At first, the NEP response is enhanced (positive derivative) until it reaches an optimum (zero derivative) and then the NEP response is reduced (negative derivative). This means that the net effect of the diffuse radiation is at an optimum for diffuse fractions from 28% to 44% at the Hainich site. This range is close to the optimum of 45% found in a recent study by Knohl & Baldocchi (2008) for NEP fluxes at Hainich using a biophysical multilayer model of the canopy. Both approaches thus depict optima where there is less diffuse than direct light. But, as one can see in Fig. 8, the majority of the half-hourly measurements are beyond the optimum range, counteracting the increase in the NEP response. This leads to a slightly negative average numerical derivative of −0.5 μmol CO2 m−2 s−1 per half-hourly data point, thus a negative but small overall effect from the diffuse radiation.
How does the VPD affect the daytime NEP response? After the diffuse proportion, the next most important secondary driver is a measure of the air humidity, Rh, or dryness, VPD, respectively. To investigate the effect, the half-hourly daytime NEP response was modeled with PPFDdif, PPFDdir, and VPD as the climatic input drivers. Adding VPD improved the R² by 1.1 percentage points to 90.7%. The numerical partial derivatives show the characteristics of the NEP response with respect to VPD (Fig. 9): first, a slight increase in NEP (positive derivative), an optimum (zero derivative) around 4 hPa, and, then, a strong down-regulating effect (negative derivative) with increasing dryness of the air.
ANN models trained on individual months can be used to investigate whether the sensitivity of the NEP response to VPD varies over the summer period. To detect primarily the response to VPD, only early afternoon hours (11:30 am to 2 pm) with stable light conditions but high changes in VPD were extracted for the analysis. Figure 10 shows that the negative sensitivity to VPD peaks in August, the hottest and driest month. In a study by Schulze (1970) on the carbon gas exchange of single beech trees in Sollingen, 100 km north-east of Hainich, the strongest effect due to dry atmospheric conditions also occurred in August. Thus, the response to air moisture found at the tree level can be observed in the carbon flux measurements at the stand level, analogously to the light response hypotheses in section ‘What are the characteristics of the NEP response to light?’ above.
Fig. 9 Numerical partial derivatives of the daytime NEP response to the VPD for each half-hourly data point. The small sketch depicts the functional relationship of NEP to VPD. There is a negative, down-regulating effect on NEP for high values of VPD. For variable descriptions see Table 1.
Fig. 10 The positive and negative sensitivities of the daytime NEP response to PPFDdir, PPFDdif, and VPD during early afternoon hours, modeled separately for each month. The sensitivity of NEP to VPD is most negative in the hottest and driest month of August. PPFD, photosynthetic photon flux density. For variable descriptions see Table 1.
Discussion
The strength of a fully inductive approach, relying only on the information present in the data, brings its own specific challenges. The following points need to be taken into consideration to avoid pitfalls in the interpretation of the ANN models:
Adequate response space: To reach the goal of modeling the overall response, an annual dataset is appropriate. If the interest is in the light response curve, the dataset should span time periods where the ecosystem stays in the same phenological and ecological states with respect to the photosynthesis response. For example, including months with leaves off would smear out the photosynthesis response. Taking summer months but including months with drought conditions might result in a light response curve where the saturation has a drop for the highest irradiances. This will look like photoinhibition, but will actually be caused by the superposition of the light response curve with a reduced optimum NEP under water stress. As an alternative to limiting the dataset to the same state, the entire dataset can also be used but with an additional input variable describing the changing condition, for example a proxy for the water stress. With this, the ANN is able to distinguish between drought and nondrought conditions and will map the responses accordingly.
Artifacts: To avoid modeling artifacts present in a specific dataset or nonobvious changes in the phenological or ecological states, the identified relationships should prove to be robust for different time periods, e.g., individual months vs. the whole summer period or summer months of different years.
Missing relevant driver: The ANN can show a good model performance even though a physiologically relevant driver is missing. This means that the effect of the missing driver was mapped onto the included drivers through cross-correlations. Usually, the relationships found are then not independent and, therefore, not robust. The mapped functional relationships will change as soon as another driver with some cross-correlation or the actual missing driver is added. If adding drivers does not change the main properties of the numerical partial derivatives, this is a good sign for robustness.
Confounding factors: The hidden biases or indirect effects caused by confounding phenological, ecological, or climatic factors are much harder to detect. To rule out known confounding factors, these can be added to the data used for training as observed or theoretical drivers. This way, their impact is included in the modeled response, provided that the confounding factors are not correlated to any of the other input drivers, that they are well defined over the whole range, and that they do not add too many degrees of freedom to the network. An alternative solution is to perform marginal sampling, where the dataset is grouped into subsets for certain ranges of the confounding factor. The ANN models are then trained on each of the subgroups. Robust relationships will hold true for all of the subgroups.
Ecophysiological plausibility: Since the ANN models are constrained solely by the data, some prior knowledge of ecosystem physiology is required to ensure a proper choice of the representative dataset and to judge the plausibility of the results under anticipation of confounding factors. Only then does this inductive approach produce meaningful results.
The systematic approach presented in this paper has been implemented as a toolbox. Once the dataset is configured, the setup of different ANN routines is simple and highly flexible. The ANN training procedure is fully automated and takes only a few minutes on a typical desktop computer. Although the analysis tools have been tailored to extract information from large datasets, the ANNs also appear to work with small amounts of data. We have tested their ability to model the light response curve for single days with as few as 10 data points; the physiological quantities estimated from the ANNs were consistent with the estimates of a prescribed semi-empirical equation.
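The marginal sampling described under ‘Confounding factors’ above amounts to grouping the records by ranges of the confounding factor before training. A minimal sketch follows, assuming records stored as dictionaries. The field names, the bin edges, and the example values are hypothetical, and the training step itself is left out.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def marginal_subsets(
    records: Sequence[Record],
    factor: str,
    edges: Sequence[float],
) -> Dict[Tuple[float, float], List[Record]]:
    """Group records into subsets by half-open ranges [lo, hi) of the
    confounding factor; records outside the outer edges are dropped."""
    subsets: Dict[Tuple[float, float], List[Record]] = defaultdict(list)
    for record in records:
        pos = bisect.bisect_right(edges, record[factor]) - 1
        if 0 <= pos < len(edges) - 1:
            subsets[(edges[pos], edges[pos + 1])].append(record)
    return dict(subsets)


# Made-up records; VPD (hPa) plays the role of the confounding factor.
data = [
    {"VPD": 2.0, "NEP": 10.2},
    {"VPD": 6.5, "NEP": 14.8},
    {"VPD": 11.0, "NEP": 9.1},
    {"VPD": 3.5, "NEP": 12.0},
    {"VPD": 20.0, "NEP": 4.3},  # outside the chosen ranges, dropped
]
groups = marginal_subsets(data, "VPD", edges=[0.0, 5.0, 10.0, 15.0])
# A separate model would then be trained on each subset in `groups`.
```

A relationship would count as robust in this sense if the models trained on the separate subsets show the same main properties.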
Conclusions and outlook As demonstrated for the daytime carbon fluxes of the Hainich forest, the inductive approach presented here can be used to characterize the functional dependencies solely from the half-hourly eddy covariance measurements, without prior assumptions about the shape of the response. The extracted purely empirical light response curve provides an independent corroboration of current plant physiological hypotheses. Estimates of the random measurement error from the ANN model residuals were lower than previous estimates. This could be attributed to the inclusion of the proportion of diffuse radiation, which was the second most important input variable to explain the daytime carbon fluxes after total radiation. This key finding stresses the importance of the diffuse radiation for the short-term light response. The functional dependency of the daytime response to diffuse radiation showed no saturation, and it would be of great interest to investigate the generality of this relationship for other types of ecosystems. The net effect of the diffuse radiation was determined by modeling with two theoretical drivers – the potential radiation at the top of the atmosphere and the diffuse fraction. Since the light conditions at the Hainich forest were mainly beyond the optimum diffuse fraction of 30–40%, the overall effect of the diffuse light was on average slightly negative for the 3 years 2000–2002. The most important nonradiative drivers in the hierarchy of the climatic controls were the vapor pressure deficit, followed by air temperature and wind direction. Multidimensional relationships in the data were further characterized using numerical partial derivatives. For example, the vapor pressure deficit showed a strong down-regulating effect with increasing dryness of the air. 
Our inductive approach offers the potential to serve as a new key instrument for the explanation of observations, for instant testing and independent validation of hypotheses, and for the detection of new findings. The worldwide network of eddy flux towers in FLUXNET offers the opportunity to investigate ecosystems spanning from the arctic to the savannah. For managed ecosystems, the ability to include theoretical variables, such as a fuzzy variable to describe the harvesting event, will be of benefit. The theoretical variables also offer the possibility to include time lag effects and determine their relevance for the ecosystem response. The approach is not limited to the net carbon flux, but can be extended to
the partitioned GPP/RE carbon flux, the energy and momentum flux, or other greenhouse gases. By supplying the link between the observations and their representation in the modeling world, the presented inductive approach is complementary to the classic hypothetico-deductive approach. This will further the understanding of the underlying processes as well as promote their implementation in models, which, in turn, will help the prediction of the effects of changing environmental conditions on the terrestrial biosphere. Since purely empirical models adapt to the particular conditions of the ecosystem as present in the training dataset, they can also be used to identify differences in the response over time. If changes in the climate lead to changes in the ecosystem response to its climatic controls, the presented methodology would be able to detect these directly in the measurements.
Acknowledgements We would like to thank the following people for their contributions to this paper: Olaf Kolle for introducing Antje Moffat to the Hainich flux site, Corinna Rebmann for sharing her broad knowledge about the measurements, Andrew Richardson, John Grace, Alessandro Cescatti, and Detlef Schulze for in-depth discussions of the results, Petra Werner and Andrew Jarvis for their comments on the general scope, Gill McLean for proofreading of this manuscript, Bryce Moffat for proofreading of the various drafts, and the ROOT team at CERN for providing their extensive C++ programming framework. Furthermore, we would like to thank the editor Ivan Janssens and the three anonymous reviewers for their thorough comments and constructive criticism, which greatly helped to improve this paper. The datasets used in this paper were obtained from the CarboEurope-IP database (EU project GOCE-CT-2003-505572). The site PIs Alexander Knohl, Corinna Rebmann, and Werner Kutsch are thanked for making the Hainich data available to the database.
References Abramowitz G (2005) Towards a benchmark for land surface models. Geophysical Research Letters, 32, L22702. Baldocchi DD (2008) Breathing of the terrestrial biosphere: lessons learned from a global network of carbon dioxide flux measurement systems. Australian Journal of Botany, 56, 1–26. Bishop CM (1995) Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK. Burgermeister J (2007) Missing carbon mystery: case solved? Nature Reports Climate Change, 0708, 36–37. Cybenko GV (1989) Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303–314.
Goeckede M, Rebmann C, Foken T (2004) A combination of quality assessment tools for eddy covariance measurements with footprint modelling for the characterisation of complex sites. Agricultural and Forest Meteorology, 127, 175–188. Gove JH, Hollinger DY (2006) Application of a dual unscented Kalman filter for simultaneous state and parameter estimation in problems of surface-atmosphere exchange. Journal of Geophysical Research-Atmospheres, 111, D08S07, doi: 10.1029/ 2005JD006021. Gu LH, Baldocchi D, Verma SB, Black TA, Vesala T, Falge EM, Dowty PR (2002) Advantages of diffuse radiation for terrestrial ecosystem productivity. Journal of Geophysical Research-Atmospheres, 107, D6, 4050, doi: 10.1029/2001JD001242. Heimann M, Reichstein M (2008) Terrestrial ecosystem carbon dynamics and climate feedbacks. Nature, 451, 289–292. Hempel CG, Oppenheim P (1948) Studies in the logic of explanation. Philosophy of Science, 15, 135–175. IPCC (2007a) Observed effects of climate changes (Chapter 1.2). In: Climate Change 2007: Synthesis Report, Fourth Assessment Report of the Intergovernmental Panel on Climate Change, pp. 31–33. IPCC, Geneva, Switzerland. IPCC (2007b) The Contemporary Carbon Budget (Chapter 7.3.2). Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, pp. 519–521. Cambridge University Press, Cambridge, UK. Knohl A, Baldocchi DD (2008) Effects of diffuse radiation on canopy gas exchange processes in a forest ecosystem. Journal of Geophysical Research-Biogeosciences, 113, G02023, doi:10.1029/2007JG000663. Knohl A, Schulze E-D, Kolle O, Buchmann N (2003) Large carbon uptake by an unmanaged 250-year-old deciduous forest in Central Germany. Agricultural and Forest Meteorology, 118, 151–167. Kutsch WL, Kolle O, Rebmann C, Knohl A, Ziegler W, Schulze E-D (2008) Advection and resulting CO2 exchange uncertainty in a tall forest in central Germany. 
Ecological Applications, 18, 1391–1405. Larcher W (2003) Physiological Plant Ecology. Springer-Verlag, Berlin, Heidelberg. Moffat AM (2010) A new methodology to interpret high resolution measurements of net carbon fluxes between the terrestrial ecosystems and the atmosphere. Doctoral thesis, Friedrich Schiller University, Jena. Moffat AM, Papale D, Reichstein M et al. (2007) Comprehensive comparison of gap-filling techniques for eddy covariance net carbon fluxes. Agricultural and Forest Meteorology, 147, 209–232. Papale D, Reichstein M, Aubinet M et al. (2006) Towards a standardized processing of net ecosystem exchange measured with eddy covariance technique: algorithms and uncertainty estimation. Biogeosciences, 3, 571–583. Papale D, Valentini A (2003) A new assessment of European forests carbon exchanges by eddy fluxes and artificial neural network spatialization. Global Change Biology, 9, 525–535. Richardson AD, Hollinger DY, Burba GG et al. (2006) A multi-site analysis of random error in tower-based measurements of carbon and energy fluxes. Agricultural and Forest Meteorology, 136, 1–18. Richardson AD, Mahecha M, Falge E et al. (2008) Statistical properties of random CO2 flux measurement uncertainty inferred from model residuals. Agricultural and Forest Meteorology, 148, 38–50. Roderick ML, Farquhar GD, Berry SL, Noble IR (2001) On the direct effect of clouds and atmospheric particles on the productivity and structure of vegetation. Oecologia, 129, 21–30. Rojas R (1996) Neural Networks – A Systematic Introduction. Springer, Berlin, Heidelberg. Schulze E-D (1970) Der CO2-Gaswechsel der Buche (Fagus silvatica L.) in Abhängigkeit von den Klimafaktoren im Freiland. Flora, 159, 177–232. Stauch VJ, Jarvis AJ (2006) A semi-parametric gap-filling model for eddy covariance CO2 flux time series data. Global Change Biology, 12, 1707–1716. van de Laar P, Heskes T, Gielen S (1999) Partial retraining: a new approach to input relevance determination.
International Journal of Neural Systems, 9, 75–85. Young PC, Jarvis AJ (2002) Data-based Mechanistic Modelling and State Dependent
Parameter Models. In: CRES Report Number TR/177. Centre for Research on Environmental Systems and Statistics, Lancaster University.