Evaluation of Near-Surface Air Temperature from Reanalysis over the United States and Ukraine: Application to Winter Wheat Yield Forecasting

 Abstract — In this work we evaluate the near-surface air temperature datasets from the ERA-Interim, JRA55, MERRA2, NCEP1, and NCEP2 reanalysis projects. Reanalysis data were first compared to observations from weather stations located on wheat areas of the United States and Ukraine, and then evaluated in the context of a winter wheat yield forecast model. Results from the comparison with weather station data showed that all datasets performed well (r 2 >0.95) and that more modern reanalysis such as ERAI had lower errors (RMSD ~ 0.9) than the older, lower resolution datasets like NCEP1 (RMSD ~ 2.4). We also analyze the impact of using surface air temperature data from different reanalysis products on the estimations made by a winter wheat yield forecast model. The forecast model uses information of the accumulated Growing Degree Day (GDD) during the growing season to estimate the peak NDVI signal. When the temperature data from the different reanalysis projects were used in the yield model to compute the accumulated GDD and forecast the winter wheat yield, the results showed smaller variations between obtained values, with differences in yield forecast error of around 2% in the most extreme case. These results suggest that the impact of temperature discrepancies between datasets in the yield forecast model get diminished as the values are accumulated through the growing season.


I. INTRODUCTION
EANALYSIS projects rely on static data assimilation systems to produce continuous and consistent gridded products at global scale, from the combination of multiple observational datasets [1].Whilst originally designed with the needs of the atmospheric research community in mind, products from reanalysis projects have increasingly become a common data source for a wide range of science disciplines.For example, in the case of renewable energy production, solar irradiance and wind speed data from reanalysis can be used to simulate the potential of solar and wind energy [2]- [5].Reanalysis products have been shown useful for the study of extreme events.Precipitation and sea/land surface temperature data from reanalysis has been used to study the effects of El A. Santamaria-Artigas, B. Franch, P. Guillevic, J.C. Roger, and S. Skakun are with the Department of Geographical Sciences, University of Maryland, College Park, MD 20742 USA, and NASA Goddard Space Flight Center, Greenbelt, MD 20771 USA (email: asantam@umd.edu;befranch@umd.edu;pierreg@umd.edu;roger63@umd.edu;skakun@umd.edu)Niño on the warming and drought of the Amazon rainforest [6], and more recently [7] assessed the potential of several products from satellite observations, reanalysis projects, and land surface models to characterize episodes of agricultural drought in East Africa.Reanalysis products have also been exploited in agricultural studies.In particular, near-surface air temperature (T2M) data has proven to be a relevant input in the forecast of crop yields and production from land modeling and remote sensing approaches, as it allows to estimate parameters such as evapotranspiration [8], [9] and Growing Degree Day (GDD), which represents the amount of heat energy accumulated by an organism, a driving factor in phenological development [10], [11], and it has been widely used in local [12]- [14] and regional [15]- [19] scale crop forecasting models, and as a source of information in the early-season mapping of wheat areas [20].Near-surface air temperature data is generally obtained from measurements made by weather stations.However, access to data from many stations is not always available, and while some areas have dense measurement networks, many places don't count with enough stations to fulfill the requirements of local or global studies.In this context, reanalysis products can be a useful data source for these purposes.However, spatial and temporal characteristics of air temperature datasets differs between reanalysis projects, each of which uses different input products, spatial resolution and internal data assimilation techniques.In this regard, inter-comparing the most commonly used reanalysis datasets allows to detect spatiotemporal agreement and disagreement patterns, and help identify where and when the provided information is most reliable.
Previous studies have evaluated the performance of T2M from reanalysis projects using data from weather station or gridded datasets over different temporal scales and spatial extents.For example, the evaluation of air temperature datasets from the ERA-40, NCEP1, and NCEP2 reanalyses with groundbased measurements over China at monthly, seasonal, and yearly timescales showed good agreement between datasets, with differences in performance driven mainly by elevation differences [21].Inter-comparison and evaluation of surface air temperature monthly anomalies from ERA-Interim and JRA-55 reanalysis with data from the HadCRUT4 and NOAAGlobalTemp carried out by the Copernicus Climate Change Service from the European Union's Copernicus Earth observation program showed similar performance at continental and global scales [22].
While these inter-comparison and evaluation studies have reported good agreement and performance of reanalysis datasets, they focus on spatial extents and timescales which may be too coarse for certain applications such as agricultural forecast and monitoring, which requires information at higher spatial resolution and on timescales that allow to study the phenological development of crops.In this regard, and with the objective of carrying out an evaluation on T2M datasets that is relevant to wheat forecast studies, we delimit the extent of the inter-comparisons exclusively to planted areas, and analyze the impact of the different T2M sources on the computation of a relevant parameter, such as the accumulated GDD, and its further effects on the derived yield forecasts.
The objective of this study is then: (1) to evaluate the T2M datasets from five commonly used reanalyses over wheat planted areas in the United States and Ukraine with observations from a network of automatic weather stations; and (2) to evaluate the impact of using air temperature products from the different reanalyses on GDD and crop yield estimates.

II. DATA
In this paper, T2M from five reanalysis projects is evaluated using daily weather station data from 2000 to 2017 over two study areas: United States (U.S.) and Ukraine.Moreover, the impact of the different datasets on the errors of a winter wheat yield estimation model is evaluated using official statistics.The following types or data are used: T2M from reanalysis projects and weather stations; Bidirectional Reflectance Distribution Function (BRDF)-corrected NDVI data from MODIS; winter wheat crop specific masks; and official yield statistics.

A. Study Area United States
The U.S. is one of the top producers of wheat globally.Winter wheat varieties account for around 75% of the national wheat production and are grown primarily in the Great Plains.Winter wheat is generally planted in September-October and emerges during October-November.Cold temperatures make plants enter dormancy on late-November until they resume growth at the beginning of March.The crops then reach maturity during June-July and are harvested by the end of July or the beginning of August.

Ukraine
Ukraine is among the largest producers and exporters of wheat in the world.Winter wheat accounts for 95% of the total wheat production and is grown all over the country, although the central and southern regions are the key growing areas.Similarly to the U.S., winter wheat is planted in September-October and harvested on July-August of the following year.

B. Reanalysis Data
In this work, we evaluate the T2M from five widely used reanalysis projects: ERA-Interim (ERAI) [23] from the European Centre for Medium-Range Weather Forecast, the Japanese 55-year Reanalysis (JRA55) [24] from Japan's Meteorological Agency, the Modern-Era Retrospective Analysis for Research and Applications Version 2 (MERRA2) [1] from NASA's Global Modeling and Assimilation Office (Molod et al., 2015), and NCEP1 [25] and NCEP2 [26] from the National Center for Environmental Prediction.For this study, daily mean T2M was computed by averaging data from 00h, 06h, 12h, and 18h.

ERAI
The ERAI reanalysis from ECMWF provides data covering the period from 1979 to the present at 3-hourly time steps on the TL255 grid (around 80km x 80km).The air temperature at 2 meter product is directly assimilated from observations [27].

JRA55
The JRA55 reanalysis from the JMA provides data for the period from 1958 to the present at 6-hourly time steps on the TL319 reduced Gaussian grid (around 55km x 55km).The 2m temperature product in JRA55 is generated by comparing the first guess from the model at observation times with actual observations to derive a correction which is then applied to the values generated at the analysis times [24].

MERRA2
The MERRA2 reanalysis from NASA GMAO was designed as a replacement for MERRA [28].It provides data from 1979 to present day every 6 hours at an approximate resolution of (around 50km x 50km).

NCEP1 and NCEP2
NCEP generates two reanalysis versions currently available for download.The NCEP1 reanalysis, spanning from 1948 to the present, and NCEP2, an improved version which spans from 1979 to the present.NCEP2 reanalysis is generated at the same temporal (6-hourly) and spatial resolution as NCEP1 (T62 Gaussian grid, around 190km x 190km), but uses updated physics and corrects errors present in NCEP1 [26].The air temperature at 2 meter product is generated by a linear interpolation of the surface skin temperature and the model temperature at the .995sigma level.

C. In-situ data
To evaluate the 2m temperature reanalysis datasets we used data from automatic weather stations in the Global Surface Summary of the Day (GSOD) project [29] from the National Oceanic and Atmospheric Administration (NOAA).GSOD includes daily values for 18 surface meteorological variables from over 9,000 stations globally.In the case of T2M, daily values are computed by averaging all observations made during a particular day.For this study, we only considered data from stations within wheat areas defined by winter crop masks.

D. MODIS BRDF-corrected NDVI
The NDVI computed from BRDF-corrected MODIS red and near-infrared reflectances is one of the main inputs of the winter wheat yield estimation model used in this study.The red and near-infrared reflectances were obtained from the Collection 6 MODIS MOD09CMG (Terra) and MYD09CMG (Aqua) daily surface reflectance products [30], [31].The M{O,Y}DCMG are distributed in the Climate Modeling Grid (0.05 latitude x 0.05 longitude, around 5km x 5km).For this study, we used the VJB method [32], [33] to derive BRDF-corrected surface reflectance and NDVI as in [34].

E. Winter wheat crop masks
Winter wheat crop masks at CMG scale were derived from higher resolution crop layers.For the U.S. (Figure 1), we used a readily available winter wheat mask from the Cropland Data Layer (CDL) produced by the United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS).The CDL is generated using field level data from ground surveys, farmer reports from the U.S. Farm Service Agency, and remotely sensed data from Landsat-5 Thematic Mapper, Landsat-7 Enhanced Thematic Mapper, and the Advanced Wide Field Sensor.For Ukraine (Figure 2), we used a winter wheat map generated by [34] using a decision tree classifier similar to that used to produce the CDL from NASS.More details on the generation of the wheat map can be found on the original paper.

Figure 2. 2000-2017 Mean Winter Wheat % for Ukraine F. Official Winter Wheat Yield Statistics
As in [34], [35], official winter wheat yield statistics were derived for each country based on their administrative units.For the U.S. we work at county level (average area of 258,000 ha) and for Ukraine we work at oblast level (average area of 2,414,000 ha).Official yield statistics for the U.S. were obtained from the official archive of county-level statistics on yield available from the USDA NASS quick stats database.For Ukraine, winter wheat crop statistics at oblast level were obtained from the State Statistical Committee of Ukraine.

A. Evaluation with in-situ data
We evaluated the 2m air temperature data from reanalysis using weather station data of the GSOD project from 2000 to 2017.For this study, we only considered data from stations located in areas where winter wheat is planted.Figure 3 and Figure 4 show the distribution of considered weather stations for the U.S. (N=1,901) and Ukraine (N=167) over a Landsat-8 RGB composite of the year 2016.

S. and Ukraine
For the evaluation, we interpolated all reanalysis datasets to the CMG grid using bilinear interpolation.Then, at each station site we extracted the daily time-series for all datasets from the closest pixel.After computing the daily differences between the reanalysis and weather station time-series, outliers outside the range of ±3 standard deviations from the mean were removed.
All datasets showed a similar spatiotemporal distribution of the removed outliers, which accounted for around 1% to 1.5% of their total data points.Finally, as performance metrics we computed determination coefficient (r 2 ), mean difference (BIAS), standard deviation of the differences (SIGMA), and Root Mean Square Difference (RMSD) between reanalysis and weather station data.

B. Influence of air temperature source on yield forecasting
We evaluated the impact of using different air temperature reanalysis products on the yield monitoring using the methods described by [34], [35], so only a brief description is given here.
These methods are based on the assumption that the yield is positively and linearly correlated to the seasonal NDVI peak adjusted for background noise (ANDVI) at the administrative unit (AU, county or oblast) level and to the purity of the wheat signal (percentage of wheat within the pixel).The regression model which was calibrated and applied at the state level in Kansas using MODIS data and proved to be directly applicable at the national level in Ukraine [34].The timeliness of the method was later improved by including GDD information derived from T2M of the NCEP1 reanalysis to forecast the NDVI peak [35].The GDD is defined as the average daily temperature (  ) minus a base temperature (  ): where   = 0, and if   <   then the GDD = 0 [36].
Each developmental stage of an organism has its own total heat requirement.Accumulated GDD, calculated by summing GDDs for each day during a period starting from a biofix date (Eq.2), is related to the amount of accumulated heat by plants and can be directly related to the actual stage of plant development: In the U.S. and Ukraine, winter wheat reaches the reproductive stage represented by the peak NDVI when the accumulated GDD from January 1 st is around 1,000°C [35], [37].Assuming that after a certain date and in the absence of any stress or perturbation (e.g.droughts, frosts, heat stress), the evolution of the adjusted NDVI with the accumulated GDD will follow a normal evolution.Therefore, the peak NDVI could be predicted using the following equation: where   and   are the NDVI value at a particular day and at the peak respectively, and   and   are estimated based on the median NDVI from all years included.This relationship allowed to estimate the peak NDVI and make reliable forecasts between 30 days to 45 days prior to the peak NDVI (i.e. 60 to 75 days prior to harvest), while keeping an accuracy of 10% in the yield forecast for the U.S. [35].With this relationship in mind, in this work we also compared the day of year (DOY) at which the accumulated GDD reached 1,000°C when using each dataset.
IV. RESULTS

A. Evaluation with in-situ data
Figure 5 shows the evaluation of reanalysis with observations from weather stations between 2001-2017.

Figure 5. GSOD vs Reanalysis for Stations in the U.S. and Ukraine
All datasets show strong relationship with observations (r 2 >0.95, p-value<0.01).The ERAI, JRA55, and MERRA2 products show similar performance in terms of r 2 (~0.98) and RMSD (<2K), although MERRA2 shows a higher BIAS.NCEP1 showed the worst performance (BIAS ~-1K and RMSD ~2.5 K).Moreover, all datasets can reproduce the variability of extreme temperatures measured by the weather stations, which is similar to results found by [38] over China.All reanalysis show the higher agreement with weather stations on the central region of the U.S..The worst results are shown by the NCEP1 and NCEP2 datasets, particularly a strong cool BIAS over the stations in the mid-west of the country where mountainous areas are more common.In these areas of complex terrain structure, the different original resolution of the reanalysis products could play a greater role on the differences in reported 2m air temperatures [21].
Areas close to water also show poorer agreement between reanalysis and weather station data.This might be explained by the coarse spatial resolution of these products that can mix sea and land surface temperatures within the same pixel.
Figure 7 shows the results for each considered station in Ukraine.As for the U.S., all data was considered for this figure and there was no removal of outliers.
Performance for all reanalysis is generally better over Ukraine than the U.S..There is lower agreement with observations on areas of complex terrain close to the Carpathian mountains, and on areas to the Black Sea and to in-land water bodies like Dnieper river, where point observations might better represent local temperatures than coarser resolution pixels which can include both water and heterogeneous land surfaces.shows the average value and standard deviation of the accumulated GDD at the NDVI peak for the U.S. and Ukraine.For both countries the differences in mean accumulated GDD at the NDVI peak between datasets are not statistically significant and are included within the standard deviation.Figure 8 shows the accumulated GDD at the peak of vegetation development by U.S. county.Differences between datasets are lower for counties located on the Southern Plains than for counties on the northwestern part of the country where the terrain is more complex and spatial resolution of the models plays a greater role.

Table I. Mean National Accumulated GDD at Peak
Figure 9 shows the accumulated GDD at the peak of vegetation development by Ukraine oblast.

Figure 9. Accumulated GDD at NDVI peak for Ukraine Oblasts
For Ukraine, the difference in accumulated GDD at the peak between datasets is more evident for southern oblasts close to the black sea.This might be explained by the coarse spatial resolution of these products that can mix sea and land surface temperatures within the same pixel.

Date of the GDD peak
Table II shows the average and standard deviation of DOY when the accumulated GDD reaches the peak (1,000°C) in the U.S. and Ukraine.

Dataset
Mean DOY of GDD Peak for the U.S.

Mean DOY of GDD Peak for Ukraine
All datasets show similar values for the national average and standard deviation of the GDD peak DOY.In the case of the U.S., the average DOY computed with the different temperature datasets ranges between 141 and 145.For Ukraine, the variation in DOY is smaller and ranges between 160 and 162.This is expected due to the country's smaller size, more homogeneous climate, and closer proximity between AU.
Figure 10 shows the DOY of the GDD peak for U.S. counties.

Figure 10. DOY of the GDD peak by U.S. Counties
There is a latitudinal gradient of the date in which the counties reach the peak GDD that can be explained by the U.S.' climate variability.The temperate humid climate with hot summers and cool winters of the Southern Plains counties allows the winter crops to accumulate the required heat earlier in the season, while the humid continental climate of the northern counties makes that more time is needed to reach this point [35].As before, the largest discrepancies between datasets are evidenced on the northern counties, where the terrain is more complex than on the Southern Plains.
Figure 11 shows the DOY of the GDD peak for Ukraine oblasts.Similar to what was observed for the U.S., in Ukraine the differences in climate between southern and northern oblasts makes that the latter accumulate the required heat later in the season.In this case however, the lower climatic gradient generates smaller differences in the DOY of GDD peak along the country.

Figure 11. DOY of the GDD peak by Ukraine Oblast
Influence of T2M source on yield estimation error Figure 12 shows the forecast errors of the model [35] for the U.S. and Ukraine when using the T2M products to compute the GDD.The forecast errors are obtained by comparison with yields reported by official statistics.The values are represented in terms of days before the national average DOY of the NDVI peak (DOY 140 for the U.S. and DOY 165 for Ukraine [35]).The lowest errors from the forecast model were obtained with the surface air temperature from NCEP and NCEP2.Similarly to the results of the temperature analysis, the differences in forecast error between datasets is larger for the U.S. than for Ukraine.For both countries, the variation in forecast error between datasets is reduced when the forecast is done closer to the average NDVI peak date.It is important to note that the GDD-dependent method [35] is based on the relationship between NDVI and accumulated GDD to forecast the NDVI value at the peak.However, during or after the peak date, the model follows the approach from [34], where the actual value of the NDVI at the peak is used.

V. DISCUSSION AND CONCLUSION
In this study we evaluated the near surface air temperature product from five well known reanalysis projects over the U.S. and Ukraine in the context of a GDD-dependent winter wheat yield forecast model [2].Evaluation of the temperature datasets was carried out using weather station data from the GSOD project, and the influence of the temperature source in the model was analyzed in terms of the accumulated GDD value at the date of the NDVI peak, the DOY where the NDVI peak occurs, and the yield forecast error of the model when compared to official statistics.
All datasets performed well when evaluated using data from weather stations as reference.The r2 values ranged between 0.95-0.98 for the U.S. and between 0.96-0.99 for Ukraine.The best results for both countries were obtained by ERAI and JRA55, followed by MERRA2 which showed larger BIAS.NCEP1 and NCEP 2 showed worse, although still good, performance than the most modern reanalysis.This can be caused by the coarser resolution of these datasets compared to the newer reanalysis products.It's also important to note than neither NCEP1, NCEP2, nor MERRA2 assimilate surface air temperature data from weather stations into their models, while ERAI and JRA55 do, which could in part explain their lower BIAS values.However, further study on the particular surface datasets assimilated by these models should be done to confirm this.
In terms of the winter wheat yield forecast model, the results show larger errors for Ukraine than for the U.S..This was also observed in [35] where it was suggested that it may be a result of a more accurate U.S. wheat mask.Additionally, in this study, where we included more years in the analysis, we found larger errors for Ukraine (25% to 15%) than those found by [35] (14% to 11%).This shows the need of keep improving yield forecast model.In this regard, a new version of the yield model that improves the original by calibrating it at subnational level or including other parameters that can respond better to stress conditions was recently published [39].
As for the influence of the T2M source on the yield forecast error magnitude, the results show that the use of surface air temperature datasets from different temperature sources does not have a big impact on the forecast error.Moreover, the variation in forecast error from using different reanalysis products decreases as it gets closer to the NDVI peak date.This suggests that discrepancies in the temperature between datasets get reduced as they are accumulated through the season.It is also important to note that the GDD is used in this model as indicator of the phenological stage of the vegetation.This parameter is not intended to account for any temperature stress event that may affect the final crop yields.For example, frost events can have a negative impact on wheat production if they occur during late vegetative and reproductive stages, when wheat is sensitive to stress from low temperatures.Increased heat periods on post-flowering stages can affect wheat productivities, by shortening the duration of the grain filling period and limits biomass growth [40], [41].In this regard, the T2M from reanalysis could still prove useful as an indicator of frost or increased heat periods by providing information on daily minimum and maximum values that can be included in the model on future studies.
Results from this work show that, at least over the U.S. and Ukraine, the selection of a T2M dataset for GDD estimation should depend on some requirements of the study: (1) If the study requires data over areas where the spatial variation of temperature is higher than what the coarse resolution provide, such as mountainous areas or areas close to water bodies, then it's better to select products which higher spatial resolution, or that assimilate weather station information into their models, such as ERAI and JRA55.
(2) In applications where the timeliness of the forecasts is critical, the latency of the different reanalysis datasets should also be considered.For ERAI and JRA55 data from each is made available two to three months after the month has ended, for MERRA2 this delay is of one month, and in the case of NCEP1 and NCEP2, the 6-hourly data for a day is made available three to five days later.This time difference in data availability plays a major role in the selection of a temperature source, particularly for a yield estimation model which provides forecasts between two to two and a half months prior to the harvest.
(3) Finally, in cases where a compromise between these two is needed, for example where both low error and timeliness is required, then MERRA2 can be a good middle ground that provides both high spatial resolution and low delay in availability.
Future work will focus on integrating information from minimum and maximum temperatures as indicators of possible stress conditions into the yield forecast model.

Figure 3 .
Figure 3. Location of Weather Stations for Evaluation in the U.S.

Figure 4 .
Figure 4. Location of Weather Stations for the U.S. and Ukraine

Figure 6 .
Figure 6.Determination Coefficient, Mean Difference, Difference Standard Deviation, and Root Mean Square Difference for the U.S.

Figure 6
Figure6shows the computed metrics for each considered station in the U.S. without outlier removal.For clarity of display, only the state divisions are shown in this figureAll reanalysis show the higher agreement with weather stations on the central region of the U.S..The worst results are shown by the NCEP1 and NCEP2 datasets, particularly a strong cool BIAS over the stations in the mid-west of the country where mountainous areas are more common.In these areas of complex terrain structure, the different original resolution of the reanalysis products could play a greater role on the differences in reported 2m air temperatures[21].Areas close to water also show poorer agreement between reanalysis and weather station data.This might be explained by the coarse spatial resolution of these products that can mix sea and land surface temperatures within the same pixel.

Figure 7 .
Figure 7. Determination Coefficient, Mean Difference, Difference Standard Deviation, and Root Mean Square Difference for Ukraine

Figure 12 .
Figure 12.Forecast Error for the U.S. (left) and Ukraine (Right) calibration, atmospheric correction, aerosol retrieval, and the generation of climate data record for terrestrial studies.SergiiSkakun received the M.S. (Hons.)degree in applied mathematics from the Physics and Technology Institute, NTUU "Kyiv Polytechnic Institute," Kyiv, Ukraine, in 2004, and the Ph.D. degree in computer science from the National Academy of Sciences of Ukraine, in 2005.He is currently an Associate Research Professor at Department of Geographical Sciences at University of Maryland, College Park, MD, USA, and Research Scientist at the Terrestrial Information Systems Laboratory at NASA Goddard Space Flight Center (GSFC), Greenbelt, MD, USA.His research interests are in advancing methods, models and emerging technologies in the area of data science for heterogeneous remote sensing data fusion, processing and analysis, and their applications to the areas of societal benefit.