ERA5 Wind Data for Irish Continental Shelf region
Introduction
ERA5 is the fifth generation global reanalysis data set produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). It replaces the ERA-Interim reanalysis (spanning 1979 onwards), and once completed, will provide global atmosphere, land surface and ocean wave data from 1950 onwards. ERA5 is a component of the Copernicus Climate Change Service (C3S), where data products are publicly available in the C3S Climate Data Store. Detailed documentation of the ERA5 data set may be found here, while examples of ERA5 usage are described in the following publications:
Olauson (2018) - ERA5: The new champion of wind power modelling?
Kalverla et al. (2020) - Quality of wind characteristics in recent wind atlases over the North Sea
This notebook provides details of:
ERA5 wind data products retrieval from the CDS.
The creation of the ERA5 Zarr wind store that is included in the EOOffshore catalog.
A brief look at this Zarr store, including a demonstration of wind speed calculation.
This ERA5 Zarr store has been uploaded to Zenodo.
How to cite:
O’Callaghan, D. and McBreen, S.: Scalable Offshore Wind Analysis With Pangeo, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2746, https://doi.org/10.5194/egusphere-egu22-2746, 2022.
O’Callaghan, D. and McBreen, S.: EOOffshore: ERA5 Wind Data for the Irish Continental Shelf Region, (1.0.0) [Data set], Zenodo, 2022.
Note: more extensive usage of the EOOffshore ERA5 Zarr store may be found in the following notebooks:
ERA5 Wind Data Products
The EOOffshore project uses the ERA5 hourly data on single levels from 1979 to present data set, which provides hourly data from 1979 to the present day, at single levels (atmospheric, ocean-wave and land surface quantities). The following data variables are relevant:
Variable | Unit | Height (metres above sea level) | Description |
---|---|---|---|
`u10` | \(m s^{-1}\) | 10 | U (eastward) wind component |
`v10` | \(m s^{-1}\) | 10 | V (northward) wind component |
`u100` | \(m s^{-1}\) | 100 | U (eastward) wind component |
`v100` | \(m s^{-1}\) | 100 | V (northward) wind component |
`fsr` | \(m\) | Surface | Forecast surface roughness |
`p140209` | \(kg\) \(m^{-3}\) | Surface | Air density over the oceans |
`lsm` | Dimensionless | n/a | Land-sea mask |
Monthly products containing these variables, covering the Irish Continental Shelf (ICS) region coordinates, were retrieved using the CDS Python API:
Property | Value |
---|---|
Observation / Models | Reanalysis |
Processing level | Level-3 |
Data type | Gridded (latitude/longitude) |
Horizontal coverage | ICS bounding box [58, -25.9, 46, -4.9] |
Horizontal resolution | 0.25° × 0.25° |
Vertical coverage | Single level |
Temporal coverage | 2001-01-01T00:00:00 to 2021-09-30T23:00:00 |
Temporal resolution | Hourly |
Update frequency | Daily (5 day latency) |
File format | NetCDF-4 (converted from GRIB) |
Total retrieved products | 249 |
Total products size | 9.9G |
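The retrieval described above might be sketched as follows. This is a minimal illustration rather than the project's actual retrieval script: the helper names `build_monthly_request` and `retrieve_month`, the target file name, the CDS variable names and the request keys (which can differ between CDS API versions) are all assumptions that should be checked against the CDS documentation. The data set name is taken from the CDS URL for this product, and the bounding box and date range come from the table above.

```python
import calendar

# CDS variable names: assumed from the CDS catalogue entry for
# "ERA5 hourly data on single levels"; verify before use.
ERA5_VARIABLES = [
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "100m_u_component_of_wind",
    "100m_v_component_of_wind",
    "forecast_surface_roughness",
    "air_density_over_the_oceans",
    "land_sea_mask",
]

# ICS bounding box from the table above: [north, west, south, east]
ICS_AREA = [58, -25.9, 46, -4.9]

# One product per month, 2001-01 to 2021-09 inclusive
MONTHS = [
    (year, month)
    for year in range(2001, 2022)
    for month in range(1, 13)
    if (year, month) <= (2021, 9)
]


def build_monthly_request(year, month):
    """Build a CDS request dictionary for one monthly product (hypothetical helper)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return {
        "product_type": "reanalysis",
        "variable": ERA5_VARIABLES,
        "year": str(year),
        "month": f"{month:02d}",
        "day": [f"{day:02d}" for day in range(1, days_in_month + 1)],
        "time": [f"{hour:02d}:00" for hour in range(24)],
        "area": ICS_AREA,
        "format": "netcdf",  # assumed key; newer CDS API versions may name it differently
    }


def retrieve_month(year, month, target):
    """Download one monthly product; needs the cdsapi package and CDS credentials."""
    import cdsapi  # imported lazily so the request builder works without it

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",
        build_monthly_request(year, month),
        target,
    )


# Example call (not executed here; the file name is illustrative):
# retrieve_month(2001, 1, "era5_ics_2001_01.nc")
```

Iterating over `MONTHS` gives one request per calendar month in the temporal coverage, which matches the 249 retrieved products listed in the table.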
ERA5 Wind Zarr Store
The retrieved NetCDF products were loaded using `xarray.open_mfdataset()`, combined by their grid coordinates and concatenated along the `time` dimension. A preprocessor function computed the following new variables from those contained in the retrieved CDS products, using MetPy functions decorated with `@dask.delayed` for lazy execution by Dask:
Variable | Unit | Height (metres above sea level) | Description |
---|---|---|---|
`wind_speed` | \(m s^{-1}\) | 10, 100 | Wind speed calculated from U and V wind components with MetPy |
`wind_direction` | degree | 10, 100 | Wind direction calculated from U and V wind components with MetPy |
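The calculation of wind speed and direction from the U and V components can be sketched in plain NumPy. This is an illustration of the underlying formulas, not the MetPy implementation used to build the store; the function names and sample values are assumptions. Direction follows the meteorological convention: the direction the wind blows from, in degrees clockwise from north. Libraries may treat calm and due-north winds differently (for example, reporting 360 rather than 0).

```python
import numpy as np


def wind_speed(u, v):
    """Wind speed (m/s) as the magnitude of the (u, v) vector."""
    return np.hypot(u, v)


def wind_direction(u, v):
    """Direction the wind blows FROM, in degrees clockwise from north, in [0, 360)."""
    return (180.0 + np.degrees(np.arctan2(u, v))) % 360.0


# Illustrative component values in m/s:
#   (0, -10): blowing southward, so a northerly wind
#   (-10, 0): blowing westward, so an easterly wind
#   (3, 4):   blowing towards the north-east, so from the south-west quadrant
u = np.array([0.0, -10.0, 3.0])
v = np.array([-10.0, 0.0, 4.0])

speed = wind_speed(u, v)
direction = wind_direction(u, v)
print(speed, direction)
```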
A `height` coordinate dimension was also added for these new 10m and 100m variables. The data set was chunked in space (`latitude`, `longitude` dimensions), and persisted to a single, chunked, compressed Zarr store (16G), which is a cloud-optimised format suitable for multi-dimensional arrays. A `time` chunk size was specified that resulted in a low number of `time` chunks, as this approach is more suitable for subsequent processing of variables over time for Areas Of Interest (AOIs).
As requested by the ECMWF - Licence to Use Copernicus Products, this Zarr store was:
Generated using Copernicus Climate Change Service information [2001 - 2021]
ERA5 in EOOffshore Catalog
Open the catalog and view the ERA5 metadata
All EOOffshore data sets, including the ERA5 Zarr store described above, are accessible using the EOOffshore Intake catalog. Each catalog entry provides a description and metadata associated with the corresponding data set, defined in a YAML configuration file. The EOOffshore catalog configuration was originally influenced by the Pangeo Cloud Data Store atmosphere.yaml catalog configuration.
To view the ERA5 metadata:
```python
from intake import open_catalog

# Open the EOOffshore Intake catalog and display the ERA5 entry
catalog = open_catalog('data/intake-catalogs/eooffshore_ics.yaml')
catalog.eooffshore_ics_era5_single_level_hourly_wind
```

```yaml
eooffshore_ics_era5_single_level_hourly_wind:
  args:
    storage_options: null
    urlpath: /data/eo/zarr/cds/era5/eooffshore_ics_era5_single_level_hourly_wind.zarr
  description: EOOffshore Project 2001 - 2021 Concatenated wind variable products
    from Copernicus Climate Change Service data set "ERA5 hourly data on single levels
    from 1979 to present", for Irish Continental Shelf. Wind speed and direction have
    been calculated from the uX and vX variables. Generated using Copernicus Climate
    Change Service information [2001 - 2021].
  driver: intake_xarray.xzarr.ZarrSource
  metadata:
    catalog_dir: /opt/eooffshore/notebooks/datasets/data/intake-catalogs/
    tags:
    - atmosphere
    - wind
    - era5
    - cds
    - ocean
    title: 2001 - 2021 Concatenated wind variable products from Copernicus Climate
      Change Service data set 'ERA5 hourly data on single levels from 1979 to present',
      for Irish Continental Shelf.
    url: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels
```
Load the catalog ERA5 Zarr store
Intake catalog entries typically specify a driver to be used when loading the corresponding data set. The ERA5 entry specifies `intake_xarray.xzarr.ZarrSource`, a driver implementation provided by the intake-xarray library. This enables NetCDF and Zarr data sets to be loaded using xarray, a library for processing N-D labeled arrays and datasets. As xarray labels take the form of dimensions, coordinates and attributes on top of NumPy-like arrays, it is particularly suited to data sets such as ERA5 whose variables feature latitude/longitude grid coordinates.
This intake driver will load the associated dataset into an `xarray.Dataset`. To enable support for potentially large data sets, the `to_dask()` function is used to load the underlying variable arrays with Dask, a parallel, out-of-core computing library. The `ZarrSource` implementation will load the data set variables into Dask arrays, which will be loaded and processed in parallel as chunks during subsequent computation. As discussed above, variable chunk sizes may be specified during Zarr store creation.
Here is the ERA5 store loaded into an `xarray.Dataset`:
All variables have associated coordinate dimensions:
`time` - hourly
`latitude` and `longitude` - the corresponding coordinate grid
The `wind_speed` and `wind_direction` variables have a `height` coordinate dimension, reflecting the 10m and 100m (above sea level) variables in the products retrieved from the CDS.
A low number of `time` chunks have been specified, to support subsequent computation across time for smaller AOI grid coordinates.
```python
# Load the Zarr store lazily into an xarray.Dataset backed by Dask arrays
ds = catalog.eooffshore_ics_era5_single_level_hourly_wind.to_dask()
ds
```

```
<xarray.Dataset>
Dimensions:         (time: 181872, latitude: 49, longitude: 85, height: 2)
Coordinates:
  * height          (height) int64 10 100
  * latitude        (latitude) float32 58.0 57.75 57.5 57.25 ... 46.5 46.25 46.0
  * longitude       (longitude) float32 -25.9 -25.65 -25.4 ... -5.4 -5.15 -4.9
  * time            (time) datetime64[ns] 2001-01-01 ... 2021-09-30T23:00:00
Data variables:
    fsr             (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    lsm             (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    p140209         (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    u10             (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    u100            (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    v10             (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    v100            (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    wind_direction  (height, time, latitude, longitude) float32 dask.array<chunksize=(1, 50000, 25, 25), meta=np.ndarray>
    wind_speed      (height, time, latitude, longitude) float32 dask.array<chunksize=(1, 50000, 25, 25), meta=np.ndarray>
Attributes:
    Conventions:                    CF-1.6
    eooffshore_zarr_creation_time:  2022-05-13T11:50:24Z
    eooffshore_zarr_details:        EOOffshore Project: Concatenated wind var...
    history:                        2021-10-15 20:08:53 GMT by grib_to_netcdf...
```
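The chunk layout reported in the data set representation above can be checked with simple arithmetic. The sketch below uses only the dimension and chunk sizes shown there. The variable names are illustrative, and the size estimate is for uncompressed `float32` values in memory, not the compressed size on disk.

```python
import math

# Dimension sizes and chunk sizes taken from the data set representation
shape = {"time": 181872, "latitude": 49, "longitude": 85}
chunks = {"time": 50000, "latitude": 25, "longitude": 25}

# Number of chunks along each dimension (the last chunk may be partial)
chunks_per_dim = {dim: math.ceil(shape[dim] / chunks[dim]) for dim in shape}

# Total chunks for one 3-D variable (or for one height level of a 4-D variable)
total_chunks = math.prod(chunks_per_dim.values())

# Uncompressed size of one full chunk: float32 uses 4 bytes per value
chunk_mib = math.prod(chunks.values()) * 4 / 2**20

print(chunks_per_dim, total_chunks, round(chunk_mib, 1))
```

With only four chunks along `time`, a time series for a single grid cell touches four chunks per variable, which fits the access pattern described for AOI analysis.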
ERA5 wind speed (2001 - 2021)
Each variable in the ERA5 data set, for example, wind speed, is loaded into an `xarray.DataArray`:

```python
ds.wind_speed
```

```
<xarray.DataArray 'wind_speed' (height: 2, time: 181872, latitude: 49, longitude: 85)>
dask.array<open_dataset-42bb8165ea7728afdc99a3feb981cce5wind_speed, shape=(2, 181872, 49, 85), dtype=float32, chunksize=(1, 50000, 25, 25), chunktype=numpy.ndarray>
Coordinates:
  * height     (height) int64 10 100
  * latitude   (latitude) float32 58.0 57.75 57.5 57.25 ... 46.5 46.25 46.0
  * longitude  (longitude) float32 -25.9 -25.65 -25.4 -25.15 ... -5.4 -5.15 -4.9
  * time       (time) datetime64[ns] 2001-01-01 ... 2021-09-30T23:00:00
Attributes:
    long_name:  Wind speed
    units:      m s**-1
```
Calculate mean wind speed over `time` dimension for all heights at AOI grid coordinates
With Dask, data set loading is lazy: no data is loaded initially. Instead, loading is delayed until execution time, when variables are loaded and processed in parallel according to the corresponding chunk specification. Dask arrays implement a subset of the NumPy `ndarray` interface using blocked algorithms, and the original variable arrays will be split into smaller chunk arrays, enabling computation on arrays larger than memory using all available cores. The blocked algorithms are coordinated using Dask graphs.
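The idea behind a blocked algorithm can be illustrated with a mean computed one chunk at a time. This is a plain NumPy stand-in under simplifying assumptions, not Dask's implementation: Dask additionally builds a task graph and can run the per-chunk steps in parallel. The function name `blocked_mean` and the synthetic input values are illustrative only.

```python
import numpy as np


def blocked_mean(values, chunk_size):
    """Mean of a 1-D array computed chunk by chunk.

    Only one chunk is reduced at a time; the partial sums and counts are
    then combined, so the full array never has to be reduced in one step.
    """
    total = 0.0
    count = 0
    for start in range(0, len(values), chunk_size):
        block = values[start:start + chunk_size]
        total += float(block.sum())
        count += block.size
    return total / count


# Synthetic values standing in for an hourly wind speed series
rng = np.random.default_rng(seed=0)
wind = rng.uniform(0.0, 25.0, size=181_872)

result = blocked_mean(wind, chunk_size=50_000)
print(result)
```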
To perform some analysis at known AOI latitude/longitude coordinates, the `xarray.DataArray.sel(..., method='nearest')` function may be used to select a subset of the data array (or data set) at coordinates nearest to the specified parameters. Here, mean wind speed over the `time` dimension is determined for the specified coordinates, where Dask graph execution is triggered by calling `compute()`. The resulting variable values will be contained in a NumPy `ndarray`.
Graph execution is managed by a task scheduler. The default scheduler (used for executing this notebook) executes computations with local threads. However, execution may also be performed on a distributed cluster without any change to the `xarray` code used here.
```python
# Select the grid cell nearest to the AOI coordinates, then average over time
ds.wind_speed.sel(longitude=-5.4302, latitude=53.4836, method='nearest').mean(dim='time').compute()
```

```
<xarray.DataArray 'wind_speed' (height: 2)>
array([7.797151, 9.507461], dtype=float32)
Coordinates:
  * height     (height) int64 10 100
    latitude   float32 53.5
    longitude  float32 -5.4
```
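The nearest-coordinate selection in the result above, where the requested point (53.4836, -5.4302) resolves to the grid cell at (53.5, -5.4), can be reproduced with a small NumPy sketch. It rebuilds the 0.25° grid from the bounding box and uses a synthetic array in place of the real wind speed data, so only the selected coordinates are comparable to the notebook output, not the mean values. The helper `nearest_index` is hypothetical and is not how xarray implements `sel()` internally.

```python
import numpy as np

# Rebuild the 0.25 degree grid: latitude runs north to south, longitude west to east
latitude = 58.0 - 0.25 * np.arange(49)
longitude = -25.9 + 0.25 * np.arange(85)


def nearest_index(coords, value):
    """Index of the coordinate closest to the requested value (hypothetical helper)."""
    return int(np.abs(coords - value).argmin())


lat_idx = nearest_index(latitude, 53.4836)
lon_idx = nearest_index(longitude, -5.4302)

# Synthetic stand-in with dimensions (height, time, latitude, longitude),
# using a short time axis to keep the example small
rng = np.random.default_rng(seed=0)
wind_speed = rng.uniform(0.0, 25.0, size=(2, 24, 49, 85)).astype("float32")

# Mean over the time axis at the selected grid cell, one value per height
mean_by_height = wind_speed[:, :, lat_idx, lon_idx].mean(axis=1)

print(latitude[lat_idx], longitude[lon_idx], mean_by_height.shape)
```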