ERA5 Wind Data for Irish Continental Shelf region

Introduction

ERA5 is the fifth-generation global reanalysis data set produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). It replaces the ERA-Interim reanalysis (spanning 1979 onwards) and, once completed, will provide global atmosphere, land surface and ocean wave data from 1950 onwards. ERA5 is a component of the Copernicus Climate Change Service (C3S), and its data products are publicly available in the C3S Climate Data Store (CDS). Detailed documentation of the ERA5 data set is published by ECMWF, and examples of ERA5 usage are described in the literature.

This notebook provides details of:

  1. ERA5 wind data products retrieval from the CDS.

  2. The creation of the ERA5 Zarr wind store that is included in the EOOffshore catalog.

  3. A brief look at this Zarr store, including a demonstration of wind speed calculation.

This ERA5 Zarr store has been uploaded to Zenodo.

How to cite:

  1. O’Callaghan, D. and McBreen, S.: Scalable Offshore Wind Analysis With Pangeo, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2746, https://doi.org/10.5194/egusphere-egu22-2746, 2022.

  2. O’Callaghan, D. and McBreen, S.: EOOffshore: ERA5 Wind Data for the Irish Continental Shelf Region, (1.0.0) [Data set], Zenodo, 2022. https://doi.org/10.5281/zenodo.6974217

Note: more extensive usage of the EOOffshore ERA5 Zarr store may be found in the other EOOffshore notebooks.


ERA5 Wind Data Products

The EOOffshore project uses the "ERA5 hourly data on single levels from 1979 to present" data set, which provides hourly atmospheric, ocean-wave and land surface quantities at single levels. The following data variables are relevant:

Variable   Unit            Height (metres above sea level)   Description
--------   -------------   -------------------------------   ----------------------------
u10        \(m s^{-1}\)    10                                U (eastward) wind component
v10        \(m s^{-1}\)    10                                V (northward) wind component
u100       \(m s^{-1}\)    100                               U (eastward) wind component
v100       \(m s^{-1}\)    100                               V (northward) wind component
fsr        \(m\)           Surface                           Forecast surface roughness
p140209    \(kg m^{-3}\)   Surface                           Air density over the oceans
lsm        Dimensionless   n/a                               Land-sea mask

Monthly products containing these variables, covering the Irish Continental Shelf (ICS) region coordinates, were retrieved using the CDS Python API:

Observation / Models       Reanalysis
Processing level           Level-3
Data type                  Gridded (latitude/longitude)
Horizontal coverage        ICS bounding box [58, -25.9, 46, -4.9]
Horizontal resolution      0.25° × 0.25°
Vertical coverage          Single level
Temporal coverage          2001-01-01T00:00:00 to 2021-09-30T23:00:00
Temporal resolution        Hourly
Update frequency           Daily (5 day latency)
File format                NetCDF-4 (converted from GRIB)
Total retrieved products   249
Total products size        9.9G
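The monthly retrieval described above might be scripted along the following lines. This is a minimal sketch and not the project's actual retrieval code: the CDS long variable names are assumed to correspond to the short names in the variable table, the bounding box is assumed to be ordered north, west, south, east, the target file name is hypothetical, and the request keys follow the CDS API conventions in use around the time of retrieval (newer CDS versions may expect different keys).

```python
import calendar

# CDS long names, assumed to correspond to the short variable names used above.
VARIABLES = [
    "10m_u_component_of_wind",      # u10
    "10m_v_component_of_wind",      # v10
    "100m_u_component_of_wind",     # u100
    "100m_v_component_of_wind",     # v100
    "forecast_surface_roughness",   # fsr
    "air_density_over_the_oceans",  # p140209
    "land_sea_mask",                # lsm
]

# ICS bounding box, assumed order: north, west, south, east.
ICS_AREA = [58, -25.9, 46, -4.9]


def build_request(year, month):
    """Build the request dictionary for one monthly product."""
    days_in_month = calendar.monthrange(year, month)[1]
    return {
        "product_type": "reanalysis",
        "variable": VARIABLES,
        "year": str(year),
        "month": f"{month:02d}",
        "day": [f"{d:02d}" for d in range(1, days_in_month + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": ICS_AREA,
        "format": "netcdf",
    }


def months(start=(2001, 1), end=(2021, 9)):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def retrieve_all(target_dir="."):
    """Download every monthly product (needs cdsapi and CDS credentials)."""
    import cdsapi  # imported lazily so the helpers above work without it

    client = cdsapi.Client()
    for year, month in months():
        # Hypothetical file naming scheme, for illustration only.
        target = f"{target_dir}/era5_ics_{year}_{month:02d}.nc"
        client.retrieve("reanalysis-era5-single-levels",
                        build_request(year, month), target)
```

Iterating over months() from January 2001 to September 2021 yields 249 (year, month) pairs, which matches the total number of retrieved products listed above.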


ERA5 Wind Zarr Store

The retrieved NetCDF products were loaded using xarray.open_mfdataset(), combined by their grid coordinates and concatenated along the time dimension. A preprocessor function computed the following new variables from those contained in the retrieved CDS products, using MetPy functions decorated with @dask.delayed for lazy execution by Dask:

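Variable         Unit           Height (metres above sea level)   Description
--------------   ------------   -------------------------------   ---------------------------------------------------------------------------------------
wind_speed       \(m s^{-1}\)   10, 100                           Wind speed calculated from U and V wind components with metpy.calc.wind_speed()
wind_direction   degree         10, 100                           Wind direction calculated from U and V wind components with metpy.calc.wind_direction()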
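For reference, the quantities in this table follow from the U and V components by standard vector arithmetic. The sketch below uses plain Python scalars and the standard library only. It is not the MetPy implementation, which operates on unit-aware arrays, and it assumes the usual meteorological convention in which direction is the bearing the wind blows from, measured clockwise from north. MetPy's handling of edge cases (for example, whether a due-north wind is reported as 0 or 360 degrees) may differ from this sketch.

```python
import math


def wind_speed(u, v):
    """Magnitude of the horizontal wind vector, in the units of u and v."""
    return math.hypot(u, v)


def wind_direction(u, v):
    """Bearing the wind blows FROM, in degrees clockwise from north."""
    if u == 0 and v == 0:
        return 0.0  # calm: direction is undefined, 0 is used as a placeholder
    return (270.0 - math.degrees(math.atan2(v, u))) % 360.0


# A purely eastward flow (u > 0, v = 0) is a westerly wind.
print(wind_speed(5.0, 0.0), wind_direction(5.0, 0.0))
# A purely northward flow (u = 0, v > 0) is a southerly wind.
print(wind_speed(0.0, 5.0), wind_direction(0.0, 5.0))
```

The two example calls give a westerly wind a bearing of 270 degrees and a southerly wind a bearing of 180 degrees.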

A height coordinate dimension was also added for these new 10m and 100m variables. The data set was chunked along the time, latitude and longitude dimensions and persisted to a single compressed Zarr store (16G). Zarr is a cloud-optimised format suitable for multi-dimensional arrays. A large time chunk size was specified, resulting in a small number of time chunks, as this is more suitable for subsequent processing of variables over time for Areas Of Interest (AOIs).

As requested by the ECMWF - Licence to Use Copernicus Products, this Zarr store was:

  • Generated using Copernicus Climate Change Service information [2001 - 2021]


ERA5 in EOOffshore Catalog

Open the catalog and view the ERA5 metadata

All EOOffshore data sets, including the ERA5 Zarr store described above, are accessible using the EOOffshore Intake catalog. Each catalog entry provides a description and metadata associated with the corresponding data set, defined in a YAML configuration file. The EOOffshore catalog configuration was originally influenced by the Pangeo Cloud Data Store atmosphere.yaml catalog configuration.

To view the ERA5 metadata:

from intake import open_catalog

catalog = open_catalog('data/intake-catalogs/eooffshore_ics.yaml')

catalog.eooffshore_ics_era5_single_level_hourly_wind
eooffshore_ics_era5_single_level_hourly_wind:
  args:
    storage_options: null
    urlpath: /data/eo/zarr/cds/era5/eooffshore_ics_era5_single_level_hourly_wind.zarr
  description: EOOffshore Project 2001 - 2021 Concatenated wind variable products
    from Copernicus Climate Change Service data set "ERA5 hourly data on single levels
    from 1979 to present", for Irish Continental Shelf. Wind speed and direction have
    been calculated from the uX and vX variables. Generated using Copernicus Climate
    Change Service information [2001 - 2021].
  driver: intake_xarray.xzarr.ZarrSource
  metadata:
    catalog_dir: /opt/eooffshore/notebooks/datasets/data/intake-catalogs/
    tags:
    - atmosphere
    - wind
    - era5
    - cds
    - ocean
    title: 2001 - 2021 Concatenated wind variable products from Copernicus Climate
      Change Service data set 'ERA5 hourly data on single levels from 1979 to present',
      for Irish Continental Shelf.
    url: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels

Load the catalog ERA5 Zarr store

Intake catalog entries typically specify a driver to be used when loading the corresponding data set. The ERA5 entry specifies intake_xarray.xzarr.ZarrSource, a driver implementation provided by the intake-xarray library. This enables NetCDF and Zarr data sets to be loaded using xarray, a library for processing N-D labeled arrays and datasets. As xarray labels take the form of dimensions, coordinates and attributes on top of NumPy-like arrays, it is particularly suited to data sets such as ERA5 whose variables feature latitude/longitude grid coordinates.

This intake driver will load the associated dataset into an xarray.Dataset. To enable support for potentially large data sets, the to_dask() function is used to load the underlying variable arrays with Dask, a parallel, out-of-core computing library. The ZarrSource implementation will load the data set variables into Dask arrays, which will be loaded and processed in parallel as chunks during subsequent computation. As discussed above, variable chunk sizes may be specified during Zarr store creation.

Here is the ERA5 store loaded into an xarray.Dataset:

  • All variables have associated coordinate dimensions:

    • time - hourly

    • latitude and longitude - the corresponding coordinate grid

  • The wind_speed and wind_direction variables have a height coordinate dimension, reflecting the 10m and 100m (above sea level) variables in the products retrieved from the CDS.

  • A small number of time chunks has been specified, to support subsequent computation across time for smaller AOI grid coordinates.

ds = catalog.eooffshore_ics_era5_single_level_hourly_wind.to_dask()
ds
<xarray.Dataset>
Dimensions:         (time: 181872, latitude: 49, longitude: 85, height: 2)
Coordinates:
  * height          (height) int64 10 100
  * latitude        (latitude) float32 58.0 57.75 57.5 57.25 ... 46.5 46.25 46.0
  * longitude       (longitude) float32 -25.9 -25.65 -25.4 ... -5.4 -5.15 -4.9
  * time            (time) datetime64[ns] 2001-01-01 ... 2021-09-30T23:00:00
Data variables:
    fsr             (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    lsm             (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    p140209         (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    u10             (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    u100            (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    v10             (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    v100            (time, latitude, longitude) float32 dask.array<chunksize=(50000, 25, 25), meta=np.ndarray>
    wind_direction  (height, time, latitude, longitude) float32 dask.array<chunksize=(1, 50000, 25, 25), meta=np.ndarray>
    wind_speed      (height, time, latitude, longitude) float32 dask.array<chunksize=(1, 50000, 25, 25), meta=np.ndarray>
Attributes:
    Conventions:                    CF-1.6
    eooffshore_zarr_creation_time:  2022-05-13T11:50:24Z
    eooffshore_zarr_details:        EOOffshore Project: Concatenated wind var...
    history:                        2021-10-15 20:08:53 GMT by grib_to_netcdf...
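The chunk layout reported in the output above can be checked with some simple arithmetic. The sketch below takes the dimension sizes and the chunksize values from that output and derives the number of chunks per three-dimensional variable, along with the uncompressed size of one full float32 chunk. It uses the standard library only.

```python
import math

# Dimension sizes and chunk sizes as shown in the dataset output above.
dims = {"time": 181872, "latitude": 49, "longitude": 85}
chunks = {"time": 50000, "latitude": 25, "longitude": 25}

# Number of chunks along each dimension (the last chunk may be partial).
chunks_per_dim = {name: math.ceil(size / chunks[name]) for name, size in dims.items()}

# Total chunks for one (time, latitude, longitude) variable.
total_chunks = math.prod(chunks_per_dim.values())

# Uncompressed size of one full float32 chunk (4 bytes per value), in MiB.
chunk_mib = math.prod(chunks.values()) * 4 / 2**20

print(chunks_per_dim)       # {'time': 4, 'latitude': 2, 'longitude': 4}
print(total_chunks)         # 32
print(round(chunk_mib, 1))  # 119.2
```

Only four chunks span the full time range, so a time series at a single grid point touches at most four chunks per variable.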

ERA5 wind speed (2001 - 2021)

Each variable in the ERA5 data set, for example, wind speed, is loaded into an xarray.DataArray:

ds.wind_speed
<xarray.DataArray 'wind_speed' (height: 2, time: 181872, latitude: 49,
                                longitude: 85)>
dask.array<open_dataset-42bb8165ea7728afdc99a3feb981cce5wind_speed, shape=(2, 181872, 49, 85), dtype=float32, chunksize=(1, 50000, 25, 25), chunktype=numpy.ndarray>
Coordinates:
  * height     (height) int64 10 100
  * latitude   (latitude) float32 58.0 57.75 57.5 57.25 ... 46.5 46.25 46.0
  * longitude  (longitude) float32 -25.9 -25.65 -25.4 -25.15 ... -5.4 -5.15 -4.9
  * time       (time) datetime64[ns] 2001-01-01 ... 2021-09-30T23:00:00
Attributes:
    long_name:  Wind speed
    units:      m s**-1

Calculate mean wind speed over time dimension for all heights at AOI grid coordinates

With Dask, the data set loading process is lazy: no data is loaded initially. Instead, data loading is delayed until execution time, when variables are loaded and processed in parallel according to the corresponding chunk specification. Dask arrays implement a subset of the NumPy ndarray interface using blocked algorithms: the original variable arrays are split into smaller chunk arrays, enabling computation on arrays larger than memory using all available cores. The blocked algorithms are coordinated using Dask graphs.
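Lazy, chunked evaluation can be seen in isolation with a small Dask array. This sketch is independent of the ERA5 store: the array shape and chunk sizes are arbitrary choices for illustration, and it assumes that dask and NumPy are installed.

```python
import dask.array as da

# A 1000 x 1000 array of ones, split into sixteen 250 x 250 chunks.
x = da.ones((1000, 1000), chunks=(250, 250))
print(x.numblocks)  # number of chunks along each dimension

# Building the mean only constructs a task graph; nothing is computed yet.
lazy_mean = x.mean()
print(type(lazy_mean).__name__)

# compute() executes the graph with the default scheduler.
result = lazy_mean.compute()
print(float(result))
```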

To perform some analysis at known AOI latitude/longitude coordinates, the xarray.DataArray.sel(..., method='nearest') function may be used to select a subset of the data array (or data set) at coordinates nearest to the specified parameters. Here, mean wind speed over the time dimension is determined for the specified coordinates, where Dask graph execution is triggered by calling compute(). The resulting variable values will be contained in a NumPy ndarray.

Graph execution is managed by a task scheduler. The default scheduler (used for executing this notebook) executes computations with local threads. However, execution may also be performed on a distributed cluster without any change to the xarray code used here.

ds.wind_speed.sel(longitude=-5.4302, latitude=53.4836, method='nearest').mean(dim='time').compute()
<xarray.DataArray 'wind_speed' (height: 2)>
array([7.797151, 9.507461], dtype=float32)
Coordinates:
  * height     (height) int64 10 100
    latitude   float32 53.5
    longitude  float32 -5.4
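The grid cell reported in the output above (latitude 53.5, longitude -5.4) can be checked by hand. The sketch below rebuilds the 0.25° coordinate grid from the bounds and sizes shown in the dataset output and picks the value closest to each requested coordinate. This illustrates the idea behind nearest-neighbour selection and is not xarray's implementation.

```python
def nearest(values, target):
    """Return the grid value closest to target."""
    return min(values, key=lambda v: abs(v - target))


# 0.25 degree grid matching the latitude/longitude coordinates shown above.
latitudes = [round(58.0 - 0.25 * i, 2) for i in range(49)]     # 58.0 down to 46.0
longitudes = [round(-25.9 + 0.25 * i, 2) for i in range(85)]   # -25.9 up to -4.9

print(nearest(latitudes, 53.4836), nearest(longitudes, -5.4302))  # 53.5 -5.4
```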