<img src="images/logo/eooffshore_banner.png" width="48%" />&nbsp;

<img src="images/logo/seai.png" width="25%" /> <span /> <img src="images/logo/ucd.png" width="7%" />

# ERA5 Wind Data for Irish Continental Shelf region

## Introduction

[ERA5](https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5) is the fifth generation global reanalysis data set produced by the [European Centre for Medium-Range Weather Forecasts (ECMWF)](https://www.ecmwf.int/). It replaces the ERA-Interim reanalysis (spanning 1979 onwards), and once completed, will provide global atmosphere, land surface and ocean wave data from 1950 onwards. ERA5 is a component of the [Copernicus Climate Change Service (C3S)](https://climate.copernicus.eu/), where data products are publicly available in the [C3S Climate Data Store](https://cds.climate.copernicus.eu/cdsapp#!/home). Detailed documentation of the ERA5 data set may be found [here](https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation), while examples of ERA5 usage are described in the following publications:

* [Hersbach et al. (2020) - The ERA5 global reanalysis](https://doi.org/10.1002/qj.3803)
* [Olauson (2018) - ERA5: The new champion of wind power modelling?](https://doi.org/10.1016/j.renene.2018.03.056)
* [Hahmann et al. (2020) - Future wind energy resources in the North Sea as predicted by CMIP6 models (EGU General Assembly 2020)](https://doi.org/10.5194/egusphere-egu2020-9093)
* [Hasager et al. (2020) - Europe’s offshore winds assessed with synthetic aperture radar, ASCAT and WRF](https://doi.org/10.5194/wes-5-375-2020)
* [Jourdier (2020) - Evaluation of ERA5, MERRA-2, COSMO-REA6, NEWA and AROME to simulate wind power production over France](https://doi.org/10.5194/asr-17-63-2020)
* [Kalverla et al. (2020) - Quality of wind characteristics in recent wind atlases over the North Sea](https://doi.org/10.1002/qj.3748)
* [Schelbergen et al. (2020) - Clustering wind profile shapes to estimate airborne wind energy production](https://doi.org/10.5194/wes-5-1097-2020)

This notebook provides details of:
1. ERA5 wind data products retrieval from the CDS.
1. The creation of the ERA5 Zarr wind store that is included in the EOOffshore catalog.
1. A brief look at this Zarr store, including a demonstration of wind speed calculation.

This ERA5 Zarr store has been uploaded to [Zenodo](https://zenodo.org/record/6974217).

**How to cite:** 
1. O'Callaghan, D. and McBreen, S.: Scalable Offshore Wind Analysis With Pangeo, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2746, [https://doi.org/10.5194/egusphere-egu22-2746](https://doi.org/10.5194/egusphere-egu22-2746), 2022.
1. [O'Callaghan, D. and McBreen, S.: EOOffshore: ERA5 Wind Data for the Irish Continental Shelf Region, (1.0.0) [Data set], Zenodo, 2022.](https://zenodo.org/record/6974217) [![https://doi.org/10.5281/zenodo.6974217](https://zenodo.org/badge/DOI/10.5281/zenodo.6974217.svg)](https://doi.org/10.5281/zenodo.6974217)

**Note: more extensive usage of the EOOffshore ERA5 Zarr store may be found in the following notebooks:**
* [Offshore Wind in Irish Areas Of Interest](./Offshore_Wind_AOI.ipynb)
* [Comparison of Offshore Wind Speed Extrapolation and Power Density Estimation](./Comparison_Wind_Power.ipynb)

--------------------------------------

## ERA5 Wind Data Products

The EOOffshore project uses the [*ERA5 hourly data on single levels from 1979 to present*](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview) data set, which provides hourly data from 1979 to the present day, at single levels (atmospheric, ocean-wave and land surface quantities). The following data variables are relevant:

| Variable | Unit | Height (metres above sea level) | Description |
| ----------- | ----------- | ----------- | ----------- |
| `u10` | $m s^{-1}$ | 10 | U (eastward) wind component |
| `v10` | $m s^{-1}$ | 10 | V (northward) wind component |
| `u100` | $m s^{-1}$ | 100 | U (eastward) wind component |
| `v100` | $m s^{-1}$ | 100 | V (northward) wind component |
| `fsr` | $m$ | Surface | Forecast surface roughness |
| `p140209` |  $kg$ $m^{-3}$ | Surface | Air density over the oceans |
| `lsm` | Dimensionless | n/a | Land-sea mask |

Monthly products containing these variables, covering the [Irish Continental Shelf (ICS)](https://www.marine.ie/Home/site-area/irelands-marine-resource/real-map-ireland) region coordinates, were retrieved using the [CDS Python API](https://cds.climate.copernicus.eu/api-how-to):

|       |  |
| ----------- | ----------- |
| **Observation / Models** | Reanalysis |
| **Processing level** | Level-3 |
| **Data type** | Gridded (latitude/longitude) |
| **Horizontal coverage** | ICS bounding box [58, -25.9, 46, -4.9] |
| **Horizontal resolution** | 0.25° × 0.25° |
| **Vertical coverage** | Single level |
| **Temporal coverage** | 2001-01-01T00:00:00 to 2021-09-30T23:00:00 |
| **Temporal resolution** | Hourly |
| **Update frequency** | Daily (5 day latency) |
| **File format** | NetCDF-4 (converted from GRIB) |
| **Total retrieved products** | 249 |
| **Total products size** | 9.9G |
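
The retrieval described above can be sketched as follows. This is a minimal illustration, not the project's actual retrieval script: the helper names (`iter_months`, `build_request`, `retrieve_month`), the target file naming, and the CDS request variable names are assumptions, and the request keys follow the form of the `cdsapi` client as it was commonly documented at the time, which may have changed since. The `cdsapi` import is deferred so that the request-building logic can be inspected without CDS credentials.

```python
import calendar

DATASET = "reanalysis-era5-single-levels"
ICS_AREA = [58, -25.9, 46, -4.9]  # north, west, south, east (from the table above)

# Assumed CDS request names for u10, v10, u100, v100, fsr, p140209 and lsm.
VARIABLES = [
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "100m_u_component_of_wind",
    "100m_v_component_of_wind",
    "forecast_surface_roughness",
    "air_density_over_the_oceans",
    "land_sea_mask",
]


def iter_months(start=(2001, 1), end=(2021, 9)):
    """Yield (year, month) pairs for the temporal coverage, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def build_request(year, month):
    """Build the request dictionary for one monthly product."""
    n_days = calendar.monthrange(year, month)[1]
    return {
        "product_type": "reanalysis",
        "format": "netcdf",
        "variable": VARIABLES,
        "year": str(year),
        "month": f"{month:02d}",
        "day": [f"{d:02d}" for d in range(1, n_days + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": ICS_AREA,
    }


def retrieve_month(year, month, target_dir="."):
    """Download one monthly product (requires a CDS account and API key)."""
    import cdsapi  # deferred: only needed when actually downloading

    target = f"{target_dir}/era5_ics_{year}_{month:02d}.nc"  # hypothetical naming
    cdsapi.Client().retrieve(DATASET, build_request(year, month), target)
    return target


months = list(iter_months())
total_hours = 24 * sum(len(build_request(y, m)["day"]) for y, m in months)
print(len(months), total_hours)  # 249 181872
```

The two printed values agree with the 249 retrieved products listed above and with the length of the `time` dimension (181872) in the resulting Zarr store.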

-----------------------------------------------------

## ERA5 Wind Zarr Store

The retrieved NetCDF products were loaded using [`xarray.open_mfdataset()`](https://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html), combined by their grid coordinates and concatenated along the `time` dimension. A preprocessor function computed the following new variables from those contained in the retrieved CDS products, using [MetPy](https://unidata.github.io/MetPy/latest/index.html) functions decorated with [`@dask.delayed`](https://docs.dask.org/en/stable/delayed.html#decorator) for lazy execution by [Dask](https://docs.dask.org/en/stable/):

| Variable | Unit | Height (metres above sea level) | Description |
| ----------- | ----------- | ----------- | ----------- |
| `wind_speed` | $m s^{-1}$ | 10, 100 | Wind speed calculated from U and V wind components with [`metpy.calc.wind_speed()`](https://unidata.github.io/MetPy/latest/api/generated/metpy.calc.wind_speed.html) |
| `wind_direction` | degree | 10, 100 | Wind direction calculated from U and V wind components with [`metpy.calc.wind_direction()`](https://unidata.github.io/MetPy/latest/api/generated/metpy.calc.wind_direction.html) |

A `height` coordinate dimension was also added for these new 10m and 100m variables. The data set was chunked in space (`latitude`, `longitude` dimensions), and persisted to a single, chunked, compressed [Zarr](https://zarr.readthedocs.io/en/stable/) store (16G). Zarr is a cloud-optimised format suitable for multi-dimensional arrays. The `time` chunk size was chosen to produce a low number of `time` chunks, as this approach is more suitable for subsequent processing of variables over time for Areas Of Interest (AOIs).
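
For reference, the two derived quantities follow standard formulas. The sketch below uses only the Python standard library on scalar values and is an illustration of the underlying arithmetic, not the MetPy implementation used to build the store: MetPy additionally handles units, arrays and edge cases such as calm winds, which are not reproduced here. The function names mirror the MetPy ones for readability only.

```python
import math


def wind_speed(u, v):
    """Wind speed (m/s) from eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def wind_direction(u, v):
    """Meteorological direction in degrees: where the wind blows FROM,
    measured clockwise from north (0 = northerly, 90 = easterly)."""
    return math.degrees(math.atan2(-u, -v)) % 360.0


# A westerly wind blows towards the east, so u > 0 and v = 0.
print(wind_speed(3.0, 4.0))                  # 5.0
print(round(wind_direction(10.0, 0.0), 1))   # 270.0
print(round(wind_direction(0.0, 10.0), 1))   # 180.0
```

The same calculation is applied independently to the 10m (`u10`, `v10`) and 100m (`u100`, `v100`) component pairs, which is why the resulting variables carry a `height` dimension of size 2.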

As requested by the [ECMWF - Licence to Use Copernicus Products](https://apps.ecmwf.int/datasets/licences/copernicus/), this Zarr store was:
* Generated using Copernicus Climate Change Service information [2001 - 2021]

-----------------------------------------------
## ERA5 in EOOffshore Catalog

### Open the catalog and view the ERA5 metadata

All EOOffshore data sets, including the ERA5 Zarr store described above, are accessible using the EOOffshore [Intake](https://intake.readthedocs.io/en/latest/) catalog. Each [catalog](https://intake.readthedocs.io/en/latest/catalog.html) entry provides a description and metadata associated with the corresponding data set, defined in a [YAML configuration file](https://intake.readthedocs.io/en/latest/catalog.html#yaml-format). The EOOffshore catalog configuration was originally influenced by the [Pangeo Cloud Data Store atmosphere.yaml catalog configuration](https://github.com/pangeo-data/pangeo-datastore/blob/master/intake-catalogs/atmosphere.yaml). 

To view the ERA5 metadata:

In [1]:
from intake import open_catalog

catalog = open_catalog('data/intake-catalogs/eooffshore_ics.yaml')

catalog.eooffshore_ics_era5_single_level_hourly_wind

eooffshore_ics_era5_single_level_hourly_wind:
  args:
    storage_options: null
    urlpath: /data/eo/zarr/cds/era5/eooffshore_ics_era5_single_level_hourly_wind.zarr
  description: EOOffshore Project 2001 - 2021 Concatenated wind variable products
    from Copernicus Climate Change Service data set "ERA5 hourly data on single levels
    from 1979 to present", for Irish Continental Shelf. Wind speed and direction have
    been calculated from the uX and vX variables. Generated using Copernicus Climate
    Change Service information [2001 - 2021].
  driver: intake_xarray.xzarr.ZarrSource
  metadata:
    catalog_dir: /opt/eooffshore/notebooks/datasets/data/intake-catalogs/
    tags:
    - atmosphere
    - wind
    - era5
    - cds
    - ocean
    title: 2001 - 2021 Concatenated wind variable products from Copernicus Climate
      Change Service data set 'ERA5 hourly data on single levels from 1979 to present',
      for Irish Continental Shelf.
    url: https://cds.climate.copernicus.eu/c

----------------------------------------------------------------
### Load the catalog ERA5 Zarr store

Intake catalog entries typically specify a [driver](https://intake.readthedocs.io/en/latest/catalog.html#driver-selection) to be used when loading the corresponding data set. The ERA5 entry specifies [`intake_xarray.xzarr.ZarrSource`](https://intake-xarray.readthedocs.io/en/latest/api.html#intake_xarray.xzarr.ZarrSource), a driver implementation provided by the [intake-xarray](https://intake-xarray.readthedocs.io/) library. This enables NetCDF and Zarr data sets to be loaded using [xarray](https://docs.xarray.dev/en/stable/index.html), a library for processing N-D labeled arrays and datasets. As xarray labels take the form of dimensions, coordinates and attributes on top of [NumPy](https://numpy.org/)-like arrays, it is particularly suited to data sets such as ERA5 whose variables feature latitude/longitude grid coordinates.

This intake driver will load the associated dataset into an [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html). To enable support for potentially large data sets, the [`to_dask()`](https://intake.readthedocs.io/en/latest/quickstart.html?#working-with-dask) function is used to load the underlying variable arrays with [Dask](https://docs.dask.org/en/latest/), a parallel, out-of-core computing library. The [`ZarrSource`](https://intake-xarray.readthedocs.io/en/latest/api.html#intake_xarray.xzarr.ZarrSource) implementation will load the data set variables into [Dask arrays](https://docs.dask.org/en/latest/array.html), which will be loaded and processed in parallel as [chunks](https://docs.dask.org/en/latest/array.html) during subsequent computation. As discussed above, variable chunk sizes may be specified during Zarr store creation.

Here is the ERA5 store loaded into an [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html):

* All variables have associated coordinate dimensions:
  * `time` - hourly
  * `latitude` and `longitude` - the corresponding coordinate grid
* The `wind_speed` and `wind_direction` variables have a `height` coordinate dimension, reflecting the 10m and 100m (above sea level) variables in the products retrieved from the CDS.
* A low number of `time` chunks has been specified, to support subsequent computation across time for smaller AOI grid coordinates.

In [2]:
ds = catalog.eooffshore_ics_era5_single_level_hourly_wind.to_dask()
ds

Dask array summary for the data variables in the displayed `xarray.Dataset`:

| Variables | Shape | Chunk shape | Array size | Chunk size | Tasks | Chunks | Type |
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| 7 variables without a `height` dimension | (181872, 49, 85) | (50000, 25, 25) | 2.82 GiB | 119.21 MiB | 33 | 32 | float32 (`numpy.ndarray`) |
| 2 variables with a `height` dimension | (2, 181872, 49, 85) | (1, 50000, 25, 25) | 5.64 GiB | 119.21 MiB | 65 | 64 | float32 (`numpy.ndarray`) |


----------------------------------------------------------------
### ERA5 wind speed (2001 - 2021)

Each variable in the ERA5 data set, for example, wind speed, is loaded into an [`xarray.DataArray`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html):

In [3]:
ds.wind_speed

Dask array summary for the displayed `xarray.DataArray`:

| Variable | Shape | Chunk shape | Array size | Chunk size | Tasks | Chunks | Type |
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| `wind_speed` | (2, 181872, 49, 85) | (1, 50000, 25, 25) | 5.64 GiB | 119.21 MiB | 65 | 64 | float32 (`numpy.ndarray`) |


#### Calculate mean wind speed over `time` dimension for all heights at AOI grid coordinates

With Dask, data set loading is lazy: no data is loaded initially. Instead, data loading is [delayed until execution time, where variables will be loaded and processed in parallel according to the corresponding chunks specification](https://tutorial.dask.org/01x_lazy.html). Dask arrays implement a subset of the NumPy [`ndarray`](https://numpy.org/doc/stable/reference/arrays.ndarray.html) interface using blocked algorithms: the original variable arrays are split into smaller chunk arrays, enabling computation on arrays larger than memory using all available cores. The blocked algorithms are coordinated using [Dask graphs](https://docs.dask.org/en/stable/graphs.html).
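
The chunk counts and sizes reported in the array summaries above follow from simple arithmetic on the array shape and chunk shape. The sketch below reproduces the figures for a single three-dimensional variable. The dimension order `(time, latitude, longitude)` is an assumption inferred from the dimension sizes, and the variable names are illustrative only.

```python
import math

shape = (181872, 49, 85)    # assumed order: (time, latitude, longitude)
chunks = (50000, 25, 25)    # chunk shape reported in the array summary
itemsize = 4                # bytes per float32 value

# The last chunk along each dimension may be partial, hence the ceiling.
chunks_per_dim = [math.ceil(n / c) for n, c in zip(shape, chunks)]
n_chunks = math.prod(chunks_per_dim)

chunk_mib = math.prod(chunks) * itemsize / 2**20
array_gib = math.prod(shape) * itemsize / 2**30

print(chunks_per_dim, n_chunks)                   # [4, 2, 4] 32
print(round(chunk_mib, 2), round(array_gib, 2))   # 119.21 2.82
```

With only four chunks along `time`, a time series at a single grid point touches at most four chunks per variable, which is the motivation for the chunking choice described earlier.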

To perform some analysis at known AOI latitude/longitude coordinates, the [`xarray.DataArray.sel(..., method='nearest')`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.sel.html) function may be used to select a subset of the data array (or data set) at coordinates nearest to the specified parameters. Here, mean wind speed over the `time` dimension is determined for the specified coordinates, where Dask graph execution is triggered by calling [`compute()`](https://docs.dask.org/en/stable/api.html#dask.compute).  The resulting variable values will be contained in a NumPy `ndarray`.
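
To illustrate what nearest-neighbour selection resolves to, the following standard-library sketch rebuilds the coordinate grid from the ICS bounding box and the 0.25° resolution given earlier, then finds the grid point closest to the coordinates used in this notebook. The grid reconstruction and the `nearest_index` helper are assumptions for illustration; xarray performs the equivalent lookup internally using the store's own coordinate indexes.

```python
def nearest_index(values, target):
    """Index of the element of `values` closest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


# Assumed grid: 49 latitudes from 58 down to 46, 85 longitudes from -25.9 to -4.9.
latitudes = [58.0 - 0.25 * i for i in range(49)]
longitudes = [-25.9 + 0.25 * i for i in range(85)]

i_lat = nearest_index(latitudes, 53.4836)
i_lon = nearest_index(longitudes, -5.4302)

print(i_lat, i_lon)                                     # 18 82
print(latitudes[i_lat], round(longitudes[i_lon], 2))    # 53.5 -5.4
```

Under these assumptions, the requested point (53.4836, -5.4302) maps to the grid cell centred at (53.5, -5.4), so the mean is computed over the full time series at that single cell for each height.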

Graph execution is managed by a [task scheduler](https://docs.dask.org/en/stable/scheduling.html). The default scheduler (used for executing this notebook) executes computations with [local threads](https://docs.dask.org/en/stable/scheduling.html#local-threads). However, execution may also be performed on a [distributed cluster](https://docs.dask.org/en/stable/scheduling.html#dask-distributed-local) **without any change to the `xarray` code used here**.

In [4]:
ds.wind_speed.sel(longitude=-5.4302, latitude=53.4836, method='nearest').mean(dim='time').compute()