Comparison of Offshore Wind Speed Extrapolation and Power Density Estimation

Comparison of Offshore Wind Speed Extrapolation and Power Density Estimation¶

Introduction¶

This notebook demonstrates the utility of the Pangeo software ecosystem in the comparison of methods for offshore wind speed extrapolation and power density estimation, increasing wind measurement coverage of offshore renewable energy assessment areas in the Irish Continental Shelf (ICS) region. It uses the EOOffshore wind data catalog created for this region, consisting of a collection of analysis-ready, cloud-optimized (ARCO) datasets featuring up to 21 years of available in situ, reanalysis, and satellite observation wind data products. Scalable processing and visualization of this ARCO catalog is demonstrated with analysis of provided data variables and derivation of new variables as required for AOIs, avoiding redundant storage and processing requirements for areas not under assessment. The notebook findings are used for AOI assessment described in the accompanying Offshore Wind in Irish Areas Of Interest notebook.

Data	Provider	Time	# Products	Products Size (GB)	ARCO Zarr (GB)
ASCAT Near Real Time L3 Sea Winds	Copernicus Marine Service	2016-01 to 2021-09	324	16	11
ASCAT Reprocessed L3 Sea Winds	Copernicus Marine Service	2007-01 to 2021-07	412	21	14
CCMP Wind V2.1 NRT	Remote Sensing Systems	2015-01 to 2021-09	2,436	109	0.5
ERA5 Hourly Single Level	Copernicus Climate Change Service	2001-01 to 2021-09	249	10	16
Met Éireann Re-Analysis (MÉRA)	Met Éireann	2001-01 to 2016-12	1,920	226	196
Sentinel-1 L2 OCN	Copernicus Open Access Hub; Alaska Satellite Facility	2015-06 to 2021-09	17,698	241	12
Irish Weather Buoy Network	Marine Institute	2001-05 to 2021-09	1	0.08	n/a

The following sections describe:

EOOffshore catalog data sets loading. Catalog creation is described in the following notebooks:
Comparison of in situ buoy and catalog wind speed.
Comparison of wind speeds provided by the catalog data sets.
Methods for extrapolation of catalog wind speed to typical wind turbine hub heights.
Extrapolation of catalog wind speed to typical wind turbine hub heights.
Wind power density estimates comparison with provided data.

How to cite: O’Callaghan, D. and McBreen, S.: Scalable Offshore Wind Analysis With Pangeo, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-2746, https://doi.org/10.5194/egusphere-egu22-2746, 2022.

Load EOOffshore Catalog Data¶

Imports required for subsequent processing

%matplotlib inline
from abc import ABC, abstractmethod
import cartopy.crs as ccrs
from collections import defaultdict
import copy
import dask.array as da
from dataclasses import dataclass, field
from datetime import datetime
from functools import lru_cache
import geopandas as gpd
from intake import Catalog, open_catalog
from IPython.display import Markdown
import itertools
import math
import matplotlib as mpl
import matplotlib.pyplot as plt
from metpy.units import units
import numpy as np
import pandas as pd
import regionmask
from scipy.interpolate import interpn
from scipy.special import gamma
from scipy.stats import pearsonr, weibull_min
import seaborn as sns
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.neighbors import NearestNeighbors
from typing import Dict, List, Tuple
import xarray as xr

mpl.rcParams['figure.figsize'] = [8, 5]
sns.set_style('whitegrid', {'axes.labelcolor': '0 ', "text.color": '0', 'xtick.color': '0', 'ytick.color': '0', 'font.sans-serif': ['DejaVu Sans', 'Liberation Sans', 'Bitstream Vera Sans', 'sans-serif'],})
sns.set_context('notebook', font_scale=1)

import warnings
warnings.filterwarnings('ignore')

Open the catalog¶

All EOOffshore data sets, for example, the ERA5 Zarr store, are accessible using the EOOffshore Intake catalog. Each catalog entry provides a description and metadata associated with the corresponding data set, defined in a YAML configuration file. The EOOffshore catalog configuration was originally influenced by the Pangeo Cloud Data Store atmosphere.yaml catalog configuration.

catalog = open_catalog('data/intake-catalogs/eooffshore_ics.yaml')

Load the catalog Zarr stores¶

Intake catalog entries typically specify a driver to be used when loading the corresponding data set. The ASCAT entries specify intake_xarray.xzarr.ZarrSource, a driver implementation provided by the intake-xarray library. This enables NetCDF and Zarr data sets to be loaded using xarray, a library for processing N-D labeled arrays and datasets. As xarray labels take the form of dimensions, coordinates and attributes on top of NumPy-like arrays, it is particularly suited to data sets such as ASCAT whose variables feature latitude/longitude grid coordinates.

This intake driver will load the associated dataset into an xarray.Dataset. To enable support for potentially large data sets, the to_dask() function is used to load the underlying variable arrays with Dask, a parallel, out-of-core computing library. The ZarrSource implementation will load the data set variables into Dask arrays, which will be loaded and processed in parallel as chunks during subsequent computation. Variable chunk sizes have been specified during Zarr store creation.

Here, a number of utility classes are created to enable processing of catalog entries throughout the notebook:

All data sets are loaded by an EooffshoreDataset instance.
XarrayEooffshoreDataset supports Zarr store data sets:
- to_dask() loads the store
- Variables are renamed if required, ensuring consistent names for all data sets, for example, wind_speed and wind_direction.
- Longitude rotation from [0, 360] to [-180, 180] is performed if required
CurvilinearEooffshoreDataset supports Zarr stores featuring two-dimensional curvilinear grids:
- xarray.DataArray.sel(..., method='nearest') does not support selection from two-dimensional curvilinear grids, for example, using latitude/longitude coordinates
- Two-dimenional grid selection is enabled by the use of a K-Nearest Neighbours (kNN) learner from scikit-learn (sklearn.neighbors.NearestNeighbors), which is fit with the grid coordinates when the Zarr store is loaded. To match the behaviour of xarray.DataArray.sel(..., method='nearest'), k is set to 1 to return the nearest coordinates to the specified parameters.
NewaOffshoreDataset supports the multiple AOI NEWA Zarr stores contained in the catalog, as described in the NEWA Wind Data for Irish Continental Shelf region notebook.
WindfarmEooffshoreDataset provides access to a synthetic wind farm AOI data set created for the EOOffshore project, and functionality for selecting corresponding data subsets from the Zarr stores, using GeoPandas and regionmask.
IwbEooffshoreDataset provides access to an EOffshore data set containing observations from the Irish Weather Buoy Network, and functionality for selecting corresponding data subsets from the Zarr stores.

@dataclass
class EooffshoreDataset:
    """Base EOOffshore dataset class, associated with a single Intake Catalog entry"""
    name: str # Short name used in plots
    catalog: Catalog = field(repr=False, hash=False, compare=False)
    dataset_id: str # Catalog identifier
    round_test_dataset_time: str = None

    def aoi_dataset(self, aoi: str):
        """Retrieve dataset subset for specified area of interest if required"""
        return self

@dataclass
class XarrayEooffshoreDataset(EooffshoreDataset):
    """Dataset backed by a Zarr store, NetCDF file etc."""
    dataset: xr.Dataset = field(init=False, hash=False, compare=False, repr=False)
    rename_variables: dict[str, str] = None
    rotate_longitude: bool = False # Rotate from [0, 360] to [-180, 180]
    
    def __post_init__(self):
        """Load the dataset from the Intake catalog, and perform some common initialisation"""
        ds = self.catalog[self.dataset_id].to_dask()
        
        # Enable consistent variable naming
        if self.rename_variables:
            ds = ds.rename(self.rename_variables)
            
        # Add a 'Season' coordinate derived from the 'time' coordinate
        self.dataset = ds.assign_coords({'Season': ds.time.dt.season})
           
    def sel_grid_nearest(self, latitude, longitude):
        """Select a subset of the dataset, nearest to the specified coordinates"""
        if self.rotate_longitude:
            longitude %= 360     
        return self._nearest_latitude_longitude(latitude, longitude)

    def _nearest_latitude_longitude(self, latitude, longitude):
        """Select a subset of the associated 1-D grid dataset, nearest to the specified coordinates"""
        return self.dataset.sel(longitude=longitude, latitude=latitude, method='nearest')

@dataclass
class CurvilinearEooffshoreDataset(XarrayEooffshoreDataset):
    """
    Additional functionality to handle issue where `xarray.Dataset.sel()` doesn't support calls to sel() for 2-D latitude/longitude grids. 
    This is enabled by the use of a K-Nearest Neighbours (kNN) learner
    """
    grid_neighbours: NearestNeighbors = field(init=False, hash=False, compare=False, repr=False)

    def __post_init__(self):
        """Fit nearest neighbours learner using dataset latitude/longitude grid, used in subsequent calls to sel()"""
        super().__post_init__()
        
        longitude = self.dataset.longitude
        latitude = self.dataset.latitude

        self.grid_neighbours = NearestNeighbors(n_neighbors=1, algorithm='auto').fit(np.stack((longitude.values.reshape(-1,), latitude.values.reshape(-1,)), axis=1))

    def _nearest_latitude_longitude(self, latitude, longitude):
        """Select a subset of the associated 2-D grid dataset, using the top kNN (k=1) neighbours to the specified coordinates"""
        # Select one neighbour similar to `xarray.Dataset.sel(..., method='nearest')`
        k = 1
        distances, indices = self.grid_neighbours.kneighbors(np.array([longitude, latitude]).reshape(1,-1), n_neighbors=k)
        yindex, xindex = np.unravel_index(indices, self.dataset.latitude.shape)
        ydim, xdim = self.dataset.latitude.dims
        return self.dataset.isel({ydim: yindex[0][0], xdim: xindex[0][0]})

        
@dataclass
class NewaEooffshoreDataset(EooffshoreDataset):
    """Additional functionality required for processing NEWA datasets. It acts as a mapper from AOI to underlying `xarray.Dataset`""" 
    aoi_mapping: Dict[str, str] = None
    datasets: Dict[str, CurvilinearEooffshoreDataset] = field(init=False, hash=False, compare=False)
    
    def __post_init__(self):
        """Preprocessing of the datasets loaded from the Intake catalog"""
        self.datasets = {aoi: self._load_newa_dataset(self.name, self.catalog, dataset_id) for aoi, dataset_id in self.aoi_mapping.items()}

    def aoi_dataset(self, aoi: str) -> EooffshoreDataset:
        """Retrieve dataset subset for area of interest"""
        aoids = self.datasets.get(aoi, None) 
        return aoids

    @classmethod
    @lru_cache
    def _load_newa_dataset(cls, name, catalog, dataset_id):
        """Loads Zarr store associated with an AOI"""
        newa_ds = CurvilinearEooffshoreDataset(name=name, 
                                               catalog=catalog, 
                                               dataset_id=dataset_id,
                                               rename_variables={'WS': 'wind_speed', 
                                                                 'WD': 'wind_direction', 
                                                                 'PD': 'power_density', 
                                                                 'XLAT': 'latitude', 
                                                                 'XLON': 'longitude'}
                                              )

        ds = newa_ds.dataset.drop(['south_north', 'west_east'])

        # Add provided 10m wind variable data to corresponding variable with a height dimension
        wind_variables = {}
        for variable_name_10m, variable_name in {'WS10': 'wind_speed', 'WD10': 'wind_direction'}.items(): 
            wind_variables[variable_name] = xr.concat((ds[variable_name], ds[variable_name_10m].assign_coords(height=10).expand_dims(['height'])), dim='height').sortby('height')
        ds = ds.reindex({'height': wind_variables['wind_speed'].height}).chunk({'height':1})
        for variable_name, wind_variable in wind_variables.items():
            ds[variable_name] = wind_variable
        
        newa_ds.dataset = ds
        return newa_ds
    
@dataclass
class WindfarmEooffshoreDataset(EooffshoreDataset):
    """Wind farm location shapefile dataset"""
    dataframe: gpd.GeoDataFrame = field(init=False, hash=False, compare=False, repr=False)
    windfarms: List[str] = None 
    windfarm_mask: xr.DataArray = field(init=False, hash=False, compare=False, repr=False)
    selection_buffer: float = 0.1

    def __post_init__(self):
        """Load the wind farm dataset from the catalog shapefile, and create a corresponding mask used for subsequent dataset subset selection"""
        self.dataframe = gpd.read_file(self.catalog[self.dataset_id].urlpath).to_crs("EPSG:4326")
        if not self.windfarms:
            self.windfarms = self.dataframe.Name.to_list()
        else:
            self.dataframe = self.dataframe[self.dataframe.Name.isin(self.windfarms)]
        
        # Higher-resolution mask that's used when dataset mask is insufficient during subsequent windfarm AOI retrieval
        self.windfarm_mask = regionmask.mask_geopandas(self.dataframe, np.arange(-11, -5, .007), np.arange(55.42, 51.0, -.007)).rename({'lat': 'latitude', 'lon':'longitude'})
        
    def windfarm_dataset(self, windfarm: str, dataset: XarrayEooffshoreDataset, spatial_mean: bool = False) -> xr.Dataset:
        """Select a dataset subset for the specified wind farm, using the wind farm dataset mask"""
        windfarm_id = self.dataframe[self.dataframe.Name==windfarm].id.item()
        ds = dataset.aoi_dataset(windfarm).dataset
    
        dataset_windfarm_mask = regionmask.mask_geopandas(self.dataframe, ds, lon_name='longitude', lat_name='latitude')

        # If the wind farm is present in the dataset mask, use that directly
        # Otherwise find the closest dataset coordinate to the high-res mask wind farm coordinates
        if windfarm_id in np.unique(dataset_windfarm_mask):
            windfarmds = ds.where(dataset_windfarm_mask==windfarm_id, drop=True)
        else:
            highresmask = self.windfarm_mask
            windfarm_lat = highresmask.where(highresmask==windfarm_id, drop=True).latitude.values
            windfarm_lon = highresmask.where(highresmask==windfarm_id, drop=True).longitude.values
            if dataset.rotate_longitude:
                windfarm_lon %= 360            
            windfarmds = ds.where((ds.latitude >= windfarm_lat.min() - self.selection_buffer) & (ds.latitude <= windfarm_lat.max() + self.selection_buffer) & (ds.longitude >= windfarm_lon.min() - self.selection_buffer) & (ds.longitude <= windfarm_lon.max() + self.selection_buffer), drop=True)
            
        if spatial_mean:
            # Calculate the mean for all wind farm subset coordinates
            # Both 1-d and 2-d grid dimensions are supported
            windfarmds = windfarmds.mean(dim=set(ds.latitude.dims) | set(ds.longitude.dims), keep_attrs=True) 
        return windfarmds.assign_coords({'Windfarm': windfarm}).expand_dims('Windfarm')

@dataclass
class Buoy:
    name: str
    longitude: float
    latitude: float
    
@dataclass
class IwbEooffshoreDataset(EooffshoreDataset):
    """Irish Weather Buoy dataset"""
    dataframe: pd.DataFrame = field(init=False, hash=False, compare=False, repr=False)
    buoy_coordinates: Dict[str, Buoy] = field(init=False)
    
    def __post_init__(self):
        """Preprocessing of the data frame loaded from the Intake catalog"""
        self.dataframe = self.catalog[self.dataset_id].read()
        # Retain only buoys still in use
        self.dataframe = self.dataframe[self.dataframe.station_id.isin(['M2', 'M3', 'M4', 'M5', 'M6'])].rename(columns={'longitude (degrees_east)': 'longitude', 'latitude (degrees_north)': 'latitude'})
        self.dataframe['wind_speed3'] = self.dataframe['WindSpeed (knots)'].apply(lambda x:(x*units.knot).to(units.meter_per_second).magnitude)
        
        # https://stackoverflow.com/questions/62917882/convert-datetime64ns-utc-pandas-column-to-datetime
        # https://stackoverflow.com/questions/16628819/convert-pandas-timezone-aware-datetimeindex-to-naive-timestamp-but-in-certain-t/62656878#62656878
        # Subsequent matching isn't possible if the time type is e.g. datetime64[ns, UTC]
        self.dataframe['time'] = pd.to_datetime(self.dataframe['time (UTC)']).dt.tz_convert(None)
        self.dataframe = self.dataframe[['station_id', 'time', 'longitude', 'latitude', 'WindDirection (degrees true)', 'wind_speed3']]
        self.buoy_coordinates = {name: Buoy(name=name, longitude=x['longitude'], latitude=x['latitude']) for name, x in self.dataframe.groupby('station_id')[['longitude', 'latitude']].agg(['unique']).applymap(lambda x:x[0]).droplevel(1, axis=1).T.to_dict().items()}

    @property
    def buoys(self):
        return list(self.buoy_coordinates.keys())
    
    def buoy_dataset(self, buoy: str, dataset: XarrayEooffshoreDataset) -> xr.Dataset:
        """Select a dataset subset using the coordinates of the specified buoy AOI"""
        buoyds = None
        try:
            buoyds = dataset.aoi_dataset(aoi=buoy).sel_grid_nearest(longitude=self.buoy_coordinates[buoy].longitude, latitude=self.buoy_coordinates[buoy].latitude)
            buoyds = buoyds.assign_coords({'Buoy': buoy}).expand_dims('Buoy')
        except:
            # If an AOI can't be found in the dataset, ignore it
            pass
        return buoyds

    def buoy_iwbdata(self, buoy: str) -> pd.DataFrame:
        """Retrieve IWB data subset for the specified buoy"""
        return self.dataframe[self.dataframe.station_id==buoy].copy(deep=True)

    def match_data(self, buoy: str, dataset: EooffshoreDataset, variable: str, height: int = 10) -> pd.DataFrame:
        """Match IWB data to Zarr store data for the specified buoy"""
        buoydf = self.buoy_iwbdata(buoy)
        
        # First, select the buoy dataset subset containing all available observations,
        buoyds = self.buoy_dataset(buoy, dataset)
        # Drop any null values prior to matching
        buoyds = buoyds.where(buoyds[variable].sel(height=height).notnull(), drop=True)
        ## Remove any duplicated times
        buoyds = buoyds.sel(time=~buoyds.get_index('time').duplicated())

        # Round time now if required
        if dataset.round_test_dataset_time is not None:
            dstime = np.unique(buoyds.time.dt.round(dataset.round_test_dataset_time).values)
        else:
            dstime = buoyds.time.values 
        
        buoydf = buoydf[buoydf.time.isin(dstime)]
                
        # Select closest times to the available data
        buoyds = buoyds.sel(time=buoydf.time.values, method='nearest')
        
        dsvariable = f'{dataset.name}_{variable}'
        buoydf[dsvariable] = buoyds.sel(Buoy=buoy, height=height)[variable].values
        return buoydf.dropna()

Load the ASCAT REProcessed store¶

Data is provided at 10m above surface level (height dimension).
The original wind_to_dir variable is rotated by 180 degrees to generate a (from) wind_direction, consistent with the other data sets.
It is possible to filter ASCAT data the provided quality flag masks (wvc_quality_flag). Consequently, two EooffshoreDataset instances are created, containing:
- All ASCAT observations
- Observations filtered using wvc_quality_flag
The ASCAT Wind Data for Irish Continental Shelf region notebook describes the store creation and quality flags.

ascatall = XarrayEooffshoreDataset(name='ASCAT-ALL', 
                                   catalog=catalog, dataset_id='eooffshore_ics_cms_WIND_GLO_WIND_L3_REP_OBSERVATIONS_012_005_MetOp_ASCAT',
                                   rotate_longitude=True,
                                   rename_variables={'lat': 'latitude', 'lon': 'longitude'},
                                   round_test_dataset_time='H',
                                  )

ascatwqf = XarrayEooffshoreDataset(name='ASCAT-WQF', 
                                   catalog=catalog, dataset_id='eooffshore_ics_cms_WIND_GLO_WIND_L3_REP_OBSERVATIONS_012_005_MetOp_ASCAT',
                                   rotate_longitude=True,
                                   rename_variables={'lat': 'latitude', 'lon': 'longitude'},
                                   round_test_dataset_time='H',
                                  )

# Retain data based on corresponding wind vector cell quality flag
ASCAT_NO_FLAG_SET = 0
ASCAT_SMALL_WIND_LESS_THAN_OR_EQUAL_TO_3_M_S = 2048
ASCAT_SOME_PORTION_OF_WVC_IS_OVER_LAND = 32768

acceptable_ascat_flag_masks = [ASCAT_NO_FLAG_SET, 
                               ASCAT_NO_FLAG_SET, # Included twice for subsequent combination sums
                               ASCAT_SMALL_WIND_LESS_THAN_OR_EQUAL_TO_3_M_S, 
                               ASCAT_SOME_PORTION_OF_WVC_IS_OVER_LAND]
# Accept all flag mask combinations
acceptable_ascat_flag_masks = np.unique([sum(x) for x in itertools.combinations(acceptable_ascat_flag_masks, 2)])
ascatwqf.dataset = ascatwqf.dataset.where(ascatwqf.dataset.wvc_quality_flag.isin(acceptable_ascat_flag_masks))

Load the CCMP store¶

Data is provided at 10m above surface level (height dimension).
Retain data having at least one observation used to derive CCMP wind vector components.
The CCMP Wind Data for Irish Continental Shelf region notebook describes the store creation and nobs filtering.

ccmp = XarrayEooffshoreDataset(name='CCMP', catalog=catalog, dataset_id='eooffshore_ics_ccmp_v02_1_nrt_wind', rotate_longitude=True)
ccmp.dataset = ccmp.dataset.where(ccmp.dataset.nobs >= 1)

Load the ERA5 store¶

Data is provided at both 10m and 100m above surface level (height dimension).
The ERA5 Wind Data for Irish Continental Shelf region notebook describes the store creation.

era5 = XarrayEooffshoreDataset(name='ERA5', catalog=catalog, dataset_id='eooffshore_ics_era5_single_level_hourly_wind')

Load the MÉRA store¶

Data is provided at multiple heights above surface level (height dimension).
The MÉRA Wind Data for Irish Continental Shelf region notebook describes the store creation.

mera = CurvilinearEooffshoreDataset(name='MÉRA', 
                                    catalog=catalog, 
                                    dataset_id='eooffshore_ics_mera_wind_ANALYSIS', 
                                    rotate_longitude=True,
                                   )

Load the NEWA stores¶

The NewaEOOffshoreDataset instance contains a map of AOI -> CurvilinearEooffshoreDataset, with each of the latter backed by a xarray.Dataset
Data is provided at multiple heights above surface level (height dimension).
Wind power density is provided.
The NEWA Wind Data for Irish Continental Shelf region notebook describes creation of the AOI stores.

newa = NewaEooffshoreDataset(name='NEWA', catalog=catalog, dataset_id='NEWA', aoi_mapping={'M2': 'eooffshore_ics_newa_irishsea', 
                                                                                           'M3': 'eooffshore_ics_newa_m3', 
                                                                                           'M4': 'eooffshore_ics_newa_m4', 
                                                                                           'M5': 'eooffshore_ics_newa_celticsea',
                                                                                           'Irish Sea 1': 'eooffshore_ics_newa_irishsea',
                                                                                           'Irish Sea 2': 'eooffshore_ics_newa_irishsea',
                                                                                           'Irish Sea 3': 'eooffshore_ics_newa_irishsea',
                                                                                           'Celtic Sea 1': 'eooffshore_ics_newa_celticsea',
                                                                                           'Celtic Sea 2': 'eooffshore_ics_newa_celticsea'})

Load the Sentinel-1 store¶

Data is provided at 10m above surface level (height dimension).
Retain grid data with:
- Wind data quality in [0, 2], i.e. filter poor/bad quality data
- 200+ SAR observations
- Wind speed $\ge 1$
The Sentinel-1 Wind Data for Irish Continental Shelf region notebook describes the store creation and data filtering.

sentinel1 = CurvilinearEooffshoreDataset(name='Sentinel-1', 
                                           catalog=catalog, 
                                           dataset_id='eooffshore_ics_level3_sentinel1_ocn', 
                                           rename_variables={'owiWindDirection': 'wind_direction', 'owiWindSpeed': 'wind_speed'},
                                           round_test_dataset_time='H',
                                           )

sentinel1.dataset = sentinel1.dataset.where((sentinel1.dataset.owiWindQuality >= 0) & (sentinel1.dataset.owiWindQuality <= 2))

sentinel1.dataset['total_sar_observations'] = sentinel1.dataset.wind_speed.count('time').compute()

# Exclude cells with low numbers of observations, and very low 10m wind speed values, including 0s
# The latter have been identified in analysis of wind speed distributions, and appear to be redundant
sentinel1.dataset = sentinel1.dataset.where((sentinel1.dataset.total_sar_observations >= 200) & (sentinel1.dataset.wind_speed.sel(height=10) > 1))

Load AOI catalog data¶

This section describes the AOI data sets contained in the EOOffshore catalog:

Irish Weather Buoy Network coordinates and data
Synthentic wind farm geometries

Load the Irish Weather Buoy Network data¶

The Irish Weather Buoy Network (IWB), managed by the Marine Institute in collaboration with Met Éireann and the UK Met Office, is a network of five operational buoys providing regular sea surface observations including wind speed and direction. Hourly IWB data for 2001 to 2021 was retrieved from the Marine Institute ERDDAP server and stored in a CSV file. The corresponding EOOffshore catalog entry is directly loaded into a pandas.DataFrame, retaining the wind speed, direction, and buoy coordinates data.

The satellite and reanalysis data sets contained in the EOOffshore catalog provide wind speed values at 10m above surface level. As the IWB buoy wind speed measurements are taken at 3m, it is necessary to extrapolate the latter to 10m values. As Remmers et al., 2019 used the log law method to perform a similar operation, it is applied here. Further details are provided later in this notebook.

def loglaw_extrapolate(z: int, wind_speed: float, zref: int = 10, z0: float = 0.0002):
    # Used for 3m ->10m extrapolation by Remmers et al., 2019
    return (math.log(z/z0)/math.log(zref/z0)) * wind_speed

iwb = IwbEooffshoreDataset(name='IWB', catalog=catalog, dataset_id='eooffshore_iwb_hourly')

# 3m extrapolated to 10m, similar to Remmers et al., 2019
iwb.dataframe['wind_speed10'] = loglaw_extrapolate(10, iwb.dataframe.wind_speed3, 3)

Load the synthentic wind farm shapefile¶

A synthetic wind farm AOI data set was created for the EOOffshore project, illustrating potential wind farm locations in the Irish Sea and North Celtic Sea. The data set is a shapefile containing wind farm polygon geometries, which is loaded into a geopandas.GeoDataFrame. These polygons, along with the IWB buoy coordinates, are used to select AOI data subsets of the Zarr store variables.

eoowindfarmds = WindfarmEooffshoreDataset(name='EOOffshore WF', catalog=catalog, dataset_id='eooffshore_windfarms')

Comparison of In situ Buoy and Data Set Wind Speed¶

This section provides a comparison of in situ IWB buoy 10m wind speed values with corresponding wind speeds provided by the EOOffshore catalog data stores. All comparisons in this notebook generate:

Metrics table containing entries for all stores
Density scatter plot for each IWB, data set pair

Comparison metrics¶

The following metrics are calculated for all comparisons in this notebook:

Root Mean Square Error (RMSE)

$\mathrm{RMSE}(y, \hat{y}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$

$y$: reference (true) variable, e.g. IWB buoy wind speed for a particular coordinate
$\hat{y}$: test (predicted) variable, e.g. corresponding ERA5 wind speed for a particular coordinate
$n$: number of samples/observations

Uses sklearn.metrics.mean_squared_error(squared=False)

Mean Absolute Error (MAE)

$\mathrm{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n}|(y_i - \hat{y}_i)|$

Uses sklearn.metrics.mean_absolute_error()

Bias

$\mathrm{Bias}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n}(\hat{y}_i - y_i)$

$\pmb{R^2}$

$R^2(y, \hat{y}) = 1 - \dfrac{\sum_{i=1}^{n}|(y_i - \hat{y}_i)|}{\sum_{i=1}^{n}|(y_i - \bar{y})|}$

where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i$

Uses sklearn.metrics.r2_score()

Pearson Correlation Coefficient ($r$)

$r(y, \hat{y}) = \dfrac{\sum(y - \bar{y})(\hat{y} - \bar{\hat{y}})}{\sqrt{\sum(y - \bar{y})^2\sum(\hat{y} - \bar{\hat{y}})^2}}$

where $\bar{\hat{y}} = \frac{1}{n}\sum_{i=1}^{n}\hat{y_i}$

Uses scipy.stats.pearsonr()

The metrics calculations and density scatter plots are implemented in the following utility functions. Using Dask, the data set loading process is lazy, where no data is loaded inititally. Instead, data loading is delayed until execution time, where variables will be loaded and processed in parallel according to the corresponding chunks specification. Dask arrays implement a subset of the NumPy ndarray interface using blocked algorithms, and the original variable arrays will be split into smaller chunk arrays, enabling computation on arrays larger than memory using all available cores. The blocked algorithms are coordinated using Dask graphs.

Here, a subset of IWB and Zarr store data will be selected for each AOI, using matched coordinates and time, and a matched variable selected for comparison. Although Dask graph execution is typically triggered by calling compute(), this will also automatically occur when a subset variable’s values are accessed (by means of metrics calculations and scatter plots). The resulting variable values will be contained in a NumPy ndarray.

Graph execution is managed by a task scheduler. The default scheduler (used for executing this notebook) executes computations with local threads. However, execution may also be performed on a distributed cluster without any change to the xarray code used in this notebook.

# Wind turbine speed cut-in/cut-out limits (m/s)
CUT_IN_WIND_SPEED = 3
CUT_OUT_WIND_SPEED = 25

def calculate_metrics(y_true: np.array, y_pred: np.array):
    """Calculate comparison metrics using true (reference) and predicted (test) values"""
    metrics = {'$n$': y_true.shape[0],
               'RMSE': mean_squared_error(y_true=y_true, y_pred=y_pred, squared=False),
               'MAE': mean_absolute_error(y_true=y_true, y_pred=y_pred),
               'Bias': np.mean(y_pred - y_true),
               '$R^2$': r2_score(y_true=y_true, y_pred=y_pred),               
              }
    # In certain cases, e.g. Weibull-based power density estimation, only mean values will be specified
    if len(y_true) > 1:
        metrics.update({'$r$': pearsonr(y_true, y_pred)[0]})
        
    return pd.Series(metrics, name=None)

def density_scatter_plot(x, y, ax, bins=20):
    """Create a density scatter plot for comparing variable values from two data sets"""
    # Handle large number of points: https://stackoverflow.com/questions/20105364/how-can-i-make-a-scatter-plot-colored-by-density-in-matplotlib
    data, x_e, y_e = np.histogram2d(x, y, bins=bins, density=True)
    z = interpn(( 0.5*(x_e[1:] + x_e[:-1]) , 0.5*(y_e[1:]+y_e[:-1])), data, np.vstack([x,y]).T, method="splinef2d", bounds_error=False)

    # Plot all data
    z[np.where(np.isnan(z))] = 0.0

    # Sort the points by density, so that the densest points are plotted last
    idx = z.argsort()
    x, y, z = x[idx], y[idx], z[idx]

    ax.scatter(x, y, c=z, cmap='viridis')

def buoy_metrics(variable: str, 
                 height: int, 
                 variable_min: float, 
                 variable_max: float): 
    """Compare IWB buoy variable values with those provided by the catalog Zarr stores"""
    allmetrics = []
    datasets = [ascatall, ascatwqf, ccmp, era5, mera, newa, sentinel1]
    
    variable_lim=(int(variable_min)-1, int(variable_max)+1)
    
    dataset_cols = 4
    subplot_height = 5.5
    subplot_wspace = 0.25
    figwidth = 1 + (subplot_height + subplot_wspace) * dataset_cols
    fig, axes = plt.subplots(nrows=2, ncols=dataset_cols, figsize=(figwidth, subplot_height * 2))
    plt.subplots_adjust(wspace=subplot_wspace, hspace=0.35)
    axes = axes.flatten()

    for i, dataset in enumerate(datasets):
        ax=axes[i]

        buoydf = None
        # Match data for all buoys
        for buoy in iwb.buoy_coordinates:
            try:
                buoydf = pd.concat((buoydf, iwb.match_data(buoy, dataset, variable)), ignore_index=True)
            except:
                print(f'Unable to match IWB {buoy} data with {dataset.name} data')

        dsvariable = f'{dataset.name}_{variable}'
        iwbvariable = f'{variable}{height}'
        buoydf = buoydf[(buoydf[iwbvariable] >= variable_min) & (buoydf[iwbvariable] <= variable_max) & (buoydf[dsvariable] >= variable_min) & (buoydf[dsvariable] <= variable_max)]
        if len(buoydf) == 0:
            print(f'No filtered {variable} available for match of IWB data with {dataset.name} data')
            ax.set_visible(False)
            continue
          
        metricsdf = calculate_metrics(y_true=buoydf[iwbvariable], y_pred=buoydf[dsvariable]).to_frame().rename(columns={0: dataset.name}).T
        allmetrics.append(metricsdf)
        
        density_scatter_plot(x=buoydf[dsvariable].values, y=buoydf[iwbvariable].values, ax=ax)
        sns.regplot(x=dsvariable, y=iwbvariable, data=buoydf, ax=ax, scatter=False);
        ax.set(adjustable='box', aspect='equal')
        ax.set_title(f'{dataset.name} {height}m wind speed ($n$ = {len(buoydf):,})')
        ax.set_xlim(variable_lim)
        ax.set_ylim(variable_lim)
        ax.set_xlabel(dataset.name);
        ax.set_ylabel(f'{iwb.name}');
    axes[-1].set_visible(False)

    fig.suptitle(f'{iwb.name} buoys - {height}m wind speed')
    
    allmetricsdf = pd.concat(allmetrics)
    dfstyle = {'$n$': '{:,.0f}'}
    dfstyle.update({col: '{:.3f}' for col in allmetricsdf.columns[1:]})
    return allmetricsdf.style.format(dfstyle)

Wind speed summary¶

Mean 10m wind speed for IWB buoy coordinates

buoymeanws = []
for buoy in iwb.buoys:
    datasets = [ascatall, ascatwqf, ccmp, era5, mera, newa.aoi_dataset(buoy), sentinel1]
    meanws = {'IWB': iwb.buoy_iwbdata(buoy).wind_speed10.mean()}
    meanws.update({dataset.name: iwb.buoy_dataset(buoy, dataset).wind_speed.sel(height=10).mean(dim='time').values[0] for dataset in datasets if dataset})
    buoymeanws.append(meanws)
iwbmeanwsdf = pd.DataFrame(buoymeanws, index=iwb.buoys)
iwbmeanwsdf.style.format('{:.2f}')

	IWB	ASCAT-ALL	ASCAT-WQF	CCMP	ERA5	MÉRA	NEWA	Sentinel-1
M2	8.14	7.96	7.98	8.29	7.80	8.00	8.24	8.15
M3	8.75	8.73	8.73	8.75	8.51	8.82	9.02	8.44
M4	9.20	9.09	9.08	9.20	8.86	9.26	9.51	9.06
M5	8.46	8.19	8.19	8.16	8.03	8.42	8.49	8.18
M6	9.37	9.72	9.72	9.68	9.38	9.53	nan	nan

IWB wind speed comparison with data sets¶

IWB buoy 10m wind speed compared with all other data sets

buoy_metrics(variable='wind_speed', height=10, variable_min=CUT_IN_WIND_SPEED, variable_max=CUT_OUT_WIND_SPEED)

Unable to match IWB M6 data with NEWA data
Unable to match IWB M6 data with Sentinel-1 data

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ASCAT-ALL	21,257	1.310	0.979	-0.072	0.863	0.932
ASCAT-WQF	21,005	1.300	0.973	-0.075	0.865	0.933
CCMP	18,976	1.456	1.077	0.328	0.814	0.925
ERA5	492,050	1.440	1.102	-0.374	0.835	0.921
MÉRA	114,744	1.403	1.071	-0.238	0.843	0.925
NEWA	195,899	1.948	1.447	0.335	0.669	0.859
Sentinel-1	1,618	1.453	1.097	-0.540	0.827	0.928

Data Sets Wind Speed Comparison¶

In contrast to the previous section that compares IWB buoy data with corresponding satellite and reanalysis Zarr stores contained in the EOOffshore catalog, the remainder of this notebook focuses on comparisons between the Zarr stores. The approach is as follows:

Select a reference data set (analagous to the role of IWB data in the previous section).
Select one or more test data sets.
Select AOI subsets of the reference and test data sets, using IWB buoy and wind farm coordinates, and the time dimension.
Calculate metrics for each reference-test data set comparison, and generate corresponding density scatter plots, for the specified variable/height parameters.

Both the MÉRA and NEWA Zarr stores are used as reference data sets, as they provide data at multiple heights above the surface. They are not strictly considered as ground truth data sets, instead they are used here to demonstrate how the combination of the xarray, Dask, and Zarr libraries results in a scalable solution that can be applied to data sets of arbitrary sizes (related to heterogeneous spatial and temporal resolutions), as is the case with the EOOffshore catalog wind data sets. Parallel, out-of-core, computation of underlying variable chunks is achievable with identical calls to the xarray API. In the event of a future data set becoming available containing wind measurements at multiple heights (for example, LiDAR data) with high spatial coverage, it could be used as a more suitable reference data set.

Here are base classes that will be used for wind speed extrapolation. They are defined here as they are required for metrics calculation, and additional details will be provided later in this notebook.

class Extrapolator(ABC):
    """Base Extrapolator class"""
    
    @abstractmethod
    def extrapolate(self, ds: xr.Dataset, variable_name: str, heights: list[int]):
        """Extrapolate data set variable to required heights"""
        pass
    
    def _add_extrapolated_variable(self, ds: xr.Dataset, variable_name: str, extrapolated: dict, reindex_height: bool, variable_attrs: dict={}) -> xr.Dataset: 
        """Adds extrapolated values to existing variable, at height, and returns an updated Dataset"""
        height_attrs = ds.height.attrs
        ea = xr.concat([vda.assign_coords(height=height).expand_dims(['height']) for height, vda in extrapolated.items()], dim='height').sortby('height')
        if reindex_height:
            ds = ds.reindex({'height': ea.height}).chunk({'height':1})
        ds[variable_name] = ea
        # Height attributes are overwritten, restore them now
        ds.height.attrs.update(height_attrs)
        
        if variable_attrs:
            ds[variable_name].attrs.update(variable_attrs)
            
        return ds
    
class WindSpeedExtrapolator(Extrapolator):
    """Base wind speed Extrapolator class"""
    
    def extrapolate(self, ds: xr.Dataset, heights: list[int], variable_name: str = 'wind_speed'):
        """Extrapolate wind speed to required heights"""
        
        # Retain pre-existing height if required
        extrapolated_wind_speed = {height: ds[variable_name].sel(height=height) for height in ds.height.values if height not in heights}

        # Extrapolate to other heights
        for height in [h for h in heights if h not in extrapolated_wind_speed]:
            extrapolated_wind_speed[height] = self._extrapolate_wind_speed(ds=ds, variable_name=variable_name, extrapolation_height=height, reference_height=10)
        
        # Reindex height dimension if required
        reindex_height = len([h for h in heights if h not in ds.height.values]) > 0
        return self._add_extrapolated_variable(ds, variable_name, extrapolated_wind_speed, reindex_height=reindex_height)
    
    @abstractmethod
    def _extrapolate_wind_speed(self, ds: xr.Dataset, variable_name, extrapolation_height: int, reference_height: int) -> xr.DataArray:
        """Wind speed extrapolation method implementation"""
        pass

These classes implement reference and test data set comparison, enabling AOI subset matching, metrics calculation, and density scatter plot generation.

@dataclass
class AoiDatasetComparison:
    referenceds: xr.Dataset
    testds: xr.Dataset
    round_test_dataset_time: str = None
    
@dataclass
class AoiWindSpeedTestDataset:
    eooffshore_dataset: EooffshoreDataset
    extrapolator: WindSpeedExtrapolator = None

@dataclass
class AoiWindSpeedMetrics:
    reference_dataset: EooffshoreDataset = field(repr=False, hash=False, compare=False)
    reference_name: str = field(init=False)
    test_datasets: List[AoiWindSpeedTestDataset] = field(repr=False, hash=False, compare=False)
    comparison_height: int
    method: str = None
    variable: str = 'wind_speed'
    variable_min: float = CUT_IN_WIND_SPEED
    variable_max: float = CUT_OUT_WIND_SPEED
    variable_lim: Tuple[float, float] = field(init=False)
    buoys: List[str] = field(default_factory=lambda: ['M2', 'M3', 'M4', 'M5'])
    windfarms: List[str] = field(default_factory=lambda: eoowindfarmds.windfarms)
    discard_matches: bool = True
    matchesdf: pd.DataFrame = field(default=None, repr=False, hash=False, compare=False)
    metricsdf: pd.DataFrame = field(default=None, repr=False, hash=False, compare=False)
    
    def __post_init__(self):
        self._init_variables()
        self.compare_datasets()

    def _init_variables(self):
        """Perform extrapolation/estimation to required height"""
        self.reference_name = self.reference_dataset.name
        self.variable_lim = (int(self.variable_min)-1, int(self.variable_max)+1)

        for test_dataset in [x for x in self.test_datasets if x.extrapolator is not None]:
            test_dataset.eooffshore_dataset.dataset = test_dataset.extrapolator.extrapolate(ds=test_dataset.eooffshore_dataset.dataset, 
                                                                                            heights=[self.comparison_height])

    def compare_datasets(self):
        """Compare reference data set with all test data sets, for all IWB and wind farm AOIs"""
        for test_eooffshore_dataset in [x.eooffshore_dataset for x in self.test_datasets]:
            dataset_comparisons = [AoiDatasetComparison(referenceds=iwb.buoy_dataset(buoy, self.reference_dataset), 
                                                        testds=iwb.buoy_dataset(buoy, test_eooffshore_dataset),
                                                        round_test_dataset_time=test_eooffshore_dataset.round_test_dataset_time) for buoy in self.buoys]
            dataset_comparisons += [AoiDatasetComparison(referenceds=eoowindfarmds.windfarm_dataset(windfarm, self.reference_dataset, True), 
                                                         testds=eoowindfarmds.windfarm_dataset(windfarm, test_eooffshore_dataset, True),
                                                         round_test_dataset_time=test_eooffshore_dataset.round_test_dataset_time) for windfarm in self.windfarms]
            self._compare_test_dataset(dataset_comparisons, test_eooffshore_dataset.name)

        self._comparison_plot()

        dfstyle = {'$n$': '&nbsp;{:,.0f}'} 
        dfstyle.update({col: '&nbsp;{:.3f}' for col in self.metricsdf.columns[1:]})
        self.metricsdf = self.metricsdf.style.format(dfstyle)
        display(self.metricsdf)
        if self.discard_matches:
            self.matchesdf = None
        
    def _match_variables(self, dataset_comparison, name):
        """Match datasets on time and non-null variable values"""
        
        def preprocess_variable(ds, name, deduplicate_time=False):
            """Time-related preprocessing"""
            variable = ds.sel(height=self.comparison_height)[name]
            # Drop any null values prior to matching times
            variable = variable.where(variable.notnull(), drop=True)
            if deduplicate_time:
                # Remove any duplicated times
                variable = variable.sel(time=~variable.get_index('time').duplicated())
            return variable

        reference_variable = preprocess_variable(dataset_comparison.referenceds, name)
        test_variable = preprocess_variable(dataset_comparison.testds, name, deduplicate_time=True)
            
        if 'time' in test_variable.dims:
            # Match time
            if dataset_comparison.round_test_dataset_time is None:
                test_variable_time = test_variable.time.values
            else:
                test_variable_time = np.unique(test_variable.time.dt.round(dataset_comparison.round_test_dataset_time).values)
                
            reference_variable = reference_variable.sel(time=np.isin(reference_variable.time, test_variable_time))
            test_variable = test_variable.sel(time=reference_variable.time, method='nearest')
            
        return reference_variable, test_variable
        
    def _compare_test_dataset(self, dataset_comparisons: List[AoiDatasetComparison], test_name: str):
        """Compare reference-test data set pair for all AOIs"""
        test_matches = []
        for dataset_comparison in dataset_comparisons:
            reference_variable, test_variable = self._match_variables(dataset_comparison, self.variable)
        
            # Certain AOIs may not be found in the test dataset
            if len(test_variable) == 0:
                continue
                
            matchdf = pd.DataFrame({self.reference_name: reference_variable.values.reshape(-1,), 
                                    'Test': test_variable.values.reshape(-1,)})
            matchdf = matchdf.dropna()

            # Filter as required
            matchdf = matchdf[(matchdf[self.reference_name] >= self.variable_min) & (matchdf[self.reference_name] <= self.variable_max) & (matchdf.Test >= self.variable_min) & (matchdf.Test <= self.variable_max)]
            test_matches.append(matchdf)

        test_matchesdf = pd.concat(test_matches, ignore_index=True)
        test_matchesdf['Dataset'] = test_name
        self.matchesdf = pd.concat((self.matchesdf, test_matchesdf), ignore_index=True)        
        test_metricsdf = self._calculate_metrics(y_true=test_matchesdf[self.reference_name], 
                                                 y_pred=test_matchesdf.Test).to_frame().rename(columns={0: test_name}).T
        self.metricsdf = pd.concat((self.metricsdf, test_metricsdf))

    def _calculate_metrics(self, y_true: np.array, y_pred: np.array):
        """Calculate comparison metrics using true (reference) and predicted (test) values"""
        metrics = {'$n$': y_true.shape[0],
                   'RMSE': mean_squared_error(y_true=y_true, y_pred=y_pred, squared=False),
                   'MAE': mean_absolute_error(y_true=y_true, y_pred=y_pred),
                   'Bias': np.mean(y_pred - y_true),
                   '$R^2$': r2_score(y_true=y_true, y_pred=y_pred),
                   '$r$': pearsonr(y_true, y_pred)[0]
                  }
        return pd.Series(metrics, name=None)

    def _comparison_plot(self):
        """Create a density scatter plot for all reference-test data set pairs"""
        num_test_datasets = len(self.test_datasets)
        ncols = 3 if num_test_datasets > 4 else num_test_datasets
        nrows = 2 if num_test_datasets > ncols else 1
        subplot_height = 5.5
        subplot_wspace = 0.25
        figwidth = 1 + (subplot_height + subplot_wspace) * ncols
        fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(figwidth, subplot_height * nrows))
        plt.subplots_adjust(wspace=subplot_wspace, hspace=0.35)
        axes = axes.flatten() if num_test_datasets > 1 else [axes]

        variable_label = self.variable.replace('_', ' ')
        for i, test_name in enumerate(sorted(self.matchesdf.Dataset.unique())):
            ax = axes[i]
            test_matchesdf = self.matchesdf[self.matchesdf.Dataset==test_name]
            density_scatter_plot(x=test_matchesdf.Test.values, y=test_matchesdf[self.reference_name].values, ax=ax)
            sns.regplot(x='Test', y=self.reference_name, data=test_matchesdf, ax=ax, scatter=False);
            ax.set(adjustable='box', aspect='equal')
            ax.set_title(f'{test_name} {self.comparison_height}m {variable_label} ($n$ = {len(test_matchesdf):,})')
            ax.set_xlim(self.variable_lim)
            ax.set_ylim(self.variable_lim)
            ax.set_xlabel(test_name);
            ax.set_ylabel(self.reference_name);
        # Disable any axes if required
        for i in range(1, 1 + (ncols * nrows) - num_test_datasets):
            axes[-i].set_visible(False)
        fig.suptitle(f'{self.reference_name} {self.comparison_height}m {variable_label} (IWB buoy, wind farm AOIs)')

10m wind speed¶

This section provides 10m wind speed comparison outputs

MÉRA 10m wind speed¶

MÉRA 10m wind speed is compared with corresponding wind speed provided by other data sets at the IWB buoy and wind farm AOI coordinates.

Sentinel-1 is excluded as data starts from mid-2017, while MÉRA provides data until the end of 2016.
The M6 IWB buoy is excluded as NEWA data is unavailable for these coordinates

AoiWindSpeedMetrics(reference_dataset=mera,
                    test_datasets=[AoiWindSpeedTestDataset(eooffshore_dataset=x) for x in [ascatall, ascatwqf, ccmp, era5, newa]],
                    comparison_height=10);

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ASCAT-ALL	5,363	1.393	1.059	-0.151	0.847	0.921
ASCAT-WQF	4,495	1.368	1.049	-0.200	0.852	0.925
CCMP	11,225	1.496	1.141	0.172	0.816	0.911
ERA5	370,942	1.442	1.113	-0.731	0.832	0.937
NEWA	184,816	1.792	1.326	0.142	0.743	0.875

NEWA 10m wind speed¶

NEWA 10m wind speed is compared with corresponding wind speed provided by other data sets at the IWB buoy and wind farm AOI coordinates.

Sentinel-1 is included as NEWA provides data until the end of 2016.
The M6 IWB buoy is excluded as NEWA data is unavailable for these coordinates

AoiWindSpeedMetrics(reference_dataset=newa,
                    test_datasets=[AoiWindSpeedTestDataset(eooffshore_dataset=x) for x in [ascatall, ascatwqf, ccmp, era5, mera, sentinel1]],
                    comparison_height=10);

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ASCAT-ALL	24,829	1.733	1.288	-0.255	0.770	0.881
ASCAT-WQF	20,828	1.699	1.262	-0.289	0.782	0.888
CCMP	20,665	1.627	1.216	0.052	0.786	0.895
ERA5	689,818	1.936	1.465	-0.899	0.709	0.878
MÉRA	184,816	1.792	1.326	-0.142	0.753	0.875
Sentinel-1	2,325	2.317	1.822	-1.264	0.572	0.838

Wind Speed Extrapolation¶

Wind speed extrapolation is the process of extrapolating data values at provided heights to one or more required heights for which data is unavailable, generating a wind speed profile at an AOI. For satellite or reanalysis, data is usually restricted to values at 10m above the surface level. Consequently, extrapolation may be used with these 10m data to generate wind speed values at typical wind turbine hub heights. The following publications provide examples of wind speed extrapolation:

This section is not intended to be a complete analysis of possible extrapolation methods. Instead, its objective is to demonstrate how a comparison of methods may be implemented using the scalable approach offered by the combination of the xarray, Dask and Zarr libraries, applied to data at multiple heights provided by MÉRA and NEWA.

Power law extrapolation¶

The power law method is often used for extrapolating 10m wind speed observations to hub heights (for example, Remmers et al., 2019; Schelbergen et al., 2020). It is defined as follows (Hsu et al., 1994 - Determining the Power-Law Wind-Profile Exponent under Near-Neutral Stability Conditions at Sea):

$u = u_r\left(\dfrac{z}{z_r}\right)^\alpha$

$u$: wind speed ($m s^{-1}$) at required height $z$ ($m$)
$z_r$: reference height (e.g. 10m for data provided by ERA5, Sentinel-1 etc.)
$u_r$: wind speed at $z_r$
$\alpha$: function of atmospheric stability and surface characteristics

This class provides an implementation of power law extrapolation that will operate on the EOOffshore Zarr stores

class PowerLawWindSpeedExtrapolator(WindSpeedExtrapolator):
    """Wind speed Extrapolator implementation, using the power law method"""
    alpha: float = 0.11 # Hsu 1994

    def __init__(self, alpha: float=0.11):
        self.alpha = alpha
    
    def _extrapolate_wind_speed(self, ds: xr.Dataset, variable_name: str, extrapolation_height: int, reference_height: int) -> xr.DataArray:
        """Wind speed extrapolation method implementation"""
        return ds[variable_name].sel(height=reference_height) * (extrapolation_height/reference_height)**self.alpha
    
powerlaw_extrapolator = PowerLawWindSpeedExtrapolator()

MÉRA 125m wind speed comparison¶

10m wind speed is extrapolated using the power law method, and compared with the corresponding MÉRA 125m wind speed at the IWB buoy and wind farm AOI coordinates.

This data structure is used to record wind speed extrapolation comparison metrics obtained throughout this notebook

wind_speed_extrapolation_metrics = []

test_datasets = [AoiWindSpeedTestDataset(eooffshore_dataset=ascatall, extrapolator=powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ascatwqf, extrapolator=powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ccmp, extrapolator=powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=era5, extrapolator=powerlaw_extrapolator)]

wind_speed_extrapolation_metrics.append(AoiWindSpeedMetrics(reference_dataset=mera,
                                                            test_datasets=test_datasets,
                                                            comparison_height=125,
                                                            method='Power Law'))

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ASCAT-ALL	5,521	2.185	1.721	0.740	0.744	0.889
ASCAT-WQF	4,627	2.152	1.699	0.730	0.753	0.894
CCMP	11,461	2.504	1.995	1.259	0.651	0.881
ERA5	385,710	1.668	1.299	0.058	0.853	0.924

NEWA 150m wind speed comparison¶

150m wind speed is extrapolated using the power law method, and compared with the corresponding NEWA 150m wind speed at the IWB buoy and wind farm AOI coordinates.

test_datasets = [AoiWindSpeedTestDataset(eooffshore_dataset=ascatall, extrapolator=powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ascatwqf, extrapolator=powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ccmp, extrapolator=powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=era5, extrapolator=powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=sentinel1, extrapolator=powerlaw_extrapolator)
                ]

wind_speed_extrapolation_metrics.append(AoiWindSpeedMetrics(reference_dataset=newa,
                                                            test_datasets=test_datasets,
                                                            comparison_height=150,
                                                            method='Power Law'))

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ASCAT-ALL	25,408	2.814	2.215	1.013	0.610	0.828
ASCAT-WQF	21,274	2.785	2.198	1.027	0.622	0.835
CCMP	21,078	2.964	2.342	1.480	0.526	0.837
ERA5	716,184	2.277	1.748	0.144	0.741	0.866
Sentinel-1	2,424	2.813	2.167	-0.306	0.580	0.791

Power law $\alpha$ estimation¶

A value of $\alpha$ = 0.1 or 0.11 is often used as a good approximation for use at sea with the power law method (Hsu et al., 1994). However, the positive Bias obtained with the corresponding comparisons (in contrast to the 10m comparisons Bias) may identify that the extrapolated wind speeds are overestimated, suggesting that an alternative $\alpha$ value may be more suitable. For example, the ERA5 data set provides wind speed at both 10m and 100m, which may be used to calculate $\alpha$ as follows:

$\alpha = \dfrac{\mathrm{ln}\left(\dfrac{u_2}{u_1}\right)}{\mathrm{ln}\left(\dfrac{z_2}{z_1}\right)}$

$z_1$: height 1, e.g. 10m
$z_2$: height 2, e.g. 100m
$u_1$: wind speed ($m s^{-1}$) at height $z_1$, e.g. ERA5 10m wind speed
$u_2$: wind speed ($m s^{-1}$) at height $z_2$, e.g. ERA5 100m wind speed

Here, the following are computed:

An ERA5 powerlaw_alpha variable, computed at each time for all grid coordinates. This will be used for extrapolating ERA5 10m wind speed to heights other than 100m.
A power law extrapolator using powerlaw_alpha.

Note:

xarray provides implementations of certain NumPy universal functions (ufunc), including xarray.ufuncs.log() as used to calculate $\alpha$. Using ufuncs ensures that lazy computation applies, i.e. the corresponding Dask graph execution does not occur until compute() is called.

default_powerlaw_era5 = copy.deepcopy(era5)
default_powerlaw_era5.name='ERA5 (alpha=0.11)'

alpha_powerlaw_era5 = copy.deepcopy(era5)
alpha_powerlaw_era5.name='ERA5 (alpha=ERA5)'

era5.dataset['powerlaw_alpha'] = (xr.ufuncs.log(era5.dataset.wind_speed.sel(height=100)/era5.dataset.wind_speed.sel(height=10))/math.log(100/10))

era5_alpha_powerlaw_extrapolator = PowerLawWindSpeedExtrapolator(alpha=era5.dataset.powerlaw_alpha)

ERA5 10m wind speed is extrapolated using the Power Law method with power_alpha, and compared with the corresponding ERA5 15m wind speed at the IWB buoy and wind farm AOI coordinates.

test_datasets=[AoiWindSpeedTestDataset(eooffshore_dataset=default_powerlaw_era5, extrapolator=powerlaw_extrapolator),
               AoiWindSpeedTestDataset(eooffshore_dataset=alpha_powerlaw_era5, extrapolator=era5_alpha_powerlaw_extrapolator)
              ]
AoiWindSpeedMetrics(reference_dataset=era5,
                    test_datasets=test_datasets,
                    comparison_height=100);

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ERA5 (alpha=0.11)	1,520,056	0.823	0.689	0.325	0.961	0.984
ERA5 (alpha=ERA5)	1,527,165	0.000	0.000	-0.000	1.000	1.000

An exact match with no errors is obtained. Here is the ERA5 power_alpha distribution

mean_alphadf = era5.dataset.powerlaw_alpha.mean(dim='time').to_dataframe()
mean_alphadf.hist(bins=30);

mean_alphadf.describe().T

	count	mean	std	min	25%	50%	75%	max
powerlaw_alpha	4165.0	0.080109	0.034877	0.06259	0.06878	0.070319	0.072935	0.249005

Power Law extrapolation comparisons are now made as follows:

ERA5 10m wind speed will be extrapolated using the powerlaw_alpha variable, computed at each time for all grid coordinates.
A power law extrapolator with $\alpha$ = mean ERA5 powerlaw_alpha will be used for the other data sets.

mean_era5_alpha_powerlaw_extrapolator = PowerLawWindSpeedExtrapolator(alpha=mean_alphadf.powerlaw_alpha.mean())

MÉRA 125m wind speed comparison using ERA5 power law $\alpha$¶

test_datasets = [AoiWindSpeedTestDataset(eooffshore_dataset=ascatall, extrapolator=mean_era5_alpha_powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ascatwqf, extrapolator=mean_era5_alpha_powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ccmp, extrapolator=mean_era5_alpha_powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=era5, extrapolator=era5_alpha_powerlaw_extrapolator)]

wind_speed_extrapolation_metrics.append(AoiWindSpeedMetrics(reference_dataset=mera,
                                                            test_datasets=test_datasets,
                                                            comparison_height=125,
                                                            method='Power Law (ERA5-based alpha)'))

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ASCAT-ALL	5,509	1.990	1.542	-0.071	0.789	0.890
ASCAT-WQF	4,614	1.957	1.520	-0.086	0.797	0.895
CCMP	11,482	2.117	1.655	0.438	0.758	0.884
ERA5	383,488	1.427	1.075	-0.309	0.891	0.947

NEWA 150m wind speed comparison using ERA5 power law $\alpha$¶

test_datasets = [AoiWindSpeedTestDataset(eooffshore_dataset=ascatall, extrapolator=mean_era5_alpha_powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ascatwqf, extrapolator=mean_era5_alpha_powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ccmp, extrapolator=mean_era5_alpha_powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=era5, extrapolator=era5_alpha_powerlaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=sentinel1, extrapolator=mean_era5_alpha_powerlaw_extrapolator),
                ]

wind_speed_extrapolation_metrics.append(AoiWindSpeedMetrics(reference_dataset=newa,
                                                            test_datasets=test_datasets,
                                                            comparison_height=150,
                                                            method='Power Law (ERA5-based alpha)'))

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ASCAT-ALL	25,363	2.559	1.993	0.126	0.680	0.829
ASCAT-WQF	21,232	2.523	1.966	0.133	0.692	0.836
CCMP	21,109	2.531	1.975	0.597	0.664	0.841
ERA5	711,857	1.983	1.453	-0.255	0.803	0.899
Sentinel-1	2,410	2.930	2.282	-1.105	0.544	0.791

Log law extrapolation¶

The log law method is an alternative extrapolation method. Here, it is implemented as defined by Remmers et al., 2019:

$u = \dfrac{\mathrm{ln}\left(\dfrac{z}{z_0}\right)}{\mathrm{ln}\left(\dfrac{z_r}{z_0}\right)} \times u_r$

$u$: wind speed ($m s^{-1}$) at required height $z$ ($m$)
$z_r$: reference height (3m in the case of IWB data, 10m in the case of ERA5 data etc.)
$u_r$: wind speed at $z_r$ (provided wind speed)
$z_0$: surface roughness

class LogLawWindSpeedExtrapolator(WindSpeedExtrapolator):
    """Wind speed Extrapolator implementation, using the log law method as defined in Remmers et al., 2019"""
    z0: float = 0.0002 # surface roughness length

    def __init__(self, z0: float=0.0002):
        self.z0 = z0
    
    def _extrapolate_wind_speed(self, ds: xr.Dataset, variable_name: str, extrapolation_height: int, reference_height: int) -> xr.DataArray:
        """Wind speed extrapolation method implementation"""
        # Use the implementation defined earlier in this notebook
        return loglaw_extrapolate(z=extrapolation_height, wind_speed=ds[variable_name].sel(height=reference_height), zref=reference_height, z0=self.z0)

loglaw_extrapolator = LogLawWindSpeedExtrapolator()

MÉRA 125m wind speed comparison¶

10m wind speed is extrapolated using the log law method, and compared with the corresponding MÉRA 125m wind speed at the IWB buoy and wind farm AOI coordinates.

test_datasets = [AoiWindSpeedTestDataset(eooffshore_dataset=ascatall, extrapolator=loglaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ascatwqf, extrapolator=loglaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ccmp, extrapolator=loglaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=era5, extrapolator=loglaw_extrapolator)]

wind_speed_extrapolation_metrics.append(AoiWindSpeedMetrics(reference_dataset=mera,
                                                            test_datasets=test_datasets,
                                                            comparison_height=125,
                                                            method='Log Law'))

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ASCAT-ALL	5,512	1.993	1.546	0.006	0.788	0.890
ASCAT-WQF	4,615	1.959	1.524	-0.007	0.796	0.895
CCMP	11,477	2.142	1.678	0.518	0.751	0.884
ERA5	384,064	1.772	1.363	-0.621	0.834	0.924

NEWA 150m wind speed comparison¶

10m wind speed is extrapolated using the log law method, and compared with the corresponding NEWA 150m wind speed at the IWB buoy and wind farm AOI coordinates.

test_datasets = [AoiWindSpeedTestDataset(eooffshore_dataset=ascatall, extrapolator=loglaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ascatwqf, extrapolator=loglaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=ccmp, extrapolator=loglaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=era5, extrapolator=loglaw_extrapolator),
                 AoiWindSpeedTestDataset(eooffshore_dataset=sentinel1, extrapolator=loglaw_extrapolator),
                ]

wind_speed_extrapolation_metrics.append(AoiWindSpeedMetrics(reference_dataset=newa,
                                                            test_datasets=test_datasets,
                                                            comparison_height=150,
                                                            method='Log Law'))

	$n$	RMSE	MAE	Bias	$R^2$	$r$
ASCAT-ALL	25,368	2.565	2.000	0.194	0.679	0.829
ASCAT-WQF	21,238	2.529	1.973	0.202	0.691	0.836
CCMP	21,104	2.554	1.995	0.665	0.658	0.841
ERA5	712,940	2.324	1.782	-0.607	0.731	0.866
Sentinel-1	2,410	2.912	2.265	-1.043	0.550	0.791

Extrapolation metrics review¶

The following tables contain the extrapolation metrics obtained from the MÉRA and NEWA comparisons, where the “best” scores are highlighted.

def dataset_metrics_table(metrics: List[AoiWindSpeedMetrics]) -> pd.DataFrame:
    """Display comparison metrics, highlighting best scores"""
    dataset_metrics=[]
    for method_metrics in metrics:
        metricsdf = method_metrics.metricsdf.data.reset_index()
        mdf = metricsdf.copy(deep=True).rename(columns={'index': 'Dataset'})
        mdf.insert(loc=1, column='Method', value=method_metrics.method)
        dataset_metrics.append(mdf)
    
    dataset_metricsdf = pd.concat(dataset_metrics, ignore_index=True).sort_values(by='Dataset').set_index(['Dataset', 'Method'])
    dfstyle = {'$n$': '{:,.0f}'}
    dfstyle.update({col: '{:.3f}' for col in dataset_metricsdf.columns[1:]})
    
    highlight_props = 'color:white;background-color:darkblue'

    def highlight_max(s, props=highlight_props):
        return np.where(s == np.nanmax(s.values), props, '')

    def highlight_min(s, props=highlight_props):
        values = np.abs(s.values)
        return np.where(values == np.nanmin(values), props, '')

    idx = pd.IndexSlice
    styleddf = dataset_metricsdf.style

    for dataset in dataset_metricsdf.index.unique(level=0):
        min_slice_ = idx[idx[dataset,:], idx['RMSE':'Bias']]
        max_slice_ = idx[idx[dataset,:], idx['$R^2$':'$r$']]
        styleddf = styleddf.apply(highlight_min, axis=0, subset=min_slice_)\
                           .apply(highlight_max, axis=0, subset=max_slice_)
    return styleddf.format(dfstyle)


reference_wind_speed_metrics = defaultdict(list)
for m in wind_speed_extrapolation_metrics:
    reference_wind_speed_metrics[m.reference_name].append(m)

for reference_dataset_label, method_metrics in reference_wind_speed_metrics.items():
    display(Markdown('-----------------'))
    display(Markdown(f'**Wind Speed Extrapolation Methods: compared with {reference_dataset_label}**'))
    display(dataset_metrics_table(method_metrics))
    print('\n')

Wind Speed Extrapolation Methods: compared with MÉRA

		$n$	RMSE	MAE	Bias	$R^2$	$r$
Dataset	Method
ASCAT-ALL	Power Law	5,521	2.185	1.721	0.740	0.744	0.889
	Power Law (ERA5-based alpha)	5,509	1.990	1.542	-0.071	0.789	0.890
	Log Law	5,512	1.993	1.546	0.006	0.788	0.890
ASCAT-WQF	Power Law	4,627	2.152	1.699	0.730	0.753	0.894
	Power Law (ERA5-based alpha)	4,614	1.957	1.520	-0.086	0.797	0.895
	Log Law	4,615	1.959	1.524	-0.007	0.796	0.895
CCMP	Power Law	11,461	2.504	1.995	1.259	0.651	0.881
	Power Law (ERA5-based alpha)	11,482	2.117	1.655	0.438	0.758	0.884
	Log Law	11,477	2.142	1.678	0.518	0.751	0.884
ERA5	Power Law	385,710	1.668	1.299	0.058	0.853	0.924
	Power Law (ERA5-based alpha)	383,488	1.427	1.075	-0.309	0.891	0.947
	Log Law	384,064	1.772	1.363	-0.621	0.834	0.924

Wind Speed Extrapolation Methods: compared with NEWA

		$n$	RMSE	MAE	Bias	$R^2$	$r$
Dataset	Method
ASCAT-ALL	Power Law	25,408	2.814	2.215	1.013	0.610	0.828
	Power Law (ERA5-based alpha)	25,363	2.559	1.993	0.126	0.680	0.829
	Log Law	25,368	2.565	2.000	0.194	0.679	0.829
ASCAT-WQF	Power Law	21,274	2.785	2.198	1.027	0.622	0.835
	Power Law (ERA5-based alpha)	21,232	2.523	1.966	0.133	0.692	0.836
	Log Law	21,238	2.529	1.973	0.202	0.691	0.836
CCMP	Power Law	21,078	2.964	2.342	1.480	0.526	0.837
	Power Law (ERA5-based alpha)	21,109	2.531	1.975	0.597	0.664	0.841
	Log Law	21,104	2.554	1.995	0.665	0.658	0.841
ERA5	Power Law	716,184	2.277	1.748	0.144	0.741	0.866
	Power Law (ERA5-based alpha)	711,857	1.983	1.453	-0.255	0.803	0.899
	Log Law	712,940	2.324	1.782	-0.607	0.731	0.866
Sentinel-1	Power Law	2,424	2.813	2.167	-0.306	0.580	0.791
	Power Law (ERA5-based alpha)	2,410	2.930	2.282	-1.105	0.544	0.791
	Log Law	2,410	2.912	2.265	-1.043	0.550	0.791

Power Density Estimation¶

Offshore wind power density estimates may be generated for an AOI using the available wind speed values at the corresponding coordinates. When used in combination with wind speed extrapolation as described above, it is possible to estimate power at typical wind turbine hub heights. The following publications provide examples of power density estimation using wind speed:

Estimation implementation¶

Power density estimation at multiple heights is implemented as an Extrapolator, as used for wind speed extrapolation above. Here is the base class used for all implementations.

class PowerDensityEstimator(Extrapolator):
    """Base power density estimator class, implemented as an Extrapolator to reuse the creation of new variables with height dimension"""
    attrs: dict[str, str] = {
                            'long_name': 'Power density (estimated using wind speed)',
                            'units': 'Wm**-2'
                            }
    
    min_wind_speed: float = CUT_IN_WIND_SPEED
    max_wind_speed: float = CUT_OUT_WIND_SPEED
    wind_speed_variable_name: str = 'wind_speed'

    def __init__(self, air_density: float=1.225):
        self.air_density = air_density

    def extrapolate(self, ds: xr.Dataset, heights: list[int], variable_name: str = 'power_density'):
        """Estimate power density at required heights"""
        estimated_power_density = {height: self._estimate_power_density(wind_speed=self._filter_wind_speed(ds[self.wind_speed_variable_name].sel(height=height)))
                                           for height in heights}
        
        return self._add_extrapolated_variable(ds, variable_name, estimated_power_density, reindex_height=False, variable_attrs=self.attrs)

    def _filter_wind_speed(self, wind_speed: xr.DataArray):
        """Filters to turbine cut in/out range"""
        return wind_speed.where((wind_speed >= self.min_wind_speed) & (wind_speed <= self.max_wind_speed))
        
    @abstractmethod
    def _estimate_power_density(self, wind_speed: xr.DataArray) -> xr.DataArray:
        """Power density estimation implementation"""
        pass

Power Density Estimation using cubed wind speed¶

When a data set provides a complete time series of wind speed observations, power density may be estimated from the cube of the wind speed (de Montera et al., 2020):

$P = \dfrac{1}{2}\rho U^3$

$P$: wind power density ($W m^{-2}$)
$\rho$: air density ($kg$ $m^{-3}$), a default value of 1.225 is typically used over oceans
$U$: wind speed ($m s^{-1}$)

class CubedWindSpeedPowerDensityEstimator(PowerDensityEstimator):
    """Power density estimation using complete wind speed time series"""

    def _estimate_power_density(self, wind_speed: xr.DataArray) -> xr.DataArray:
        """Power density estimation implementation"""
        return 0.5 * self.air_density * (wind_speed**3)

AOI wind speed distributions and Weibull parameters¶

AOI wind speed is typically represented by a Weibull distribution of observations, characterized by a shape parameter ($k$, dimensionless) and a scale parameter ($\lambda$, $m s^{-1}$) (Ahsbahs et al., 2020). The Weibull probability density function (PDF) is as follows:

$\mathrm{PDF}(U) = \dfrac{k}{\lambda}\left(\dfrac{U}{k} \right)^{k-1} \epsilon^{-(U/\lambda)^k}$

A minimum number of observations or samples are required for Weibull parameter fitting, this is relevant for SAR data such as the Sentinel-1 data set. Consequently, wind speeds from all directional sectors have been used for fitting (Badger et al., 2016), where the same process is applied in this notebook.

Mean wind power density is related to the corresponding Weibull parameters as follows (Ahsbahs et al., 2020):

$P = \dfrac{1}{2}\rho \lambda^3 \Gamma \left(1 + \dfrac{3}{k}\right)$

$P$: wind power density ($W m^{-2}$)
$U$: wind speed ($m s^{-1}$)
$\rho$: air density ($kg$ $m^{-3}$), a default value of 1.225 is typically used over oceans
$\Gamma$: gamma function (scipy.special.gamma() is used in this notebook)
$k$: Weibull shape parameter (dimensionless)
$\lambda$: Weibull scale parameter ($m s^{-1}$)

Using the scipy.stats.weibull_min() implementation for Weibull parameter fitting is sufficient for individual AOI assessment. This approach does not scale well, preventing Weibull parameter fitting for all AOI grid coordinates in data sets with high spatial resolutions. However, Justus et al., 1976 (Methods for Estimating Wind Speed Frequency Distributions) proposed parameter estimation as follows, which has been used with Sentinel-1 wind speed data (de Montera et al., 2020)

$k = \left(\dfrac{\sigma}{\mu} \right)^{-1.086}$

$\lambda = \dfrac{\mu}{\Gamma \left(\frac{1}{k} + 1 \right)}$

$\mu$: mean wind speed ($m s^{-1}$)
$\sigma$: wind speed standard deviation ($m s^{-1}$)

This approach can be implemented using xarray’s lazy computation, where the corresponding Dask graph execution does not occur until compute() is called, and will scale to accomodate all data sets in the EOOffshore catalog.

class WeibullWindSpeedPowerDensityEstimator(PowerDensityEstimator):
    """Power density estimation using Weibull parameters"""

    def _filter_wind_speed(self, wind_speed: xr.DataArray):
        """Filters to turbine cut in/out range"""
        # No filtering is performed prior to Weibull parameter fitting
        return wind_speed

    def _estimate_power_density(self, wind_speed: xr.DataArray) -> xr.DataArray:
        """Power density estimation implementation"""
        shape, scale = self.fit_parameters(wind_speed)
        return 0.5 * self.air_density * (scale**3) * gamma(1+(3/shape))        

    def fit_parameters(self, wind_speed: xr.DataArray) -> Tuple[xr.DataArray, xr.DataArray]:
        """Fit Weibull parameters"""
        # As  used in de Montera et al., 2020
        mean_wind_speed = wind_speed.mean(dim='time')
        shape = (wind_speed.std(dim='time')/mean_wind_speed)**-1.086
        scale = mean_wind_speed / gamma((1/shape)+1)
        return shape, scale

Power Density Comparison¶

Select a reference data set (analagous to the role of IWB data in the previous section).
Select one or more test data sets.
Select AOI subsets of the reference and test data sets, using IWB buoy and wind farm coordinates, and the time dimension.
Calculate metrics for each reference-test data set comparison, and generate corresponding density scatter plots, for the specified variable/height parameters.

Comparison of power density estimates is similar to the previous approach used with wind speed comparison. The NEWA Zarr stores is used as the reference data set, as it is the sole data set that provides power density at multiple heights above the surface. As before, this is not strictly considered as a ground truth data set, and may be substituted by a future data set containing wind power measurements at multiple heights with high spatial coverage.

In the case of data sets where the Weibull parameters are used to estimate power density, a time series will not be available. Consequently, mean power estimate bias, in percentage, as defined by Dörenkämper et al. (2020) - The Making of the New European Wind Atlas – Part 2: Production and Evaluation, is the sole metric used for wind power comparisons:

$\overline{P}_{Bias} = 100 \times \dfrac{\overline{\hat{P}} - \overline{P}}{\overline{P}}$

$\overline{P}$: mean reference (true) power density, e.g. NEWA power density for AOI grid coordinates
$\overline{\hat{P}}$: mean test (predicted) power density, e.g. corresponding power density estimated from ERA5 wind speed for AOI grid coordinates

Mean power density may be estimated using either the cubed wind speed or Weibull methods. These classes implement reference and test data set power comparison, enabling AOI subset matching, metrics calculation, and box plot generation.

@dataclass
class AoiPowerDensityTestDataset(AoiWindSpeedTestDataset):
    estimator: PowerDensityEstimator = None

@dataclass
class AoiPowerDensityMetrics(AoiWindSpeedMetrics):
    test_datasets: List[AoiPowerDensityTestDataset] = field(repr=False, hash=False, compare=False)
    variable: str = 'power_density' 
    variable_min: float = 16 # min power based on CUT_IN_WIND_SPEED
    variable_max: float = 9640 # max power based on CUT_OUT_WIND_SPEED
    discard_matches: bool = False
    pbias_label: str = '$\overline{P}_{Bias}$'

    def _init_variables(self):
        """Power density estimation using extrapolated wind speed"""
        super()._init_variables()

        for test_dataset in self.test_datasets:
            test_dataset.eooffshore_dataset.dataset = test_dataset.estimator.extrapolate(ds=test_dataset.eooffshore_dataset.dataset, 
                                                                                         heights=[self.comparison_height])

    def _match_variables(self, dataset_comparison, name):
        """Match datasets, using time if available"""
        # Mean power density is used for comparison, as certain test datasets won't have time dimension
        # for power density variable (e.g. those that are Weibull-based)
        # The wind speed variable is used for time matching as it will always have a time dimension
        # The matched times are used to select the corresponding power density as required        
        reference_windspeed_variable, test_windspeed_variable = super()._match_variables(dataset_comparison, 'wind_speed')
        
        def match_power_density(ds, windspeed_variable):
            power_density = ds.sel(height=self.comparison_height)[self.variable]
            if 'time' in power_density.dims:
                power_density = power_density.sel(time=windspeed_variable.time).mean(dim='time')
            return power_density
            
        reference_variable = match_power_density(dataset_comparison.referenceds, reference_windspeed_variable)
        test_variable = match_power_density(dataset_comparison.testds, test_windspeed_variable)
            
        return reference_variable, test_variable


    def calculate_percentage_bias(self, y_true: np.array, y_pred: np.array) -> np.array :
        """Power comparison metric"""
        return 100 * (y_pred - y_true)/y_true

    def _calculate_metrics(self, y_true: np.array, y_pred: np.array):
        """Calculate comparison metrics using true (reference) and predicted (test) values"""
        metrics = {'$n$': y_true.shape[0],
                   'Bias': np.mean(y_pred - y_true),
                   self.pbias_label: np.mean(self.calculate_percentage_bias(y_true, y_pred))
                  }
        return pd.Series(metrics, name=None)

    def _comparison_plot(self):
        """Create a box plot containing data set percentage bias distributions"""
        # Single plot illustrating percentage bias distribution per test dataset
        self.matchesdf[self.pbias_label] = self.calculate_percentage_bias(self.matchesdf.NEWA, self.matchesdf.Test)
        ax = sns.boxplot(data=self.matchesdf, x='Dataset', y=self.pbias_label)
        ax.set_title(f'{self.reference_name} {self.comparison_height}m {self.variable.replace("_", " ")} (IWB buoy, wind farm AOIs)')
        ax.set_ylim((-30,30));

NEWA 150m power density¶

10m wind speed is extrapolated and used to estimate power density, which is compared with the corresponding NEWA 150m power density at the IWB buoy and wind farm AOI coordinates. Each data set is extrapolated as follows, based on the previous metrics tables obtained from MÉRA and NEWA wind speed comparisons:

ASCAT-NRT/REP: Power Law (ERA5-based alpha)
CCMP: Power Law (ERA5-based alpha)
ERA5: Power Law (ERA5-based alpha)
Sentinel-1: Power Law with default $\alpha$ = 0.11

As NEWA also provides an air density variable (RHO), its mean value will be used for estimation instead of the default value of 1.225 that is typically used.

newa_rhodf = pd.DataFrame({'RHO': [dataset.RHO.mean().compute().item() for dataset in {ds.dataset_id: ds.dataset for ds in newa.datasets.values()}.values()]})
newa_rhodf.describe().T

	count	mean	std	min	25%	50%	75%	max
RHO	4.0	1.233518	0.000991	1.2326	1.232793	1.233351	1.234075	1.234769

cubed_windspeed_estimator = CubedWindSpeedPowerDensityEstimator(air_density=newa_rhodf.RHO.mean())
weibull_power_estimator = WeibullWindSpeedPowerDensityEstimator(air_density=newa_rhodf.RHO.mean())

test_datasets = [AoiPowerDensityTestDataset(eooffshore_dataset=ascatall, 
                                            extrapolator=mean_era5_alpha_powerlaw_extrapolator, 
                                            estimator=cubed_windspeed_estimator),
                 AoiPowerDensityTestDataset(eooffshore_dataset=ascatwqf, 
                                            extrapolator=mean_era5_alpha_powerlaw_extrapolator, 
                                            estimator=weibull_power_estimator),
                 AoiPowerDensityTestDataset(eooffshore_dataset=ccmp, 
                                            extrapolator=mean_era5_alpha_powerlaw_extrapolator, 
                                            estimator=cubed_windspeed_estimator),
                 AoiPowerDensityTestDataset(eooffshore_dataset=era5, 
                                            extrapolator=era5_alpha_powerlaw_extrapolator, 
                                            estimator=cubed_windspeed_estimator),
                 AoiPowerDensityTestDataset(eooffshore_dataset=sentinel1, 
                                            extrapolator=powerlaw_extrapolator, 
                                            estimator=weibull_power_estimator)
                ]

AoiPowerDensityMetrics(reference_dataset=newa,
                       test_datasets=test_datasets,
                       comparison_height=150,
                       method='Cubed Wind Speed/Weibull');

	$n$	Bias	$\overline{P}_{Bias}$
ASCAT-ALL	9	-36.525	-3.026
ASCAT-WQF	6	-29.087	-2.570
CCMP	9	131.221	12.967
ERA5	9	-84.781	-8.305
Sentinel-1	9	85.333	6.872

Offshore Wind in Irish Areas Of Interest

Wind Atlas Development

Comparison of Offshore Wind Speed Extrapolation and Power Density Estimation

Contents

Comparison of Offshore Wind Speed Extrapolation and Power Density Estimation¶

Introduction¶

Load EOOffshore Catalog Data¶

Open the catalog¶

Load the catalog Zarr stores¶

Load the ASCAT REProcessed store¶

Load the CCMP store¶

Load the ERA5 store¶

Load the MÉRA store¶

Load the NEWA stores¶

Load the Sentinel-1 store¶

Load AOI catalog data¶

Load the Irish Weather Buoy Network data¶

Load the synthentic wind farm shapefile¶

Comparison of In situ Buoy and Data Set Wind Speed¶

Comparison metrics¶

Wind speed summary¶

IWB wind speed comparison with data sets¶

Data Sets Wind Speed Comparison¶

10m wind speed¶

MÉRA 10m wind speed¶

NEWA 10m wind speed¶

Wind Speed Extrapolation¶

Power law extrapolation¶

MÉRA 125m wind speed comparison¶

NEWA 150m wind speed comparison¶

Power law \(\alpha\) estimation¶

MÉRA 125m wind speed comparison using ERA5 power law \(\alpha\)¶

NEWA 150m wind speed comparison using ERA5 power law \(\alpha\)¶

Log law extrapolation¶

MÉRA 125m wind speed comparison¶

NEWA 150m wind speed comparison¶

Extrapolation metrics review¶

Power Density Estimation¶

Estimation implementation¶

Power Density Estimation using cubed wind speed¶

AOI wind speed distributions and Weibull parameters¶

Power Density Comparison¶

NEWA 150m power density¶