API

object_store

class resampling.object_store.ObjectStore(endpoint_url: str, aws_access_key_id: str, aws_secret_access_key: str, aws_session_token: str, bucket: str)

Bases: object

Manages interactions with an S3-compatible object storage system and Zarr datasets.

This class allows you to configure S3 credentials, interact with Zarr datasets stored in an S3 bucket, and perform various operations such as extracting datasets and checking for the existence of Zarr stores.

Parameters:
  • endpoint_url (str) – The URL of the S3-compatible object storage endpoint.

  • aws_access_key_id (str) – The AWS access key for authentication.

  • aws_secret_access_key (str) – The AWS secret key for authentication.

  • aws_session_token (str) – AWS session token for temporary credentials.

  • bucket (str) – The name of the S3 bucket where Zarr datasets are stored.

check_zarr_exists(zarr_store_path: str) → bool

Checks if a Zarr store exists in the specified S3 path.

Parameters:

zarr_store_path (str) – The relative path to the Zarr store within the S3 bucket.

Returns:

True if the Zarr store exists, False otherwise.

Return type:

bool

Raises:

Exception – If an error occurs while accessing the S3 path.

create_empty_zarr(zarr_name: str, coordinate_ranges: Dict[str, List[int | List[int]]], variables: List[str]) → Dataset

Creates an empty Zarr store with the specified coordinate ranges and variables on S3.

Parameters:
  • zarr_name (str) – The name of the Zarr store to create.

  • coordinate_ranges (Dict[str, List[Union[int, List[int]]]]) – A dictionary of coordinate ranges for each dimension in the Dataset.

  • variables (List[str]) – A list of variable names to include in the Dataset.

Returns:

The created xarray Dataset.

Return type:

xarray.Dataset

delete_zarr(zarr_store_path: str) → None

Deletes a Zarr store from the specified S3 path.

Parameters:

zarr_store_path (str) – The path to the Zarr store to be deleted within the S3 bucket.

Returns:

None

Return type:

None

Raises:
  • FileNotFoundError – If the Zarr store does not exist.

  • Exception – If an error occurs while attempting to delete the Zarr store.

extract_zarr(name: str, var: str | None = None, lon_range: Tuple[float, float] | None = None, lat_range: Tuple[float, float] | None = None) → Dataset

Extracts a Zarr dataset from the specified S3 bucket. Optionally, subsets the dataset based on variable, longitude, and latitude ranges.

Parameters:
  • name (str) – The name of the Zarr dataset within the bucket.

  • var (Optional[str]) – The variable to extract from the dataset. If None, the full dataset is returned.

  • lon_range (Optional[Tuple[float, float]]) – The longitude range to subset the dataset (min, max). If None, no subsetting is performed.

  • lat_range (Optional[Tuple[float, float]]) – The latitude range to subset the dataset (min, max). If None, no subsetting is performed.

Returns:

The extracted and optionally subsetted xarray dataset.

Return type:

xarray.Dataset

Raises:

ValueError – If the specified variable is not found in the dataset.
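The subsetting behaviour described above can be illustrated without S3 or xarray. The following is a minimal sketch, not the method's implementation: the `subset` and `in_range` helpers and the dictionary-based stand-in for a dataset are hypothetical, and treating both range bounds as inclusive is an assumption.

```python
from typing import Dict, List, Optional, Tuple

Point = Dict[str, float]          # one record: lon, lat, value
Dataset = Dict[str, List[Point]]  # variable name -> records (stand-in for xarray)


def in_range(value: float, bounds: Optional[Tuple[float, float]]) -> bool:
    """True when bounds is None or value lies within (min, max), inclusive."""
    return bounds is None or bounds[0] <= value <= bounds[1]


def subset(
    dataset: Dataset,
    var: Optional[str] = None,
    lon_range: Optional[Tuple[float, float]] = None,
    lat_range: Optional[Tuple[float, float]] = None,
) -> Dataset:
    """Hypothetical helper mirroring the documented optional arguments."""
    if var is not None and var not in dataset:
        raise ValueError(f"Variable {var!r} not found in dataset")
    # var=None keeps every variable; a range of None skips that filter.
    selected = dataset if var is None else {var: dataset[var]}
    return {
        name: [
            p for p in points
            if in_range(p["lon"], lon_range) and in_range(p["lat"], lat_range)
        ]
        for name, points in selected.items()
    }


data = {
    "temperature": [
        {"lon": 5.0, "lat": 50.0, "value": 11.2},
        {"lon": 25.0, "lat": 60.0, "value": 7.9},
    ],
    "pressure": [{"lon": 5.0, "lat": 50.0, "value": 1013.0}],
}
result = subset(data, var="temperature", lon_range=(0.0, 10.0))
```

Here `result` keeps only the temperature record at longitude 5.0, while asking for a variable that is not present raises ValueError, as documented.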

write_zarr(dataset: DataTree | Dataset, name: str | None = None, mode: str | None = None) → None

Writes a Dataset or DataTree to a Zarr store on S3.

Parameters:
  • dataset (datatree.DataTree | xarray.Dataset) – The xarray Dataset or datatree DataTree to be written to Zarr format.

  • name (Optional[str]) – The name of the Zarr store. If None, a default name with the current timestamp is used.

  • mode (Optional[str]) – The mode to open the Zarr store. Default is ‘w’ for write. Other options include ‘a’ for append and ‘r+’ for read and write.

Returns:

None

Return type:

None

write_zarr_batch(zarr_store_path: str, variable_name: str, batch_values: ndarray, indexes: list) → None

Writes a batch of values to a specific variable in a Zarr store on S3.

Parameters:
  • zarr_store_path (str) – The path to the Zarr store within the S3 bucket.

  • variable_name (str) – The name of the variable to which the batch values will be written.

  • batch_values (np.ndarray) – A NumPy array of values to be written to the Zarr store.

  • indexes (list) – A list of dictionaries representing the indices for each dimension of the variable.

Returns:

None

Return type:

None

Raises:
  • IndexError – If the provided indices are out of bounds for the specified variable.

  • ValueError – If the batch of values is empty or contains NaN values.
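To make the shape of `indexes` and the documented error cases concrete, here is a minimal sketch in plain Python. The `validate_batch` helper, the `shape` dictionary, and the in-memory `grid` are hypothetical stand-ins; the check that there is exactly one index dictionary per value is an additional assumption not stated above.

```python
import math
from typing import Dict, List, Sequence


def validate_batch(
    batch_values: Sequence[float],
    indexes: List[Dict[str, int]],
    shape: Dict[str, int],
) -> None:
    """Raise on the documented error cases; return None if the batch is fine."""
    if len(batch_values) == 0:
        raise ValueError("batch_values is empty")
    if any(math.isnan(v) for v in batch_values):
        raise ValueError("batch_values contains NaN")
    if len(batch_values) != len(indexes):  # assumption: one index dict per value
        raise ValueError("need exactly one index dictionary per value")
    for index in indexes:
        for dim, position in index.items():
            if not 0 <= position < shape[dim]:
                raise IndexError(f"{dim}={position} is outside 0..{shape[dim] - 1}")


shape = {"latitude": 2, "longitude": 3}  # hypothetical variable dimensions
indexes = [{"latitude": 0, "longitude": 2}, {"latitude": 1, "longitude": 0}]
values = [0.5, 1.5]
validate_batch(values, indexes, shape)  # passes silently

# Write each value at the position named by its index dictionary.
grid = [[None] * shape["longitude"] for _ in range(shape["latitude"])]
for value, index in zip(values, indexes):
    grid[index["latitude"]][index["longitude"]] = value
```

Each dictionary in `indexes` names one position per dimension, which matches the list of index combinations returned by define_windows.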

my_store

resampling.my_store.get_my_store(config_file=None)

Creates an ObjectStore instance from the credentials stored in a configuration file.

Parameters:

config_file – Optional. Path to the configuration file. Defaults to ‘config/config.toml’.

Returns:

ObjectStore instance based on credentials saved in the configuration file.

Return type:

ObjectStore

define_windows

resampling.define_windows.define_windows(resampler: List[Dict[str, Any]], ds: Dataset) → Tuple[List[Dict[str, int | float | List[int | float]]], List[Dict[str, int]], Dict[str, List[int | List[int | float]]]]

Defines the windows (intervals) for each dimension based on the provided resampler configuration, and includes any dimensions present in the dataset but not specified in the resampler.

Parameters:
  • resampler (List[Dict[str, Any]]) – A list of dictionaries specifying the resampling parameters for each dimension. Each dictionary contains:

      • dimension (str): The name of the dimension to resample.

      • step (float): The step size for the resampling.

      • range (Tuple[float, float]): The range of values for the dimension as (start, end).

      • invert (bool, optional): Whether to invert the dimension coordinates. Defaults to False.

  • ds (xarray.Dataset) – The xarray dataset to compare the resampler with.

Returns:

A tuple containing:

  • A list of dictionaries where each dictionary represents a combination of intervals for each dimension.

  • A list of dictionaries where each dictionary represents a combination of indices for each dimension.

  • A dictionary where keys are dimension names and values are lists of intervals for each dimension.

Return type:

Tuple[List[Dict[str, Union[int, float, List[Union[int, float]]]]], List[Dict[str, int]], Dict[str, List[Union[int, List[Union[int, float]]]]]]
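The three return values are easier to picture with a small example. The sketch below is not the library's implementation: `sketch_windows` is a hypothetical helper that assumes each range divides evenly by its step, ignores `invert`, and does not add dimensions that appear only in the dataset.

```python
from itertools import product
from typing import Any, Dict, List, Tuple

Interval = List[float]


def sketch_windows(
    resampler: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Interval]], List[Dict[str, int]], Dict[str, List[Interval]]]:
    """Hypothetical helper: per-dimension intervals and all their combinations."""
    intervals: Dict[str, List[Interval]] = {}
    for spec in resampler:
        start, end = spec["range"]
        step = spec["step"]
        # Count whole steps with integers to avoid accumulating float error.
        count = int(round((end - start) / step))
        intervals[spec["dimension"]] = [
            [start + i * step, start + (i + 1) * step] for i in range(count)
        ]

    dims = list(intervals)
    windows: List[Dict[str, Interval]] = []
    indices: List[Dict[str, int]] = []
    # Cartesian product over the interval positions of every dimension.
    for combo in product(*(range(len(intervals[d])) for d in dims)):
        indices.append(dict(zip(dims, combo)))
        windows.append({d: intervals[d][i] for d, i in zip(dims, combo)})
    return windows, indices, intervals


resampler = [
    {"dimension": "latitude", "step": 30, "range": (-90, 90)},
    {"dimension": "longitude", "step": 90, "range": (-180, 180)},
]
windows, indices, intervals = sketch_windows(resampler)
```

With six latitude intervals and four longitude intervals, `windows` and `indices` each hold 24 entries, and entry `i` of one describes the same window as entry `i` of the other.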

down_scale

resampling.down_scale.down_scale_in_batches(ds: Dataset, my_store: ObjectStore, dest_zarr: str, resampler: List[Dict[str, str | float | Tuple[float, float] | bool]], variables: List[str], batch_size: int, workers: int, logs: bool | None = True, over_write: bool | None = True, start_batch: int | None = None, end_batch: int | None = None) → None

Downscales the dataset in batches and stores the results in a Zarr store.

This function processes a large dataset by splitting it into smaller windows, downscaling each window, and then storing the downscaled data in a Zarr store in batches. It utilizes threading to process the windows concurrently.

Parameters:
  • ds (xr.Dataset) – The input xarray dataset to be downscaled.

  • my_store (ObjectStore) – An instance of ObjectStore which handles interactions with Zarr stores, such as checking existence, deleting, creating, and writing to the Zarr store.

  • dest_zarr (str) – The path or identifier of the destination Zarr store where the downscaled data will be saved.

  • resampler (List[Dict[str, Union[str, float, Tuple[float, float], bool]]]) – A list of dictionaries specifying the resampling parameters for each dimension. Each dictionary contains:

      • dimension (str): The name of the dimension to resample.

      • step (float): The step size for the resampling.

      • range (Tuple[float, float]): The range of values for the dimension as (start, end).

      • invert (bool, optional): Whether to invert the dimension coordinates. Defaults to False.

  • variables (List[str]) – A list of variable names to be processed and downscaled.

  • batch_size (int) – The number of windows to process in a single batch.

  • workers (int) – The number of worker threads to use for parallel processing.

  • logs (Optional[bool]) – Whether to log progress messages. Defaults to True.

Returns:

None

Return type:

None

This function performs the following steps:

  1. It starts resource monitoring and sets up logging.

  2. It calculates the necessary windows and indices for processing the dataset.

  3. It checks if the target Zarr store exists. If it does, the store is deleted and recreated.

  4. It iteratively processes each variable by:

      • Splitting the dataset into windows of data.

      • Downscaling the data within each window using multithreading.

      • Writing the downscaled data to the Zarr store in batches.

  5. It logs the progress and completion of each batch and variable.
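The batch-and-thread pattern in step 4 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the function's actual code: `batched`, `process_in_batches`, and the two callables are hypothetical, and the "downscaling" and "writing" are trivial stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence

Window = Dict[str, List[float]]


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_in_batches(
    windows: List[Window],
    downscale_window: Callable[[Window], float],
    write_batch: Callable[[int, List[float]], None],
    batch_size: int,
    workers: int,
) -> None:
    """Downscale each batch of windows concurrently, then write the batch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_number, batch in enumerate(batched(windows, batch_size)):
            # pool.map preserves input order, so values line up with windows.
            values = list(pool.map(downscale_window, batch))
            write_batch(batch_number, values)


# Stand-ins for the real work: average the interval bounds, collect in memory.
written: Dict[int, List[float]] = {}
windows = [{"longitude": [low, low + 10]} for low in range(0, 50, 10)]
process_in_batches(
    windows,
    downscale_window=lambda w: sum(w["longitude"]) / 2,
    write_batch=lambda number, values: written.__setitem__(number, values),
    batch_size=2,
    workers=4,
)
```

Only one batch of results is held in memory at a time, which is the point of choosing `batch_size` separately from `workers`.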

resampling.down_scale.down_scale_on_the_fly(ds: Dataset, resampler: List[Dict[str, str | float | Tuple[float, float] | bool]]) → Dataset

Downscale an xarray.Dataset by resampling its dimensions based on specified parameters.

This function performs downscaling on the fly by:

  • Creating new coordinates for each dimension to match the specified step sizes.

  • Slicing and interpolating the dataset based on these new coordinates.

Parameters:
  • ds (xarray.Dataset) – The xarray.Dataset to downscale.

  • resampler (List[Dict[str, Union[str, float, Tuple[float, float], bool]]]) – A list of dictionaries specifying the resampling parameters for each dimension. Each dictionary contains:

      • dimension (str): The name of the dimension to resample.

      • step (float): The step size for the resampling.

      • range (Tuple[float, float]): The range of values for the dimension as (start, end).

      • invert (bool, optional): Whether to invert the dimension coordinates. Defaults to False.

Returns:

A downscaled xarray.Dataset with interpolated values on new coordinates.

Return type:

xarray.Dataset

Raises:

ValueError – If generated coordinates for a dimension are empty due to invalid range or step values.
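The two stages, generating new coordinates and interpolating onto them, can be sketched for a single dimension in plain Python. This is an illustration only: `new_coordinates` and `interpolate` are hypothetical helpers, linear interpolation and an inclusive end point are assumptions, and the real function operates on xarray objects.

```python
from typing import List, Sequence, Tuple


def new_coordinates(value_range: Tuple[float, float], step: float) -> List[float]:
    """Generate coordinates from start to end (inclusive) spaced by step."""
    start, end = value_range
    if step <= 0 or end < start:
        raise ValueError(f"No coordinates for range={value_range}, step={step}")
    # The small epsilon guards against float error just below a whole step.
    count = int((end - start) / step + 1e-9) + 1
    return [start + i * step for i in range(count)]


def interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Linearly interpolate ys (sampled at ascending xs) at position x."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"{x} lies outside [{xs[0]}, {xs[-1]}]")
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            weight = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + weight * (ys[i + 1] - ys[i])
    return ys[-1]  # reached only when xs holds a single point


lon = [0.0, 1.0, 2.0, 3.0, 4.0]
temp = [10.0, 12.0, 14.0, 16.0, 18.0]
coarse_lon = new_coordinates((0.0, 4.0), step=2.0)
coarse_temp = [interpolate(lon, temp, x) for x in coarse_lon]
```

A non-positive step or a reversed range yields no coordinates, which this sketch reports as the ValueError described above.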

plot_logs

resampling.plot_logs.plot_logs(resource_log: str | None = 'log_resources.log', event_log: str | None = 'log_events.log', show: bool | None = False) → None

Plots resource usage and event data from log files.

This function reads resource and event logs from specified files and generates a plot that shows memory usage and active threads over time, as well as event markers for variable (VAR) events.

Parameters:
  • resource_log (Optional[str]) – Path to the resource log file. Default is ‘log_resources.log’.

  • event_log (Optional[str]) – Path to the event log file. Default is ‘log_events.log’.

  • show (Optional[bool]) – If True, display the plot. If False, save the plot to a file. Default is False.

Returns:

None

Return type:

None

plot_zarr

resampling.plot_zarr.plot_dataset(ds: Dataset, var: str, name: str) → None

Plots a variable from an xarray Dataset as an image.

This function extracts the specified variable from the xarray Dataset, visualizes it using imshow, and saves the plot as a PNG file.

Parameters:
  • ds (xr.Dataset) – The xarray Dataset containing the variable to plot.

  • var (str) – The name of the variable to plot from the Dataset.

  • name (str) – The name to use for the output PNG file (excluding extension).

Returns:

None

Return type:

None

Raises:
  • KeyError – If the variable var is not found in the Dataset.

  • ValueError – If the Dataset does not contain the variable data.

transform

resampling.transform.combine_datasets(datasets)

Combine multiple xarray.Dataset objects into a single dataset.

Parameters:

datasets (list of xarray.Dataset) – List of datasets to combine.

Returns:

Combined dataset.

Return type:

xarray.Dataset

resampling.transform.expand_to_global_coverage(ds: Dataset, step_lon: float | int, step_lat: float | int) → Dataset

Expands a dataset to cover the global latitude and longitude range of -90 to 90 degrees latitude and -180 to 180 degrees longitude. The resolution of the expanded dataset is defined by the step sizes provided. Areas where the original dataset did not have data are filled with NaN values.

Parameters:
  • ds (xr.Dataset) – The original xarray.Dataset, which should have coordinates ‘longitude’ and ‘latitude’.

  • step_lon (Union[float, int]) – The resolution of the new dataset in the longitude dimension.

  • step_lat (Union[float, int]) – The resolution of the new dataset in the latitude dimension.

Returns:

An xarray.Dataset that covers the global latitude and longitude range with the specified resolution.

Return type:

xarray.Dataset
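The idea of padding a regional dataset out to a global grid can be sketched with a dictionary keyed by (latitude, longitude). This is not the function's implementation: `axis` and `expand_to_global` are hypothetical helpers, the original points are assumed to lie exactly on the global grid, and including both -180 and 180 as longitudes is an assumption.

```python
import math
from typing import Dict, List, Tuple

Grid = Dict[Tuple[float, float], float]  # (lat, lon) -> value


def axis(start: float, stop: float, step: float) -> List[float]:
    """Coordinates from start to stop inclusive, built by integer counting."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def expand_to_global(data: Grid, step_lon: float, step_lat: float) -> Grid:
    """Place the data on a global grid, using NaN where no value exists."""
    lats = axis(-90, 90, step_lat)
    lons = axis(-180, 180, step_lon)
    return {
        (lat, lon): data.get((lat, lon), math.nan)
        for lat in lats
        for lon in lons
    }


regional = {(0, 0): 21.5, (0, 90): 23.0}
global_grid = expand_to_global(regional, step_lon=90, step_lat=45)
filled = [key for key, value in global_grid.items() if not math.isnan(value)]
```

The resulting grid has 5 × 5 cells, of which only the two original points carry values; every other cell is NaN.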

resampling.transform.make_pyramid(ds, pixels_per_tile, version, levels) → DataTree

Transforms an xarray Dataset into a DataTree pyramid ready to be used in the carbonplan smart viewer.

Parameters:
  • ds (xarray.Dataset) – The dataset to transform.

  • pixels_per_tile

  • version – Stored as a parameter of the output dataset.

  • levels (int) – The number of zoom levels in the pyramid.

Returns:

The pyramid as a DataTree.