API
object_store
- class resampling.object_store.ObjectStore(endpoint_url: str, aws_access_key_id: str, aws_secret_access_key: str, aws_session_token: str, bucket: str)
Bases: object
Manages interactions with an S3-compatible object storage system and Zarr datasets.
This class allows you to configure S3 credentials, interact with Zarr datasets stored in an S3 bucket, and perform various operations such as extracting datasets and checking for the existence of Zarr stores.
- Parameters:
endpoint_url (str) – The URL of the S3-compatible object storage endpoint.
aws_access_key_id (str) – The AWS access key for authentication.
aws_secret_access_key (str) – The AWS secret key for authentication.
aws_session_token (str) – AWS session token for temporary credentials.
bucket (str) – The name of the S3 bucket where Zarr datasets are stored.
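The five constructor arguments are plain strings, so one way to avoid hard-coding credentials is to read them from the environment. The following is a minimal sketch, not part of the package: the helper name `object_store_kwargs` and the variable names `S3_ENDPOINT_URL` and `S3_BUCKET` are assumptions made for illustration, while the `AWS_*` names follow the usual AWS convention.

```python
import os
from typing import Dict, Mapping


def object_store_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the documented constructor arguments from environment variables.

    Hypothetical helper: S3_ENDPOINT_URL and S3_BUCKET are names chosen for
    this sketch and are not defined by the resampling package.
    """
    names = {
        "endpoint_url": "S3_ENDPOINT_URL",
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "aws_session_token": "AWS_SESSION_TOKEN",
        "bucket": "S3_BUCKET",
    }
    # Fail early with a readable message instead of passing None downstream.
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {param: env[var] for param, var in names.items()}
```

With the package installed, the resulting dictionary can be unpacked into the constructor as `ObjectStore(**object_store_kwargs())`.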
- check_zarr_exists(zarr_store_path: str) bool
Checks if a Zarr store exists in the specified S3 path.
- create_empty_zarr(zarr_name: str, coordinate_ranges: Dict[str, List[int | List[int]]], variables: List[str]) Dataset
Creates an empty Zarr store with the specified coordinate ranges and variables on S3.
- Parameters:
- Returns:
The created xarray Dataset.
- Return type:
xarray.Dataset
- delete_zarr(zarr_store_path: str) None
Deletes a Zarr store from the specified S3 path.
- Parameters:
zarr_store_path (str) – The path to the Zarr store to be deleted within the S3 bucket.
- Returns:
None
- Return type:
None
- Raises:
FileNotFoundError – If the Zarr store does not exist.
Exception – If an error occurs while attempting to delete the Zarr store.
- extract_zarr(name: str, var: str | None = None, lon_range: Tuple[float, float] | None = None, lat_range: Tuple[float, float] | None = None) Dataset
Extracts a Zarr dataset from the specified S3 bucket. Optionally, subsets the dataset based on variable, longitude, and latitude ranges.
- Parameters:
name (str) – The name of the Zarr dataset within the bucket.
var (Optional[str]) – The variable to extract from the dataset. If None, the full dataset is returned.
lon_range (Optional[Tuple[float, float]]) – The longitude range to subset the dataset (min, max). If None, no subsetting is performed.
lat_range (Optional[Tuple[float, float]]) – The latitude range to subset the dataset (min, max). If None, no subsetting is performed.
- Returns:
The extracted and optionally subsetted xarray dataset.
- Return type:
xarray.Dataset
- Raises:
ValueError – If the specified variable is not found in the dataset.
- write_zarr(dataset: DataTree | Dataset, name: str | None = None, mode: str | None = None) None
Writes a Dataset or DataTree to a Zarr store on S3.
- Parameters:
dataset (datatree.DataTree | xarray.Dataset) – The xarray Dataset or datatree DataTree to be written to Zarr format.
name (Optional[str]) – The name of the Zarr store. If None, a default name with the current timestamp is used.
mode (Optional[str]) – The mode in which to open the Zarr store. Defaults to ‘w’ (write). Other options include ‘a’ (append) and ‘r+’ (read and write).
- Returns:
None
- Return type:
None
- write_zarr_batch(zarr_store_path: str, variable_name: str, batch_values: ndarray, indexes: list) None
Writes a batch of values to a specific variable in a Zarr store on S3.
- Parameters:
zarr_store_path (str) – The path to the Zarr store within the S3 bucket.
variable_name (str) – The name of the variable to which the batch values will be written.
batch_values (np.ndarray) – A NumPy array of values to be written to the Zarr store.
indexes (list) – A list of dictionaries representing the indices for each dimension of the variable.
- Returns:
None
- Return type:
None
- Raises:
IndexError – If the provided indices are out of bounds for the specified variable.
ValueError – If the batch of values is empty or contains NaN values.
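The error conditions listed above can be checked before a write is attempted. The sketch below is an illustration only and does not reproduce the package's implementation. It assumes that `indexes` holds one dictionary per value, mapping a dimension name to an integer position. A plain sequence of floats stands in for the NumPy array, and the `dim_sizes` argument and the length check are assumptions added for the example.

```python
import math
from typing import Dict, List, Sequence


def validate_batch(
    batch_values: Sequence[float],
    indexes: List[Dict[str, int]],
    dim_sizes: Dict[str, int],
) -> None:
    """Raise ValueError or IndexError for the documented failure cases.

    Hypothetical helper; dim_sizes maps each dimension name to its length.
    """
    if len(batch_values) == 0:
        raise ValueError("batch_values is empty")
    if any(math.isnan(value) for value in batch_values):
        raise ValueError("batch_values contains NaN")
    # Assumption of this sketch: one index dictionary per value.
    if len(indexes) != len(batch_values):
        raise ValueError("expected one index dictionary per value")
    for position in indexes:
        for dim, i in position.items():
            size = dim_sizes[dim]
            if not 0 <= i < size:
                raise IndexError(
                    f"index {i} is out of bounds for dimension {dim!r} of size {size}"
                )
```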
down_scale
- resampling.down_scale.down_scale_in_batches(ds: Dataset, my_store: ObjectStore, dest_zarr: str, resampler: List[Dict[str, str | float | Tuple[float, float] | bool]], variables: List[str], batch_size: int, workers: int, logs: bool | None = True, over_write: bool | None = True, start_batch: int | None = None, end_batch: int | None = None) None
Downscale the dataset in batches and store the results in a Zarr format.
This function processes a large dataset by splitting it into smaller windows, downscaling each window, and then storing the downscaled data in a Zarr store in batches. It utilizes threading to process the windows concurrently.
- Parameters:
ds (xr.Dataset) – The input xarray dataset to be downscaled.
my_store (ObjectStore) – An instance of ObjectStore which handles interactions with Zarr stores, such as checking existence, deleting, creating, and writing to the Zarr store.
dest_zarr (str) – The path or identifier of the destination Zarr store where the downscaled data will be saved.
resampler (List[Dict[str, Union[str, float, Tuple[float, float], bool]]]) – A list of dictionaries specifying the resampling parameters for each dimension. Each dictionary accepts the following keys:
* dimension (str): The name of the dimension to resample.
* step (float): The step size for the resampling.
* range (Tuple[float, float]): The range of values for the dimension as (start, end).
* invert (bool, optional): Whether to invert the dimension coordinates. Defaults to False.
variables (List[str]) – A list of variable names to be processed and downscaled.
batch_size (int) – The number of windows to process in a single batch.
workers (int) – The number of worker threads to use for parallel processing.
logs (Optional[bool]) – Whether to log progress messages. Defaults to True.
- Returns:
None
- Return type:
None
This function performs the following steps:
It starts resource monitoring and sets up logging.
It calculates the necessary windows and indices for processing the dataset.
It checks if the target Zarr store exists. If it does, the store is deleted and recreated.
It iteratively processes each variable by:
* Splitting the dataset into windows of data.
* Downscaling the data within each window using multithreading.
* Writing the downscaled data to the Zarr store in batches.
It logs the progress and completion of each batch and variable.
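The batching pattern described in these steps can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the package's code: `iter_batches` and `process_in_batches` are hypothetical names, `process_window` stands in for the per-window downscaling, and `write_batch` stands in for the write to the Zarr store.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

W = TypeVar("W")
R = TypeVar("R")


def iter_batches(windows: Sequence[W], batch_size: int) -> Iterator[Sequence[W]]:
    """Yield consecutive slices of at most batch_size windows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(windows), batch_size):
        yield windows[start:start + batch_size]


def process_in_batches(
    windows: Sequence[W],
    batch_size: int,
    workers: int,
    process_window: Callable[[W], R],
    write_batch: Callable[[List[R]], None],
) -> int:
    """Process each batch on a thread pool, then hand the results to write_batch.

    Returns the number of batches written.
    """
    n_batches = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in iter_batches(windows, batch_size):
            # pool.map returns results in input order, so each result
            # still lines up with the window it came from.
            results = list(pool.map(process_window, batch))
            write_batch(results)
            n_batches += 1
    return n_batches
```

Writing once per batch rather than once per window keeps the number of store operations low while bounding how many results are held in memory at a time.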
- resampling.down_scale.down_scale_on_the_fly(ds: Dataset, resampler: List[Dict[str, str | float | Tuple[float, float] | bool]]) Dataset
Downscale an xarray.Dataset by resampling its dimensions based on specified parameters.
This function performs downscaling on-the-fly by:
* Creating new coordinates for each dimension to match the specified step sizes.
* Slicing and interpolating the dataset based on these new coordinates.
- Parameters:
ds (xarray.Dataset) – The xarray.Dataset to downscale.
resampler (List[Dict[str, Union[str, float, Tuple[float, float], bool]]]) – A list of dictionaries specifying the resampling parameters for each dimension. Each dictionary accepts the following keys:
* dimension (str): The name of the dimension to resample.
* step (float): The step size for the resampling.
* range (Tuple[float, float]): The range of values for the dimension as (start, end).
* invert (bool, optional): Whether to invert the dimension coordinates. Defaults to False.
- Returns:
A downscaled xarray.Dataset with interpolated values on new coordinates.
- Return type:
xarray.Dataset
- Raises:
ValueError – If generated coordinates for a dimension are empty due to invalid range or step values.
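To make the resampler format concrete, the sketch below derives target coordinates from a single resampler entry and raises ValueError when the result is empty, mirroring the documented error. It is an illustration under assumptions rather than the package's implementation: the name `new_coordinates` is hypothetical, the end of the range is treated as inclusive, and `invert` is interpreted as reversing the coordinate order.

```python
import math
from typing import Dict, List, Tuple, Union

ResamplerEntry = Dict[str, Union[str, float, Tuple[float, float], bool]]


def new_coordinates(entry: ResamplerEntry) -> List[float]:
    """Build evenly spaced coordinates for one resampler entry."""
    start, end = entry["range"]
    step = entry["step"]
    if step <= 0:
        coords: List[float] = []
    else:
        # Count the points with integer arithmetic so rounding errors do not
        # accumulate; the small epsilon keeps the end point when it lies
        # exactly on the grid.
        count = max(0, math.floor((end - start) / step + 1e-9) + 1)
        coords = [start + i * step for i in range(count)]
    if not coords:
        raise ValueError(
            f"no coordinates generated for dimension {entry['dimension']!r}: "
            f"range={entry['range']}, step={step}"
        )
    if entry.get("invert", False):
        coords.reverse()  # assumption: invert means descending order
    return coords


resampler = [
    {"dimension": "lat", "step": 0.5, "range": (0.0, 2.0), "invert": True},
    {"dimension": "lon", "step": 1.0, "range": (10.0, 13.0)},
]
targets = {entry["dimension"]: new_coordinates(entry) for entry in resampler}
```

A list in this shape is what the `resampler` parameter of both downscaling functions expects.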
plot_logs
- resampling.plot_logs.plot_logs(resource_log: str | None = 'log_resources.log', event_log: str | None = 'log_events.log', show: bool | None = False) None
Plots resource usage and event data from log files.
This function reads resource and event logs from specified files and generates a plot that shows memory usage and active threads over time, as well as event markers for variable (VAR) events.
- Parameters:
- Returns:
None
- Return type:
None
plot_zarr
- resampling.plot_zarr.plot_dataset(ds: Dataset, var: str, name: str) None
Plots a variable from an xarray Dataset as an image.
This function extracts the specified variable from the xarray Dataset, visualizes it using imshow, and saves the plot as a PNG file.
- Parameters:
- Returns:
None
- Return type:
None
- Raises:
KeyError – If the variable var is not found in the Dataset.
ValueError – If the Dataset does not contain the variable data.
make_global
- resampling.make_global.expand_to_global_coverage(ds, step_lon, step_lat)
Expands a dataset to global latitude and longitude coverage, aligning the original coordinates with the global grid.
- Parameters:
ds – Input xarray.Dataset with latitude, longitude, and data variables.
step_lon – Longitude resolution for the global dataset.
step_lat – Latitude resolution for the global dataset.
- Returns:
Expanded xarray.Dataset with global coverage.
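As a rough illustration of what aligning with a global grid involves, the sketch below builds a global axis from a step size and locates where a regional axis starts within it. It is not the package's implementation. The helper names `global_axis` and `offset_in_axis` are hypothetical, and the bounds of -90 to 90 for latitude and -180 to 180 for longitude, both inclusive, are assumptions of this example.

```python
from typing import List


def global_axis(lower: float, upper: float, step: float) -> List[float]:
    """Return evenly spaced values from lower to upper, both inclusive."""
    if step <= 0:
        raise ValueError("step must be positive")
    count = int(round((upper - lower) / step)) + 1
    return [lower + i * step for i in range(count)]


def offset_in_axis(axis: List[float], first_value: float, tol: float = 1e-9) -> int:
    """Return the index in axis that matches first_value within tol."""
    for i, value in enumerate(axis):
        if abs(value - first_value) <= tol:
            return i
    raise ValueError(f"{first_value} does not lie on the global grid")


# Assumed global bounds; the package may use a different convention.
global_lat = global_axis(-90.0, 90.0, 45.0)
global_lon = global_axis(-180.0, 180.0, 90.0)

# A regional dataset whose first latitude is 0.0 would start at this index.
lat_offset = offset_in_axis(global_lat, 0.0)
```

The ValueError raised by `offset_in_axis` signals a regional grid that is not aligned with the chosen global resolution.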