ochanticipy.utils package

Submodules

ochanticipy.utils.check_extra_imports module

Check imports that are in extras_require.

ochanticipy.utils.check_extra_imports.check_extra_imports(libraries: list, subpackage: str)[source]

Check that libraries are installed and available.

Parameters:
  • libraries (list) – List of libraries to check.

  • subpackage (str) – Name of the subpackage defined in extras_require that the warning should tell the user to install.
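
A minimal sketch of how such a check could work, using only the standard library. The helper names find_missing_libraries and check_libraries are hypothetical, and raising ModuleNotFoundError with a pip hint is an assumption; the actual function may warn or report differently.

```python
import importlib.util


def find_missing_libraries(libraries: list) -> list:
    """Return the top-level names in `libraries` that cannot be imported."""
    return [
        name
        for name in libraries
        if importlib.util.find_spec(name) is None
    ]


def check_libraries(libraries: list, subpackage: str) -> None:
    """Raise an error naming the extra to install if anything is missing."""
    missing = find_missing_libraries(libraries)
    if missing:
        # Assumed message format; the real wording is not documented here.
        raise ModuleNotFoundError(
            f"Missing libraries: {', '.join(missing)}. "
            f"Install them with: pip install ochanticipy[{subpackage}]"
        )
```

Checking with find_spec avoids actually importing the optional libraries, which keeps the check cheap.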

ochanticipy.utils.check_file_existence module

Function for checking file existence.

ochanticipy.utils.check_file_existence.check_file_existence(wrapper=None, enabled=None, adapter=None, proxy=<class 'FunctionWrapper'>) F[source]

Don’t overwrite existing data.

Avoid recreating data if it already exists, unless the user has set clobber to True. Used to wrap functions that accept filepath as a keyword argument.

Parameters:
  • wrapped (function) – The function to wrap. The function must have “filepath” as a keyword parameter, and it can also have an optional “clobber” boolean keyword parameter.

  • instance (Optional[DataSource]) – Object the wrapped function is bound to. Not used within, but ensures that instance methods do not pass self to args.

  • args (list) – List of positional arguments.

  • kwargs (dict) – Dictionary of keyword arguments

Returns:

  • If filepath exists and clobber is False, returns filepath.

  • Otherwise, returns the result of the decorated function.

Raises:

KeyError – If filepath or clobber are not passed as kwargs.
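
The behaviour described above can be sketched with a plain functools decorator. The name skip_if_exists is hypothetical, and this sketch treats clobber as optional with a default of False, which is an assumption; the library's decorator is built on a FunctionWrapper proxy and may differ in detail.

```python
import functools
from pathlib import Path


def skip_if_exists(func):
    """Hypothetical stand-in for check_file_existence."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Raises KeyError if filepath was not passed as a keyword argument.
        filepath = Path(kwargs["filepath"])
        # Assumption: a missing clobber keyword means "do not overwrite".
        clobber = kwargs.get("clobber", False)
        if filepath.exists() and not clobber:
            return filepath
        return func(*args, **kwargs)

    return wrapper
```

A function decorated this way only runs when its target file is missing or when clobber=True is passed.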

ochanticipy.utils.dates module

Functions for dealing with dates.

ochanticipy.utils.dates.compare_dekads_gt(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) bool[source]

Is year1/dekad1 greater than year2/dekad2.

Compare two (year, dekad) pairs and check whether the first pair is greater than the second pair.

ochanticipy.utils.dates.compare_dekads_gte(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) bool[source]

Is year1/dekad1 greater than or equal to year2/dekad2.

Compare two (year, dekad) pairs and check whether the first pair is greater than or equal to the second pair.

ochanticipy.utils.dates.compare_dekads_lt(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) bool[source]

Is year1/dekad1 less than year2/dekad2.

Compare two (year, dekad) pairs and check whether the first pair is less than the second pair.

ochanticipy.utils.dates.compare_dekads_lte(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) bool[source]

Is year1/dekad1 less than or equal to year2/dekad2.

Compare two (year, dekad) pairs and check whether the first pair is less than or equal to the second pair.
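
As an illustration of the four comparisons, assuming each dekad is a (year, dekad) tuple, Python's built-in lexicographic tuple ordering already gives the expected result. The function names below are hypothetical and this is not necessarily how the library implements them.

```python
from typing import Tuple

Dekad = Tuple[int, int]  # assumed layout: (year, dekad number)


def dekad_gt(dekad1: Dekad, dekad2: Dekad) -> bool:
    # Tuples compare element by element: year first, then dekad number.
    return dekad1 > dekad2


def dekad_gte(dekad1: Dekad, dekad2: Dekad) -> bool:
    return dekad1 >= dekad2


def dekad_lt(dekad1: Dekad, dekad2: Dekad) -> bool:
    return dekad1 < dekad2


def dekad_lte(dekad1: Dekad, dekad2: Dekad) -> bool:
    return dekad1 <= dekad2
```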

ochanticipy.utils.dates.date_to_dekad(date_obj: date) Tuple[int, int][source]

Compute dekad and year from date.

Dekad computed from date. This is based on the common dekadal definition in which the 1st and 2nd dekads of a month are its first two 10-day periods, and the 3rd dekad is the remaining days of that month.

ochanticipy.utils.dates.dekad_to_date(dekad: Tuple[int, int]) date[source]

Compute date from dekad and year.

Date computed from dekad and year, returned as a date object corresponding to the first day of the dekad. This is based on the common dekadal definition in which the 1st and 2nd dekads of a month are its first two 10-day periods, and the 3rd dekad is the remaining days of that month.
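
The dekadal definition above can be sketched as follows. This assumes dekads are numbered 1 to 36 across the year and that tuples are ordered (year, dekad); the function names are hypothetical and the library's own implementation may differ.

```python
from datetime import date
from typing import Tuple


def date_to_dekad_sketch(date_obj: date) -> Tuple[int, int]:
    # Days 1-10 map to 0, days 11-20 to 1, and days 21 onward to 2.
    dekad_in_month = min((date_obj.day - 1) // 10, 2)
    dekad = (date_obj.month - 1) * 3 + dekad_in_month + 1
    return date_obj.year, dekad


def dekad_to_date_sketch(dekad: Tuple[int, int]) -> date:
    year, dekad_number = dekad
    month = (dekad_number - 1) // 3 + 1
    # First day of the dekad: the 1st, 11th or 21st of the month.
    day = ((dekad_number - 1) % 3) * 10 + 1
    return date(year, month, day)
```

Under these assumptions, 31 January falls in dekad 3 and dekad 36 starts on 21 December.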

ochanticipy.utils.dates.expand_dekads(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) List[Tuple[int, int]][source]

Expand to all years/dekads between two dekads.

Takes two (year, dekad) pairs and returns a list of (year, dekad) tuples.
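
One way to picture the expansion, assuming 36 dekads per year, (year, dekad) tuples, and that both end points are included in the result. Whether the real function is inclusive at both ends is not stated here, so treat that as an assumption; the function name is hypothetical.

```python
from typing import List, Tuple


def expand_dekads_sketch(
    dekad1: Tuple[int, int], dekad2: Tuple[int, int]
) -> List[Tuple[int, int]]:
    year1, number1 = dekad1
    year2, number2 = dekad2
    # Convert each pair to a running dekad index so the range is contiguous.
    start = year1 * 36 + (number1 - 1)
    end = year2 * 36 + (number2 - 1)
    return [(index // 36, index % 36 + 1) for index in range(start, end + 1)]
```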

ochanticipy.utils.dates.get_date_from_user_input(input_date: date | str) date[source]

Return date from string or date input.

Processes an input date given either as a datetime.date object or as an ISO8601 string. Generates an error message if a different type of object is provided.

Parameters:

input_date (Union[date, str]) – datetime.date object or ISO8601 string.

Returns:

datetime.date

Return type:

date
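
A minimal sketch of this kind of input handling, using date.fromisoformat from the standard library. The function name parse_user_date and the choice of ValueError are assumptions made for illustration.

```python
from datetime import date
from typing import Union


def parse_user_date(input_date: Union[date, str]) -> date:
    if isinstance(input_date, date):
        return input_date
    if isinstance(input_date, str):
        # Accepts ISO8601 calendar dates such as "2022-03-15".
        return date.fromisoformat(input_date)
    # Assumed error type; the documented function only says it reports an error.
    raise ValueError(
        "input_date must be a datetime.date or an ISO8601 string, "
        f"got {type(input_date).__name__}"
    )
```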

ochanticipy.utils.dates.get_dekadal_date(input_date: date | str | Tuple[int, int] | None, default_date: date | str | Tuple[int, int] | None = None) Tuple[int, int][source]

Calculate dekadal date from general input.

Processes input_date and returns two values, the year and dekad. Input can be a datetime.date, an ISO8601 date string, an already calculated (year, dekad) tuple, or None. If None, default_date is returned. default_date can also be passed in any of the above formats.

ochanticipy.utils.geoboundingbox module

Functionality to retrieve and modify boundary coordinates.

It is possible to create a GeoBoundingBox object either from lat_max, lat_min, lon_max, lon_min coordinates, or from a shapefile that has been read in with geopandas.

class ochanticipy.utils.geoboundingbox.GeoBoundingBox(lat_max: float, lat_min: float, lon_max: float, lon_min: float)[source]

Bases: object

Create an object containing the bounds of an area.

Standard geographic coordinate system is used where latitude runs from -90 to 90 degrees, and longitude from -180 to 180. North must always be greater than south, and east greater than west.

Parameters:
  • lat_max (float) – The northern latitude boundary of the area (degrees). The value must be between -90 and 90, and greater than or equal to the southern boundary.

  • lat_min (float) – The southern latitude boundary of the area (degrees). The value must be between -90 and 90, and less than or equal to the northern boundary.

  • lon_max (float) – The easternmost longitude boundary of the area (degrees). The value must be between -180 and 180, and greater than or equal to the western boundary.

  • lon_min (float) – The westernmost longitude boundary of the area (degrees). The value must be between -180 and 180, and less than or equal to the eastern boundary.
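
The constraints listed above can be illustrated with a small validating container. This is a sketch only: BoundsSketch is a hypothetical name, and raising ValueError on invalid input is an assumption rather than documented behaviour of GeoBoundingBox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundsSketch:
    lat_max: float
    lat_min: float
    lon_max: float
    lon_min: float

    def __post_init__(self) -> None:
        # South <= north, both within [-90, 90].
        if not -90 <= self.lat_min <= self.lat_max <= 90:
            raise ValueError("Latitudes must satisfy -90 <= lat_min <= lat_max <= 90")
        # West <= east, both within [-180, 180].
        if not -180 <= self.lon_min <= self.lon_max <= 180:
            raise ValueError("Longitudes must satisfy -180 <= lon_min <= lon_max <= 180")
```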

classmethod from_shape(shape: GeoSeries | GeoDataFrame) GeoBoundingBox[source]

Create GeoBoundingBox from a geopandas object.

Parameters:

shape (geopandas.GeoSeries, geopandas.GeoDataFrame) – A shape whose bounds will be retrieved

Return type:

GeoBoundingBox from the total bounds of the GeoDataFrame

Examples

>>> import geopandas as gpd
>>> df_admin_boundaries = gpd.read_file("admin0_boundaries.gpkg")
>>> geobb = GeoBoundingBox.from_shape(df_admin_boundaries)

get_filename_repr(precision: int = 0) str[source]

Get succinct boundary representation for usage in filenames.

Parameters:

precision (int, default = 0) – Precision, i.e. number of decimal places to round to. Default is 0 for ints.

Return type:

String containing N, S, E and W coordinates.

property lat_max: float

Get the northern latitude boundary of the area (degrees).

property lat_min: float

Get the southern latitude boundary of the area (degrees).

property lon_max: float

Get the eastern longitude boundary of the area (degrees).

property lon_min: float

Get the western longitude boundary of the area (degrees).

round_coords(offset_val: float = 0.0, round_val: int | float = 1) GeoBoundingBox[source]

Round the bounding box coordinates.

Rounding is always done outside the original bounding box, i.e. the resulting bounding box is always equal or larger than the original bounding box. Rounding can only be done once per instance.

Parameters:
  • offset_val (float, default = 0.0) – Offset the coordinates by this factor.

  • round_val (int or float, default = 1) – Rounds to the nearest round_val. Can be an int for integer rounding or float for decimal rounding. If 1, round to integers.

Return type:

GeoBoundingBox instance with rounded and offset coordinates
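
To illustrate what rounding "outside the original bounding box" means, the sketch below rounds maxima up and minima down to a multiple of round_val. The function name is hypothetical, and offset_val is deliberately left out because its exact semantics are not described here.

```python
import math
from typing import Tuple


def round_outward(
    lat_max: float,
    lat_min: float,
    lon_max: float,
    lon_min: float,
    round_val: float = 1,
) -> Tuple[float, float, float, float]:
    # Maxima are rounded up and minima down, so the box can only grow.
    return (
        math.ceil(lat_max / round_val) * round_val,
        math.floor(lat_min / round_val) * round_val,
        math.ceil(lon_max / round_val) * round_val,
        math.floor(lon_min / round_val) * round_val,
    )
```

With the default round_val of 1, a box of (6.3, 3.1, 5.6, -2.4) becomes (7, 3, 6, -3).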

ochanticipy.utils.hdx_api module

Use HDX python API to download data.

ochanticipy.utils.hdx_api.load_resource_from_hdx(hdx_dataset: str, hdx_resource_name: str, output_filepath: Path) Path[source]

Use the HDX API to download a dataset based on the address and dataset ID.

Parameters:
  • hdx_dataset (str) – The name of the HDX dataset where the resource is located. Can be found by taking the portion of the url after data.humdata.org/dataset/

  • hdx_resource_name (str) – Resource name on HDX. Can be found by taking the filename as it appears on the dataset page.

  • output_filepath (Path) – Target filepath for the dataset

Return type:

The full path of the downloaded dataset

ochanticipy.utils.io module

Function for I/O.

ochanticipy.utils.io.download_url(url: str, save_path: Path, chunk_size: int = 2048)[source]

Download the file located at url to save_path.

Parameters:
  • url (str) – url that contains the file to be downloaded

  • save_path (Path) – path to the location the file should be saved

  • chunk_size (int) – number of bytes to save at once
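
A chunked download can be sketched with the standard library alone. This is an illustration, not the library's implementation: download_url_sketch is a hypothetical name, and the real function may rely on a different HTTP client.

```python
import urllib.request
from pathlib import Path


def download_url_sketch(url: str, save_path: Path, chunk_size: int = 2048) -> None:
    with urllib.request.urlopen(url) as response, open(save_path, "wb") as out_file:
        while True:
            # Read at most chunk_size bytes so large files never sit fully in memory.
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out_file.write(chunk)
```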

ochanticipy.utils.io.parse_yaml(filename: str | Path) dict[source]

Read in a yaml file.

Parameters:
  • filename (str, Path) – The full filepath of the YAML file.

Return type:

A dictionary with the YAML file contents

ochanticipy.utils.io.unzip(zip_file_path: Path, save_dir: Path)[source]

Unzip a file.

Parameters:
  • zip_file_path (Path) – path to the location the zip file is saved

  • save_dir (Path) – dir path to which the content of the zip file should be saved
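
For reference, extracting an archive into a directory takes only a few lines with the standard zipfile module. The sketch below uses a hypothetical function name and is not necessarily how the library's unzip is written.

```python
import zipfile
from pathlib import Path


def unzip_sketch(zip_file_path: Path, save_dir: Path) -> None:
    # extractall creates save_dir and any nested folders as needed.
    with zipfile.ZipFile(zip_file_path, "r") as zip_ref:
        zip_ref.extractall(save_dir)
```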

ochanticipy.utils.raster module

Utilities to manipulate and analyze raster data.

The raster module provides accessor utilities for xarray data arrays and datasets, available through the oap accessor. These functions become available simply by importing the library with import ochanticipy.

Since rioxarray already extends xarray, this module’s extensions inherit from the RasterArray and RasterDataset extensions respectively. This ensures cleaner code in the module as rio methods are available immediately, but also means a couple of design decisions are followed.

The xarray.DataArray and xarray.Dataset extensions here inherit from rioxarray base classes. Thus, methods that are identical for both objects are defined in a mixin class OapRasterMixin which can be inherited by the two respective extensions.

class ochanticipy.utils.raster.OapRasterArray(xarray_object)[source]

Bases: OapRasterMixin, RasterArray

OCHA AnticiPy extension for xarray.DataArray.

compute_raster_stats(gdf: GeoDataFrame, feature_col: str, stats_list: List[str] | None = None, percentile_list: List[int] | None = None, all_touched: bool = False) DataFrame[source]

Compute raster statistics for polygon geometry.

compute_raster_stats() is designed to quickly compute raster statistics across a polygon and its features.

Parameters:
  • gdf (geopandas.GeoDataFrame) – GeoDataFrame with row per area for stats computation. If pd.DataFrame is passed, geometry column must have the name geometry.

  • feature_col (str) – Column in gdf to use as row/feature identifier.

  • stats_list (Optional[List[str]], optional) – List of statistics to calculate, by default None. Passed to get_attr().

  • percentile_list (Optional[List[int]], optional) – List of percentiles to compute, by default None.

  • all_touched (bool, optional) – If True all cells touching the region will be included, by default False. If False, only cells with their centre in the region will be included.

Returns:

Dataframe with computed statistics.

Return type:

pandas.DataFrame

Examples

>>> import geopandas as gpd
>>> import xarray as xr
>>> import rioxarray
>>> from shapely.geometry import Polygon
>>>
>>> # compute raster stats on simple data
>>> d = {
...     "name": ["area_a", "area_b"],
...     "geometry": [
...         Polygon([(0, 0), (0, 2), (2, 2), (2, 0)]),
...         Polygon([(2, 0), (2, 2), (3, 2), (3, 0)]),
...     ],
... }
>>> gdf = gpd.GeoDataFrame(d)
>>>
>>> da = xr.DataArray(
...     [[1, 2, 3], [4, 5, 6]],
...     dims=("y", "x"),
...     coords={"y": [1.5, 0.5], "x": [0.5, 1.5, 2.5]},
... ).rio.write_crs("EPSG:4326")
>>>
>>> da.oap.compute_raster_stats(
...     gdf=gdf,
...     feature_col="name"
... ) 
   mean_name            std_name min_name max_name sum_name count_name    name
0       3.0  1.5811388300841898        1        5     12.0          4  area_a
1       4.5                 1.5        3        6      9.0          2  area_b

class ochanticipy.utils.raster.OapRasterDataset(xarray_object)[source]

Bases: OapRasterMixin, RasterDataset

OCHA AnticiPy extension for xarray.Dataset.

compute_raster_stats(var_names: List[str] | str | None = None, **kwargs: Any)[source]

Compute raster statistics across dataset arrays.

compute_raster_stats() calculates raster statistics on component data arrays of a dataset. By default, calculates on all non-coordinate variables, unless a list of variable names is passed in, which then have statistics calculated for them.

Parameters:
  • var_names (Union[List[str], str, None], optional) – Dataset data array variables to calculate raster statistics on.

  • kwargs (Any) – Keyword arguments passed to the array method compute_raster_stats()

Returns:

List of raster statistics data frames.

Return type:

List[pandas.DataFrame]

Examples

>>> import geopandas as gpd
>>> import xarray as xr
>>> import rioxarray
>>> from shapely.geometry import Polygon
>>>
>>> # compute raster stats on simple data
>>> d = {
...     "name": ["area_a", "area_b"],
...     "geometry": [
...         Polygon([(0, 0), (0, 2), (2, 2), (2, 0)]),
...         Polygon([(2, 0), (2, 2), (3, 2), (3, 0)]),
...     ],
... }
>>> gdf = gpd.GeoDataFrame(d)
>>>
>>> ds = xr.DataArray(
...     [[1, 2, 3], [4, 5, 6]],
...     dims=("y", "x"),
...     coords={"y": [1.5, 0.5], "x": [0.5, 1.5, 2.5]},
... ).rio.write_crs("EPSG:4326").to_dataset(name="data")
>>>
>>> ds.oap.compute_raster_stats(
...    var_names=["data"],
...    gdf=gdf,
...    feature_col="name"
... ) 
[       mean                 std      min      max      sum      count    name
0       3.0  1.5811388300841898        1        5     12.0          4  area_a
1       4.5                 1.5        3        6      9.0          2  area_b]

get_raster_array(var_name: str) DataArray[source]

Get xarray.DataArray from variable and keep dimensions.

Accessing a component xarray.DataArray using the non-coordinate variable name loses any dimensions set through rio or oap. This includes x_dim, y_dim, and t_dim, which have to be specifically set using rio.set_spatial_dims() or oap.set_time_dim() respectively. For any dataset ds, ds.oap.get_raster_array("var") will retrieve the data array without losing the dimensions. Using ds["var"] will lose the dimensions.

Parameters:

var_name (str) – Name of variable.

Returns:

A data array.

Return type:

xarray.DataArray

Examples

>>> import xarray
>>> import numpy
>>> import pandas
>>> temp = 15 + 8 * numpy.random.randn(4, 4, 3)
>>> precip = 10 * numpy.random.rand(4, 4, 3)
>>> ds = xarray.Dataset(
...   {
...     "temperature": (["lat", "lon", "F"], temp),
...     "precipitation": (["lat", "lon", "F"], precip)
...   },
...   coords={
...     "lat":numpy.array([87, 88, 89, 90]),
...     "lon":numpy.array([5, 120, 199, 360]),
...     "F": pandas.date_range("2014-09-06", periods=3)
...   }
... )
>>> ds.oap.set_time_dim("F", inplace=True)
>>> da = ds.oap.get_raster_array("temperature")
>>> da.oap.t_dim
'F'
>>> # directly accessing array loses set dimensions
>>> ds['temperature'].oap.t_dim 
Traceback (most recent call last):
    ...
rioxarray.exceptions.DimensionError: Time dimension not found.
    'oap.set_time_dim()' or using 'rename()' to change the
    dimension name to 't' can address this.
Data variable: temperature

class ochanticipy.utils.raster.OapRasterMixin(xarray_obj)[source]

Bases: object

OCHA AnticiPy mixin base class.

change_longitude_range(to_180_range: bool = True, inplace: bool = False) DataArray | Dataset | None[source]

Convert longitude range between -180 to 180 and 0 to 360.

The standard longitude range is from -180 to 180, while some applications use 0 to 360. This includes rasterstats.zonal_stats (https://pypi.org/project/rasterstats/), which assumes ranges from 0 to 360.

change_longitude_range() will convert between the two coordinate ranges based on its current state. By default it will use the -180 to 180 range; if to_180_range is False, it will use 0 to 360. If coordinates lie solely between 0 and 180 then there is no need for conversion and the input will be returned.

Parameters:
  • to_180_range (bool, default = True) – If True, the returned range is -180 to 180. Otherwise, the returned range is 0 to 360.

  • inplace (bool, optional) – If True, will overwrite existing data array. Default is False.

Returns:

Dataset with transformed longitude coordinates.

Return type:

Union[xarray.DataArray, xarray.Dataset]
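
The arithmetic behind such a conversion can be sketched on plain numbers with modular arithmetic. The function names are hypothetical, and the accessor method may handle edge cases and coordinate re-sorting differently.

```python
def to_180_range(lon: float) -> float:
    # Shift, wrap into [0, 360), then shift back to get [-180, 180).
    return ((lon + 180) % 360) - 180


def to_360_range(lon: float) -> float:
    # Python's % always returns a non-negative result for a positive modulus.
    return lon % 360


lons_360 = [5, 120, 199, 360]
lons_180 = sorted(to_180_range(lon) for lon in lons_360)
lons_back = sorted(to_360_range(lon) for lon in lons_180)
```

Here lons_180 is [-161, 0, 5, 120] and lons_back is [0, 5, 120, 199]: a longitude of 360 wraps to 0 and stays 0 on the way back.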

Examples

>>> import xarray
>>> import numpy
>>> import pandas
>>> temp = 15 + 8 * numpy.random.randn(4, 4, 3)
>>> precip = 10 * numpy.random.rand(4, 4, 3)
>>> ds = xarray.Dataset(
...   {
...     "temperature": (["lat", "lon", "time"], temp),
...     "precipitation": (["lat", "lon", "time"], precip)
...   },
...   coords={
...     "lat":numpy.array([87, 88, 89, 90]),
...     "lon":numpy.array([5, 120, 199, 360]),
...     "time": pandas.date_range("2014-09-06", periods=3)
...   }
... )
>>> ds_inv = ds.oap.change_longitude_range()
>>> ds_inv.get_index("lon")
Index([-161, 0, 5, 120], dtype='int64', name='lon')
>>> # invert coordinates back to original, in place
>>> ds_inv.oap.change_longitude_range(to_180_range=False, inplace=True)
>>> ds_inv.get_index("lon")
Index([0, 5, 120, 199], dtype='int64', name='lon')

correct_calendar(inplace: bool = False) DataArray | Dataset | None[source]

Correct calendar attribute for recognition by xarray.

Some datasets come with a wrong calendar attribute that isn’t recognized by xarray. This function corrects the coordinate attribute to ensure that a calendar attribute exists and specifies a calendar alias that is supportable by xarray.cftime_range and NetCDF in general.

Currently ensures that calendar attributes that are either specified with units="months since" or calendar="360" explicitly have calendar="360_day". This is based on discussions in this GitHub issue. If and when further issues are found with calendar attributes, support for conversion will be added here.

Parameters:

inplace (bool, optional) – If True, it will modify the data array or dataset in place. Otherwise it will return a modified copy.

Returns:

Data array or dataset with transformed calendar coordinate.

Return type:

Union[xarray.DataArray, xarray.Dataset]
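
The rule described above can be sketched on a plain attribute dictionary. The function name is hypothetical, and the sketch only mirrors the two documented conditions; it does not touch xarray objects.

```python
def corrected_calendar_attrs(attrs: dict) -> dict:
    """Return a copy of coordinate attributes with a recognised calendar."""
    new_attrs = dict(attrs)
    units = str(new_attrs.get("units", ""))
    # Both documented cases map to the "360_day" calendar alias.
    if "months since" in units or new_attrs.get("calendar") == "360":
        new_attrs["calendar"] = "360_day"
    return new_attrs
```

Attributes that match neither condition are returned unchanged.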

Examples

>>> import xarray
>>> import numpy
>>> da = xarray.DataArray(
...  numpy.arange(64).reshape(4,4,4),
...  coords={"lat":numpy.array([87, 88, 89, 90]),
...          "lon":numpy.array([5, 120, 199, 360]),
...          "t":numpy.array([10,11,12,13])}
... )
>>> da["t"].attrs["units"] = "months since 1960-01-01"
>>> da_crct = da.oap.correct_calendar()
>>> da_crct["t"].attrs["calendar"]
'360_day'

invert_coordinates(inplace: bool = False) DataArray | Dataset | None[source]

Invert latitude and longitude in data array.

This function checks for inversion of latitude and longitude and inverts them if needed. Datasets with inverted coordinates can produce incorrect results in certain functions like rasterstats.zonal_stats(). Correctly ordered coordinates should be:

  • latitude: Largest to smallest.

  • longitude: Smallest to largest.

If data array already has correct coordinate ordering, it is directly returned. Function largely copied from https://github.com/perrygeo/python-rasterstats/issues/218.

Parameters:

inplace (bool, optional) – If True, will overwrite existing data array. Default is False.

Returns:

Data array or dataset with correct coordinate ordering.

Return type:

Union[xarray.DataArray, xarray.Dataset]

Examples

>>> import xarray
>>> import numpy
>>> da = xarray.DataArray(
...  numpy.arange(16).reshape(4,4),
...  coords={"lat":numpy.array([87, 88, 89, 90]),
...          "lon":numpy.array([70, 69, 68, 67])}
... )
>>> da.oap.invert_coordinates(inplace=True)
>>> da.get_index("lon")
Index([67, 68, 69, 70], dtype='int64', name='lon')
>>> da.get_index("lat")
Index([90, 89, 88, 87], dtype='int64', name='lat')

property longitude_range

The longitude range.

The longitude range indicates if coordinates are between -180 and 180 (indicated by ‘180’) or 0 and 360 (indicated by ‘360’).

Type:

str

set_time_dim(t_dim: str, inplace: bool = False) DataArray | Dataset | None[source]

Set the time dimension of the dataset.

Parameters:
  • t_dim (str) – The name of the time dimension.

  • inplace (bool, optional) – If True, it will modify the data array or dataset in place. Otherwise it will return a modified copy.

Returns:

Data array or dataset with time dimension.

Return type:

Union[xarray.DataArray, xarray.Dataset]

Examples

>>> import xarray
>>> import numpy
>>> da = xarray.DataArray(
...  numpy.arange(64).reshape(4,4,4),
...  coords={"lat":numpy.array([87, 88, 89, 90]),
...          "lon":numpy.array([5, 120, 199, 360]),
...          "F":numpy.array([10,11,12,13])}
... )
>>> da.oap.set_time_dim(t_dim="F", inplace=True)
>>> da.oap.t_dim
'F'
property t_dim

The dimension for time.

Type:

str

x_dim: str
y_dim: str

Module contents

General utilities.