synthesizer.utils.stats

Statistical functions for weighted means, medians, and quantiles.

This module provides functions to calculate weighted means, medians, quantiles, and moments. All of them are thin helpers built on existing NumPy functionality.

Example usage:

    from synthesizer.utils.stats import (
        weighted_mean,
        weighted_median,
        weighted_quantile,
        binned_weighted_quantile,
    )

    data = [1, 2, 3, 4, 5]
    weights = [0.1, 0.2, 0.3, 0.4, 0.5]

    mean = weighted_mean(data, weights)
    median = weighted_median(data, weights)
    quantiles = weighted_quantile(
        data,
        [0.25, 0.5, 0.75],
        sample_weight=weights,
    )
    binned_quantiles = binned_weighted_quantile(
        data, data, weights, bins=[0, 2, 4, 6], quantiles=[0.25, 0.5]
    )

Functions

synthesizer.utils.stats.binned_weighted_quantile(x, y, weights, bins, quantiles)

Calculate the weighted quantiles of y in bins of x.

Parameters:
  • x (np.ndarray or list) – The x values to bin by.

  • y (np.ndarray or list) – The y values to calculate the quantiles of.

  • weights (np.ndarray or list) – The weights to apply to the y values.

  • bins (np.ndarray or list) – The bin edges to use for the x values.

  • quantiles (np.ndarray or list) – The quantiles to calculate.

Returns:

The weighted quantiles of y in the bins of x.

Return type:

np.ndarray
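A minimal sketch of one way to compute binned weighted quantiles, assuming that bins holds bin edges and that each bin is the half-open interval [bins[i], bins[i+1]). The names _weighted_quantile and binned_weighted_quantile_sketch are hypothetical, and the library's treatment of empty bins or of points on the rightmost edge may differ; here empty bins are filled with NaN.

```python
import numpy as np


def _weighted_quantile(values, quantiles, weights):
    """Weighted quantiles by interpolating a weighted CDF (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Centre each sample on its own weight, then normalise to [0, 1].
    cdf = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights)
    return np.interp(quantiles, cdf, values)


def binned_weighted_quantile_sketch(x, y, weights, bins, quantiles):
    """Weighted quantiles of y in each [bins[i], bins[i+1]) interval of x.

    Hypothetical sketch: returns an array of shape
    (len(bins) - 1, len(quantiles)); bins with no points are NaN.
    """
    x, y, weights = (np.asarray(a, dtype=float) for a in (x, y, weights))
    bins = np.asarray(bins, dtype=float)
    out = np.full((len(bins) - 1, len(quantiles)), np.nan)
    for i, (lo, hi) in enumerate(zip(bins[:-1], bins[1:])):
        # Select the points whose x value falls in this half-open bin.
        mask = (x >= lo) & (x < hi)
        if mask.any():
            out[i] = _weighted_quantile(y[mask], quantiles, weights[mask])
    return out
```

Looping over bins keeps each bin's quantile calculation independent, which makes it easy to skip empty bins rather than interpolating over no data.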

synthesizer.utils.stats.n_weighted_moment(values, weights, n)

Calculate the n-th weighted moment of the values.

Parameters:
  • values (np.ndarray or list) – The values to calculate the moment of.

  • weights (np.ndarray or list) – The weights to apply to the values.

  • n (int) – The order of the moment to calculate.

Returns:

The n-th weighted moment of the values.

Return type:

float
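The entry above does not say whether the moment is raw, central, or standardised. Purely as an illustration, the sketch below computes the n-th weighted central moment, sum(w * (x - mean)**n) / sum(w), where mean is the weighted mean. The name weighted_central_moment is hypothetical, and its result may not match what n_weighted_moment returns.

```python
import numpy as np


def weighted_central_moment(values, weights, n):
    """n-th weighted central moment: sum(w * (x - mean)**n) / sum(w).

    Hypothetical helper; the library may use a different convention
    (for example raw or standardised moments).
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted mean used as the centre of the moment.
    mean = np.average(values, weights=weights)
    return float(np.sum(weights * (values - mean) ** n) / np.sum(weights))
```

With equal weights and n = 2 this reduces to the population variance returned by np.var (ddof=0).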

synthesizer.utils.stats.weighted_mean(data, weights)

Calculate the weighted mean.

This is a thin alias for np.average, which computes the weighted mean, sum(w * x) / sum(w), in a single call instead of a manual combination of np.sum and np.mean.

Parameters:
  • data (list or np.ndarray) – The data to calculate the mean of.

  • weights (list or np.ndarray) – The weights to apply to the data.

Returns:

The weighted mean.

Return type:

float
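As a quick check of the alias described above, the snippet below compares np.average with the explicit formula sum(w * x) / sum(w), using the data from the module example. It calls only NumPy, not the synthesizer function itself.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
weights = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# np.average normalises by the sum of the weights internally.
mean_avg = np.average(data, weights=weights)

# The same quantity written out explicitly: sum(w * x) / sum(w).
mean_manual = np.sum(weights * data) / np.sum(weights)
```

Both give 5.5 / 1.5 = 11/3 (about 3.667). The weights do not need to sum to one, because np.average divides by their total.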

synthesizer.utils.stats.weighted_median(data, weights)

Calculate the weighted median.

Parameters:
  • data (list or np.ndarray) – The data to calculate the median of.

  • weights (list or np.ndarray) – The weights to apply to the data.

Returns:

The weighted median.
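Definitions of the weighted median differ in how they handle a cumulative weight that lands exactly on the half-way point, and the entry above does not say which convention this function follows. A minimal sketch of one common choice, the lower weighted median (the smallest value whose cumulative weight reaches half the total), is shown below. weighted_median_sketch is a hypothetical name, and the library may instead average the two middle values.

```python
import numpy as np


def weighted_median_sketch(data, weights):
    """Smallest value whose cumulative weight reaches half the total weight.

    Hypothetical helper; it does not average the two middle values when
    the cumulative weight lands exactly on the half-way point.
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(data)
    sorted_data = data[order]
    cum_weights = np.cumsum(weights[order])
    # First index at which the running total reaches 50% of the total weight.
    idx = np.searchsorted(cum_weights, 0.5 * cum_weights[-1])
    return float(sorted_data[idx])
```

For the module example data and weights, the cumulative weights are 0.1, 0.3, 0.6, 1.0, 1.5, so half the total (0.75) is first reached at the value 4.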

synthesizer.utils.stats.weighted_quantile(values, quantiles, sample_weight=None, values_sorted=False, old_style=False)

Calculate a weighted quantile.

Taken from https://stackoverflow.com/a/29677616/1718096.

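Very close to numpy.percentile, but supports weights. Note that quantiles are given as fractions in [0, 1], whereas numpy.percentile expects percentages in [0, 100].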

Parameters:
  • values (np.ndarray or list) – The values to compute the quantiles of.

  • quantiles (np.ndarray or list) – The quantiles to compute. Must be in [0, 1].

  • sample_weight (np.ndarray or list) – The weights to apply to the values. If None, all values are weighted equally.

  • values_sorted (bool) – If True, the values are assumed to be already sorted in ascending order and the sorting step is skipped.

  • old_style (bool) – If True, the weighted cumulative distribution is rescaled so that, for equal weights, the computed quantiles match numpy.percentile.

Returns:

The computed quantiles.

Return type:

np.ndarray
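The approach in the linked Stack Overflow answer can be sketched as follows: sort the values, place each one at the midpoint of its weight in the normalised cumulative weight, and linearly interpolate the requested quantiles. This is an illustrative re-implementation with a hypothetical name (weighted_quantile_sketch), not the library's code; it omits values_sorted and raises ValueError, rather than asserting, for quantiles outside [0, 1].

```python
import numpy as np


def weighted_quantile_sketch(values, quantiles, sample_weight=None, old_style=False):
    """Interpolate quantiles from a weighted empirical CDF (hypothetical sketch)."""
    values = np.asarray(values, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    if sample_weight is None:
        # Equal weights when none are given.
        sample_weight = np.ones(len(values))
    sample_weight = np.asarray(sample_weight, dtype=float)
    if np.any(quantiles < 0) or np.any(quantiles > 1):
        raise ValueError("quantiles must be in [0, 1]")

    # Sort the values and carry their weights along.
    order = np.argsort(values)
    values, sample_weight = values[order], sample_weight[order]

    # Place each sample at the midpoint of its weight in the cumulative sum.
    cdf = np.cumsum(sample_weight) - 0.5 * sample_weight
    if old_style:
        # Stretch the CDF to span exactly [0, 1], mimicking numpy.percentile.
        cdf -= cdf[0]
        cdf /= cdf[-1]
    else:
        cdf /= np.sum(sample_weight)
    return np.interp(quantiles, cdf, values)
```

With equal weights and old_style=True the result agrees with numpy.percentile's default linear interpolation; the default style instead centres each sample on its share of the total weight, so for [1, 2, 3, 4, 5] the 0.25 quantile comes out as 1.75 rather than 2.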