synference.utils¶
Utility functions for synference.
Functions
- synference.utils.analyze_feature_contributions(base_distribution, observations, method='mahalanobis', feature_names=None, contamination=0.1, confidence=0.95)[source]¶
Analyze which features contribute most to outlier detection in distance-based methods.
Parameters:¶
- base_distribution : array-like, shape (n_samples, n_features)
Reference distribution data
- observations : array-like, shape (n_obs, n_features)
Observations to analyze
- method : str, default=’mahalanobis’
Method to use: ‘mahalanobis’, ‘robust_mahalanobis’, or ‘standardized_euclidean’
- feature_names : list, optional
Names of features for plotting
- contamination : float, default=0.1
Expected proportion of outliers (for robust methods)
- confidence : float, default=0.95
Confidence level for thresholds
Returns:¶
dict – Dictionary containing the feature contribution analysis.
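Example (a minimal sketch with synthetic data; the exact keys of the returned dictionary depend on the method chosen):
>>> import numpy as np
>>> from synference.utils import analyze_feature_contributions
>>> rng = np.random.default_rng(42)
>>> base = rng.normal(size=(1000, 4))
>>> obs = rng.normal(size=(10, 4))
>>> result = analyze_feature_contributions(
...     base, obs, method='mahalanobis',
...     feature_names=['f1', 'f2', 'f3', 'f4'],
... )
>>> # result is a dict of per-feature contribution diagnostics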
- synference.utils.asinh_err_to_f_jy(f_asinh, f_asinh_err, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert asinh magnitude error to flux error in Jy.
- Parameters:
f_asinh (ndarray) – Flux in asinh magnitude scale.
f_asinh_err (ndarray) – Flux error in asinh magnitude scale.
f_b (unyt_array) – Softening parameter (transition point for the asinh scale).
- Return type:
unyt_array
- Returns:
Flux error in Jy.
- synference.utils.asinh_to_f_jy(f_asinh, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert asinh magnitude to flux in Jy.
- Parameters:
f_asinh (ndarray) – Flux in asinh magnitude scale.
f_b (unyt_array) – Softening parameter (transition point for the asinh scale).
- Return type:
unyt_array
- Returns:
Flux in Jy.
- synference.utils.asinh_to_snr(f_asinh, f_asinh_err, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert asinh magnitude and error to signal-to-noise ratio.
- Parameters:
f_asinh – Flux in asinh magnitude scale.
f_asinh_err – Flux error in asinh magnitude scale.
f_b – Softening parameter (transition point for the asinh scale).
- Returns:
Signal-to-noise ratio.
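Example (a minimal sketch; the magnitude values are illustrative):
>>> import numpy as np
>>> from unyt import nJy
>>> from synference.utils import asinh_to_snr
>>> f_asinh = np.array([27.0, 25.5])     # asinh magnitudes
>>> f_asinh_err = np.array([0.2, 0.05])  # and their errors
>>> snr = asinh_to_snr(f_asinh, f_asinh_err, f_b=5 * nJy)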
- synference.utils.average_coverage_error(true_values, intervals, alpha)[source]¶
Calculates the Average Coverage Error (ACE).
ACE indicates the reliability of a prediction interval. A value close to zero denotes high reliability.
- Parameters:
true_values (List[float]) – A list of the true, observed values (Y(i)).
intervals (List[Tuple[float, float]]) – A list of tuples representing the intervals [L_alpha(i), U_alpha(i)].
alpha (float) – The significance level.
- Return type:
float
- Returns:
The Average Coverage Error.
- synference.utils.average_coverage_probability(true_values, intervals)[source]¶
Calculates the average coverage probability of a set of prediction intervals.
Coverage is the proportion of true values that fall within their corresponding predicted interval.
- Parameters:
true_values (List[float]) – A list of the true, observed values.
intervals (List[Tuple[float, float]]) – A list of tuples, where each tuple represents the (lower_bound, upper_bound) of a prediction interval.
- Return type:
float
- Returns:
The proportion (between 0.0 and 1.0) of true values covered by their intervals.
- Raises:
ValueError – If the number of true values and intervals do not match.
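Example (a minimal sketch of both coverage metrics; here 3 of the 4 true values fall inside their intervals):
>>> from synference.utils import average_coverage_probability, average_coverage_error
>>> y_true = [1.0, 2.0, 3.0, 4.0]
>>> intervals = [(0.5, 1.5), (1.8, 2.6), (3.2, 3.9), (3.5, 4.5)]
>>> average_coverage_probability(y_true, intervals)
0.75
>>> ace = average_coverage_error(y_true, intervals, alpha=0.1)  # compares the 0.75 coverage to the nominal 90% level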
- synference.utils.calculate_min_max_wav_grid(filterset, max_redshift, min_redshift=0)[source]¶
Calculate the minimum and maximum observed wavelengths of a filter set over a given redshift range.
- synference.utils.check_log_scaling(arr)[source]¶
Check if the input array has dimensions that scale with logarithmic normalization.
- synference.utils.check_scaling(arr)[source]¶
Check if the input array has dimensions that scale with normalization.
- synference.utils.combine_rank_files(size, filepath, num_galaxies, starts, ends)[source]¶
Combine the rank files into a single file.
- Parameters:
size (int) – The number of MPI ranks.
filepath (str) – The template filepath for the rank files.
num_galaxies (int) – The total number of galaxies.
starts (list) – The start indices for each rank.
ends (list) – The end indices for each rank.
- Returns:
None
- synference.utils.compare_methods_feature_importance(base_distribution, observations, feature_names=None)[source]¶
Compare feature importance across different distance-based methods.
- synference.utils.convolve_variable_width_gaussian(flux, sigma_pixels, trunc=4.0)[source]¶
Convolves a 1D array with a Gaussian kernel of variable width using Numba for performance.
- Parameters:
flux (np.ndarray) – The input 1D flux array.
sigma_pixels (np.ndarray) – An array of the same size as flux, where each value is the Gaussian sigma (in pixels) for the kernel at that position.
trunc (float) – The number of sigmas at which to truncate the kernel.
- Returns:
The convolved flux array.
- Return type:
np.ndarray
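Example (a minimal sketch; smoothing a delta spike with a kernel that widens along the array):
>>> import numpy as np
>>> from synference.utils import convolve_variable_width_gaussian
>>> flux = np.zeros(200)
>>> flux[100] = 1.0                            # a single delta spike
>>> sigma_pixels = np.linspace(1.0, 5.0, 200)  # per-pixel kernel widths
>>> smoothed = convolve_variable_width_gaussian(flux, sigma_pixels, trunc=4.0)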
- synference.utils.create_database_universal(db_name, password='', host='localhost', user='root', port=31666, db_type='mysql+pymysql', full_url=None)[source]¶
Create database for MySQL, PostgreSQL, or CockroachDB.
Returns the full connection URL for the created database.
Either provide a full URL or the individual parameters.
- synference.utils.create_sqlite_db(db_path)[source]¶
Create a SQLite database at the specified path.
- Parameters:
db_path (str) – Path to the SQLite database file.
- synference.utils.cumsum_dirichlet_prior_transform(unit_cube, alpha)[source]¶
Transform from unit hypercube to cumulative sum of Dirichlet-distributed parameters.
This produces ordered breakpoints on [0,1] by taking the cumulative sum of a Dirichlet distribution. Useful for nested sampling priors with ordered parameters (e.g., transition times, change points, ordered categories).
Uses stick-breaking transformation with Beta distributions appropriate for Dirichlet(alpha, alpha, …, alpha) distribution.
- Parameters:
unit_cube (array-like, shape (N,)) – Values from the unit hypercube [0,1]^N
alpha (float) – Dirichlet concentration parameter (same for all dimensions)
- Returns:
breakpoints – Ordered values 0 < breakpoints[0] < … < breakpoints[N-1] < 1. These are cumulative sums of a Dirichlet(alpha, …, alpha) distribution with (N+1) components.
Examples
>>> # Transform a 3D unit cube to 3 ordered breakpoints
>>> unit_cube = [0.5, 0.5, 0.5]
>>> breakpoints = cumsum_dirichlet_prior_transform(unit_cube, alpha=1.0)
>>> print(f"Breakpoints: {breakpoints}")
>>> print(f"All ordered: {np.all(breakpoints[:-1] < breakpoints[1:])}")
>>> # Equivalent to:
>>> # txs = np.cumsum(np.random.dirichlet(np.ones(N+1)*alpha))[:-1]
- synference.utils.detect_outliers(base_distribution, observations, method='mahalanobis', contamination=0.1, n_neighbors=20, threshold=None, confidence=0.95, n_components=None, plot=True, **kwargs)[source]¶
Detect outliers in multivariate data using various methods.
- Parameters:
base_distribution (np.ndarray) – Reference distribution data of shape (n_samples, n_features).
observations (np.ndarray) – Observations to test for outliers, of shape (n_obs, n_features).
method (str, optional) – Method to use. Options include: ‘mahalanobis’, ‘robust_mahalanobis’, ‘lof’, ‘isolation_forest’, ‘one_class_svm’, ‘pca’, ‘hotelling_t2’, ‘kde’. Defaults to ‘mahalanobis’.
contamination (float, optional) – Expected proportion of outliers (for applicable methods). Defaults to 0.1.
n_neighbors (int, optional) – Number of neighbors for LOF. Defaults to 20.
threshold (float, optional) – Manual threshold for outlier detection. Defaults to None.
confidence (float, optional) – Confidence level for statistical tests. Defaults to 0.95.
n_components (int, optional) – Number of components for PCA (if None, uses all). Defaults to None.
plot (bool, optional) – Whether to plot results (only applicable for some methods). Defaults to True.
**kwargs (Any) – Additional parameters for specific methods.
- Returns:
A dictionary containing:
’outlier_mask’ (np.ndarray): Boolean array indicating outliers.
’scores’ (np.ndarray): Outlier scores for each observation.
’threshold_used’ (float): The threshold value used for detection.
’method_info’ (dict): Additional method-specific information.
- Return type:
Dict[str, Any]
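Example (a minimal sketch; one obvious outlier is appended to otherwise in-distribution observations):
>>> import numpy as np
>>> from synference.utils import detect_outliers
>>> rng = np.random.default_rng(0)
>>> base = rng.normal(size=(500, 3))
>>> obs = np.vstack([rng.normal(size=(20, 3)), [[8.0, 8.0, 8.0]]])
>>> result = detect_outliers(base, obs, method='mahalanobis', plot=False)
>>> result['outlier_mask'][-1]  # the injected extreme point should be flagged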
- synference.utils.detect_outliers_pyod(base_distribution, observations, methods=['ecod'], combination='majority', return_scores=False, **kwargs)[source]¶
Detect outliers in multivariate data using pyod methods.
Parameters:¶
- base_distribution : array-like, shape (n_samples, n_features)
Reference distribution data
- observations : array-like, shape (n_obs, n_features)
Observations to test for outliers
- methods : str or list of str, default=’ecod’
Method(s) to use from pyod. Available methods: ‘ecod’
- combination : str, default=’majority’
How to combine results from multiple methods:
‘majority’: Outlier if the majority of methods flag it as an outlier
‘any’: Outlier if any method flags it as an outlier
‘all’: Outlier only if all methods flag it as an outlier
‘none’: Return individual method results without combination
- return_scores : bool, default=False
Whether to return outlier scores along with the mask
- **kwargs : dict
Additional parameters for specific pyod methods
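Example (a minimal sketch; the return value is assumed here to be the combined outlier mask, with scores added when return_scores=True):
>>> import numpy as np
>>> from synference.utils import detect_outliers_pyod
>>> rng = np.random.default_rng(1)
>>> base = rng.normal(size=(500, 3))
>>> obs = rng.normal(size=(20, 3))
>>> mask = detect_outliers_pyod(base, obs, methods=['ecod'], combination='majority')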
- synference.utils.download_test_data()[source]¶
Downloads test data for Synference using the synference-download CLI tool.
- synference.utils.f_jy_err_to_asinh(f_jy, f_jy_err, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert flux error in Jy to asinh magnitude error.
- Parameters:
f_jy (unyt_array) – Flux in Jy.
f_jy_err (unyt_array) – Flux error in Jy.
f_b (unyt_array) – Softening parameter (transition point for the asinh scale).
- Return type:
ndarray
- Returns:
Magnitude error in asinh scale.
- synference.utils.f_jy_to_asinh(f_jy, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert flux in Jy to asinh magnitude.
- Parameters:
f_jy (unyt_array) – Flux in Jy.
f_b (unyt_array) – Softening parameter (transition point for the asinh scale).
- Return type:
ndarray
- Returns:
Magnitude in asinh scale.
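Example (a minimal round-trip sketch; converting fluxes to asinh magnitudes and back should recover the input up to floating-point precision):
>>> import numpy as np
>>> from unyt import nJy
>>> from synference.utils import f_jy_to_asinh, asinh_to_f_jy
>>> f = np.array([10.0, 100.0, 1000.0]) * nJy
>>> m = f_jy_to_asinh(f, f_b=5 * nJy)
>>> f_back = asinh_to_f_jy(m, f_b=5 * nJy)  # ≈ f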
- synference.utils.generate_constant_R(R=300, start=unyt_quantity(1, 'Å'), end=unyt_quantity(900000., 'Å'), auto_start_stop=False, filterset=None, **kwargs)[source]¶
Generate a constant R wavelength grid.
- Parameters:
R – The resolution of the grid.
start – The starting wavelength of the grid.
end – The ending wavelength of the grid.
auto_start_stop – If True, calculate start and end from the filterset.
filterset – A filter set to calculate the start and end wavelengths.
**kwargs – Additional keyword arguments for filterset calculations.
- Returns:
A numpy array of wavelengths in Angstroms.
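Example (a minimal sketch; on a constant-R grid adjacent points conventionally satisfy lambda[i+1] ≈ lambda[i] * (1 + 1/R)):
>>> from unyt import angstrom
>>> from synference.utils import generate_constant_R
>>> wav = generate_constant_R(R=300, start=1000 * angstrom, end=50000 * angstrom)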
- synference.utils.interval_sharpness(true_values, predicted_means, predicted_sigmas, alpha)[source]¶
Calculates the Interval Sharpness (IS).
IS measures the accuracy of probabilistic forecasting, where a smaller absolute value indicates a better, narrower prediction interval.
- Parameters:
true_values (List[float]) – A list of the true, observed values (Y(i)).
predicted_means (List[float]) – A list of predicted mean values (mu(i)).
predicted_sigmas (List[float]) – A list of predicted standard deviations (sigma(i)).
alpha (float) – The significance level.
- Return type:
float
- Returns:
The Interval Sharpness (IS_alpha), which is always <= 0.
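Example (a minimal sketch; the values are illustrative):
>>> from synference.utils import interval_sharpness
>>> y_true = [1.0, 2.0, 3.0]
>>> mu = [1.1, 1.9, 3.2]
>>> sigma = [0.3, 0.25, 0.4]
>>> is_alpha = interval_sharpness(y_true, mu, sigma, alpha=0.1)
>>> is_alpha <= 0  # always true; values closer to zero indicate sharper intervals
True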
- synference.utils.list_parameters(distribution)[source]¶
List parameters for scipy.stats.distribution.
- Parameters:
distribution – A string or scipy.stats distribution object.
- Returns:
A list of distribution parameter strings.
- synference.utils.load_library_from_hdf5(hdf5_path, photometry_key='Grid/Photometry', parameters_key='Grid/Parameters', filter_codes_attr='FilterCodes', parameters_attr='ParameterNames', parameters_units_attr='ParameterUnits', supp_key='Grid/SupplementaryParameters', supp_attr='SupplementaryParameterNames', supp_units_attr='SupplementaryParameterUnits', phot_unit_attr='PhotometryUnits', spectra_key='Grid/Spectra')[source]¶
Load a grid from an HDF5 file.
- Parameters:
hdf5_path (str) – Path to the HDF5 file.
photometry_key (str) – Key for the photometry dataset in the HDF5 file.
parameters_key (str) – Key for the parameters dataset in the HDF5 file.
filter_codes_attr (str) – Attribute name for filter codes in the HDF5 file.
parameters_attr (str) – Attribute name for parameter names in the HDF5 file.
parameters_units_attr (str) – Attribute name for parameter units in the HDF5 file.
supp_key (str) – Key for supplementary parameters in the HDF5 file.
supp_attr (str) – Attribute name for supplementary parameter names in the HDF5 file.
supp_units_attr (str) – Attribute name for supplementary parameter units in the HDF5 file.
phot_unit_attr (str) – Attribute name for photometry units in the HDF5 file.
spectra_key (str) – Key for the spectra dataset in the HDF5 file.
- Return type:
dict
- Returns:
The loaded grid.
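Example (a minimal sketch; ‘library.hdf5’ is a hypothetical file laid out with the default keys above):
>>> from synference.utils import load_library_from_hdf5
>>> grid = load_library_from_hdf5('library.hdf5')
>>> sorted(grid)  # dictionary keys covering photometry, parameters, units, etc.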
- synference.utils.make_serializable(obj, allowed_types=None)[source]¶
Recursively convert a nested dictionary/object to be JSON serializable.
Handles common scientific computing types:
- NumPy arrays and scalars
- PyTorch tensors
- JAX arrays
- TensorFlow tensors
- Pandas Series/DataFrames
- Complex numbers
- Sets
- Bytes
- Custom objects with __dict__
- Parameters:
obj (Any) – The object to make serializable.
allowed_types – Optional list of additional types to allow (e.g., custom classes).
- Return type:
Any
- Returns:
A JSON-serializable version of the input object.
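Example (a minimal sketch; the payload mixes types that json.dumps cannot handle directly):
>>> import json
>>> import numpy as np
>>> from synference.utils import make_serializable
>>> payload = {'weights': np.arange(3), 'tags': {'a', 'b'}, 'z': 1 + 2j}
>>> json.dumps(make_serializable(payload))  # no TypeError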
- synference.utils.move_to_device(obj, device, visited=None)[source]¶
Move tensors and objects with a .to() method to the specified device.
Recursively traverses an object and its components, moving tensors and objects with a .to() method to the specified device. Also sets a ._device or .device attribute if present.
Gracefully handles read-only attributes by ignoring AttributeError on setattr.
- Parameters:
obj (Any) – The object to move.
device (str | device) – The target device (e.g., ‘cpu’, ‘cuda:0’).
visited (Optional[Set[int]]) – A set of object ids to prevent infinite recursion in case of circular references. Should not be set by the user.
- Return type:
Any
- Returns:
The object, with its components moved to the specified device.
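Example (a minimal sketch, assuming nested containers of tensors are traversed as described above):
>>> import torch
>>> from synference.utils import move_to_device
>>> batch = {'x': torch.zeros(4, 3), 'meta': {'ids': torch.arange(4)}}
>>> batch = move_to_device(batch, 'cpu')  # e.g. 'cuda:0' when a GPU is available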
- synference.utils.optimize_sfh_xlimit(ax, mass_threshold=0.001, buffer_fraction=0.2)[source]¶
Adapted from EXPANSE.
Optimizes the x-axis limits of a matplotlib plot containing SFR histories to focus on periods after each galaxy has formed a certain fraction of its final mass. Calculates cumulative mass from SFR data.
Parameters:¶
- ax : matplotlib.axes.Axes
The axes object containing the SFR plots (SFR/yr vs time)
- mass_threshold : float, optional
Fraction of final stellar mass to use as threshold (default: 0.001, i.e. 0.1%)
- buffer_fraction : float, optional
Fraction of the active time range to add as buffer (default: 0.2)
Returns:¶
float – The optimal maximum x value for the plot.
- synference.utils.rename_overlapping_parameters(lists_dict)[source]¶
Check if N lists have any overlapping parameters and rename them if they do.
- Parameters:
lists_dict – Dictionary where keys are list names and values are the lists
- Returns:
Dictionary with renamed parameters where overlapping occurred
- synference.utils.save_emission_model(model)[source]¶
Save the fixed parameters of the emission model.
- Parameters:
model – The emission model object.
- Returns:
A dictionary containing fixed parameters, dust attenuation, and dust emission model information.
- synference.utils.search_parameter_array(array, parameter_names, constraints)[source]¶
Return indexes in array with columns which meet constraints.
- Parameters:
array (np.ndarray) – The data array where rows are entries and columns are parameters.
parameter_names (List[str]) – A list of string names for each column in the array.
constraints (List[Tuple[str, str, Union[int, float]]]) – A list of tuples, where each tuple defines a constraint in the format: (parameter_name, operator_string, value). e.g., (‘mass’, ‘>’, 100)
- Returns:
A NumPy array of integer indices for the rows in the input array that satisfy all of the given constraints.
- Return type:
np.ndarray
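Example (a minimal sketch; only the last row satisfies both constraints):
>>> import numpy as np
>>> from synference.utils import search_parameter_array
>>> arr = np.array([[50.0, 0.10], [150.0, 0.30], [200.0, 0.05]])
>>> idx = search_parameter_array(
...     arr, ['mass', 'dust'],
...     [('mass', '>', 100), ('dust', '<', 0.2)],
... )  # -> indices selecting row 2 only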
- synference.utils.setup_mpi_named_logger(name, level=20, stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Sets up a named logger that only outputs messages from MPI rank 0.
This is more robust than configuring the root logger, as it won’t interfere with the logging settings of other libraries.
- Parameters:
name (str) – The name for the logger instance.
level (int) – The logging level for the rank 0 process.
stream (TextIO) – The output stream for the rank 0 process.
- Return type:
Logger
- Returns:
A configured logging.Logger instance.
- synference.utils.transform_spectrum(theory_wave, theory_flux, z, observed_wave, resolution_curve_wave, resolution_curve_r, theory_r=inf, trunc_constant=4.0)[source]¶
Transforms a high-resolution theoretical spectrum to a given redshift and matches it to an observed resolution.
- Parameters:
theory_wave (np.ndarray) – Wavelength array of the high-res theoretical spectrum.
theory_flux (np.ndarray) – Flux array of the high-res theoretical spectrum.
z (float) – The redshift to apply to the theoretical spectrum.
observed_wave (np.ndarray) – The target wavelength grid of the observation.
resolution_curve_wave (np.ndarray) – Wavelength points for the resolution curve.
resolution_curve_r (np.ndarray) – The spectral resolution R at each point.
theory_r (Union[float, np.ndarray]) – Intrinsic resolution R of the theoretical model.
trunc_constant (float) – Truncation constant for the Gaussian kernel.
- Returns:
A tuple containing the final wavelength array and the transformed, resampled flux array.
- Return type:
Tuple[np.ndarray, np.ndarray]
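Example (a minimal sketch; wavelengths are assumed to be in Angstroms and the flat flux is illustrative):
>>> import numpy as np
>>> from synference.utils import transform_spectrum
>>> theory_wave = np.linspace(900.0, 20000.0, 100000)  # high-res rest-frame grid
>>> theory_flux = np.ones_like(theory_wave)
>>> obs_wave = np.linspace(6000.0, 50000.0, 2000)
>>> res_wave = np.array([6000.0, 50000.0])
>>> res_r = np.array([100.0, 1000.0])
>>> wav, flux = transform_spectrum(theory_wave, theory_flux, z=2.0,
...                                observed_wave=obs_wave,
...                                resolution_curve_wave=res_wave,
...                                resolution_curve_r=res_r)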
- synference.utils.update_plot(train_loss, val_loss, epoch, time_elapsed=None, trial_number=None, alt_screen=True)[source]¶
Updates a live plot of training and validation loss in the terminal.
This function uses the terminal’s alternate screen buffer to create a full-screen plot that disappears when the script ends, restoring the previous terminal content.
- Parameters:
train_loss (List[float]) – A list of the training loss at each epoch.
val_loss (List[float]) – A list of the validation loss at each epoch.
epoch (int) – The current epoch number.
time_elapsed (Optional[float]) – The time elapsed since the start of training.
trial_number (Optional[int]) – The Optuna trial number, for display.
alt_screen (bool) – Whether to use the alternate screen buffer. Defaults to True.
Classes
- class synference.utils.CPU_Unpickler(file, *, fix_imports=True, encoding='ASCII', errors='strict', buffers=())[source]¶
Custom unpickler that handles specific Torch storage loading.
- class synference.utils.FilterArithmeticParser[source]¶
Parser for filter arithmetic expressions.
Supports operations like:
- Basic arithmetic: +, -, *, /
- Parentheses for grouping
- Constants and coefficients
Examples
- “F356W” -> single filter
- “F356W + F444W” -> filter addition
- “2 * F356W” -> coefficient multiplication
- “(F356W + F444W) / 2” -> average of filters
- “F356W - 0.5 * F444W” -> weighted subtraction
- evaluate(tokens, filter_data)[source]¶
Evaluate a list of tokens using provided filter data.
- Parameters:
tokens (List[str]) – List of tokens from the expression.
filter_data (Dict[str, Union[float, ndarray]]) – Dictionary mapping filter names to their values.
- Return type:
Union[float, ndarray]
- Returns:
Result of the arithmetic operations
- parse_and_evaluate(expression, filter_data)[source]¶
Parse and evaluate a filter arithmetic expression.
- Parameters:
expression (str) – String containing the filter arithmetic expression.
filter_data (Dict[str, Union[float, ndarray]]) – Dictionary mapping filter names to their values.
- Return type:
Union[float, ndarray]
- Returns:
Result of evaluating the expression
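Example (a minimal sketch; the flux values are illustrative, and the expected result follows from the average-of-filters example above):
>>> from synference.utils import FilterArithmeticParser
>>> parser = FilterArithmeticParser()
>>> filter_data = {'F356W': 2.0, 'F444W': 4.0}
>>> parser.parse_and_evaluate('(F356W + F444W) / 2', filter_data)
3.0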
Exceptions
Exception raised when a function times out.