synference.utils¶
Utility functions for synference.
Functions
- synference.utils.analyze_feature_contributions(base_distribution, observations, method='mahalanobis', feature_names=None, contamination=0.1, confidence=0.95)[source]¶
Analyze which features contribute most to outlier detection in distance-based methods.
Parameters:¶
- base_distribution : array-like, shape (n_samples, n_features)
Reference distribution data
- observations : array-like, shape (n_obs, n_features)
Observations to analyze
- method : str, default=’mahalanobis’
Method to use: ‘mahalanobis’, ‘robust_mahalanobis’, or ‘standardized_euclidean’
- feature_names : list, optional
Names of features for plotting
- contamination : float, default=0.1
Expected proportion of outliers (for robust methods)
- confidence : float, default=0.95
Confidence level for thresholds
Returns:¶
dict – Dictionary containing the feature contribution analysis.
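Example (a minimal sketch with synthetic data; the exact keys of the returned dictionary depend on the method chosen):
>>> import numpy as np
>>> from synference.utils import analyze_feature_contributions
>>> rng = np.random.default_rng(42)
>>> base = rng.normal(size=(1000, 4))
>>> obs = rng.normal(size=(10, 4))
>>> result = analyze_feature_contributions(
...     base, obs, method='mahalanobis',
...     feature_names=['f1', 'f2', 'f3', 'f4'],
... )
>>> # result is a dict of per-feature contribution diagnostics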
- synference.utils.asinh_err_to_f_jy(f_asinh, f_asinh_err, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert asinh magnitude error to flux error in Jy.
- Parameters:
f_asinh (ndarray) – Flux in asinh magnitude scale.
f_asinh_err (ndarray) – Flux error in asinh magnitude scale.
f_b (unyt_array) – Softening parameter (transition point for the asinh scale).
- Return type:
unyt_array
- Returns:
Flux error in Jy.
- synference.utils.asinh_to_f_jy(f_asinh, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert asinh magnitude to flux in Jy.
- Parameters:
f_asinh (ndarray) – Flux in asinh magnitude scale.
f_b (unyt_array) – Softening parameter (transition point for the asinh scale).
- Return type:
unyt_array
- Returns:
Flux in Jy.
- synference.utils.asinh_to_snr(f_asinh, f_asinh_err, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert asinh magnitude and error to signal-to-noise ratio.
- Parameters:
f_asinh – Flux in asinh magnitude scale.
f_asinh_err – Flux error in asinh magnitude scale.
f_b – Softening parameter (transition point for the asinh scale).
- Returns:
Signal-to-noise ratio.
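Example (a minimal sketch; the magnitude values are illustrative):
>>> import numpy as np
>>> from unyt import nJy
>>> from synference.utils import asinh_to_snr
>>> f_asinh = np.array([27.0, 25.5])     # asinh magnitudes
>>> f_asinh_err = np.array([0.2, 0.05])  # and their errors
>>> snr = asinh_to_snr(f_asinh, f_asinh_err, f_b=5 * nJy)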
- synference.utils.average_coverage_error(true_values, intervals, alpha)[source]¶
Calculates the Average Coverage Error (ACE).
ACE indicates the reliability of a prediction interval. A value close to zero denotes high reliability.
- Parameters:
true_values (List[float]) – A list of the true, observed values (Y(i)).
intervals (List[Tuple[float, float]]) – A list of tuples representing the intervals [L_alpha(i), U_alpha(i)].
alpha (float) – The significance level.
- Return type:
float
- Returns:
The Average Coverage Error.
- synference.utils.average_coverage_probability(true_values, intervals)[source]¶
Calculates the average coverage probability of a set of prediction intervals.
Coverage is the proportion of true values that fall within their corresponding predicted interval.
- Parameters:
true_values (List[float]) – A list of the true, observed values.
intervals (List[Tuple[float, float]]) – A list of tuples, where each tuple represents the (lower_bound, upper_bound) of a prediction interval.
- Return type:
float
- Returns:
The proportion (between 0.0 and 1.0) of true values covered by their intervals.
- Raises:
ValueError – If the number of true values and intervals do not match.
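Example (a minimal sketch of both coverage metrics; here 3 of the 4 true values fall inside their intervals):
>>> from synference.utils import average_coverage_probability, average_coverage_error
>>> y_true = [1.0, 2.0, 3.0, 4.0]
>>> intervals = [(0.5, 1.5), (1.8, 2.6), (3.2, 3.9), (3.5, 4.5)]
>>> average_coverage_probability(y_true, intervals)
0.75
>>> ace = average_coverage_error(y_true, intervals, alpha=0.1)  # compares the 0.75 coverage to the nominal 90% level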
- synference.utils.calculate_min_max_wav_grid(filterset, max_redshift, min_redshift=0)[source]¶
Calculate the minimum and maximum observed wavelengths of a filter set over a given redshift range.
- synference.utils.check_log_scaling(arr)[source]¶
Check if the input array has dimensions that scale with logarithmic normalization.
- synference.utils.check_scaling(arr)[source]¶
Check if the input array has dimensions that scale with normalization.
- synference.utils.combine_rank_files(size, filepath, num_galaxies, starts, ends)[source]¶
Combine the rank files into a single file.
- Parameters:
size (int) – The number of MPI ranks.
filepath (str) – The template filepath for the rank files.
num_galaxies (int) – The total number of galaxies.
starts (list) – The start indices for each rank.
ends (list) – The end indices for each rank.
- Returns:
None
- synference.utils.compare_methods_feature_importance(base_distribution, observations, feature_names=None)[source]¶
Compare feature importance across different distance-based methods.
- synference.utils.convolve_variable_width_gaussian(flux, sigma_pixels, trunc=4.0)[source]¶
Convolves a 1D array with a Gaussian kernel of variable width using Numba for performance.
- Parameters:
flux (np.ndarray) – The input 1D flux array.
sigma_pixels (np.ndarray) – An array of the same size as flux, where each value is the Gaussian sigma (in pixels) for the kernel at that position.
trunc (float) – The number of sigmas at which to truncate the kernel.
- Returns:
The convolved flux array.
- Return type:
np.ndarray
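Example (a minimal sketch; smoothing a delta spike with a kernel that widens along the array):
>>> import numpy as np
>>> from synference.utils import convolve_variable_width_gaussian
>>> flux = np.zeros(200)
>>> flux[100] = 1.0                            # a single delta spike
>>> sigma_pixels = np.linspace(1.0, 5.0, 200)  # per-pixel kernel widths
>>> smoothed = convolve_variable_width_gaussian(flux, sigma_pixels, trunc=4.0)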
- synference.utils.create_database_universal(db_name, password='', host='localhost', user='root', port=31666, db_type='mysql+pymysql', full_url=None)[source]¶
Create database for MySQL, PostgreSQL, or CockroachDB.
Returns the full connection URL for the created database.
Either provide a full URL or the individual parameters.
- synference.utils.create_sqlite_db(db_path)[source]¶
Create a SQLite database at the specified path.
- Parameters:
db_path (str) – Path to the SQLite database file.
- synference.utils.cumsum_dirichlet_prior_transform(unit_cube, alpha)[source]¶
Transform from unit hypercube to cumulative sum of Dirichlet-distributed parameters.
This produces ordered breakpoints on [0,1] by taking the cumulative sum of a Dirichlet distribution. Useful for nested sampling priors with ordered parameters (e.g., transition times, change points, ordered categories).
Uses stick-breaking transformation with Beta distributions appropriate for Dirichlet(alpha, alpha, …, alpha) distribution.
- Parameters:
unit_cube (array-like, shape (N,)) – Values from the unit hypercube [0,1]^N
alpha (float) – Dirichlet concentration parameter (same for all dimensions)
- Returns:
breakpoints – Ordered values 0 < breakpoints[0] < … < breakpoints[N-1] < 1. These are cumulative sums of a Dirichlet(alpha, …, alpha) distribution with (N+1) components.
Examples
>>> # Transform a 3D unit cube to 3 ordered breakpoints
>>> unit_cube = [0.5, 0.5, 0.5]
>>> breakpoints = cumsum_dirichlet_prior_transform(unit_cube, alpha=1.0)
>>> print(f"Breakpoints: {breakpoints}")
>>> print(f"All ordered: {np.all(breakpoints[:-1] < breakpoints[1:])}")
>>> # Equivalent to:
>>> # txs = np.cumsum(np.random.dirichlet(np.ones(N+1)*alpha))[:-1]
- synference.utils.detect_outliers(base_distribution, observations, method='mahalanobis', contamination=0.1, n_neighbors=20, threshold=None, confidence=0.95, n_components=None, plot=True, **kwargs)[source]¶
Detect outliers in multivariate data using various methods.
- Parameters:
base_distribution (np.ndarray) – Reference distribution data of shape (n_samples, n_features).
observations (np.ndarray) – Observations to test for outliers, of shape (n_obs, n_features).
method (str, optional) – Method to use. Options include: ‘mahalanobis’, ‘robust_mahalanobis’, ‘lof’, ‘isolation_forest’, ‘one_class_svm’, ‘pca’, ‘hotelling_t2’, ‘kde’. Defaults to ‘mahalanobis’.
contamination (float, optional) – Expected proportion of outliers (for applicable methods). Defaults to 0.1.
n_neighbors (int, optional) – Number of neighbors for LOF. Defaults to 20.
threshold (float, optional) – Manual threshold for outlier detection. Defaults to None.
confidence (float, optional) – Confidence level for statistical tests. Defaults to 0.95.
n_components (int, optional) – Number of components for PCA (if None, uses all). Defaults to None.
plot (bool, optional) – Whether to plot results (only applicable for some methods). Defaults to True.
**kwargs (Any) – Additional parameters for specific methods.
- Returns:
A dictionary containing:
’outlier_mask’ (np.ndarray): Boolean array indicating outliers.
’scores’ (np.ndarray): Outlier scores for each observation.
’threshold_used’ (float): The threshold value used for detection.
’method_info’ (dict): Additional method-specific information.
- Return type:
Dict[str, Any]
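Example (a minimal sketch; one obvious outlier is appended to otherwise in-distribution observations):
>>> import numpy as np
>>> from synference.utils import detect_outliers
>>> rng = np.random.default_rng(0)
>>> base = rng.normal(size=(500, 3))
>>> obs = np.vstack([rng.normal(size=(20, 3)), [[8.0, 8.0, 8.0]]])
>>> result = detect_outliers(base, obs, method='mahalanobis', plot=False)
>>> result['outlier_mask'][-1]  # the injected extreme point should be flagged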
- synference.utils.detect_outliers_pyod(base_distribution, observations, methods=['ecod'], combination='majority', return_scores=False, **kwargs)[source]¶
Detect outliers in multivariate data using pyod methods.
Parameters:¶
- base_distribution : array-like, shape (n_samples, n_features)
Reference distribution data
- observations : array-like, shape (n_obs, n_features)
Observations to test for outliers
- methods : str or list of str, default=’ecod’
Method(s) to use from pyod. Available methods: ‘ecod’
- combination : str, default=’majority’
How to combine results from multiple methods:
‘majority’: Outlier if the majority of methods flag it as an outlier
‘any’: Outlier if any method flags it as an outlier
‘all’: Outlier only if all methods flag it as an outlier
‘none’: Return individual method results without combination
- return_scores : bool, default=False
Whether to return outlier scores along with the mask
- **kwargs : dict
Additional parameters for specific pyod methods
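Example (a minimal sketch; the return value is assumed here to be the combined outlier mask, with scores added when return_scores=True):
>>> import numpy as np
>>> from synference.utils import detect_outliers_pyod
>>> rng = np.random.default_rng(1)
>>> base = rng.normal(size=(500, 3))
>>> obs = rng.normal(size=(20, 3))
>>> mask = detect_outliers_pyod(base, obs, methods=['ecod'], combination='majority')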
- synference.utils.download_test_data()[source]¶
Downloads test data for Synference using the synference-download CLI tool.
- synference.utils.f_jy_err_to_asinh(f_jy, f_jy_err, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert flux error in Jy to asinh magnitude error.
- Parameters:
f_jy (unyt_array) – Flux in Jy.
f_jy_err (unyt_array) – Flux error in Jy.
f_b (unyt_array) – Softening parameter (transition point for the asinh scale).
- Return type:
ndarray
- Returns:
Magnitude error in asinh scale.
- synference.utils.f_jy_to_asinh(f_jy, f_b=unyt_quantity(5, 'nJy'))[source]¶
Convert flux in Jy to asinh magnitude.
- Parameters:
f_jy (unyt_array) – Flux in Jy.
f_b (unyt_array) – Softening parameter (transition point for the asinh scale).
- Return type:
ndarray
- Returns:
Magnitude in asinh scale.
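Example (a minimal round-trip sketch; converting fluxes to asinh magnitudes and back should recover the input up to floating-point precision):
>>> import numpy as np
>>> from unyt import nJy
>>> from synference.utils import f_jy_to_asinh, asinh_to_f_jy
>>> f = np.array([10.0, 100.0, 1000.0]) * nJy
>>> m = f_jy_to_asinh(f, f_b=5 * nJy)
>>> f_back = asinh_to_f_jy(m, f_b=5 * nJy)  # ≈ f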
- synference.utils.generate_constant_R(R=300, start=unyt_quantity(1, 'Å'), end=unyt_quantity(900000., 'Å'), auto_start_stop=False, filterset=None, **kwargs)[source]¶
Generate a constant R wavelength grid.
- Parameters:
R – The resolution of the grid.
start – The starting wavelength of the grid.
end – The ending wavelength of the grid.
auto_start_stop – If True, calculate start and end from the filterset.
filterset – A filter set to calculate the start and end wavelengths.
**kwargs – Additional keyword arguments for filterset calculations.
- Returns:
A numpy array of wavelengths in Angstroms.
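Example (a minimal sketch; on a constant-R grid adjacent points conventionally satisfy lambda[i+1] ≈ lambda[i] * (1 + 1/R)):
>>> from unyt import angstrom
>>> from synference.utils import generate_constant_R
>>> wav = generate_constant_R(R=300, start=1000 * angstrom, end=50000 * angstrom)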
- synference.utils.interval_sharpness(true_values, predicted_means, predicted_sigmas, alpha)[source]¶
Calculates the Interval Sharpness (IS).
IS measures the accuracy of probabilistic forecasting, where a smaller absolute value indicates a better, narrower prediction interval.
- Parameters:
true_values (List[float]) – A list of the true, observed values (Y(i)).
predicted_means (List[float]) – A list of predicted mean values (mu(i)).
predicted_sigmas (List[float]) – A list of predicted standard deviations (sigma(i)).
alpha (float) – The significance level.
- Return type:
float
- Returns:
The Interval Sharpness (IS_alpha), which is always <= 0.
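Example (a minimal sketch; the values are illustrative):
>>> from synference.utils import interval_sharpness
>>> y_true = [1.0, 2.0, 3.0]
>>> mu = [1.1, 1.9, 3.2]
>>> sigma = [0.3, 0.25, 0.4]
>>> is_alpha = interval_sharpness(y_true, mu, sigma, alpha=0.1)
>>> is_alpha <= 0  # always true; values closer to zero indicate sharper intervals
True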
- synference.utils.list_parameters(distribution)[source]¶
List parameters for scipy.stats.distribution.
- Parameters:
distribution – A string or scipy.stats distribution object.
- Returns:
A list of distribution parameter strings.
- synference.utils.load_library_from_hdf5(hdf5_path, photometry_key='Grid/Photometry', parameters_key='Grid/Parameters', filter_codes_attr='FilterCodes', parameters_attr='ParameterNames', parameters_units_attr='ParameterUnits', supp_key='Grid/SupplementaryParameters', supp_attr='SupplementaryParameterNames', supp_units_attr='SupplementaryParameterUnits', phot_unit_attr='PhotometryUnits', spectra_key='Grid/Spectra')[source]¶
Load a grid from an HDF5 file.
- Parameters:
hdf5_path (str) – Path to the HDF5 file.
photometry_key (str) – Key for the photometry dataset in the HDF5 file.
parameters_key (str) – Key for the parameters dataset in the HDF5 file.
filter_codes_attr (str) – Attribute name for filter codes in the HDF5 file.
parameters_attr (str) – Attribute name for parameter names in the HDF5 file.
parameters_units_attr (str) – Attribute name for parameter units in the HDF5 file.
supp_key (str) – Key for supplementary parameters in the HDF5 file.
supp_attr (str) – Attribute name for supplementary parameter names in the HDF5 file.
supp_units_attr (str) – Attribute name for supplementary parameter units in the HDF5 file.
phot_unit_attr (str) – Attribute name for photometry units in the HDF5 file.
spectra_key (str) – Key for the spectra dataset in the HDF5 file.
- Return type:
dict
- Returns:
The loaded grid.
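Example (a minimal sketch; ‘library.hdf5’ is a hypothetical file laid out with the default keys above):
>>> from synference.utils import load_library_from_hdf5
>>> grid = load_library_from_hdf5('library.hdf5')
>>> sorted(grid)  # dictionary keys covering photometry, parameters, units, etc.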
- synference.utils.make_serializable(obj, allowed_types=None)[source]¶
Recursively convert a nested dictionary/object to be JSON serializable.
Handles common scientific computing types:
- NumPy arrays and scalars
- PyTorch tensors
- JAX arrays
- TensorFlow tensors
- Pandas Series/DataFrames
- Complex numbers
- Sets
- Bytes
- Custom objects with __dict__
- Parameters:
obj (Any) – The object to make serializable.
allowed_types – Optional list of additional types to allow (e.g., custom classes).
- Return type:
Any
- Returns:
A JSON-serializable version of the input object.
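Example (a minimal sketch; the payload mixes types that json.dumps cannot handle directly):
>>> import json
>>> import numpy as np
>>> from synference.utils import make_serializable
>>> payload = {'weights': np.arange(3), 'tags': {'a', 'b'}, 'z': 1 + 2j}
>>> json.dumps(make_serializable(payload))  # no TypeError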
- synference.utils.move_to_device(obj, device, visited=None)[source]¶
Move tensors and objects with a .to() method to the specified device.
Recursively traverses an object and its components, moving tensors and objects with a .to() method to the specified device. Also sets a ._device or .device attribute if present.
Gracefully handles read-only attributes by ignoring AttributeError on setattr.
- Parameters:
obj (Any) – The object to move.
device (str | device) – The target device (e.g., ‘cpu’, ‘cuda:0’).
visited (Optional[Set[int]]) – A set of object ids to prevent infinite recursion in case of circular references. Should not be set by the user.
- Return type:
Any
- Returns:
The object, with its components moved to the specified device.
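Example (a minimal sketch, assuming nested containers of tensors are traversed as described above):
>>> import torch
>>> from synference.utils import move_to_device
>>> batch = {'x': torch.zeros(4, 3), 'meta': {'ids': torch.arange(4)}}
>>> batch = move_to_device(batch, 'cpu')  # e.g. 'cuda:0' when a GPU is available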
- synference.utils.optimize_sfh_xlimit(ax, mass_threshold=0.001, buffer_fraction=0.2)[source]¶
Adapted from EXPANSE.
Optimizes the x-axis limits of a matplotlib plot containing SFR histories to focus on periods after each galaxy has formed a certain fraction of its final mass. Calculates cumulative mass from SFR data.
Parameters:¶
- ax : matplotlib.axes.Axes
The axes object containing the SFR plots (SFR/yr vs time)
- mass_threshold : float, optional
Fraction of final stellar mass to use as threshold (default: 0.001, i.e. 0.1%)
- buffer_fraction : float, optional
Fraction of the active time range to add as buffer (default: 0.2)
Returns:¶
float – The optimal maximum x value for the plot.
- synference.utils.rename_overlapping_parameters(lists_dict)[source]¶
Check if N lists have any overlapping parameters and rename them if they do.
- Parameters:
lists_dict – Dictionary where keys are list names and values are the lists
- Returns:
Dictionary with renamed parameters where overlapping occurred
- synference.utils.save_emission_model(model)[source]¶
Save the fixed parameters of the emission model.
- Parameters:
model – The emission model object.
- Returns:
A dictionary containing fixed parameters, dust attenuation, and dust emission model information.
- synference.utils.search_parameter_array(array, parameter_names, constraints)[source]¶
Return indexes in array with columns which meet constraints.
- Parameters:
array (np.ndarray) – The data array where rows are entries and columns are parameters.
parameter_names (List[str]) – A list of string names for each column in the array.
constraints (List[Tuple[str, str, Union[int, float]]]) – A list of tuples, where each tuple defines a constraint in the format: (parameter_name, operator_string, value). e.g., (‘mass’, ‘>’, 100)
- Returns:
A NumPy array of integer indices for the rows in the input array that satisfy all of the given constraints.
- Return type:
np.ndarray
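Example (a minimal sketch; only the last row satisfies both constraints):
>>> import numpy as np
>>> from synference.utils import search_parameter_array
>>> arr = np.array([[50.0, 0.10], [150.0, 0.30], [200.0, 0.05]])
>>> idx = search_parameter_array(
...     arr, ['mass', 'dust'],
...     [('mass', '>', 100), ('dust', '<', 0.2)],
... )  # -> indices selecting row 2 only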
- synference.utils.setup_mpi_named_logger(name, level=20, stream=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
Sets up a named logger that only outputs messages from MPI rank 0.
This is more robust than configuring the root logger, as it won’t interfere with the logging settings of other libraries.
- Parameters:
name (str) – The name for the logger instance.
level (int) – The logging level for the rank 0 process.
stream (TextIO) – The output stream for the rank 0 process.
- Return type:
Logger
- Returns:
A configured logging.Logger instance.
- synference.utils.transform_spectrum(theory_wave, theory_flux, z, observed_wave, resolution_curve_wave, resolution_curve_r, theory_r=inf, trunc_constant=4.0)[source]¶
Transforms a high-resolution theoretical spectrum to a given redshift and matches it to an observed resolution.
- Parameters:
theory_wave (np.ndarray) – Wavelength array of the high-res theoretical spectrum.
theory_flux (np.ndarray) – Flux array of the high-res theoretical spectrum.
z (float) – The redshift to apply to the theoretical spectrum.
observed_wave (np.ndarray) – The target wavelength grid of the observation.
resolution_curve_wave (np.ndarray) – Wavelength points for the resolution curve.
resolution_curve_r (np.ndarray) – The spectral resolution R at each point.
theory_r (Union[float, np.ndarray]) – Intrinsic resolution R of the theoretical model.
trunc_constant (float) – Truncation constant for the Gaussian kernel.
- Returns:
A tuple containing the final wavelength array and the transformed, resampled flux array.
- Return type:
Tuple[np.ndarray, np.ndarray]
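Example (a minimal sketch; wavelengths are assumed to be in Angstroms and the flat flux is illustrative):
>>> import numpy as np
>>> from synference.utils import transform_spectrum
>>> theory_wave = np.linspace(900.0, 20000.0, 100000)  # high-res rest-frame grid
>>> theory_flux = np.ones_like(theory_wave)
>>> obs_wave = np.linspace(6000.0, 50000.0, 2000)
>>> res_wave = np.array([6000.0, 50000.0])
>>> res_r = np.array([100.0, 1000.0])
>>> wav, flux = transform_spectrum(theory_wave, theory_flux, z=2.0,
...                                observed_wave=obs_wave,
...                                resolution_curve_wave=res_wave,
...                                resolution_curve_r=res_r)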
- synference.utils.update_plot(train_loss, val_loss, epoch, time_elapsed=None, trial_number=None, alt_screen=True)[source]¶
Updates a live plot of training and validation loss in the terminal.
This function uses the terminal’s alternate screen buffer to create a full-screen plot that disappears when the script ends, restoring the previous terminal content.
- Parameters:
train_loss (List[float]) – A list of the training loss at each epoch.
val_loss (List[float]) – A list of the validation loss at each epoch.
epoch (int) – The current epoch number.
time_elapsed (Optional[float]) – The time elapsed since the start of training.
trial_number (Optional[int]) – The Optuna trial number, for display.
alt_screen (bool) – Whether to use the alternate screen buffer. Defaults to True.
Classes
- class synference.utils.CPU_Unpickler(file, *, fix_imports=True, encoding='ASCII', errors='strict', buffers=())[source]¶
Custom unpickler that handles specific Torch storage loading.
- class synference.utils.FilterArithmeticParser[source]¶
Parser for filter arithmetic expressions.
Supports operations like:
- Basic arithmetic: +, -, *, /
- Parentheses for grouping
- Constants and coefficients
Examples
- “F356W” -> single filter
- “F356W + F444W” -> filter addition
- “2 * F356W” -> coefficient multiplication
- “(F356W + F444W) / 2” -> average of filters
- “F356W - 0.5 * F444W” -> weighted subtraction
- evaluate(tokens, filter_data)[source]¶
Evaluate a list of tokens using provided filter data.
- Parameters:
tokens (List[str]) – List of tokens from the expression.
filter_data (Dict[str, Union[float, ndarray]]) – Dictionary mapping filter names to their values.
- Return type:
Union[float, ndarray]
- Returns:
Result of the arithmetic operations
- parse_and_evaluate(expression, filter_data)[source]¶
Parse and evaluate a filter arithmetic expression.
- Parameters:
expression (str) – String containing the filter arithmetic expression.
filter_data (Dict[str, Union[float, ndarray]]) – Dictionary mapping filter names to their values.
- Return type:
Union[float, ndarray]
- Returns:
Result of evaluating the expression
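Example (a minimal sketch; the flux values are illustrative, and the expected result follows from the average-of-filters example above):
>>> from synference.utils import FilterArithmeticParser
>>> parser = FilterArithmeticParser()
>>> filter_data = {'F356W': 2.0, 'F444W': 4.0}
>>> parser.parse_and_evaluate('(F356W + F444W) / 2', filter_data)
3.0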
Exceptions
Exception raised when a function times out.