synthesizer.pipeline.pipeline_utils

A submodule with helpers for writing out Synthesizer pipeline results.

Functions

synthesizer.pipeline.pipeline_utils.cached_split(split_key)

Split a key into a list of keys.

Results are cached, so splitting the same key repeatedly reuses the stored result instead of splitting the string again.

Parameters: split_key (str) – The key to split in “key1/key2/…/keyN” format.
Returns: A list of the split keys.
Return type: list
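A minimal sketch of this caching, assuming keys are split on "/" and memoised with functools.lru_cache. The actual implementation may cache differently.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_split(split_key):
    """Split "key1/key2/.../keyN" into its components, caching each result."""
    # lru_cache memoises on the argument, so a repeated key returns the
    # stored list instead of splitting the string again.
    return split_key.split("/")


parts = cached_split("key1/key2/key3")
```

Because every call with the same key returns the same cached list object, callers should treat the result as read-only.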

synthesizer.pipeline.pipeline_utils.combine_list_of_dicts(dicts)

Combine a list of dictionaries into a single dictionary.

Parameters: dicts (list) – A list of dictionaries to combine.
Returns: The combined dictionary.
Return type: dict
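This page does not specify the merge policy. The sketch below assumes every dictionary shares the same nested keys (taken from the first dictionary) and collects each leaf's values, in order, into a list. The real helper may instead stack leaves into arrays.

```python
def combine_list_of_dicts(dicts):
    """Merge same-structured dicts, collecting leaf values into lists (sketch)."""
    if not dicts:
        return {}
    combined = {}
    # Assumption: the first dict's keys define the structure for all dicts.
    for key, first in dicts[0].items():
        values = [d[key] for d in dicts]
        if isinstance(first, dict):
            # Recurse into nested dictionaries.
            combined[key] = combine_list_of_dicts(values)
        else:
            # Gather this leaf from every dict, preserving input order.
            combined[key] = values
    return combined
```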

synthesizer.pipeline.pipeline_utils.count_and_check_dict_recursive(data, prefix='')

Recursively count the number of leaves in a dictionary.

Parameters:

data (dict) – The dictionary to search.
prefix (str) – A prefix to add to the keys of the arrays.

Returns:

The number of leaves found in the input dictionary.

Return type:

int
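A sketch of the recursion, assuming (from the function name) that it returns an integer leaf count and that the "check" rejects leaves set to None. Both behaviours are assumptions not confirmed by this page.

```python
def count_and_check_dict_recursive(data, prefix=""):
    """Count the leaves of a nested dict, rejecting None leaves (sketch)."""
    count = 0
    for key, value in data.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse, extending the "/"-separated path prefix.
            count += count_and_check_dict_recursive(value, prefix=f"{path}/")
        elif value is None:
            # Assumed check: a None leaf is treated as missing data.
            raise ValueError(f"Found None at {path}")
        else:
            # Any other value counts as one leaf.
            count += 1
    return count
```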

synthesizer.pipeline.pipeline_utils.discover_attr_paths_recursive(obj, prefix='', output_set=None)

Recursively discover all outputs attached to an object.

This function will collate all paths to attributes at any level within the input object.

If the object is a dictionary, we will loop over all keys and values recursing where appropriate.

If the object is a class instance (e.g. Galaxy, Stars, ImageCollection, etc.), we will loop over all attributes and recurse where appropriate.

If the object is a “value” (i.e. an array or a scalar), we will add the full path to the output set.

NOTE: this function is currently unused but is kept for debugging, since it is very useful for seeing how attributes are nested on objects.

Parameters:

obj (dict or object) – The dictionary or class instance to search.
prefix (str) – A prefix to add to the keys of the arrays.
output_set (set) – A set to store the output paths in.

Returns:

The set of paths to all values found within the input object.

Return type:

set
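The traversal described above can be sketched as follows. Assumptions: paths are joined with "/", private attributes (leading underscore) and None values are skipped, and anything exposing a shape attribute is treated as an array leaf even if it also has a __dict__ (as ndarray subclasses do). These rules are illustrative only.

```python
def discover_attr_paths_recursive(obj, prefix="", output_set=None):
    """Collect "/"-joined paths to every value-like leaf under obj (sketch)."""
    if output_set is None:
        output_set = set()

    if isinstance(obj, dict):
        # Dictionaries: recurse into every value.
        for key, value in obj.items():
            discover_attr_paths_recursive(value, f"{prefix}/{key}", output_set)
    elif hasattr(obj, "__dict__") and not hasattr(obj, "shape"):
        # Class instances: recurse into public instance attributes.
        for key, value in vars(obj).items():
            if not key.startswith("_"):
                discover_attr_paths_recursive(value, f"{prefix}/{key}", output_set)
    elif obj is not None:
        # Arrays, scalars and other values are leaves: record the path.
        output_set.add(prefix)

    return output_set
```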

synthesizer.pipeline.pipeline_utils.discover_dict_recursive(data, prefix='', output_set=None)

Recursively discover all leaves in a dictionary.

Parameters:

data (dict) – The dictionary to search.
prefix (str) – A prefix to add to the keys of the arrays.
output_set (set) – A set to store the output paths in.

Returns:

The set of paths to all leaves in the input dictionary.

Return type:

set

synthesizer.pipeline.pipeline_utils.discover_dict_structure(data)

Recursively discover the structure of a dictionary.

Parameters: data (dict) – The dictionary to search.
Returns: A dictionary of all the paths in the input dictionary.
Return type: dict
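A sketch of how the two discovery helpers could fit together, assuming leaf paths are "/"-joined strings accumulated in output_set and that discover_dict_structure simply starts the recursion with an empty set. This sketch returns that set of paths.

```python
def discover_dict_recursive(data, prefix="", output_set=None):
    """Add the "/"-joined path of every leaf in data to output_set (sketch)."""
    if output_set is None:
        output_set = set()
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested dict: extend the prefix and keep descending.
            discover_dict_recursive(value, f"{prefix}{key}/", output_set)
        else:
            # Leaf: record its full path.
            output_set.add(f"{prefix}{key}")
    return output_set


def discover_dict_structure(data):
    """Return every leaf path in data by starting the recursion (sketch)."""
    return discover_dict_recursive(data)
```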

synthesizer.pipeline.pipeline_utils.get_dataset_properties(data, comm, root=0)

Return the shapes, dtypes and units of all data arrays in a dictionary.

Parameters:

data (dict) – The data to get the properties of.
comm (mpi.Comm) – The MPI communicator.
root (int) – The root rank to gather data to.

Returns:

A tuple (shapes, dtypes, units) of three dictionaries holding, respectively, the shapes, the dtypes and the units of all data arrays.

Return type:

tuple(dict, dict, dict)

synthesizer.pipeline.pipeline_utils.get_full_memory(obj, seen=None)

Estimate memory usage of a Python object, including NumPy arrays.

Parameters:

obj – The object to inspect.
seen – Set of seen object ids to avoid double-counting.

Returns:

Approximate size in bytes.

Return type:

int
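A sketch of such an estimate built on sys.getsizeof. Assumptions: objects exposing an integer nbytes attribute (such as NumPy arrays) are counted by their data buffer only, and dicts, sequences, sets and instance __dict__s are traversed recursively, with seen holding the ids of objects already counted.

```python
import sys


def get_full_memory(obj, seen=None):
    """Approximate the deep memory footprint of obj in bytes (sketch)."""
    if seen is None:
        seen = set()
    # Skip objects already counted through another reference.
    if id(obj) in seen:
        return 0
    seen.add(id(obj))

    # Array-like objects report their buffer size as nbytes; count the
    # buffer only (ignoring the small header) and do not recurse.
    nbytes = getattr(obj, "nbytes", None)
    if isinstance(nbytes, int):
        return nbytes

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += get_full_memory(key, seen) + get_full_memory(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += get_full_memory(item, seen)
    elif hasattr(obj, "__dict__"):
        # Class instances: include everything stored on the instance.
        size += get_full_memory(vars(obj), seen)
    return size
```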

synthesizer.pipeline.pipeline_utils.unify_dict_structure_across_ranks(data, comm, root=0)

Recursively unify the structure of a dictionary across all ranks.

This function will ensure that all ranks have the same structure in their dictionaries. This is necessary for writing out the data in parallel.

Parameters:

data (dict) – The data to unify.
comm (mpi.Comm) – The MPI communicator.
root (int) – The root rank to gather data to.
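The core idea can be sketched without MPI by letting a plain list stand in for each rank's dictionary. Assumptions: all ranks nest keys consistently, every rank should end up with the union of all leaf paths, and missing leaves are filled with a placeholder (None here). In an mpi4py version, the union would typically be built from comm.gather on the root and shared with comm.bcast. The names unify_structures and _leaf_paths are hypothetical.

```python
def _leaf_paths(data, prefix=()):
    """Yield the path to each leaf of a nested dict as a tuple of keys."""
    for key, value in data.items():
        if isinstance(value, dict):
            yield from _leaf_paths(value, prefix + (key,))
        else:
            yield prefix + (key,)


def unify_structures(rank_dicts, fill=None):
    """Give every per-rank dict the union of all ranks' leaf paths (sketch)."""
    # Stand-in for gathering each rank's paths on the root and broadcasting
    # the union back to every rank.
    all_paths = set()
    for data in rank_dicts:
        all_paths.update(_leaf_paths(data))

    for data in rank_dicts:
        for path in all_paths:
            node = data
            # Create any missing intermediate dictionaries.
            for key in path[:-1]:
                node = node.setdefault(key, {})
            # Add the leaf only if this rank lacks it; existing values are kept.
            node.setdefault(path[-1], fill)
    return rank_dicts
```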

synthesizer.pipeline.pipeline_utils.validate_noise_unit_compatibility(instruments, expected_unit)

Validate that noise attributes have compatible units.

This function checks that instruments with noise capabilities have depth and noise_maps attributes with units compatible with the expected unit for the image type (luminosity or flux).

Note: depth can be specified as:

Plain float/dict of floats: apparent magnitudes (dimensionless, valid for both luminosity and flux images)
unyt_quantity/dict of unyt_quantity: flux/luminosity with units (must match image type)

Parameters:

instruments (list) – A list of Instrument objects to validate.
expected_unit (unyt.Unit) – The expected unit for the image type (e.g., “erg/s/Hz” for luminosity images or “nJy” for flux images).

Raises:

InconsistentArguments – If an instrument has depth or noise_maps with incompatible units.

Classes

class synthesizer.pipeline.pipeline_utils.OperationKwargs(**kwargs)

A container class holding the kwargs needed by any pipeline operation.

_kwargs (dict) – The original kwargs dict used to build this object. Values are not copied; the object only holds references to them.

get(key, default=None): Dict-like get method: obj.get(‘fov’, default) is equivalent to obj.kwargs.get(‘fov’, default).

get_hash(): Get the hash representation of the kwargs for caching purposes.

property kwargs: Return the underlying kwargs dict.
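One way such a container can support get_hash() and deduplication is sketched below. KwargsSketch and _freeze are hypothetical names, and the hashing scheme is an assumption: nested containers are frozen into tuples, and unhashable leaves such as NumPy arrays are compared by identity, which is consistent with holding values by reference.

```python
def _freeze(value):
    """Convert value into a hashable key (identity for unhashable leaves)."""
    if isinstance(value, dict):
        # Sort by key so keyword order does not matter (assumes sortable keys).
        return tuple((key, _freeze(value[key])) for key in sorted(value))
    if isinstance(value, (list, tuple)):
        return tuple(_freeze(item) for item in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(_freeze(item) for item in value)
    try:
        hash(value)
    except TypeError:
        # Unhashable leaf (e.g. a NumPy array): fall back to object identity.
        return ("id", id(value))
    # Tag hashable leaves so they are never compared against identity keys.
    return ("val", value)


class KwargsSketch:
    """Hypothetical stand-in for OperationKwargs."""

    def __init__(self, **kwargs):
        # Values are stored by reference, not copied.
        self._kwargs = kwargs

    @property
    def kwargs(self):
        return self._kwargs

    def get(self, key, default=None):
        # Dict-like access delegating to the underlying kwargs.
        return self._kwargs.get(key, default)

    def get_hash(self):
        return hash(_freeze(self._kwargs))

    def __hash__(self):
        return self.get_hash()

    def __eq__(self, other):
        if not isinstance(other, KwargsSketch):
            return NotImplemented
        return _freeze(self._kwargs) == _freeze(other._kwargs)
```

Deriving both __hash__ and __eq__ from the same frozen key keeps them consistent, which is what set- or dict-based deduplication relies on.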

class synthesizer.pipeline.pipeline_utils.OperationKwargsHandler(model_labels)

Container for Pipeline operation kwargs.

This handler enables running pipeline operations multiple times with different parameters for different models in a clean, expandable and organized manner.

Internally it stores unique OperationKwargs objects per operation (func_name) and associates them with one or more model labels and their instruments:

self._func_map[func_name][OperationKwargs][label] -> list[instruments]

This avoids duplicating identical kwargs sets across labels and provides a clean interface to loop over:

all (label, OperationKwargs) for a given operation, or

all OperationKwargs for a given (label, operation), or

groups of labels that share the same OperationKwargs.
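A simplified sketch of this nested mapping follows. HandlerSketch and its instruments parameter are hypothetical, and kwargs are reduced to a sorted tuple of items (so values must be hashable) in place of OperationKwargs objects.

```python
from collections import defaultdict


class HandlerSketch:
    """Hypothetical func_name -> kwargs -> label -> instruments mapping."""

    def __init__(self):
        self._func_map = defaultdict(dict)

    def add(self, model_label, func_name, instruments=None, **kwargs):
        # Accept a single label (or None) as well as an iterable of labels.
        if model_label is None or isinstance(model_label, str):
            labels = [model_label]
        else:
            labels = list(model_label)
        # Identical kwargs produce the same key, so they are stored only once.
        key = tuple(sorted(kwargs.items()))
        by_label = self._func_map[func_name].setdefault(key, {})
        for label in labels:
            by_label.setdefault(label, []).extend(instruments or [])
        return key

    def iter_all(self, func_name):
        # Read-only traversal: yields (label, kwargs) pairs, mutating nothing.
        for key, by_label in self._func_map.get(func_name, {}).items():
            for label in by_label:
                yield label, dict(key)

    def groups(self, func_name):
        # Labels that share each distinct kwargs set.
        return [
            (list(by_label), dict(key))
            for key, by_label in self._func_map.get(func_name, {}).items()
        ]
```

Keying the inner dictionary on the kwargs rather than on the label is what lets several labels share one stored configuration.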

add(model_label, func_name, **kwargs)

Add a kwargs set for a given func_name and one or more labels.

This wraps the kwargs in an OperationKwargs and deduplicates them using the hashing and equality semantics of OperationKwargs.

Parameters:

model_label (str or iterable of str or None) – Emission model label(s) or None for NO_MODEL_LABEL.
func_name (str) – Operation / method name, e.g. “get_images_luminosity”.
**kwargs – Arbitrary keyword arguments to store for this func.

Returns:

The OperationKwargs instance representing this kwargs set.

Return type:

OperationKwargs

add_unique(func_name, **kwargs)

Add a single unique kwargs set for a given func_name.

This is used for operations that should only have one configuration per pipeline run (e.g., get_sfzh, get_sfh, get_observed_spectra).

Parameters:

func_name (str) – Operation / method name, e.g. “get_sfzh”.
**kwargs – Arbitrary keyword arguments to store for this func.

Returns:

The OperationKwargs instance representing this kwargs set.

Return type:

OperationKwargs

get_unique_kwargs(func_name)

Return the unique OperationKwargs for a given func_name.

This only applies to operations added via add_unique(), which by design never have multiple variations.

Parameters: func_name (str) – Operation / method name.
Returns: The unique OperationKwargs for this operation.
Return type: OperationKwargs

has(func_name, model_label=None)

Return True if any kwargs are stored for the given operation.

Parameters:

func_name (str) – Operation / method name.
model_label (str, optional) – If provided, restrict the check to this model. If omitted, all models are searched.

Returns:

True if at least one OperationKwargs exists matching the query.

Return type:

bool

iter_all(func_name)

Iterate over (model_label, OperationKwargs) pairs for an operation.

This is the main entry point for Pipeline methods that want to process all configs for a given operation, regardless of model.

Non-consuming: internal state is left unchanged.

Parameters: func_name (str) – Operation / method name.
Yields: (model_label, OperationKwargs) – Tuples of model label and OperationKwargs object.