synthesizer.pipeline.pipeline_utils

A submodule with helpers for writing out Synthesizer pipeline results.

Functions

synthesizer.pipeline.pipeline_utils.cached_split(split_key)[source]

Split a key into a list of keys.

This is a cached version of the split function to avoid repeated splitting of the same key.

Parameters:

split_key (str) – The key to split in “key1/key2/…/keyN” format.

Returns:

A list of the split keys.

Return type:

list

synthesizer.pipeline.pipeline_utils.combine_list_of_dicts(dicts)[source]

Combine a list of dictionaries into a single dictionary.

Parameters:

dicts (list) – A list of dictionaries to combine.

Returns:

The combined dictionary.

Return type:

dict

synthesizer.pipeline.pipeline_utils.count_and_check_dict_recursive(data, prefix='')[source]

Recursively count the number of leaves in a dictionary.

Parameters:
  • data (dict) – The dictionary to search.

  • prefix (str) – A prefix to add to the keys of the arrays.

Returns:

A dictionary of all the numpy arrays in the input dictionary.

Return type:

dict

synthesizer.pipeline.pipeline_utils.discover_attr_paths_recursive(obj, prefix='', output_set=None)[source]

Recursively discover all outputs attached to an object.

This function will collate all paths to attributes at any level within the input object.

If the object is a dictionary, we will loop over all keys and values recursing where appropriate.

If the object is a class instance (e.g. Galaxy, Stars, ImageCollection, etc.), we will loop over all attributes and recurse where appropriate.

If the object is a “value” (i.e. an array or a scalar), we will append the full path to the output list.

NOTE: this function is currently unused but is kept for debugging purposes since it is extremely useful to see the nesting of attributes on objects.

Parameters:
  • obj (dict) – The dictionary to search.

  • prefix (str) – A prefix to add to the keys of the arrays.

  • output_set (set) – A set to store the output paths in.

Returns:

A dictionary of all the numpy arrays in the input dictionary.

Return type:

dict

synthesizer.pipeline.pipeline_utils.discover_dict_recursive(data, prefix='', output_set=None)[source]

Recursively discover all leaves in a dictionary.

Parameters:
  • data (dict) – The dictionary to search.

  • prefix (str) – A prefix to add to the keys of the arrays.

  • output_set (set) – A set to store the output paths in.

Returns:

A dictionary of all the numpy arrays in the input dictionary.

Return type:

dict

synthesizer.pipeline.pipeline_utils.discover_dict_structure(data)[source]

Recursively discover the structure of a dictionary.

Parameters:

data (dict) – The dictionary to search.

Returns:

A dictionary of all the paths in the input dictionary.

Return type:

dict

synthesizer.pipeline.pipeline_utils.get_dataset_properties(data, comm, root=0)[source]

Return the shapes, dtypes and units of all data arrays in a dictionary.

Parameters:
  • data (dict) – The data to get the shapes of.

  • comm (mpi.Comm) – The MPI communicator.

  • root (int) – The root rank to gather data to.

Returns:

A dictionary of the shapes of all data arrays. dict: A dictionary of the dtypes of all data arrays. dict: A dictionary of the units of all data arrays.

Return type:

dict

synthesizer.pipeline.pipeline_utils.get_full_memory(obj, seen=None)[source]

Estimate memory usage of a Python object, including NumPy arrays.

Parameters:
  • obj – The object to inspect.

  • seen – Set of seen object ids to avoid double-counting.

Returns:

Approximate size in bytes.

Return type:

int

synthesizer.pipeline.pipeline_utils.unify_dict_structure_across_ranks(data, comm, root=0)[source]

Recursively unify the structure of a dictionary across all ranks.

This function will ensure that all ranks have the same structure in their dictionaries. This is necessary for writing out the data in parallel.

Parameters:
  • data (dict) – The data to unify.

  • comm (mpi.Comm) – The MPI communicator.

  • root (int) – The root rank to gather data to.

synthesizer.pipeline.pipeline_utils.validate_noise_unit_compatibility(instruments, expected_unit)[source]

Validate that noise attributes have compatible units.

This function checks that instruments with noise capabilities have depth and noise_maps attributes with units compatible with the expected unit for the image type (luminosity or flux).

Note: depth can be specified as:
  • Plain float/dict of floats: apparent magnitudes (dimensionless, valid for both luminosity and flux images)

  • unyt_quantity/dict of unyt_quantity: flux/luminosity with units (must match image type)

Parameters:
  • instruments (list) – A list of Instrument objects to validate.

  • expected_unit (unyt.Unit) – The expected unit for the image type (e.g., “erg/s/Hz” for luminosity images or “nJy” for flux images).

Raises:

InconsistentArguments – If an instrument has depth or noise_maps with incompatible units.

Classes

class synthesizer.pipeline.pipeline_utils.OperationKwargs(**kwargs)[source]

A container class holding the kwargs needed by any pipeline operation.

_kwargs

dict The original kwargs dict used to build this object. (Values are not copied; we just hold the references.)

get(key, default=None)[source]

Dict-like get method: obj.get(‘fov’, default) -> kwargs.get().

get_hash()[source]

Get the hash representation of the kwargs for caching purposes.

property kwargs

Return the underlying kwargs dict.

class synthesizer.pipeline.pipeline_utils.OperationKwargsHandler(model_labels)[source]

Container for Pipeline operation kwargs.

This handler enables running pipeline operations multiple times with different parameters for different models in a clean, expandable and organized manner.

Internally it stores unique OperationKwargs objects per operation (func_name) and associates them with one or more model labels and their instruments:

self._func_map[func_name][OperationKwargs][label] -> list[instruments]

This avoids duplicating identical kwargs sets across labels and provides a clean interface to loop over:

  • all (label, OperationKwargs) for a given operation, or

  • all OperationKwargs for a given (label, operation), or

  • groups of labels that share the same OperationKwargs.

add(model_label, func_name, **kwargs)[source]

Add a kwargs set for a given func_name and one or more labels.

This wraps the kwargs in an OperationKwargs and deduplicates them based on its hashing / equality semantics.

Parameters:
  • model_label (str or iterable of str or None) – Emission model label(s) or None for NO_MODEL_LABEL.

  • func_name (str) – Operation / method name, e.g. “get_images_luminosity”.

  • **kwargs – Arbitrary keyword arguments to store for this func.

Returns:

The OperationKwargs instance representing this kwargs set.

Return type:

OperationKwargs

add_unique(func_name, **kwargs)[source]

Add a single unique kwargs set for a given func_name.

This is used for operations that should only have one configuration per pipeline run (e.g., get_sfzh, get_sfh, get_observed_spectra).

Parameters:
  • func_name (str) – Operation / method name, e.g. “get_sfzh”.

  • **kwargs – Arbitrary keyword arguments to store for this func.

Returns:

The OperationKwargs instance representing this kwargs set.

Return type:

OperationKwargs

get_unique_kwargs(func_name)[source]

Return the unique OperationKwargs for a given func_name.

This is only applicable for operations added via add_unique() and can never have multiple variations.

Parameters:

func_name (str) – Operation / method name.

Returns:

The unique OperationKwargs for this operation.

Return type:

OperationKwargs

has(func_name, model_label=None)[source]

Return True if any kwargs are stored for the given operation.

Parameters:
  • func_name (str) – Operation / method name.

  • model_label (str, optional) – If provided, restrict the check to this model. If omitted, all models are searched.

Returns:

True if at least one OperationKwargs exists matching the query.

Return type:

bool

iter_all(func_name)[source]

Iterate over (model_label, OperationKwargs) pairs for an operation.

This is the main entry point for Pipeline methods that want to process all configs for a given operation, regardless of model.

Non-consuming: internal state is left unchanged.

Parameters:

func_name (str) – Operation / method name.

Yields:

(model_label, OperationKwargs) – Tuples of model label and OperationKwargs object.