synthesizer.pipeline.pipeline_utils
A submodule with helpers for writing out Synthesizer pipeline results.
Functions
- synthesizer.pipeline.pipeline_utils.cached_split(split_key)
Split a key into a list of keys.
This is a cached version of the split function to avoid repeated splitting of the same key.
- Parameters:
split_key (str) – The key to split in “key1/key2/…/keyN” format.
- Returns:
A list of the split keys.
- Return type:
list
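The caching described here can be sketched with functools.lru_cache. This is an illustrative stand-in rather than the library's implementation: the name cached_split_sketch, the unbounded cache size and the example key are all assumptions.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_split_sketch(split_key):
    """Split a "key1/key2/.../keyN" string into its parts (hypothetical stand-in)."""
    return split_key.split("/")


keys = cached_split_sketch("Galaxy/Stars/Spectra")
# A repeated call with the same key is answered from the cache, not re-split.
keys_again = cached_split_sketch("Galaxy/Stars/Spectra")
```

In this sketch the cache hands back the same list object on every hit, so callers should treat the result as read-only.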
- synthesizer.pipeline.pipeline_utils.combine_list_of_dicts(dicts)
Combine a list of dictionaries into a single dictionary.
- Parameters:
dicts (list) – A list of dictionaries to combine.
- Returns:
The combined dictionary.
- Return type:
dict
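The description does not say how values stored under the same key are merged. The sketch below shows one plausible reading, under two assumptions: every dictionary in the list shares the same nested structure, and values found at the same path are gathered into a list. The name combine_dicts_sketch and the example data are hypothetical.

```python
def combine_dicts_sketch(dicts):
    """Merge same-structured nested dicts, gathering leaf values into lists."""
    first = dicts[0]
    # Leaves: nothing left to recurse into, so return the gathered values.
    if not isinstance(first, dict):
        return list(dicts)
    # Branches: merge the children found under each key of the first dict.
    return {key: combine_dicts_sketch([d[key] for d in dicts]) for key in first}


per_galaxy = [
    {"Stars": {"mass": 1.0}, "redshift": 5.0},
    {"Stars": {"mass": 2.0}, "redshift": 6.0},
]
combined = combine_dicts_sketch(per_galaxy)
```

The sketch raises an IndexError for an empty list and a KeyError if a later dictionary lacks a key present in the first.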
- synthesizer.pipeline.pipeline_utils.count_and_check_dict_recursive(data, prefix='')
Recursively count the number of leaves in a dictionary.
- Parameters:
data (dict) – The dictionary to search.
prefix (str) – A prefix to add to the keys of the arrays.
- Returns:
The number of leaves in the input dictionary.
- Return type:
int
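Counting leaves recursively can be sketched as below. The check implied by the function name is not described on this page, so this hypothetical count_leaves_sketch only counts and drops the prefix argument. It treats every non-dictionary value as a leaf, which is an assumption.

```python
def count_leaves_sketch(data):
    """Count the non-dict values at any depth of a nested dictionary."""
    count = 0
    for value in data.values():
        if isinstance(value, dict):
            # A nested dictionary contributes however many leaves it holds.
            count += count_leaves_sketch(value)
        else:
            count += 1
    return count


nested = {
    "Stars": {"mass": 1.0, "Spectra": {"incident": [1.0, 2.0]}},
    "redshift": 5.0,
}
n_leaves = count_leaves_sketch(nested)
```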
- synthesizer.pipeline.pipeline_utils.discover_attr_paths_recursive(obj, prefix='', output_set=None)
Recursively discover all outputs attached to an object.
This function will collate all paths to attributes at any level within the input object.
If the object is a dictionary, we will loop over all keys and values recursing where appropriate.
If the object is a class instance (e.g. Galaxy, Stars, ImageCollection, etc.), we will loop over all attributes and recurse where appropriate.
If the object is a “value” (i.e. an array or a scalar), we will add the full path to the output set.
NOTE: this function is currently unused but is kept for debugging purposes since it is extremely useful to see the nesting of attributes on objects.
- Parameters:
obj (object) – The object to search: a dictionary, a class instance, or a value.
prefix (str) – A prefix to add to the keys of the arrays.
output_set (set) – A set to store the output paths in.
- Returns:
The set of paths to all attributes discovered on the input object.
- Return type:
set
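The three cases described for this function (dictionary, class instance, value) can be sketched as follows. This is a simplified illustration, not the library's code: ToyStars, ToyGalaxy and discover_paths_sketch are hypothetical names, instances are detected by the presence of __dict__, and paths are collected into a set as the output_set parameter suggests.

```python
class ToyStars:
    """Hypothetical stand-in for a component object."""

    def __init__(self):
        self.mass = 1.0
        self.spectra = {"incident": [1.0, 2.0]}


class ToyGalaxy:
    """Hypothetical stand-in for a top-level object."""

    def __init__(self):
        self.redshift = 5.0
        self.stars = ToyStars()


def discover_paths_sketch(obj, prefix="", output_set=None):
    if output_set is None:
        output_set = set()
    if isinstance(obj, dict):
        children = obj.items()
    elif hasattr(obj, "__dict__"):
        # Class instance: walk its instance attributes.
        children = vars(obj).items()
    else:
        # A value (scalar, list, array, ...): record the path that led here.
        output_set.add(prefix)
        return output_set
    for name, value in children:
        path = f"{prefix}/{name}" if prefix else str(name)
        discover_paths_sketch(value, path, output_set)
    return output_set


paths = discover_paths_sketch(ToyGalaxy())
```

The sketch does not guard against objects that reference themselves, which would recurse without end.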
- synthesizer.pipeline.pipeline_utils.discover_dict_recursive(data, prefix='', output_set=None)
Recursively discover all leaves in a dictionary.
- Parameters:
data (dict) – The dictionary to search.
prefix (str) – A prefix to add to the keys of the arrays.
output_set (set) – A set to store the output paths in.
- Returns:
The set of paths to all leaves in the input dictionary.
- Return type:
set
- synthesizer.pipeline.pipeline_utils.discover_dict_structure(data)
Recursively discover the structure of a dictionary.
- Parameters:
data (dict) – The dictionary to search.
- Returns:
A dictionary of all the paths in the input dictionary.
- Return type:
dict
- synthesizer.pipeline.pipeline_utils.get_dataset_properties(data, comm, root=0)
Return the shapes, dtypes and units of all data arrays in a dictionary.
- Parameters:
data (dict) – The data to get the shapes of.
comm (mpi.Comm) – The MPI communicator.
root (int) – The root rank to gather data to.
- Returns:
Three dictionaries: the shapes of all data arrays, the dtypes of all data arrays, and the units of all data arrays.
- Return type:
tuple of dict
- synthesizer.pipeline.pipeline_utils.get_full_memory(obj, seen=None)
Estimate memory usage of a Python object, including NumPy arrays.
- Parameters:
obj – The object to inspect.
seen – Set of seen object ids to avoid double-counting.
- Returns:
Approximate size in bytes.
- Return type:
int
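The role of the seen set can be illustrated with a standard-library sketch built on sys.getsizeof. This is an approximation under stated assumptions, not the library's implementation: full_memory_sketch is a hypothetical name, only dicts, common containers and instance attributes are traversed, and NumPy arrays are left out to keep the example dependency-free.

```python
import sys


def full_memory_sketch(obj, seen=None):
    """Approximate the deep size of obj in bytes, counting each object once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        # Already counted via another reference, so add nothing.
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += full_memory_sketch(key, seen)
            size += full_memory_sketch(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += full_memory_sketch(item, seen)
    elif hasattr(obj, "__dict__"):
        size += full_memory_sketch(vars(obj), seen)
    return size


shared = list(range(100))
once = full_memory_sketch({"a": shared})
# The second reference to the same list is not counted again.
twice = full_memory_sketch({"a": shared, "b": shared})
```

Tracking object ids also keeps the recursion finite when a container refers to itself.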
- synthesizer.pipeline.pipeline_utils.unify_dict_structure_across_ranks(data, comm, root=0)
Recursively unify the structure of a dictionary across all ranks.
This function will ensure that all ranks have the same structure in their dictionaries. This is necessary for writing out the data in parallel.
- Parameters:
data (dict) – The data to unify.
comm (mpi.Comm) – The MPI communicator.
root (int) – The root rank to gather data to.
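The idea of unifying structure can be sketched without MPI by treating a list of dictionaries as the per-rank data. Everything here is an assumption made for illustration: the names leaf_paths and unify_structure_sketch, the use of None as the placeholder for missing leaves, and the requirement that no path is a leaf on one rank and a nested dictionary on another. In a real MPI run the union of paths would be exchanged through the communicator rather than computed in a loop.

```python
def leaf_paths(data, prefix=""):
    """Return the set of "a/b/c" paths to every leaf in a nested dict."""
    paths = set()
    for key, value in data.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            paths |= leaf_paths(value, path)
        else:
            paths.add(path)
    return paths


def unify_structure_sketch(rank_dicts, fill=None):
    """Give every simulated rank the same set of leaf paths, in place."""
    # Stand-in for a gather: collect every leaf path seen on any rank.
    all_paths = set()
    for data in rank_dicts:
        all_paths |= leaf_paths(data)
    # Stand-in for a broadcast: add the leaves each rank was missing.
    for data in rank_dicts:
        for path in all_paths - leaf_paths(data):
            *parents, leaf = path.split("/")
            node = data
            for key in parents:
                node = node.setdefault(key, {})
            node[leaf] = fill
    return rank_dicts


rank0 = {"Stars": {"mass": [1.0]}}
rank1 = {"Stars": {"mass": [2.0], "age": [3.0]}, "redshift": [5.0]}
unify_structure_sketch([rank0, rank1])
```

After the call both dictionaries expose the same paths, so each simulated rank would write to the same set of datasets.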