synthesizer.pipeline.pipeline¶

A module containing a pipeline helper class.

This module contains the Pipeline class, which is used to run observable generation pipelines on a set of galaxies. To use this functionality the user needs to define the properties of the Pipeline and a function to load the galaxies. The user can then call the various methods to generate the mock data they need, simplifying a complex pipeline full of boilerplate code to a handful of definitions and calls to the Pipeline object.

Example usage:

```python
from synthesizer import Pipeline

pipeline = Pipeline(
    gal_loader_func=load_galaxy,
    emission_model=emission_model,
    instruments=[instrument1, instrument2],
    n_galaxies=1000,
    nthreads=4,
    comm=None,
    verbose=1,
)

pipeline.load_galaxies()
pipeline.get_spectra()
pipeline.get_photometry_luminosities()
pipeline.write("output.hdf5")
```

Classes

class synthesizer.pipeline.pipeline.Pipeline(emission_model, nthreads=1, comm=None, verbose=1, report_memory=False)[source]¶

A class for running observable generation pipelines on galaxies.

To use this class the user must instantiate it with a galaxy loading function, an emission model defining the different emissions that will be included in the pipeline, any instruments that will be used to make observations, and the number of galaxies that will be loaded.

Optionally the user can also specify the number of threads to use if Synthesizer has been installed with OpenMP support, and an MPI communicator if they are running over MPI.

Finally the verbosity level can be set to control the amount of output.

Once the Pipeline object has been instantiated the user can call the various methods to generate the data they need.

For spectra:

get_spectra (passing a cosmology object if redshifted spectra are required)
get_data_cubes_lnu (resolved spectral data cubes)
get_data_cubes_fnu (resolved spectral data cubes)

For photometry:

get_photometry_luminosities
get_photometry_fluxes

For emission lines:

get_lines (passing a list of line IDs to generate)

For images (with optional PSF and noise based on the instrument):

get_images_luminosity
get_images_flux

For the SFZH grid:

get_sfzh (passing the log10 age and log10 metallicity axes of the grid)

The user can also add their own analysis functions to the pipeline which will be run on each galaxy once all data has been generated. These functions should take a galaxy object as the first argument and can take any number of additional arguments and keyword arguments. The results of these functions should be attached to the galaxy object, either as base level attributes or dictionaries containing the computed values. These attributes should be unique to the function to avoid overwriting existing attributes (they should be named what is passed to the result_attribute argument, see add_analysis_func for more details).

Finally the user can write out the data generated by the pipeline using the write method. This will write out the data to an HDF5 file.

emission_model¶

The emission model to use for the pipeline.

Type:: EmissionModel

n_galaxies¶

How many galaxies will we load in total (i.e. not per rank if using MPI)?

Type:: int

nthreads¶

The number of threads to use for shared memory parallelism. Default is 1.

Type:: int

comm¶

The MPI communicator to use for MPI parallelism. Default is None.

Type:: MPI.Comm

verbose¶

How talkative are we? 0: No output beyond hello and goodbye. 1: Outputs with timings but only on rank 0 (when using MPI). 2: Outputs with timings on all ranks (when using MPI).

Type:: int
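The three verbosity levels can be read as a simple per-rank rule. The sketch below only illustrates that rule; `should_print` is a hypothetical helper written for this example and is not part of the Pipeline API.

```python
def should_print(verbose, rank=0):
    """Decide whether a rank prints timing output (hypothetical helper)."""
    if verbose >= 2:
        return True  # every rank reports
    if verbose == 1:
        return rank == 0  # only rank 0 reports
    return False  # verbose == 0: nothing beyond hello and goodbye


# With four MPI ranks and verbose=1, only rank 0 would report.
print([should_print(1, rank) for rank in range(4)])
```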

galaxies¶

A list of Galaxy objects that have been loaded.

Type:: list

add_analysis_func(func, result_key, *args, **kwargs)[source]¶

Add an analysis function to the Pipeline.

The provided function will be called on each galaxy in the Pipeline once all data has been generated. The function should take a galaxy object as the first argument and can take any number of additional arguments and keyword arguments.

The results of the analysis function should be returned. This can be a scalar, array, or a dictionary of arbitrary structure. We’ll store it in a dictionary on the Pipeline object with the key being the result_key argument.

For example:

```python
def my_analysis_func(galaxy, *args, **kwargs):
    return galaxy.some_attribute * 2

pipeline.add_analysis_func(my_analysis_func, "MyAnalysisResult")
```

Or for a specific component of the galaxy:

```python
def my_analysis_func(galaxy, *args, **kwargs):
    return galaxy.stars.mass.sum()

pipeline.add_analysis_func(my_analysis_func, "Stars/Mass")
```

Parameters:

func (callable) – The analysis function to add to the Pipeline. This function should take a galaxy object as the first argument and can take any number of additional arguments and keyword arguments.
result_key (str) – The key to use when storing the results of the analysis function in the output. This can include slashes to denote nesting, e.g. “Gas/Nested/Result”.
*args – Any additional arguments to pass to the analysis function.
**kwargs – Any additional keyword arguments to pass to the analysis function.
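To make the slash-based nesting of result_key concrete, the following is a minimal sketch of how such a key could map onto nested dictionaries. The `store_result` helper and the plain dict are assumptions made for illustration; they are not taken from the Pipeline source.

```python
def store_result(results, result_key, value):
    """Store value in a nested dict, creating one level per '/' in the key."""
    *groups, leaf = result_key.strip("/").split("/")
    node = results
    for group in groups:
        # Descend into the group, creating it the first time it is seen.
        node = node.setdefault(group, {})
    node[leaf] = value
    return results


results = {}
store_result(results, "MyAnalysisResult", 2.0)
store_result(results, "Stars/Mass", 1.0e10)
store_result(results, "Gas/Nested/Result", [1, 2, 3])
print(results)
```

A key without slashes lands at the top level, while each slash adds one level of grouping, much like groups in an HDF5 file.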

add_galaxies(galaxies)[source]¶

Add galaxies to the Pipeline.

This function will add the provided galaxies to the Pipeline. This is useful if you have already loaded the galaxies and want to add them to the Pipeline object.

Parameters:: galaxies (list) – A list of Galaxy objects to add to the Pipeline.

property all_galaxies_memory_usage¶

Return the memory usage of all galaxies across all ranks.

Returns:: The memory usage in Megabytes.
Return type:: float

combine_files()[source]¶

Combine individual rank files into a single file.

Only applicable to MPI runs.

This will create a physical file on disk with all the data copied from the individual rank files. The rank files themselves will be deleted once all data has been copied.

This method is cleaner but has the potential to be very slow.

combine_files_virtual()[source]¶

Combine individual rank files into a single virtual file.

Only applicable to MPI runs.

This will create a file where all the data is accessible but not physically copied. This is much faster than making a true copy but requires that each individual rank file remains accessible.

property galaxies_memory_usage¶

Return the memory usage of the galaxies loaded into the Pipeline.

Returns:: The memory usage in Megabytes.
Return type:: float

get_data_cubes_fnu()[source]¶: Compute the spectral flux density data cubes.

get_data_cubes_lnu()[source]¶: Compute the spectral luminosity density data cubes.

get_images_flux(*instruments, fov=None, img_type='smoothed', kernel=None, kernel_threshold=1.0, cosmo=None, igm=None, spectra_type=None, psf_resample_factor=1)[source]¶

Flag that the Pipeline should compute the flux images.

This will signal the Pipeline to compute the flux images for each galaxy when the run method is called.

The flux images are generated based on the fluxes, and in turn the fnu spectra, and the instrument filters.

Parameters:

instruments (Instrument/InstrumentCollection) – The instruments to use for the flux images. This can be any number of instruments or instrument collections, they will all be combined into a single InstrumentCollection for this operation.
fov (unyt_quantity) – The field of view of the image with units.
img_type (str) – The type of image to generate. Options are ‘smoothed’ or ‘hist’. Default is ‘smoothed’.
kernel (array-like) – The kernel to use for smoothing the image. Default is None. Required for ‘smoothed’ images from a particle distribution.
kernel_threshold (float) – The threshold of the kernel. Default is 1.0.
cosmo (astropy.cosmology.Cosmology) – If get_observed_spectra has not been called explicitly, then we will need the cosmology to compute the observed spectra first. Default is None.
igm (IGMBase) – If get_observed_spectra has not been called explicitly, then we will need the IGM model to compute the observed spectra first. Unlike the cosmology, this is not required if IGM attenuation is not needed. Default is None.
spectra_type (list/str) – The type of spectra to generate images for. By default this is None and all spectra types will be used. This can either be a list of strings or a single string.
psf_resample_factor (int) – (Only applicable for instruments with a PSF.) The resample factor for the PSF. This should be a value greater than 1. The image will be resampled by this factor before the PSF is applied and then downsampled back to the original after convolution. This can help minimize the effects of using a generic PSF centred on the galaxy centre, a simplification we make for performance reasons (the effects are sufficiently small that this simplification is justified).
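The resample, convolve, downsample sequence described for psf_resample_factor can be sketched in pure Python as below. This is an illustration of the general idea only, not Synthesizer's implementation: the helper names are hypothetical, the resampling is assumed to conserve the total signal, the PSF is assumed to be a small symmetric kernel defined on the resampled pixel grid, and pixels beyond the image edge are treated as zero.

```python
def upsample(img, factor):
    """Split each pixel into factor x factor sub-pixels, conserving the total."""
    share = factor * factor
    return [
        [value / share for value in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def downsample(img, factor):
    """Sum factor x factor blocks back into single pixels."""
    n_rows, n_cols = len(img) // factor, len(img[0]) // factor
    return [
        [
            sum(
                img[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            )
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


def convolve(img, kernel):
    """Spread each pixel over its neighbours using a symmetric odd-sized kernel."""
    half = len(kernel) // 2
    n_rows, n_cols = len(img), len(img[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            for ki, kernel_row in enumerate(kernel):
                for kj, weight in enumerate(kernel_row):
                    ii, jj = i + ki - half, j + kj - half
                    if 0 <= ii < n_rows and 0 <= jj < n_cols:
                        out[ii][jj] += img[i][j] * weight
    return out


def apply_psf(img, psf, psf_resample_factor=1):
    """Upsample, convolve with the PSF, then return to the original resolution."""
    fine = upsample(img, psf_resample_factor)
    smeared = convolve(fine, psf)
    return downsample(smeared, psf_resample_factor)


image = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
psf = [[0.0, 0.125, 0.0], [0.125, 0.5, 0.125], [0.0, 0.125, 0.0]]
blurred = apply_psf(image, psf, psf_resample_factor=2)
print(blurred)
```

Because the kernel acts on the finer grid, the same PSF blurs the image on a sub-pixel scale, which is the motivation for resampling before the convolution.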

get_images_luminosity(*instruments, fov=None, img_type='smoothed', kernel=None, kernel_threshold=1.0, spectra_type=None, psf_resample_factor=1)[source]¶

Flag that the Pipeline should compute the luminosity images.

This will signal the Pipeline to compute the luminosity images for each galaxy when the run method is called.

The luminosity images are generated based on the luminosities, and in turn the lnu spectra, and the instrument filters.

Parameters:

instruments (Instrument/InstrumentCollection) – The instruments to use for the luminosity images. This can be any number of instruments or instrument collections, they will all be combined into a single InstrumentCollection for this operation.
fov (unyt_quantity) – The field of view of the image with units.
img_type (str) – The type of image to generate. Options are ‘smoothed’ or ‘hist’. Default is ‘smoothed’.
kernel (array-like) – The kernel to use for smoothing the image. Default is None. Required for ‘smoothed’ images from a particle distribution.
kernel_threshold (float) – The threshold of the kernel. Default is 1.0.
spectra_type (list/str) – The type of spectra to generate images for. By default this is None and all spectra types will be used. This can either be a list of strings or a single string.
psf_resample_factor (int) – (Only applicable for instruments with a PSF.) The resample factor for the PSF. This should be a value greater than 1. The image will be resampled by this factor before the PSF is applied and then downsampled back to the original after convolution. This can help minimize the effects of using a generic PSF centred on the galaxy centre, a simplification we make for performance reasons (the effects are sufficiently small that this simplification is justified).

get_lines(line_ids)[source]¶

Flag that the Pipeline should compute the emission lines.

This will signal the Pipeline to compute the emission lines for each galaxy when the run method is called.

The emission lines are generated based on the lnu spectra and the EmissionModel.

Parameters:: line_ids (list) – The emission line IDs to generate.

get_los_optical_depths(kernel, kernel_threshold=1.0, kappa=0.0795)[source]¶

Flag that the Pipeline should compute the LOS optical depths.

This will signal the Pipeline to compute the LOS optical depths when the run method is called.

LOS optical depths are computed first.

Note that the LOS calculation requires that a galaxy has a gas component and an emitting stellar or black hole component.

Parameters:

kernel (array-like) – The gas SPH kernel.
kernel_threshold (float) – The threshold of the kernel. Default is 1.0.
kappa (float) – The dust opacity coefficient in units of Msun / pc**2. Default is 0.0795.

get_observed_lines(cosmo, igm=<class 'synthesizer.emission_models.transformers.igm.Inoue14'>, line_ids=None)[source]¶

Flag that the Pipeline should compute the observed emission lines.

This will signal the Pipeline to compute the observed emission lines for each galaxy when the run method is called.

The observed emission lines are generated based on the emission lines and the cosmology.

Parameters:

cosmo (astropy.cosmology.Cosmology) – The cosmology to use for the observed emission lines.
igm (IGMBase) – The IGM model to use for the observed emission lines. Default is Inoue14.
line_ids (list) – If get_lines has not been called explicitly, then we will need the line IDs to generate the emission lines. Default is None.

get_observed_spectra(cosmo, igm=<class 'synthesizer.emission_models.transformers.igm.Inoue14'>)[source]¶

Flag that the Pipeline should compute the observed spectra.

This will signal the Pipeline to compute the observed spectral flux density for each galaxy when the run method is called.

The observed spectra are generated based on the rest frame spectra and the cosmology.

Parameters:

cosmo (astropy.cosmology.Cosmology) – The cosmology to use for the observed spectra.
igm (IGMBase) – The IGM model to use for the attenuation of the spectra.

get_photometry_fluxes(*instruments, cosmo=None, igm=None)[source]¶

Flag that the Pipeline should compute the photometric fluxes.

This will signal the Pipeline to compute the photometric fluxes for each galaxy when the run method is called.

The photometric fluxes are generated based on the fnu spectra and the instrument filters.

Parameters:

instruments (Instrument/InstrumentCollection) – The instruments to use for the photometric fluxes. This can be any number of instruments or instrument collections, they will all be combined into a single InstrumentCollection for this operation.
cosmo (astropy.cosmology.Cosmology) – If get_observed_spectra has not been called explicitly, then we will need the cosmology to compute the observed spectra first. Default is None.
igm (IGMBase) – If get_observed_spectra has not been called explicitly, then we will need the IGM model to compute the observed spectra first. Unlike the cosmology, this is not required if IGM attenuation is not needed. Default is None.

get_photometry_luminosities(*instruments)[source]¶

Flag that the Pipeline should compute the photometric luminosities.

This will signal the Pipeline to compute the photometric luminosities for each galaxy when the run method is called using the passed instrument. If multiple instruments are desired this method can be called multiple times to add the new instruments.

The photometric luminosities are generated based on the lnu spectra and the instrument filters.

Parameters:: instruments (Instrument/InstrumentCollection) – The instruments to use for the photometric luminosities. This can be any number of instruments or instrument collections, they will all be combined into a single InstrumentCollection for this operation.

get_sfh(log10ages)[source]¶

Flag that the Pipeline should compute the binned SFH.

This will signal the Pipeline to compute the binned SFH when the run method is called.

The SFH is the binned star formation history based on an arbitrary set of age bins defined in log10 space.

Parameters:: log10ages (array-like) – The log10 age axis of the SFH grid.
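As an illustration of binning in log10 age, the sketch below sums stellar mass into age bins. It assumes that log10ages holds bin edges in log10(yr) and that the SFH is a mass-weighted histogram; both are assumptions made for this example, and `binned_sfh` is a hypothetical helper rather than the function the Pipeline calls.

```python
import math
from bisect import bisect_right


def binned_sfh(ages_yr, masses, log10ages):
    """Sum stellar mass into age bins whose edges are given in log10(yr)."""
    sfh = [0.0] * (len(log10ages) - 1)
    for age, mass in zip(ages_yr, masses):
        idx = bisect_right(log10ages, math.log10(age)) - 1
        if 0 <= idx < len(sfh):  # particles outside the edges are dropped
            sfh[idx] += mass
    return sfh


edges = [6.0, 7.0, 8.0, 9.0]  # three bins: 1-10 Myr, 10-100 Myr, 0.1-1 Gyr
ages = [2.0e6, 5.0e7, 6.0e7, 3.0e8]  # stellar ages in years
masses = [1.0, 2.0, 3.0, 4.0]  # arbitrary mass units
print(binned_sfh(ages, masses, edges))  # [1.0, 5.0, 4.0]
```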

get_sfzh(log10ages, log10metallicities)[source]¶

Flag that the Pipeline should compute the SFZH grid.

This will signal the Pipeline to compute the SFZH grid when the run method is called.

The SFZH grid is the star formation and metallicity history grid for each galaxy.

Parameters:

log10ages (array-like) – The log10 age axis of the SFZH grid.
log10metallicities (array-like) – The log10 metallicity axis of the SFZH grid.
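A two-dimensional version of the same idea gives a feel for the SFZH grid: mass is accumulated in cells of log10 age and log10 metallicity. As before, this is a sketch under stated assumptions (both axes are treated as bin edges and the grid is mass-weighted), and `bin_index` and `binned_sfzh` are hypothetical helpers, not Synthesizer functions.

```python
import math
from bisect import bisect_right


def bin_index(edges, value):
    """Return the bin containing value, or None if it lies outside the edges."""
    idx = bisect_right(edges, value) - 1
    return idx if 0 <= idx < len(edges) - 1 else None


def binned_sfzh(ages_yr, metallicities, masses, log10ages, log10metallicities):
    """Accumulate mass on a (log10 age, log10 metallicity) grid."""
    n_age, n_met = len(log10ages) - 1, len(log10metallicities) - 1
    grid = [[0.0] * n_met for _ in range(n_age)]
    for age, met, mass in zip(ages_yr, metallicities, masses):
        i = bin_index(log10ages, math.log10(age))
        j = bin_index(log10metallicities, math.log10(met))
        if i is not None and j is not None:
            grid[i][j] += mass
    return grid


grid = binned_sfzh(
    ages_yr=[1.0e7, 1.0e9, 5.0e8],
    metallicities=[0.001, 0.02, 0.0005],
    masses=[1.0, 2.0, 3.0],
    log10ages=[6.0, 8.0, 10.0],
    log10metallicities=[-4.0, -2.0, 0.0],
)
print(grid)  # [[1.0, 0.0], [3.0, 2.0]]
```

Summing each row over the metallicity axis recovers a binned SFH on the same age bins.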

get_spectra()[source]¶

Flag that the Pipeline should compute the rest frame spectra.

This will signal the Pipeline to compute the rest frame spectral luminosity density for each galaxy when the run method is called.

The spectra are generated based on the EmissionModel and the galaxy components.

Spectral flux densities can be computed with get_observed_spectra.

get_spectroscopy_fnu(*instruments)[source]¶

Flag that the Pipeline should compute the spectral flux density.

This will signal the Pipeline to compute the spectral flux density for each galaxy when the run method is called.

Parameters:: instruments (Instrument/InstrumentCollection) – The instruments to use for the spectral flux density. This can be any number of instruments or instrument collections, they will all be combined into a single InstrumentCollection for this operation.

get_spectroscopy_lnu(*instruments)[source]¶

Flag that the Pipeline should compute spectral luminosity density.

This will signal the Pipeline to compute the spectral luminosity density for each galaxy when the run method is called.

Parameters:: instruments (Instrument/InstrumentCollection) – The instruments to use for the spectral luminosity density. This can be any number of instruments or instrument collections, they will all be combined into a single InstrumentCollection for this operation.

property memory_usage¶

Return the memory usage of the Pipeline object.

Returns:: The memory usage in Megabytes.
Return type:: float

repartition_galaxies(galaxy_weights=None, random_seed=42)[source]¶: Given the galaxies, repartition them across the ranks.

report_operations()[source]¶

Print the operations that will be performed by the Pipeline.

This will print out the operations that will be performed by the pipeline, including which will be written out and which will just be computed.

property results_memory_usage¶

Return the memory usage of the results stored on the Pipeline.

Returns:: The memory usage in Megabytes.
Return type:: float

run()[source]¶

Run the pipeline.

This will churn through the attached galaxies generating all the data requested using the get_* methods.

Only data flagged for saving will be held in memory with all other data cleared out.

Once the pipeline has run, the data can be written out to a file using the write method.

Note that as we loop over galaxies they will be removed from the pipeline to free up memory. This means that once the pipeline has run the galaxies will no longer be accessible from the pipeline object.

Raises:: PipelineNotReady – If the pipeline is not ready to perform a specific operation.

write(outpath, verbose=None)[source]¶

Write what we have produced to an HDF5 file.

Any get_* methods that have been called will have their results written to the HDF5 file. We consider the call to get_* to be the signal to write the data out. If a get_* method has not been called, but the data was needed by a subsequent get_* method, it will have been run but then discarded, e.g. calling get_photometry_fluxes will have run get_observed_spectra, which will have run get_spectra, but the results from get_spectra and get_observed_spectra will have been discarded.

Parameters:

outpath (str) – The path to the HDF5 file to write.
verbose (bool, optional) – If set, override the Pipeline verbose setting.
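The distinction between operations that are run and operations whose results are written can be sketched with a small dependency walk. The `DEPENDS_ON` map and the `plan` function are hypothetical and exist only to mirror the get_photometry_fluxes example above; they do not reflect how the Pipeline tracks its operations internally.

```python
# Hypothetical dependency map mirroring the example in the text: each
# operation names the operation it needs to have run first (or None).
DEPENDS_ON = {
    "get_spectra": None,
    "get_observed_spectra": "get_spectra",
    "get_photometry_fluxes": "get_observed_spectra",
}


def plan(requested):
    """Return (operations to run, in order) and (operations to write)."""
    to_run = []
    for op in requested:
        chain = []
        while op is not None:  # walk back through the prerequisites
            chain.append(op)
            op = DEPENDS_ON[op]
        for dep in reversed(chain):  # prerequisites run first
            if dep not in to_run:
                to_run.append(dep)
    # Only operations the user explicitly asked for are kept and written.
    to_write = [op for op in to_run if op in requested]
    return to_run, to_write


run_ops, written = plan(["get_photometry_fluxes"])
print(run_ops)  # ['get_spectra', 'get_observed_spectra', 'get_photometry_fluxes']
print(written)  # ['get_photometry_fluxes']
```

Requesting get_spectra as well would add it to the written list without changing the order in which the operations run.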