The Pipeline¶
If you have a galaxy catalog (either of Parametric
origin or from a simulation), an EmissionModel, and a set of instruments you want observables for, you can easily write your own pipeline to generate the observations you want using the Synthesizer UI. However, let's say you have a new catalog you want to run the same analysis on, or a whole different set of instruments you want to use. You could
modify your old pipeline or write a whole new one, but that's a lot of work and boilerplate.
This is where the Pipeline
class shines. Instead of having to write a pipeline yourself, the Pipeline
class is a high-level interface that allows you to easily generate observations for a given catalog, emission model, and set of instruments. All you need to do is set up the Pipeline
object, attach the galaxies, and pass the instruments to the observable generation methods you want to run. Possible observables include:
Spectra.
Emission Lines.
Photometry.
Images (with or without PSF convolution/noise).
Spectral data cubes (IFUs) [WIP].
Instrument specific spectroscopy [WIP].
The Pipeline
will generate all the requested observations for all (compatible) instruments and galaxies, before writing them out to a standardised HDF5 format.
As a bonus, the abstraction into the Pipeline
class allows for easy parallelization of the analysis, not only over local threads but also distributed across nodes with MPI.
In the following sections we will show how to instantiate and use a Pipeline
object to generate observations for a given catalog, emission model, and set of instruments.
Setting up a Pipeline
object¶
Before we instantiate a pipeline we need to define its “dependencies”. These are an emission model, a set of instruments, and, importantly, some galaxies to observe.
Defining an emission model¶
The EmissionModel
defines the emissions we’ll generate, including their origin and any reprocessing the emission undergoes. For more details see the EmissionModel
docs.
For demonstration, we’ll use a simple premade IntrinsicEmission
model which defines the intrinsic stellar emission (i.e. stellar emission without any ISM dust reprocessing).
[1]:
from synthesizer.emission_models import IntrinsicEmission
from synthesizer.grid import Grid
# Get the grid
grid_dir = "../../../tests/test_grid/"
grid_name = "test_grid"
grid = Grid(grid_name, grid_dir=grid_dir)
model = IntrinsicEmission(grid, fesc=0.1)
model.set_per_particle(True) # we want per particle emissions
Defining the instruments¶
We don’t need any instruments if all we want is spectra at the spectral resolution of the Grid
or emission lines. However, to get anything more sophisticated we need Instruments
that define the technical specifications of the observations we want to generate. For a full breakdown see the instrumentation docs.
Here we’ll define a simple set of instruments including a subset of NIRCam filters (capable of imaging at a 1 kpc resolution, as set below) and a set of UVJ top hat filters (only capable of photometry). We’ll pass these explicitly to the observable methods below to associate them with the observations we want to generate.
[2]:
import numpy as np
from unyt import angstrom, kpc
from synthesizer.instruments import UVJ, FilterCollection, Instrument
# Get the filters
lam = np.linspace(10**3, 10**5, 1000) * angstrom
webb_filters = FilterCollection(
filter_codes=[
f"JWST/NIRCam.{f}"
for f in ["F090W", "F150W", "F200W", "F277W", "F356W", "F444W"]
],
new_lam=lam,
)
uvj_filters = UVJ(new_lam=lam)
# Instantiate the instruments
webb_inst = Instrument("JWST", filters=webb_filters, resolution=1 * kpc)
uvj_inst = Instrument("UVJ", filters=uvj_filters)
instruments = webb_inst + uvj_inst
print(instruments)
+-------------------------------------------------------------------------------------------------+
| INSTRUMENT COLLECTION |
+-------------------+-----------------------------------------------------------------------------+
| Attribute | Value |
+-------------------+-----------------------------------------------------------------------------+
| all_filters | <synthesizer.instruments.filters.FilterCollection object at 0x7ff98ca6b760> |
+-------------------+-----------------------------------------------------------------------------+
| ninstruments | 2 |
+-------------------+-----------------------------------------------------------------------------+
| instrument_labels | [JWST, UVJ] |
+-------------------+-----------------------------------------------------------------------------+
| instruments | JWST: Instrument |
| | UVJ: Instrument |
+-------------------+-----------------------------------------------------------------------------+
Loading galaxies¶
You can load galaxies however you want, but for this example we’ll load some CAMELS galaxies using the load_data
module.
[3]:
from synthesizer.load_data.load_camels import load_CAMELS_IllustrisTNG
# Create galaxy object
galaxies = load_CAMELS_IllustrisTNG(
"../../../tests/data/",
snap_name="camels_snap.hdf5",
group_name="camels_subhalo.hdf5",
physical=True,
)
/opt/hostedtoolcache/Python/3.10.17/x64/lib/python3.10/site-packages/unyt/array.py:1972: RuntimeWarning: invalid value encountered in divide
out_arr = func(
Instantiating the Pipeline
object¶
We have all the ingredients we need to instantiate a Pipeline
object. All we need to do now is pass them into the Pipeline
object alongside the number of threads we want to use during the analysis (in this notebook we’ll only use 1 for such a small handful of galaxies).
[4]:
from synthesizer.pipeline import Pipeline
pipeline = Pipeline(
emission_model=model,
nthreads=1,
verbose=1,
)
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⣀⡀⠒⠒⠦⣄⡀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⢀⣤⣶⡾⠿⠿⠿⠿⣿⣿⣶⣦⣄⠙⠷⣤⡀⠀⠀⠀⠀
⠀⠀⠀⣠⡾⠛⠉⠀⠀⠀⠀⠀⠀⠀⠈⠙⠻⣿⣷⣄⠘⢿⡄⠀⠀⠀
⠀⢀⡾⠋⠀⠀⠀⠀⠀⠀⠀⠀⠐⠂⠠⢄⡀⠈⢿⣿⣧⠈⢿⡄⠀⠀
⢀⠏⠀⠀⠀⢀⠄⣀⣴⣾⠿⠛⠛⠛⠷⣦⡙⢦⠀⢻⣿⡆⠘⡇⠀⠀
⠀⠀⠀+-+-+-+-+-+-+-+-+-+-+-+⡇⠀⠀
⠀⠀⠀|S|Y|N|T|H|E|S|I|Z|E|R|⠃⠀⠀
⠀⠀⢰+-+-+-+-+-+-+-+-+-+-+-+⠀⠀⠀
⠀⠀⢸⡇⠸⣿⣷⠀⢳⡈⢿⣦⣀⣀⣀⣠⣴⣾⠟⠁⠀⠀⠀⠀⢀⡎
⠀⠀⠘⣷⠀⢻⣿⣧⠀⠙⠢⠌⢉⣛⠛⠋⠉⠀⠀⠀⠀⠀⠀⣠⠎⠀
⠀⠀⠀⠹⣧⡀⠻⣿⣷⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣠⡾⠃⠀⠀
⠀⠀⠀⠀⠈⠻⣤⡈⠻⢿⣿⣷⣦⣤⣤⣤⣤⣤⣴⡾⠛⠉⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠈⠙⠶⢤⣈⣉⠛⠛⠛⠛⠋⠉⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
[00000.04]: Root emission model: intrinsic
[00000.00]: EmissionModel contains 10 individual models.
[00000.00]: EmissionModels split by emitter:
[00000.00]: - galaxy: 0
[00000.00]: - stellar: 10
[00000.00]: - blackhole: 0
[00000.00]: EmissionModels split by operation type:
[00000.00]: - extraction: 3
[00000.00]: - combination: 3
[00000.00]: - attenuating: 4
[00000.00]: - generation: 0
Notice that we got a log out of the Pipeline
object detailing the basic setup. The Pipeline
will automatically output logging information to the console, but this can be suppressed by passing verbose=0,
which limits the output to saying hello, goodbye, and reporting any errors that occur.
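For example, a near-silent pipeline could be constructed like this (a minimal sketch using the same arguments as above, just with logging suppressed):

quiet_pipeline = Pipeline(
    emission_model=model,
    nthreads=1,
    verbose=0,  # only the greeting, goodbye, and any errors will be printed
)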
Adding analysis functions¶
We could just run the analysis now and get whatever predefined outputs we want. However, we can also add our own analysis functions to the Pipeline
object. These functions will be run on each galaxy in the catalog and can be used to generate any additional outputs we want. Importantly, these functions will be run after all other analysis has finished so they can make use of any outputs generated by the Pipeline
object. They will also be run in the order they were added, allowing
access to anything derived in previous analysis functions.
Any extra analysis functions must obey the following rules:
It must calculate the “result” for a single galaxy at a time.
The function’s first argument must be the galaxy to calculate for.
It can take any number of additional arguments and keyword arguments.
It must either:
Return an array of values or a scalar, such that np.array(<list of results>) is a valid operation. In other words, the results, once combined for all galaxies, should be an array of shape (n_galaxies, <result shape>).
Return a dictionary where each result at the “leaves” of the dictionary structure is an array of values or a scalar, such that np.array(<list of results>) is a valid operation. In other words, the dictionary of results, once combined for all galaxies, should be a dictionary with an array of shape (n_galaxies, <result shape>) at each “leaf”.
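To illustrate the simpler scalar form before the full dictionary example below, a minimal sketch might look like the following (it assumes gal.stars.current_masses is the per-particle stellar mass array, the same attribute referenced by name in the half mass radius example; the result_key used here is purely illustrative):

import numpy as np

def get_total_stellar_mass(gal):
    """Return a single scalar per galaxy: the summed stellar particle mass."""
    # current_masses is assumed to hold the per-particle masses
    return np.sum(gal.stars.current_masses)

# Combined over all galaxies this would yield a simple (n_galaxies,) array
# pipeline.add_analysis_func(get_total_stellar_mass, result_key="Stars/TotalMass")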
Below we’ll define an analysis function to compute stellar mass radii of each galaxy.
[5]:
def get_stellar_mass_radius(gal, fracs):
"""Compute the half mass radii.
Args:
gal (Galaxy):
The galaxy to compute the half mass radii of.
fracs (list of float):
The fractional radii to compute.
"""
result = {}
for frac in fracs:
result[str(frac).replace(".", "p")] = gal.stars.get_attr_radius(
"current_masses", frac
)
return result
To add this to the Pipeline
we need to pass it along with a string defining the key under which the results will be stored in the HDF5 file and the fracs
argument it requires.
[6]:
pipeline.add_analysis_func(
get_stellar_mass_radius,
result_key="Stars/HalfMassRadius",
fracs=(0.2, 0.5, 0.8),
)
[00000.01]: Added analysis function: Stars/HalfMassRadius
This can also be done with simple lambda
functions to include galaxy attributes in the output. For instance, the redshift.
[7]:
pipeline.add_analysis_func(lambda gal: gal.redshift, result_key="Redshift")
[00000.01]: Added analysis function: Redshift
Running the pipeline¶
To run the pipeline we just need to attach our galaxies and then call the various observable generation methods (including any of the necessary arguments and/or instruments for each generation method). This approach allows you to explicitly control which observables you want to generate with a single line of code for each. Each of these getter methods signals to the Pipeline
which observables you want to generate and eventually write out to the HDF5 file.
Loading the galaxies¶
First we’ll attach the galaxies.
[8]:
pipeline.add_galaxies(galaxies)
[00000.02]: Galaxies memory footprint: 0.33 MB
[00000.02]: Adding 10 galaxies took 1.536 ms.
Generating the observables¶
Now we have the galaxies we can generate their observables. We do this by calling the various observable generation methods on the Pipeline
object to signal which observables we want, followed by the run
method to perform the analysis.
We’ll start with the spectra.
[9]:
pipeline.get_spectra()
If we want fluxes, we can pass an astropy.cosmology
object to the get_observed_spectra
method to get the fluxes in the observer frame.
[10]:
from astropy.cosmology import Planck18 as cosmo
pipeline.get_observed_spectra(cosmo)
Next we’ll generate the emission lines. Here we can pass exactly which emission lines we want to generate based on line ID. We’ll just generate all lines offered by the Grid
.
[11]:
pipeline.get_lines(line_ids=grid.available_lines)
Again, to get observed fluxes we can pass an astropy.cosmology
object to the get_observed_lines
method.
[12]:
pipeline.get_observed_lines(cosmo)
Next, the photometry, where we need to pass the instruments defining the filters we want to apply. Here we’ll generate rest frame luminosities for the UVJ top hats and observed fluxes for the NIRCam filters.
[13]:
pipeline.get_photometry_luminosities(uvj_inst)
pipeline.get_photometry_fluxes(webb_inst)
Finally, we’ll generate the images. Again, these are split into luminosity and flux flavours. Here we define our field of view and pass that into each method alongside the webb instrument. We are also doing “smoothed” imaging where each particle is smoothed over its SPH kernel. For this style of image generation we need to pass the kernel array, which we’ll extract here.
Had we defined instruments with PSFs and/or noise these would be applied automatically.
[14]:
from synthesizer.kernel_functions import Kernel
# Get the SPH kernel
sph_kernel = Kernel()
kernel = sph_kernel.get_kernel()
pipeline.get_images_luminosity(webb_inst, fov=10 * kpc, kernel=kernel)
pipeline.get_images_flux(webb_inst, fov=10 * kpc, kernel=kernel)
Now that every observable we want has been signalled for generation, we can run the pipeline. This will generate all the observables on a galaxy-by-galaxy basis, removing each galaxy from memory once it has been processed to reduce memory usage. This whole process will automatically be parallelized, where possible, over the number of threads we defined when we instantiated the Pipeline
object.
To run everything we simply call the run
method.
[15]:
pipeline.run()
[00000.59]: Using 2 instruments.
[00000.59]: Instruments have 18 filters in total.
[00000.59]: Included instruments: UVJ, JWST
[00000.59]: Instruments split by capability:
[00000.59]: - photometry: 2
[00000.59]: - spectroscopy: 0
[00000.59]: - imaging: 1
[00000.59]: - resolved spectroscopy: 0
[00000.59]: Pipeline memory footprint (MB): 576.7287836074829
[00000.59]: Running the pipeline...
[00000.59]: +------------+------------+------------+------------+------------+
[00000.59]: | Galaxy #| Nstars| Ngas| Nbh| dt (s)|
[00000.59]: +------------+------------+------------+------------+------------+
[00001.46]: | 0| 278| 64| None| 0.87|
[00004.12]: | 1| 656| 0| None| 2.66|
[00005.15]: | 2| 441| 25| None| 1.03|
[00007.84]: | 3| 726| 0| None| 2.68|
[00008.09]: | 4| 8| 180| None| 0.25|
[00008.37]: | 5| 26| 0| None| 0.27|
[00010.69]: | 6| 569| 0| None| 2.32|
[00011.19]: | 7| 165| 0| None| 0.50|
[00011.67]: | 8| 149| 0| None| 0.48|
[00013.44]: | 9| 461| 0| None| 1.77|
[00013.45]: +------------+------------+------------+------------+------------+
[00013.45]: Computing 80 Lnu Spectra took 3.05 s
[00013.45]: Computing 80 Fnu Spectra took 2.07 s
[00013.45]: Computing 720 Luminosities took 3.64 s
[00013.45]: Computing 720 Fluxes took 2.84 s
[00013.45]: Computing 17200 Emission Line Luminosities took 431.41 ms
[00013.45]: Computing 17200 Emission Line Fluxes took 200.32 ms
[00013.45]: Computing 720 Luminosity Images took 251.18 ms
[00013.45]: Computing 720 Flux Images took 329.90 ms
[00013.45]: Running 20 extra analyses took 5.64 ms
[00013.45]: Unpacking results took 8.66 ms
[00013.45]: Computing Stars/HalfMassRadius took 5.57 ms
[00013.45]: Computing Redshift took 0.02 ms
[00013.45]: Cleaning outputs took 7.038 ms.
Writing out the data¶
Finally, we write out the data to an HDF5 file. This file will contain all the observables we generated, as well as any additional analysis we ran. The file is structured to mirror the structure of Synthesizer objects, with each galaxy being a group, each component being a subgroup, and each individual observable being a dataset (or a set of subgroups with the observables as datasets at their leaves, in the case of a dictionary attribute).
To write out the data we just pass the path of the file we want to write to the write
method.
Note that we are also passing verbose=0
to silence the dataset timings for these docs. Otherwise, we would get timings for the writing of each individual dataset. In the wild these timings are useful, but here they’d just bloat the demo.
[16]:
pipeline.write("output.hdf5", verbose=0)
[00013.96]: Writing data took 501.463 ms.
[00013.96]: Total synthesis took 13.966 s.
[00013.96]: Goodbye!
Below is a view into the HDF5 file produced by the above pipeline (as shown by H5forest).

Putting it all together¶
Here is what the pipeline would look like without all the descriptive fluff…
[17]:
# Create galaxy object
galaxies = load_CAMELS_IllustrisTNG(
"../../../tests/data/",
snap_name="camels_snap.hdf5",
group_name="camels_subhalo.hdf5",
physical=True,
)
pipeline = Pipeline(model, report_memory=True)
pipeline.add_analysis_func(
get_stellar_mass_radius,
result_key="Stars/HalfMassRadius",
fracs=(0.2, 0.5, 0.8),
)
pipeline.add_analysis_func(lambda gal: gal.redshift, result_key="Redshift")
pipeline.add_galaxies(galaxies)
pipeline.get_spectra()
pipeline.get_observed_spectra(cosmo)
pipeline.get_lines(line_ids=grid.available_lines)
pipeline.get_observed_lines(cosmo)
pipeline.get_photometry_luminosities(instruments)
pipeline.get_photometry_fluxes(webb_inst)
pipeline.get_images_luminosity(webb_inst, fov=10 * kpc, kernel=kernel)
pipeline.get_images_flux(webb_inst, fov=10 * kpc, kernel=kernel)
pipeline.run()
pipeline.write("output.hdf5", verbose=0)
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⣀⡀⠒⠒⠦⣄⡀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⢀⣤⣶⡾⠿⠿⠿⠿⣿⣿⣶⣦⣄⠙⠷⣤⡀⠀⠀⠀⠀
⠀⠀⠀⣠⡾⠛⠉⠀⠀⠀⠀⠀⠀⠀⠈⠙⠻⣿⣷⣄⠘⢿⡄⠀⠀⠀
⠀⢀⡾⠋⠀⠀⠀⠀⠀⠀⠀⠀⠐⠂⠠⢄⡀⠈⢿⣿⣧⠈⢿⡄⠀⠀
⢀⠏⠀⠀⠀⢀⠄⣀⣴⣾⠿⠛⠛⠛⠷⣦⡙⢦⠀⢻⣿⡆⠘⡇⠀⠀
⠀⠀⠀+-+-+-+-+-+-+-+-+-+-+-+⡇⠀⠀
⠀⠀⠀|S|Y|N|T|H|E|S|I|Z|E|R|⠃⠀⠀
⠀⠀⢰+-+-+-+-+-+-+-+-+-+-+-+⠀⠀⠀
⠀⠀⢸⡇⠸⣿⣷⠀⢳⡈⢿⣦⣀⣀⣀⣠⣴⣾⠟⠁⠀⠀⠀⠀⢀⡎
⠀⠀⠘⣷⠀⢻⣿⣧⠀⠙⠢⠌⢉⣛⠛⠋⠉⠀⠀⠀⠀⠀⠀⣠⠎⠀
⠀⠀⠀⠹⣧⡀⠻⣿⣷⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣠⡾⠃⠀⠀
⠀⠀⠀⠀⠈⠻⣤⡈⠻⢿⣿⣷⣦⣤⣤⣤⣤⣤⣴⡾⠛⠉⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠈⠙⠶⢤⣈⣉⠛⠛⠛⠛⠋⠉⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
[00000.67]: Root emission model: intrinsic
[00000.00]: EmissionModel contains 10 individual models.
[00000.00]: EmissionModels split by emitter:
[00000.00]: - galaxy: 0
[00000.00]: - stellar: 10
[00000.00]: - blackhole: 0
[00000.00]: EmissionModels split by operation type:
[00000.00]: - extraction: 3
[00000.00]: - combination: 3
[00000.00]: - attenuating: 4
[00000.00]: - generation: 0
[00000.00]: Added analysis function: Stars/HalfMassRadius
[00000.00]: Added analysis function: Redshift
[00000.00]: Galaxies memory footprint: 0.33 MB
[00000.00]: Adding 10 galaxies took 0.936 ms.
[00000.00]: Using 2 instruments.
[00000.00]: Instruments have 18 filters in total.
[00000.00]: Included instruments: JWST, UVJ
[00000.00]: Instruments split by capability:
[00000.00]: - photometry: 2
[00000.00]: - spectroscopy: 0
[00000.00]: - imaging: 1
[00000.00]: - resolved spectroscopy: 0
[00000.00]: Pipeline memory footprint (MB): 578.2429323196411
[00000.00]: Running the pipeline...
[00000.00]: +------------+------------+------------+------------+------------+------------------------+------------------------+------------------------+
[00000.00]: | Galaxy #| Nstars| Ngas| Nbh| dt (s)| Memory footprint (MB)| Galaxy memory (MB)| Results memory (MB)|
[00000.00]: +------------+------------+------------+------------+------------+------------------------+------------------------+------------------------+
[00000.92]: | 0| 278| 64| None| 0.91| 579.60| 0.30| 1.42|
[00004.21]: | 1| 656| 0| None| 3.29| 580.89| 0.24| 2.75|
[00005.67]: | 2| 441| 25| None| 1.45| 582.18| 0.21| 4.09|
[00009.40]: | 3| 726| 0| None| 3.72| 583.46| 0.15| 5.42|
[00009.71]: | 4| 8| 180| None| 0.30| 584.79| 0.13| 6.77|
[00010.05]: | 5| 26| 0| None| 0.33| 586.12| 0.12| 8.10|
[00012.94]: | 6| 569| 0| None| 2.88| 587.41| 0.08| 9.44|
[00013.54]: | 7| 165| 0| None| 0.59| 588.73| 0.06| 10.77|
[00014.07]: | 8| 149| 0| None| 0.52| 590.07| 0.04| 12.13|
[00016.83]: | 9| 461| 0| None| 2.76| 591.36| 0.00| 13.47|
[00016.84]: +------------+------------+------------+------------+------------+------------------------+------------------------+------------------------+
[00016.84]: Computing 80 Lnu Spectra took 3.21 s
[00016.84]: Computing 80 Fnu Spectra took 2.07 s
[00016.84]: Computing 720 Luminosities took 5.14 s
[00016.84]: Computing 720 Fluxes took 5.16 s
[00016.84]: Computing 17200 Emission Line Luminosities took 429.17 ms
[00016.84]: Computing 17200 Emission Line Fluxes took 199.60 ms
[00016.84]: Computing 720 Luminosity Images took 261.22 ms
[00016.84]: Computing 720 Flux Images took 257.18 ms
[00016.84]: Running 20 extra analyses took 5.73 ms
[00016.84]: Unpacking results took 8.87 ms
[00016.84]: Computing Stars/HalfMassRadius took 5.66 ms
[00016.84]: Computing Redshift took 0.02 ms
[00016.84]: Cleaning outputs took 6.558 ms.
[00017.34]: Writing data took 495.452 ms.
[00017.34]: Total synthesis took 17.344 s.
[00017.34]: Goodbye!
Note here that we set report_memory=True
when we instantiated the Pipeline
. This causes the pipeline to probe the memory usage after each galaxy is processed and report it. While this is extremely useful information for debugging purposes, it is also extremely expensive to calculate and is thus turned off by default.
Running a subset of observables¶
If you only want to generate a subset of observables then it’s as simple as only calling the methods for those observables. However, some observables depend on others.
For instance, to generate observed fluxes you need to have already generated observer frame spectra. If you signal that you want one of these “downstream” observables without signalling the “upstream” observable then the Pipeline
will automatically generate the upstream observable for you, but it will not be written out to the HDF5 file.
The only difference is that you must supply the method you are calling with the arguments required to generate the upstream observable. Don’t worry though, the Pipeline
will automatically tell you what is missing if you forget something.
We demonstrate this below by selecting only the observed fluxes and observed emission lines.
[18]:
# Create galaxy object
galaxies = load_CAMELS_IllustrisTNG(
"../../../tests/data/",
snap_name="camels_snap.hdf5",
group_name="camels_subhalo.hdf5",
physical=True,
)
# Set up the pipeline
pipeline = Pipeline(model)
pipeline.add_galaxies(galaxies)
pipeline.get_observed_lines(cosmo, line_ids=grid.available_lines)
pipeline.get_photometry_fluxes(webb_inst, cosmo=cosmo)
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⣀⡀⠒⠒⠦⣄⡀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⢀⣤⣶⡾⠿⠿⠿⠿⣿⣿⣶⣦⣄⠙⠷⣤⡀⠀⠀⠀⠀
⠀⠀⠀⣠⡾⠛⠉⠀⠀⠀⠀⠀⠀⠀⠈⠙⠻⣿⣷⣄⠘⢿⡄⠀⠀⠀
⠀⢀⡾⠋⠀⠀⠀⠀⠀⠀⠀⠀⠐⠂⠠⢄⡀⠈⢿⣿⣧⠈⢿⡄⠀⠀
⢀⠏⠀⠀⠀⢀⠄⣀⣴⣾⠿⠛⠛⠛⠷⣦⡙⢦⠀⢻⣿⡆⠘⡇⠀⠀
⠀⠀⠀+-+-+-+-+-+-+-+-+-+-+-+⡇⠀⠀
⠀⠀⠀|S|Y|N|T|H|E|S|I|Z|E|R|⠃⠀⠀
⠀⠀⢰+-+-+-+-+-+-+-+-+-+-+-+⠀⠀⠀
⠀⠀⢸⡇⠸⣿⣷⠀⢳⡈⢿⣦⣀⣀⣀⣠⣴⣾⠟⠁⠀⠀⠀⠀⢀⡎
⠀⠀⠘⣷⠀⢻⣿⣧⠀⠙⠢⠌⢉⣛⠛⠋⠉⠀⠀⠀⠀⠀⠀⣠⠎⠀
⠀⠀⠀⠹⣧⡀⠻⣿⣷⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣠⡾⠃⠀⠀
⠀⠀⠀⠀⠈⠻⣤⡈⠻⢿⣿⣷⣦⣤⣤⣤⣤⣤⣴⡾⠛⠉⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠈⠙⠶⢤⣈⣉⠛⠛⠛⠛⠋⠉⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
[00000.25]: Root emission model: intrinsic
[00000.00]: EmissionModel contains 10 individual models.
[00000.00]: EmissionModels split by emitter:
[00000.00]: - galaxy: 0
[00000.00]: - stellar: 10
[00000.00]: - blackhole: 0
[00000.00]: EmissionModels split by operation type:
[00000.00]: - extraction: 3
[00000.00]: - combination: 3
[00000.00]: - attenuating: 4
[00000.00]: - generation: 0
[00000.00]: Galaxies memory footprint: 0.33 MB
[00000.00]: Adding 10 galaxies took 0.932 ms.
If we want to see which observables we have signalled for generation we can call the report_operations
method, which will print out every possible observable along with whether it will be computed and whether it will be written out to the HDF5 file.
[19]:
pipeline.report_operations()
[00000.00]: ------------------------------------------------------------
[00000.00]: Compute? Write?
[00000.00]: Line of Sight Optical Depths False N/A
[00000.00]: SFZH False False
[00000.00]: SFH False False
[00000.00]: Lnu Spectra True False
[00000.00]: Fnu Spectra True False
[00000.00]: Photometric Luminosities False False
[00000.00]: Photometric Fluxes True True
[00000.00]: Emission Line Luminosities True False
[00000.00]: Emission Line Fluxes True True
[00000.00]: Luminosity Images False False
[00000.00]: Flux Images False False
[00000.00]: ------------------------------------------------------------
Note that here we can see all the possible operations we could have signalled for generation, but only the observed fluxes and observed emission lines will be written out to the HDF5 file, since these are the getters we called.
Hybrid parallelism with MPI¶
Above we demonstrated how to run a pipeline using only local shared memory parallelism. We can also use mpi4py
to not only use the shared memory parallelism but also distribute the analysis across multiple nodes (hence “hybrid parallelism”).
Instantiating a Pipeline
when using MPI¶
To make use of MPI we only need to make a couple of changes to how we run the pipeline. The first is simply that we need to pass the comm
object to the Pipeline
object when we instantiate it.
from mpi4py import MPI
pipeline = Pipeline(
    emission_model=model,
    nthreads=4,
    verbose=1,
    comm=MPI.COMM_WORLD,
)
Here verbose=1
will mean only rank 0 will output logging information. If you want all ranks to output logging information you should set verbose=2
. verbose=0
will silence all outputs apart from the greeting, total timing and errors as before.
Adding galaxies with MPI¶
We also need to partition the galaxies before we attach them to a Pipeline
. For now we provide no mechanisms for this; it is entirely up to you how to split the galaxies across the ranks. The important thing is that you only pass the galaxies belonging to a rank to add_galaxies. A simple round-robin split over ranks is sketched below.
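For instance, a simple round-robin partition might look like the following (a sketch only; it assumes galaxies is the plain list returned by your loader, as in the examples above):

from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
# Each rank takes every size-th galaxy, offset by its rank (round-robin split)
my_galaxies = galaxies[rank::size]
# Only attach this rank's galaxies to the distributed pipeline
pipeline.add_galaxies(my_galaxies)
# Launch with, e.g.: mpirun -n 4 python my_pipeline_script.py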
Writing out results with MPI¶
When running a distributed Pipeline
you have several options for writing out the data. Regardless of which approach is used, the process for writing the outputs is the same as in the shared memory version shown above (i.e. we call the write
method). We detail each of these below.
Collective I/O [WIP]¶
If you have installed h5py
with parallel HDF5 support, it’s possible to write collectively to a single HDF5 file. A Pipeline
will detect whether parallel h5py
is available and will automatically choose this option if possible.
Individual I/O¶
When collective I/O operations aren’t available we produce a file per MPI rank. This is the most efficient method since communicating the results to a single rank for outputting is not only extremely time consuming but can also lead to communication errors when the outputs are sufficiently large.
Once the rank files have been written out we provide two options for combining them into a single file; note that working directly with the rank files is entirely possible too. A minimal sketch of this combination step follows the list below.
Combination into a single physical file: calling combine_files will copy all the data across from each rank file into a single file before deleting each individual rank file. This is clean with regard to what is left behind, but is extremely time consuming.
Combination into a single virtual file: calling combine_files_virtual will make a single file with symlinks to all the rank data in virtual datasets. This is far more efficient and gives the same interaction behaviour as the copy option (i.e. a single file to interact with) but does mean all the rank files must be kept alongside the virtual file.
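As a rough illustration, the end of a distributed run might look like this (a sketch only: we assume combine_files and combine_files_virtual are called on the Pipeline after all ranks have finished writing, with the per-rank file naming handled by the Pipeline itself):

# Every rank writes its own galaxies; under MPI this produces one file per rank
pipeline.write("output.hdf5")
# Once all ranks have finished, stitch the rank files together. Here we use the
# virtual option; combine_files would instead copy everything into one physical file.
pipeline.combine_files_virtual()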