Complex Library Generation¶

In addition to the basic library generation demonstrated previously, Synference also supports more complex scenarios, such as libraries with more supplementary parameters, custom observation transformations, and more complicated model setups. Below is an example of how to generate a more complex model library using Synference.

Firstly we’ll just import the necessary modules and set up the synthesizer model.

[1]:

import numpy as np
from astropy.cosmology import Planck18 as cosmo
from synthesizer.emission_models import Greybody
from synthesizer.emission_models.attenuation import (
    Calzetti2000,
)
from synthesizer.emission_models.stellar.pacman_model import (
    PacmanEmission,
)
from synthesizer.grid import Grid
from synthesizer.instruments import FilterCollection, Instrument
from synthesizer.parametric import ZDist
from tqdm import tqdm
from unyt import K

And our Synference components - the GalaxyBasis class, and some utility functions we will use later.

[2]:

from synference import (
    GalaxyBasis,
    calculate_balmer_decrement,
    calculate_beta,
    calculate_colour,
    calculate_d4000,
    calculate_mass_weighted_age,
    calculate_muv,
    calculate_sfh_quantile,
    draw_from_hypercube,
    generate_random_DB_sfh,
)

/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

We’ll set up Synthesizer similarly to before - see the basic library generation example for more details.

For this example we’ll use a set of filter used in wide area surveys, including VISTA, Subaru Hyper Suprime-Cam, Euclid, and Spitzer IRAC.

Note that if you’re running this step on a cluster node without internet access, you’ll need to create the instrument file beforehand and pass in the HDF5 file path insead.

[3]:

filter_codes = [
    "Paranal/VISTA.Z",
    "Paranal/VISTA.Y",
    "Paranal/VISTA.J",
    "Paranal/VISTA.H",
    "Paranal/VISTA.Ks",
    "Subaru/HSC.g",
    "Subaru/HSC.r",
    "Subaru/HSC.i",
    "Subaru/HSC.z",
    "Subaru/HSC.Y",
    "CFHT/MegaCam.u",
    "CFHT/MegaCam.g",
    "CFHT/MegaCam.r",
    "CFHT/MegaCam.i",
    "CFHT/MegaCam.z",
    "Euclid/VIS.vis",
    "Euclid/NISP.Y",
    "Euclid/NISP.J",
    "Euclid/NISP.H",
    "Spitzer/IRAC.I1",
    "Spitzer/IRAC.I2",
]

filterset = FilterCollection(filter_codes=filter_codes)

instrument = Instrument("EuclidDeep", filters=filterset)

grid = Grid("test_grid")

Now we can configure our model, which we will make more complex. Firstly we define the prior ranges for our main galaxy parameters.

This model will use a more complex dust attenuation model, with variable UV slope and 2175A bump strength, as well as a variable escape fraction for ionizing photons.

[4]:

N_models = 1000

redshift = (0.01, 14)
masses = (4, 12)  # log(M/Msun)=4 to log(M/Msun)=12
logAv = (-3, 0.7)  # Av=0.001 to Av=5
log_zmet = (-4, -1.39)  # Z=0.0001 to Z=0.04
fesc = (0.0, 1.0)  # Fraction of ionizing photons that escape the galaxy
slope = (-0.4, 1.1)  # UV slope modification to Calzetti law
bump_strength = (0.0, 3.0)  # 2175A bump strength in Calzetti law


prior_ranges = {
    "redshift": redshift,
    "log_masses": masses,
    "log_Av": logAv,  # Av in magnitudes
    "log_zmet": log_zmet,
    "fesc": fesc,
    "slope": slope,
    "bump_strength": bump_strength,
}

Star Formation History¶

We’ll use a non-parametric ‘Dense Basis’ SFH (Iyer et al. 2019), where we model the time at which different quantiles of stellar mass formed. Synference provides a helper module for generating these Star Formation Histories.

We will use 3 quantiles, e.g. \(t_{25}, t_{50}, t_{75}\), so we will simply add three dummy parameters to our LHC sampling of the parameter space, and then replace them afterward. We need to set the concentration parameter \(\alpha\) for the Dirichilet prior on the SFH, for which we will use \(\alpha=3\). This controls the correlation between different SFH quantiles, where lower values will have more rapdily varying star formation histories. We also set a prior on the recent SFR, in terms of the sSFR, which normalizes by the stellar mass. This allows for a range from quiescent to highly star-forming galaxies.

[5]:

tx_alpha = 3

for i in range(3):
    j = 100 * (i + 1) / (4)
    prior_ranges[f"sfh_quantile_{j:.0f}"] = (0, 1)

prior_ranges["ssfr"] = (-14, -7)

Now we will sample these parameters from our hypercube. Note that we set a log prior on dust attenuation \(A_V\), but we want to sample linear dust attenuation, so we get the model to ‘unlog’ the parameter.

[6]:

all_param_dict = draw_from_hypercube(prior_ranges, N_models, unlog_keys=["log_Av"])

Now we will create our metallicity distributions.

[7]:

Z_dists = [
    ZDist.DeltaConstant(log10metallicity=log_z)
    for log_z in tqdm(all_param_dict["log_zmet"], desc="Creating ZDist")
]

Creating ZDist: 100%|██████████| 1000/1000 [00:00<00:00, 215092.51it/s]

Now we can create the star-formation history. We are using a specific prior in specific star-formation rate here, but we need to provide log SFR to the function, so we calculate this inside the loop.

[8]:

# Draw SFH params from prior
sfh_models = []
for i in tqdm(range(N_models), desc="Generating SFH models"):
    z = all_param_dict["redshift"][i]
    logmass = all_param_dict["log_masses"][i]
    logssfr = all_param_dict["ssfr"][i]
    logsfr = logmass + logssfr
    sfh, tx = generate_random_DB_sfh(
        Nparam=3,
        tx_alpha=tx_alpha,
        redshift=z,
        logsfr=logsfr,
        logmass=logmass,
    )
    for j in range(3):
        all_param_dict[f"sfh_quantile_{100 * (j + 1) / (3 + 1):.0f}"][i] = tx[j]
    sfh_models.append(sfh)

Generating SFH models: 100%|██████████| 1000/1000 [00:06<00:00, 165.36it/s]

Now we will set up two functions, which are used to convert parameters from one form to another inside the model, and allow the SFH model to be serialized and re-created. This is a flexible system which should allow complex transformations.

These are required because synference automatically looks for varying parameters which are stored on the Synthesizer galaxies and emitters, but if they are in a complex form (e.g. the Dense Basis tuple), then we need to explain how to understand them to the code.

[9]:

def make_db_tuple(params):
    """Constructs the DenseBasis tuple from the SFH."""
    nquant = 0
    for key in params:
        if key.startswith("sfh_quantile_"):
            nquant += 1

    mass_quantiles = np.linspace(0, 1, nquant + 2)[1:-1]  # Exclude the 0 and 1 quantiles

    db_tuple = [params["log_mass"], params["log_sfr"], nquant] + [
        params[f"sfh_quantile_{int(q * 100)}"] for q in mass_quantiles
    ]
    return db_tuple  # Return a tuple of (log_mass, SFR, nquant, [quantiles...])


def db_sf_convert(param, param_dict, Nparam_SFH=3):
    """Converts from a DenseBasis tuple back to parameters."""
    db_tuple = param_dict["db_tuple"]
    # dp_tuple has the folliwng
    # mass, sfr, tx_alpha, *sfh_quantiles
    if param.startswith("sfh_quantile_"):
        # Convert the SFH quantile parameters to the Dense Basis SFH format
        j = int(np.round(int(param.split("_")[-1]) / 100 * (Nparam_SFH + 1)))
        return db_tuple[j + 2]  # +3 because first three are mass, sfr, tx_alpha
    elif param == "log_sfr":
        # Convert log_sfr to the Dense Basis SFH format
        return db_tuple[1]
    elif param == "log_masses":
        # Convert log_masses to the Dense Basis SFH format
        return db_tuple[0]
    elif param == "tx_alpha":
        # Convert tx_alpha to the Dense Basis SFH format
        return db_tuple[2]
    elif param == "log_ssfr":
        # Convert log_ssfr to the Dense Basis SFH format
        return db_tuple[1] - db_tuple[0]
    else:
        raise ValueError(f"Unknown parameter {param.str} in db_tuple conversion.")

Now we can set up some parameter transformations. These are for when we want to sample a parameter which is not what is used directly in Synthesizer. The simplest example is our use of the V-band attenuation \(A_V\), whereas in reality to generate a model we must provide the V-band optical depth \(\tau_V\), to Synthesizer. So we provide a function to Synference to allow it to do this conversion, which is simply \(A_V = \tau_V * 1.086\)

These generally take the form of a dictionary, where the key is the parameter name required by the Synthesizer model, and value is a two component tuple, where the first value is a new name (or a list of new names, for multiple parameters), and the second value the conversion function.

We will also save the inverse functions in a separate dictionary. These are only used if we want to recreate this simulator later, to generate SEDs from a set of input parameters. This is useful to recover SEDs for observations when performing inference.

[10]:

alt_parametrizations = {
    "tau_v": ("Av", lambda x: x["tau_v"] * 1.086),
    "db_tuple": (
        ["log_sfr"] + [f"sfh_quantile_{100 * (j + 1) / (3 + 1):.0f}" for j in range(3)],
        db_sf_convert,
    ),
}

param_transforms_to_save = {
    "tau_v": lambda x: x["Av"] / 1.086,
    "db_tuple": make_db_tuple,
}

Emission Models¶

Now we will set up our complex emission model which supports our priors, with variable escape fraction and flexible attenuation law.

The basic concept is simple: Any emission model parameter set with a string, rather than an explicit value, will be inherited from the emission model, or emitter. So in this case, for ‘tau_v’, ‘fesc’, ‘bump_strength’, and ‘slope’, Synthesizer will look for these parameters to be set on the individual Galaxy or Star instances.

We will also set the emission key we will save, which is the root of the emission model, named ‘total’.

[11]:

dust_emission = Greybody(temperature=40 * K, emissivity=1.5)
dust_curve = Calzetti2000(slope="slope", ampl="bump_strength")

print("Creating emission model.")
emission_model = PacmanEmission(
    grid=grid, tau_v="tau_v", dust_curve=dust_curve, dust_emission=dust_emission, fesc="fesc"
)

emission_key = emission_model.label

print(f"Root emission model label is: {emission_key}")

Creating emission model.
Root emission model label is: total

To ensure these parameters are set on each galaxy, we will create our final input, the galaxy_params dictionary. This is simply a dictionary of parameter name and value array pairs for every galaxy.

[12]:

galaxy_params = {
    "fesc": all_param_dict["fesc"],
    "tau_v": all_param_dict["Av"] / 1.086,
    "bump_strength": all_param_dict["bump_strength"],
    "slope": all_param_dict["slope"],
}

Now we can instantiate the GalaxyBasis, into which we will pass these inputs. This won’t do much until we call the correct function to build the library. Note that we set ‘build_library’ = False, because we have already generated our full parameters sample. If we wanted we could also pass in a smaller set of parameter values instead, and set build_library=True, and the code would generate all the combinations of those parameters.

[13]:

basis = GalaxyBasis(
    model_name="sps_Euclid_test",
    redshifts=all_param_dict["redshift"],
    grid=grid,
    emission_model=emission_model,
    sfhs=sfh_models,
    cosmo=cosmo,
    instrument=instrument,
    metal_dists=Z_dists,
    galaxy_params=galaxy_params,
    alt_parametrizations=alt_parametrizations,
    redshift_dependent_sfh=True,
    build_library=False,
    log_stellar_masses=all_param_dict["log_masses"],
)

2025-11-13 20:03:55,469 | synference | INFO     | Generating library directly from provided parameter samples.

Something else we can do here to improve the utility of our model is add more parameters to be saved and stored. Synference provides a set of these parameters, to save things like the surviving stellar mass, UV magnitude, \(\beta\) slope, D4000 break strength, UVJ colors, etc. We can see the full list here:

[14]:

from synference import SUPP_FUNCTIONS

SUPP_FUNCTIONS()

[14]:

Available supplementary functions:
  - calculate_MUV
  - calculate_Ndot_ion
  - calculate_agn_fraction
  - calculate_balmer_decrement
  - calculate_beta
  - calculate_burstiness
  - calculate_colour
  - calculate_d4000
  - calculate_flux_weighted_age
  - calculate_line_ew
  - calculate_line_flux
  - calculate_lum_weighted_age
  - calculate_mass_weighted_age
  - calculate_muv
  - calculate_sfh_quantile
  - calculate_sfr
  - calculate_surviving_mass
  - calculate_xi_ion0

We can pass in these functions to our GalaxyBasis.create_mock_library function, and they will be run for every galaxy and the output stored in the library.

The functions should take the galaxy as the first argument, and then the following arguments (if any) will be set by position as demonstrated below. The keys will be the parameter names. The return should be either a single value (float, string, unyt_quantity) or a dictionary which maps to the emission model names (e.g to record \(\beta\) slope for different components)

[15]:

supp_params = {
    "mUV": (calculate_muv, cosmo),
    "sfh_quant_25": (calculate_sfh_quantile, 0.25, True),  # Calculate SFH quantile at 25%
    "sfh_quant_50": (calculate_sfh_quantile, 0.50, True),  # Calculate SFH quantile at 50%
    "sfh_quant_75": (calculate_sfh_quantile, 0.75, True),  # Calculate SFH quantile at 75%
    "UV": (calculate_colour, "U", "V", emission_key, True),  # Calculate UV colour (rest-frame)
    "VJ": (calculate_colour, "V", "J", emission_key, True),  # Calculate VJ colour (rest-frame)
    "d4000": (calculate_d4000, emission_key),  # Calculate D4000 using the emission model
    "beta": (calculate_beta, emission_key),
    "balmer_decrement": (calculate_balmer_decrement, emission_key),
    "mass_weighted_age": calculate_mass_weighted_age,
}

Library Generation¶

Now we can run the final method to create the output catalogue. There are several things to note here. We can set the number of processes to use, to make use of multiple threads. We can also set the batch size, which will split the generation into multiple HDF5 files which are later combined. This is useful to avoid running out of RAM with large libraries.

We can also set the output type - in this case it is “photometry”, but if instead we made in “spectra”, we would generate a library of spectra which we could infer from as well.

[16]:

basis.create_mock_library(
    emission_model_key=emission_key,
    out_name="library_Euclid_test",
    out_dir="./",
    overwrite=True,
    n_proc=1,
    verbose=False,
    batch_size=10_000,
    parameter_transforms_to_save=param_transforms_to_save,
    cat_type="photometry",
    **supp_params,
)

2025-11-13 20:03:55,516 | synference | INFO     | Checking parameters inside create_matched_galaxies.
2025-11-13 20:03:55,522 | synference | INFO     | 1000

Preparing galaxy batches: 100%|██████████| 1000/1000 [00:00<00:00, 149876.86it/s]
Creating galaxies: 100%|██████████| 1/1 [02:10<00:00, 130.80s/it]

2025-11-13 20:06:06,345 | synference | INFO     | Created 1000 galaxies.


Processing parameters: 100%|██████████| 11/11 [00:00<00:00, 8715.03it/s]

2025-11-13 20:06:06,366 | synference | INFO     | Finished creating galaxies.
2025-11-13 20:06:06,367 | synference | INFO     | Created 1000 galaxies for base sps_Euclid_test
2025-11-13 20:06:06,371 | synference | INFO     | Creating pipeline.
2025-11-13 20:06:06,372 | synference | INFO     | Processing all galaxies in a single batch.
2025-11-13 20:06:06,374 | synference | INFO     | Running in single-node mode.
2025-11-13 20:06:06,374 | synference | INFO     | Added analysis functions to pipeline.

2025-11-13 20:06:06,561 | synference | INFO     | Running pipeline at 2025-11-13 20:06:06.561883 for 1000 galaxies
2025-11-13 20:06:41,110 | synference | INFO     | Finished running pipeline at 2025-11-13 20:06:41.110316 for 1000 galaxies
2025-11-13 20:06:41,111 | synference | INFO     | Pipeline took 0:00:34.548426 to run.
2025-11-13 20:06:43,848 | synference | INFO     | Written pipeline to disk at ./sps_Euclid_test.hdf5.
2025-11-13 20:06:44,121 | synference | INFO     | Compiling the library after processing bases.
2025-11-13 20:06:44,122 | synference | INFO     | Emission model key for base sps_Euclid_test:total

Loading galaxy properties: 1it [00:00, 30.90it/s]

2025-11-13 20:06:44,858 | synference | INFO     | Combined outputs shape: (21, 1000)
2025-11-13 20:06:44,860 | synference | INFO     | Combined parameters shape: (11, 1000)
2025-11-13 20:06:44,860 | synference | INFO     | Combined supplementary parameters shape: (10, 1000)
2025-11-13 20:06:44,861 | synference | INFO     | Filter codes: ['Paranal/VISTA.Z', 'Paranal/VISTA.Y', 'Paranal/VISTA.J', 'Paranal/VISTA.H', 'Paranal/VISTA.Ks', 'Subaru/HSC.g', 'Subaru/HSC.r', 'Subaru/HSC.i', 'Subaru/HSC.z', 'Subaru/HSC.Y', 'CFHT/MegaCam.u', 'CFHT/MegaCam.g', 'CFHT/MegaCam.r', 'CFHT/MegaCam.i', 'CFHT/MegaCam.z', 'Euclid/VIS.vis', 'Euclid/NISP.Y', 'Euclid/NISP.J', 'Euclid/NISP.H', 'Spitzer/IRAC.I1', 'Spitzer/IRAC.I2']
2025-11-13 20:06:44,861 | synference | INFO     | Parameter names: ['redshift', 'log_mass', 'fesc', 'bump_strength', 'slope', 'log10metallicity', 'Av', 'log_sfr', 'sfh_quantile_25', 'sfh_quantile_50', 'sfh_quantile_75']
2025-11-13 20:06:44,862 | synference | INFO     | Parameter units: ['dimensionless', 'log10_Msun', 'dimensionless', 'dimensionless', 'dimensionless', 'log10(Zmet)', 'mag', 'log10(Msun/yr)', 'dimensionless', 'dimensionless', 'dimensionless']
2025-11-13 20:06:44,922 | synference | INFO     | Processed the bases and saved the output.

[16]:

<synference.library.CombinedBasis at 0x7f7168fab640>

Spectroscopic Grid Creation¶

We can also build a library of spectra, which we can train a model from using an embedding network. At the default setting, the spectra would be at the wavelength range and resolution of our SPS grid, which is likely higher than required. We can change the wavelength array on our instrument or grid to a more reasonable choice.

from unyt import Angstrom

from synference import generate_constant_R

new_lam = generate_constant_R(300, start=100 * Angstrom, stop=100_000 * Angstrom)

[17]:

basis.create_mock_library(
    emission_model_key=emission_key,
    out_name="spectral_library_Euclid_test",
    out_dir="./",
    overwrite=True,
    n_proc=4,
    verbose=False,
    batch_size=10_000,
    parameter_transforms_to_save=param_transforms_to_save,
    cat_type="spectra",
    **supp_params,
)

2025-11-13 20:06:44,936 | synference | WARNING  | File .//sps_Euclid_test.hdf5 already exists. Overwriting..
2025-11-13 20:06:44,959 | synference | INFO     | Checking parameters inside create_matched_galaxies.
2025-11-13 20:06:44,961 | synference | INFO     | 1000

Preparing galaxy batches: 100%|██████████| 1000/1000 [00:00<00:00, 223755.88it/s]
Creating galaxies: 100%|██████████| 1/1 [01:31<00:00, 91.29s/it]

2025-11-13 20:08:16,259 | synference | INFO     | Created 1000 galaxies.


Processing parameters: 100%|██████████| 11/11 [00:00<00:00, 14628.20it/s]

2025-11-13 20:08:16,272 | synference | INFO     | Finished creating galaxies.
2025-11-13 20:08:16,273 | synference | INFO     | Created 1000 galaxies for base sps_Euclid_test
2025-11-13 20:08:16,275 | synference | INFO     | Creating pipeline.
2025-11-13 20:08:16,275 | synference | INFO     | Processing all galaxies in a single batch.
2025-11-13 20:08:16,276 | synference | INFO     | Running in single-node mode.
2025-11-13 20:08:16,277 | synference | WARNING  | Synthesizer not installed with OpenMP support. Pipeline will run on a single core
2025-11-13 20:08:16,278 | synference | INFO     | Added analysis functions to pipeline.
2025-11-13 20:08:16,385 | synference | INFO     | Running pipeline at 2025-11-13 20:08:16.385751 for 1000 galaxies

2025-11-13 20:08:43,557 | synference | INFO     | Finished running pipeline at 2025-11-13 20:08:43.557039 for 1000 galaxies
2025-11-13 20:08:43,558 | synference | INFO     | Pipeline took 0:00:27.171281 to run.
2025-11-13 20:08:45,618 | synference | INFO     | Written pipeline to disk at ./sps_Euclid_test.hdf5.
2025-11-13 20:08:45,860 | synference | INFO     | Compiling the library after processing bases.
2025-11-13 20:08:45,861 | synference | INFO     | Emission model key for base sps_Euclid_test:total

Loading galaxy properties: 1it [00:00,  2.83it/s]

2025-11-13 20:08:46,597 | synference | INFO     | Combined outputs shape: (9244, 1000)
2025-11-13 20:08:46,598 | synference | INFO     | Combined parameters shape: (11, 1000)
2025-11-13 20:08:46,599 | synference | INFO     | Combined supplementary parameters shape: (10, 1000)
2025-11-13 20:08:46,600 | synference | INFO     | Spectral mode enabled, using wavelengths as filter codes.
2025-11-13 20:08:46,600 | synference | INFO     | Parameter names: ['redshift', 'log_mass', 'fesc', 'bump_strength', 'slope', 'log10metallicity', 'Av', 'log_sfr', 'sfh_quantile_25', 'sfh_quantile_50', 'sfh_quantile_75']
2025-11-13 20:08:46,601 | synference | INFO     | Parameter units: ['dimensionless', 'log10_Msun', 'dimensionless', 'dimensionless', 'dimensionless', 'log10(Zmet)', 'mag', 'log10(Msun/yr)', 'dimensionless', 'dimensionless', 'dimensionless']
2025-11-13 20:08:48,313 | synference | INFO     | Processed the bases and saved the output.

[17]:

<synference.library.CombinedBasis at 0x7f7178b9e6b0>