Multithreading Library Generation

In the previous sections, we demonstrated how to generate model libraries using Synference’s built-in library generation capabilities. However, for large-scale model libraries, the generation process can be time-consuming. To address this, Synference supports multithreaded library generation, allowing you to parallelize the computation across multiple CPU cores and even multiple nodes in a computing cluster.

We use mpi4py for this, so you must launch your script with an MPI launcher such as mpirun, or srun if you are using a Slurm queue. For example:

mpirun -n $SLURM_CPUS_PER_TASK python library_generation.py $SLURM_CPUS_PER_TASK 0

Then, in your script, you should get the rank and size of each MPI instance.
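A minimal sketch of querying the rank and size with mpi4py is given here. The helper name get_rank_and_size and the single-process fallback are illustrative assumptions rather than part of Synference; only MPI.COMM_WORLD, Get_rank and Get_size come from mpi4py itself.

```python
def get_rank_and_size():
    """Return (rank, size) for this process.

    Hypothetical helper: falls back to (0, 1) when mpi4py is not
    installed, so the script can still run as a single process.
    """
    try:
        from mpi4py import MPI
    except ImportError:
        return 0, 1
    comm = MPI.COMM_WORLD  # communicator spanning all launched processes
    return comm.Get_rank(), comm.Get_size()


rank, size = get_rank_and_size()
print(f"Running on rank {rank} of {size}")
```

When launched with mpirun -n 4, each of the four processes should report a different rank between 0 and 3, with size equal to 4. Run directly with python, the script sees a single process.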

The idea is that if you set multi_node=True in GalaxyBasis.create_mock_library and run the script under MPI, the galaxies will be split across the ranks. This allows for a two-tiered parallelization process: the galaxies are split across multiple nodes, and each node's batch of galaxies can in turn be split over the cores in that node. This is enabled through the Synthesizer Pipeline functionality.
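To illustrate the first tier of this scheme, the sketch below divides a set of galaxy indices into one contiguous batch per rank. It is an assumption-laden illustration only: the function split_for_rank is hypothetical, and the partitioning that Synference actually applies when multi_node=True may differ.

```python
def split_for_rank(n_items, rank, n_ranks):
    """Return the contiguous range of item indices assigned to one rank.

    Hypothetical helper, not a Synference function. Items are divided as
    evenly as possible; the first (n_items % n_ranks) ranks get one extra.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# Example: 10 galaxies shared between 4 ranks.
n_galaxies = 10
n_ranks = 4
chunks = [list(split_for_rank(n_galaxies, r, n_ranks)) for r in range(n_ranks)]
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Each rank would then process only its own batch, and the second tier would spread that batch over the cores available on the node.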