Parallelism¶
Synthesizer supports both shared memory and distributed memory parallelism, giving users fine-grained control over how to parallelise their workflows.
Shared Memory (OpenMP): Built-in threading for computationally intensive operations within individual galaxies
Distributed Memory (MPI): Partitioning of galaxy populations across processes is left in the hands of the user, with interfaces for passing the Comm object and querying rank/size information
This hybrid approach allows you to avoid parallelism overheads where not needed, while enabling efficient scaling for large galaxy populations.
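On a single node, the shared memory side is often all you need. Below is a minimal sketch of a threaded, single-process run; it assumes a hypothetical load_galaxies() helper and a pre-built emission model, and that the comm argument to Pipeline can simply be omitted when not running under MPI:

from synthesizer import Pipeline

# Load a galaxy population (load_galaxies is a hypothetical helper)
galaxies = load_galaxies()

# Thread the expensive per-galaxy operations over 8 OpenMP threads
# (model is assumed to be a pre-built emission model)
pipeline = Pipeline(emission_model=model, nthreads=8)
pipeline.add_galaxies(galaxies)
pipeline.run()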
Distributed Memory Parallelism (Pipeline)¶
For processing large galaxy populations, Synthesizer’s Pipeline object provides a framework for distributed parallelism, but leaves the partitioning in your hands.
Why User-Controlled Partitioning?
Different scientific workflows call for different parallelisation strategies:
Simple embarrassingly parallel splits across galaxy populations
Load-balanced partitioning based on particle counts
Spatially-aware splits for simulation snapshots
Custom groupings based on galaxy properties
By leaving partitioning to the user, you can choose the optimal strategy for your specific use case.
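As a concrete illustration of the last strategy, here is a hedged sketch of a custom grouping in which each rank takes one quantile bin of stellar mass. The g.stellar_mass attribute is an assumption (substitute whichever property you group on), and all_galaxies is assumed to have been loaded already:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Hypothetical attribute; use whatever property defines your groups
masses = np.array([g.stellar_mass for g in all_galaxies])

# One quantile bin per rank, so bins hold similar numbers of galaxies
edges = np.quantile(masses, np.linspace(0, 1, size + 1))
bins = np.clip(np.digitize(masses, edges) - 1, 0, size - 1)
my_galaxies = [g for g, b in zip(all_galaxies, bins) if b == rank]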
Example: MPI with Pipeline
Here’s a simple example of using MPI to process different galaxy subsets on different ranks:
from mpi4py import MPI
from synthesizer import Pipeline

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Load all galaxies (or use lazy loading for large datasets)
all_galaxies = load_galaxies()

# Partition galaxies across ranks (simple strided split)
my_galaxies = all_galaxies[rank::size]

# Each rank processes its subset, using 8 OpenMP threads
pipeline = Pipeline(emission_model=model, nthreads=8, comm=comm)
pipeline.add_galaxies(my_galaxies)

# ... configure pipeline operations ...

pipeline.run()

# Save results (each rank writes its own file, or gather the results)
save_results(pipeline.galaxies, rank=rank)
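If you would rather collect everything on one rank than write per-rank files, mpi4py's gather does the job. A minimal sketch follows (write_combined is a hypothetical output helper); note that gathering pickles every galaxy, so per-rank files are usually cheaper for large populations:

# Collect each rank's processed galaxies on rank 0
gathered = comm.gather(pipeline.galaxies, root=0)

if rank == 0:
    # gathered is a list of per-rank lists; flatten it
    combined = [g for subset in gathered for g in subset]
    write_combined(combined)  # hypothetical output helper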
Load Balancing
For better load balancing, you can partition based on computational cost (e.g., particle counts):
# Sort galaxies by particle count, most expensive first
galaxies_sorted = sorted(
    all_galaxies,
    key=lambda g: len(g.stars.masses),
    reverse=True,
)

# Round-robin assignment for load balancing
my_galaxies = galaxies_sorted[rank::size]
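Round-robin over a sorted list works well when costs fall off smoothly. For very skewed populations, a greedy assignment (give the next most expensive galaxy to the least loaded rank) usually balances better; a minimal sketch using only the standard library:

import heapq

# One (total_cost, rank_id, assigned_galaxies) entry per rank,
# kept as a min-heap keyed on accumulated cost
heap = [(0, r, []) for r in range(size)]
heapq.heapify(heap)

# Hand each galaxy, most expensive first, to the least loaded rank
for gal in galaxies_sorted:
    cost, r, assigned = heapq.heappop(heap)
    assigned.append(gal)
    heapq.heappush(heap, (cost + len(gal.stars.masses), r, assigned))

# Every rank computes the same assignment deterministically, so no
# communication is needed; each rank just picks out its own share
my_galaxies = next(gals for _, rr, gals in heap if rr == rank)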
Combining MPI and OpenMP
For optimal performance on HPC systems, combine distributed (MPI) and shared memory (OpenMP) parallelism:
# Use MPI for galaxy-level parallelism across ranks
my_galaxies = all_galaxies[rank::size]

# Use OpenMP threading within each galaxy on each rank
pipeline = Pipeline(emission_model=model, nthreads=8, comm=comm)
pipeline.add_galaxies(my_galaxies)
pipeline.run()
This hybrid approach allows you to scale to thousands of cores on HPC systems while maintaining efficiency.
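In real HPC jobs, avoid hard-coding nthreads; deriving it from the environment keeps MPI ranks from oversubscribing a node. A hedged sketch (ranks_per_node is an assumption about your job layout):

import os

# Respect OMP_NUM_THREADS if the scheduler set it; otherwise share
# the node's cores between the ranks placed on it
ranks_per_node = 16  # assumption: match this to your job script
nthreads = int(
    os.environ.get("OMP_NUM_THREADS", max(1, (os.cpu_count() or 1) // ranks_per_node))
)

pipeline = Pipeline(emission_model=model, nthreads=nthreads, comm=comm)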