Load Balancing For High Performance Computing Using Quantum Annealing: Adaptive Mesh Refinement

3 Jul 2024

Authors:

(1) Omer Rathore, Department of Physics, Durham University;

(2) Alastair Basden, Department of Physics, Durham University;

(3) Nicholas Chancellor, Department of Physics, Durham University and School of Computing, Newcastle University;

(4) Halim Kusumaatmaja, Department of Physics, Durham University and Institute for Multiscale Thermofluids, School of Engineering, The University of Edinburgh.

Table of Links

Abstract and Introduction

Methods

Results

Conclusions and Outlook, Acknowledgements, and References

In order to formulate load balancing for AMR as an Ising problem suitable for annealers, data was gathered using CompReal[66], a fully compressible, finite difference flow solver for the Navier-Stokes equations. The flow solver interfaces with a widely used general software framework for AMR applications called BoxLib[16], [67]. BoxLib was designed as a platform upon which to build massively parallel software and has demonstrated good scalability on up to 100,000 cores[68]. Research codes based on BoxLib are numerous and varied, but include the realms of astrophysics[69], [70], computational cosmology[69], subsurface flow[71] and combustion[72] among others.

The particular test case simulated involves a developing spherical blast wave as illustrated in Figure 2. The sharpest gradients are in a very thin zone encompassing the shock edge which is moving outwards. As such the mesh is regularly updated to track its position. Data is defined on a nested hierarchy of logically rectangular collection of cells called grids (or patches). This is divided into levels where each level refers to the union of all grids that share the same mesh spacing. Aside from the coarsest level, finer levels are disjoint and do not cover the entire simulation domain, thus facilitating allocation of resources to desired regions. Each grid is made up of a number of cells and this is not necessarily the same across all grids. The cells have high intra-connectivity within the same grid and lower inter-connectivity to neighbouring grids. Therefore, from a work distribution perspective, it is desirable to allocate whole grids to individual processors while trying to maintain roughly the same total cell count when possible.

Figure 2. A shockwave expanding outwards from a region initially at high pressure (red) to the surrounding lower pressure (blue) environment simulated using CompReal. The nested grid hierarchy is illustrated using individual grids from the finest, intermediate and coarsest levels outlined in yellow, red and white respectively.

To quantify the computational burden associated with each nested grid, the total number of cells within that patch are counted. There are also alternative measures of "burden", for example actually recording the time it takes each patch to complete its allocated tasks might be more representative for multi-physics simulations where different cells solve different equations. This would affect the value of the weights but not the Ising formulation itself so the method remains completely general.

The classical simulation thus provides a series of numbers representing the notional cost of each grid, which needs to be equitably distributed across a given number of processors. This process will need to be repeated throughout the classical simulation at certain intervals. It is undesirable to split these grids due to their high intra-connectivity and so the problem essentially reduces to one of number partitioning. Given a set of N numbers, where N is the number of patches, S = {n1,...,nN}, the task is to divide this set into two disjoint subsets such that the sum in both elements is the same, or at least minimises the mismatch. This is framed as the following Ising model[22],

where ni are the numbers in the set, si is the Ising spin variable and A is a general scaling constant which is set to unity henceforth. Despite the absence of linear biases in the model, the coupling terms will lead to the formation of a fully connected graph when mapped to the quantum processor. It’s worthwhile noting that number partitioning phrased as a decision problem regarding if the two subsets are equal or not is classified as NP-complete[73]. However, since this model is based on real data from simulations, achieving a perfectly balanced split is essentially almost impossible. Therefore, the objective shifts to minimizing the discrepancy between subsets, a task that is NP-hard. This process can be applied recursively to achieve divisions for more than two processors. Currently BoxLib uses a simple round robin strategy for distributing grids between subsets.

This paper is available on arxiv under CC BY 4.0 DEED license.