The material point method (MPM) is used to simulate continuous matter and is especially suited to simulating large deformations. Once large deformations are present, a dynamic load balancing solution is needed to simulate large systems efficiently: even if the initial work distribution is good, it usually degrades considerably over the course of the simulation. The load balancing library ALL provides an easy, plug-and-play solution to this problem, and this module describes how the library is integrated. Thanks to the good load balancing provided by the library, larger systems can be simulated at lower computational cost.
This module shows how straightforward it is to include the load balancing library in already existing code. Depending on the simulation code, additional precautions must be taken; those needed for the MPM simulation code are presented here, together with the prerequisites the simulation code must fulfil. These help determine whether a simulation code is well suited for integrating ALL or whether some further work is needed during integration.
The MPM simulation code with integrated ALL is used by Stephan Schulz in his thesis.
The load balancing library ALL is integrated into the material point method simulation code GMPM-PoC, which was written by Stephan Schulz during his thesis. The simulation code will be released to the public in the future.
Certain aspects of the simulation code require additional treatment by the library, or additional library features. First, the open boundaries of the simulation code require continuous updates of the outer bounds of the boundary domains: the system extent is adapted to the particles' bounding box every time step. This also means the geometry generated by the library in the last balance step cannot be reused directly. It is therefore saved by the simulation code, adapted to the new system extent, and then handed back to the library as the basis for the new geometry.
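The idea of reusing the previous geometry under a moving system extent can be sketched as follows. This is an illustrative Python sketch, not the simulation code itself; ``adapt_to_extent`` and its arguments are hypothetical names:

```python
def adapt_to_extent(saved_bounds, new_extent):
    """Clamp a saved 1D domain decomposition to a new system extent.

    saved_bounds: list of (lo, hi) per-rank intervals from the last balance step
    new_extent:   (lo, hi) bounding box of all particles in this direction

    Interior cut positions are kept; only the outermost bounds are moved to
    the new extent, mirroring how the outer domain boundaries must track the
    particles' bounding box under open boundary conditions.
    """
    adapted = [list(b) for b in saved_bounds]
    adapted[0][0] = new_extent[0]    # lowest domain follows the new lower extent
    adapted[-1][1] = new_extent[1]   # highest domain follows the new upper extent
    return [tuple(b) for b in adapted]

# Domains from the previous balance step in one direction:
old = [(0.0, 3.0), (3.0, 7.0), (7.0, 10.0)]
# Particles have spread out, so the bounding box is now wider:
print(adapt_to_extent(old, (-1.0, 12.0)))
# the interior cuts at 3.0 and 7.0 are preserved
```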
Second, the communication is based on grid halos and only accommodates nearest-neighbor communication. This makes the minimum domain size exactly the width of this halo. The library supports such a minimum size directly, and only the aforementioned outer domain bounds must be checked for compliance; the library automatically keeps all other domain boundaries sufficiently far apart.
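The resulting minimum width mirrors the expression used in the simulation code, ``(abs(Rcont_min)+abs(Rcont_max))*dh``. A minimal Python sketch with hypothetical names:

```python
def min_domain_size(halo_lower, halo_upper, dh):
    """Minimum domain width so that nearest-neighbour halo exchange suffices.

    halo_lower, halo_upper: halo extents in grid cells (the roles played by
    Rcont_min/Rcont_max in the simulation code); dh: grid width.
    A domain narrower than one full halo would require next-nearest-neighbour
    communication, which a grid halo exchange does not support.
    """
    return (abs(halo_lower) + abs(halo_upper)) * dh

print(min_domain_size(-2, 2, 0.5))  # 4 cells of width 0.5 -> 2.0
```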
Lastly, the particle communication at the end of each time step also only accounts for nearest-neighbor communication. This means a domain's boundary must not move so far that the domain would need to retrieve particles from a process that is not its nearest neighbor. Due to the way the library moves boundaries in the staggered grid and tensor approaches, this is already guaranteed: there is always an overlap between the old domain's volume and the new domain's.
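The nearest-neighbor constraint boils down to requiring an overlap between each old and new domain, which can be checked per direction. A minimal Python sketch (hypothetical names):

```python
def overlaps(a, b):
    """True if two intervals (lo, hi) share a region of non-zero width."""
    return a[0] < b[1] and b[0] < a[1]

old_domain = (2.0, 5.0)
new_domain = (3.5, 6.0)  # moved by the balancer, but still overlapping
print(overlaps(old_domain, new_domain))  # True: particles stay reachable
```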
However, the library also places a few requirements on the simulation code. Because the domains change, particles must be able to be communicated across processes; this is implemented in any particle code with moving particles. Depending on the load balancing method, this communication may be more involved. With the tensor method, the topology does not change and every process keeps the same 26 neighbors during the entire simulation. With the staggered grid approach, however, the communication must also handle a changing number of neighbors and determine where they are and which regions they own. For example, it is common that one half of a boundary must be communicated to one process and the other half to a different one, so the fixed relationship between boundaries and neighbors is broken up. The GMPM-PoC code was already designed with such a communication scheme in mind and provided the necessary flexibility to simply enable the staggered grid method after fixing a few communication bugs.
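The split of one boundary face among several staggered-grid neighbors can be illustrated by intersecting the face with the neighbors' extents in a tangential direction. This is an illustrative Python sketch with hypothetical names, not the GMPM-PoC communication code:

```python
def face_recipients(my_span, neighbour_spans):
    """Determine which neighbour receives which sub-interval of a face.

    In a staggered grid, one face of a domain may border several neighbours.
    my_span:         (lo, hi) extent of our face in one tangential direction
    neighbour_spans: dict mapping neighbour name -> (lo, hi) extent of the
                     domain on the other side of the face
    Returns a list of (neighbour, sub-interval) pairs with non-empty overlap.
    """
    parts = []
    for name, (lo, hi) in neighbour_spans.items():
        cut_lo, cut_hi = max(lo, my_span[0]), min(hi, my_span[1])
        if cut_lo < cut_hi:  # only keep non-empty overlaps
            parts.append((name, (cut_lo, cut_hi)))
    return parts

# One half of the face goes to rank A, the other half to rank B:
print(face_recipients((0.0, 4.0), {"A": (0.0, 2.0), "B": (2.0, 6.0)}))
```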
To build the code, just run ``make LB=ALL``; everything, including the dependencies, should be built automatically. Make sure the correct compilers are found in the path, and if you want to use the Intel compilers, set ``COMPILER=INTEL`` as well. The usual caveats and required modules for some HPC systems are described in the main code's documentation.
The main change is the replacement of the original domain decomposition function, which used to partition the system extent into equal parts. Now, ALL is called to update the domain geometry.
! The following additional functions were used:
!
! Additional information:
! - LB_METHOD_PRE is set by the preprocessor to ALL_STAGGERED or ALL_TENSOR.
! - The domain_bounds_old will only be used during initialisation for the
!   initial domain configuration.
! - ``this_image()`` returns the current image index, i.e. the current MPI rank+1.
! - The work is estimated using ``lb_estimate_work``, which takes the current
!   domain size and number of particles as arguments.
function domain_decomposition_jall(bounds, dh, num_images3, domain_bounds_old, work, output) result(domain_bounds)
   use ISO_C_BINDING
   type(boundingbox), intent(in) :: bounds                    !< simulation bounds
   real (kind = real_kind), intent(in) :: dh                  !< grid width
   integer, dimension(3), intent(in) :: num_images3           !< the 1 indexed number of images in 3D
   type(boundingbox_aligned), intent(in) :: domain_bounds_old !< current domain bounds
   real(real_kind), intent(in) :: work                        !< work of this domain
   logical, intent(in) :: output                              !< output domain bounds to `vtk_outline` directory
   type(boundingbox_aligned) :: domain_bounds

   type(boundingbox), save :: domain_old
   type(ALL_t), save :: jall                         ! ALL object which is initialized once
   real (kind = real_kind), dimension(3,2) :: verts
   integer, dimension(3), save :: this_image3        ! the 1 indexed image number in 3D
   logical, save :: first_run = .true.
   integer, save :: step = 1
   logical, dimension(2,3), save :: domain_at_sim_bound ! true if domain is at the lower/upper simulation boundary
   real (kind = real_kind), dimension(3), save :: min_size
   integer(c_int), parameter :: LB_METHOD = LB_METHOD_PRE
   character (len=ALL_ERROR_LENGTH) :: error_msg

   if(first_run) then
      ! calculate this_image3
      block
         integer :: x, y, z, cnt
         cnt = 1
         do z=1,num_images3(3)
            do y=1,num_images3(2)
               do x=1,num_images3(1)
                  if(this_image()==cnt) this_image3 = (/ x,y,z /)
                  cnt = cnt + 1
               enddo
            enddo
         enddo
      end block
      call jall%init(LB_METHOD, 3, 4.0d0)
      call jall%set_proc_grid_params(this_image3-1, num_images3)
      call jall%set_proc_tag(this_image())
      call jall%set_communicator(MPI_COMM_WORLD)
      min_size(:) = (abs(Rcont_min)+abs(Rcont_max))*dh
      call jall%set_min_domain_size(min_size)
      domain_old%bounds_unaligned = domain_bounds_old%bounds_aligned
      domain_at_sim_bound(1,:) = this_image3==1           ! first image in a direction is automatically at sim bound
      domain_at_sim_bound(2,:) = this_image3==num_images3 ! last image likewise at sim bound
      call jall%setup()
   endif
   call jall%set_work(real(work,real_kind))

   !! The `domain_old` bounds are not the actual domain bounds, which
   !! are aligned to grid widths, but what we got from the previous
   !! iteration of load balancing. However, the simulation boundaries are
   !! unchanged by the load balancing.
   block
      type(boundingbox_aligned) :: aligned_bnds
      real (kind = real_kind), dimension(3) :: lowest_upper_bound, highest_lower_bound
      !> Align the simulation boundaries to the grid and add an additional
      !! grid width on the top. These may be used instead of our current
      !! bounds, so they should align properly on the upper bound, if we
      !! are a simulation boundary. If the simulation bounds have not
      !! changed they should still coincide with the domain bounds.
      aligned_bnds%bounds_aligned = floor(bounds%bounds_unaligned/dh)*dh
      aligned_bnds%bounds_aligned(2,:) = aligned_bnds%bounds_aligned(2,:) + dh
      !> Make sure the shrinking domain is still always large enough
      !! and in particular is not shrunk into the neighbouring domain.
      !! This can happen if the bounding box is not present in the current
      !! domain, so the outer bound is moved across the inner bound. This
      !! must be avoided at all cost. Additionally, we also need to ensure
      !! the minimum domain width. Also, the outer bound of all boundary
      !! domains must be the same. To achieve this, the outermost inner
      !! bound is calculated in each direction. This then allows us to
      !! compute the innermost position any outer bound may have to still
      !! be the required distance from every next inner bound.
      ! For the lowest domains:
      lowest_upper_bound = comm_co_min_f(domain_old%bounds_unaligned(2,:))
      aligned_bnds%bounds_aligned(1,:) = min(lowest_upper_bound-min_size, aligned_bnds%bounds_aligned(1,:))
      ! For the highest domains:
      highest_lower_bound = comm_co_max_f(domain_old%bounds_unaligned(1,:))
      aligned_bnds%bounds_aligned(2,:) = max(highest_lower_bound+min_size, aligned_bnds%bounds_aligned(2,:))
      ! And now set the boundary domains' outer bounds to the new, fixed bounds
      where(domain_at_sim_bound)
         domain_old%bounds_unaligned = aligned_bnds%bounds_aligned
      end where
   end block

   !> Make sure that the old domain bounds are sensible. We are only
   !! updating them based on the previous value. This also means
   !! the first call must already contain a sensible approximation
   !! (the equidistant (geometric) distribution suffices for that).
   verts = transpose(domain_old%bounds_unaligned)
   call jall%set_vertices(verts)
   call jall%balance()
   call jall%get_result_vertices(verts)
   domain_bounds%bounds_aligned = transpose(verts)
   domain_old%bounds_unaligned = domain_bounds%bounds_aligned
   domain_bounds%bounds_aligned = nint(domain_bounds%bounds_aligned/dh)*dh

   if(output) then
      call ALL_reset_error()
      call jall%print_vtk_outlines(step)
      if(ALL_error() /= 0) then
         error_msg = ALL_error_description()
         print*, "Error in ALL detected:"
         print*, error_msg
      endif
   endif
   first_run = .false.
   step = step + 1
   call assert_domain_width(domain_bounds, dh)
end function

!> Estimate local work
function lb_estimate_work(n_part, domain_bounds_old) result(work)
   integer, intent(in) :: n_part                              !< number of particles of this domain
   type(boundingbox_aligned), intent(in) :: domain_bounds_old !< domain bounds
   real(real_kind) :: work
   real(real_kind), parameter :: beta = 0.128 ! empirically determined
   work = n_part + beta*product( domain_bounds_old%bounds_aligned(2,:)-domain_bounds_old%bounds_aligned(1,:) )/grid%dh**3
end function
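The work model in ``lb_estimate_work`` combines the particle count with a grid contribution proportional to the number of grid cells in the domain, weighted by the empirically determined factor beta = 0.128. The same formula as a small Python sketch (hypothetical names):

```python
def estimate_work(n_part, widths, dh, beta=0.128):
    """Work model from lb_estimate_work: particle count plus a grid-cell
    term weighted by the empirical factor beta.

    n_part: number of particles in the domain
    widths: domain edge lengths in each direction
    dh:     grid width
    """
    n_cells = 1.0
    for w in widths:
        n_cells *= w / dh  # cells per direction; product gives total cells
    return n_part + beta * n_cells

# 1000 particles in a 10x10x10-cell domain:
print(estimate_work(1000, (5.0, 5.0, 5.0), 0.5))  # 1000 + 0.128*1000 = 1128.0
```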
To include the library and its VTK dependency in the existing make build system, the following snippets were used. This builds a 'minimal' VTK and links ALL against it. When linking the main simulation code, VTK is linked using ``$(VTK_LIB)``, where the order of the libraries is very important. The recursive invocation of make in this Makefile is deprecated and should be replaced by appropriate calls to ``cmake --build`` and ``cmake --install``.
MJOBS ?= $(shell getconf _NPROCESSORS_ONLN)

JUELICH_ALL_INCLUDE := external/juelich_all_build/include/modules
JUELICH_ALL_LIB := external/juelich_all_build/lib/libALL_fortran.a external/juelich_all_build/lib/libALL.a
VTK_LIB := $(subst lib/,external/vtk_build/lib/, lib/libvtkFiltersProgrammable-7.1.a lib/libvtkIOParallelXML-7.1.a lib/libvtkIOXML-7.1.a lib/libvtkIOXMLParser-7.1.a lib/libvtkexpat-7.1.a lib/libvtkParallelMPI-7.1.a lib/libvtkParallelCore-7.1.a lib/libvtkIOLegacy-7.1.a lib/libvtkIOCore-7.1.a lib/libvtkCommonExecutionModel-7.1.a lib/libvtkCommonDataModel-7.1.a lib/libvtkCommonTransforms-7.1.a lib/libvtkCommonMisc-7.1.a lib/libvtkCommonMath-7.1.a lib/libvtkCommonSystem-7.1.a lib/libvtkCommonCore-7.1.a lib/libvtksys-7.1.a -ldl -lpthread lib/libvtkzlib-7.1.a)

# ...

# VTK
VTKCONFIG_FILE := external/vtk_build/lib/cmake/vtk-7.1/VTKConfig.cmake
$(VTKCONFIG_FILE):
	mkdir -p external/vtk_build
	cd external/vtk_build && CC=$(CC) CXX=$(CXX) cmake ../vtk -DCMAKE_INSTALL_PREFIX=`pwd` $(EXT_LTO_CMFLAGS) -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DVTK_Group_MPI=OFF -DVTK_Group_Rendering=OFF -DVTK_Group_StandAlone=OFF -DVTK_RENDERING_BACKEND=None -DVTK_USE_CXX11_FEATURES=ON -DModule_vtkCommonDataModel=ON -DModule_vtkFiltersProgrammable=ON -DModule_vtkIOParallelXML=ON -DModule_vtkParallelMPI=ON
	$(MAKE) -j $(MJOBS) -C external/vtk_build
	$(MAKE) -C external/vtk_build install

# juelich_all
$(JUELICH_ALL_LIB): $(VTKCONFIG_FILE)
	mkdir -p external/juelich_all_build
	cd external/juelich_all_build && CC=$(CC) CXX=$(CXX) cmake ../juelich_all $(EXT_LTO_CMFLAGS) -DCMAKE_Fortran_FLAGS="-Wall -Wextra -fbacktrace $(EXT_LTO)" -DCMAKE_INSTALL_PREFIX=`pwd` -DCM_ALL_FORTRAN=ON -DCM_ALL_USE_F08=$(ALL_USE_F08) -DCMAKE_BUILD_TYPE=Release -DCM_ALL_DEBUG=OFF -DCM_ALL_VTK_OUTPUT=ON -DVTK_DIR=`pwd`/../vtk_build/lib/cmake/vtk-7.1
	$(MAKE) -C external/juelich_all_build
	$(MAKE) -C external/juelich_all_build install