MPM Integration¶
The material point method (MPM) is used to simulate continuous matter and is especially suited to simulating large deformations. Once large deformations are present, dynamic load balancing is needed to simulate large systems efficiently: even if the initial work distribution is good, it very often degrades during the simulation run itself. The load balancing library ALL provides an easy plug-and-play solution to this problem, and this module describes how the library is integrated. Thanks to the good load balancing provided by the library, larger systems can be simulated at lower computational cost.
Purpose of Module¶
This module shows how straightforward it is to include the load balancing library in already existing code. Depending on the simulation code, additional precautions must be taken; those needed for the MPM simulation code are presented here. The prerequisites for the simulation code are also shown. Looking at these will help determine whether a simulation code is particularly suited to integrating ALL, or whether some further work is needed during integration.
This module also shows a real world application of the Fortran interface provided with ALL (documented in ALL Fortran interface).
The MPM simulation code with integrated ALL is used by Stephan Schulz in his thesis.
Background Information¶
The load balancing library ALL is integrated into the material point method simulation code GMPM-PoC, which was written by Stephan Schulz during his thesis work. The simulation code will be released to the public in the future.
Certain aspects of the simulation code require additional treatment of the library, or additional features of the library. First, the open boundaries of the simulation code require continuous updates of the outer bounds of the boundary domains: the system extent is adapted to the particles' bounding box each time step. This also means that the geometry generated by the library in the last balance step cannot be used directly. It is therefore saved by the simulation code, adapted to the new system extent, and then passed to the library as the basis for the new geometry.
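The outer-bound update described above can be sketched as follows. This is a minimal Python illustration of the idea, not the GMPM-PoC Fortran code; the function name and data layout are hypothetical:

```python
def adapt_outer_bounds(saved_bounds, new_extent, at_lower, at_upper):
    """Clamp the outer faces of boundary domains to the new system extent.

    saved_bounds: ([lo_x, lo_y, lo_z], [hi_x, hi_y, hi_z]) saved from the
                  last balance step
    new_extent:   bounding box of all particles this time step, same layout
    at_lower / at_upper: per-direction flags, True where this domain touches
                  the lower / upper simulation boundary
    """
    # Faces on the simulation boundary follow the moving system extent;
    # interior faces keep the geometry from the last balance step.
    lower = [new if flag else old
             for old, new, flag in zip(saved_bounds[0], new_extent[0], at_lower)]
    upper = [new if flag else old
             for old, new, flag in zip(saved_bounds[1], new_extent[1], at_upper)]
    return lower, upper

# The uppermost domain in z follows the shrinking system extent (9.0 instead
# of the saved 10.0), while its interior lower z bound stays at 6.0.
saved = ([0.0, 0.0, 6.0], [4.0, 4.0, 10.0])
extent = ([0.5, 0.0, 0.0], [4.0, 4.0, 9.0])
print(adapt_outer_bounds(saved, extent,
                         at_lower=[True, True, False],
                         at_upper=[True, True, True]))
```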
The communication is based on grid halos and only accommodates nearest-neighbor communication. This makes the minimum domain size exactly the width of this halo. The library supports such a minimum size, and only the aforementioned outer domain bounds must be checked for compliance with it; the other domain boundaries are automatically kept sufficiently large by the library.
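This minimum width is simply the halo extent in physical units. A Python sketch of the same expression used for ``min_size`` in the Fortran listing below (the contact radii and grid width here are made-up example values):

```python
def halo_min_width(r_cont_min, r_cont_max, dh):
    """Minimum admissible domain width for halo-based nearest-neighbour
    communication: the halo spans (|Rcont_min| + |Rcont_max|) grid cells
    of width dh, and a domain must not be narrower than that."""
    return (abs(r_cont_min) + abs(r_cont_max)) * dh

# Contact radii of 2 cells on each side, grid width dh = 0.5:
print(halo_min_width(-2, 2, 0.5))  # → 2.0
```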
Lastly, the particle communication at the end of each time step also only accounts for nearest-neighbor communication. This means that a domain's boundary must not change so much that it needs to retrieve particles from a process that is not its nearest neighbor. Due to the way the library moves boundaries in the staggered grid and tensor approaches, this is already guaranteed: there is always an overlap between the old domain's volume and the new one's.
However, the library also has a few requirements of the simulation code. Because the domains change, particles must be able to be communicated across processes, which is already implemented in any particle code with moving particles. Depending on the load balancing method, this communication may be more involved. With the tensor method the topology does not change, and every process keeps the same 26 neighbors during the entire simulation. If, however, the staggered grid approach is used, the communication must also handle a changing number of neighbors and determine where they are and which regions they belong to. For example, it is common that one half of a boundary face must be communicated to one process and the other half to a different one, so the fixed relationship between boundaries and neighbors is broken up. The GMPM-PoC code was already designed with such a communication scheme in mind and provided the necessary flexibility to simply enable the staggered grid method after fixing a few communication bugs.
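The staggered-grid neighbor determination described above amounts to an interval-overlap test per boundary face. A minimal Python sketch of that idea (a hypothetical helper, not the GMPM-PoC implementation):

```python
def overlapping_neighbours(my_span, neighbour_spans):
    """For one face of a staggered-grid domain, find which candidate
    neighbour domains share part of that face, and which sub-interval of
    the face each of them receives.

    my_span:         (lo, hi) extent of this domain's face along one direction
    neighbour_spans: dict mapping neighbour name -> (lo, hi) along the same
                     direction
    Returns a dict mapping each true neighbour to its overlap interval.
    """
    lo, hi = my_span
    result = {}
    for name, (nlo, nhi) in neighbour_spans.items():
        olo, ohi = max(lo, nlo), min(hi, nhi)
        if olo < ohi:  # non-empty overlap: this process is a neighbour
            result[name] = (olo, ohi)
    return result

# One half of the face goes to rank "A", the other half to rank "B";
# rank "C" does not touch this face at all.
print(overlapping_neighbours((0.0, 4.0),
                             {"A": (0.0, 2.0), "B": (2.0, 6.0), "C": (6.0, 8.0)}))
```

In a 3D staggered grid the same test is applied in each in-plane direction of the face, which is why the number and identity of neighbors can change after every balance step.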
Building and Testing¶
To build the code, just run ``make LB=ALL`` and everything should be built automatically, including dependencies. Make sure the correct compilers are found in the path; if you want to use the Intel compilers, set ``COMPILER=INTEL`` as well. The usual caveats and required modules for some HPC systems are described in the main code's README.
Source Code¶
The main change is the replacement of the original domain decomposition function, which simply partitioned the system extent into equal parts. Now, ALL is called to update the domain geometry.
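For contrast, the replaced equi-partitioning can be sketched as follows. This is an illustrative Python version of the idea along one direction, not the original Fortran function:

```python
def equal_partition(extent, n_domains, index):
    """Original-style static decomposition: split the system extent
    (lo, hi) into n_domains equal slabs along one direction and return
    the bounds of slab `index` (0-based)."""
    lo, hi = extent
    width = (hi - lo) / n_domains
    return (lo + index * width, lo + (index + 1) * width)

# Four domains over [0, 12): domain 1 gets [3, 6).
print(equal_partition((0.0, 12.0), 4, 1))  # → (3.0, 6.0)
```

Such a static split is only balanced while the particles stay evenly distributed, which is exactly what large deformations destroy.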
! The following additional functions were used:
!
! Additional information:
! - LB_METHOD_PRE is set by the preprocessor to ALL_STAGGERED or ALL_TENSOR.
! - The domain_bounds_old will only be used during initialisation for the
! initial domain configuration.
! - ``this_image()`` returns the current image index, i.e. current MPI rank+1.
! - The work is estimated using ``lb_estimate_work`` which takes the current
! domain size and number of particles as arguments.
function domain_decomposition_jall(bounds, dh, num_images3, domain_bounds_old, work, output) result(domain_bounds)
use ISO_C_BINDING
type(boundingbox), intent(in) :: bounds !< simulation bounds
real (kind = real_kind), intent(in) :: dh !< grid width
integer, dimension(3), intent(in) :: num_images3 !< the 1 indexed number of images in 3D
type(boundingbox_aligned), intent(in) :: domain_bounds_old !< current domain bounds
real(real_kind), intent(in) :: work !< work of this domain
logical, intent(in) :: output !< output domain bounds to `vtk_outline` directory
type(boundingbox_aligned) :: domain_bounds
type(boundingbox), save :: domain_old
type(ALL_t), save :: jall ! ALL object which is initialized once
real (kind = real_kind), dimension(3,2) :: verts
integer, dimension(3), save :: this_image3 ! the 1 indexed image number in 3D
logical, save :: first_run = .true.
integer, save :: step = 1
logical, dimension(2,3), save :: domain_at_sim_bound ! true if domain is at the lower/upper simulation boundary
real (kind = real_kind), dimension(3), save :: min_size
integer(c_int), parameter :: LB_METHOD = LB_METHOD_PRE
character (len=ALL_ERROR_LENGTH) :: error_msg
if(first_run) then
! calculate this_image3
block
integer :: x,y,z, cnt
cnt = 1
do z=1,num_images3(3)
do y=1,num_images3(2)
do x=1,num_images3(1)
if(this_image()==cnt) this_image3 = (/ x,y,z /)
cnt = cnt + 1
enddo
enddo
enddo
end block
call jall%init(LB_METHOD,3,4.0d0)
call jall%set_proc_grid_params(this_image3-1, num_images3)
call jall%set_proc_tag(this_image())
call jall%set_communicator(MPI_COMM_WORLD)
min_size(:) = (abs(Rcont_min)+abs(Rcont_max))*dh
call jall%set_min_domain_size(min_size)
domain_old%bounds_unaligned = domain_bounds_old%bounds_aligned
domain_at_sim_bound(1,:) = this_image3==1 ! first image in a direction is automatically at sim bound
domain_at_sim_bound(2,:) = this_image3==num_images3 ! last image likewise at sim bound
call jall%setup()
endif
call jall%set_work(real(work,real_kind))
!! The `domain_old` bounds are not the actual domain bounds, which
!! are aligned to grid widths, but what we got from the previous
!! iteration of load balancing. However, the simulation boundaries are
!! unchanged by the load balancing.
block
type(boundingbox_aligned) :: aligned_bnds
real (kind = real_kind), dimension(3) :: lowest_upper_bound, highest_lower_bound
!> Align the simulation boundaries to the grid and add an additional
!! grid width on the top. These may be used instead of our current
!! bounds, so they should align properly on the upper bound, if we
!! are a simulation boundary. If the simulation bounds have not
!! changed they should still coincide with the domain bounds.
aligned_bnds%bounds_aligned = floor(bounds%bounds_unaligned/dh)*dh
aligned_bnds%bounds_aligned(2,:) = aligned_bnds%bounds_aligned(2,:) + dh
!> To make sure the shrinking domain is still always large enough
!! and in particular is not shrunk into the neighbouring domain.
!! This can happen if the bounding box is not present in the current
!! domain, so the outer bound is moved across the inner bound. This
!! must be avoided at all cost. Additionally, we also need to ensure
!! the minimum domain width. Also, the outer bound of all boundary
!! domains must be the same. To achieve this, the outermost inner
!! bound is calculated in each direction. This then allows us to
!! compute the innermost position any outer bound may have to still
!! be the required distance from every next inner bound.
! For the lowest domains:
lowest_upper_bound = comm_co_min_f(domain_old%bounds_unaligned(2,:))
aligned_bnds%bounds_aligned(1,:) = min(lowest_upper_bound-min_size, aligned_bnds%bounds_aligned(1,:))
! For the highest domains:
highest_lower_bound = comm_co_max_f(domain_old%bounds_unaligned(1,:))
aligned_bnds%bounds_aligned(2,:) = max(highest_lower_bound+min_size, aligned_bnds%bounds_aligned(2,:))
! And now set the boundary domains outer bounds to the new, fixed bounds
where(domain_at_sim_bound)
domain_old%bounds_unaligned = aligned_bnds%bounds_aligned
end where
end block
!> Make sure that the old domain bounds are sensible. We are only
!! updating them based on the previous value. This also means
!! the first call must already contain a sensible approximation
!! (the equidistant (geometric) distribution suffices for that).
verts = transpose(domain_old%bounds_unaligned)
call jall%set_vertices(verts)
call jall%balance()
call jall%get_result_vertices(verts)
domain_bounds%bounds_aligned = transpose(verts)
domain_old%bounds_unaligned = domain_bounds%bounds_aligned
domain_bounds%bounds_aligned = nint(domain_bounds%bounds_aligned/dh)*dh
if(output) then
call ALL_reset_error()
call jall%print_vtk_outlines(step)
if(ALL_error() /= 0) then
error_msg = ALL_error_description()
print*, "Error in ALL detected:"
print*, error_msg
endif
endif
first_run = .false.
step = step + 1
call assert_domain_width(domain_bounds, dh)
end function
!> Estimate local work
function lb_estimate_work(n_part, domain_bounds_old) result(work)
integer, intent(in) :: n_part !< number of particles of this domain
type(boundingbox_aligned), intent(in) :: domain_bounds_old !< domain bounds
real(real_kind) :: work
real(real_kind), parameter :: beta = 0.128 ! empirically determined
work = n_part + beta*product( domain_bounds_old%bounds_aligned(2,:)-domain_bounds_old%bounds_aligned(1,:) )/grid%dh**3
end function
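The work model in ``lb_estimate_work`` above adds a grid-volume term to the particle count. A quick Python check of the same formula (the particle count, domain bounds and grid width used here are made-up example values; only β = 0.128 comes from the listing):

```python
def estimate_work(n_part, bounds, dh, beta=0.128):
    """work = n_part + beta * (domain volume) / dh**3, mirroring the
    formula in lb_estimate_work: particles plus a grid-cell-count term."""
    volume = 1.0
    for lo, hi in zip(bounds[0], bounds[1]):
        volume *= hi - lo
    return n_part + beta * volume / dh**3

# 1000 particles in a 2x2x2 domain with dh = 0.5:
# grid term = 0.128 * 8.0 / 0.125 = 0.128 * 64 grid cells = 8.192
print(estimate_work(1000, ([0.0, 0.0, 0.0], [2.0, 2.0, 2.0]), 0.5))
```

The grid term accounts for per-cell work (e.g. grid updates) that is paid even in sparsely populated domains, so empty volume still contributes some load.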
To include the library and its VTK dependency in the existing make build system, the following snippets were used. This builds a 'minimal' VTK and links ALL against it. When linking the main simulation code, VTK is linked using ``$(VTK_LIB)``, where the order of the libraries is very important. Invoking make from within this Makefile is deprecated and should be replaced by appropriate calls to ``cmake --build`` and ``cmake --install``.
MJOBS ?= $(shell getconf _NPROCESSORS_ONLN)
JUELICH_ALL_INCLUDE := external/juelich_all_build/include/modules
JUELICH_ALL_LIB := external/juelich_all_build/lib/libALL_fortran.a external/juelich_all_build/lib/libALL.a
VTK_LIB := $(subst lib/,external/vtk_build/lib/, lib/libvtkFiltersProgrammable-7.1.a lib/libvtkIOParallelXML-7.1.a lib/libvtkIOXML-7.1.a lib/libvtkIOXMLParser-7.1.a lib/libvtkexpat-7.1.a lib/libvtkParallelMPI-7.1.a lib/libvtkParallelCore-7.1.a lib/libvtkIOLegacy-7.1.a lib/libvtkIOCore-7.1.a lib/libvtkCommonExecutionModel-7.1.a lib/libvtkCommonDataModel-7.1.a lib/libvtkCommonTransforms-7.1.a lib/libvtkCommonMisc-7.1.a lib/libvtkCommonMath-7.1.a lib/libvtkCommonSystem-7.1.a lib/libvtkCommonCore-7.1.a lib/libvtksys-7.1.a -ldl -lpthread lib/libvtkzlib-7.1.a)
# ...
# VTK
VTKCONFIG_FILE := external/vtk_build/lib/cmake/vtk-7.1/VTKConfig.cmake
$(VTKCONFIG_FILE):
mkdir -p external/vtk_build
cd external/vtk_build && CC=$(CC) CXX=$(CXX) cmake ../vtk -DCMAKE_INSTALL_PREFIX=`pwd` $(EXT_LTO_CMFLAGS) -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DVTK_Group_MPI=OFF -DVTK_Group_Rendering=OFF -DVTK_Group_StandAlone=OFF -DVTK_RENDERING_BACKEND=None -DVTK_USE_CXX11_FEATURES=ON -DModule_vtkCommonDataModel=ON -DModule_vtkFiltersProgrammable=ON -DModule_vtkIOParallelXML=ON -DModule_vtkParallelMPI=ON
$(MAKE) -j $(MJOBS) -C external/vtk_build
$(MAKE) -C external/vtk_build install
# juelich_all
$(JUELICH_ALL_LIB): $(VTKCONFIG_FILE)
mkdir -p external/juelich_all_build
cd external/juelich_all_build && CC=$(CC) CXX=$(CXX) cmake ../juelich_all $(EXT_LTO_CMFLAGS) -DCMAKE_Fortran_FLAGS="-Wall -Wextra -fbacktrace $(EXT_LTO)" -DCMAKE_INSTALL_PREFIX=`pwd` -DCM_ALL_FORTRAN=ON -DCM_ALL_USE_F08=$(ALL_USE_F08) -DCMAKE_BUILD_TYPE=Release -DCM_ALL_DEBUG=OFF -DCM_ALL_VTK_OUTPUT=ON -DVTK_DIR=`pwd`/../vtk_build/lib/cmake/vtk-7.1
$(MAKE) -C external/juelich_all_build
$(MAKE) -C external/juelich_all_build install