MPM Integration

The material point method (MPM) is used to simulate continuous media and is especially suited to simulations involving large deformations. Once large deformations are present, a dynamic load balancing solution is needed to simulate large systems efficiently: even if the initial work distribution is good, it usually degrades over the course of the simulation run. The load balancing library ALL provides an easy plug-and-play solution to this problem, and this module describes how the library is integrated. Thanks to the good load balancing provided by the library, larger systems can be simulated at lower computational cost.

Purpose of Module

This module shows how straightforward it is to include the load balancing library in an already existing code. Depending on the simulation code, additional precautions must be taken; those needed for the MPM simulation code are presented here, along with the prerequisites the simulation code must fulfil. These points help determine whether a simulation code is particularly suited for integrating ALL, or whether some further work is needed during integration.

This module also shows a real-world application of the Fortran interface provided with ALL (documented in ALL Fortran interface).

The MPM simulation code with ALL integrated is used by Stephan Schulz in his thesis.

Background Information

The load balancing library ALL is integrated into the material point method simulation code GMPM-PoC, which was written by Stephan Schulz during his thesis work. The simulation code will be released to the public in the future.

Certain aspects of the simulation code require additional treatment by the library, or additional library features. First, the open boundaries of the simulation code require continuous updates of the outer bounds of the boundary domains: the system extent is adapted to the particles' bounding box each time step. This also means that the geometry generated by the library in the last balance step cannot be reused directly. It is therefore saved by the simulation code, adapted to the new system extent, and then handed back to the library as the basis for the new geometry.
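
As an illustration of this update (a minimal sketch with hypothetical names, not the actual GMPM-PoC routine), the new system extent is simply the global bounding box of all particles, which can be obtained with two reductions:

    ! Minimal sketch: the system extent is the global particle bounding
    ! box, recomputed every time step because of the open boundaries.
    subroutine update_system_extent(xp, lower, upper)
        use mpi
        implicit none
        real(8), intent(in)  :: xp(:,:)      ! particle positions, shape (3, n)
        real(8), intent(out) :: lower(3), upper(3)
        real(8) :: lo_loc(3), up_loc(3)
        integer :: ierr
        lo_loc = minval(xp, dim=2)           ! local bounding box
        up_loc = maxval(xp, dim=2)
        ! global bounding box = new system extent
        call MPI_Allreduce(lo_loc, lower, 3, MPI_DOUBLE_PRECISION, MPI_MIN, MPI_COMM_WORLD, ierr)
        call MPI_Allreduce(up_loc, upper, 3, MPI_DOUBLE_PRECISION, MPI_MAX, MPI_COMM_WORLD, ierr)
    end subroutine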

The communication is based on grid halos and only accommodates nearest-neighbour communication, which makes the minimum domain size exactly the width of this halo. The library supports such a minimum size, and only the aforementioned outer domain bounds must be checked for compliance with it; the library automatically keeps all other domain boundaries sufficiently far apart.
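
Expressed in code, this is a single constraint handed to the library. In the sketch below, `n_halo` (the halo width in grid cells) is a hypothetical stand-in for what the actual code derives from `Rcont_min` and `Rcont_max`; `min_size`, `dh` and `jall` are as in the listing in the Source Code section:

    ! Sketch: no domain may become narrower than the grid halo itself.
    min_size(:) = n_halo * dh                 ! halo width in physical units
    call jall%set_min_domain_size(min_size)   ! enforced by ALL from here on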

Lastly, the particle communication at the end of each time step also only accounts for nearest neighbours. This means that a domain's boundary must not move so far that the domain would need to retrieve particles from a process that is not its nearest neighbour. Due to the way the library moves boundaries in the staggered grid and tensor approaches, this is already guaranteed: there is always an overlap between the old domain's volume and the new one's.
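
If one wanted to verify this invariant explicitly, a per-direction overlap test between the old and the new bounds would suffice (a sketch; the first and second rows hold the lower and upper bounds, as in the listing below):

    ! Sketch: the new domain must overlap the old one in every direction,
    ! otherwise nearest-neighbour particle exchange would not suffice.
    pure function domains_overlap(old_bounds, new_bounds) result(ok)
        real(8), intent(in) :: old_bounds(2,3), new_bounds(2,3)
        logical :: ok
        ok = all(new_bounds(1,:) < old_bounds(2,:)) .and. &
             all(new_bounds(2,:) > old_bounds(1,:))
    end function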

However, the library also places a few requirements on the simulation code. Because the domains change, particles must be able to migrate between processes, which every particle code with moving particles already implements. Depending on the load balancing method, this communication may be more involved. With the tensor method the topology does not change and every process keeps the same 26 neighbours for the entire simulation. If the staggered grid approach is used, however, the communication must also handle a changing number of neighbours and determine where they are and which regions they own. For example, it is common that one half of a boundary face must be communicated to one process and the other half to a different one, so the fixed relationship between boundaries and neighbours is broken up. The GMPM-PoC code was already designed with such a communication scheme in mind and provided the necessary flexibility to simply enable the staggered grid method after fixing a few communication bugs.
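
For illustration (a hypothetical helper, not part of GMPM-PoC): with the staggered grid method, the receivers of one domain face are all neighbours whose extents overlap that face in the two tangential directions, so a single face may be split among several processes:

    ! Sketch: does a neighbour share part of this face? The face of an
    ! x-boundary is described by its extent in the two tangential
    ! directions (y and z); it must be sent to every neighbour for
    ! which this test succeeds.
    pure function face_overlaps(my_lo, my_up, nb_lo, nb_up) result(ok)
        real(8), intent(in) :: my_lo(2), my_up(2)  ! own face extent (y,z)
        real(8), intent(in) :: nb_lo(2), nb_up(2)  ! neighbour domain extent (y,z)
        logical :: ok
        ok = all(nb_lo < my_up) .and. all(nb_up > my_lo)
    end function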

Building and Testing

To build the code, just run make LB=ALL and everything, including the dependencies, should be built automatically. Make sure the correct compilers are found in the PATH; if you want to use the Intel compilers, you need to set COMPILER=INTEL as well. The usual caveats, and the modules required on some HPC systems, are described in the main code's README.
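
For example, using only the flags described above:

    # default compilers found in the PATH
    make LB=ALL
    # with the Intel toolchain
    make LB=ALL COMPILER=INTEL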

Source Code

The main change is the replacement of the original domain decomposition function, which used to partition the system extent into equal parts. Now, ALL is called to update the domain geometry.

! The following additional functions were used: comm_co_min_f and
! comm_co_max_f (element-wise collective minimum/maximum across all
! processes), assert_domain_width, and lb_estimate_work (defined below).
!
! Additional information:
! - LB_METHOD_PRE is set by the preprocessor to ALL_STAGGERED or ALL_TENSOR.
! - The domain_bounds_old will only be used during initialisation for the
!   initial domain configuration.
! - ``this_image()`` returns the current image index, i.e. current MPI rank+1.
! - The work is estimated using ``lb_estimate_work`` which takes the current
!   domain size and number of particles as arguments.

    function domain_decomposition_jall(bounds, dh, num_images3, domain_bounds_old, work, output) result(domain_bounds)
        use ISO_C_BINDING
        type(boundingbox), intent(in) :: bounds !< simulation bounds
        real (kind = real_kind), intent(in) :: dh !< grid width
        integer, dimension(3), intent(in) :: num_images3 !< the 1 indexed number of images in 3D
        type(boundingbox_aligned), intent(in) :: domain_bounds_old !< current domain bounds
        real(real_kind), intent(in) :: work !< work of this domain
        logical, intent(in) :: output !< output domain bounds to `vtk_outline` directory
        type(boundingbox_aligned) :: domain_bounds
        type(boundingbox), save :: domain_old
        type(ALL_t), save :: jall ! ALL object which is initialized once
        real (kind = real_kind), dimension(3,2) :: verts
        integer, dimension(3), save :: this_image3 ! the 1 indexed image number in 3D
        logical, save :: first_run = .true.
        integer, save :: step = 1
        logical, dimension(2,3), save :: domain_at_sim_bound ! true if domain is at the lower/upper simulation boundary
        real (kind = real_kind), dimension(3), save :: min_size
        integer(c_int), parameter :: LB_METHOD = LB_METHOD_PRE
        character (len=ALL_ERROR_LENGTH) :: error_msg
        if(first_run) then
            ! calculate this_image3
            block
                integer :: x,y,z, cnt
                cnt = 1
                do z=1,num_images3(3)
                    do y=1,num_images3(2)
                        do x=1,num_images3(1)
                            if(this_image()==cnt) this_image3 = (/ x,y,z /)
                            cnt = cnt + 1
                        enddo
                    enddo
                enddo
            end block
            call jall%init(LB_METHOD,3,4.0d0)
            call jall%set_proc_grid_params(this_image3-1, num_images3)
            call jall%set_proc_tag(this_image())
            call jall%set_communicator(MPI_COMM_WORLD)
            min_size(:) = (abs(Rcont_min)+abs(Rcont_max))*dh
            call jall%set_min_domain_size(min_size)
            domain_old%bounds_unaligned = domain_bounds_old%bounds_aligned
            domain_at_sim_bound(1,:) = this_image3==1 ! first image in a direction is automatically at sim bound
            domain_at_sim_bound(2,:) = this_image3==num_images3 ! last image likewise at sim bound
            call jall%setup()
        endif
        call jall%set_work(real(work,real_kind))
        !! The `domain_old` bounds are not the actual domain bounds, which
        !! are aligned to grid widths, but what we got from the previous
        !! iteration of load balancing. However, the simulation boundaries are
        !! unchanged by the load balancing.
        block
            type(boundingbox_aligned) :: aligned_bnds
            real (kind = real_kind), dimension(3) :: lowest_upper_bound, highest_lower_bound
            !> Align the simulation boundaries to the grid and add one
            !! additional grid width at the top. These may be used instead
            !! of our current bounds, so they should align properly on the
            !! upper bound if we are at a simulation boundary. If the
            !! simulation bounds have not changed, they still coincide
            !! with the domain bounds.
            aligned_bnds%bounds_aligned = floor(bounds%bounds_unaligned/dh)*dh
            aligned_bnds%bounds_aligned(2,:) = aligned_bnds%bounds_aligned(2,:) + dh
            !> Make sure the shrinking domain is still always large enough
            !! and, in particular, is not shrunk into the neighbouring
            !! domain. This can happen if the bounding box is not present
            !! in the current domain, so the outer bound would be moved
            !! across the inner bound, which must be avoided at all cost.
            !! Additionally, we need to ensure the minimum domain width,
            !! and the outer bound of all boundary domains must be the
            !! same. To achieve this, the outermost inner bound is
            !! calculated in each direction, which allows us to compute
            !! the innermost position any outer bound may take while still
            !! keeping the required distance from every inner bound.
            ! For the lowest domains:
            lowest_upper_bound = comm_co_min_f(domain_old%bounds_unaligned(2,:))
            aligned_bnds%bounds_aligned(1,:) = min(lowest_upper_bound-min_size, aligned_bnds%bounds_aligned(1,:))
            ! For the highest domains:
            highest_lower_bound = comm_co_max_f(domain_old%bounds_unaligned(1,:))
            aligned_bnds%bounds_aligned(2,:) = max(highest_lower_bound+min_size, aligned_bnds%bounds_aligned(2,:))
            ! And now set the boundary domains outer bounds to the new, fixed bounds
            where(domain_at_sim_bound)
                domain_old%bounds_unaligned = aligned_bnds%bounds_aligned
            end where
        end block
        !> Make sure that the old domain bounds are sensible. We only
        !! update them based on the previous value, which means the first
        !! call must already receive a sensible approximation (the
        !! equidistant (geometric) distribution suffices for that).
        verts = transpose(domain_old%bounds_unaligned)
        call jall%set_vertices(verts)
        call jall%balance()
        call jall%get_result_vertices(verts)
        domain_bounds%bounds_aligned = transpose(verts)
        domain_old%bounds_unaligned = domain_bounds%bounds_aligned
        domain_bounds%bounds_aligned = nint(domain_bounds%bounds_aligned/dh)*dh
        if(output) then
            call ALL_reset_error()
            call jall%print_vtk_outlines(step)
            if(ALL_error() /= 0) then
                error_msg = ALL_error_description()
                print*, "Error in ALL detected:"
                print*, error_msg
            endif
        endif
        first_run = .false.
        step = step + 1
        call assert_domain_width(domain_bounds, dh)
    end function

    !> Estimate local work
    function lb_estimate_work(n_part, domain_bounds_old) result(work)
        integer, intent(in) :: n_part !< number of particles of this domain
        type(boundingbox_aligned), intent(in) :: domain_bounds_old !< domain bounds
        real(real_kind) :: work
        real(real_kind), parameter :: beta = 0.128_real_kind ! empirically determined
        work = n_part + beta*product( domain_bounds_old%bounds_aligned(2,:)-domain_bounds_old%bounds_aligned(1,:) )/grid%dh**3
    end function
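
A per-step call site might then look roughly like this (a sketch; the surrounding simulation state such as `bounds`, `n_part` and `num_images3` is assumed):

    ! Sketch of the per-step rebalancing call.
    work = lb_estimate_work(n_part, domain_bounds)
    domain_bounds = domain_decomposition_jall(bounds, grid%dh, num_images3, &
                                              domain_bounds, work, output=.false.)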

To include the library and its VTK dependency in the existing make build system, the following snippets were used. This builds a 'minimal' VTK and links ALL against it. When the main simulation code is linked, VTK is linked via $(VTK_LIB), where the order of the libraries is important. The recursive calls to make in this Makefile are deprecated and should be replaced by appropriate calls to cmake --build and cmake --install, as sketched after the snippet.

MJOBS ?= $(shell getconf _NPROCESSORS_ONLN)
JUELICH_ALL_INCLUDE := external/juelich_all_build/include/modules
JUELICH_ALL_LIB := external/juelich_all_build/lib/libALL_fortran.a external/juelich_all_build/lib/libALL.a
VTK_LIB := $(subst lib/,external/vtk_build/lib/, lib/libvtkFiltersProgrammable-7.1.a lib/libvtkIOParallelXML-7.1.a lib/libvtkIOXML-7.1.a lib/libvtkIOXMLParser-7.1.a lib/libvtkexpat-7.1.a lib/libvtkParallelMPI-7.1.a lib/libvtkParallelCore-7.1.a lib/libvtkIOLegacy-7.1.a lib/libvtkIOCore-7.1.a lib/libvtkCommonExecutionModel-7.1.a lib/libvtkCommonDataModel-7.1.a lib/libvtkCommonTransforms-7.1.a lib/libvtkCommonMisc-7.1.a lib/libvtkCommonMath-7.1.a lib/libvtkCommonSystem-7.1.a lib/libvtkCommonCore-7.1.a lib/libvtksys-7.1.a -ldl -lpthread lib/libvtkzlib-7.1.a)

# ...

# VTK
VTKCONFIG_FILE := external/vtk_build/lib/cmake/vtk-7.1/VTKConfig.cmake
$(VTKCONFIG_FILE):
	mkdir -p external/vtk_build
	cd external/vtk_build && CC=$(CC) CXX=$(CXX) cmake ../vtk -DCMAKE_INSTALL_PREFIX=`pwd` $(EXT_LTO_CMFLAGS) -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DVTK_Group_MPI=OFF -DVTK_Group_Rendering=OFF -DVTK_Group_StandAlone=OFF -DVTK_RENDERING_BACKEND=None -DVTK_USE_CXX11_FEATURES=ON -DModule_vtkCommonDataModel=ON -DModule_vtkFiltersProgrammable=ON -DModule_vtkIOParallelXML=ON -DModule_vtkParallelMPI=ON
	$(MAKE) -j $(MJOBS) -C external/vtk_build
	$(MAKE) -C external/vtk_build install

# juelich_all
$(JUELICH_ALL_LIB): $(VTKCONFIG_FILE)
	mkdir -p external/juelich_all_build
	cd external/juelich_all_build && CC=$(CC) CXX=$(CXX) cmake ../juelich_all $(EXT_LTO_CMFLAGS) -DCMAKE_Fortran_FLAGS="-Wall -Wextra -fbacktrace $(EXT_LTO)" -DCMAKE_INSTALL_PREFIX=`pwd` -DCM_ALL_FORTRAN=ON -DCM_ALL_USE_F08=$(ALL_USE_F08) -DCMAKE_BUILD_TYPE=Release -DCM_ALL_DEBUG=OFF -DCM_ALL_VTK_OUTPUT=ON -DVTK_DIR=`pwd`/../vtk_build/lib/cmake/vtk-7.1
	$(MAKE) -C external/juelich_all_build
	$(MAKE) -C external/juelich_all_build install
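
As noted above, the recursive $(MAKE) invocations are deprecated; inside the same rules they could be replaced by build-tool agnostic CMake calls, for example for the VTK rule (an untested sketch; cmake --install requires CMake 3.15 or newer):

	cmake --build external/vtk_build -j $(MJOBS)
	cmake --install external/vtk_build

The juelich_all rule would be changed analogously.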