EESSI and vGPU support in Magic Castle

In the module MPI support for EESSI-based containers, we introduced the European Environment for Scientific Software Installations (EESSI), which provides a shared stack of scientific software installations. That software stack is intended to work on laptops, personal workstations, HPC clusters and in the cloud. The initiative builds on previous efforts by Compute Canada to develop a pan-Canadian software infrastructure.

Another interesting project to come from Compute Canada, which leverages that software infrastructure, is Magic Castle. Magic Castle aims to recreate the Compute Canada user experience in public clouds. It uses the open-source tool Terraform and the HashiCorp Configuration Language (HCL) to define the virtual machines, volumes, and networks required to replicate a virtual HPC infrastructure. After deployment, the user is provided with a complete HPC cluster software environment including a Slurm scheduler, a Globus Endpoint, JupyterHub, LDAP, DNS, and over 3000 research software applications compiled by experts with EasyBuild.

Magic Castle is compatible with AWS, Microsoft Azure, Google Cloud, OpenStack, and OVH.

Purpose of Module

This module describes the inclusion and support of the EESSI software stack in Magic Castle. It also covers the generalisation of virtual GPU (vGPU) support within Magic Castle so that it handles the vGPUs found in the Fenix Research Infrastructure.

Background Information

EU-wide requirements for HPC training are growing rapidly as the adoption of HPC in the wider scientific community gathers pace. However, the number of topics that can be thoroughly addressed without providing access to actual HPC resources is very limited, even at the introductory level. Where such access is available, security concerns and the overhead of provisioning accounts make the scalability of this approach questionable.

EU-wide access to HPC resources on the scale required to meet the training needs of all countries is an objective that we attempt to address with LearnHPC. The proposed solution leverages Magic Castle to provision virtual HPC systems in a public cloud. This infrastructure will allow us to dynamically create temporary event-specific HPC clusters for training purposes, including a scientific software stack from EESSI.

Building and Testing

Since EESSI is now integrated into Magic Castle, one can simply follow the standard Magic Castle setup instructions and enable the option for the EESSI software stack in the infrastructure configuration file.

If you use vGPU-enabled instances for execution nodes in your virtual cluster, the vGPUs will be automatically configured and included as available resources in the Slurm environment.
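
One way to check that the vGPUs are visible to the scheduler is to inspect the generic resources (GRES) that Slurm reports for each node. The following is a minimal sketch, not part of Magic Castle itself: it assumes the GPUs are exposed under the GRES name "gpu" and that sinfo is run on the login node with the format string "%N|%G". The function names parse_gpu_gres and query_sinfo are hypothetical helpers introduced here for illustration.

```python
import re
import subprocess


def parse_gpu_gres(sinfo_output: str) -> dict[str, int]:
    """Map each node name to the number of GPUs listed in its GRES field.

    Expects the text produced by: sinfo -h -N -o "%N|%G"
    (one "node|gres" pair per line; nodes without GRES show "(null)").
    """
    gpus = {}
    for line in sinfo_output.splitlines():
        line = line.strip()
        if "|" not in line:
            continue  # skip blank or unexpected lines
        node, gres = line.split("|", 1)
        # Drop parenthesised suffixes such as "(S:0-1)" and "(null)".
        gres = re.sub(r"\([^)]*\)", "", gres)
        count = 0
        for entry in gres.split(","):
            fields = entry.split(":")
            # Assumption: GPUs are registered under the GRES name "gpu",
            # e.g. "gpu:1" or "gpu:<type>:2".
            if fields[0] == "gpu":
                count += int(fields[-1]) if fields[-1].isdigit() else 1
        gpus[node] = count
    return gpus


def query_sinfo() -> dict[str, int]:
    """Run sinfo on the cluster and return the per-node GPU counts."""
    result = subprocess.run(
        ["sinfo", "-h", "-N", "-o", "%N|%G"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_gres(result.stdout)
```

On a deployed cluster, calling query_sinfo() from the login node should return a non-zero count for every vGPU-enabled node. A count of zero for such a node suggests that the GRES configuration did not pick up the device.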