Compile Application

With Docker

Dev Environment Images

Two images are provided for development purposes.

# vtk+mkl+cuda
docker pull tlcfem/suanpan-env-cuda
# vtk+mkl
docker pull tlcfem/suanpan-env

They can be used as development containers. VS Code and CLion can be configured to use these containers for development. There is no need to install any dependencies on the host machine.

The tlcfem/suanpan-env image also supports arm64, in which OpenBLAS is used as the linear algebra driver.

On AMD platforms, it is known that MKL may throttle thus yields a poor performance, it may be necessary to use a specific version of OpenBLAS or AMD Optimizing CPU Libraries (AOCL) instead.

Docker Images

It is possible to compile the project with Docker. Check the provided Dockerfiles for more information. One can build the image using the example Dockerfile as it is. For example,

# the current folder contains the file Rocky.Dockerfile
docker build -t suanpan -f ./Rocky.Dockerfile .

Once the image is built, run the container and use sp, suanpan or suanPan to invoke the program in the container. Maybe it is necessary to map some folders to the container.

docker run -it --rm suanpan
# now in the container
suanpan -v

Docker files provide a standard reproducible environment and a reference configuration. One can always introduce adaptions to cater various needs.

Without Docker

The following is a general guide that covers three main operating systems. It mainly targets the amd64 architecture.

Prerequisites

To configure the source code, CMake shall be available. Please install it before configuring the source code package.
The linear algebra driver used is OpenBLAS. You may want to compile it with the optimal configuration based on the specific machine. Otherwise, precompiled binaries (dynamic platform) are available in this repository.
It is strongly recommended installing Intel MKL for potentially better performance.
Please be aware that MKL is throttled on AMD platforms. Performance comparisons can be seen for example here. If you have AMD CPUs please collect more knowledge to determine which linear algebra library is more suitable.

Toolsets

A number of new features from new standards are utilized. To compile the binary, a compiler that supports C++20 is required.

GCC 11, Clang 13, MSVC 14.3, Intel compilers and later version of those compilers are tested with the source code.

On Windows, Visual Studio with Intel oneAPI toolkit is recommended. Alternatively, WinLibs can be used if GCC compilers are preferred.

On other platforms (Linux and macOS), simply use GCC which comes with a valid Fortran compiler. Clang can also be used for C/CPP code, but since Clang and GCC have different supports for C++ new standards, successful compilation is not guaranteed with Clang.

Obtain Source Code

Download the source code archive from GitHub Releases or the latest stable code.

Configure and Compile

The manual compilation is not difficult in general. The CI/CD configuration files can be referred to if you wish. Please check this page. Here some general guidelines are given.

Windows (Visual Studio)

This is highly tailored to my own machine. Thus, it is not recommended to use it directly. Instead, use VS Code with CMake extension to automatically configure the project.

A solution file is provided under MSVC/suanPan folder. There are two configurations:

Debug: Assume no available Fortran compiler, all Fortran related libraries are provided as precompiled DLLs. Use OpenBLAS for linear algebra. Multithreading disabled. Visualisation disabled. HDF5 support disabled.
Release: Fortran libraries are configured with Intel compilers. Use MKL for linear algebra. Multithreading enabled. Visualisation enabled with VTK version 9.4. HDF5 support enabled. CUDA enabled.

This repository contains some precompiled libraries used.

If VTK, Intel oneAPI Toolkit and CUDA are not installed, only the Debug configuration can be successfully compiled. Simply open the solution and switch to Debug configuration, ignore all potential warnings and build the solution.

To compile Release version, please

Make sure oneAPI both Base and HPC toolkits, as well as VS integration, are installed. The MKL is enabled via integrated option <UseInteloneMKL>Parallel</UseInteloneMKL>.
Make sure CUDA is installed. The environment variable $(CUDA_PATH) is used to locate headers.
Make sure VTK is available. Then define a system environment variable $(VTK_DIR), which points to the root folder of VTK library. On my machine, it is
```
VTK_DIR=C:\Program Files\VTK\
```
For versions other than 9.4, names of the linked libraries shall be manually changed as they contain version numbers. Thus, it is not a good idea to switch to a different version. Precompiled VTK library is also available in this repository.
Make sure MAGMA is available. Then define a system environment variable $(MAGMA_DIR), which points to the root folder of MAGMA library. On my machine, it is
```
MAGMA_DIR=C:\Program Files\MAGMA\
```
You probably need to compile MAGMA yourself. You can manually remove all magma related settings in the solution file if you don't want to use it.

Alternatively, CMake can be used to generate solution files if some external packages are not available.

Windows (Visual Studio Code)

Open the source code folder with VS Code. Whether you choose GCC or MSVC, the configuration is done by CMake automatically.

Ubuntu

The following instructions are based on Ubuntu 22.04.CMake is used to manage builds. It is recommended to use CMake GUI if appropriate.

Install necessary tools.

sudo apt-get install gcc g++ gfortran git cmake libomp5 libglvnd-dev -y

Clone the project.

git clone -b master --depth 1 https://github.com/TLCFEM/suanPan.git

Create build folder and configure via CMake. The default configuration disables parallelism -DSP_BUILD_PARALLEL=OFF and enables HDF5 via bundled library -DSP_ENABLE_HDF5=ON. Please check CMakeLists.txt file or use GUI for available options.
```
cd suanPan && mkdir build && cd build
cmake ../
```
Invoke make.
```
make -j"$(nproc)"
```

Check the following recording.

Install VTK

Ubuntu official repository does not (Fedora does!) contain the latest VTK library. It's better to compile it manually.

Install OpenGL first, as well as compilers if necessary.
```
sudo apt install gcc-10 g++-10 gfortran-10 libglvnd-dev
```

Obtain VTK source code and unpack.

wget https://www.vtk.org/files/release/9.1/VTK-9.1.0.tar.gz
tar -xf VTK-9.1.0.tar.gz

Create folder for building VTK.
```
mkdir VTK-build && cd VTK-build
```
Configure and compile VTK library. If necessary, installation destination can be modified. Here static libraries are built.
```
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=../VTK-out ../VTK-9.1.0
make install -j4
```
Now obtain suanPan source code and unpack it. To configure it with VTK support, users may use the following flag -DSP_ENABLE_VTK=ON. If FindVTK is presented and VTK is installed to default location, there is no need to provide the variable VTK_PATH, otherwise point it to the lib/cmake/vtk-9.1 folder.

Install MKL

The provided CMake configuration covers both oneMKL and Intel MKL 2020. Please note MKL is included in oneAPI toolkit starting from 2021, which has a different folder structure compared to Intel Parallel Studio.

The following guide is a manual installation is based on Ubuntu terminal using the official repository. See this page for details.

Add repository. To summarise,

wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
echo "deb https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

Install the package.

sudo apt update && sudo apt install intel-oneapi-mkl-devel -y

Now compile suanPan by enabling MKL via option -DSP_ENABLE_MKL=ON. The corresponding MKL_PATH shall be assigned, for example -DMKL_PATH=/opt/intel/oneapi/mkl/latest/, depending on the installation location. The configuration used for snap is the following one.

-DCMAKE_BUILD_TYPE=Release
-DCMAKE_INSTALL_PREFIX=
-DMKL_PATH=/opt/intel/oneapi/mkl/latest
-DSP_BUILD_PARALLEL=ON
-DSP_ENABLE_HDF5=ON
-DSP_ENABLE_IOMP=OFF
-DSP_ENABLE_MKL=ON
-DSP_ENABLE_SHARED_MKL=OFF
-DSP_ENABLE_VTK=ON
-DVTK_PATH=$CRAFT_PART_BUILD/lib/cmake/vtk-9.4/

Fedora

VTK

Fedora offers the latest VTK library, simply install it.

sudo dnf install vtk-devel

MKL

Intel also provides a repository to install MKL via dnf. See this page for details.

First, create the repo file.

tee > /tmp/oneAPI.repo << EOF
[oneAPI]
name=Intel® oneAPI repository
baseurl=https://yum.repos.intel.com/oneapi
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
EOF

Move it to the proper location.

sudo mv /tmp/oneAPI.repo /etc/yum.repos.d

Install MKL. You may perform a search sudo dnf search intel-oneapi-mkl-devel to find which package name is available and install the specific version if necessary.

sudo dnf install intel-oneapi-mkl-devel

The source can be compiled with VTK and MKL enabled.

macOS

The following guide is based on macOS Big Sur (11).

Install tools. gfortran, llvm and libomp are used for compiling the main program, glfw and glew are required for compiling VTK. VTK does not compile with GCC. Here, we use Clang.

# install brew if not installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# install necessary packages
brew install gcc@10 llvm@13 libomp glfw glew git cmake

Similar to Ubuntu, compile VTK if wanted.

wget https://www.vtk.org/files/release/9.1/VTK-9.1.0.tar.gz
tar xf VTK-9.1.0.tar.gz && rm VTK-9.1.0.tar.gz
mkdir VTK-build && cd VTK-build
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=../VTK-out ../VTK-9.1.0
make install -j4

MKL can be installed if necessary. See this page .

Obtain the source code and configure.

# clone source code
git clone -b master https://github.com/TLCFEM/suanPan.git
# create build directory
mkdir suanpan-build && cd suanpan-build
# use clang, clang++ and gfortran
export CC=/usr/local/opt/llvm/bin/clang && export CXX=/usr/local/opt/llvm/bin/clang++ && export FC=gfortran-10
# configure project
cmake -DCMAKE_BUILD_TYPE=Release -DSP_BUILD_PARALLEL=ON -DSP_ENABLE_HDF5=ON -DSP_ENABLE_VTK=ON -DVTK_PATH=../VTK-out/lib/cmake/vtk-9.1/ .
# compile
make -j4

Linear Algebra Driver

Any standard BLAS and LAPACK implementation can be used as the linear algebra driver which the application itself and Armadillo rely on. Currently, the following are supported and tested.

As a general guideline, when it comes to choose a proper implementation, the following points shall be considered.

From my very personal experience, OpenBLAS is okay for small matrices on a few cores, but could be slow when the size hits some threshold. This is also seen in this benchmark.
Intel oneAPI MKL is exclusively optimised for Intel CPUs, and may throttle on other platforms. This benchmark show clear differences on AMD platforms.
AMD Optimizing CPU Libraries (AOCL) is optimised for AMD CPUs based on BLIS and FLAME libraries, it performance on both platforms (and others) is superb.

Thus, use Intel oneAPI MKL if it is preferred or an Intel platform is targeted. The downside is that Intel oneAPI MKL is proprietary and the final binary may have a large size. For other cases, use AMD Optimizing CPU Libraries (AOCL) when possible. The downside it that it may need manual compilation of the libraries for the target OS.OpenBLAS shall be deemed as the last resort and the usage is discouraged as of writing.

Build Options

If CMake GUI is used to configure the project, the following options are available.

BUILD_SHARED_LIBS: If enabled, all libraries will be built as shared libraries.
SP_BUILD_DLL_EXAMPLE: If enabled, example element/material/section implemented as external libraries will be built. This is not required for the successful build of the main application.
SP_BUILD_PARALLEL: If enabled, TBB will be used for multithreading so that element update, global matrix assembly, etc., can be parallelized. OpenMP is not controlled by this option given that OpenMP support is available in major platforms. It will be used for low level parallelization such as linear algebra operations (which is controlled by Armadillo), matrix solving (which is controlled by various solvers). Thus, this flag only controls the suanPan application itself.
SP_ENABLE_HDF5: If enabled, HDF5 will be used to provide support for hdf5recorder.
SP_ENABLE_VTK: If enabled, VTK will be used to provide support for visualization. It will be useful to generate .vtk files that can be used in Paraview for post-processing. If enabled, VTK_PATH needs to be set to the path of VTK installation. For example, VTK_PATH=/usr/local/opt/vtk/lib/cmake/vtk-9.1.
SP_ENABLE_CUDA: CUDA needs to be installed manually by the user. If enabled, CUDA based solvers will be available. However, for dense matrix storage, only full matrix storage scheme is supported by CUDA. Note full matrix storage scheme is not favourable for FEM. It can, however, be used for sparse matrix solving and mixed precision solving.
SP_ENABLE_MAGMA: MAGMA needs to be installed manually by the user. If enabled, MAGMA based solvers will be available. The variable MAGMA_PATH needs to be set to find necessary files.
SP_ENABLE_ASAN: If enabled, address sanitizer will be enabled. This is useful for debugging purposes.
SP_ENABLE_CODECOV: If enabled, compile options will be enabled to support code coverage report.
SP_ENABLE_AVX: If enabled, compiler flags -mavx or /arch:AVX will be used. (~2011)
SP_ENABLE_AVX2: If enabled, compiler flags -mavx2 or /arch:AVX2 will be used. (~2013)
SP_ENABLE_AVX512: If enabled, compiler flags -mavx512f or /arch:AVX512 will be used. (~2016)
SP_ENABLE_TBB_ALLOC: If enabled, the TBB's memory allocator will be used.
SP_OPENBLAS_PATH: If assigned, link the designated OpenBLAS library, otherwise the bundled version will be used.
SP_ENABLE_AOCL: If enabled, one can use the AOCL implementation via assigning library path SP_AOCL_PATH.
SP_ENABLE_MKL: MKL needs to be installed manually by the user. If enabled, MKL will be used for linear algebra operations. If SP_ENABLE_MKL is enabled, set MKL_PATH to the root directory of MKL installation. For example, C:/Program Files (x86)/Intel/oneAPI/mkl/latest or /opt/intel/oneapi/mkl/latest, also the following additional options are available.
SP_ENABLE_SHARED_MKL: If enabled, dynamically linked MKL libraries will be used. Otherwise, statically linked MKL libraries will be used, leading to larger binary size but faster execution and fewer dependencies.
SP_ENABLE_IOMP: If enabled, Intel's OpenMP implementation will be used. Otherwise, Default ones (such as GNU OpenMP library) will be used.
SP_ENABLE_MPI: Enabled cluster support via MPI.
SP_ENABLE_64BIT_INDEXING: Enable 64-bit integer for matrix indexing.

!!! Warning The SP_ENABLE_IOMP should be switched on/off based on the compilers used. In principle, there should be only one implementation of OpenMP. Thus, if the compilers used are gcc/g++/gfortran, SP_ENABLE_IOMP=OFF. If the compilers are icx/icpx/ifx, SP_ENABLE_IOMP=ON. The Spike solver also does not support mixing OpenMP implementations. Use with caution if two OpenMP implementations exist.

Example Configuration

The following command is used to compile the program to be distributed via snap. See this file.

# assume current folder is suanPan/build
# the parent folder contains source code
cmake -DCMAKE_INSTALL_PREFIX= \
      -DCMAKE_BUILD_TYPE=Release \
      -DMKL_PATH=/opt/intel/oneapi/mkl/latest \
      -DSP_BUILD_PARALLEL=ON \
      -DSP_ENABLE_HDF5=ON \
      -DSP_ENABLE_IOMP=OFF \
      -DSP_ENABLE_MKL=ON \
      -DSP_ENABLE_SHARED_MKL=OFF \
      -DSP_ENABLE_VTK=ON \
      -DVTK_PATH=$CRAFT_PART_BUILD/lib/cmake/vtk-9.4/

`aarch64` Architecture

The aarch64 architecture is supported by the source code. But one shall prepare the dependencies manually. As MKL is not available for aarch64, one shall use OpenBLAS or AOCL only.

OpenBLAS is the only necessary dependency. All other dependencies are optional.

To use AOCL, one shall manually compile the blis and flame libraries in advance.

64-bit Indexing

The FEM framework part itself uses unsigned 32-bit integer for indexing, this means it supports up to 4 billion objects, for example, nodes and elements. This is sufficient at least for the years coming. But the linear algebra driver may quickly hit the size limit. For lp64 builds, the indexing capability of a default Fortran integer is around 2 billion. If the problem is large enough, the global matrix may have more than 2 billion entries (either dense or sparse storage). In this case, it is necessary to using 64-bit indexing, see ilp64 model.

To do so, compile the application with the flag -DSP_ENABLE_64BIT_INDEXING=ON.

!!! note The bundled OpenBLAS binaries are built for lp64 only. If one decides to use OpenBLAS, a ilp64 version must be provided via SP_OPENBLAS_PATH.

!!! note If SP_ENABLE_MPI=ON is also enabled, one shall make sure the linked MPI is also compiled with 64-bit integer support.MPICH is known to work well with either 32-bit or 64-bit integer size.

MKL provides both versions for lp64 and ilp64, CMake will handle the linkage automatically.

PreviousTweak Performance NextBuild Documentation

Last updated 1 month ago