set
The `set` command is used to configure various properties of the analysis.
Linear Elastic Simplification
In a general nonlinear context, each substep requires at least two system solves: one for the initial guess and one for the convergence test. There is no general way to detect a linear system automatically, for which the initial guess is known to lead to convergence so that further system solves are unnecessary. To speed up the simulation, one can use the following command to indicate that the system is linear.
With this enabled, only one system solve is performed in each substep.
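A minimal sketch, assuming the flag is named `linear_system` (check the command reference of your version for the exact keyword):

```
set linear_system true
```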
Substepping Control
To use a fixed substep size, users can define
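For example, assuming the option is named `fixed_step_size`:

```
set fixed_step_size true
```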
Otherwise, the algorithm automatically substeps the current step if convergence is not achieved. The time step control strategy cannot be customised.
To define the initial step size, users can use
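For example, assuming the option is named `ini_step_size`, to start each step with a substep size of 0.01:

```
set ini_step_size 0.01
```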
To define the maximum/minimum step size, users can use
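For example, assuming the options are named `max_step_size` and `min_step_size`:

```
set max_step_size 0.1
set min_step_size 1E-6
```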
Solving Related Settings
Matrix Storage Scheme
Asymmetric Banded
Symmetric Banded
Full Storage
If the problem scale is small, using a full storage scheme does no harm. For some particular problems, such as particle collision, the full storage scheme is the only option.
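For example, full storage is selected by disabling both the symmetric and the banded flags, which also appear in the summary table:

```
set symm_mat false
set band_mat false
```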
Full Packed Storage
If the matrix is symmetric, a so-called packed format can be used to store it. Essentially, only the upper or the lower triangle of the matrix is stored. The storage cost is half that of full storage, but the solving speed is no better. The `_spsv()` subroutine is used for matrix solving. Using this packed scheme is not recommended.
Sparse Storage
Sparse matrices are also supported. Several sparse solvers are implemented.
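Sparse storage is selected via the `sparse_mat` flag, as listed in the summary table:

```
set sparse_mat true
```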
Direct System Solver
Different solvers are implemented for different storage schemes. It is possible to switch from one to another by using the following command. Details are covered in the summary table.
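For example, assuming the option is named `system_solver`, to choose the SPIKE solver for banded storage:

```
set system_solver SPIKE
```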
Mixed Precision Algorithm
The following command can be used to control whether to use mixed precision refinement. This command has no effect if the target matrix storage scheme has no mixed precision implementation.
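A hypothetical sketch; the exact keyword may differ across versions, here assumed to be `mixed_precision`:

```
set mixed_precision true
```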
Iterative Refinement
The mixed precision algorithm requires iterative refinement. The maximum number of refinement iterations, which cannot exceed 256, can be bounded by the following command.
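For example, assuming the option is named `iterative_refinement`, to allow at most 16 refinement iterations:

```
set iterative_refinement 16
```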
Iterative Refinement Tolerance
If the mixed precision algorithm is used, it is possible to use the following command to control the tolerance.
Thus, the following command set makes sense.
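A sketch of such a command set, assuming the option names `mixed_precision`, `iterative_refinement` and `refinement_tolerance` (check the command reference for the exact keywords):

```
set mixed_precision true
set iterative_refinement 16
set refinement_tolerance 1E-10
```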
Iterative System Solver
Summary
For Single Node Machine
With `SP_ENABLE_MPI` disabled, all available settings are summarised in the following table.
| Storage Scheme | Settings | Solver | Mixed Precision | Subroutine |
| --- | --- | --- | --- | --- |
| full | `set symm_mat false`<br>`set band_mat false` | LAPACK | yes | (default) `d(s)gesv` |
| full | `set symm_mat false`<br>`set band_mat false` | CUDA | yes | `cusolverDnD(S)gesv` |
| symm. banded | `set symm_mat true`<br>`set band_mat true` | LAPACK | yes | (default) `d(s)pbsv` |
| symm. banded | `set symm_mat true`<br>`set band_mat true` | SPIKE | yes | `d(s)spike_gbsv` |
| asymm. banded | `set symm_mat false`<br>`set band_mat true` | (not required) | yes | `d(s)gbsv` |
| symm. packed | `set symm_mat true`<br>`set band_mat false` | (not required) | yes | `d(s)ppsv` |
| sparse | `set sparse_mat true` | SuperLU | no | (default) `d(s)gssv` |
| sparse | `set sparse_mat true` | CUDA | yes | `cusolverSpD(S)csrlsvqr` |
| sparse | `set sparse_mat true` | PARDISO | no | `pardiso` |
| sparse | `set sparse_mat true` | FGMRES | no | `dfgmres` |
For Multi Node Cluster
With `SP_ENABLE_MPI` enabled, all available settings are summarised in the following table.
| Storage Scheme | Settings | Solver | Mixed Precision | Subroutine |
| --- | --- | --- | --- | --- |
| full | `set symm_mat false`<br>`set band_mat false` | (not required) | no | `pdgesv` |
| symm. banded | `set symm_mat true`<br>`set band_mat true` | (not required) | no | `pdpbsv` |
| asymm. banded | `set symm_mat false`<br>`set band_mat true` | (not required) | no | `pdgbsv` |
| symm. packed | `set symm_mat true`<br>`set band_mat false` | (not required) | no | `pdposv` |
| sparse | `set sparse_mat true` | PARDISO | no | `cluster_sparse_solver` |
| sparse | `set sparse_mat true` | LIS | no | `lis_solve` |
| sparse | `set sparse_mat true` | MUMPS | no | (default) `dmumps_c` |
Some empirical guidance can be summarised as follows.

- For most cases, the asymmetric banded storage with the full precision solver is the most general option.
- The best performance is obtained with the symmetric banded storage. If the (effective) stiffness matrix is guaranteed to be positive definite, users shall use it as a priority.
- The mixed precision algorithm often gives the most significant performance boost for full storage with the CUDA solver. It outperforms the full precision algorithm when the size of the system exceeds several thousand.
- The SPIKE solver is slightly slower than the conventional LAPACK implementations.
- The PARDISO direct solver and the FGMRES iterative solver are provided by MKL.
Parallel Matrix Assembling
By default, the coloring algorithm is enabled. To disable it, users can use the following command.
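For example, assuming the option is named `color_model`:

```
set color_model false
```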
As graph coloring is an NP-hard problem, there is no efficient algorithm to find the minimum chromatic number. The Welsh-Powell algorithm is implemented in suanPan. The maximum independent set algorithm is also available; it may outperform the Welsh-Powell algorithm on large models. To switch between them, users can use the following command.
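For example, assuming the maximum independent set algorithm is selected by the keyword `MIS` under the `color_model` option:

```
set color_model MIS
```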
Also, depending on the problem setup, coloring may or may not improve performance. If the matrix is not assembled a large number of times, the time saved may not be significant. Thus, for problems of small size, users may consider disabling coloring.
This option has no effect if a sparse storage is used.
Penalty Number
For some constraints and loads that are implemented by using the penalty method, the default penalty number can be overridden.
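For example, to set the default penalty number used by penalty-based constraints and loads:

```
set constraint_multiplier 1E8
```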
This command does not overwrite a user defined penalty number if the specific constraint or load takes the penalty number from input arguments.
FGMRES Iterative Tolerance
For the FGMRES iterative solver, one can use the following dedicated command to control the tolerance of the algorithm.
If the boundary condition is applied via the penalty method, say, for example, one previously uses `set constraint_multiplier 1E8`, then there is no need to set a tolerance smaller than `1E-8`. A slightly larger value is sufficient for an iterative algorithm; one can then set
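a tolerance such as the following, assuming the option is named `iterative_tolerance`:

```
set iterative_tolerance 1E-6
```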