set

The set command is used to configure various properties of the analysis.

Linear Elastic Simplification

In a general non-linear context, each substep requires at least two system solves: one to form the initial guess and one to verify convergence. There is no general way to detect a linear system automatically; for a linear system, the initial guess already leads to convergence, so further system solving is unnecessary. To speed up the simulation, one can use the following command to indicate that the system is linear.

set linear_system true

With this enabled, only one system solve is performed in each substep.
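
For example, a purely linear analysis may combine this flag with a fixed substep size (both commands are documented on this page); whether this pairing is beneficial depends on the model.

# a minimal sketch for a purely linear analysis
set linear_system true
set fixed_step_size true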

Substepping Control

To use a fixed substep size, users can define

set fixed_step_size true

Otherwise, the algorithm automatically subdivides the current step whenever convergence is not achieved. The time step control strategy cannot be customized.

To define the initial step size, users can use

set ini_step_size (1)
# (1) double, initial substep size

To define the maximum/minimum step size, users can use

set max_step_size (1)
set min_step_size (1)
# (1) double, bound on the substep size
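
For instance, a hypothetical configuration that starts each step with a substep of 0.1 and restricts automatic adjustment to between 1E-4 and 0.2 could read as follows; the values are illustrative only.

set ini_step_size 1E-1
set min_step_size 1E-4
set max_step_size 2E-1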

Solving Related Settings

Matrix Storage Scheme

Asymmetric Banded

By default, an asymmetric banded matrix storage scheme is used. For 1D analysis, the global stiffness matrix is always symmetric. However, in 2D and 3D analyses, the global stiffness matrix may be structurally symmetric in terms of its sparsity pattern without being numerically symmetric: if $K(i,j)\neq0$ then $K(j,i)\neq0$, but in general $K(i,j)\neq K(j,i)$. Hence, asymmetric banded storage is the safest choice. The _gbsv() LAPACK subroutine is used for solving the system.

Symmetric Banded

It shall be noted that the symmetric scheme saves almost 50% of the memory used by the asymmetric scheme. The _pbsv() LAPACK subroutine is used; it can only handle symmetric positive definite banded matrices. For problems in which the matrix is not necessarily positive definite, for example buckling problems, this subroutine fails. Before enabling symmetric banded storage, the analyst must check that the matrix is SPD.

Full Storage

If the problem is small, using the full storage scheme does little harm. For some particular problems, such as particle collision, the full storage scheme is the only option.

Full Packed Storage

If the matrix is symmetric, a so-called packed format can be used, in which only the upper or the lower triangle of the matrix is stored. The storage cost is half that of the full scheme, but the solving speed is no better. The _spsv() subroutine is used for solving the system. This packed scheme is not recommended.

Sparse Storage

Sparse matrix storage is also supported, and several sparse solvers are available.
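
The storage scheme is selected with the symm_mat, band_mat and sparse_mat flags listed in the summary tables below; sparse storage is requested with set sparse_mat true. For example:

# symmetric banded storage, assuming the global matrix is SPD
set symm_mat true
set band_mat true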

Direct System Solver

Different solvers are implemented for different storage schemes. It is possible to switch from one to another by using the following command. Details are covered in the summary tables below.

set system_solver (1)
# (1) string, system solver name
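
For instance, to pair sparse storage with the SuperLU solver (the solver names are those listed in the summary tables below; SuperLU is already the default for sparse storage on a single node):

set sparse_mat true
set system_solver SuperLU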

Mixed Precision Algorithm

The following command can be used to control whether the mixed precision algorithm with iterative refinement is used. This command has no effect if the target matrix storage scheme has no mixed precision implementation.

set precision (1)
# (1) string, "single" ("mixed") or "double" ("full")

Iterative Refinement

The mixed precision algorithm requires iterative refinement. The maximum number of refinements can be bounded by

set iterative_refinement (1)
# (1) integer, maximum number of refinements

It cannot exceed 256.

Iterative Refinement Tolerance

If the mixed precision algorithm is used, the following command can be used to control the tolerance of the iterative refinement.

set tolerance (1)
# (1) double, tolerance

Typically, each refinement reduces the error by a factor of $10^{-7}$. Thus, two or three refinements should be sufficient to achieve the working precision.

Hence, the following command set makes sense.

set precision mixed
set iterative_refinement 3
set tolerance 1e-15

Iterative System Solver

Iterative solvers are available for sparse storage. The Lis library provides a wide range of solvers and preconditioners; see its documentation for more details.

It is also possible to use GPU-based iterative solvers powered by the MAGMA library. The officially shipped binaries are not compiled with GPU support; users can compile with GPU support by themselves. See the MAGMA documentation for more details.
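
As a minimal sketch, an iterative solver is requested by combining sparse storage with the corresponding solver name from the summary tables (FGMRES is used here purely as an illustration, and the tolerance value is illustrative; the dedicated tolerance command is documented at the end of this page).

set sparse_mat true
set system_solver FGMRES
set fgmres_tolerance 1E-10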

Summary

For Single Node Machine

With SP_ENABLE_MPI disabled, all available settings are summarised in the following table.

| storage | configuration | configuration | system solver | mixed precision | subroutine in external library |
| --- | --- | --- | --- | --- | --- |
| full | set symm_mat false | set band_mat false | LAPACK | yes | (default) d(s)gesv |
| | | | CUDA | yes | cusolverDnD(S)gesv |
| symm. banded | set symm_mat true | set band_mat true | LAPACK | yes | (default) d(s)pbsv |
| | | | SPIKE | yes | d(s)spike_gbsv |
| asymm. banded | set symm_mat false | set band_mat true | (not required) | yes | d(s)gbsv |
| symm. packed | set symm_mat true | set band_mat false | (not required) | yes | d(s)ppsv |
| sparse | set sparse_mat true | | SuperLU | no | (default) d(s)gssv |
| | | | CUDA | yes | cusolverSpD(S)csrlsvqr |
| | | | PARDISO | no | pardiso |
| | | | FGMRES | no | dfgmres |
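
Reading one row of the table as an example, full dense storage solved by the CUDA solver with mixed precision would be requested as follows, assuming the application is built with CUDA support; the names are those from the table.

set symm_mat false
set band_mat false
set system_solver CUDA
set precision mixed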

For Multi Node Cluster

With SP_ENABLE_MPI enabled, all available settings are summarised in the following table.

| storage | configuration | configuration | system solver | mixed precision | subroutine in external library |
| --- | --- | --- | --- | --- | --- |
| full | set symm_mat false | set band_mat false | (not required) | no | pdgesv |
| symm. banded | set symm_mat true | set band_mat true | (not required) | no | pdpbsv |
| asymm. banded | set symm_mat false | set band_mat true | (not required) | no | pdgbsv |
| symm. packed | set symm_mat true | set band_mat false | (not required) | no | pdposv |
| sparse | set sparse_mat true | | PARDISO | no | cluster_sparse_solver |
| | | | LIS | no | lis_solve |
| | | | MUMPS | no | (default) dmumps_c |
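
Similarly, a sketch for a distributed run using sparse storage with the default MUMPS solver could be the following; the solver name is the one shown in the table above.

set sparse_mat true
set system_solver MUMPS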

Some empirical guidance is summarised as follows.

  1. For most cases, the asymmetric banded storage with full precision solver is the most general option.

  2. The best performance is obtained with symmetric banded storage; if the (effective) stiffness matrix is guaranteed to be positive definite, users should prefer it (see the sketch after this list).

  3. The mixed precision algorithm often gives the most significant performance boost for full storage with the CUDA solver. It outperforms the full precision algorithm when the size of the system exceeds several thousand.

  4. The SPIKE solver is slightly slower than the conventional LAPACK implementations.

  5. The PARDISO direct solver and FGMRES iterative solver are provided by MKL.
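
As referenced in item 2, a sketch of that configuration, assuming the (effective) stiffness matrix is SPD, reads:

set symm_mat true
set band_mat true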

Parallel Matrix Assembling

For dense matrix storage schemes, the global matrix is stored in one consecutive chunk of memory. Assembling the global matrix requires adding, at each memory location, contributions from potentially several different elements, which normally involves in-place atomic summation. Assigning each memory location a mutex lock is not cost-efficient. Instead, a $k$-coloring concept can be adopted to divide the whole model into several groups, each containing elements that do not share nodes with other elements in the same group. Elements in the same group can then be updated simultaneously without atomic operations, so the matrix assembling is lock free.

By default, the coloring algorithm is enabled. To disable it, users can use either of the following commands.

set color_model false
set color_model none

Since graph coloring is NP hard, there is no efficient algorithm that finds the minimum chromatic number. The Welsh-Powell (WP) algorithm is implemented in suanPan. The maximum independent set (MIS) algorithm is also available; it may outperform the Welsh-Powell algorithm on large models. To switch between them, users can use the following commands.

# default to WP algorithm
set color_model WP
# switch to MIS algorithm
set color_model MIS

Also, depending on the problem setup, coloring may or may not improve performance. If the matrix is not assembled a large number of times, the time saved may be insignificant; for small problems, users may therefore consider disabling coloring.

This option has no effect if sparse storage is used.

Penalty Number

For some constraints and loads that are implemented using the penalty method, the default penalty number can be overridden.

set constraint_multiplier (1)
set load_multiplier (1)
# (1) double, new penalty number

This command does not override a user-defined penalty number if the specific constraint or load accepts the penalty number as an input argument.
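
For illustration, the following raises both default penalty numbers to 1E8 (the value used in the FGMRES example below); the appropriate magnitude is problem dependent.

set constraint_multiplier 1E8
set load_multiplier 1E8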

FGMRES Iterative Tolerance

For the FGMRES iterative solver, one can use the following dedicated command to control the tolerance of the algorithm.

set fgmres_tolerance (1)
# (1) double, tolerance

If the boundary conditions are applied via the penalty method, say, for example, set constraint_multiplier 1E8 was previously used, then there is no need to set a tolerance smaller than 1E-8. A slightly larger value is sufficient for an iterative algorithm; one can then set

set fgmres_tolerance 1E-6