\documentclass[m,times]{cgMA}
\usepackage{listings}
\usepackage{xurl}
\urlstyle{sf}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{bm}
\usepackage{textcomp}
\usepackage{caption}
\usepackage{color}
\usepackage{subfigure}
\usepackage{wrapfig}
\usepackage[english,ngerman]{babel}
\usepackage[backend=biber,style=alphabetic]{biblatex}
\usepackage{algorithm}% http://ctan.org/pkg/algorithms
\usepackage{algpseudocode}% http://ctan.org/pkg/algorithmicx
\usepackage{mathtools}
\usepackage{subdepth}
\usepackage{enumitem}
\usepackage[toc,page]{appendix}
\usepackage{minted}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{blindtext}
\usepackage{svg}
\usepackage{xcolor,colortbl}
\addbibresource{mscthesis.bib}
\captionsetup{justification=raggedright,singlelinecheck=false}
\DeclareMathOperator*{\argmin}{arg min}
\makeatletter
\def\ext@algorithm{lol}% algorithm captions will be written to the .lol file
% share the list making commands and redefine the heading
\AtBeginDocument{%
\let\l@algorithm\l@lstlisting
\let\c@algorithm\c@lstlisting
\let\thealgorithm\thelstlisting
\renewcommand{\lstlistlistingname}{Algorithms and source code}%
}
\makeatother
\lstdefinelanguage{GLSL}
{
sensitive=true,
morekeywords=[1]{
attribute, const, uniform, varying,
layout, centroid, flat, smooth,
noperspective, break, continue, do,
for, while, switch, case, default, if,
else, in, out, inout, float, int, void,
bool, true, false, invariant, discard,
return, mat2, mat3, mat4, mat2x2, mat2x3,
mat2x4, mat3x2, mat3x3, mat3x4, mat4x2,
mat4x3, mat4x4, vec2, vec3, vec4, ivec2,
ivec3, ivec4, bvec2, bvec3, bvec4, uint,
uvec2, uvec3, uvec4, lowp, mediump, highp,
precision, sampler1D, sampler2D, sampler3D,
samplerCube, sampler1DShadow,
sampler2DShadow, samplerCubeShadow,
sampler1DArray, sampler2DArray,
sampler1DArrayShadow, sampler2DArrayShadow,
isampler1D, isampler2D, isampler3D,
isamplerCube, isampler1DArray,
isampler2DArray, usampler1D, usampler2D,
usampler3D, usamplerCube, usampler1DArray,
usampler2DArray, sampler2DRect,
sampler2DRectShadow, isampler2DRect,
usampler2DRect, samplerBuffer,
isamplerBuffer, usamplerBuffer, sampler2DMS,
isampler2DMS, usampler2DMS,
sampler2DMSArray, isampler2DMSArray,
usampler2DMSArray, struct},
morekeywords=[2]{
radians,degrees,sin,cos,tan,asin,acos,atan,
atan,sinh,cosh,tanh,asinh,acosh,atanh,pow,
exp,log,exp2,log2,sqrt,inversesqrt,abs,sign,
floor,trunc,round,roundEven,ceil,fract,mod,modf,
min,max,clamp,mix,step,smoothstep,isnan,isinf,
floatBitsToInt,floatBitsToUint,intBitsToFloat,
uintBitsToFloat,length,distance,dot,cross,
normalize,faceforward,reflect,refract,
matrixCompMult,outerProduct,transpose,
determinant,inverse,lessThan,lessThanEqual,
greaterThan,greaterThanEqual,equal,notEqual,
any,all,not,textureSize,texture,textureProj,
textureLod,textureOffset,texelFetch,
texelFetchOffset,textureProjOffset,
textureLodOffset,textureProjLod,
textureProjLodOffset,textureGrad,
textureGradOffset,textureProjGrad,
textureProjGradOffset,texture1D,texture1DProj,
texture1DProjLod,texture2D,texture2DProj,
texture2DLod,texture2DProjLod,texture3D,
texture3DProj,texture3DLod,texture3DProjLod,
textureCube,textureCubeLod,shadow1D,shadow2D,
shadow1DProj,shadow2DProj,shadow1DLod,
shadow2DLod,shadow1DProjLod,shadow2DProjLod,
dFdx,dFdy,fwidth,noise1,noise2,noise3,noise4,
EmitVertex,EndPrimitive},
morekeywords=[3]{
gl_VertexID,gl_InstanceID,gl_Position,
gl_PointSize,gl_ClipDistance,gl_PerVertex,
gl_Layer,gl_ClipVertex,gl_FragCoord,
gl_FrontFacing,gl_ClipDistance,gl_FragColor,
gl_FragData,gl_MaxDrawBuffers,gl_FragDepth,
gl_PointCoord,gl_PrimitiveID,
gl_MaxVertexAttribs,gl_MaxVertexUniformComponents,
gl_MaxVaryingFloats,gl_MaxVaryingComponents,
gl_MaxVertexOutputComponents,
gl_MaxGeometryInputComponents,
gl_MaxGeometryOutputComponents,
gl_MaxFragmentInputComponents,
gl_MaxVertexTextureImageUnits,
gl_MaxCombinedTextureImageUnits,
gl_MaxTextureImageUnits,
gl_MaxFragmentUniformComponents,
gl_MaxDrawBuffers,gl_MaxClipDistances,
gl_MaxGeometryTextureImageUnits,
gl_MaxGeometryOutputVertices,
gl_MaxGeometryOutputVertices,
gl_MaxGeometryTotalOutputComponents,
gl_MaxGeometryUniformComponents,
gl_MaxGeometryVaryingComponents,gl_DepthRange},
morecomment=[l]{//},
morecomment=[s]{/*}{*/},
morecomment=[l][keywordstyle4]{\#},
}
\lstdefinestyle{GL}{
tabsize=2,
rulecolor=,
basicstyle=\scriptsize,
upquote=true,
aboveskip={0.5\baselineskip},
belowskip={1.5\baselineskip},
columns=fixed,
showstringspaces=false,
extendedchars=true,
breaklines=true,
prebreak = \raisebox{0ex}[0ex][0ex]{\ensuremath{\hookleftarrow}},
frame=single,
showtabs=false,
showspaces=false,
showstringspaces=false,
identifierstyle=\ttfamily,
keywordstyle=\color[rgb]{1.0,0,0},
keywordstyle=[1]\color[rgb]{0,0,0.75},
keywordstyle=[2]\color[rgb]{0.5,0.0,0.0},
keywordstyle=[3]\color[rgb]{0.127,0.427,0.514},
keywordstyle=[4]\color[rgb]{0.4,0.4,0.4}}
\setcounter{biburllcpenalty}{7000}
\setcounter{biburlucpenalty}{8000}
\begin{document}
\author{Fabian Meyer}
\title{GPU-Beschleunigung der Material Point Method}
\zweitgutachter{Bastian Krayer, M.Sc.}
\zweitgutachterInfo{(Institut f{\"u}r Computervisualistik, AG Computergraphik)}
% Switch the language (for English section headings etc.)
\pagenumbering{roman}
\maketitle
\clearpage
\selectlanguage{english}
\vfill
\begin{center}
\subsubsection*{Zusammenfassung}
\end{center}
Die Material Point Method hat sich in der Computergrafik für physikalisch basierte Simulationen etabliert. Sie benutzt ein hybrides Modell aus Lagrange-Partikeln und Euler-Gitter. Während Partikel als konsistenter Speicher fungieren, erlaubt das Gitter, anfallende partielle Differentialgleichungen effizient zu lösen.
\noindent Die Material Point Method unterliegt hohen Berechnungszeiten, die die Methode nur für Hero Shots rentabel machen. Sie ist jedoch hochparallelisierbar. Diese Arbeit zeigt, wie die Methode mit GPGPU-Techniken beschleunigt werden kann.
\noindent Der Datenaustausch von Partikeln und Gitter wird über interpolierende Transfers erreicht. Pro physikalischem Zeitschritt werden diese mehrmals ausgeführt. Vorverarbeitungsschritte können unternommen werden, um diese Transfers performanter zu machen.
\noindent Countingsort für jede Partikel-Variable erhöht Coalescing und L2-Cache-Hitraten. Binning teilt das Gitter in Blöcke auf, die Shared-Memory-Implementationen ermöglichen. Die Größen der Bins sind durch keine Operation beschränkt. Weiterhin werden nur Blöcke ausgeführt, die Informationen erhalten.
\vfill
\begin{center}
\subsubsection*{Abstract}
\end{center}
The material point method allows for physically based simulations. It has found its way into computer graphics and has rapidly expanded since. The material point method's hybrid use of Lagrangian particles as persistent storage and a background uniform Eulerian grid enables solving various partial differential equations with ease.
\noindent The material point method suffers from high execution times and is thus only viable for hero shots. The method is, however, highly parallelizable. This thesis therefore proposes how to accelerate the material point method using GPGPU techniques.
\noindent At the core of the material point method are particle and grid transfers that interpolate between the two structures. These transfers are executed multiple times per physical time step. Preprocessing steps might be taken if their computational cost is outweighed by the savings.
\noindent Deep sorting with counting sort increases coalescing and L2 cache hit rates. Binning divides the grid into blocks for shared memory filtering techniques. No operation relies on a fixed bin size. As another preprocessing step, only grid blocks that contain particles are executed.
\vfill
\clearpage
\tableofcontents
%\setcounter{page}{3}
\clearpage % oder \cleardoublepage bei zweiseitigem Druck
% \listoffigures % fuer ein eventuelles Abbildungsverzeichnis
% \clearpage
\pagenumbering{arabic}
\section{Introduction} \label{intro}
General purpose computation on graphics processing units (GPGPU) has brought graphics processing units (GPUs) to every computational field that utilizes massively parallelizable algorithms, as is the nature of discretizations in physics.
The parallel acceleration of GPUs is a blessing for those applications that can profit from it. In unison with efficient algorithms, they still have much potential to offer. Conversely, algorithms that are not parallelizable perform much worse and should be avoided. Dividing such tasks up between CPU and GPU is not recommendable: transferring large amounts of data over the PCI bus between the two is slow and requires synchronization. Tasks in such a pipeline should thus stay on the GPU even though they might perform worse there.
Programming for the GPU is another hurdle. GPU code relies on the generation and hardware vendor of the GPU for the relevant extensions. Backwards compatibility is therefore mostly ignored, and code is written towards a certain GPU generation. This is amplified by GPU code often relying on architecture limits for maximum performance. The major languages targeting the GPU are CUDA, OpenCL, Direct3D, OpenGL, and Vulkan. They build on top of a C/C++ language subset.
Whereas CUDA offers the most libraries and optimized code, it is only available on NVIDIA GPUs. OpenCL support on NVIDIA GPUs lags far behind the standard. Its merge into the Vulkan roadmap could be an incentive to support a more in-depth language universally \cite{OPENCL:ROADMAP}. Direct3D is only available on Windows. Vulkan is the successor of OpenGL and moves work from the driver to the developer. Its main benefit over OpenGL is lower CPU usage, which helps little for the almost fully GPU-resident implementation presented here. This implementation opts for OpenGL, as Vulkan is still relatively new. Everything presented could be implemented in any of the other languages as well.
The simulation of the dynamics of materials is still extremely challenging across applications and fields. Multiphysics interactions between materials are not realized at large scale and are often not possible or feasible. The material point method (MPM) has already reliably tackled many different models, materials, and even interactions between materials.
Pure Eulerian methods like the finite element method (FEM) have long been in use for engineering tasks but face problems handling topology changes with numerical stability. Pure Lagrangian methods form the other end of the spectrum. Prominent among them is smoothed-particle hydrodynamics (SPH), which is restricted to (viscous) fluids and empirical in nature. Particle-in-cell (PIC) techniques like the material point method are hybrid Lagrangian/Eulerian methods that try to benefit from the nature of both. Advection on particles is trivial; the material point method therefore utilizes Lagrangian particles as consistent storage of dynamic properties. The governing equations of the physical problem can easily be discretized to and solved on a uniform Eulerian grid. The grid is unmoving and does not store any information between physical time steps.
Transfers between the Lagrangian and Eulerian representations are needed to bring the two together. These transfers are the focus of particle-in-cell techniques. They are realized via interpolation; thus they introduce (estimable) numerical error and lead to dissipation. Moreover, grid nodes are mostly outnumbered by particles, leading to further numerical error, as modes of the individual particles are not captured by the grid.
The $p$-particles-to-one-node relationship of a particle-to-grid (P2G) transfer, or inversely the one-node-to-$p$-particles relationship of a grid-to-particle (G2P) transfer, is the main problem when realizing these operations on the GPU. On the static architecture of the GPU, this inevitably leads to branch divergence of threads within a subgroup, as some nodes are assigned more particles than others.
Chapter \ref{sec:rel_work} reviews the relevant literature on PIC methods, specifically on the GPU, as well as material point method extensions. Chapter \ref{sec:notation} lays out the notation used throughout the thesis. The basics in chapter \ref{sec:basics} cover the derivation of an elastic material in the MPM to build an understanding of the prominent operations; furthermore, GPU intrinsics, metrics, and optimization opportunities are discussed. Chapter \ref{sec:implementation} covers the implementation. The evaluation of the implementation follows in chapter \ref{sec:eval}. Chapter \ref{sec:conclusion} concludes with a summarizing overview and future work.
\clearpage
\section{Related Work}\label{sec:rel_work}
\textbf{Particle in cell} techniques were initially developed for hydrodynamics \cite{evans1957particle}. They are also widely used in plasma simulations to solve Maxwell's equations \cite{PIC:GPU}. Their key steps are identical to those of the MPM:
\begin{enumerate}
\item Transfer particle data to a grid.
\item \label{it:solve_gov_eq}Solve the governing partial differential equations on the grid to move forward in time.
\item Transfer grid data back to particles.
\end{enumerate}
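The three steps above can be sketched for a 1D toy problem with linear (tent) interpolation weights. The following serial Python sketch is purely illustrative (names and the gravity-only grid update are assumptions, not the thesis implementation):

```python
def pic_step(xs, vs, masses, n_cells, dx, dt, gravity=-9.81):
    """One PIC cycle in 1D: P2G transfer, grid solve, G2P transfer."""
    # 1. Transfer particle data (mass, momentum) to the grid.
    grid_mass = [0.0] * n_cells
    grid_mom = [0.0] * n_cells
    for x, v, m in zip(xs, vs, masses):
        i = int(x / dx)            # index of the left grid node
        w = x / dx - i             # fractional offset inside the cell
        for node, weight in ((i, 1.0 - w), (i + 1, w)):
            if 0 <= node < n_cells:
                grid_mass[node] += weight * m
                grid_mom[node] += weight * m * v
    # 2. Solve the governing equations on the grid (here only gravity,
    #    integrated with explicit Euler).
    grid_v = [0.0] * n_cells
    for node in range(n_cells):
        if grid_mass[node] > 0.0:
            grid_v[node] = grid_mom[node] / grid_mass[node] + dt * gravity
    # 3. Transfer grid velocities back to the particles and advect.
    new_xs, new_vs = [], []
    for x in xs:
        i = int(x / dx)
        w = x / dx - i
        v_new = 0.0
        for node, weight in ((i, 1.0 - w), (i + 1, w)):
            if 0 <= node < n_cells:
                v_new += weight * grid_v[node]
        new_vs.append(v_new)
        new_xs.append(x + dt * v_new)
    return new_xs, new_vs
```

Note that the grid arrays are rebuilt from scratch every call, matching the scratch-pad role of the grid described below.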
In general the grid is regarded as a scratch pad, since it is only temporary storage for the purpose of computing step \ref{it:solve_gov_eq}; i.e. the grid could be deleted after each physical time step. Based on PIC, Fluid-Implicit-Particle (FLIP) was developed for fluid simulations to reduce damping and conserve momentum, and it is still widely in use for large simulations. Its counterpart is the empirical smoothed-particle hydrodynamics \cite{gingold1977smoothed}.
As a further development, the affine particle-in-cell (APIC) technique was introduced as an improvement over PIC and FLIP solvers. Due to the definition of a local affine velocity field around each particle (instead of a single vector), APIC conserves angular momentum, reduces damping, and improves stability over PIC and FLIP; more in chapter \ref{sec:apic}. \cite{MPM:APIC} \cite{MPM:OLD_APIC}
PolyPIC goes a step further, giving the velocity field even more freedom by adding polynomial velocity modes of increasing order to the description. \cite{MPM:POLYPIC}
The \textbf{material point method} is a further development of the PIC methods. Both are derived using analysis and numerics, without the need for empirical assumptions. It was originally developed by \cite{sulsky1995application} with a focus on the dynamics of solids. \cite{MPM:SNOW} brought the method into computer graphics for the animation film \textit{Frozen}. In rapid succession, a variety of materials and effects were modeled with the MPM. Since MPM is based on discretizing governing equations and using constitutive models covered by the respective literature, its fast rate of expansion is not a surprise.
The simulation of (hyper-)elastic materials can be done with a number of different strain measures, utilizing fixed linear strain (chapter \ref{sec:cor_hyper}), the Green-Lagrangian strain tensor (chapter \ref{sec:svd}), or logarithmic Hencky strain (\cite{MPM:SHELLS}) for the different requirements of small to large strains. Inserting these into the different energy density models $\boldsymbol{\Psi}(\boldsymbol{\epsilon})$ can already result in a constitutive model. Plasticity in general requires extra modeling in terms of a flow rule, a yield condition, and a hardening rule, which are not discussed here \cite{ochsner2014elasto}. To give a short impression of the developments in computer graphics:
\cite{MPM:PHASE_CHANGE} model phase transition (melting) to a liquid using the fixed corotational hyperelasticity model (chapter \ref{sec:cor_hyper}) in combination with a discretization of the heat equation. \cite{MPM:OLROYDB} use an Oldroyd-B model with Cauchy-Green strain to handle elasticity and plasticity for viscoelastic fluids, foams, and sponges. \cite{MPM:DRUCKER} use Hencky strain in combination with a St. Venant-Kirchhoff model and Drucker-Prager elastoplasticity to model sand.
\cite{MPM:MULTI} combine particles of different species (sand and water) by using one grid per species and relating them with a heuristic momentum exchange. \cite{MPM:SHELLS} simulate thin shells (cloth, cups, etc.) with a Kirchhoff-Love model using Hencky strains.
There exist several implementations utilizing the \textbf{GPU for particle-in-cell} techniques, of which the material point method is a part:
\textbf{Particle sorting:} \cite{PIC:GPU} study different particle list sorting methods. Methods like 'Message Passing Sort' \cite{kong2011particle} \cite{decyk2011adaptable}, 'In Place Particle-QuickSort' \cite{stantchev2008fast}, and 'Linked List Ordering' \cite{burau2010picongpu} try to exploit the partial sortedness and only update particles that need to be moved within the list. These methods often rely on the assumption that only a small number of particles change their grid node, whereas MPM can be quite dynamic. Those particles that do move, move at most to one neighboring grid node or stay in the same node due to the CFL condition (chapter \ref{sec:cfl}). These methods often use fixed bin sizes, which can be quite memory intensive and add additional memory management. Fixed-size binning makes the methods mentioned before easier to implement, but they do not necessarily map well to the GPU due to their high complexity and uncoalesced accesses. As a result, \cite{PIC:GPU} come to the conclusion to fall back to the CUDA library \texttt{thrust}, although they do note that the aforementioned methods can perform faster for particles that move only a limited number of grid nodes. \texttt{thrust} provides the radix sort method \texttt{thrust::sort\_by\_key()}, where one sorts by grid index as key.
In general, deep or index sorts can be done. A deep sort reorders every variable of a particle. An index sort returns an index with an offset to access the particle variables.
\cite{NVIDIA:NNSEARCH} compare radix to counting sort utilizing a uniform grid and use the latter for SPH fluids.
This implementation compares index and deep counting sort and uses the latter. Although possible, this counting sort stays away from exploiting the limited movement implied by the Courant–Friedrichs–Lewy (CFL) condition. Fixed-size bins are another assumption this implementation does not want to impose on the method.
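A counting sort over cell indices can serve both variants: the prefix-sum offsets plus a permutation give an index sort, and scattering every attribute array through that permutation gives a deep sort. The following serial Python sketch is illustrative only (all names are hypothetical, not from the implementation):

```python
def counting_sort_by_cell(cell_ids, n_cells):
    """Counting sort: return per-cell start offsets and the permutation.

    offsets[c]..offsets[c+1] indexes the particles of cell c in the
    sorted order; perm[k] is the original index of the k-th sorted
    particle (an index sort).
    """
    counts = [0] * n_cells
    for c in cell_ids:
        counts[c] += 1
    offsets = [0] * (n_cells + 1)      # exclusive prefix sum of counts
    for i in range(n_cells):
        offsets[i + 1] = offsets[i] + counts[i]
    perm = [0] * len(cell_ids)
    cursor = offsets[:-1]              # next free slot per cell (a copy)
    for p, c in enumerate(cell_ids):
        perm[cursor[c]] = p
        cursor[c] += 1
    return offsets, perm

def deep_sort(perm, *attribute_arrays):
    """Deep sort: reorder every particle attribute array by perm."""
    return [[arr[p] for p in perm] for arr in attribute_arrays]
```

A deep sort pays the scatter once per attribute but yields fully coalesced reads afterwards, which is the trade-off weighed in the evaluation.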
\textbf{Particle-to-grid-transfers} are generally done in two manners \cite{stantchev2008fast}:
\begin{enumerate}
\item \textbf{Particle push:} The particles in the local neighborhood of a grid point push their attributes onto it. Since this relationship results in a race condition, \texttt{atomicAdd(float f)}s or explicit synchronization are necessary, e.g. atomic: \cite{PIC:GPU}, synchronized: \cite{FRANCESCO:ROSSI}.
\item \textbf{Particle pull:} A grid point pulls the particles in its local neighborhood from a sorted particle list. This has the benefit of no race conditions being present.
\end{enumerate}
A \textbf{grid-to-particle transfer} is in general easier, since a particle already knows, due to its position, the grid points in its local neighborhood, and the grid points of a uniform grid are 'sorted' by nature. This implementation tests one particle pull and two particle push methods.
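A pull-style particle-to-grid gather can be sketched as follows, assuming a particle list already sorted by cell with per-cell start offsets (1D, linear weights; a serial, illustrative sketch rather than GPU code, with hypothetical names):

```python
def p2g_pull(node, offsets, xs, masses, dx):
    """Gather mass onto one grid node from particles in neighboring cells.

    offsets[c]..offsets[c+1] indexes the particles of cell c in a list
    sorted by cell index; with linear weights, the support of a node
    covers exactly the two cells adjacent to it.  No race condition
    arises because each node only reads particle data.
    """
    n_cells = len(offsets) - 1
    total = 0.0
    for cell in (node - 1, node):          # the cells touching this node
        if 0 <= cell < n_cells:
            for p in range(offsets[cell], offsets[cell + 1]):
                w = 1.0 - abs(xs[p] / dx - node)   # tent weight
                if w > 0.0:
                    total += w * masses[p]
    return total
```

On the GPU one thread per node would run this loop, so divergence appears when neighboring cells hold different particle counts, which is exactly the problem discussed above.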
Generally these methods are augmented by splitting the grid into 2D/3D blocks to make use of the shared memory of the GPU \cite{PIC:GPU} \cite{honig2010generic} \cite{MPM:GPU}, often called domain decomposition. Shared memory is however limited per GPU (GTX 970: 48 KB per block \cite{NVIDIA:GTX970}). The grid nodes reached by the support of the interpolation function on each side also have to be loaded into memory; they are typically called the halo of a block. Since the number of particles per grid point in the MPM is generally between 4 and 10, only very limited block sizes are available for pull methods that greedily load all particles \cite{MPM:SNOW}.
Stencil computations and filtering techniques follow a very similar process to MPM transfers. Although these techniques employ a one-to-one grid relationship, since no particles are involved, they are worth a look.
An interesting prospect for filtering techniques on the GPU is given by NVIDIA's shuffle operations. Shuffle allows 'sharing' register memory between threads of a warp \cite{NVIDIA:SHUFFLE}. They are, however, not yet supported on other architectures.
Widely used is 2.5D blocking: a domain (of the domain decomposition) is split into 2D planes along one axis. Data reuse of shared memory is maximized by a cyclic-queue strategy: start a thread for each grid node in the first plane and load the halo along the axis into shared memory. Each thread then iterates synchronously along the axis over the 2D planes, discarding the oldest plane in memory while loading the next plane into shared memory. Thus there are no z-dimension halos. This, however, requires all particles in the halo to be present in the limited shared memory. As mentioned, this implementation wants to stay away from assumptions limiting particles, similar to fixed bin sizes. \cite{brandvik2010sblock} \cite{williams2007scientific} \cite{krotkiewski2013efficient}
In particle push methods, partial sums are also computed on the halo. Two different techniques exist to make the data between blocks coherent again:
\begin{enumerate}
\item Atomically add up the values of the halo. This results in two atomic adds on a side node, four on an edge node, and eight on a corner node, if the block extent is bigger than half the halo extent; otherwise more.
\item Every halo of every block corresponds to actual global memory. After a transfer, perform addition steps along each of the three axes. This is infeasible for large supports due to the added memory requirements. \cite{crassin2011interactive}
\end{enumerate}
\textbf{Particle activation:} If a particle's velocity is under a user-defined threshold, it becomes deactivated. It is then not recognized by any particle-to-grid transfer, nor the grid solve, nor any grid-to-particle transfer not affected by the velocity transfer. Particles may of course be reactivated if the velocity in their neighborhood exceeds the threshold again or if they collide. \cite{MPM:GPU}
Dividing the particles up can be realized with stream compaction algorithms discussed in chapter \ref{sec:implementation}.
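A scan-based stream compaction can be sketched serially as follows (illustrative Python; on the GPU, the exclusive prefix sum and the scatter would each run in parallel):

```python
def compact(flags, values):
    """Scan-based stream compaction: keep values[i] where flags[i] == 1.

    The exclusive prefix sum over the flags assigns each surviving
    element its output slot, which is what lets a GPU implementation
    scatter all survivors in parallel; here the scan is written out
    explicitly as a serial loop.
    """
    if not flags:
        return []
    slots = [0] * len(flags)               # exclusive prefix sum
    for i in range(1, len(flags)):
        slots[i] = slots[i - 1] + flags[i - 1]
    total = slots[-1] + flags[-1]          # number of survivors
    out = [None] * total
    for i, f in enumerate(flags):
        if f:
            out[slots[i]] = values[i]
    return out
```

Compacting the active-particle list this way keeps subsequent kernels free of inactive threads.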
\textbf{Particle resampling:} A method that could take pressure off fixed bin sizes and make shared-memory restrictions less of a factor is particle resampling. Its intention mostly is to fill numerical gaps in the material at large deformations where the material is not yet supposed to break. Split and merge methods can directly control how many particles are allowed in a cell and increase or reduce them accordingly.
A simple \textit{split} method divides one particle into eight, each at a distance of half the diagonal length of a grid cell from the original particle. The eight particles span a cube that can be randomly rotated if desired. Mass and volume are distributed equally; velocity and deformation gradient are copied. Refer to chapter \ref{sec:mat_point} for conservation of mass and momentum in MPM. \cite{gao2017adaptive}
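Such a split can be sketched as follows (serial, illustrative Python; offsets of plus or minus dx/2 per axis place each child at half the cell diagonal from the parent, and the cube is left unrotated here):

```python
from itertools import product

def split_particle(x, mass, volume, dx):
    """Split one particle into eight children at the corners of a cube.

    Each child sits at an offset of +-dx/2 along every axis, i.e. at
    half the cell diagonal (dx * sqrt(3) / 2) from the parent.  Mass and
    volume are distributed equally; velocity and deformation gradient
    would simply be copied from the parent.
    """
    children = []
    for signs in product((-0.5, 0.5), repeat=3):
        pos = tuple(x[a] + s * dx for a, s in enumerate(signs))
        children.append((pos, mass / 8.0, volume / 8.0))
    return children
```

Splitting this way conserves total mass and, since every child copies the parent velocity, total momentum as well.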
A \textit{merge} method looks for the closest neighbors and creates the new merged particle at their geometric center. Mass and volume are accumulated. Velocity is computed as a mass-weighted average of the participating particles. The deformation gradient is, according to chapter \ref{sec:svd}, decomposed into $\boldsymbol{F} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^T$. The quaternion averages of the particles' $\boldsymbol{U}$s and $\boldsymbol{V}$s lead to average rotations $\overline{\boldsymbol{U}}$ and $\overline{\boldsymbol{V}}$, respectively. The principal strains within $\boldsymbol{\Sigma}$ are averaged component-wise to $\overline{\boldsymbol{\Sigma}}$. Composing them in the same fashion leads to an average deformation gradient $\overline{\boldsymbol{F}} = \overline{\boldsymbol{U}}\,\overline{\boldsymbol{\Sigma}}\,\overline{\boldsymbol{V}}^T$. \cite{gao2017adaptive}
Unconditionally keeping the particle count fixed is not desirable, since fracture requires that gaps be created. \cite{gao2017adaptive} produce a signed distance field to identify interior points. Surface points also should not merge or split, to avoid visual artifacts. Furthermore, the signed distance field can be used to adapt the grid size depending on the distance to the surface, allowing for coarser levels with increasing distance from the surface.
%look again for MPM/PIC gpu impl
%SPH, FEM methods
\clearpage
\section{Notation}\label{sec:notation}
Multi-component types are generally printed in bold letters $\boldsymbol{A}$, $\boldsymbol{a}$. Vectors furthermore use lower-case letters $\boldsymbol{v}$ with the exception of angular momentum $\boldsymbol{L}$; Matrices use upper-case letters $\boldsymbol{M}$. A variable in the Lagrangian form has a left subscript $0$: $ _0{\boldsymbol{x}}$. The Eulerian description has a left subscript $t$: $_t\boldsymbol{x}$. Occasionally, notation may be omitted where it is apparent from context.
Beginning from chapter \ref{sec:mat_point} variables defined on a particle will have right subscript $p$ (e.g. $\boldsymbol{x}_p$). Grid cells will be assigned a right subscript $i$ or additionally $j$ denoting the grid index (e.g. $\boldsymbol{x}_i$). This is not to be confused with the following Einstein-notation. Therefore, beginning with chapter \ref{sec:mat_point} Einstein-notation for components will use greek letters ($\alpha,\beta$). Occasionally summation over grid index $i$ and particles $p$ may be implied. A variable of the $n$-th time step with associated time $t^n = \sum_{i=1}^{n-1} \Delta t^i$ will have a right superscript: $\boldsymbol{x}^n$.
At various points throughout this thesis, \textbf{Einstein notation} is used for tensors and vectors.
Repeated indices that are also defined on the variable imply component-wise operations. Let $\boldsymbol{a}$ and $\boldsymbol{b}$ be vectors of dimension $n$ and $\boldsymbol{A},\boldsymbol{B},\boldsymbol{D}$ be $ { m \times n }$ tensors. $a _ { i } b _ { j }$ multiplies component $i$ of $\boldsymbol{a}$ with component $j$ of $\boldsymbol{b}$. Vector and tensor/matrix addition thus become:
\begin{equation} \label{EINSTEIN:ADD}
c _ { i } = a _ { i } + b _ { i } \quad \text { and } \quad D _ { i j } = A _ { i j } + B _ { i j }.
\end{equation}
A transpose operator swaps the indices:
\begin{equation}
D^T _ { i j } =D_{ji}.
\end{equation}
Repeated indices that are not otherwise defined, however, imply summation over that index. The vector dot product becomes:
\begin{equation}
a _ { i } b _ { i } \equiv \sum _ { i = 1 } ^ { n } a _ { i } b _ { i }.
\end{equation}
Following this notation, the Frobenius inner product (also called the Frobenius scalar product) between two second-order tensors is:
\begin{equation} \label{EINSTEIN:FROBENIUS1}
\boldsymbol { A } : \boldsymbol { B } \equiv A _ { i j } B _ { i j } \equiv \sum _ { i = 1 } ^ { n } \sum _ { j = 1 } ^ { m } A _ { ij } B _ { ij }.
\end{equation}
The Frobenius inner product between an ${r \times s \times m \times n }$ fourth-order tensor $\boldsymbol{C}$ and a second-order tensor creates a second-order tensor, combining the ideas of \ref{EINSTEIN:ADD} and \ref{EINSTEIN:FROBENIUS1}:
\begin{equation}
\boldsymbol { A } = A_ {ij} = \boldsymbol {C} : \boldsymbol { B } = C _ {i j k l} B _ { k l } = \sum _ { k = 1 } ^ { m } \sum _ { l = 1 } ^ { n } C _ { ijkl } B _ { kl }.
\end{equation}
Matrix-vector and matrix-matrix multiplication can be expressed this way as:
\begin{equation}
b_i = A_{i j} a_j \quad D _ { i j } = A _ { i k } B _ { k j }.
\end{equation}
The definition of the Kronecker Delta is as follows:
\begin{equation}
\delta _ { i j } = \left\{
\begin{array} { l l l }
{ 1 } & {\text{if}} & {i = j} \\
{ 0 } & {\text{if}} & { i \neq j }
\end{array} \right.
\end{equation}
The Kronecker Delta as an operator is most efficiently described as a substitution:
\begin{equation}
a _ { i } \delta _ { i j } = a _ { j }.
\end{equation}
I.e. the term only contributes if $i = j$. The Kronecker Delta comes up when differentiating a variable by itself with the same or a different index:
\begin{equation}
\frac{\partial{x_0}}{\partial{x_1}}=0, \frac{\partial{x_0}}{\partial{x_0}}=1 \Rightarrow \frac{\partial{x_i}}{\partial{x_j}} = \delta_{ij}.
\end{equation}
The alternating tensor or Levi-Civita symbol
\begin{equation}
\varepsilon _ { i j k } = \left\{ \begin{array} { r l } { + 1 } & { \text { if } ( i , j , k ) \text { is } ( 1,2,3 ) , ( 2,3,1 ) , \text { or } ( 3,1,2 ) } \\ { - 1 } & { \text { if } ( i , j , k ) \text { is } ( 3,2,1 ) , ( 1,3,2 ) , \text { or } ( 2,1,3 ) } \\ { 0 } & { \text { if } i = j , \text { or } j = k , \text { or } k = i } \end{array} \right.
\end{equation}
is used to express cross-products $\boldsymbol{c}=\boldsymbol{a} \times \boldsymbol{b}$:
\begin{equation}
c_i = \varepsilon_{ijk} a_j b_k \left(= \sum _ { j = 1 } ^ { 3 } \sum _ { k = 1 } ^ { 3 } \varepsilon_{ijk} a_j b_k\right).
\end{equation}
A useful relation involving the Kronecker Delta is:
\begin{equation} \label{eq:compact_levi}
\varepsilon _ { i j k } \varepsilon _ { i m n } = \delta _ { j m } \delta _ { k n } - \delta _ { j n } \delta _ { k m }.
\end{equation}
\begin{flushright}\cite{MCGINTY:CONTINUUM}\end{flushright}
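The cross-product expression and relation \ref{eq:compact_levi} can be verified numerically; a small sketch (illustration only, \texttt{numpy} assumed, vectors are arbitrary examples) builds $\varepsilon_{ijk}$ from permutation signs:

```python
import numpy as np
from itertools import permutations

# Build the Levi-Civita symbol: the sign of the permutation (i, j, k) is
# the determinant of the correspondingly permuted identity matrix.
eps = np.zeros((3, 3, 3))
for i, j, k in permutations(range(3)):
    eps[i, j, k] = round(np.linalg.det(np.eye(3)[[i, j, k]]))

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.5, 2.0])
c = np.einsum('ijk,j,k->i', eps, a, b)   # c_i = eps_ijk a_j b_k

# epsilon-delta identity: eps_ijk eps_imn = d_jm d_kn - d_jn d_km
delta = np.eye(3)
lhs = np.einsum('ijk,imn->jkmn', eps, eps)
rhs = (np.einsum('jm,kn->jkmn', delta, delta)
       - np.einsum('jn,km->jkmn', delta, delta))
```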
\clearpage
\section{Basics}\label{sec:basics}
The following chapters derive the material point method and the extensions necessary for an elastic model. An elastic model is an instructive case that displays most of the computations needed for a typical model. The last chapter, \ref{sec:gpgpu}, turns to the GPGPU side.
\subsection{Reference and current configuration}\label{sec:ref_corr_config}
Particles (or material points) in continuum mechanics are not what might classically be thought of as particles. Rather, the continuum assumption holds: each particle represents a continuous piece of material, s.t. a microscopic view does not need to be adopted. A particular body is composed of a set of particles and can adopt different configurations as its shape changes. These changes are caused by external or internal effects (forces etc.) and deform the body over time. \cite{MIT:CONTINUUM_MECHANICS} \cite{MPM:COURSE}
When modelling solids, changes of quantities from an initial reference configuration $_0\boldsymbol{x}$ to another current configuration $_t\boldsymbol{x}$ need to be measured. In the material point method the reference configuration $_0\boldsymbol{x}$ is just the initial configuration of the body (at time $t=0$). This is similar to the total Lagrangian formulation in finite element methods \cite{bathe2006finite}.
Let $\Omega ^ { 0 } , \Omega ^ { t } \subset \mathbb { R } ^ { { 3 } }$ be the set of (material) points in the reference and current configuration respectively. Then one may define a function or mapping ${^t_0}\phi ( \cdot , t ) : \Omega ^ { 0 } \rightarrow \Omega ^ { t }$ which relates the reference configuration to the current configuration. Let for simplicity $_0\boldsymbol{x}$ and $_t\boldsymbol{{x}}$ describe the position of the particle in their respective configurations. Then this mapping becomes the deformation of the body from the reference configuration $_0\boldsymbol{x}$:
\begin{equation}
_t\boldsymbol{x} = {^t_0}\phi ( _0\boldsymbol{x} , t ).
\end{equation}
If, for instance, the body consisting of material points $_t\boldsymbol{x}$ moves with velocity $\boldsymbol{v}$ and rotation $\boldsymbol{R}(t)$, this mapping is defined as:
\begin{equation}\label{eq:rigid}
_t\boldsymbol{x} = {^t_0}\phi ( _0\boldsymbol{x} , t ) = \boldsymbol{R}(t) \, {_0\boldsymbol{x}}+\boldsymbol{v}t.
\end{equation}
The velocity of a material point in the reference configuration can be defined using this mapping
\begin{equation}\label{eq:velocity}
_0\boldsymbol{v}(_0\boldsymbol{x},t) = \frac{\partial ({^t_0}\phi)}{\partial t}(_0\boldsymbol{x},t),
\end{equation}
and similarly the acceleration is defined
\begin{equation}
_0\boldsymbol{a}(_0\boldsymbol{x},t) = \frac{\partial^2 ({^t_0}\phi)}{\partial t^2}(_0\boldsymbol{x},t) = \frac{\partial _0\boldsymbol{v}}{\partial t}(_0\boldsymbol{x},t).
\end{equation}
It is helpful to abstract away from the reference configuration and think of it as being defined in a separate, fixed material space. Physically this amounts to moving with the particle in world space, commonly known as the Lagrangian form. It is often easier in continuum mechanics to start with a Lagrangian description and switch to an Eulerian one if needed. The Eulerian description is static: variables of the particles moving by are measured from a fixed position.
These descriptions are different, but they yield the same measurements when related correctly. I.e. both configurations refer variables defined on them to the deformed state, but the position where the 'lookup' of a value happens differs. In the reference configuration the lookup is done at the initial position of the particle; in the current configuration it happens at the particle's world position. These relations for some particle quantity $f$ are called the (Lagrangian) pull back
\begin{equation}
_0f(_0\boldsymbol{x},t)= {_tf}({^t_0}\phi(_0\boldsymbol{x},t),t)
\end{equation}
and the (Eulerian) push forward
\begin{equation}\label{eq:push_forward}
_tf(_t\boldsymbol{x},t) = {_0f}({^t_0}\phi^{-1}(_t\boldsymbol{x},t),t) = {_0f}({_t^0}\phi(_t\boldsymbol{x},t),t)
\end{equation}
with definitions over their respective spaces $_tf ( \cdot , t ) : \Omega ^ { { t } } \rightarrow \mathbb { R }$, $_0f ( \cdot , t ) : \Omega ^ { { 0 } } \rightarrow \mathbb { R }$. For the operator ${^t_0}\phi$ to be a homeomorphism, s.t. an inverse ${^t_0}\phi^{-1} = {_t^0}\phi$ is defined, it is assumed that no two particles ever occupy the same space at the same time.
The difficulty of the Eulerian formulation becomes apparent when differentiating (due to the chain rule):
\begin{equation} \label{eq:eulerian_general}
\frac {\partial}{\partial t} {_0f_i } ( {_0\boldsymbol { x }} , t ) =
\frac { \partial {_tf_i} } { \partial t } ( {^t_0}\phi ( {_0\boldsymbol {x }} , t ) , t ) + \frac { \partial {_tf_i }} { \partial {_tx_j} } ( {^t_0}\phi ( {_0\boldsymbol { x }} , t ) , t ) \frac { \partial {^t_0}\phi_j } { \partial t } ( {_0\boldsymbol { x }} , t ).
\end{equation}
Combining this with equation \ref{eq:velocity} and applying the push forward \ref{eq:push_forward} to cancel out mappings leads to the definition referred to as the material derivative in the current configuration:
\begin{equation} \label{eq:material_derivative}
\frac { { D } } { { D } { t } } { {f} } ( _t\boldsymbol { x } , { t } ) = \frac { \partial { {f} } } { \partial { t } } ( _t\boldsymbol {x } , { t } ) + \frac { \partial { {f} } } { \partial x _ { { j } } } ( _t{\boldsymbol { x }} , { t } ) {v} _ { { j } } ( _t\boldsymbol { x } , { t } ).
\end{equation}
The Jacobian of the deformation map $\phi$ is the deformation gradient $\boldsymbol{F}$ and is one of the key components to measure strain for material models:
\begin{equation}
{^t_0 F_{ij}} ( _0\boldsymbol {x} , t ) = \frac { \partial {^t_0\phi_i} } { \partial _0 x _j } ( _0\boldsymbol { x } , t ) = \frac { \partial _t x _i } { \partial _0 x_j } ( _0\boldsymbol { x } , t ).
\end{equation}
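A minimal numerical sketch of this definition (illustration only; the deformation map below is an assumed simple stretch-and-shear example): the deformation gradient is the Jacobian of the map, here approximated by central finite differences.

```python
import numpy as np

# Assumed deformation map: stretch in x and y, shear of y into x.
def phi(X):
    return np.array([1.2 * X[0] + 0.3 * X[1], 0.9 * X[1], X[2]])

# F_ij = d phi_i / d X_j via central finite differences, column by column.
def deformation_gradient(phi, X, h=1e-6):
    F = np.zeros((3, 3))
    for j in range(3):
        dX = np.zeros(3)
        dX[j] = h
        F[:, j] = (phi(X + dX) - phi(X - dX)) / (2.0 * h)
    return F

F = deformation_gradient(phi, np.array([0.5, 0.5, 0.5]))
F_exact = np.array([[1.2, 0.3, 0.0],   # analytic Jacobian of the map above
                    [0.0, 0.9, 0.0],
                    [0.0, 0.0, 1.0]])
```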
Topology specifies a neighborhood using the open ball concept $_{0,t}B_\epsilon(\boldsymbol{x}) = \{\boldsymbol{y} \in \Omega^{0,t} \mid d(\boldsymbol{x},\boldsymbol{y}) < \epsilon\}$ given a distance measure $d$. $_0B_\epsilon$ is the pre-image of $_tB_\epsilon$ under $_0^t\phi$. Intuitively, the deformation gradient measures the local deformation of all particles in a small neighborhood $_0B_\epsilon$ mapped to $_tB_\epsilon$. This allows describing infinitesimal changes in position from the reference to the current configuration
\begin{equation}\label{eq:def_grad_pos}
d {_tx_i} = {^t_0}F{_{ij}} d{_0x_j}.
\end{equation}
With this quantity in place, volume and area changes are calculable. In typical analytical fashion, a coordinate system change $_0x \rightarrow {_tx}$ for a quantity $_0g$ is done using the determinant of the Jacobian matrix, which is given a separate name $^t_0J =\text{det}(^t_0\boldsymbol{F})$. The push forward of $_0g:\Omega^0\rightarrow \mathbb{R}^d$ thus becomes $_tg:\Omega^t\rightarrow \mathbb{R}^d$:
\begin{equation} \label{eq:volume_integral}
\int _ { {_t\boldsymbol { B } }} {_tg} ( {_t\boldsymbol { x }} ,t) d {_t\boldsymbol { x }} = \int _ { { _0\boldsymbol { B }}} {_0 g } ( {_0\boldsymbol{x}},t) {^t_0 J}d {_0\boldsymbol{x}}.
\end{equation}
This can also be derived via the cross product. A cube with volume $d_0V$ spanned by infinitesimal edge vectors $d_0\boldsymbol{x}_i$ ($i=0,1,2$) becomes a parallelepiped in the deformed configuration with volume $d_tV =|{^t_0\boldsymbol{F}}d_0\boldsymbol{x}_0 \cdot({^t_0\boldsymbol{F}}d_0\boldsymbol{x}_1 \times {^t_0\boldsymbol{F}}d_0\boldsymbol{x}_2)|$:
\begin{equation}\label{eq:j}
d_tV = {^t_0J} d_0V.
\end{equation}
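The triple-product argument can be checked directly; a sketch (illustration only, assumed homogeneous $\boldsymbol{F}$) mapping the edges of a unit cube:

```python
import numpy as np

# Assumed homogeneous deformation gradient.
F = np.array([[1.5, 0.2, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 1.1]])

# The unit cube edges e_0, e_1, e_2 map to F e_i; the scalar triple
# product of the mapped edges is the parallelepiped volume d_tV.
e0, e1, e2 = F @ np.eye(3)[0], F @ np.eye(3)[1], F @ np.eye(3)[2]
vol = abs(np.dot(e0, np.cross(e1, e2)))
J = np.linalg.det(F)   # equals vol, since d_0V = 1
```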
The area change is given by Nanson's Formula. $d\boldsymbol{A}$ is a vector pointing in the direction of the normal of the area and $d_t\boldsymbol{l} ={}^t_0\boldsymbol{F}d_0\boldsymbol{l}$ an arbitrary line element:
$$ d{_0V} = d{_0\boldsymbol{A}}\cdot d{_0\boldsymbol{l}},\quad d{_tV} = d{_t\boldsymbol{A}} \cdot d{_t\boldsymbol{l}}$$
$$ \xRightarrow{\ref{eq:def_grad_pos},\ref{eq:j}} {^t_0J} d_0\boldsymbol{A} \cdot d_0\boldsymbol{l} = d{_t\boldsymbol{A}} \cdot ({^t_0\boldsymbol{F}}d{_0\boldsymbol{l}})$$
\begin{equation}
\Rightarrow d_t\boldsymbol{A} = {^t_0\boldsymbol{F}}^{-T} {^t_0}J d_0\boldsymbol{A} = {^0_t\boldsymbol{F}}^{T} {^t_0}J d_0\boldsymbol{A}.
\end{equation}
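Nanson's formula can likewise be verified numerically; a sketch (illustration only, assumed homogeneous $\boldsymbol{F}$): the area vector spanned by two mapped edges must equal $J\boldsymbol{F}^{-T}$ applied to the reference area vector.

```python
import numpy as np

F = np.array([[1.3, 0.1, 0.0],   # assumed homogeneous deformation gradient
              [0.2, 0.9, 0.0],
              [0.0, 0.0, 1.4]])
J = np.linalg.det(F)

u0 = np.array([1.0, 0.0, 0.0])   # edges spanning a reference area element
w0 = np.array([0.0, 1.0, 0.0])
dA0 = np.cross(u0, w0)           # reference area vector (normal times area)

dAt_direct = np.cross(F @ u0, F @ w0)       # area vector of the mapped edges
dAt_nanson = J * np.linalg.inv(F).T @ dA0   # Nanson's formula
```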
A surface integral may then be transformed to reference configuration by
\begin{equation}\label{eq:surface_integral}
\int _ { \partial {_tB} } \boldsymbol{h} ( {_t\boldsymbol{x}} , t ) \cdot d \boldsymbol{A} ( {_t\boldsymbol{x}} ) = \int _ { \partial {_0B} } \boldsymbol{h} ( {_0\boldsymbol{x}} , t ) \cdot \boldsymbol{F}^{-T}J d \boldsymbol{A} ( {_0\boldsymbol{x}} )
\end{equation}
where $_0\boldsymbol{h} = \boldsymbol{h}(_0\boldsymbol{x},\cdot)$ is the pull back of $_t\boldsymbol{h} = \boldsymbol{h}(_t\boldsymbol{x},\cdot)$. $d_0\boldsymbol{A}$, $d_t\boldsymbol{A}$ point in the direction of the surface normal of $\partial {_0B}(_0\boldsymbol{x})$, $\partial {_tB}(_t\boldsymbol{x})$, respectively.
\cite{MIT:CONTINUUM_MECHANICS}
\cite{MPM:COURSE}
\subsection{Polar and singular value decomposition}\label{sec:svd}
The target is to define strain measures in terms of the deformation gradient: $\boldsymbol{\epsilon}(\boldsymbol{F})$. In equation \ref{eq:rigid} a rigid body movement was introduced; let, more generally, $b_i(t) = v_it$ be some translation. A problem arises when calculating the deformation gradient of this mapping.
\begin{equation}
^t_0F_{ij} = \frac{\partial{_tx_i}}{\partial_0x_j} = \frac{\partial (R_{ik}(t)_0x_k+b_i(t))}{\partial _0x_j} = R_{ik}(t) \delta_{kj} = R_{ij}(t).
\end{equation}
As can be seen, the deformation gradient contains the rigid rotation. For strain measures this is not beneficial, as an assumption on the stiffness tensor requires no net rotation (more in chapter \ref{sec:linear_elasticty}). I.e. the deformation gradient has two components: a rotation and the actual distortion or strain. There are two ways to deal with this:
\begin{enumerate}
\item Use a strain measure that cancels out the rotation. An example for this would be the Green-Lagrangian strain tensor with quadratic components:
\begin{equation}\label{eq:green}
E _ { i j } = \frac { 1 } { 2 } \left( F _ { k i } F _ { k j } - \delta _ { i j } \right).
\end{equation}
\item \label{it:polar} Polar decompose the deformation gradient in its rotational $\boldsymbol{R}$ and (symmetric positive definite) distortional $\boldsymbol{S}$ parts $\boldsymbol{F}=\boldsymbol{R}\boldsymbol{S}$.
\end{enumerate}
That equation \ref{eq:green} cancels out the rotational part can be shown by item \ref{it:polar}:
$$ \frac { 1 } { 2 } \left( F _ { k i } F _ { k j } - \delta _ { i j } \right) = \frac { 1 } { 2 } \left( S _ {m i}R_{k m} R _ { k n} S_{n j} - \delta _ { i j } \right)$$
$$= \frac { 1 } { 2 } \left( S _ {m i} \delta_{mn} S_{n j} - \delta _ { i j } \right) = \frac { 1 } { 2 } \left( S _ {i m} S_{m j} - \delta _ { i j } \right)$$
$$= \frac{1}{2} \left(\boldsymbol{S}^2 - I \right).$$
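This rotation invariance is quickly confirmed numerically; a small sketch (the rotation and deformation gradient below are assumed example values, \texttt{numpy} assumed):

```python
import numpy as np

# Green-Lagrangian strain E = (F^T F - I) / 2, cf. the equation above.
def green_strain(F):
    return 0.5 * (F.T @ F - np.eye(3))

theta = 0.7                          # assumed rigid rotation about the z-axis
R0 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
F = np.array([[1.1, 0.20, 0.00],     # assumed deformation gradient
              [0.0, 0.95, 0.10],
              [0.0, 0.00, 1.05]])
# Premultiplying by the rigid rotation R0 leaves E unchanged.
E_plain = green_strain(F)
E_rotated = green_strain(R0 @ F)
```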
Regarding item \ref{it:polar}: Assuming a singular value decomposition of
\begin{equation}\label{eq:svd}
\boldsymbol{F} = \boldsymbol{U\Sigma V}^T
\end{equation}
is already computed, where $\boldsymbol{U}$, $\boldsymbol{V}$ are orthogonal matrices and $\boldsymbol{\Sigma}$ is a diagonal matrix containing the singular values $\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_r$ of $\boldsymbol{F}$. $(n-r)$ is the dimension of the null space. The polar decomposition is computable as:
\begin{equation}
\boldsymbol{R} = \boldsymbol{U} \boldsymbol{V} ^ { T }, \quad \boldsymbol{S} = \boldsymbol{V} \boldsymbol{\Sigma} \boldsymbol{V} ^ { T }.
\end{equation}
Since singular values are non-negative, it is straightforward to see that the required properties of $\boldsymbol{R}$ and $\boldsymbol{S}$ hold.
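With \texttt{numpy}'s SVD this construction is a two-liner; a sketch (illustration only, assumed invertible $\boldsymbol{F}$ with positive determinant):

```python
import numpy as np

F = np.array([[1.2, 0.4, 0.0],   # assumed deformation gradient, det(F) > 0
              [0.1, 0.9, 0.2],
              [0.0, 0.3, 1.1]])
U, sig, Vt = np.linalg.svd(F)    # F = U diag(sig) Vt
R = U @ Vt                       # rotational part
S = Vt.T @ np.diag(sig) @ Vt     # symmetric positive definite part
```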
The components of the singular value decomposition are important to gain an intuition for its usefulness: The columns of $\boldsymbol{U}$ and $\boldsymbol{V}$ span bases for the column and row spaces of $\boldsymbol{F}$ using the left and right singular vectors $\boldsymbol{u}_i,\boldsymbol{v}_i$, respectively \cite{MIT:SVD}. For illustration purposes imagine the manipulation of $\boldsymbol{v}_1$ due to $\boldsymbol{U\Sigma V}^T$:
\begin{enumerate}
\item Transform from the right singular vector space to standard basis space: $\boldsymbol{V}^T\boldsymbol{v}_1$ $ = \boldsymbol{e}_1$.
\item Scale by singular values to transform to principal stretch space: $\boldsymbol{\Sigma} \boldsymbol{e}_1 = \sigma_1 \boldsymbol{e}_1$.
\item \label{it:tr_sing_vect} Transform to left singular vector space: $\boldsymbol{U}\sigma_1\boldsymbol{e}_1 = \sigma_1 \boldsymbol{u}_{1}$.
\end{enumerate}
For a symmetric positive definite matrix the singular value decomposition becomes even easier, as $\boldsymbol{U}=\boldsymbol{V}$. Item \ref{it:tr_sing_vect} then effectively just becomes a transform 'back'.
In the following a summary of the $3 \times 3$ singular value decomposition as in \cite{SVD:3x3} is given. The proposed singular value decomposition is also called the 'Polar SVD' and follows a specific convention.
\begin{enumerate}\label{ref:itemize_conv}
\item\label{item:refl_free} $\boldsymbol{U}$,$\boldsymbol{V}$ are reflection-free corresponding to true rotation matrices, i.e. both $\text{det}(\boldsymbol{U}), \text{det}(\boldsymbol{V}) = 1$ hold.
\item If $\text{det}(\boldsymbol{F}) < 0$, the negative sign moves on to $\boldsymbol{\Sigma}$ as a result of item \ref{item:refl_free}. The singular value lowest in magnitude gets a negative sign attached.
\end{enumerate}
This convention does not change the existence or uniqueness of the singular value or polar decomposition, although strictly speaking $\boldsymbol{S}$ is then no longer positive definite.
\\
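On top of a standard SVD routine this convention can be enforced after the fact; a minimal sketch assuming \texttt{numpy} (not the quaternion-based algorithm described below): flipping one column of $\boldsymbol{U}$ (or $\boldsymbol{V}$) together with the matching entry of $\boldsymbol{\Sigma}$ leaves the product unchanged.

```python
import numpy as np

# Move reflections out of U and V onto the smallest-magnitude singular
# value, so that det(U) = det(V) = 1 always holds.
def polar_svd(F):
    U, sig, Vt = np.linalg.svd(F)   # sig is sorted in decreasing order
    V = Vt.T
    if np.linalg.det(U) < 0:
        U[:, 2] *= -1
        sig[2] *= -1
    if np.linalg.det(V) < 0:
        V[:, 2] *= -1
        sig[2] *= -1
    return U, sig, V

F = np.diag([2.0, 1.0, -0.5])       # assumed example with det(F) < 0
U, sig, V = polar_svd(F)
```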
The algorithm proceeds as follows:
\begin{enumerate}
\item \label{it:eig} \textbf{Symmetric eigenanalysis:} A Jacobi eigenvalue algorithm begins with the symmetric positive definite matrix $\boldsymbol { S } ^{(0)}= \boldsymbol { A } ^ { T } \boldsymbol { A } = \boldsymbol { V } \boldsymbol { \Sigma } ^ { 2 } \boldsymbol { V } ^ { T }$ (here $\boldsymbol{A} = \boldsymbol{F}$).
\begin{enumerate}[label*=\arabic*.]
\item \label{item:sk} Iteratively compute the (also symmetric, positive definite) $\boldsymbol { S } ^ { ( k + 1 ) } = [ \boldsymbol { Q } ^ { ( k )}] ^ { T } \boldsymbol { S } ^ { ( k ) } \boldsymbol { Q } ^ { ( k ) }$ where $\boldsymbol{Q}^{(k)}$ is a Givens rotation aiming to eliminate the off-diagonal entry $S_{12}$. Store $\boldsymbol{V}^{(k+1)} = \boldsymbol{V}^{(k)} \boldsymbol{Q}^{(k)}$.
\item \label{item:sii} Do \ref{item:sk} again for the other off-diagonal entries $S_{13},S_{23}$.
\item \label{item:redo_sii} Redo \ref{item:sk} - \ref{item:sii} a fixed amount of steps $m$.
\end{enumerate}
\item \textbf{Sorting singular values:} Compute $\boldsymbol{B} := \boldsymbol{AV}$, where $\boldsymbol{V} = \boldsymbol{V}^{(3m)}$. Acquire $\boldsymbol{\Sigma}$ by $\|\boldsymbol{b}_i \|_2 = \|\boldsymbol{u}_i\sigma_i\|_2 = |\sigma_i|$, where also $\boldsymbol{B=U\Sigma}$ holds. Permute the singular values by sorting them in decreasing order. Apply the same permutation to the columns of $\boldsymbol{B}$ and $\boldsymbol{V}$, where switches in $\boldsymbol{V}$ also cause a sign change. Enforce the convention mentioned above.
\item \label{item:qr} \textbf{QR-factorization:} Compute $\boldsymbol{U}$ using a $\boldsymbol{QR}$-factorization with Givens-Rotations where $\boldsymbol{B}=\boldsymbol{QR}=\boldsymbol{U\Sigma}$. The $\boldsymbol{QR}$-factorization is done once in the same fashion as in item \ref{item:sk} - \ref{item:sii}.
\end{enumerate}
Due to their inherent normalization, fast multiplication and storage efficiency, quaternions are preferred over explicit rotation matrices. In item \ref{item:qr} a $\boldsymbol{QR}$-factorization is preferred over a column normalization of $\boldsymbol{U\Sigma}$ due to the latter's inaccuracy at near-zero singular values. In general $\boldsymbol{R}$ is an upper triangular matrix; in item \ref{item:qr} it reduces to being diagonal.
\cite{MPM:COURSE}\cite{SVD:3x3}
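The eigenanalysis step can be sketched with exact trigonometric Givens rotations (illustration only; the cited implementation instead uses approximate quaternion rotations and a fixed sweep count for speed):

```python
import numpy as np

# One Givens rotation in the (p, q) plane chosen to zero the off-diagonal
# entry S_pq of the symmetric matrix S; V accumulates the rotations.
def jacobi_step(S, V, p, q):
    theta = 0.5 * np.arctan2(2.0 * S[p, q], S[p, p] - S[q, q])
    c, s = np.cos(theta), np.sin(theta)
    Q = np.eye(3)
    Q[p, p] = c; Q[p, q] = -s
    Q[q, p] = s; Q[q, q] = c
    return Q.T @ S @ Q, V @ Q

A = np.array([[1.2, 0.3, 0.0],   # assumed input matrix (plays the role of F)
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 1.1]])
S = A.T @ A                      # symmetric positive definite start S^(0)
V = np.eye(3)
for _ in range(4):               # a fixed number of sweeps, as in step 1.3
    for p, q in [(0, 1), (0, 2), (1, 2)]:
        S, V = jacobi_step(S, V, p, q)
# S is now numerically diagonal, holding the squared singular values of A.
```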
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Constitutive Models %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Constitutive models}\label{seq:constitutive_models}
It is common practice in elastoplasticity theory to decompose the deformation gradient into its elastic and plastic parts $\boldsymbol{F} = \boldsymbol{F}_E \boldsymbol{F}_P$. The elastic part is recoverable; the plastic part is irreversibly lost. Plastic models are not covered here. They extend the energy densities discussed here with additional terms $\Psi(\boldsymbol{F}_E,\boldsymbol{F}_P)$. For more information refer to the literature \cite{ochsner2014elasto}.
\subsubsection{Linear elasticity}\label{sec:linear_elasticty}
The first aim will be to find the strain energy density $\Psi(\boldsymbol{\epsilon})$ of the strain $\bm{\epsilon}$. The most general linear stress-strain relationship is given by Hooke's Law in three dimensions:
\begin{equation}\label{eq:hook}
\sigma_{ij} = C_{ijkl} \epsilon_{kl},
\end{equation}
where $\boldsymbol{\sigma}$ and $\bm{\epsilon}$ are second-order tensors with $3 \times 3 = 9$ elements each. $C_{ijkl}$ is a fourth-order stiffness tensor with $(3 \times 3) \times (3 \times 3) = 81$ elements.
Assuming following symmetries reduces the tensors to 6 and 21 unique elements respectively:
\begin{enumerate}
\item Conservation of angular momentum: $\sigma_{ij} = \sigma_{ji} \Rightarrow C_{ijkl} = C_{jikl}$.
\item No-net-rotation: $\epsilon_{kl} = \epsilon_{lk} \Rightarrow C_{ijkl} = C_{ijlk}$.
\item \label{it:mixed_partials}Equivalence of second-order mixed partials of $\Psi$:
\begin{equation}
C _ { i j k l } = \frac { \partial ^ { 2 } { \Psi } } { \partial \epsilon _ { k l } \partial \epsilon _ { i j } } = \frac { \partial ^ { 2 } { \Psi } } { \partial \epsilon _ { i j } \partial \epsilon _ { k l } } = C _ { k l i j },
\end{equation}
\end{enumerate}
where item \ref{it:mixed_partials} holds for the strain energy density functional of a (hyper-)elastic material. This potential $\Psi$ is thus similar to potentials involved, for example, in gravitation, electrodynamics or fluids. The stress may then also be calculated by
\begin{equation}\label{eq:partial_energy}
\sigma _ { i j } = \sigma _ { i j } ( \boldsymbol{\epsilon} ) = \frac { \partial { \Psi } } { \partial \epsilon _ { i j } },
\end{equation}
if such a $\Psi$ is given. An isotropic (direction-independent) linear elastic material further has only three distinct entries $C _ {i j k l}$. Using Voigt notation, which collapses the index pairs $ij$ and $kl$ into single indices, equation \ref{eq:hook} can be rewritten as:
\begin{equation} \label{eq:voigt}
\left[ \begin{array} {c}
{\sigma_{11}}\\
{\sigma_{22}}\\
{\sigma_{33}}\\
{\sigma_{23}}\\
{\sigma_{13}}\\
{\sigma_{12}}\\
\end{array} \right]
= \left[ \begin{array} { c c c c c c }
{ C _ { 11 } } & { C _ { 12 } } & { C _ { 12 } } & 0 & 0 & 0 \\
{} & { C _ { 11 } } & { C _ { 12 } } & 0 & 0 & 0 \\
{} & {} & { C _ { 11 } } & 0 & 0 & 0 \\
{} & {} & {} & { C _ { 22 } } & 0 & 0 \\
{} & {} & {} & {} & { C _ { 22 } } & 0 \\
{sym} & {} & {} & {} & {} & { C _ { 22 } } \\
\end{array} \right]
\left[ \begin{array} {c}
{\epsilon_{11}}\\
{\epsilon_{22}}\\
{\epsilon_{33}}\\
{\gamma_{23}}\\
{\gamma_{13}}\\
{\gamma_{12}}\\
\end{array} \right].
\end{equation}
Experimental results commonly give Hooke's law in the inverse form
\begin{equation}
\boldsymbol{\epsilon} =
\epsilon _ { i j } = \frac { 1 } { E } \left[ ( 1 + \nu ) \sigma _ { i j } - \nu \sigma _ { k k } \delta _ { i j } \right]
= \boldsymbol{C}^{-1} \bm{\sigma}
\end{equation}
using engineering constants Young's modulus $E$ and Poisson ratio $\nu$. Inverting $\boldsymbol{C}^{-1}$ and switching to Lamé parameters $\lambda$ and $\mu$ results in the equation:
\begin{equation} \label{eq:stress_strain}
\sigma_{ij} = 2\mu \epsilon_{ij} + \lambda \text{tr}(\boldsymbol{\epsilon}) \delta_{ij}.
\end{equation}
Comparing with equation \ref{eq:voigt} leads to the coefficients $\gamma_{ij} = 2\epsilon_{ij} \text{ for } (i \neq j)$, $C_{11} = \lambda +2 \mu$, $C_{12} = \lambda$, and $C_{22} = \mu$. $\boldsymbol{\gamma}$ is also referred to as the engineering strain. Due to the relationship \ref{eq:partial_energy}, integrating \ref{eq:stress_strain} yields the model for linear elasticity in terms of the strain energy density function $\Psi_{LE}$:
\begin{equation}\label{eq:linear_elas}
\Psi_{LE} = \mu \Vert \boldsymbol{\epsilon} \Vert^2_F + \frac{\lambda}{2} \text{tr}^2(\boldsymbol{\epsilon}).
\end{equation}
\begin{flushright}\cite{MIT:LINEAR_ELASTICITY}\end{flushright}
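Relationship \ref{eq:partial_energy} between $\Psi_{LE}$ and the stress \ref{eq:stress_strain} can be checked with central finite differences; a sketch (illustration only; the Lamé parameters and the strain are assumed example values):

```python
import numpy as np

mu, lam = 1.0, 2.0   # assumed Lame parameters

# Psi_LE = mu ||eps||_F^2 + lambda/2 tr(eps)^2
def psi_le(eps):
    return mu * np.sum(eps**2) + 0.5 * lam * np.trace(eps)**2

eps = np.array([[0.010, 0.002, 0.000],   # assumed symmetric small strain
                [0.002, -0.005, 0.001],
                [0.000, 0.001, 0.020]])

# Closed form: sigma = 2 mu eps + lambda tr(eps) I.
sigma = 2.0 * mu * eps + lam * np.trace(eps) * np.eye(3)

# Finite-difference gradient d Psi / d eps_ij, entry by entry.
h = 1e-6
sigma_fd = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        d = np.zeros((3, 3))
        d[i, j] = h
        sigma_fd[i, j] = (psi_le(eps + d) - psi_le(eps - d)) / (2.0 * h)
```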
\subsubsection{Corotated hyperelasticity}\label{sec:cor_hyper}
The simplest strain measure, assumed by infinitesimal strain theory, is the small strain tensor: \begin{equation}
\boldsymbol{\epsilon}_l = \frac { 1 } { 2 } \left( \boldsymbol { F }_E + \boldsymbol { F }_E ^ { T } \right) - \boldsymbol { I }.
\end{equation}
The energy produced by equation \ref{eq:linear_elas} using the small strain tensor is not rotationally invariant w.r.t. $\boldsymbol{F}_E$: $\Psi_{LE}(\boldsymbol{\epsilon}_l(\boldsymbol{R}_{0} \boldsymbol{F}_E)) \neq \Psi_{LE}(\boldsymbol{\epsilon}_l(\boldsymbol{F}_E))$. Rigid body motions, however, do not result in strain and consequently do not need to be recovered from, so the energy should not change. Given the polar decomposition $\boldsymbol{F}_E = \boldsymbol{R}_E\boldsymbol{S}_E$, an alternate strain measure may be defined as:
\begin{equation}
\boldsymbol { \epsilon } \left( \boldsymbol { F }_E \right) = \boldsymbol { \epsilon }_l \left( \boldsymbol { R }_E ^ { T } \boldsymbol { F }_E \right) = \frac { 1 } { 2 } \left( \boldsymbol { R }_E ^ { T } \boldsymbol { F }_E + \left( \boldsymbol { R }_E ^ { T } \boldsymbol { F }_E \right) ^ { T } \right) - \boldsymbol { I } = \boldsymbol { S }_E - \boldsymbol { I }.
\end{equation}
Substituting $ {\boldsymbol{\epsilon}}$ into equation \ref{eq:linear_elas} leads to the energy definition of corotational hyperelasticity:
\begin{equation}\label{eq:corot_S}
\Psi_{CH} = \mu \| \boldsymbol{ S }_E - \boldsymbol { I } \| _ { F } ^ { 2 } + \frac { \lambda } { 2 } \operatorname { tr } ^ { 2 } ( \boldsymbol { S }_E - \boldsymbol { I } ).
\end{equation}
Using rotation-invariance of the Frobenius norm:
\begin{equation}
\Psi_{CH} = \mu \| \boldsymbol {F}_E - \boldsymbol {R}_E \| _ { F } ^ { 2 } + \frac { \lambda } { 2 } \operatorname { tr } ^ { 2 } \left( \boldsymbol { R }_E ^ { T } \boldsymbol { F }_E - \boldsymbol { I } \right).
\end{equation}
Even more insight is gained from the relationship to the singular values $\sigma _ i$ of $\boldsymbol{F}_E$, also called the principal stretches:
$$\operatorname {tr}\left(\boldsymbol{S}_E\right) = \sum _ { i = 1 } \sigma _ { i } = \operatorname{tr}\left(\boldsymbol{\Sigma}_E\right)$$
$$\| \boldsymbol{S}_E \| _ { F }^2 = \sum _ { i = 1 } \sigma _ { i } ^ { 2 } = \| \boldsymbol{\Sigma}_E\|_{F}^2$$
$$\Rightarrow \| \boldsymbol{S}_E\boldsymbol{-I} \| _ { F }^2 = \|\boldsymbol{S}_E \| _ { F }^2 - 2 \operatorname{tr}(\boldsymbol{S}_E) + \|\boldsymbol{I}\|_{F}^2 = \| \boldsymbol { \Sigma }_E - \boldsymbol { I } \| _ { F } ^ { 2 }$$
\begin{equation} \label{eq:corot_sing}
\Psi_{CH}(\boldsymbol{\Sigma}_E) = \mu \| \boldsymbol { \Sigma }_E - \boldsymbol { I } \| _ { F } ^ { 2 } + \frac { \lambda } { 2 } \operatorname { tr } ^ { 2 } ( \boldsymbol { \Sigma }_E - \boldsymbol { I } )
\end{equation}
Equation \ref{eq:corot_sing} uses the diagonal matrix $\boldsymbol{\Sigma}_E$ containing the principal stretches typically acquired by the singular value decomposition $\boldsymbol{F}_E \boldsymbol{= U \Sigma}_E\boldsymbol{ V ^T }.$ Since the energy density in \ref{eq:corot_sing} is a function of only three (singular) values, which describe stretch/compression of the material, isotropy of the material is underlined.
\begin{flushright}\cite{ADAMS:ELASTICITY}\end{flushright}
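The equality of the three forms of $\Psi_{CH}$ above is easily confirmed numerically; a sketch (illustration only; \texttt{numpy}, Lamé parameters and $\boldsymbol{F}_E$ are assumed example values):

```python
import numpy as np

mu, lam = 1.0, 2.0               # assumed Lame parameters
F = np.array([[1.2, 0.3, 0.0],   # assumed elastic deformation gradient
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 1.1]])
U, sig, Vt = np.linalg.svd(F)
R = U @ Vt                       # polar decomposition F = R S
S = Vt.T @ np.diag(sig) @ Vt
I = np.eye(3)

# The energy in terms of S, in terms of F and R, and in terms of
# the principal stretches must all agree.
psi_S = mu * np.sum((S - I)**2) + 0.5 * lam * np.trace(S - I)**2
psi_R = mu * np.sum((F - R)**2) + 0.5 * lam * np.trace(R.T @ F - I)**2
psi_sig = mu * np.sum((sig - 1.0)**2) + 0.5 * lam * np.sum(sig - 1.0)**2
```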
\subsubsection{Fixed corotated hyperelasticity}
Numerical implicit stepping algorithms mostly rely on the first and second derivatives of $\Psi$, chapter \ref{sec:newton}. Furthermore, an isotropic elastic model may be described in terms of its principal stretches $\sigma_1,\sigma_2,\sigma_3$, chapter \ref{sec:svd}. The rest configuration is reached when $\sigma_1=\sigma_2=\sigma_3 = 1$, i.e. no elastic forces are exerted because the material is completely relaxed (again).
Under material relaxation these stepping algorithms are attracted to a material-dependent primary contour. For instance, for a model that aims for high volume preservation (Poisson ratio $\nu =0.5$) the primary contour primarily enforces volume preservation ($ J_E = \sigma_1\sigma_2\sigma_3 \approx 1$); reaching the rest configuration is only the secondary goal.
The primary contour can be formalized as $\boldsymbol{v}\cdot \boldsymbol{g} = 0$ with the gradient $g_i = \frac{\partial \Psi}{\partial \sigma_i}$ and the eigenvector $\boldsymbol{v}$ of the Hessian $H_{ij} = \frac{\partial^2 \Psi}{\partial \sigma_i \partial \sigma_j}$ with the largest-magnitude eigenvalue. Problems arise in $\Psi_{CH}$ as the primary contour can easily cross into the inverted region ($\sigma_3 < 0$), leading to inverted configurations while relaxing; this is undesired behavior. Furthermore, the primary contour can lead into energy kinks at extreme stretches, causing oscillations.
Therefore, the authors of \cite{MPM:INVERT} propose the fixed corotated hyperelasticity energy density:
\begin{equation}
\Psi_{FCH} = \mu \|\boldsymbol{\Sigma}_E - \boldsymbol{I}\|^2_F + \frac{\lambda}{2}(J_E-1)^2.
\end{equation}
This leads to the primary contour $J_E=1$, which does not cross the inverted region as only one singular value may be negative.
\begin{flushright}\cite{MPM:INVERT}\end{flushright}
\subsection{Governing equations}
Starting from the governing equations this chapter will lead to the weak formulation core of any finite element and material point method.
\subsubsection{Conservation of mass}
Let the Eulerian mass density be $_t\rho(_t\boldsymbol{x},t)$. Similarly, let its (Lagrangian) pull back be $_0\rho(_0\boldsymbol{x},t)$.
After \ref{eq:volume_integral} they are related as:
\begin{equation} \label{eq:density_pull_back}
\int _ {_tB _ { \epsilon }}_t\rho(_t\boldsymbol{x},t)d_t\boldsymbol{x} = \int _ {_0B _ { \epsilon }} {^t_0J}_0\rho(_0\boldsymbol{x},t) d_0\boldsymbol{x}.
\end{equation}
An open ball $ _ { 0 }B _ { \epsilon } $ in the reference configuration has the same mass as its respective open ball $_tB_{\epsilon}$ in the current configuration. Keep in mind that both refer to a deformed state.
Conservation of mass dictates that mass does not depend on motion or time; only the volume occupied by this mass may change.
\begin{equation}
\frac{d}{dt} \int _ {_tB _ { \epsilon }} _t\rho(_t\boldsymbol{x},t) d_t\boldsymbol{x} = 0
\end{equation}
Equivalently this can be formulated with the constant undeformed initial mass in Lagrangian view:
\begin{equation}\label{eq:lagr_mass}
\left(\int _ {_tB _ { \epsilon }} _t\rho(_t\boldsymbol{x},t) d_t\boldsymbol{x} \stackrel{\text{\ref{eq:density_pull_back}}}{=} \right)
\int _ {_0B _ { \epsilon }} {^t_0J}_0\rho(_0\boldsymbol{x},t) d_0\boldsymbol{x}=
\int _ {_0B _ { \epsilon }} _0\rho(_0\boldsymbol{x},0)d_0\boldsymbol{x}.
\end{equation}
In the Eulerian view the conservation of mass is more difficult to develop and starts from the Lagrangian view. Since the integrals hold for arbitrary volumes, the integrands must agree pointwise; the integrals are therefore left out in the following:
\begin{equation}\label{eq:euler_density_evol}
\frac{\partial}{\partial t}(_0\rho^t_0J) = \frac{\partial _0\rho}{\partial t}{^t_0J} + \frac{\partial{^t_0J}}{\partial t} {_0\rho} = 0.
\end{equation}
The left term can immediately be pushed forward; the right term is harder:
\begin{equation}\label{eq:evol_jacobian}
\frac{\partial{J}}{\partial t}
= \frac{\partial{J}}{\partial {F_{ij}}} \frac{\partial {F_{ij}}}{\partial t}
\stackrel{\text{\ref{eq:det_deriv},\ref{eq:evol_def_grad}}}{=} {J}{F}^{-1}_{ji}\frac{\partial v_i}{\partial x_k}F_{kj}
= J \delta_{ik}\frac{\partial v_i}{\partial x_k} = J \frac{\partial v_i}{\partial x_i}.
\end{equation}
The determinant differentiation rule can be shown by expressing the determinant with Laplace's expansion and applying the derivative to it:
\begin{equation}\label{eq:det_deriv}
\frac{\partial J}{\partial F_{ij}} = \frac{\partial (F_{ik}\text{adj}(F)_{ki})}{\partial F_{ij}} = \text{adj}(F)_{ji} = JF^{-1}_{ji}.
\end{equation}
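Rule \ref{eq:det_deriv} can be verified by central finite differences; a sketch (illustration only, assumed invertible example matrix):

```python
import numpy as np

F = np.array([[1.3, 0.2, 0.0],   # assumed invertible matrix
              [0.1, 0.9, 0.3],
              [0.0, 0.1, 1.1]])
# Closed form: dJ/dF_ij = J F^-1_ji, i.e. the transposed inverse scaled by J.
analytic = np.linalg.det(F) * np.linalg.inv(F).T

# Central finite differences of det(F) w.r.t. each entry.
h = 1e-6
numeric = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        d = np.zeros((3, 3))
        d[i, j] = h
        numeric[i, j] = (np.linalg.det(F + d) - np.linalg.det(F - d)) / (2.0 * h)
```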
The time-evolution of the deformation gradient is:
\begin{equation}\label{eq:evol_def_grad}
\frac{\partial {{^t_0F_{ij}}}}{\partial t} =
\frac{\partial}{\partial t}\frac{\partial {^t_0\phi_i}}{\partial _0x_j}(_0\boldsymbol{x},t) = \frac{\partial {_0v_i}}{\partial _0x_j}(_0\boldsymbol{x},t) =
\frac{\partial {_tv_i}}{\partial _tx_k}(^t_0\phi(_0\boldsymbol{x},t),t)\frac{\partial {_tx_k}}{\partial {_0x_j}}(_0\boldsymbol{x},t).
\end{equation}
Pushing forward \ref{eq:euler_density_evol} with the result of \ref{eq:evol_jacobian} using the material derivative formulation \ref{eq:material_derivative} leads to the Eulerian conservation of mass:
\begin{equation}
\frac{D}{Dt}\rho(_t\boldsymbol{x},t) + \rho(_t\boldsymbol{x},t) \vec{\nabla} \cdot \boldsymbol{v}(_t\boldsymbol{x},t)=0.
\end{equation}
Commonly used is the Nabla operator: $\vec { \nabla } = \left( \frac { \partial } { \partial x _ { 1 } } , \dots , \frac { \partial } { \partial x _ { n } } \right)$.
\cite{MPM:COURSE}
\subsubsection{Conservation of momentum}
Continuum forces are divided into body and surface forces. A surface force acts upon the surface of the material $\partial _tB_\epsilon$, while a body force scales with the volume of the material $_tB_\epsilon$. Conservation of momentum may then be expressed similarly to the conservation of mass as:
\begin{equation}
\frac { d } { d t } \int _ {_tB _ { \epsilon} } \rho ( _t\boldsymbol{x} , t ) \boldsymbol{v} ( _t\boldsymbol{x} , t ) d _t\boldsymbol{x} = \int _ { \partial _tB _ {\epsilon } } \boldsymbol{\sigma} d_t\boldsymbol{A} (_t\boldsymbol{x} ) + \int _ { _tB_\epsilon } \boldsymbol{f} ^ {\text{body} } d _t\boldsymbol{x}.
\end{equation}
Angular momentum conservation, $\boldsymbol{\sigma}^T = \boldsymbol{\sigma}$, is already assumed here (cf. section \ref{sec:linear_elasticty}). Beginning with a mix of the Lagrangian and Eulerian views
$$\left(\frac { d } { d t } \int _ {_tB _ { \epsilon} } \rho ( _t\boldsymbol{x} , t ) \boldsymbol{v} ( _t\boldsymbol{x} , t ) d _t\boldsymbol{x} \stackrel{\text{\ref{eq:eulerian_general}}}{=} \right)
\frac { d } { d t } \int _ {_0B _ { \epsilon} } {^t_0J} \rho ( _0\boldsymbol{x} , t ) \boldsymbol{v} ( _0\boldsymbol{x} , t ) d _0\boldsymbol{x}$$
\begin{equation}
\stackrel{\text{\ref{eq:euler_density_evol}}}{=}
\int _ {_0B _ { \epsilon} } {^t_0J} \rho ( _0\boldsymbol{x} , t ) \boldsymbol{a} ( _0\boldsymbol{x} , t ) d _0\boldsymbol{x}
= \int _ { \partial _tB _ {\epsilon } } \boldsymbol{\sigma} d_t\boldsymbol{A} (_t\boldsymbol{x} ) + \int _ { _tB_\epsilon } \boldsymbol{f} ^ {\text{body} } d _t\boldsymbol{x}
\end{equation}
where conservation of mass (equation \ref{eq:euler_density_evol}) was applied.
The Eulerian push-forward of the left side combined with the divergence theorem becomes:
\begin{equation}\label{eq:eul_transport}
\int _ {_tB _ { \epsilon} } \rho ( _t\boldsymbol{x} , t ) \boldsymbol{a} ( _t\boldsymbol{x} , t ) d _t\boldsymbol{x} =
\int _ { _tB _ {\epsilon } } \vec{\nabla} \cdot \boldsymbol{\sigma} d_t\boldsymbol{x} + \int _ { _tB_\epsilon } \boldsymbol{f} ^ {\text{body} } d _t\boldsymbol{x}.
\end{equation}
The acceleration $_t\boldsymbol{a}$ is again defined via the material derivative \ref{eq:material_derivative}. Thus the Eulerian momentum balance equation becomes:
\begin{equation}
_t\rho \frac{D_t\boldsymbol{v}}{Dt} = \vec{\nabla} \cdot \boldsymbol{\sigma} +\boldsymbol{f} ^ {\text{body} }.
\end{equation}
One quantity remains to be defined for the Lagrangian view. The Cauchy stress $\boldsymbol{\sigma}$ lives in the current configuration; pulling it back leads to a stress measure named the first Piola-Kirchhoff stress
\begin{equation}
\int _ { \partial _tB _ {\epsilon } } _t\boldsymbol{\sigma} d_t\boldsymbol{A} (_t\boldsymbol{x} )
\stackrel{\text{\ref{eq:surface_integral}}}{=}
\int _ { \partial _0B _ {\epsilon } } _0\boldsymbol{\sigma}\boldsymbol{F}^{-T} Jd_0\boldsymbol{A} (_0\boldsymbol{x} )=
\int _ { \partial _0B _ {\epsilon } } \boldsymbol{P}d_0\boldsymbol{A} (_0\boldsymbol{x} ),
\end{equation}
denoted in the literature as:
\begin{equation}\label{eq:piola}
\boldsymbol{P} =
\frac{\partial \Psi}{\partial \boldsymbol{F}}=
\boldsymbol{\sigma}\boldsymbol{F}^{-T}J.
\end{equation}
In summary, the Lagrangian view of the momentum equation, stated in the initial configuration ($^0_0J=1$), reads:
\begin{equation}\label{eq:lagr_mom}
_0\rho(_0\boldsymbol{x},0)_0\boldsymbol{a}(_0\boldsymbol{x},t)
= \vec{\nabla} \cdot \boldsymbol{P}(_0\boldsymbol{x},t) + {_0\boldsymbol{f}} ^ {\text{body}}(_0\boldsymbol{x},t) {^t_0J}.
\end{equation}
For the stress-strain relationship $\boldsymbol{C}$ in chapter \ref{sec:linear_elasticty}, as well as at the start of this chapter, conservation of angular momentum was used to justify $\sigma_{ij} = \sigma_{ji}$. This can be shown as follows.
The description of angular momentum follows that of linear momentum ($ _t\boldsymbol{x}\times \ref{eq:eul_transport}$), where the product rule applies and the term $\frac{D_t\boldsymbol{x}}{Dt} \times {_t\boldsymbol{v}} = {_t\boldsymbol{v}} \times {_t\boldsymbol{v}} = 0$ vanishes:
\begin{equation}
\int _ {_tB _ { \epsilon} } _t\boldsymbol{x} \times \rho ( _t\boldsymbol{x} , t ) \boldsymbol{a} ( _t\boldsymbol{x} , t ) d _t\boldsymbol{x} = \int _ { \partial _tB _ {\epsilon } }_t\boldsymbol{x} \times \boldsymbol{\sigma}^T d_t\boldsymbol{A} (_t\boldsymbol{x} ) + \int _ { _tB_\epsilon }_t\boldsymbol{x} \times \boldsymbol{f} ^ {\text{body} } d _t\boldsymbol{x}.
\end{equation}
In component form this becomes:
\begin{equation}\label{eq:gen_ang_mom}
\int _ {_tB _ { \epsilon} } \varepsilon_{ijk} \rho x_j a_k d _t\boldsymbol{x} = \int _ { \partial _tB _ {\epsilon } } \varepsilon_{ijk} x_j \sigma_{mk} dA_m(_t\boldsymbol{x} ) + \int _ { _tB_\epsilon } \varepsilon_{ijk} x_j f_k ^ {\text{body} } d _t\boldsymbol{x}.
\end{equation}
The divergence theorem is again applied to the surface forces $\left(\vec{\nabla} = \frac{\partial}{\partial x_m}\right)$:
$$
\int _ { \partial _tB _ {\epsilon } } \varepsilon_{ijk} x_j \sigma_{mk} dA_m (_t\boldsymbol{x} ) =
\int _ {_tB _ {\epsilon } } \varepsilon_{ijk} \frac{\partial(x_j \sigma_{mk})}{\partial x_m} d_t\boldsymbol{x}
$$
\begin{equation}\label{eq:surface_ang_mom}
= \int _ {_tB _ {\epsilon } } \varepsilon_{ijk} \left(\delta_{jm} \sigma_{mk}+ x_j\frac{ \partial \sigma_{mk}}{\partial x_m}\right) d_t\boldsymbol{x}.
\end{equation}
The conservation of momentum (equation \ref{eq:eul_transport}) can then be applied to the result of plugging \ref{eq:surface_ang_mom} back into \ref{eq:gen_ang_mom} leaving only:
\begin{equation}
\int _ {_tB _ {\epsilon } } \varepsilon_{ijk} \sigma_{jk} d_t\boldsymbol{x} = 0.
\end{equation}
Since this holds for arbitrary volumes $_tB_\epsilon$, the integrand must vanish pointwise; multiplying by ${\varepsilon_{irs}}$ enables identity \ref{eq:compact_levi}. The Cauchy stress is constrained to
\begin{equation}
\sigma_{ij} =\sigma_{ji}
\end{equation}
as assumed before. Note that $\boldsymbol{P}$ however is not constrained to be symmetric.
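The Levi-Civita contraction step can be verified symbolically, e.g. with SymPy (the symbol names below are illustrative):

```python
import sympy as sp
from sympy import LeviCivita

# Symbolic 3x3 Cauchy stress with no symmetry assumed a priori.
sigma = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f"sigma_{i}{j}"))

for r in range(3):
    for s in range(3):
        # Contract eps_{irs} eps_{ijk} sigma_{jk} over i, j, k.
        contraction = sum(
            LeviCivita(i, r, s) * LeviCivita(i, j, k) * sigma[j, k]
            for i in range(3) for j in range(3) for k in range(3)
        )
        # The contraction collapses to sigma_{rs} - sigma_{sr}, so the
        # vanishing of eps_{ijk} sigma_{jk} forces sigma_{rs} = sigma_{sr}.
        assert sp.expand(contraction - (sigma[r, s] - sigma[s, r])) == 0
```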
\begin{flushright}\cite{MPM:COURSE}\cite{MIT:CONTINUUM_MECHANICS}\end{flushright}
\subsubsection{Weak formulation}\label{sec:weak}
Before deriving the weak form, an explanation of what it achieves is in order. The previously presented governing equations are written in the strong form: a solution needs to satisfy the equation exactly on the whole domain and is, in general, influenced by the whole domain. Such solutions can be found analytically for simplified models; these act as a ground truth for numerical methods like the material point method. Analytical methods, however, cannot handle complex problems, and numerical solutions try to overcome that hurdle.
The complete mathematical description of the weak formulation is beyond this thesis; textbooks on numerical methods and finite elements provide one (\cite{bathe2006finite}, \cite{dahmen2008numerik}). For simplicity: the weak formulation combined with a Galerkin discretization restricts the globality of the strong form through so-called test functions $_0\boldsymbol{q}_h$. A test function generally has only limited support, i.e. $ _0\boldsymbol{q}_h\neq 0$ only on a small subset of the whole domain $\Omega^0$. It thus gathers information from a local neighborhood rather than the whole domain.
The weak formulation requires an observation of the dot product. Consider the conservation of momentum in the strong form as formulated before \ref{eq:lagr_mom} as:
\begin{equation}\label{eq:strong}
_0\rho(_0\boldsymbol{x},0)_0\boldsymbol{a}(_0\boldsymbol{x},t)
= \vec{\nabla} \cdot \boldsymbol{P}(_0\boldsymbol{x},t) + {_0\boldsymbol{f}} ^ {\text{body}}(_0\boldsymbol{x},t) {^t_0J}.
\end{equation}
Multiply both sides with the dot product of an arbitrary function $_0\boldsymbol{q}(\cdot,t):\Omega^0 \rightarrow \mathbb{R}^d$ and integrate over $\Omega^0$. If a solution solves the balance of \ref{eq:strong} then it also solves:
$$
\int _ { \Omega ^ { 0 } } _0q _ { i } ( _0\boldsymbol { x } , t ) \left(_0\rho (_0\boldsymbol{x} , 0 ) _0a _{ i } (_0\boldsymbol{x} , t ) - {_0\boldsymbol{f}} ^ {\text{body}}(_0\boldsymbol{x},t) {^t_0J}\right) d _0\boldsymbol{x}
$$
\begin{equation}
=\int _ { \Omega ^ { 0 } } _0q _ { i } ( _0\boldsymbol { x } , t ) \frac{\partial P _ {ij}}{\partial x_j} ( _0\boldsymbol{x} , t ) d_0\boldsymbol{x}.
\end{equation}
With the help of integration by parts in multiple dimensions, the derivative moves over to the test function. Balancing the order of derivatives by moving one derivative onto the test function is the main motive of the weak form:
$$
\int _ { \Omega ^ { 0 } } \frac{\partial (_0q _ { i } ( _0\boldsymbol { x } , t ) P _ {ij} ( _0\boldsymbol{x} , t ))}{\partial _0x_j}
- \frac{\partial _0q _ { i }}{\partial _0x_j} ( _0\boldsymbol { x } , t ) P _ {ij}( _0\boldsymbol{x} , t )
d_0\boldsymbol{x}.
$$
The divergence theorem converts the first term into a boundary integral:
\begin{equation}
\int _ { \partial \Omega ^ { 0 } } _0q _ { i } ( _0\boldsymbol { x } , t ) P _ {ij} ( _0\boldsymbol{x} , t)
d_0\boldsymbol{A}(_0\boldsymbol{x})
-\int _ { \Omega ^ { 0 } } \frac{\partial _0q _ { i }}{\partial _0x_j} ( _0\boldsymbol { x } , t ) P _ {ij}( _0\boldsymbol{x} , t )
d_0\boldsymbol{x}.
\end{equation}
Mathematically the boundary integral serves as a boundary condition which is set by the specific problem (e.g. context of the simulation). Finally putting together the previous results gives the weak form of force balance in the Lagrangian view:
$$
\int _ { \Omega ^ { 0 } } _0q _ { i } ( _0\boldsymbol { x } , t ) \left(_0\rho (_0\boldsymbol{x} , 0 ) _0a _{ i } (_0\boldsymbol{x} , t ) - {_0f_i} ^ {\text{body}}(_0\boldsymbol{x},t) {^t_0J}\right) d _0\boldsymbol{x}
$$
\begin{equation} \label{eq:lagr_force_bal}
= \int _ { \partial \Omega ^ { 0 } } _0q _ { i } ( _0\boldsymbol { x } , t ) P _ {ij} ( _0\boldsymbol{x} , t)
d_0\boldsymbol{A}(_0\boldsymbol{x})
-\int _ { \Omega ^ { 0 } } \frac{\partial _0q _ { i }}{\partial _0x_j} ( _0\boldsymbol { x } , t ) P _ {ij}( _0\boldsymbol{x} , t )
d_0\boldsymbol{x}.
\end{equation}
In the material point method stress computations are more naturally done in the current configuration or equally in terms of the Cauchy stress as seen in chapter \ref{sec:linear_elasticty}. Pushing the equation forward to Eulerian view with the push forward $_t\boldsymbol{q}:\Omega^t \rightarrow \mathbb{R}^d$ of $_0\boldsymbol{q}$ is only a problem for the last term:
$$
\int _ { \Omega ^ { 0 } } \frac{\partial _0q _ { i }}{\partial _0x_j} ( _0\boldsymbol { x } , t ) P _ {ij} d_0\boldsymbol{x}
\stackrel{\text{\ref{eq:piola}, \ref{eq:volume_integral}}}{=}
\int _ { \Omega ^ { 0 } } \left({^t_0}F_{kj}\frac{\partial _tq _ { i }}{\partial _tx_k} ( _t\boldsymbol { x } , t )\right) \left({_t^0}F_{kj} \sigma _ {ik} {_0^tJ}\right) {_t^0J} d_t\boldsymbol{x}
$$
\begin{equation} = \int _ { \Omega ^ { t } } \frac{\partial _tq _ { i }}{\partial _tx_k} ( _t\boldsymbol { x } , t )\sigma _ {ik} d_t\boldsymbol{x}.
\end{equation}
This completes the Eulerian view to be:
$$
\int _ { \Omega ^ { t } } _tq _ { i } ( _t\boldsymbol { x } , t ) \left(_t\rho (_t\boldsymbol{x} , t) _ta _{ i } (_t\boldsymbol{x} , t ) - {_tf_i} ^ {\text{body}}(_t\boldsymbol{x},t)\right) d _t\boldsymbol{x}
$$
\begin{equation}\label{eq:eul_force_bal}
= \int _ { \partial \Omega ^ { t } } _tq _ { i } ( _t\boldsymbol { x } , t ) \sigma _ {ij} d_tA_j(_t\boldsymbol{x}) - \int _ { \Omega ^ { t } } \frac{\partial _tq _ { i }}{\partial _tx_k} ( _t\boldsymbol { x } , t )\sigma _ {ik} d_t\boldsymbol{x}.
\end{equation}
\begin{flushright}\cite{MPM:COURSE}\cite{strang2007computational}\cite{bathe2006finite}\end{flushright}
\subsection{Material point method}\label{sec:mat_point}
The key idea of the material point method is to use (Lagrangian) particles as a consistent storage of material properties. All stress based forces are computed on a Eulerian grid however. This grid does not store any material properties and is therefore often referred to as a scratch pad.
As a corollary there needs to be a way to transfer information from a particle to the neighboring grid cells, which also induces switching from the Lagrangian to the Eulerian view. After stresses are computed, the relevant information must be transferred back from the grid to the particles. Advection is typically hard to do in Eulerian/FE-like methods and causes many problems further down the development pipeline; in a Lagrangian view, particle advection is trivial.
It is very important that the two transfers as well as the grid solver are in compliance with all governing equations. While the grid solver will be derived from the weak form of the governing equation the transfers also need to be chosen in a way that conserve the properties defined in them.
\cite{MPM:COURSE}
\subsubsection{Interpolation weights}
The choice of interpolation weights is flexible. Nevertheless, a kernel $w(\boldsymbol{x})$ must satisfy some important properties to qualify for MPM:
\begin{enumerate}
\item Partition of unity:
\begin{equation}\label{eq:partition_unity}
\sum _ { i } w \left( { \boldsymbol{x} } - { \boldsymbol{x} } _ { i } ^ { n } \right) = 1.
\end{equation}
\item Identity relation:
\begin{equation}
\sum _ { i } { \boldsymbol{x} } _ { i } ^ { n } w \left( { \boldsymbol{x} } - {\boldsymbol{x}} _ { i } ^ { n } \right) = { \boldsymbol{x} }.
\end{equation}
\item Non-negativity: $w \geq 0$. Negative weights can cause severe instability or non-physical behavior, unlike in FEM where they are used. \cite{gao2017adaptive}
\item Limited local support to reduce the number of terms in the discretization, chapter \ref{sec:discretization}.
\item $C^1$-continuity s.t. $\nabla w$ is continuous.
\end{enumerate}
As a reminder of chapter \ref{sec:notation}: $\boldsymbol{x}_p$ refers to a particle's position. $\boldsymbol{x}_i$ to a grid cell's position.
For a more general discussion refer to \cite{gao2017adaptive}. Often dyadic products of one-dimensional interpolation functions suffice
\begin{equation}
w(\boldsymbol{x}-\boldsymbol{x}_i^n) = w_i^n(\boldsymbol{x}) = w\left(\frac{1}{h}\left(x-x_i^n\right)\right)w\left(\frac{1}{h}\left(y-y_i^n\right)\right)w\left(\frac{1}{h}\left(z-z_i^n\right)\right)
\end{equation}
\begin{equation}
\nabla w_{i}^n(\boldsymbol{x})
=\frac{1}{h}
\left(
\begin{array}{l}
w^{\prime}\left(\frac{1}{h}(x - x_i^n)\right) w\left(\frac{1}{h}(y - y_i^n)\right) w\left(\frac{1}{h}(z - z_i^n)\right) \\
w\left(\frac{1}{h}(x - x_i^n)\right) w^{\prime}\left(\frac{1}{h}(y - y_i^n)\right) w\left(\frac{1}{h}(z - z_i^n)\right) \\
w\left(\frac{1}{h}(x - x_i^n)\right) w\left(\frac{1}{h}(y - y_i^n)\right) w^{\prime}\left(\frac{1}{h}(z - z_i^n)\right)
\end{array}
\right)
\end{equation}
where $h$ is the grid spacing (of a uniform grid). An interpolation function often employed is a cubic B-spline ($C^2$):
\begin{equation}\label{eq:cubic_weights}
w(x) = \left\{ \begin{array} { l l } { \frac { 1 } { 2 } | x | ^ { 3 } - | x | ^ { 2 } + \frac { 2 } { 3 } } & { 0 \leqslant | x | < 1 } \\ { \frac { 1 } { 6 } ( 2 - | x | ) ^ { 3 } } & { 1 \leqslant | x | < 2 } \\ { 0 } & { 2 \leqslant | x | } \end{array} \right. .
\end{equation}
Since these functions are used to weight (or filter) particles, the notation may be shortened to $ w_{ip}^n = w(\boldsymbol{x}_p^n-\boldsymbol{x}_i^n)$.
\cite{MPM:APIC}\cite{MPM:COURSE}\cite{steffen2008analysis}\cite{gao2017adaptive}
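The listed kernel properties can be checked numerically for the cubic B-spline \ref{eq:cubic_weights}. A minimal 1D sketch (grid spacing, node range and the particle position are arbitrary illustration values):

```python
import numpy as np

def w(x):
    """Cubic B-spline kernel (per component), equation (cubic_weights)."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x < 1, 0.5 * x**3 - x**2 + 2 / 3,
                    np.where(x < 2, (2 - x)**3 / 6, 0.0))

h = 0.1                          # grid spacing of a uniform 1D grid
nodes = np.arange(-4, 5) * h     # node positions covering the stencil
x_p = 0.037                      # an arbitrary particle position

weights = w((x_p - nodes) / h)
assert np.isclose(weights.sum(), 1.0)             # partition of unity
assert np.isclose((weights * nodes).sum(), x_p)   # identity relation
assert (weights >= 0).all()                       # non-negativity
```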
\subsubsection{Mass transfer}\label{sec:mass}
Each material point is assigned an initial volume $_0V_p$ as well as an initial mass $_0m_p$. The volume a material point occupies may change in time according to equation \ref{eq:j}: ${^t_0}J {_0V_p} = {_tV_p}$. Due to conservation of mass \ref{eq:lagr_mass}, however, its initial mass stays constant, i.e. there is only a transfer to the grid and no transfer back.
A transfer of mass to the grid may then be expressed as:
\begin{equation}\label{eq:mpm:mass}
m_i = \sum_p w_{ip}m_p.
\end{equation}
Showing $\sum_i m_i =\sum_p m_p$ is a complete proof that this transfer fulfills the conservation of mass: since there is no transfer back, no information can be lost on the particles, so proving that the transfer to the grid is conserving suffices. The stress-based solver may manipulate this information further. Mass-lumping strategies typically use this information directly, equation \ref{eq:mass_discr}.
\begin{equation}
\sum_i m_i
\stackrel{\text{\ref{eq:mpm:mass}}}{=}
\sum _i \sum_p w_{ip} m_p =
\sum _p m_p \sum_i w_{ip}
\stackrel{\text{\ref{eq:partition_unity}}}{=}
\sum_p m_p
\end{equation}
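This conservation argument can be reproduced in a short 1D sketch (particle counts, positions and masses are arbitrary illustration values; `w` is the cubic B-spline of equation \ref{eq:cubic_weights}):

```python
import numpy as np

def w(x):
    # cubic B-spline kernel, per component
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x**3 - x**2 + 2 / 3,
                    np.where(x < 2, (2 - x)**3 / 6, 0.0))

rng = np.random.default_rng(0)
h = 1.0
nodes = np.arange(-2, 13, dtype=float) * h   # 1D grid, padded so every stencil fits
x_p = rng.uniform(0.0, 10.0, size=50)        # particle positions
m_p = rng.uniform(0.5, 2.0, size=50)         # particle masses

# P2G transfer: m_i = sum_p w_ip m_p (equation mpm:mass)
m_i = np.array([(w((x_i - x_p) / h) * m_p).sum() for x_i in nodes])

# Total grid mass equals total particle mass:
assert np.isclose(m_i.sum(), m_p.sum())
```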
\begin{flushright}\cite{MPM:APIC}\cite{MPM:COURSE}\end{flushright}
\subsubsection{APIC-transfers}\label{sec:apic}
The momentum transfer round trip could be defined as:
\begin{enumerate}
\item Particle to grid momentum transfer:
\begin{equation}\label{eq:lin_p2g}
(m\boldsymbol{v})_i^n = \sum_p w_{ip}m_p\boldsymbol{v}^n_p.
\end{equation}
\item Factoring out mass:
\begin{equation}
\boldsymbol{v}_i^n = \frac{(m\boldsymbol{v})_i^n}{m_i^n}.
\end{equation}
\item Grid to particle transfer, either pure ($\alpha \in \{0,1\}$) or a combination ($\alpha \in ]0;1[$) of:
\begin{equation}
\boldsymbol{v}_p^{n+1} = \alpha\boldsymbol{v}_{p,{PIC}}^{n+1} + (1-\alpha)\boldsymbol{v}_{p,{FLIP}}^{n+1}
\end{equation}
\begin{equation}\label{eq:PIC}
\boldsymbol{v}_{p,{PIC}}^{n+1} = \sum_i w_{ip}\boldsymbol{v}_i^{n+1}.
\end{equation}
\begin{equation}\label{eq:FLIP}
\boldsymbol{v}_{p,{FLIP}}^{n+1} = \boldsymbol{v}_p^{n} + \sum_i w_{ip}(\boldsymbol{v}_i^{n+1}-\boldsymbol{v}_i^{n}).
\end{equation}
\end{enumerate}
While $PIC$-transfers are very stable, they suffer from excessive (energy) dissipation because the whole quantity is interpolated twice (\ref{eq:lin_p2g}, \ref{eq:PIC}). This causes a heavy loss of angular momentum and velocity modes.
$FLIP$-transfers avoid dissipation and the loss of angular momentum by updating the velocity only with a difference (\ref{eq:FLIP}). However, velocity modes not represented on the grid may cause unpredictable and unstable behavior in subsequent steps. Therefore, often a combination of both is taken.
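The grid-to-particle blend above can be condensed into a short sketch (the function name, the dense weight matrix `w_ip` and the test values are illustrative assumptions):

```python
import numpy as np

def blend_g2p(w_ip, v_p_old, v_i_old, v_i_new, alpha):
    """Grid-to-particle velocity update blending PIC and FLIP.

    w_ip: (n_particles, n_nodes) weights, rows summing to one;
    v_*: velocities as (n, d) arrays; alpha = 1 is pure PIC, 0 pure FLIP."""
    v_pic = w_ip @ v_i_new                          # full re-interpolation
    v_flip = v_p_old + w_ip @ (v_i_new - v_i_old)   # only the increment
    return alpha * v_pic + (1 - alpha) * v_flip

# Sanity check: with unchanged grid velocities FLIP keeps the old particle
# velocity, while PIC replaces it by the smoothed interpolant.
w_ip = np.array([[0.5, 0.5]])
v_p = np.array([[3.0, 0.0]])
v_i = np.array([[2.0, 0.0], [0.0, 0.0]])
assert np.allclose(blend_g2p(w_ip, v_p, v_i, v_i, alpha=0.0), [[3.0, 0.0]])
assert np.allclose(blend_g2p(w_ip, v_p, v_i, v_i, alpha=1.0), [[1.0, 0.0]])
```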
$APIC$ builds on top of the very stable $PIC$-transfers and effectively only adds an extra term of the Taylor series to increase accuracy. This extra term $\boldsymbol{C}_p$ may in short be referred to as the velocity derivative. The local velocity field around a particle is then characterized by the affine function $\boldsymbol{v}(\boldsymbol{x}) = \boldsymbol{v}_p + \boldsymbol{C}_p(\boldsymbol{x}-\boldsymbol{x}_p)$.
Motivated by the theory of angular momentum and moment of inertia one can define a quantity
\begin{equation}
\boldsymbol { D } _ { p } ^ { n } = \sum _ { i } w _ { i p } ^ { n } ( \boldsymbol { x } _ { i } ^ { n } - \boldsymbol { x } _ { p } ^ { n } ) ( \boldsymbol { x } _ { i } ^ { n } - \boldsymbol { x } _ { p } ^ { n } ) ^ { T }
\end{equation}
which is similar to the classically known inertia tensor:
\begin{equation}
\boldsymbol{I} _ { p } = - \sum _ { i } m _ { i } [\boldsymbol{x} _ { i }-\boldsymbol{x}_{p}][\boldsymbol{x} _ { i }-\boldsymbol{x}_{p}]
\end{equation}
$$
=
\sum _ { i } m _ { i }((\boldsymbol{x} _ { i }-\boldsymbol{x}_{p})^T(\boldsymbol{x} _ { i }-\boldsymbol{x}_{p})\boldsymbol{I} -(\boldsymbol{x} _ { i }-\boldsymbol{x}_{p})(\boldsymbol{x} _ { i }-\boldsymbol{x}_{p})^T)
$$
Here $[\boldsymbol{a}]$ denotes the cross-product matrix, $[a]_{\alpha\gamma} = \varepsilon_{\alpha\beta\gamma}a_{\beta}$, and $\boldsymbol{I}$ the identity matrix. Bear in mind that $\boldsymbol{D}_p^n$ does not include a mass and is defined for an affine motion instead of an angular motion.
In classical mechanics the angular velocity $\boldsymbol{\omega_p}$ can be then described using the inertia tensor $\boldsymbol{I}_p$ with the help of the angular momentum $\boldsymbol{L}_p$:
\begin{equation}
\boldsymbol{\omega}_p = \boldsymbol{I}_p^{-1} \boldsymbol{L}_p.
\end{equation}
This motivates the velocity derivative to be equally defined by a similar relationship where $\boldsymbol{B}^n_p$ holds momentum information.
\begin{equation}
\boldsymbol{C}^n_p = (\boldsymbol{D}_p^n)^{-1} \boldsymbol{B}^n_p.
\end{equation}
The transfers of the $APIC$ scheme are then summarized:
\begin{enumerate}
\item Particle to grid:
\begin{equation}\label{eq:apic_mom_p2g}
(m\boldsymbol{v})^n_i = \sum_p w^n_{ip}m_p (\boldsymbol { v } _ { p } ^ { n } + \boldsymbol { B } _ { p } ^ { n } ( \boldsymbol { D } _ { p } ^ { n } ) ^ { - 1 } ( \boldsymbol { x } _ { i } ^ { n } - \boldsymbol { x } _ { p } ^ { n } ) )
\end{equation}
\item Factoring out mass:
\begin{equation}\label{eq:mass_divide}
\boldsymbol{v}_i^n = \frac{(m\boldsymbol{v})_i^n}{m_i^n}.
\end{equation}
\item Grid to particle transfer (in a $PIC$ manner), where in contrast the new particle position $\boldsymbol{x}_p$ also needs to be interpolated:
\begin{equation}
\boldsymbol{x}_{p}^{n+1} = \sum_i w_{ip}\boldsymbol{x}_i^{n+1}
\end{equation}
\begin{equation}
\boldsymbol{v}_{p}^{n+1} = \sum_i w_{ip}\boldsymbol{v}_i^{n+1}
\end{equation}
$$
{^-}\Delta \boldsymbol{x} = \boldsymbol { x } _ { i } ^ { n } - \boldsymbol { x } _ { p } ^ { n } + \boldsymbol { x }_ { i } ^ { n + 1 } - \boldsymbol { x } _ { p } ^ { n + 1 },
\quad {^+}\Delta \boldsymbol{x} = \boldsymbol { x } _ { i } ^ { n } - \boldsymbol { x } _ { p } ^ { n } - { \boldsymbol { x } } _ { i } ^ { n + 1 } + \boldsymbol { x } _ { p } ^ { n + 1 }
$$
\begin{equation}\label{eq:apic_mom_g2p}
\boldsymbol { B } _ { p } ^ { n + 1 } = \frac { 1 } { 2 } \sum _ { i } w _ { i p } ^ { n } \left(\boldsymbol { v }_ { i } ^ { n + 1 } ({^-}\Delta \boldsymbol{x}) ^ { T } + {^+}\Delta \boldsymbol{x} ( { \boldsymbol { v } } _ { i } ^ { n + 1 } ) ^ { T }\right).
\end{equation}
\end{enumerate}
For a full proof that these transfers preserve linear and angular momentum, consult \cite{MPM:APIC}. For the choice of dyadic products of cubic B-splines (\ref{eq:cubic_weights}), $\boldsymbol{D}_p$ takes on the simple form:
\begin{equation}
\boldsymbol{D}_p^n = \frac { 1 } { 3 } h ^ { 2 } \boldsymbol { I }.
\end{equation}
A simple proof (Appendices: \ref{app:dp_proof}) cancelling out the numerous polynomials can be done for instance using SymPy (\cite{Sympy}).\cite{MPM:APIC}\cite{MPM:OLD_APIC}
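In 1D this cancellation can be reproduced in a few lines of SymPy. The sketch below assumes a particle with fractional offset $u \in (0,1)$ between two nodes, so the four stencil nodes lie at distances $u+1$, $u$, $1-u$ and $2-u$:

```python
import sympy as sp

u, h = sp.symbols("u h", positive=True)   # u: fractional offset of x_p, 0 < u < 1

# Cubic B-spline weights of the four stencil nodes at offsets -1, 0, 1, 2
# (the two outer nodes use the (2-|x|)^3/6 branch, the inner two the |x|<1 branch):
weights = [(1 - u)**3 / 6,
           u**3 / 2 - u**2 + sp.Rational(2, 3),
           (1 - u)**3 / 2 - (1 - u)**2 + sp.Rational(2, 3),
           u**3 / 6]
offsets = [h * (i - u) for i in (-1, 0, 1, 2)]   # x_i - x_p

assert sp.simplify(sum(weights) - 1) == 0        # partition of unity

# D_p = sum_i w_ip (x_i - x_p)^2 collapses to h^2/3 for every offset u:
D_p = sum(w_i * dx**2 for w_i, dx in zip(weights, offsets))
assert sp.simplify(D_p - h**2 / 3) == 0
```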
\subsubsection{CFL condition} \label{sec:cfl}
The CFL condition is prominent in FEM: for a stable integration a particle should not travel farther than the grid spacing $h$ in a discrete time step $\Delta t$. $\Delta t$ is thus limited by:
\begin{equation}
\Delta t \leq \frac{h}{\|\boldsymbol{v}^n_i\|_2}.
\end{equation}
Assuming $\|\boldsymbol{x}_i^n -\boldsymbol{x}_p^n\| \leq \kappa h$, where $\kappa$ is determined by the interpolation stencil support (cubic 3D: $\kappa = 2\sqrt{3}$) and additionally assuming $\boldsymbol{D}_p^n = k\boldsymbol{I} \Rightarrow (\boldsymbol{D}_p^n)^{-1} = \frac{1}{k}\boldsymbol{I}$ (cubic 3D: $k=\frac{1}{3}h^2$),
$\|\boldsymbol{v}^n_i\|_2$ can be estimated on the particles, whose number is typically lower. Given equation \ref{eq:apic_mom_p2g} this leads to the following estimate:
$$
\|\boldsymbol{v}^n_i\|_2 \leq \frac{1}{m_i^n}\left(\sum _ { p } w _ { i p } ^ { n } m _ { p } \| \boldsymbol { v } _ { p } ^ { n } \|_2 + \sum _ { p } w _ { i p } ^ { n } m _ { p } \| \boldsymbol { B } _ { p } ^ { n } \| _ { F } \| ( \boldsymbol { D } _ { p } ^ { n } ) ^ { - 1 } ( \boldsymbol { x } _ { i } ^ { n } - \boldsymbol { x } _ { p } ^ { n } ) \|_2\right)
$$
\begin{equation}
\leq \max _ { p } \left( \| \boldsymbol { v } _ { p } ^ { n } \|_2 + \frac { \kappa } { k } h \| \boldsymbol { B } _ { p } ^ { n } \| _ { F } \right).
\end{equation}
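Such a time-step estimate might be sketched as follows (array shapes, function name and the default arguments are illustrative assumptions, not from the source):

```python
import numpy as np

def cfl_dt(h, v_p, B_p, kappa=2 * np.sqrt(3), k=None):
    """CFL time-step estimate from particle data (cubic splines, 3D defaults).

    v_p: (n, 3) particle velocities; B_p: (n, 3, 3) APIC affine matrices."""
    k = h**2 / 3 if k is None else k                  # D_p = k * I
    speed = np.linalg.norm(v_p, axis=1)               # ||v_p||_2
    affine = np.linalg.norm(B_p, axis=(1, 2))         # ||B_p||_F
    v_max = np.max(speed + (kappa / k) * h * affine)  # bound on ||v_i||_2
    return h / v_max

# A particle at unit speed with no affine part yields dt = h:
assert np.isclose(cfl_dt(0.5, np.array([[1.0, 0, 0]]), np.zeros((1, 3, 3))), 0.5)
```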
\begin{flushright}\cite{MPM:APIC}\end{flushright}
\subsection{Discretization}\label{sec:discretization}
The weak form of the force balance (\ref{eq:lagr_force_bal}, \ref{eq:eul_force_bal}) implies the following description, preferable for MPM:
\begin{equation} \label{eq:weak_mpm}
\int _ { \Omega ^ { 0 } } (_0q _ \alpha) (_0\rho_0) (_0a _\alpha) d _0\boldsymbol{x}= \int _ { \partial \Omega ^ { t^n } } _tq _ \alpha \sigma _ {\alpha\beta} d_tA_\beta(_t\boldsymbol{x}) - \int _ { \Omega ^ { t^n} } \frac{\partial _tq _ \alpha}{\partial _tx_\beta} \sigma _ {\alpha\beta} d_t\boldsymbol{x}.
\end{equation}
The boundary integral mostly accounts for collisions and will be ignored here. \cite{MPM:OPTIMI_INTEGR} discusses level-set collisions via collision constraints, object penalty collisions and penalty self-collisions. Such collision treatment would have to be incorporated into the solve. A simpler though less accurate method, assumed for now, is to process particle collisions separately in a typical computer graphics manner.
\subsubsection{Discretize time}
Any integrator conserving linear and angular momentum could be used to discretize time. The class of time integrators used here is characterized by \begin{equation}\label{eq:midpoint}
\frac{y ^ { n + 1 } - y ^ { n }}{ \Delta t} = f^{n+\lambda} = f \left(t^n+\lambda\Delta t, (1-\lambda) y ^ { n } +\lambda y^{n+1}\right)
\end{equation}
for a differential equation of order one:
\begin{equation}
\frac{\partial y}{\partial t}( t ) = f (t,y(t)) , \quad y (0) = y _ { 0 }.
\end{equation}
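This one-parameter family can be sketched as a simple fixed-point solve (function name and iteration count are illustrative assumptions; in practice the implicit cases would rather use a Newton solve):

```python
def integrate_step(f, t, y, dt, lam, iters=50):
    """One step of the one-parameter family: solves
    y1 = y + dt * f(t + lam*dt, (1-lam)*y + lam*y1) by fixed-point iteration.
    lam = 0: explicit Euler, lam = 1/2: implicit midpoint, lam = 1: implicit Euler."""
    y1 = y + dt * f(t, y)                    # explicit predictor
    for _ in range(iters):
        y1 = y + dt * f(t + lam * dt, (1 - lam) * y + lam * y1)
    return y1

# For y' = -y the implicit midpoint rule gives y1 = y0 * (1 - dt/2)/(1 + dt/2):
y1 = integrate_step(lambda t, y: -y, 0.0, 1.0, 0.1, 0.5)
assert abs(y1 - (1 - 0.05) / (1 + 0.05)) < 1e-12
```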
A prominent member of this class is the implicit midpoint rule ($\lambda =\frac{1}{2}$). Replacing the Lagrangian acceleration $_0a_\alpha$ in equation \ref{eq:weak_mpm} with the left side of \ref{eq:midpoint} using velocity, taking care of the right side and pushing forward to Eulerian view:
\begin{equation}\label{eq:time_disc}
\frac{1}{\Delta t} \int _{ \Omega^{t^n}}(_tq_\alpha)(_t\rho)({_tv_\alpha^{n+1}}-{_tv_\alpha^{n}}) d_t\boldsymbol{x} =
- \int _ { \Omega ^ { t^n} } \frac{\partial _tq _ \alpha}{\partial _tx_\beta} \sigma _ {\alpha\beta}^{n+\lambda} d_t\boldsymbol{x}.
\end{equation}
\subsubsection{Discretize space}
A Galerkin discretization brings all spatial terms of equation \ref{eq:time_disc} to a finite-dimen\-sion\-al space: $\boldsymbol{q} \rightarrow \boldsymbol{q}_h$. To not further clutter up the notation, the $h$ will be omitted. This will replace $q_\alpha, v^n_\alpha , v_\alpha^{n+1}$ with their finite-dimensional grid-based interpolants:
\begin{equation}
{_t q } _ { \alpha } ^ { { n } } = ({_t q } _ { { i } \alpha } ^ { { n } }) ({_t w } _ { i }) , \quad _tv _ { \alpha } ^ { { n } } = ({_tv _ { { j\alpha } } ^ { { n } }} )( {_t w } _ { { j } }) , \quad {_tv _ { \alpha } ^ { { n } + 1 }} = ({_tv _ { { j } \alpha } ^ { { n } + 1 }}) ({_t w } _ { { j } }).
\end{equation}
Further, chapter \ref{sec:weak} mentions that $\boldsymbol{q}$ can be chosen arbitrarily. The Galerkin discretization of a $d$-dimensional space with $m$ grid nodes therefore uses the standard basis functions $e_1,e_2, \dots ,e_{d \times m}$. Due to the scalar product, $d\times m$ equations need to be solved:
\begin{equation}\label{eq:galerkin}
\frac{1}{\Delta t} \int _{ \Omega^{t^n}} ({ _tw } _ {i })(_t\rho)({_t w } _ { j })({_tv_{j\alpha}^{n+1}}-{_tv_{j\alpha}^{n}}) d_t\boldsymbol{x} =
- \int _ { \Omega ^ { t^ n } } \frac{\partial{_t w } _ { {i }}}{\partial_tx_\beta } \sigma _ {{ \alpha } \beta } { d } \boldsymbol { x }.
\end{equation}
A mass matrix can be factored out as:
\begin{equation}
m_{ij}^n = \int _{ \Omega^{t^n}} {_t w } _ {i }(_t\rho){_t w } _ { j } d_t\boldsymbol{x}.
\end{equation}
The Lagrangian pull-back relates this to the initial density; discretizing the integral with the initial, time-invariant particle mass $m_p \approx V_p^0\rho(_0\boldsymbol{x}_p,0)$ gives:
\begin{equation}
m_{ij}^n = \int _{ \Omega^{t^0}} ({_tw_i})(_0\rho_0)({_tw} _ { j }) d_0 \boldsymbol{x} \approx \sum _ { { p } } { m } _ { { p } } { w } _ { { i } } (\boldsymbol { x } _ { { p } } ) { w } _ { { j } } ( \boldsymbol { x } _ { { p } } ).
\end{equation}
This matrix is symmetric positive semi-definite (since mass is positive). Numerically it is usually not used as is, because it may be singular. This is commonly resolved by a mass-lumping strategy: replacing $m^n_{ii}$ with the $i$-th row sum and clearing all other elements leads to a diagonalization
\begin{equation}\label{eq:mass_discr}
\sum _ { { p } } { m } _ { { p } } { w } _ {ip} w_{jp}
\stackrel{\text{\ref{eq:partition_unity}}}{\approx}
\sum_p m_p w_{ip}
\stackrel{\text{\ref{eq:mpm:mass}}}{=}
m_i^n,
\end{equation}
where partition of unity $\sum_j w_{jp} = 1$ is used. This is exactly the mass transfer of equation \ref{eq:mpm:mass}, such that no further assembly of a mass matrix is needed.
The discretization of the right side of \ref{eq:galerkin} happens with an estimated per particle stress $\boldsymbol{\sigma}_p^{n+\lambda}$:
\begin{equation}\label{eq:stress_discr}
\int _ { \Omega ^ { t^ n } } \frac{\partial{_t w } _ { {i }}}{\partial_tx_\beta } \sigma _ {{ \alpha } \beta } { d } \boldsymbol { x } \approx
\sum _ { p } (\sigma _ { p }^{n+\lambda}) _ { \alpha \beta } \frac{{ \partial w } _ { {ip}}^n}{ \partial x_\beta } V _ { p } ^ { n }.
\end{equation}
Inserting equations \ref{eq:mass_discr} and \ref{eq:stress_discr} into \ref{eq:galerkin} summarizes the space discretization as:
\begin{equation}\label{eq:space_discr}
\frac{1}{\Delta t} (({m^n\boldsymbol{v}^{n+1}})_i-({m^n\boldsymbol{v}^n})_i) =
-\sum _ { p } \boldsymbol{\sigma} _ { p }^{n+\lambda} \nabla w_{ip}^n V _ { p } ^ { n }
= \boldsymbol{f}_i^{n+\lambda}.
\end{equation}
The momentum change of the left side is by construction equal to a (grid node) force.
Given that the material point method keeps track of the deformation by a deformation gradient, each particle will have one associated with it, $\boldsymbol{F}^n_p$, describing the deformation of its local neighborhood. Based on this one may also obtain a volume change measure around the particle as $J^n_p = \text{det}(\boldsymbol{F}^n_p)$. Starting with an initial particle volume $V_p^0$, the volume may be tracked in time by:
\begin{equation}\label{eq:volume_discr}
V^n_p \stackrel{\text{\ref{eq:j}}}{\approx} V_p^0J^n_p.
\end{equation}
In equation \ref{eq:piola} an alternative measure for the stress, the first Piola-Kirchhoff stress, is given. The result of \ref{eq:space_discr} may therefore equally be expressed in terms of it:
$$
\boldsymbol{f}^{n+\lambda}_i \stackrel{\text{\ref{eq:piola},\ref{eq:volume_discr}}}{=}
- \sum_p \frac{1}{J^n_p}\boldsymbol{P}^{n+\lambda}_{p}(\boldsymbol{F}^{n}_{p})^T \nabla w_{ip}V^0_pJ^n_p
$$
\begin{equation}\label{eq:force_disc}
= - \sum_p \boldsymbol{P}^{n+\lambda}_{p}(\boldsymbol{F}^{n}_{p})^T \nabla w_{ip}V^0_p.
\end{equation}
\begin{flushright}\cite{MPM:COURSE} \cite{MPM:APIC} \cite{bathe2006finite}\end{flushright}
\subsubsection{Deformation gradient evolution}
In equation \ref{eq:evol_def_grad} the evolution of the deformation gradient is shown to be:
\begin{equation}
\frac{\partial ^t_0\boldsymbol{F}}{\partial t} = \nabla{_0\boldsymbol{v}}(_0\boldsymbol{x},t).
\end{equation}
Discretizing the Lagrangian deformation gradient in time with equation \ref{eq:midpoint} results in:
\begin{equation}
\frac{\boldsymbol{F}^{n+1}_p - \boldsymbol{F}^{n}_p}{\Delta t} = \nabla_0\boldsymbol{v}^{n+\lambda}(_0\boldsymbol{x}).
\end{equation}
Pushing the right side forward to Eulerian view
\begin{equation}
\frac{\boldsymbol{F}^{n+1}_p - \boldsymbol{F}^{n}_p}{\Delta t} = \nabla_t\boldsymbol{v}^{n+\lambda}(_t\boldsymbol{x}){^t_0\boldsymbol{F}} = \nabla_t\boldsymbol{v}^{n+\lambda}(_t\boldsymbol{x})\boldsymbol{F}^n_p
\end{equation}
and further applying the Galerkin discretization
\begin{equation}
(_tv^{n+\lambda})_\alpha = (v_{i}^{n+\lambda})_\alpha w_i \Rightarrow \frac{\partial (_tv^{n+\lambda})_\alpha}{\partial x_\beta} =(v_i^{n+\lambda})_\alpha\frac{\partial w_i}{\partial x_\beta}
\end{equation}
leads to the final update rule for the deformation gradient:
\begin{equation}\label{eq:evol_def_grad_disc}
\boldsymbol{F}^{n+1}_p = \left( \boldsymbol{I} + \Delta t\sum_i \boldsymbol{v}_i^{n+\lambda}({\nabla w_{ip}})^T \right)\boldsymbol{F}^{n}_p.
\end{equation}
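The update rule can be sketched for a single particle (names and array shapes are illustrative assumptions):

```python
import numpy as np

def update_F(F_p, v_i, grad_w_ip, dt):
    """Deformation gradient update F^{n+1} = (I + dt * sum_i v_i grad(w_ip)^T) F^n
    (eq. evol_def_grad_disc).

    v_i: (m, 3) stencil node velocities; grad_w_ip: (m, 3) weight gradients."""
    grad_v = sum(np.outer(v, g) for v, g in zip(v_i, grad_w_ip))
    return (np.eye(3) + dt * grad_v) @ F_p

# A grid at rest leaves the deformation gradient unchanged:
F = np.diag([1.1, 0.9, 1.0])
assert np.allclose(update_F(F, np.zeros((8, 3)), np.zeros((8, 3)), 1e-3), F)
```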
The discretization of the position will also be necessary to advance the particles and weight them back:
\begin{equation} \frac{\partial _t\boldsymbol{x}}{\partial t} = {_t\boldsymbol{v}} \Rightarrow \frac{\boldsymbol{\hat{x}}_i^{n+1}
- \boldsymbol{x}_i^{n}}{\Delta t} = \boldsymbol{v}_i^{n+\lambda}.
\end{equation}
The grid position $\boldsymbol{\hat{x}}_i^{n+1}$ does not correspond to an actual deformation; the grid never actually gets deformed (unlike in FEM methods). The discretized evolution of the deformation gradient \ref{eq:evol_def_grad_disc} is directly a function of $\boldsymbol{\hat{x}}$. For the point $\boldsymbol{\hat{x}}_i^{n+1}$ this becomes:
\begin{equation}\label{eq:evol_def_grad_disc_x}
\boldsymbol{\hat{F}}^{n+1}_p(\boldsymbol{\hat{x}}_i^{n+1}) = \left( \boldsymbol{I} + \sum_i (\boldsymbol{\hat{x}}_i^{n+1} - \boldsymbol{x}_i^{n})({\nabla w_{ip}})^T \right)\boldsymbol{F}^{n}_p.
\end{equation}
As part of the class of time integrators in use (\ref{eq:midpoint}) a function of $\boldsymbol{x}_i$ gets evaluated at an in-between point given by:
\begin{equation}\label{eq:midpoint_x}