####################################################
High Performance Conjugate Gradient Benchmark (HPCG)
####################################################
:Author: Jack Dongarra and Michael Heroux and Piotr Luszczek
:Revision: 3.1
:Date: March 28, 2019
===============
History of HPCG
===============
-----------
Version 3.1
-----------
* Switched the output format for reporting the results from YAML to a basic
line-oriented key-value format with nested naming scheme for the keys.
* Added a faster search for the optimal 3D grid partitioning of a given integer
that does not require a combinatorial search through all 3-set partitionings of
the prime factors.
* Closed the outstanding bugs reported as issues on HPCG's GitHub project page
and incorporated the fixes in the source code.
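To illustrate the grid partitioning item above, here is a minimal sketch of one way to split a process count into three factors without enumerating partitions of its prime factors. It is a greedy heuristic written for this note, not the routine shipped in HPCG, and it does not guarantee an optimal shape. The names ``near_cubic_grid`` and ``largest_divisor_at_most_root`` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of HPCG): largest divisor d of n with d^k <= n.
static std::int64_t largest_divisor_at_most_root(std::int64_t n, int k) {
    std::int64_t r = 1;
    // Integer k-th root: grow r while (r + 1)^k still fits in n.
    while (true) {
        std::int64_t next = r + 1, p = 1;
        for (int i = 0; i < k; ++i) p *= next;
        if (p > n) break;
        r = next;
    }
    while (n % r != 0) --r;  // r == 1 always divides n, so this terminates
    return r;
}

// Greedy split of `size` into {x, y, z} with x * y * z == size.
// x is taken near the cube root, then y near the square root of what is left.
std::array<std::int64_t, 3> near_cubic_grid(std::int64_t size) {
    assert(size >= 1);
    std::int64_t x = largest_divisor_at_most_root(size, 3);
    std::int64_t rest = size / x;
    std::int64_t y = largest_divisor_at_most_root(rest, 2);
    std::int64_t z = rest / y;
    return {x, y, z};
}
```

For example, ``near_cubic_grid(12)`` yields the factors 2, 2, 3, and a prime count such as 7 degenerates to 1, 1, 7.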
-----------
Version 3.0
-----------
* Added problem generation as a timed portion of the benchmark. This time is now
added to any time spent optimizing data structures and counted as overhead when
computing the official GFLOP/s rating. The total overhead time is divided by 500
to amortize its cost over 500 iterations.
* Added memory usage counting and reporting.
* Added memory bandwidth measurement and reporting.
* Added a "Quick Path" option to make obtaining results on production systems easier.
With this option, obtaining a rating will take only a few minutes. This option also
makes profiling and debugging easier. The Quick Path option is invoked by setting the
run time to zero, either in hpcg.dat or by using the --rt=0 option.
* Added a command line option (--rt=) to specify the run time.
* Made a few small changes to support easier builds on MS Windows.
* Changed the way the residual variance is computed to make sure it is zero if all
residual values are identical.
* Changed the order of array allocation in the reference code in order to improve
performance.
* Set the minimum iteration count for the optimized run to be the same as what is
used in the reference run.
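The overhead amortization described in the first item above can be sketched as follows. The divisor of 500 comes from that item. How the amortized overhead enters the rating is not spelled out here, so this sketch assumes it is simply added to the timed run before dividing. The function name ``rated_gflops`` and its parameters are hypothetical and do not appear in the HPCG source.

```cpp
#include <cassert>

// Hypothetical helper (not part of HPCG): fold one-time overhead into a rating.
// setup_seconds    : time spent generating the problem
// optimize_seconds : time spent optimizing data structures
double rated_gflops(double flops, double run_seconds,
                    double setup_seconds, double optimize_seconds) {
    assert(run_seconds > 0.0);
    const double amortization = 500.0;  // divisor stated in the Version 3.0 notes
    double overhead = (setup_seconds + optimize_seconds) / amortization;
    // Assumption: the amortized overhead is added to the timed run.
    return flops / (run_seconds + overhead) / 1.0e9;
}
```

Under these assumptions, 500 seconds of combined overhead costs the same as one extra second of timed run.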
-----------
Version 2.4
-----------
* Fixed (again) the FLOP count for DDOT and WAXPBY. We forgot that the
preamble is called many times.
-----------
Version 2.3
-----------
* Fixed the FLOP count for DDOT and WAXPBY. The operations in the preamble were
not being accounted for.
-----------
Version 2.2
-----------
* Reduced the penalty for optimization overhead by a factor of 10 (amortized over 10
sets of 50 reference iterations).
* Fixed a bug that did not account for increased iterations that can occur with
multicoloring reorderings.
* Fixed numerous other small bugs reported by HPCG users.
-----------
Version 2.1
-----------
* Fixed a small but important bug in ComputeProlongation_ref.cpp (- sign should be + sign).
-----------
Version 2.0
-----------
* Added support for a synthetic multigrid V cycle. Parameters include
the number of levels in the grid hierarchy and the number of pre- and
post-smoother steps.
* Refactored data classes to support needs for recursion in V cycle.
* Made simple modifications to make compilation on MS Windows easier.
This includes changing the format of output files to remove colons.
-----------
Version 1.1
-----------
* Added simple code for users to remove in order to indicate whether
optimization was done for dot-product, SPMV, SYMGS, or WAXPBY.
* Fixed a problem with computing the variance of results from multiple runs.
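The variance fix above is not described in detail. As an illustration only, here is a sketch of a numerically careful variance over the results of several runs, using Welford's running update. This is a swapped-in textbook technique, not necessarily what HPCG does, and ``running_variance`` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Population variance via Welford's running update (hypothetical helper).
double running_variance(const std::vector<double>& values) {
    assert(!values.empty());
    double mean = 0.0;  // running mean of the values seen so far
    double m2 = 0.0;    // running sum of squared deviations from the mean
    std::size_t count = 0;
    for (double v : values) {
        ++count;
        double delta = v - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (v - mean);
    }
    return m2 / static_cast<double>(count);
}
```

One property of this update is that identical inputs give a variance of exactly zero: after the first value the running mean equals that value, so every later deviation is zero.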
-----------
Version 1.0
-----------
* Changed the diagonal entry from 27 to 26 to influence convergence rate.
* Changed license file (COPYRIGHT) from 4-clause BSD (original BSD) to 3-clause
BSD (modified BSD).
* Added a line to the input file (hpcg.dat) that allows the user to specify how
long to run the benchmark.
-----------
Version 0.5
-----------
* Improved the formula used for scaling the departure from symmetry.
-----------
Version 0.4
-----------
* Fixed bugs in integer computations where intermediate 32-bit
integer computations resulted in values that exceeded the
32-bit range and gave incorrect results.
* Fixed the computation of the number of CG set runs to take
into account varying timing results across MPI processes.
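The 32-bit overflow fixed in the first item above is a common pattern: a product of 32-bit values is computed in 32 bits and only then stored in a wider type. A minimal sketch of the usual remedy, widening one operand before multiplying, is shown below. The function ``total_points`` is hypothetical and is not taken from the HPCG source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of grid points in an nx x ny x nz box.
std::int64_t total_points(std::int32_t nx, std::int32_t ny, std::int32_t nz) {
    assert(nx >= 0 && ny >= 0 && nz >= 0);
    // Widen the first operand so every intermediate product is 64-bit.
    // Writing nx * ny * nz instead would multiply in 32 bits and overflow
    // for large boxes before the result is converted.
    return static_cast<std::int64_t>(nx) * ny * nz;
}
```

With 2048 points per dimension the result is 2^33, which does not fit in a 32-bit integer.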
-----------
Version 0.3
-----------
* Given out to friendly testers.
* Includes Doxygen output.
* Numerous small changes.
* Substantially improved output.
* Tested on large systems.
-----------
Version 0.2
-----------
* Given out to "friends".
* Numerous small changes.
-----------
Version 0.1
-----------
* Added local symmetric Gauss-Seidel preconditioning.
* Changed global geometry to be true 3D. Previously it was a beam (subdomains
were stacked only in the z dimension).
* Introduced three global/local index modes: 32/32, 64/32, 64/64 to handle all
problem sizes.
* Changed execution strategy to perform multiple runs with just a few
iterations per run.
* Added infrastructure and rules for user adaptation of kernels for performance
optimization.
* Added benchmark modification and reporting rules.
* Changed directory and file layout to mimic HPL layouts where appropriate.
================
History of HPCCG
================
--------------------------
NAME CHANGE: HPCCG to HPCG
--------------------------
* The name was changed from HPCCG to HPCG without any code changes.
-----------
Version 1.0
-----------
* Released as part of Mantevo Suite 1.0, December 2012.
-----------
Version 0.5
-----------
* Added timing for Allreduce calls in MPI mode, printing min/max/avg times.
* Set the solver tolerance to zero to make all solves take ``max_iter``
iterations.
* Changed accumulator to a local variable for ``ddot``. It seems to help
dual-core performance.
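The ``ddot`` change in the last item above can be sketched as follows: the sum is accumulated in a local variable and written to the output only once. The signature below is hypothetical and simplified, with no MPI reduction. The performance effect is as reported in that item and is not verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, serial-only sketch of a dot product with a local accumulator.
void ddot_local(const std::vector<double>& x, const std::vector<double>& y,
                double* result) {
    assert(x.size() == y.size() && result != nullptr);
    double local_sum = 0.0;  // local, so the compiler may keep it in a register
    for (std::size_t i = 0; i < x.size(); ++i) {
        local_sum += x[i] * y[i];
    }
    *result = local_sum;  // single store to the caller's memory
}
```

Accumulating directly through ``result`` inside the loop can force a load and store on each iteration when the compiler cannot rule out aliasing, which is one plausible reason the local variable helps.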
-----------
Version 0.4
-----------
* Made ``total_nnz`` a ``long long`` so that MFLOP numbers are valid
when the nonzero count is more than 2^31.
-----------
Version 0.3
-----------
* Fixed a performance bug in ``make_local_matrix.cpp`` that was very noticeable
when the fraction of off-processor grid points was large.
-----------
Version 0.2
-----------
* Fixed bugs related to turning MPI compilation off.
* Added more text to README to improve understanding.
* Added new ``Makefile.x86linux.gcc`` for non-Opteron systems.
* Made ``MPI_Wtime`` the default timer when in MPI mode.
-----------
Version 0.1
-----------
HPCCG (Original version) was written as a teaching code for illustrating the
anatomy of a distributed memory parallel sparse iterative solver for new
research students and junior staff members. March 2000.