Each of the following categories lists its goals roughly in the order in
which we hope to add them.
Items are marked using the following code:
[x] ~ planned to be finished before the next release
[o] ~ hopefully started in the near future
[-] ~ marked for eventual development
Functionality priorities
========================
Fundamental functionality additions
-----------------------------------
[o] 2D sparse matrix distributions
[o] Matrix type tags for merging, e.g., {Gemm,Hemm,Trmm,etc.} into "*"
[o] Estimate for spectral radius
[o] Low-rank modifications of QR
[o] Banded Cholesky factorization
[o] QR with full pivoting (Businger-Golub plus row-sorting or row-pivoting)
[o] Finishing prototype generalized Spectral Divide and Conquer
[-] Windowed QR with column pivoting
[-] Power-method-like p-norm estimation
[-] QL factorization and ql::SolveAfter
[-] Strong RRQR and RRLQ
[-] CUR decompositions (already have (pseudo-)skeleton)
[-] Complete Orthogonal Decompositions (especially URV)
[-] LU and LDL with rook pivoting
[-] (Blocked) Aasen's
[-] TSQR for non-powers-of-two
[-] TSLU (via tournament pivoting)
[-] Successive Band Reduction
[-] Native nonsymmetric (generalized) eigensolver via QR (QZ) algorithm
[-] Generalized Sylvester equations
Incremental functionality improvements
--------------------------------------
[o] General redistribution routine between any two matrices with any
    process grids (with equivalent viewing communicators)
[o] Add Bunch-Kaufman C now that explicit permutations are used
[o] Extend operator() submatrix interfaces to support equivalent of ":", e.g.,
    an enum named "ALL"
[o] Extend Grid class to support mappings from, e.g., (MDRank,root) -> VCRank
    and use these mappings to build an (owner,root) -> VCRank mapping for
    [Block]DistMatrix
[o] Rescaled multi-shift Hessenberg solves
[o] Blocked algorithms for low-rank Cholesky updates
[o] Relative interval subset computation for HermitianEig (i.e., in [-1,1])
[o] Sequential blocked reduction to tridiagonal form
[o] Quadratic-time Haar generation via random Householder reflectors
[-] 'Control' equivalents to 'Attach' for DistMatrix, and ability to forfeit
    buffers in (Dist)Matrix
[-] Axpy interface implementation using one-sided communication
[-] Square process grid specializations of LDL and Bunch-Kaufman
[-] Businger-esque element-growth monitoring in GEPP and Bunch-Kaufman
[-] More Sign algorithms (switch to Newton-Schulz near convergence)
[-] Distribute between different grids for any distribution
[-] Way for DistMatrix with single process to view Matrix, and operator=
[-] Ostrowski matrices
[-] Various approaches (e.g., HJS) for parallel tridiagonalization
[-] Wrappers for more LAPACK eigensolvers
[-] Sequential versions of Trr2k
[-] More explicit expansions of packed Householder reflectors
[-] More Trtrmm/Trtrsm routines
[-] Compressed pseudoinverse solves which avoid unnecessary backtransformations
[-] Additional CIRC distributions, e.g., (MC,CIRC)
Performance priorities
======================
[o] Accelerator support for local Gemm calls
[o] Support for BLIS and fused Trmv's to accelerate HermitianEig
[-] Optimized version of ApplySymmetricPivots
[-] Exploit structure in matrix sign based control solvers
Maintenance priorities
======================
Bug avoidance
-------------
Instrumentation/visualization/testing
-------------------------------------
[-] Global command-line options which are automatic for every driver, e.g.,
    "--colMajor <true/false>" for column-major process grids and
    "--nb <blocksize>" for the algorithmic blocksize
[-] Means of easily tracking/plotting heap memory usage over time
[-] Provide way to zoom in/out and add colorbar to DisplayWidget
[-] Better organization of test matrices into relevant classes, e.g., Hermitian,
    normal, triangular, Hessenberg, etc., so that each test driver can easily
    test each member from that class
Consistency/modularity
----------------------
[-] Modify Grid to return communicators based upon the distribution, e.g.,
    Comm(VC)?
[-] Extract BLAS/LAPACK/MPI wrappers into a separate project
[-] Make transpose-options of LocalTrr(2)k more consistent with Trr(2)k
[-] Consistent implementation of unblocked routines
[-] Safe down-casting of integers in BLAS/LAPACK calls
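
As an illustration of the "safe down-casting" item above, here is a minimal
sketch of a range-checked narrowing cast. The name CheckedCast is hypothetical
and not part of Elemental; the sketch only handles signed integer types and
assumes that throwing std::overflow_error is an acceptable failure mode.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper (not an Elemental routine): narrow a signed integer to
// the signed integer type a BLAS/LAPACK interface expects, throwing instead
// of silently wrapping when the value does not fit.
template<typename To,typename From>
To CheckedCast( From value )
{
    static_assert( std::is_integral<To>::value && std::is_signed<To>::value,
                   "this sketch only handles signed integer targets" );
    static_assert( std::is_integral<From>::value && std::is_signed<From>::value,
                   "this sketch only handles signed integer sources" );

    // Compare in the widest signed type so that neither bound can overflow
    const std::intmax_t wide = value;
    if( wide < std::numeric_limits<To>::min() ||
        wide > std::numeric_limits<To>::max() )
        throw std::overflow_error("integer does not fit in the target type");

    return static_cast<To>(value);
}
```

A wrapper could then pass CheckedCast<std::int32_t>(n) wherever a 64-bit
dimension n is handed to a library built with 32-bit integers.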
Documentation
-------------
[o] Finish adding per-directory READMEs (e.g., cmake/toolchains/)
Licensing
---------
External Interfaces
-------------------
[o] Build a Julia interface on top of the C interface
Build system
------------
[o] Support PMRRR when pthreads are not available (e.g., Windows)
[o] Support for OpenBLAS [-D MATH_LIBS="-lopenblas;-lpthread;-lm;-lgfortran"]
[o] Support for BLIS
[o] Support for automatically downloading and building netlib BLAS and LAPACK
[o] Speed up build with C++11's extern templates
MPI and Threading
-----------------
[-] Implement message-splitting in collectives for count >= 2^31
[-] Use MPI contiguous datatype for all messages with count >= 2^31
    (may not work with older MPIs)
[-] Detect oversubscription using sysconf/sysctl and {OMP,MKL,*}_NUM_THREADS
[-] Add MPI wrappers for all nonblocking collectives
[-] Add MPI wrappers for RMA
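
The message-splitting item above could start from something like the
following sketch, which only computes the (offset,length) pieces of a large
transfer so that each length fits in the int count that classic MPI routines
accept. SplitCount is a hypothetical name, not an Elemental or MPI routine,
and the sketch makes no MPI calls itself.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not an Elemental or MPI routine): split a transfer of
// `count` elements into (offset,length) pieces with every length <= maxChunk,
// so that each length can be passed as an int count.
std::vector<std::pair<std::size_t,int>>
SplitCount( std::size_t count, int maxChunk=INT_MAX )
{
    assert( maxChunk > 0 );
    std::vector<std::pair<std::size_t,int>> pieces;
    std::size_t offset = 0;
    while( offset < count )
    {
        const std::size_t remaining = count - offset;
        // Take the largest piece that still fits in maxChunk
        const int length =
            remaining < static_cast<std::size_t>(maxChunk)
            ? static_cast<int>(remaining) : maxChunk;
        pieces.emplace_back( offset, length );
        offset += static_cast<std::size_t>(length);
    }
    return pieces;
}
```

Each piece would then be handed to its own communication call on the buffer
advanced by `offset` elements; whether that is acceptable for a particular
collective (as opposed to using a contiguous datatype) is left open here.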