%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Chapter: Synchronization
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Synchronization}
\label{chap:api_sync}
Applications may need to synchronize their operations at various points in
their execution. Depending on a variety of factors (e.g., the programming
model and where the synchronization point lies), the application may choose to
execute the operation using \ac{PMIx} to access the communication capabilities
of the host environment's infrastructure. This is particularly useful in
situations where the application's own communication libraries have not yet been initialized.
Synchronization operations also offer an opportunity for processes to exchange
data at a known point in their execution. For example, communication libraries within
different processes can synchronize to exchange information on communication endpoints
for subsequent wireup of messaging protocols.
\ac{PMIx} clients can use the \refapi{PMIx_Fence} and \refapi{PMIx_Fence_nb} functions
to synchronize a set of processes. The fence operation is useful after an application
performs a number of \refapi{PMIx_Put} operations, as it signals to other processes that the
posted data is available for access. This avoids unsuccessful \refapi{PMIx_Get} calls that might
otherwise be invoked before the corresponding \refapi{PMIx_Put} call has completed.
In its default form, the fence operation acts as a barrier between the processes and does not exchange data.
Clients can pass the \refattr{PMIX_COLLECT_DATA} attribute to request
that the \refapi{PMIx_Fence} and \refapi{PMIx_Fence_nb} functions exchange all committed
data between all involved servers during the synchronization operation.
This makes the data posted by other processes locally available to each process,
resulting in faster resolution of subsequent \refapi{PMIx_Get} and \refapi{PMIx_Get_nb} calls at
the cost of a synchronous data exchange and the associated expansion of each process's memory footprint.
In many situations,
this attribute can yield performance benefits, as many systems are optimized for transporting
larger amounts of data. In such applications, a ``put/commit/fence/get''
pattern is commonly used to exchange key-value pairs efficiently.
For applications where only a small subset of clients accesses another small subset's key-value pairs,
this attribute may not be beneficial. As such, applications are not required to use
the \refapi{PMIx_Fence} or \refapi{PMIx_Fence_nb} functions, nor the associated data collection
attribute, to ensure correctness of \ac{PMIx} get/put functionality.
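As an illustration of this pattern, the following sketch (not itself part of the Standard; the key name \code{example.endpoint} and the surrounding helper function are hypothetical) posts a value, commits it, executes a data-collecting fence across the caller's namespace, and then retrieves a peer's value:
\begin{verbatim}
#include <stdbool.h>
#include <pmix.h>

static pmix_status_t exchange_example(const pmix_proc_t *myproc)
{
    pmix_value_t val;
    pmix_info_t info;
    pmix_proc_t peer;
    pmix_value_t *result = NULL;
    pmix_status_t rc;
    bool collect = true;

    /* Post a key for other processes to retrieve */
    PMIX_VALUE_LOAD(&val, "my-endpoint-data", PMIX_STRING);
    rc = PMIx_Put(PMIX_GLOBAL, "example.endpoint", &val);
    PMIX_VALUE_DESTRUCT(&val);
    if (PMIX_SUCCESS != rc) {
        return rc;
    }
    /* Commit the posted data to the local PMIx server */
    if (PMIX_SUCCESS != (rc = PMIx_Commit())) {
        return rc;
    }
    /* Fence across all processes in our namespace (NULL procs),
     * requesting collection of the committed data */
    PMIX_INFO_LOAD(&info, PMIX_COLLECT_DATA, &collect, PMIX_BOOL);
    rc = PMIx_Fence(NULL, 0, &info, 1);
    PMIX_INFO_DESTRUCT(&info);
    if (PMIX_SUCCESS != rc) {
        return rc;
    }
    /* Retrieve the key posted by rank 0 */
    PMIX_PROC_LOAD(&peer, myproc->nspace, 0);
    rc = PMIx_Get(&peer, "example.endpoint", NULL, 0, &result);
    if (PMIX_SUCCESS == rc && NULL != result) {
        PMIX_VALUE_RELEASE(result);
    }
    return rc;
}
\end{verbatim}
Because the fence collected all committed data, the final \refapi{PMIx_Get} can typically be resolved from local storage without further communication.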
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{\code{PMIx_Fence}}
\declareapi{PMIx_Fence}
%%%%
\summary
Execute a blocking barrier across the processes identified in the specified array, collecting information posted via \refapi{PMIx_Put} as directed.
%%%%
\format
\copySignature{PMIx_Fence}{1.0}{
pmix_status_t \\
PMIx_Fence(const pmix_proc_t procs[], size_t nprocs, \\
\hspace*{11\sigspace}const pmix_info_t info[], size_t ninfo);
}
\begin{arglist}
\argin{procs}{Array of \refstruct{pmix_proc_t} structures (array of handles)}
\argin{nprocs}{Number of elements in the \refarg{procs} array (integer)}
\argin{info}{Array of info structures (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{info} array (integer)}
\end{arglist}
\returnsimple
\reqattrstart
The following attributes are required to be supported by all \ac{PMIx} libraries:
\pasteAttributeItem{PMIX_COLLECT_DATA}
\pasteAttributeItem{PMIX_COLLECT_GENERATED_JOB_INFO}
\reqattrend
\optattrstart
The following attributes are optional for \ac{PMIx} implementations:
\pasteAttributeItem{PMIX_ALL_CLONES_PARTICIPATE}
The following attributes are optional for host environments:
\pasteAttributeItem{PMIX_TIMEOUT}
\optattrend
%%%%
\descr
Passing a \code{NULL} pointer as the \refarg{procs} parameter indicates that the fence is to span all processes in the client's namespace.
Alternatively, each provided \refstruct{pmix_proc_t} structure can specify \refconst{PMIX_RANK_WILDCARD} as its rank to indicate that all processes in the given namespace are participating.
The ordering of the entries in the \refarg{procs} array has no significance. However, all processes engaged in a given
\refapi{PMIx_Fence}
operation must use the same method to identify the participants; callers that describe
the target set of processes using \refconst{PMIX_RANK_WILDCARD} will not be matched with
callers that explicitly list the individual processes of a namespace.
The \refarg{info} array is used to pass user directives regarding the behavior of the fence operation. Note that, for scalability reasons, the default behavior of \refapi{PMIx_Fence} is to not collect data posted by the operation's participants.
\adviceimplstart
\refapi{PMIx_Fence} and its non-blocking form are both \emph{collective} operations. Accordingly, the \ac{PMIx} server library is required to aggregate participation by local clients, passing the request to the host environment once all local participants have executed the \ac{API}.
\adviceimplend
\advicermstart
The host will receive a single call for each collective operation. It is the responsibility of the host to identify the nodes containing participating processes, execute the collective across all participating nodes, and notify the local \ac{PMIx} server library upon completion of the global collective.
\advicermend
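As a brief illustration, the following sketch (the helper function is hypothetical and not defined by the Standard) executes a pure barrier, i.e., no data collection, across all processes in the caller's namespace by passing a single entry with a wildcard rank:
\begin{verbatim}
#include <pmix.h>

static pmix_status_t namespace_barrier(const pmix_proc_t *myproc)
{
    pmix_proc_t proc;

    /* A single entry with a wildcard rank identifies every
     * process in the given namespace */
    PMIX_PROC_LOAD(&proc, myproc->nspace, PMIX_RANK_WILDCARD);

    /* No info directives - a pure barrier, no data collection */
    return PMIx_Fence(&proc, 1, NULL, 0);
}
\end{verbatim}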
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{\code{PMIx_Fence_nb}}
\declareapi{PMIx_Fence_nb}
%%%%
\summary
Execute a nonblocking \refapi{PMIx_Fence} across the processes identified in the specified array, collecting information posted via \refapi{PMIx_Put} as directed.
%%%%
\format
\copySignature{PMIx_Fence_nb}{1.0}{
pmix_status_t \\
PMIx_Fence_nb(const pmix_proc_t procs[], size_t nprocs, \\
\hspace*{14\sigspace}const pmix_info_t info[], size_t ninfo, \\
\hspace*{14\sigspace}pmix_op_cbfunc_t cbfunc, void *cbdata);
}
\begin{arglist}
\argin{procs}{Array of \refstruct{pmix_proc_t} structures (array of handles)}
\argin{nprocs}{Number of elements in the \refarg{procs} array (integer)}
\argin{info}{Array of info structures (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{info} array (integer)}
\argin{cbfunc}{Callback function (function reference)}
\argin{cbdata}{Data to be passed to the callback function (memory reference)}
\end{arglist}
\returnsimplenb
\returnstart
\begin{itemize}
\item \refconst{PMIX_OPERATION_SUCCEEDED}, indicating that the request was immediately processed and returned \textit{success} - the \refarg{cbfunc} will \textit{not} be called. This can occur if the collective involved only processes on the local node.
\end{itemize}
\returnend
\reqattrstart
The following attributes are required to be supported by all \ac{PMIx} libraries:
\pasteAttributeItem{PMIX_COLLECT_DATA}
\pasteAttributeItem{PMIX_COLLECT_GENERATED_JOB_INFO}
\reqattrend
\optattrstart
The following attributes are optional for \ac{PMIx} implementations:
\pasteAttributeItem{PMIX_ALL_CLONES_PARTICIPATE}
The following attributes are optional for host environments that support this operation:
\pasteAttributeItem{PMIX_TIMEOUT}
\optattrend
%%%%
\descr
Nonblocking version of the \refapi{PMIx_Fence} routine. See the \refapi{PMIx_Fence} description for further details.
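The following sketch (the callback, completion flag, and helper function are illustrative only, not part of the Standard) shows one common usage: initiating the fence, handling the immediate-completion return code, and waiting for the callback:
\begin{verbatim}
#include <stdbool.h>
#include <pmix.h>

static volatile bool fence_active = false;

/* Completion callback - signature matches pmix_op_cbfunc_t */
static void fence_complete(pmix_status_t status, void *cbdata)
{
    volatile bool *active = (volatile bool *)cbdata;
    (void)status;
    *active = false;   /* release the waiter below */
}

static pmix_status_t nb_fence_example(void)
{
    pmix_status_t rc;

    fence_active = true;
    /* NULL procs: fence across all processes in our namespace */
    rc = PMIx_Fence_nb(NULL, 0, NULL, 0, fence_complete,
                       (void *)&fence_active);
    if (PMIX_OPERATION_SUCCEEDED == rc) {
        /* Completed immediately - the callback will NOT be called */
        return PMIX_SUCCESS;
    }
    if (PMIX_SUCCESS != rc) {
        return rc;     /* the request was not accepted */
    }
    /* Overlap other work here, then wait for completion */
    while (fence_active) {
    }
    return PMIX_SUCCESS;
}
\end{verbatim}
The spin-wait above is for brevity only; an application would normally overlap the fence with other work or block on a synchronization primitive until the callback fires.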
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Fence-related attributes}
The following attributes are defined specifically to support the fence operation:
%
\declareAttribute{PMIX_COLLECT_DATA}{"pmix.collect"}{bool}{
Collect all data posted by the participants using \refapi{PMIx_Put} that
has been committed via \refapi{PMIx_Commit}, making the collection locally
available to each participant at the end of the operation. By default, this will include all job-level information that was locally generated by \ac{PMIx} servers unless excluded using the \refattr{PMIX_COLLECT_GENERATED_JOB_INFO} attribute.
}
\declareAttributeProvisional{PMIX_LOCAL_COLLECTIVE_STATUS}{"pmix.loc.col.st"}{pmix_status_t}{
Status code for a local collective operation, reported to the host by the server library. \ac{PMIx} servers may aggregate the participation of local client processes in a collective operation - e.g., instead of passing individual client calls to \refapi{PMIx_Fence} up to the host environment, the server may pass only a single call to the host once all local participants have executed their \refapi{PMIx_Fence} call, thereby reducing the burden placed on the host. However, in cases where the operation fails locally (e.g., if a participating client terminates abnormally before calling the operation), the server upcall functions to the host do not include a \refstruct{pmix_status_t} by which the \ac{PMIx} server can alert the host to that failure. This attribute resolves that problem by allowing the server to pass the status of the local collective operation.
}
\advicermstart
The PMIx server is allowed to pass \refconst{PMIX_SUCCESS} using this attribute, but is not required to do so. PMIx implementations may choose to only report errors in this manner. The lack of an included status shall therefore be taken to indicate that the collective operation locally succeeded.
\advicermend
%
\declareAttribute{PMIX_COLLECT_GENERATED_JOB_INFO}{"pmix.collect.gen"}{bool}{
Collect all job-level information (i.e., reserved keys) that was locally generated by \ac{PMIx} servers. Some job-level information (e.g., distance between processes and fabric devices) is best determined on a distributed basis as it primarily pertains to local processes. Should remote processes need to access the information, it can either be obtained collectively using the \refapi{PMIx_Fence} operation with this directive, or can be retrieved one peer at a time using \refapi{PMIx_Get} without first having performed the job-wide collection.
}
%
\declareAttribute{PMIX_ALL_CLONES_PARTICIPATE}{"pmix.clone.part"}{bool}{
All \refterm{clones} of the calling process must participate in the collective operation.
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%