
Semantics

Philip Davis edited this page Jul 14, 2017 · 13 revisions

Terminology

Server: The dataspaces_server runtime. The server provides data storage, retrieval, indexing, and locking services.

Client: A process using the DataSpaces API to store and/or retrieve data from the server.

Application: A process group of dataspaces clients. The user should assign a unique, non-zero application ID for each application that will connect to the dataspaces server. In an HPC context, applications will typically be independently launched in a job script.

Shared space: An N-dimensional array, which is the primary data abstraction underlying DataSpaces. A tuple of shared space coordinates and variable name provides a key on a single element stored in DataSpaces. The client interacts with the shared space by specifying N-dimensional bounding boxes (subsets of the shared space) to write to or read from. NB: Like Fortran (and unlike C), DataSpaces uses column-major ordering when translating between N-dimensional arrays and data buffers. In other words, when an array is read from (or written to) a buffer, its dimensions are ordered from highest locality (the first dimension, which varies fastest in the buffer) to lowest locality (the last dimension). For example, if the buffer [a b c d e f g h i j k l] holds a 3 x 4 array whose first dimension has extent 3, the element at coordinate (x, y) is at buffer offset x + 3y: a, b, c are at (0,0), (1,0), (2,0); d, e, f are at (0,1), (1,1), (2,1); and so on, up to l at (2,3).
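The column-major rule can be sketched in C as an offset calculation. This is an illustrative helper, not part of the DataSpaces API: colmajor_offset is a hypothetical name, and the sketch assumes bounding boxes with inclusive upper bounds and coordinates that lie inside the box.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: column-major offset of `coord` within the bounding
 * box [lb, ub]. Assumes inclusive upper bounds. Dimension 0 varies fastest
 * (highest locality); dimension ndim-1 varies slowest (lowest locality). */
static size_t colmajor_offset(int ndim, const uint64_t *lb,
                              const uint64_t *ub, const uint64_t *coord)
{
    size_t offset = 0;
    size_t stride = 1;   /* elements between neighbors along dimension d */

    for (int d = 0; d < ndim; d++) {
        assert(coord[d] >= lb[d] && coord[d] <= ub[d]);
        offset += (size_t)(coord[d] - lb[d]) * stride;
        stride *= (size_t)(ub[d] - lb[d] + 1);   /* extent of dimension d */
    }
    return offset;
}
```

With lb = {0, 0} and ub = {2, 3} (the 3 x 4 case), the coordinate {1, 2} maps to offset 1 + 3*2 = 7, which is h in the buffer [a b c d e f g h i j k l].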

API Semantics

dspaces_init(num_peers, appid, comm, parameters)

Initializes the DataSpaces library, connects to a DataSpaces server, and creates all necessary connection state between the clients and the server. Connecting to the server requires the special rendezvous file named “conf” (not to be confused with “dataspaces.conf”) to be present in the working directory. The conf file is generated by the dataspaces_server executable when it starts, so dataspaces_server and all clients must share the same working directory for the clients to find the server. dspaces_init should be called once, before any other dspaces operation.

num_peers: the number of processes in the application. In an MPI context, this should match the communicator size.

appid: A non-zero integer chosen by the user to uniquely identify the application. All members of the same application should have the same application ID.

comm: An MPI communicator for the process group. If this is non-NULL, the DataSpaces library will use it to simplify collective operations. Some transports can proceed without a communicator.

parameters: unused.
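Because clients locate the server through the “conf” file, a client (or a wrapper around it) may want to check for that file before calling dspaces_init. Below is a minimal sketch using POSIX access() and sleep(); rendezvous_file_present and wait_for_rendezvous_file are hypothetical helpers, not DataSpaces functions, and the one-second polling interval is an arbitrary choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: true if the rendezvous file "conf" is readable in
 * directory `dir`. A missing file usually means the client was started in a
 * different working directory than dataspaces_server, or the server has not
 * started yet. */
static bool rendezvous_file_present(const char *dir)
{
    char path[4096];
    int n;

    assert(dir != NULL);
    n = snprintf(path, sizeof path, "%s/conf", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return false;                 /* path too long to check */
    return access(path, R_OK) == 0;
}

/* Hypothetical helper: poll once per second for up to `timeout_s` seconds.
 * With timeout_s == 0 the file is checked exactly once. */
static bool wait_for_rendezvous_file(const char *dir, unsigned timeout_s)
{
    for (unsigned waited = 0; ; waited++) {
        if (rendezvous_file_present(dir))
            return true;
        if (waited >= timeout_s)
            return false;
        sleep(1);
    }
}
```

For example, a client could call wait_for_rendezvous_file(".", 30) and exit with an error message, rather than calling dspaces_init, if it returns false.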

dspaces_finalize()

Finalizes the DataSpaces library on a client. The client deregisters from the servers, all state is torn down, and memory is freed. Pending messages are handled. All ranks of the application must call this. NB: existing locks are not released, so the application should release them before calling dspaces_finalize.

dspaces_lock_on_read(lock_name, comm)

Acquire a read lock. This is a collective operation and will block all calling processes until the lock is granted. The semantics of locking operations will depend on the lock type in use, as specified in dataspaces.conf. See more about locking later in this document.

lock_name: Char string that acts as a global handle for the lock.

comm: An MPI communicator for the process group. If this is non-NULL, the DataSpaces library will use it to simplify collective operations. Some transports can proceed without a communicator.

dspaces_unlock_on_read(lock_name, comm)

Release a read lock. This is a collective operation and is non-blocking. The semantics of locking operations will depend on the lock type in use, as specified in dataspaces.conf. See more about locking later in this document.

lock_name: Char string that acts as a global handle for the lock.

comm: An MPI communicator for the process group. If this is non-NULL, the DataSpaces library will use it to simplify collective operations. Some transports can proceed without a communicator.

dspaces_lock_on_write(lock_name, comm)

Acquire a write lock. This is a collective operation and will block all calling processes until the lock is granted. The semantics of locking operations will depend on the lock type in use, as specified in dataspaces.conf. See more about locking later in this document.

lock_name: Char string that acts as a global handle for the lock.

comm: An MPI communicator for the process group. If this is non-NULL, the DataSpaces library will use it to simplify collective operations. Some transports can proceed without a communicator.

dspaces_unlock_on_write(lock_name, comm)

Release a write lock. This is a collective operation and is non-blocking. The semantics of locking operations will depend on the lock type in use, as specified in dataspaces.conf. See more about locking later in this document.

lock_name: Char string that acts as a global handle for the lock.

comm: An MPI communicator for the process group. If this is non-NULL, the DataSpaces library will use it to simplify collective operations. Some transports can proceed without a communicator.
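Since dspaces_finalize does not release existing locks, an application that holds several locks may find it useful to keep track of them. The sketch below keeps a small per-process record; struct held_lock, note_lock, forget_lock and pop_held_lock are hypothetical names, the fixed capacity of 16 is an arbitrary assumption, and comm is stored as an untyped pointer so the sketch compiles without MPI or DataSpaces headers. Locks are kept in acquisition order and handed back most-recent-first, so ranks that acquire locks in the same order also release them in the same order, which matters because the unlock calls are collective.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_HELD_LOCKS 16   /* arbitrary capacity for this sketch */

/* Hypothetical record of one lock this process currently holds. */
struct held_lock {
    const char *name;   /* must outlive the record, e.g. a string literal */
    bool is_write;      /* true: write lock, false: read lock */
    void *comm;         /* communicator argument given to the lock call */
};

static struct held_lock held[MAX_HELD_LOCKS];
static size_t n_held = 0;

/* Record a lock right after dspaces_lock_on_read/_on_write returns. */
static void note_lock(const char *name, bool is_write, void *comm)
{
    assert(n_held < MAX_HELD_LOCKS);
    held[n_held].name = name;
    held[n_held].is_write = is_write;
    held[n_held].comm = comm;
    n_held++;
}

/* Forget the most recent lock with this name after it has been released.
 * Returns false if no such lock was recorded. */
static bool forget_lock(const char *name)
{
    for (size_t i = n_held; i > 0; i--) {
        if (strcmp(held[i - 1].name, name) == 0) {
            /* shift later entries down to preserve acquisition order */
            memmove(&held[i - 1], &held[i], (n_held - i) * sizeof held[0]);
            n_held--;
            return true;
        }
    }
    return false;
}

/* Pop the most recently acquired lock still held; false when none remain.
 * The caller releases it with the matching unlock call. */
static bool pop_held_lock(struct held_lock *out)
{
    if (n_held == 0)
        return false;
    *out = held[--n_held];
    return true;
}
```

Before calling dspaces_finalize(), an application could loop while pop_held_lock(&l) returns true, calling dspaces_unlock_on_write(l.name, l.comm) when l.is_write is set and dspaces_unlock_on_read(l.name, l.comm) otherwise.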

dspaces_put(var_name, version, element_size, ndim, lower_bound, upper_bound, data)

Write data to the DataSpaces server. The data is written into a bounding box (see the note above on data arrangement). This operation is performed independently of other ranks, i.e. the data buffer should contain data for the entire specified bounding box. This operation is non-blocking and may return before the data is on the server.

var_name: The name of the variable object being written.

version: The version of the data being written. The DataSpaces server keeps multiple versions of an object. A put overwrites existing data when both of the following hold: new_version % max_versions == old_version % max_versions, and the new bounding box overlaps the old bounding box. (See issue #24 for discussion of these semantics.)

element_size: the size in bytes of each element of the object being written.

ndim: the dimensionality of the bounding box being written.

lower_bound: coordinates for the lower corner of the local bounding box. This is an array of ndim integers.

upper_bound: coordinates for the upper corner of the local bounding box. This is an array of ndim integers.

data: The data buffer. The number of bytes read from this buffer is the volume of the bounding box being written multiplied by element_size.
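The buffer-size and overwrite rules for dspaces_put can be sketched as a few plain C helpers. These are hypothetical illustrations (bbox_volume, put_buffer_bytes, bboxes_overlap and put_overwrites are not DataSpaces API functions). They assume that upper_bound is inclusive, so a box with lower_bound {0, 0} and upper_bound {9, 9} holds 10 x 10 elements, and they encode the overwrite rule exactly as stated in the version entry above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements in the box [lb, ub]; assumes inclusive upper bounds. */
static uint64_t bbox_volume(int ndim, const uint64_t *lb, const uint64_t *ub)
{
    uint64_t n = 1;
    for (int d = 0; d < ndim; d++) {
        assert(ub[d] >= lb[d]);
        n *= ub[d] - lb[d] + 1;
    }
    return n;
}

/* Bytes read from the data buffer: box volume times element_size. */
static uint64_t put_buffer_bytes(int ndim, const uint64_t *lb,
                                 const uint64_t *ub, size_t element_size)
{
    return bbox_volume(ndim, lb, ub) * element_size;
}

/* Two inclusive boxes overlap iff their extents intersect in every dimension. */
static bool bboxes_overlap(int ndim,
                           const uint64_t *lb1, const uint64_t *ub1,
                           const uint64_t *lb2, const uint64_t *ub2)
{
    for (int d = 0; d < ndim; d++) {
        if (ub1[d] < lb2[d] || ub2[d] < lb1[d])
            return false;   /* disjoint along dimension d */
    }
    return true;
}

/* Overwrite rule as stated: same version slot AND overlapping boxes. */
static bool put_overwrites(unsigned new_version, unsigned old_version,
                           unsigned max_versions, int ndim,
                           const uint64_t *new_lb, const uint64_t *new_ub,
                           const uint64_t *old_lb, const uint64_t *old_ub)
{
    assert(max_versions > 0);
    return new_version % max_versions == old_version % max_versions &&
           bboxes_overlap(ndim, new_lb, new_ub, old_lb, old_ub);
}
```

For example, a 10 x 10 box of doubles needs a buffer of 100 * sizeof(double) bytes, and with max_versions = 2 (a value chosen for illustration) a put of version 3 overwrites overlapping data stored under version 1, but not data stored under version 2.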

Locking Semantics

dataspaces.conf

Usage patterns