Load openpmd datasets to scipp
DataArrays
.
scipp
is an alternative to xarray
and provides basically numpy arrays with axes description and units.
- Automatically load axes and units with openPMD data.
- Axes information is automatically updated when slicing, indexing, or filtering your data.
- With
scipp
's plotting libraryplopp
it becomes an alternative toopenpmd-viewer
. - Many numpy and some scipy functions including all the basic algebraic operations on arrays are supported by
scipp
. When using these, the units and coordinates are automatically taken care of.
scipp
currently handles units with a library, that does not support non-integer exponents for units. This can become problematic in some calculations.
It can be easily installed with pip.
git clone https://github.com/pordyna/openpmd_scipp.git
cd openpmd-scipp
pip install .
Get example data sets from the openPMD-example-datasets
repository.
git clone https://github.com/openPMD/openPMD-example-datasets.git
cd openPMD-example-datasets
tar -zxvf example-2d.tar.gz
tar -zxvf example-3d.tar.gz
import openpmd_scipp as pmdsc
import scipp as sc
path = 'openPMD-example-datasets/example-3d/hdf5/data%T.h5'
data_loader = pmdsc.DataLoader(path)
print(data_loader.iterations)
<scipp.Dataset>
Dimensions: Sizes[t:5, ]
Coordinates:
* t float64 [s] (t) [3.28471e-14, 6.56942e-14, ..., 1.31388e-13, 1.64236e-13]
Data:
iteration_id int64 [dimensionless] (t) [100, 200, ..., 400, 500]
Let us plot electric field's x component at 65 fs.
Ex = data_loader.get_field('E', 'x', time=65 * sc.Unit('fs'))
print(Ex)
Series does not contain iteration at the exact time. Using closest iteration instead.
<scipp.DataArray>
Dimensions: Sizes[x:26, y:26, z:201, ]
Coordinates:
* t float64 [s] () 6.56942e-14
* x float64 [m] (x) [-9.6e-06, -8.8e-06, ..., 9.6e-06, 1.04e-05]
* y float64 [m] (y) [-1e-05, -9.2e-06, ..., 9.2e-06, 1e-05]
* z float64 [m] (z) [4.7e-06, 4.8e-06, ..., 2.46e-05, 2.47e-05]
Data:
float64 [V/m] (x, y, z) [-1.08652e+08, -1.9758e+08, ..., 0, 0]
You may have noticed, that the time requested does not have to match exactly any iteration. By default, if there is an iteration within 10 fs distance it will be used instead. This 10 fs tolerance can be adjusted by setting time_tolerance
. The check can be also disabled by setting time_tolerance=None
, with that the method will return the closest iteration regardless of the difference. So that this will also work:
print(data_loader.get_field('E', 'x', time=20 * sc.Unit('fs'), time_tolerance=20 * sc.Unit('fs')))
Series does not contain iteration at the exact time. Using closest iteration instead.
<scipp.DataArray>
Dimensions: Sizes[x:26, y:26, z:201, ]
Coordinates:
* t float64 [s] () 3.28471e-14
* x float64 [m] (x) [-9.6e-06, -8.8e-06, ..., 9.6e-06, 1.04e-05]
* y float64 [m] (y) [-1e-05, -9.2e-06, ..., 9.2e-06, 1e-05]
* z float64 [m] (z) [-5.2e-06, -5.1e-06, ..., 1.47e-05, 1.48e-05]
Data:
float64 [V/m] (x, y, z) [-1.08549e+07, -1.3967e+07, ..., 0, 0]
, but data_loader.get_field('E', 'x', time=20 * sc.Unit('fs'))
not.
# It is also possible to use iteration number instead:
print(data_loader.get_field('E', 'x', iteration=200))
<scipp.DataArray>
Dimensions: Sizes[x:26, y:26, z:201, ]
Coordinates:
* t float64 [s] () 6.56942e-14
* x float64 [m] (x) [-9.6e-06, -8.8e-06, ..., 9.6e-06, 1.04e-05]
* y float64 [m] (y) [-1e-05, -9.2e-06, ..., 9.2e-06, 1e-05]
* z float64 [m] (z) [4.7e-06, 4.8e-06, ..., 2.46e-05, 2.47e-05]
Data:
float64 [V/m] (x, y, z) [-1.08652e+08, -1.9758e+08, ..., 0, 0]
# For scalar fields just omit the second argument:
print(data_loader.get_field('rho', iteration=200))
<scipp.DataArray>
Dimensions: Sizes[x:26, y:26, z:201, ]
Coordinates:
* t float64 [s] () 6.56942e-14
* x float64 [m] (x) [-1e-05, -9.2e-06, ..., 9.2e-06, 1e-05]
* y float64 [m] (y) [-1e-05, -9.2e-06, ..., 9.2e-06, 1e-05]
* z float64 [m] (z) [4.7e-06, 4.8e-06, ..., 2.46e-05, 2.47e-05]
Data:
float64 [mC/L] (x, y, z) [-7169.01, -7526.4, ..., 3.16049e-11, 1.22782e-11]
W can't directly plot 3D data.
But we can for example select a slice. For that we can use a helper function pmdsc.closest
to get the closets index, since scipp
requires exact match. You can read more about indexing scipp
arrays in scipp
's documentation.
slicing_idx = pmdsc.closest(Ex, 'x', 2 * sc.Unit('um'))
Ex_slice = Ex['x', slicing_idx]
print(Ex_slice)
<scipp.DataArray>
Dimensions: Sizes[y:26, z:201, ]
Coordinates:
* t float64 [s] () 6.56942e-14
x float64 [m] () 2.4e-06
* y float64 [m] (y) [-1e-05, -9.2e-06, ..., 9.2e-06, 1e-05]
* z float64 [m] (z) [4.7e-06, 4.8e-06, ..., 2.46e-05, 2.47e-05]
Data:
float64 [V/m] (y, z) [4.86614e+08, 6.67018e+08, ..., 0, 0]
Ex_slice.plot()
# We can also plot line plots:
Ex_line = Ex_slice['z', pmdsc.closest(Ex_slice, 'z', 1.4e-5 * sc.Unit('m'))]
print(Ex_line)
<scipp.DataArray>
Dimensions: Sizes[y:26, ]
Coordinates:
* t float64 [s] () 6.56942e-14
x float64 [m] () 2.4e-06
* y float64 [m] (y) [-1e-05, -9.2e-06, ..., 9.2e-06, 1e-05]
z float64 [m] () 1.4e-05
Data:
float64 [V/m] (y) [3.46069e+07, -2.94134e+07, ..., 8.89353e+06, -4.32182e+07]
Ex_line.plot()
Alternatively it is possible to work interactively with plopp
's tools for visualizing multidimensional data, such as pp.slicer
or pp.inspector
.
Just as an example we can easily plot the square of the field:
(Ex_line * Ex_line).plot()
In the above example the whole 3D field is loaded into memory and sliced afterward. It is also possible to just load a sub-chunk into memory. When the relay
option in get_field
is set to True
it will return a dummy object that only allocates memory for a single value. This relay object can be indexed, sliced etc. using the scipp
indexing just like before. (The only limitation given by the openpmd-api
is that the result has to a be contiguous chunk of the original array). The load_data
method loads data and returns a proper scipp
data array.
# The full 3D array is not loaded into memory at this point.
Ex = data_loader.get_field('E', 'x', time=65 * sc.Unit('fs'), relay=True)
# This time we will select a range rather than a slice. For a range there is no need for an exact match.
# But, we could also select a slice just like in the previous example.
Ex = Ex['x', -2e-6 * sc.Unit('m'):2e-6 * sc.Unit('m')]
# Only now the smaller subset wil be loaded into memory
Ex = Ex.load_data()
print(Ex)
Series does not contain iteration at the exact time. Using closest iteration instead.
<scipp.DataArray>
Dimensions: Sizes[x:5, y:26, z:201, ]
Coordinates:
* t float64 [s] () 6.56942e-14
* x float64 [m] (x) [-1.6e-06, -8e-07, ..., 8e-07, 1.6e-06]
* y float64 [m] (y) [-1e-05, -9.2e-06, ..., 9.2e-06, 1e-05]
* z float64 [m] (z) [4.7e-06, 4.8e-06, ..., 2.46e-05, 2.47e-05]
Data:
float64 [V/m] (x, y, z) [-3.65733e+08, -5.01237e+08, ..., 0, 0]
It is also possible to combine arrays from different iterations into one using scipp
's concat
function. Here is an example for creating a 4D array from all iterations:
Ex = sc.concat([data_loader.get_field('E', 'x', iteration=iteration.value, time_tolerance=None)
for iteration in data_loader.iterations['iteration_id']], dim='t')
print(Ex)
<scipp.DataArray>
Dimensions: Sizes[t:5, x:26, y:26, z:201, ]
Coordinates:
* t float64 [s] (t) [3.28471e-14, 6.56942e-14, ..., 1.31388e-13, 1.64236e-13]
* x float64 [m] (x) [-9.6e-06, -8.8e-06, ..., 9.6e-06, 1.04e-05]
* y float64 [m] (y) [-1e-05, -9.2e-06, ..., 9.2e-06, 1e-05]
* z float64 [m] (t, z) [-5.2e-06, -5.1e-06, ..., 5.41e-05, 5.42e-05]
Data:
float64 [V/m] (t, x, y, z) [-1.08549e+07, -1.3967e+07, ..., 0, 0]
The reason for the z coordinate having two dimensions (t,z) is the fact that the data comes from a moving window simulation. This is clearly visible in the plot below.
# Let us just slice at some points to get a 2D dataset
Ex = Ex['x', 10]['y', 10]
print(Ex)
<scipp.DataArray>
Dimensions: Sizes[t:5, z:201, ]
Coordinates:
* t float64 [s] (t) [3.28471e-14, 6.56942e-14, ..., 1.31388e-13, 1.64236e-13]
x float64 [m] () -1.6e-06
y float64 [m] () -2e-06
* z float64 [m] (t, z) [-5.2e-06, -5.1e-06, ..., 5.41e-05, 5.42e-05]
Data:
float64 [V/m] (t, z) [-8.41738e+08, -7.8752e+08, ..., 0, 0]
Ex.plot()
Coming soon!