Meta-JSON-Parser is written in C++ and uses the CMake build system (version 3.18 or higher). You need a C++ compiler that supports the C++17 standard.
Meta-JSON-Parser needs the CUDA Toolkit, at least version 10.0, to be installed for it to compile and run. The project additionally uses header-only libraries shipped with the CUDA SDK.
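To quickly verify that the toolchain meets these requirements, you can query the versions of the installed tools (a simple sanity check, assuming `cmake`, `g++`, and `nvcc` are on your `PATH`):

```
$ cmake --version    # should report 3.18 or higher
$ g++ --version      # needs C++17 support (e.g. GCC 8 or newer)
$ nvcc --version     # CUDA Toolkit 10.0 or later
```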
Meta-JSON-Parser also requires the following libraries to be either installed locally or system-wide, or fetched as submodules into the `third_party/` subdirectory:

- **Boost.Mp11**: a C++11 metaprogramming library (minimum version 1.73)
- **GoogleTest**: Google's C++ testing and mocking framework (only for the `meta-json-parser-test` binary)
- **CLI11**: a command line parser for C++11 (only for the `meta-json-parser-benchmark` binary)
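If you go the submodule route, the third-party libraries can be fetched with the standard Git commands (a sketch; it assumes the submodules are declared in the repository's `.gitmodules` file, and the repository URL is a placeholder):

```
$ git clone --recurse-submodules <repository-url>
# or, in an already cloned working tree:
$ git submodule update --init --recursive
```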
The RAPIDS suite of libraries and APIs gives the ability to execute end-to-end data science and analytics pipelines entirely on NVIDIA GPUs. RAPIDS includes cuDF, a pandas-like DataFrame manipulation library for Python, which Meta-JSON-Parser intends to integrate with.
cuDF is in turn built on libcudf, a C++ GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. Meta-JSON-Parser can optionally make use of libcudf, either to benchmark JSON parsing time against it, or to integrate with the RAPIDS ecosystem.
To configure the build system to compile `meta-json-parser-benchmark` with libcudf support, use:

```
cmake -DUSE_LIBCUDF=1 ..
```
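A complete out-of-source build could then look as follows (a sketch; it assumes a `build/` subdirectory inside the project tree, which matches the layout used in the Docker examples later in this document):

```
$ mkdir -p build && cd build
$ cmake -DUSE_LIBCUDF=1 ..
$ make    # builds all targets, including meta-json-parser-benchmark
```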
RAPIDS is available as conda packages, as Docker images, and as source builds.
To install Miniconda together with the `conda` tool on Linux, one can use the RPM and Debian (dpkg) repositories for Miniconda, as described in the "RPM and Debian Repositories for Miniconda" section of the Conda User's Guide.
- Download the public GPG key for conda repositories and add it to the keyring:

  ```
  $ curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > ~/conda.gpg
  $ sudo install -o root -g root -m 644 ~/conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg
  ```
- Check whether the fingerprint is correct (the command will output an error message otherwise):

  ```
  $ gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806
  ```
- Add the conda repo to the list of sources for apt:

  ```
  $ echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | sudo tee -a /etc/apt/sources.list.d/conda.list
  ```
- Conda is now ready to install:

  ```
  $ sudo apt update
  $ sudo apt install conda
  ```
- To use `conda`, you need to configure some environment variables:

  ```
  $ source /opt/conda/etc/profile.d/conda.sh
  ```
- You can check whether the installation was successful by typing:

  ```
  $ conda -V
  conda 4.9.2
  ```
- The command to install RAPIDS libraries, cuDF in particular, can be found via the RAPIDS Release Selector.

  Version 0.19 of the cuDF library (the one used in a previous comparison) can be installed with:

  ```
  $ conda create -n rapids-0.19 -c rapidsai -c nvidia -c conda-forge \
      cudf=0.19 python=3.8 cudatoolkit=11.2
  ```

  The current stable release of cuDF (as of January 2022), assuming Python 3.8 (there is Python 3.9.9 on 'choinka') and CUDA 11.4 (the version installed on 'choinka'), can be installed with:

  ```
  $ conda create -n rapids-21.12 -c rapidsai -c nvidia -c conda-forge \
      cudf=21.12 python=3.8 cudatoolkit=11.4
  ```

  Note that this installs cuDF locally, for the currently logged-in user.

  Note: to remove the created conda environment, use

  ```
  $ conda env remove -n rapids-0.19
  ```
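  To verify which environments exist and which one is currently active (marked with `*`), you can use:

  ```
  $ conda env list
  ```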
- To use conda-installed cuDF (and libcudf), and to compile using this version of the library and its header files, one needs to activate the 'rapids-21.12' environment with `conda` (as told by the `conda create ...` command that was invoked in the previous step):

  ```
  $ conda activate rapids-21.12
  ```

  This command, among other things, sets up environment variables (including `CONDA_PREFIX`) and modifies the shell prompt to include information about the current conda environment. You can then recompile the project with `make clean && make`. To turn off using cuDF, run `conda deactivate`.
- To run `meta-json-parser-benchmark` with conda-installed libcudf, one needs to have the 'rapids-21.12' environment active (see the previous step), at least to use the command as given here.

  For `meta-json-parser-benchmark` to use conda-installed libcudf, and to prefer it over a system-installed version (if any), one needs to set the `LD_LIBRARY_PATH` environment variable correctly. It needs to include the path to the conda-installed libcudf library, but also the paths to the libraries used by `./meta-json-parser-benchmark` (you can find them with the `ldd` command, as shown after this example). For example (TODO: check for correctness):

  ```
  $ LD_LIBRARY_PATH="${CONDA_PREFIX}/lib:/lib/x86_64-linux-gnu:/lib64" \
    ./meta-json-parser-benchmark ../../data/json/generated/sample_400000.json 400000 \
    --max-string-size=32 --const-order -o sample_b.csv
  ```
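  The `ldd` command mentioned above can be used as follows (a minimal sketch) to check which shared libraries the binary picks up, and whether any remain unresolved:

  ```
  $ ldd ./meta-json-parser-benchmark | grep 'not found'   # unresolved libraries, if any
  $ ldd ./meta-json-parser-benchmark | grep libcudf       # which libcudf is picked up
  ```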
The RAPIDS images are based on nvidia/cuda, and are intended to be drop-in replacements for the corresponding CUDA images in order to make it easy to add RAPIDS libraries while maintaining support for existing CUDA applications.
RAPIDS images come in three types, distributed in two different repos:
- `base` - contains a RAPIDS environment ready for use.
- `runtime` - extends the `base` image by adding a notebook server (JupyterLab) and example notebooks.
- `devel` - contains the full RAPIDS source tree, pre-built with all artifacts in place, plus the compiler toolchain, the debugging tools, the headers, and the static libraries for RAPIDS development.
The RAPIDS images have the following prerequisites:

- NVIDIA Pascal GPU architecture (compute capability 6.1) or better
- CUDA 10.1+ with a compatible NVIDIA driver
- Ubuntu 18.04/20.04 or CentOS 7/8
- Docker CE v18+
- nvidia-container-toolkit
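Most of these prerequisites can be checked quickly on the host (a sketch; `nvidia-smi` comes with the NVIDIA driver):

```
$ nvidia-smi        # GPU model and NVIDIA driver version
$ docker --version  # should report Docker CE v18 or later
$ lsb_release -ds   # distribution name and version
```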
Docker Engine is an open source containerization technology for building and containerizing your applications. To install it on Linux, follow the distribution-specific documentation.
For example, on Debian installing Docker CE takes the following steps:
- Uninstall old versions (that were not working correctly):

  ```
  $ sudo apt-get remove docker docker-engine docker.io containerd runc
  ```
- Update the `apt` package index and install packages to allow `apt` to use a repository over HTTPS:

  ```
  $ sudo apt-get update
  $ sudo apt-get install \
      ca-certificates \
      curl \
      gnupg \
      lsb-release
  ```
- Add Docker’s official GPG key for signing Debian packages:

  ```
  $ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
  ```
- Set up the stable package repository for Docker:

  ```
  $ echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  ```

  Note that for Debian unstable you might need to explicitly use the latest stable version name instead (bullseye):

  ```
  $ echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian bullseye stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  ```
- Update the `apt` package index again:

  ```
  $ sudo apt-get update
  ```

  checking that the proper repository is used:

  ```
  Get:6 https://download.docker.com/linux/debian bullseye InRelease
  Get:7 https://download.docker.com/linux/debian bullseye/stable amd64
  Reading package lists... Done
  ```
- Install the latest version of Docker Engine and containerd:

  ```
  $ sudo apt-get install docker-ce docker-ce-cli containerd.io
  ```

  You can check that the correct version of Docker was installed with

  ```
  $ apt-cache madison docker-ce
  ```

  which should return results from https://download.docker.com/linux/debian.

  You can check that Docker Engine is installed and runs correctly with

  ```
  $ sudo docker run hello-world
  ```

  You can also use

  ```
  $ sudo docker ps -a
  ```

  to see which containers are running and which have exited (note that `docker ps` lists containers, not images; installed images can be listed with `docker images`).
As an optional post-installation step on Linux, you can configure Docker for use as a non-root user.
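The standard steps for this, following Docker's post-installation documentation, are to create the `docker` group and add your user to it (you need to log out and back in for the change to take effect):

```
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
```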
The NVIDIA Container Toolkit allows users to build and run GPU-accelerated containers. The toolkit includes a container runtime library and utilities to automatically configure containers to leverage NVIDIA GPUs.
See NVIDIA Container Toolkit Installation Guide or NVIDIA Docker Engine wrapper repository for details.
- Add NVIDIA’s official GPG key for signing packages:

  ```
  $ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
  ```
- Set up the `stable` repository:

  ```
  $ distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  ```

  If this did not work, you may need to set the Linux distribution and its version manually, for example:

  ```
  $ curl -s -L https://nvidia.github.io/nvidia-docker/debian11/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  ```

  (note that the returned repository list may contain URLs that appear to be for a different distribution).
- Update the `apt` package index:

  ```
  $ sudo apt-get update
  ```

  checking that the NVIDIA container repository is listed among the repositories (https://nvidia.github.io/).
- Install the `nvidia-docker2` package (and its dependencies):

  ```
  $ sudo apt-get install nvidia-docker2
  ```
- Restart the Docker daemon to complete the installation, and check that it works correctly:

  ```
  $ sudo systemctl restart docker
  $ sudo systemctl status docker
  ```

  At this point, a working setup can be tested by running a base CUDA container:

  ```
  $ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
  ```
If running a CUDA container fails with the following error message:

```
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused:
process_linux.go:545: container init caused:
Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli:
container error: cgroup subsystem devices not found: unknown.
```

this might mean that hierarchical v2 cgroups are used.
There are two possible solutions:

- turn off hierarchical cgroups by using the `systemd.unified_cgroup_hierarchy=false` kernel command line parameter, or
- turn off using cgroups by setting `no-cgroups = true` in `/etc/nvidia-container-runtime/config.toml`, and adding NVIDIA devices manually to the container:

  ```
  $ sudo docker run --rm --gpus all \
      --device /dev/nvidia0 --device /dev/nvidia-modeset \
      --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools \
      --device /dev/nvidiactl \
      nvidia/cuda:11.0-base nvidia-smi
  ```
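Whether the system actually uses the unified cgroups v2 hierarchy can be checked by examining the filesystem type mounted at `/sys/fs/cgroup` (a quick check; `cgroup2fs` indicates cgroups v2, `tmpfs` indicates v1):

```
$ stat -fc %T /sys/fs/cgroup/
```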
Use the RAPIDS Release Selector, choose "Docker + Dev Env" and the appropriate switches, to find the correct invocation:

```
$ docker pull rapidsai/rapidsai-core-dev:22.02-cuda11.5-devel-ubuntu20.04-py3.9
$ docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai-core-dev:22.02-cuda11.5-devel-ubuntu20.04-py3.9
```
Note that running `docker` might require using `sudo`, and that with the cgroups2 workaround (cgroups disabled) one also needs to add the appropriate `--device` options, see above.

The following ports are used by the `runtime` and `core-dev` containers only (not the `base` containers):
- 8888 - exposes a JupyterLab notebook server
- 8786 - exposes a Dask scheduler
- 8787 - exposes a Dask diagnostic web server
Read more at RAPIDS at DockerHub.
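If the container runs on a remote machine (such as 'choinka' mentioned earlier), these ports can be reached from a local browser through SSH port forwarding; a sketch (user and hostname are placeholders):

```
$ ssh -L 8888:localhost:8888 -L 8787:localhost:8787 user@remote-host
```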
To have access to the `meta-json-parser` project in the RAPIDS Docker container, you can mount the host directory containing this project at a specific location within the container (using a bind mount). Running the container can be done like this:

```
$ sudo docker run --gpus all --rm -it \
    --device /dev/nvidia0 --device /dev/nvidiactl \
    --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools \
    -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    -v ${HOME}/GPU-IDUB/meta-json-parser:/meta-json-parser \
    -v ${HOME}/GPU-IDUB/data:/data \
    rapidsai/rapidsai-core-dev:21.12-cuda11.5-devel-ubuntu20.04-py3.8
```
Then you just need to change the directory in the container:

```
(rapids) root@8c8501d8b358:/rapids/notebooks# cd /meta-json-parser/build/
```
Before running `cmake` you might need to remove its cache; the simplest solution is to clean the `build` directory, though it might be enough to just remove `CMakeCache.txt`.

To configure the build system to compile `meta-json-parser-benchmark` with libcudf support, use:

```
(rapids) root@8c8501d8b358:/meta-json-parser/build# cmake -DUSE_LIBCUDF=1 ..
```
Note that the Boost installed in a RAPIDS Docker container might be too old:

```
-- Could NOT find Boost: Found unsuitable version "1.72.0", but required is at least "1.73" (found /opt/conda/envs/rapids/include, )
```

A workaround that makes `cmake`/`make`/`g++` use the system-installed Boost:

```
(rapids) root@ccefed838be5:/meta-json-parser/build# cd /opt/conda/envs/rapids/include/boost
(rapids) root@ccefed838be5:/opt/conda/envs/rapids/include/boost# mv mp11 mp11_do_not_use
(rapids) root@ccefed838be5:/opt/conda/envs/rapids/include/boost# cd -
```

or simply:

```
( cd /opt/conda/envs/rapids/include/boost ; mv mp11 mp11_do_not_use )
```

You also need to use the local libraries from `third_party/` with:

```
cmake -DUSE_LIBCUDF=1 -DLOCAL_LIB=1 ..
```
To use the CUDA runtime in `docker build` you need to install nvidia-container-runtime:

```
$ sudo apt-get install nvidia-container-runtime
```

The CUDA runtime must be set as the default runtime in `/etc/docker/daemon.json`. If you installed nvidia-docker2, then the nvidia runtime might already be configured; in that case, just add the `default-runtime` option:
```
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
```
This is the only way to enable the CUDA runtime in `docker build`.
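After editing `/etc/docker/daemon.json`, restart the Docker daemon and check that nvidia became the default runtime (`docker info` reports the configured runtimes):

```
$ sudo systemctl restart docker
$ docker info | grep -i runtime   # should show "Default Runtime: nvidia"
```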
If you wish to cite this software in an academic publication, please use the following reference:
Formatted:
- K. Kaczmarski, J. Narębski, S. Piotrowski and P. Przymus, "Fast JSON parser using metaprogramming on GPU," 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA), Shenzhen, China, 2022, pp. 1-10, doi: 10.1109/DSAA54385.2022.10032381.
BibTeX:
@inproceedings{10032381,
author={Kaczmarski, Krzysztof and Narębski, Jakub and Piotrowski, Stanisław and Przymus, Piotr},
booktitle={2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA)},
title={Fast JSON parser using metaprogramming on GPU},
year={2022},
pages={1-10},
doi={10.1109/DSAA54385.2022.10032381}}