In this document, we show an example of installing the core components of Pixels on AWS EC2. Installation in other cloud or on-premises environments is almost the same. In this example, we use Trino as the query engine. To use another query engine, skip *Install Trino* and check the instructions for that query engine.
Here, we only install and configure the essential components for query processing. To use the following optional components, follow the instructions in the corresponding README.md after the basic installation:
- Pixels Cache: The distributed columnar cache to accelerate query processing.
- Pixels Turbo: The hybrid query engine that invokes serverless resources to help process unpredictable workload spikes.
- Pixels Amphi: The adaptive query scheduler that enables cost-efficient query processing in both on-premises and in-cloud environments.
In AWS EC2, create an Ubuntu 20.04 or 22.04 instance with the x86 architecture, at least 4GB of memory, and a 20GB root volume.
8GB or more memory is recommended for performance evaluations on datasets larger than 10GB.
Log into the instance as the `ubuntu` user, and install the following components. Installation steps marked with `*` are optional.
Install JDK 17.0 in the EC2 instance:
sudo apt install openjdk-17-jdk openjdk-17-jre
Check the java version:
java -version
If another version of JDK is in use, switch to JDK 17:
update-java-alternatives --list
sudo update-java-alternatives --set /path/to/jdk-17.0
Oracle JDK 17.0, Azul Zulu JDK 17, or GraalVM 22 for Java 17 also works.
Pixels requires Maven 3.8 or later to build the source code (some earlier Maven versions may work, but we haven't tested them yet). On some operating systems, the Maven installed by `apt` or `yum` might be incompatible with new JDKs such as JDK 17.
In this case, manually install a newer Maven that is compatible with your JDK.
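For example, a minimal sketch that installs Maven manually under `~/opt` (the version 3.8.8 and the install location are examples; pick any Maven 3.8+ release):
mkdir -p ~/opt
wget https://archive.apache.org/dist/maven/maven-3/3.8.8/binaries/apache-maven-3.8.8-bin.tar.gz
tar -xzf apache-maven-3.8.8-bin.tar.gz -C ~/opt
echo 'export PATH=$HOME/opt/apache-maven-3.8.8/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
mvn -version  # should report Maven 3.8.8 running on JDK 17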
To use S3 as the underlying storage system, the AWS credentials must be configured. Currently, Pixels does not set `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, or `AWS_DEFAULT_REGION` by itself. Therefore, configure these credentials using environment variables or the AWS credential files.
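For example, a sketch using environment variables (the values below are placeholders; putting them in `~/.aws/credentials` works as well):
echo 'export AWS_ACCESS_KEY_ID=<your-access-key-id>' >> ~/.bashrc
echo 'export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>' >> ~/.bashrc
echo 'export AWS_DEFAULT_REGION=<your-region>' >> ~/.bashrc
source ~/.bashrc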
To build Pixels from source and install it locally, you can simply run:
echo 'export PIXELS_HOME=$HOME/opt/pixels/' >> ~/.bashrc
source ~/.bashrc
./install.sh
But you still need to:
- Put the MySQL JDBC connector into `PIXELS_HOME/lib`.
- Modify `pixels.properties` to ensure the following properties are valid:
pixels.var.dir=/home/pixels/opt/pixels/var/
metadata.db.driver=com.mysql.jdbc.Driver
metadata.db.user=pixels
metadata.db.password=password
metadata.db.url=jdbc:mysql://localhost:3306/pixels_metadata?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull
metadata.server.port=18888
metadata.server.host=localhost
trans.server.port=18889
trans.server.host=localhost
# query scheduling server for pixels-turbo
query.schedule.server.port=18893
query.schedule.server.host=localhost
etcd.hosts=localhost
etcd.port=2379
metrics.node.text.dir=/home/pixels/opt/node_exporter/text/
presto.pixels.jdbc.url=jdbc:trino://localhost:8080/pixels/tpch
The hostnames, ports, paths, usernames, and passwords in these properties are to be configured in the following steps of installation.
Note: optionally, you can also set the `PIXELS_CONFIG` system environment variable to specify a different location of `pixels.properties`. This can be an http or https URL pointing to a remote location.
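For example (the URL is a placeholder):
export PIXELS_CONFIG=https://example.com/conf/pixels.properties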
If you are installing a cluster, install Pixels on each node of the cluster.
Optionally, to install Pixels step-by-step, see the guidance below.
Here, we install Pixels and the other binary packages into the `~/opt` directory:
mkdir ~/opt
Create the home `~/opt/pixels` for Pixels:
cd ~/opt
mkdir pixels
Append this line into `~/.bashrc` and source it:
export PIXELS_HOME=$HOME/opt/pixels/
Create the following directories in `PIXELS_HOME`:
cd $PIXELS_HOME
mkdir bin
mkdir lib
mkdir listener
mkdir logs
mkdir sbin
mkdir var
Put the shell scripts in `scripts/bin` and `scripts/sbin` into `PIXELS_HOME/bin` and `PIXELS_HOME/sbin`, respectively.
Put `pixels-daemon-*-full.jar` into `PIXELS_HOME`.
Put `pixels-cli-*-full.jar` into `PIXELS_HOME/sbin`.
Put the MySQL JDBC connector into `PIXELS_HOME/lib`.
Put `pixels-common/src/main/resources/pixels.properties` into `PIXELS_HOME`.
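As a sketch, assuming you run the commands from the root of the Pixels source tree after a successful build (the `target` paths follow the typical Maven output layout and may differ in your checkout):
cp scripts/bin/* $PIXELS_HOME/bin/
cp scripts/sbin/* $PIXELS_HOME/sbin/
cp pixels-daemon/target/pixels-daemon-*-full.jar $PIXELS_HOME/
cp pixels-cli/target/pixels-cli-*-full.jar $PIXELS_HOME/sbin/
cp pixels-common/src/main/resources/pixels.properties $PIXELS_HOME/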
Modify `pixels.properties` to ensure that the URLs, ports, paths, usernames, and passwords are valid. Leave the other config parameters as default.
Set `cache.enabled` to `false` in `$PIXELS_HOME/pixels.properties` if you don't use pixels-cache.
MySQL and etcd are used to store the metadata and states of Pixels. MySQL/MariaDB 5.5 or later has been tested. Other forks or variants may also work. You only need to install one etcd and one MySQL instance, even in a cluster.
To install MySQL:
sudo apt update
sudo apt install mysql-server
sudo mysql_secure_installation
Note that for MySQL 8+, you may have to set a native password for the MySQL `root` user before running `mysql_secure_installation`, for example:
sudo mysql
mysql> ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'your_root_password';
mysql> exit
sudo mysql_secure_installation
Log into MySQL and create a user and a metadata database for Pixels:
CREATE USER 'pixels'@'%' IDENTIFIED BY 'password';
CREATE DATABASE pixels_metadata;
GRANT ALL PRIVILEGES ON pixels_metadata.* to 'pixels'@'%';
FLUSH PRIVILEGES;
Ensure that the MySQL server can be accessed remotely. Sometimes the default MySQL configuration binds the server to localhost and thus rejects remote connections.
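For example, a sketch for Ubuntu's default MySQL installation (the config file path may differ across MySQL versions and distributions):
# bind to all interfaces instead of localhost only, then restart MySQL
sudo sed -i 's/^bind-address.*/bind-address = 0.0.0.0/' /etc/mysql/mysql.conf.d/mysqld.cnf
sudo systemctl restart mysql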
Use `scripts/sql/metadata_schema.sql` to create the tables in `pixels_metadata`.
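For example, assuming you run it from the root of the Pixels source tree:
mysql -u pixels -p pixels_metadata < scripts/sql/metadata_schema.sql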
First, install Go:
sudo apt install golang
Copy `scripts/tars/etcd-v3.3.4-linux-amd64.tar.xz` into `~/opt` and decompress it.
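For example:
cp scripts/tars/etcd-v3.3.4-linux-amd64.tar.xz ~/opt
cd ~/opt
tar -xJf etcd-v3.3.4-linux-amd64.tar.xz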
Append the following lines into `~/.bashrc` and source it:
export ETCDCTL_API=3
export ETCD=$HOME/opt/etcd-v3.3.4-linux-amd64-bin
export PATH=$PATH:$ETCD
All the following commands are executed under `~/opt` by default.
Create the link and start etcd:
ln -s etcd-v3.3.4-linux-amd64-bin etcd
cd etcd
./start-etcd.sh
You can use `screen` or `nohup` to run it in the background.
The default `etcd/conf.yml` is good for the default `ubuntu` user in AWS Ubuntu instances.
If you are using your own OS installation or a different user, please modify the settings in `conf.yml` accordingly, especially `data-dir` and the `*-urls` settings.
Hadoop is optional. It is only needed if you want to use HDFS as the underlying storage system.
Pixels has been tested to be compatible with Hadoop-2.7.3 and Hadoop-3.3.1. Follow the official docs to install Hadoop.
Modify `hdfs.config.dir` in `PIXELS_HOME/pixels.properties` and point it to the `etc/hadoop` directory under the home of Hadoop.
Pixels will read the Hadoop configuration files `core-site.xml` and `hdfs-site.xml` from this directory.
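For example, in `PIXELS_HOME/pixels.properties` (the Hadoop home below is an example path, assuming Hadoop 3.3.1 installed under `~/opt`):
hdfs.config.dir=/home/ubuntu/opt/hadoop-3.3.1/etc/hadoop/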
Note: (1) Some default ports used by Hadoop may conflict with the default ports used by Trino. In this case, modify the default port configuration of either system. (2) Hadoop 2.7.x and 3.3.x are not compatible with high-version JDKs such as JDK 17. Please configure Hadoop to use JDK 8.
Trino is the recommended query engine that works with Pixels. Currently, Pixels is compatible with Trino-405. Download and install Trino-405 following the instructions here.
Here, we install Trino into `~/opt/trino-server-405` and create a link for it:
ln -s trino-server-405 trino-server
Then, download trino-cli into `~/opt/trino-server/bin/` and make it executable.
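For example, a sketch that fetches the Trino 405 CLI from Maven Central (the URL follows Maven Central's standard artifact layout; verify it against the Trino docs):
wget -O ~/opt/trino-server/bin/trino https://repo1.maven.org/maven2/io/trino/trino-cli/405/trino-cli-405-executable.jar
chmod +x ~/opt/trino-server/bin/trino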
There are two important directories in the home of trino-server: `etc` and `plugin`.
Decompress `pixels-trino-listener-*.zip` and `pixels-trino-connector-*.zip` into the `plugin` directory.
The `etc` directory contains the configuration files of Trino.
In addition to the configurations mentioned in the official docs, add the following configurations for Pixels:
- Create the listener config file named `event-listener.properties` in the `etc` directory, with the following content:
event-listener.name=pixels-event-listener
enabled=true
listened.user.prefix=none
listened.schema=pixels
listened.query.type=SELECT
log.dir=/home/ubuntu/opt/pixels/listener/
`log.dir` should point to an existing directory where the listener logs will appear.
- Create the catalog config file named `pixels.properties` for Pixels in the `etc/catalog` directory, with the following content:
connector.name=pixels
# serverless config
# it can be on, off, auto, or session
cloud.function.switch=off
clean.intermediate.result=true
Note that `etc/catalog/pixels.properties` under Trino's home is different from `PIXELS_HOME/pixels.properties`.
In Trino, Pixels can push projections, filters, joins, and aggregations into serverless computing services (e.g., AWS Lambda).
This feature is named Pixels-Turbo and can be turned on by setting `cloud.function.switch` to `auto` (adaptively enabled) or `on` (always enabled).
Turn it `off` to only use Trino workers for query processing.
We can also set it to `session` so that the switch can be dynamically turned on or off by the session property `pixels.cloud_function_enabled`.
This allows `pixels-server` to choose whether to execute a query with cloud functions enabled.
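For example, in trino-cli (a sketch assuming `cloud.function.switch=session` is set in `etc/catalog/pixels.properties`):
-- enable cloud functions for the current session only
SET SESSION pixels.cloud_function_enabled = true;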
Append the following two lines into `etc/jvm.config`:
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
Thus, Pixels can use reflection on internal or low-level classes to improve performance. This is only needed for Java 9+.
Some scripts in Trino may require python:
sudo apt-get install python
Note: on Ubuntu 20.04 and later, the `python` package is not available; install `python-is-python3` instead to provide the `python` command.
Prometheus and Grafana are optional. We can install them to monitor the performance metrics of the whole system.
To install Prometheus, copy `node_exporter-1.5.0.linux-amd64.tar.xz`, `jmx_exporter-0.18.0.tar.xz`, and `prometheus-2.43.0.linux-amd64.tar.xz` under `scripts/tars` into the `~/opt` directory and decompress them.
Node exporter and jmx exporter are needed for every node in a cluster, whereas Prometheus and Grafana are only needed on the node where you want
to store the performance metrics and provide visualization.
Create links:
ln -s jmx_exporter-0.18.0/ jmx_exporter
ln -s node_exporter-1.5.0.linux-amd64 node_exporter
ln -s prometheus-2.43.0.linux-amd64 prometheus
Append the following lines into `~/.bashrc`:
export PROMETHEUS_HOME=$HOME/opt/prometheus/
export NODE_EXPORTER_HOME=$HOME/opt/node_exporter
export PATH=$PATH:$PROMETHEUS_HOME:$NODE_EXPORTER_HOME
Enter the home of trino-server. Append this line to `etc/jvm.config`:
-javaagent:/home/ubuntu/opt/jmx_exporter/jmx_prometheus_javaagent-0.18.0.jar=9101:/home/ubuntu/opt/jmx_exporter/pixels-jmx.yml
Start `node_exporter` and `prometheus` respectively, using the `start-*.sh` scripts in their directories.
You can use `screen` or `nohup` to run them in the background.
Then, follow the instructions here to install Grafana.
Log into Grafana and create a Prometheus data source named `cluster-monitor`.
Set `URL` to `http://localhost:9090`, `Scrape Interval` to `5s`, and `HTTP Method` to `GET`. Leave the other configurations as default.
Import the JSON dashboard configuration files under `scripts/grafana` into Grafana.
Then we get the `Node Exporter` and `JVM Exporter` dashboards in Grafana.
These dashboards can be used to monitor the performance metrics of the instance.
Enter `PIXELS_HOME` and follow the pixels-cache document to configure and start Pixels if pixels-cache is enabled.
If pixels-cache is not enabled, directly start the daemons of Pixels using:
./sbin/start-pixels.sh
The essential services of Pixels, such as the metadata server, transaction manager, cache coordinator, cache manager, and metrics server, are running in these daemons.
To start Pixels in a cluster, edit `PIXELS_HOME/sbin/workers` and list all the worker nodes in this file.
Each line in this file is in the following format:
hostname_of_the_worker pixels_home_of_the_worker
`pixels_home_of_the_worker` is optional if the worker has the same `PIXELS_HOME` as the coordinator node where you run `start-pixels.sh`.
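For example, a `workers` file for a two-worker cluster (the hostnames and path are hypothetical):
worker1 /home/ubuntu/opt/pixels/
worker2
Here, `worker2` omits the path because it uses the same `PIXELS_HOME` as the coordinator.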
You can also start the Pixels coordinator and workers separately using `PIXELS_HOME/sbin/start-coordinator.sh` and `PIXELS_HOME/sbin/start-workers.sh`.
Note: you can also add JVM OPTS for the Pixels daemons in `PIXELS_HOME/bin/coordinator-jvm.config` and `PIXELS_HOME/bin/worker-jvm.config`. This is useful for profiling and remote debugging.
After starting Pixels, enter the home of trino-server and start Trino:
./bin/launcher start
Connect to trino-server using trino-cli:
./bin/trino --server localhost:8080 --catalog pixels
Run `SHOW SCHEMAS` in trino-cli; the result should be as follows if everything is installed correctly:
Schema
--------------------
information_schema
(1 row)
By now, Pixels has been installed in the EC2 instance. The aforementioned instructions should also work in other kinds of VMs or physical servers.
Run the script `PIXELS_HOME/sbin/stop-pixels.sh` to stop Pixels, or run `PIXELS_HOME/sbin/stop-coordinator.sh` and `PIXELS_HOME/sbin/stop-workers.sh` to stop the coordinator and workers, respectively.