This project is a framework for building HDP/HDF clusters on bare-metal hardware, provisioned with Ansible and deployed across a Docker Swarm.
- Step 1:
  - Install Docker and K8s on target hosts
- Step 2:
All host information and deployment definitions are captured here.
The Docker Swarm and the base network are created here.
Once the Swarm has been initialized, each host needs to join it.
After the Docker Swarm environment has been created, assign Docker node labels with this Ansible playbook.
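As a rough illustration of what label assignment amounts to, the sketch below builds one `docker node update --label-add` command per host without executing anything. The label key `location`, the host names, and the helper `label_commands` are assumptions made up for this example; the Ansible playbook remains the source of truth for the real labels.

```python
from typing import Dict, List


def label_commands(labels_by_node: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Build one `docker node update` argv per node (nothing is executed)."""
    commands = []
    for node, labels in sorted(labels_by_node.items()):
        argv = ["docker", "node", "update"]
        for key, value in sorted(labels.items()):
            # Each label becomes its own --label-add key=value pair.
            argv += ["--label-add", f"{key}={value}"]
        argv.append(node)
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and label values, for illustration only.
    example = {"os02": {"location": "left"}, "os03": {"location": "right"}}
    for argv in label_commands(example):
        print(" ".join(argv))
```

Commands of this form have to be run on a Swarm manager node.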
https://portainer.readthedocs.io/en/stable/
Every cluster depends on basic infrastructure components such as a 'database', a local 'repo' and 'ldap'.
Once the 'hdp_base' network has been created by the ./infrastructure/docker-init-swarm.sh script, run the infrastructure playbook to create the shared 'mysql' database, 'http repo' and 'proxy' host used by the clusters.
The 'proxy' host, which is an 'sshd' server with an exposed port, is attached to the docker network used by all the clusters. I use this to 'tunnel' into the network via an SSH dynamic (SOCKS) proxy, which I configure a browser to use.
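The tunnel described above can be sketched as an `ssh -D` invocation. The helper below only assembles the command line; the host name, user, exposed SSH port, and local SOCKS port are hypothetical values, not taken from this project.

```python
from typing import List


def socks_tunnel_command(host: str, ssh_port: int, user: str,
                         local_port: int = 1080) -> List[str]:
    """Return the argv for an SSH dynamic (SOCKS) proxy through the 'proxy' host."""
    for port in (ssh_port, local_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    return [
        "ssh",
        "-N",                   # no remote command, tunnel only
        "-D", str(local_port),  # local SOCKS listener for the browser
        "-p", str(ssh_port),    # sshd port exposed by the 'proxy' host
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical values: adjust to the port the 'proxy' service exposes.
    print(" ".join(socks_tunnel_command("os01.example.com", 2222, "admin")))
```

With the tunnel running, the browser would be pointed at a SOCKS proxy on `localhost` and the chosen local port.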
In the bin directory are helper scripts for creating hosts to deploy an HDP/HDF cluster on. Each script requires a configuration parameter, -i <instance>, which references a configuration file. The 'instance' value is anything between 01 and 99.
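A minimal sketch of how the 01 to 99 constraint on the instance value could be checked. The function name `normalize_instance` is hypothetical, and accepting an unpadded value such as `7` is an assumption; the helper scripts may be stricter.

```python
def normalize_instance(value: str) -> str:
    """Validate an instance id and return it zero-padded to two digits."""
    text = value.strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"instance must be numeric: {value!r}")
    number = int(text)
    if not 1 <= number <= 99:
        raise ValueError(f"instance must be between 01 and 99: {value!r}")
    return f"{number:02d}"


if __name__ == "__main__":
    print(normalize_instance("7"))
    print(normalize_instance("42"))
```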
Before running the initialization script, create the supporting config file from either the 2.6 template or the 2.7 template. The configuration file identifies details about how to set up and start the desired cluster, starting with the location.
There are four location configurations (full, left, center, and right). See deployments for details on these locations.
Each location template has been assigned to a particular set of hosts. The hosts in the docker swarm cluster have been assigned 'node labels' to assist with placement.
The cluster deployments use 'docker stack deploy' via a docker compose file to publish hosts for the cluster.
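To make the relationship between node labels and stack deployment concrete, the sketch below builds a Compose placement constraint and the matching `docker stack deploy` command. The label key `location`, the compose file name, and the stack name are assumptions for illustration; the project's own compose files define the real values.

```python
from typing import List


def placement_constraint(label_key: str, label_value: str) -> str:
    """Constraint string as used under deploy.placement.constraints in a compose file."""
    return f"node.labels.{label_key} == {label_value}"


def stack_deploy_command(compose_file: str, stack_name: str) -> List[str]:
    """Return the argv for deploying a stack from a compose file (nothing is executed)."""
    return ["docker", "stack", "deploy", "--compose-file", compose_file, stack_name]


if __name__ == "__main__":
    # Hypothetical label, file, and stack names.
    print(placement_constraint("location", "left"))
    print(" ".join(stack_deploy_command("hdp-left.yaml", "hdp01")))
```

A service carrying such a constraint is only scheduled on Swarm nodes whose label matches it.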
The docker images have been configured to build on the host's 'pam/sssd' integration with the FreeIPA server that's configured on OS1. This is done by bind-mounting the following paths into the containers (see the compose file):
- type: bind
  source: /var/lib/sss/pipes/
  target: /var/lib/sss/pipes/
- type: bind
  source: /var/lib/sss/mc/
  target: /var/lib/sss/mc/
If you are deploying the cluster without a blueprint, the 01_deploy.py script will create the hosts, install and configure Ambari on each host, and manually register those Ambari Agent hosts with the Ambari Server.
- Create the Cluster via Ambari.
- See the notes on cluster configuration adjustments to work in a docker stack environment.
- Add ExtJS support for the Oozie Web UI.
- Run the oozie extjs playbook to add the libraries needed for the Oozie Web UI.
- Restart the services.
Depending on where you've installed the Ranger Admin service, adjust the cluster's docker hosts YAML file to include the references to the Ranger Admin UI. Fix the following properties:
- zookeeper_server
- ranger_url_base
Then run the best practices playbook against the clusters. It can be found in hwx-sdlc-apps, along with a user onboarding helper script.
ansible-playbook -i <environment_host.yaml> standards/01_bp_ranger_policies.yaml
This will set up basic user and cluster rules for HDFS and Hive.
Because this is a docker environment, a few adjustments to the HDP deployments were needed to get them working. See HDP Readme and HDF Readme for details.