From 45a7c544c0f4751ca2e3b372bdbbbd341192266a Mon Sep 17 00:00:00 2001 From: kencx Date: Wed, 3 Apr 2024 01:18:39 +0800 Subject: [PATCH] Update documentation --- README.md | 47 +++++++-- docs/src/SUMMARY.md | 5 - docs/src/ansible/roles/unseal_vault.md | 2 + docs/src/getting_started.md | 127 +++++++++++++++++-------- docs/src/images/packer.md | 26 ++--- docs/src/index.md | 50 ++++++++-- docs/src/prerequisites.md | 21 ++-- docs/src/references/TODO.md | 21 ++-- docs/src/references/issues.md | 7 +- docs/src/terraform/proxmox.md | 34 +++---- 10 files changed, 228 insertions(+), 112 deletions(-) diff --git a/README.md b/README.md index 53e4f1e..873a8c4 100644 --- a/README.md +++ b/README.md @@ -3,8 +3,8 @@ **[Documentation](https://kencx.github.io/homelab)** This repository contains infrastructure-as-code for the automated deployment and -configuration and management of a Hashicorp (Nomad + Consul + Vault) cluster. -The cluster is hosted on Proxmox as a personal, private homelab. +configuration, and management of a Hashicorp (Nomad + Consul + Vault) cluster on +Proxmox. ## Disclaimer @@ -17,13 +17,12 @@ actions that are irreversible! ## Overview -This project aims to provision a full Hashicorp cluster in a semi-automated +This project aims to provision a full Hashicorp cluster in a **semi-automated** manner. It utilizes Packer, Ansible and Terraform: -- Packer creates base Proxmox VM templates from cloud images and ISOs -- Terraform provisions cluster nodes by cloning existing VM templates -- Ansible installs and configures Vault, Consul, Nomad on cluster - nodes +1. Packer creates base Proxmox VM templates from cloud images and ISOs +2. Terraform provisions cluster nodes by cloning existing VM templates +3. Ansible installs and configures Vault, Consul, Nomad on cluster nodes It comprises minimally of one server and one client node with no high availability (HA). The nodes run Vault, Consul and Nomad as a cluster. @@ -41,15 +40,47 @@ physical machines. - [x] Consul service discovery - [x] Secure node communication via mTLS - [x] Personal Certificate Authority hosted on Vault +- [x] Secrets management, retrieval and rotation with Vault - [x] Automated certificate management with Vault and consul-template - [x] Let's Encrypt certificates on Traefik reverse proxy -- [x] Scheduled, automated backups with Restic and Autorestic ## Getting Started See the [documentation](https://kencx.github.io/homelab/getting_started) for more information on the concrete steps to configure and provision the cluster. +## Folder Structure + +```bash +. +├── ansible/ +│ ├── roles +│ ├── playbooks +│ ├── inventory # inventory files +│ └── goss # goss config +├── bin # custom scripts +├── packer/ +│ ├── base # VM template from ISO +│ └── base-clone # VM template from existing template +└── terraform/ + ├── cluster # config for cluster + ├── dev # config where I test changes + ├── minio # config for Minio buckets + ├── modules # tf modules + ├── nomad # nomad jobs + ├── postgres # config for Postgres DB users + ├── proxmox # config for Proxmox accounts + └── vault # config for Vault +``` + +## Limitations + +- Manual Vault unseal on reboot +- Inter-job dependencies are [not supported](https://github.com/hashicorp/nomad/issues/545) in Nomad +- Vault agent is run as root + +See [issues]() for more information. 
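For orientation, a condensed sketch of the end-to-end provisioning flow described in the Overview, assuming the folder layout shown above and that the variable files (`auto.pkrvars.hcl`, `terraform.tfvars`) have already been populated as described in Getting Started:

```bash
# 1. Build the base VM template with Packer (cloned from an existing cloud-image template)
cd packer/base-clone
packer validate -var-file="auto.pkrvars.hcl" .
packer build -var-file="auto.pkrvars.hcl" .

# 2. Provision the server and client nodes with Terraform
cd ../../terraform/cluster
terraform init
terraform plan
terraform apply

# 3. Configure Vault, Consul and Nomad on the new nodes with Ansible
cd ../../ansible
ansible-playbook main.yml
```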
+ ## Acknowledgements - [CGamesPlay/infra](https://github.com/CGamesPlay/infra) diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md index 9570ddb..0d576c2 100644 --- a/docs/src/SUMMARY.md +++ b/docs/src/SUMMARY.md @@ -29,10 +29,7 @@ - [Unseal Vault](ansible/roles/unseal_vault.md) - [Vault](ansible/roles/vault.md) - - - @@ -45,8 +42,6 @@ - [Diun](apps/diun.md) - [Registry](apps/registry.md) -- [Backups](backups.md) - # References - [Known Issues](references/issues.md) diff --git a/docs/src/ansible/roles/unseal_vault.md b/docs/src/ansible/roles/unseal_vault.md index fb760fe..e75921b 100644 --- a/docs/src/ansible/roles/unseal_vault.md +++ b/docs/src/ansible/roles/unseal_vault.md @@ -1,3 +1,5 @@ +# Unseal Vault + >**Work in Progress**: This role is unfinished and untested. This role unseals an initialized but sealed Vault server. The unseal key shares diff --git a/docs/src/getting_started.md b/docs/src/getting_started.md index 242d62d..f6b4cba 100644 --- a/docs/src/getting_started.md +++ b/docs/src/getting_started.md @@ -1,40 +1,63 @@ # Getting Started -This documents provides an overview for provisioning and installing the cluster. +Our goal is to provision a Nomad, Consul and Vault cluster with one server node +and one client node. The basic provisioning flow is as follows: ->**Note**: It is assumed that all nodes are running on Proxmox as Debian 11 VMs. ->Please fork the project and make the necessary configuration changes should you ->choose to run the cluster with LXCs or an alternative distro. +1. Packer creates base Proxmox VM templates from cloud images and ISOs +2. Terraform provisions cluster nodes by cloning existing VM templates +3. Ansible installs and configures Vault, Consul, Nomad on cluster nodes -## Prerequisites +### Assumptions -See [Prerequisites](prerequisites.md) for the full requirements. +The following assumptions are made in this guide: ->**Note**: Use the `bin/generate-vars` script to quickly generate variable files ->in `packer` and `terraform` subdirectories. +- All [prerequisites](./prerequisites.md) are fulfilled +- The cluster is provisioned on a Proxmox server +- All nodes are running Debian 11 virtual machines (not LXCs) + +Please make the necessary changes if there are any deviations from the above. ## Creating a VM template -There are two methods to create a VM template: +The Proxmox builder plugin is used to create a new VM template. It supports two +different builders: -- From an [ISO file](./images/packer.md#proxmox-iso) (WIP) -- From an [existing cloud image](./images/packer.md#proxmox-clone) (recommended) +- `proxmox-clone` - From an [existing VM template](./images/packer.md#proxmox-clone) (recommended) +- `proxmox-iso` - From an [ISO file](./images/packer.md#proxmox-iso) (incomplete) -We will be building the template from an existing cloud image. +We will be using the first builder. If you have an existing template to +provision, you may [skip to the next section](#provisioning-with-terraform). +Otherwise, assuming that we are lacking an existing, clean VM template, we will +import a cloud image and turn it into a new template. ->**Note**: See [Cloud Image](images/cloud_image.md) for how to import an ->existing cloud image into Proxmox. +>**Note**: It is important that the existing template [must +>have](https://pve.proxmox.com/wiki/Cloud-Init_Support#_preparing_cloud_init_templates): +> +> - An attached cloud-init drive for the builder to add the SSH communicator +> configuration +> - cloud-init installed +> - qemu-guest-agent installed -1. 
Navigate to `packer/base-clone`. -2. Populate the necessary variables in `auto.pkrvars.hcl`: +1. (Optional) Run the `bin/import-cloud-image` [script](./images/cloud_image.html#script) to import a new cloud image: + +```bash +$ import-cloud-image [URL] +``` + +2. Navigate to `packer/base-clone` + +>**Tip**: Use the `bin/generate-vars` script to quickly generate variable files +>in `packer` and `terraform` subdirectories. + +3. Populate the necessary variables in `auto.pkrvars.hcl`: ```hcl -proxmox_url = "https://${PVE_IP}:8006/api2/json" -proxmox_username = "user@pam" -proxmox_password = "password" +proxmox_url = "https://:8006/api2/json" +proxmox_username = "@pam" +proxmox_password = "" -clone_vm = "cloud-image-name" -vm_name = "base-template" +clone_vm = "" +vm_name = "" vm_id = 5000 ssh_username = "debian" @@ -42,25 +65,29 @@ ssh_public_key_path = "/path/to/public/key" ssh_private_key_path = "/path/to/private/key" ``` -3. Build the image: +4. Build the image: ```bash $ packer validate -var-file="auto.pkrvars.hcl" . $ packer build -var-file="auto.pkrvars.hcl" . ``` -Packer will create a new base image that has common configuration and -software installed (eg. Docker). For more information, refer to -[Packer](./images/packer.md#proxmox-clone). +Packer will create a new base image and use the Ansible post-provisioner to +install and configure software (eg. Docker, Nomad, Consul and Vault). For more +details, see [Packer](./images/packer.md#proxmox-clone). ## Provisioning with Terraform -1. Navigate to `terraform/cluster`. +We are using the +[bpg/proxmox](https://registry.terraform.io/providers/bpg/proxmox/latest/docs) +provider to provision virtual machines from our Packer templates. + +1. Navigate to `terraform/cluster` 2. Populate the necessary variables in `terraform.tfvars`: ```hcl -proxmox_ip = "https://${PVE_IP}:8006/api2/json" -proxmox_api_token = "${API_TOKEN}" +proxmox_ip = "https://:8006/api2/json" +proxmox_api_token = "" template_id = 5000 ip_gateway = "10.10.10.1" @@ -94,11 +121,8 @@ ssh_private_key_file = "/path/to/ssh/private/key" ssh_public_key_file = "/path/to/ssh/public/key" ``` ->**Note**: To create a Proxmox API token, see [Access ->Management](./terraform/proxmox.md#access-management). - ->**Note**: Any template to be cloned by Terraform must have `cloud-init` and ->`qemu-guest-agent` installed. + + 3. Provision the cluster: @@ -118,30 +142,57 @@ Client node: VMID 111 at 10.10.10.111 An Ansible inventory file `tf_ansible_inventory` should be generated in the same directory with the given VM IPs in the `server` and `client` groups. -For more information, refer to the [Terraform configuration for +For more details, refer to the [Terraform configuration for Proxmox](terraform/proxmox.md). ## Configuration with Ansible -1. Navigate to `ansible`. +At this stage, there should be one server node and one client node running on +Proxmox that is reachable by SSH. These nodes should have Nomad, Consul and +Vault installed. We will proceed to use Ansible (and Terraform) to configure +Vault, Consul and Nomad (in that order) into a working cluster. + +1. Navigate to `ansible` 2. Ensure that the Terraform-generated Ansible inventory file is being read: ```bash $ ansible-inventory --graph ``` -3. Populate and check the `group_vars` file in +3. 
Populate and check the `group_vars` files in `inventory/group_vars/{prod,server,client}.yml` ```bash $ ansible-inventory --graph --vars ``` +>**Note**: The `nfs_share_mounts` variable in `inventory/group_vars/client.yml` +>should be modified or removed if not required + 4. Run the playbook: ```bash $ ansible-playbook main.yml ``` -This will configure and start Vault, Consul and Nomad in both nodes with mTLS -and gossip encryption. +The playbook will perform the following idempotently: + +1. Create a root and intermediate CA for Vault +2. Configure Vault to use new CA +3. Initialize Vault roles, authentication and PKI with Terraform with + [configuration](./terraform/vault.md) in `terraform/vault` +4. Configure Vault-agent and consul-template in server node +5. Configure Consul and Nomad in server node. These roles depend on Vault being + successfully configured and started as they require Vault to generate a + gossip key and TLS certificates +6. Repeat 4-5 for client node + +### Note on Data Loss + +When re-running the playbook on the same server, Vault will not be +re-initialized. However, if the playbook is run on a separate server (eg. for +testing on a dev cluster), the Vault role will permanently delete any +existing state in the `terraform/vault` subdirectory if a different +`vault_terraform_workspace` is not provided. This WILL result in permanent data +loss and care should be taken when running the role (and playbook) on multiple +clusters or servers. diff --git a/docs/src/images/packer.md b/docs/src/images/packer.md index fe7d69f..b73072d 100644 --- a/docs/src/images/packer.md +++ b/docs/src/images/packer.md @@ -10,27 +10,27 @@ Proxmox. ## Proxmox-clone -The `proxmox-clone` builder creates a new VM template from an existing one. This -is best used with an [uploaded cloud image](./cloud_image.md) which has been -converted into a VM template. +The `proxmox-clone` builder creates a new VM template from an existing one. If +you do not have an existing VM template or want to create a new template, you +can [upload a new cloud image](./cloud_image.md) and convert it into a new VM template. -This existing template [must +Note that this existing template [must have](https://pve.proxmox.com/wiki/Cloud-Init_Support#_preparing_cloud_init_templates): - An attached cloud-init drive for the builder to add the SSH communicator - configuration. -- `cloud-init` installed. + configuration +- `cloud-init` installed -The builder will do the following: +After running the builder, it will do the following: -1. Clone existing template. -2. Add a SSH communicator configuration via cloud-init. +1. Clone existing template by given name +2. Add a SSH communicator configuration via cloud-init 3. Connect via SSH and run the shell provisioner scripts to prepare the VM for - Ansible. -4. Install and start `qemu-guest-agent`. -5. Run the Ansible provisioner with the `ansible/common.yml` playbook. + Ansible +4. Install and start `qemu-guest-agent` +5. Run the Ansible provisioner with the `ansible/common.yml` playbook 6. Stop and convert the VM into a template with a new (and empty) cloud-init - drive. + drive ### Variables diff --git a/docs/src/index.md b/docs/src/index.md index 1dcb2c9..873a8c4 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -3,26 +3,26 @@ **[Documentation](https://kencx.github.io/homelab)** This repository contains infrastructure-as-code for the automated deployment and -configuration and management of a Hashicorp (Nomad + Consul + Vault) cluster. 
-The cluster is hosted on Proxmox as a personal, private homelab. +configuration, and management of a Hashicorp (Nomad + Consul + Vault) cluster on +Proxmox. ## Disclaimer -This project is in pre-alpha status and subject to +This project is in alpha status and subject to [bugs](https://kencx.github.io/homelab/references/issues) and breaking changes. + Please do not run any code on your machine without understanding the provisioning flow, in case of data loss. Some playbooks may perform destructive actions that are irreversible! ## Overview -This project aims to provision a full Hashicorp cluster in a semi-automated +This project aims to provision a full Hashicorp cluster in a **semi-automated** manner. It utilizes Packer, Ansible and Terraform: -- Packer creates base Proxmox VM templates from cloud images and ISOs -- Terraform provisions cluster nodes by cloning existing VM templates -- Ansible installs and configures Vault, Consul, Nomad on cluster - nodes +1. Packer creates base Proxmox VM templates from cloud images and ISOs +2. Terraform provisions cluster nodes by cloning existing VM templates +3. Ansible installs and configures Vault, Consul, Nomad on cluster nodes It comprises minimally of one server and one client node with no high availability (HA). The nodes run Vault, Consul and Nomad as a cluster. @@ -40,15 +40,47 @@ physical machines. - [x] Consul service discovery - [x] Secure node communication via mTLS - [x] Personal Certificate Authority hosted on Vault +- [x] Secrets management, retrieval and rotation with Vault - [x] Automated certificate management with Vault and consul-template - [x] Let's Encrypt certificates on Traefik reverse proxy -- [x] Scheduled, automated backups with Restic and Autorestic ## Getting Started See the [documentation](https://kencx.github.io/homelab/getting_started) for more information on the concrete steps to configure and provision the cluster. +## Folder Structure + +```bash +. +├── ansible/ +│ ├── roles +│ ├── playbooks +│ ├── inventory # inventory files +│ └── goss # goss config +├── bin # custom scripts +├── packer/ +│ ├── base # VM template from ISO +│ └── base-clone # VM template from existing template +└── terraform/ + ├── cluster # config for cluster + ├── dev # config where I test changes + ├── minio # config for Minio buckets + ├── modules # tf modules + ├── nomad # nomad jobs + ├── postgres # config for Postgres DB users + ├── proxmox # config for Proxmox accounts + └── vault # config for Vault +``` + +## Limitations + +- Manual Vault unseal on reboot +- Inter-job dependencies are [not supported](https://github.com/hashicorp/nomad/issues/545) in Nomad +- Vault agent is run as root + +See [issues]() for more information. + ## Acknowledgements - [CGamesPlay/infra](https://github.com/CGamesPlay/infra) diff --git a/docs/src/prerequisites.md b/docs/src/prerequisites.md index 299d309..c459844 100644 --- a/docs/src/prerequisites.md +++ b/docs/src/prerequisites.md @@ -34,14 +34,23 @@ The following are optional, but highly recommended: [Coredns](roles/coredns.md). - A custom domain from any domain registrar, added to Cloudflare as a zone. -## Controller Host +## Controller Node -A controller host with the provisioning tools (Packer, Ansible, Terraform) installed. +A workstation, controller node or separate host system will be used to run the +required provisioning tools. 
This system will need to have the following tools +installed: + +- Packer +- Terraform +- Ansible +- Python 3 for various scripts (optional) + +Alternatively, you are free to install the above tools on the same server that +you are provisioning the cluster. ## Cluster Requirements -- A Proxmox base image template, either from [an existing cloud - image](images/cloud_image.md) or built with [Packer](images/packer.md). +- An existing Proxmox server that is reachable by the controller node - (Optional) An offline, private root and intermediate CA. - A self-signed certificate, private key for TLS encryption of Vault. A default key-pair is @@ -51,5 +60,5 @@ A controller host with the provisioning tools (Packer, Ansible, Terraform) insta >**Note**: While Vault can use certificates generated from its own PKI secrets >engine, a temporary key pair is still required to start up Vault. -- (Optional) A secure password manager. This project supports [Bitwarden](https://bitwarden.com/) with - custom scripts. + + diff --git a/docs/src/references/TODO.md b/docs/src/references/TODO.md index 7d0d39e..c04605a 100644 --- a/docs/src/references/TODO.md +++ b/docs/src/references/TODO.md @@ -1,19 +1,12 @@ # Roadmap -- [ ] Secure sudo user -- [ ] Fix configuragble cert TTL by Vault -- [ ] Make Bitwarden scripts in Vault role more robust -- [ ] Nomad, Consul automated gossip key rotation -- [ ] Nomad, Consul ACLs - [ ] Run consul-template as non-root user -- [ ] Replace fail2ban with crowdsec -- [ ] Setup Authelia -- [ ] Complete `autorestic` role - - Installation of restic and autorestic not implemented - - `autorestic.env` not populated by Ansible -- [ ] Complete `unseal_vault` role -- [ ] Fix Packer `base` ISO build +- [ ] Run vault-agent as non-root user +- [ ] Automated gossip key rotation for Nomad and Consul +- [ ] ACLs for Nomad and Consul +- [ ] `unseal_vault` role +- [ ] Packer `base` builder - `preseed.cfg` is unreachable by boot command when controller host and Proxmox VM are on different subnets. -- [ ] systemd notification on failure -- [ ] Monitoring stack on separate node +- [ ] Fix configurable cert TTL by Vault +- [ ] Improve robustness of Bitwarden scripts in Vault role diff --git a/docs/src/references/issues.md b/docs/src/references/issues.md index bbd1180..707e808 100644 --- a/docs/src/references/issues.md +++ b/docs/src/references/issues.md @@ -1,12 +1,15 @@ +# Issues + This documents known issues that have not been fixed. ## Manual Vault Unseal Process Vault server must be manually unsealed when host is rebooted. -## Nomad +## Unreachable Nomad Jobs on Reboot -On some occasions, restarting the Nomad client results in some running jobs being unreachable. The temporary fix is to restart the job (not alloc or task). +On some occasions, restarting the Nomad client results in some running jobs +being unreachable. The temporary fix is to restart the job (not alloc or task). ## ~Vault-agent not reloading TLS certs~ diff --git a/docs/src/terraform/proxmox.md b/docs/src/terraform/proxmox.md index 2bc9237..e32b890 100644 --- a/docs/src/terraform/proxmox.md +++ b/docs/src/terraform/proxmox.md @@ -9,23 +9,23 @@ provider to manage three types of Proxmox resources: - Cloud images - VMs -## Access Management - -This configuration is found in `terraform/proxmox` and creates a dedicated -Terraform user for the management of Proxmox VMs to be described later. It -defines a `terraform@pam` user in a `Terraform` group which have the minimum -roles required for creating, cloning and destroying VMs. 
This configuration -requires credentials with at least the `PVEUserAdmin` role (I use the root user -for convenience). - -After creating the user, we must create an API token in the web console with the -following options: - -```text -user: terraform@pam -token_id: some_secret -privilege_separation: false -``` + + + + + + + + + + + + + + + + + ## Upload of Cloud Images