Skip to content

Commit

Permalink
Ability to set custom ccp_monitoring pass
Browse files Browse the repository at this point in the history
With this change users can update the
<cluster>-monitoring secret with a password, in either the `stringData`
or `data` secret fields, and remove the verifier to update the
ccp_monitoring password in postgres. After this users will need to
restart the exporter process by deleting the instance pods (a solution
that doesn't require full pod restarts is coming).

This change is to support monitoring for standby clusters. Before this a
standby cluster would be created without having access to the
ccp_monitoring user password that was replicated from the primary
cluster.

Test to ensure that the postgres_exporter can scrape postgres using a
custom ccp_monitoring password.

The tests will:
1. create a cluster with exporter enabled and ensure metrics can be
   collected
2. Update the password and restart the pod
3. ensure that metrics can still be collected with the new password

Tests now require jq to run

- Refactor existing exporter tests

  - Split out the tls and no-tls tests into separate directories.
  - Update the tests to check the containers ready conditions
  - Add collectors for test failures
  - Include a test where we deploy a postgres-cluster with monitoring
    enabled on a replica. It will then check that the exporter on the
    replica can hit the query the database

- Update exporter to use pass file

  - The exporter container now provides the ccp_monitoring password to
    postgres_exporter using a password file instead of an environment
    variable. With this, the password can be updated without requiring a
    container restart. The path to the password file has also been added to
    the exporter watcher logic meaning that the postgres_exporter process
    will be restarted when either the queries directory or password file
    change.

  - The password change test is updated to check that the postgres_exporter
    pid has been updated before trying to re-connect.

- Update pgmonitor 4.9

  - Update to pull pgMonitor 4.9 queries. The new version has a specific
    file for the global_dbsize metric that needs to be included when
    generating the default queries

- Standby metrics testing

  - Now that the password for the monitoring user is configurable, users can
    configure a standby cluster to allow the exporter to query postgres
    using the ccp_monitoring user. This change implements testing to
    validate this use case. This test is included in e2e-other because it
    requires more work. We need to ensure a backup is complete before
    attempting to curl metrics. See note below*

Note: Move standby and replica tests to e2e-other

These two test can fail because of a scrape_error if a backup has not
completed. They need to be updated to check that a backup is complete
before attempting to collect metrics. There is a related story in our
backlog.

Due to the race condition, backup not being complete, they could pass or
fail. After a backup chack is in place they should be able to move back
into the e2e directory.
  • Loading branch information
jmckulk committed Sep 22, 2023
1 parent c4d39af commit be108de
Show file tree
Hide file tree
Showing 73 changed files with 1,059 additions and 494 deletions.
4 changes: 2 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ PGO_IMAGE_URL ?= https://www.crunchydata.com/products/crunchy-postgresql-for-kub
PGO_IMAGE_PREFIX ?= localhost

PGMONITOR_DIR ?= hack/tools/pgmonitor
PGMONITOR_VERSION ?= v4.8.1
PGMONITOR_VERSION ?= v4.9.0
QUERIES_CONFIG_DIR ?= hack/tools/queries

# Buildah's "build" used to be "bud". Use the alias to be compatible for a while.
Expand Down Expand Up @@ -247,7 +247,7 @@ generate-kuttl: ## Generate kuttl tests
source="$${1}" target="$${1/e2e/e2e-generated}"; \
mkdir -p "$${target%/*}"; render < "$${source}" > "$${target}"; \
shift; \
done' - testing/kuttl/e2e/*/*.yaml testing/kuttl/e2e-other/*/*.yaml
done' - testing/kuttl/e2e/*/*.yaml testing/kuttl/e2e-other/*/*.yaml testing/kuttl/e2e/*/*/*.yaml testing/kuttl/e2e-other/*/*/*.yaml

##@ Generate

Expand Down
59 changes: 34 additions & 25 deletions internal/controller/postgrescluster/pgmonitor.go
Original file line number Diff line number Diff line change
Expand Up @@ -197,25 +197,33 @@ func (r *Reconciler) reconcileMonitoringSecret(

intent.Data = make(map[string][]byte)

if len(existing.Data["password"]) == 0 || len(existing.Data["verifier"]) == 0 {
// Copy existing password and verifier into the intent
if existing.Data != nil {
intent.Data["password"] = existing.Data["password"]
intent.Data["verifier"] = existing.Data["verifier"]
}

// When password is unset, generate a new one
if len(intent.Data["password"]) == 0 {
password, err := util.GenerateASCIIPassword(util.DefaultGeneratedPasswordLength)
if err != nil {
return nil, err
}
intent.Data["password"] = []byte(password)
// We generated a new password, unset the verifier so that it is regenerated
intent.Data["verifier"] = nil
}

// Generate the SCRAM verifier now and store alongside the plaintext
// password so that later reconciles don't generate it repeatedly.
// NOTE(cbandy): We don't have a function to compare a plaintext password
// to a SCRAM verifier.
verifier, err := pgpassword.NewSCRAMPassword(password).Build()
// When a password has been generated or the verifier is empty,
// generate a verifier based on the current password.
// NOTE(cbandy): We don't have a function to compare a plaintext
// password to a SCRAM verifier.
if len(intent.Data["verifier"]) == 0 {
verifier, err := pgpassword.NewSCRAMPassword(string(intent.Data["password"])).Build()
if err != nil {
return nil, err
return nil, errors.WithStack(err)
}
intent.Data["password"] = []byte(password)
intent.Data["verifier"] = []byte(verifier)
} else {
intent.Data["password"] = existing.Data["password"]
intent.Data["verifier"] = existing.Data["verifier"]
}

err = errors.WithStack(r.setControllerReference(cluster, intent))
Expand Down Expand Up @@ -267,19 +275,7 @@ func addPGMonitorExporterToInstancePodSpec(
Env: []corev1.EnvVar{
{Name: "DATA_SOURCE_URI", Value: fmt.Sprintf("%s:%d/%s", pgmonitor.ExporterHost, *cluster.Spec.Port, pgmonitor.ExporterDB)},
{Name: "DATA_SOURCE_USER", Value: pgmonitor.MonitoringUser},
{Name: "DATA_SOURCE_PASS", ValueFrom: &corev1.EnvVarSource{
// Environment variables are not updated after a secret update.
// This could lead to a state where the exporter does not have
// the correct password and the container needs to restart.
// https://kubernetes.io/docs/concepts/configuration/secret/#environment-variables-are-not-updated-after-a-secret-update
// https://github.com/kubernetes/kubernetes/issues/29761
SecretKeyRef: &corev1.SecretKeySelector{
LocalObjectReference: corev1.LocalObjectReference{
Name: naming.MonitoringUserSecret(cluster).Name,
},
Key: "password",
},
}},
{Name: "DATA_SOURCE_PASS_FILE", Value: "/opt/crunchy/password"},
},
SecurityContext: securityContext,
// ContainerPort is needed to support proper target discovery by Prometheus for pgMonitor
Expand All @@ -293,12 +289,25 @@ func addPGMonitorExporterToInstancePodSpec(
Name: "exporter-config",
// this is the path for both custom and default queries files
MountPath: "/conf",
}, {
Name: "monitoring-secret",
MountPath: "/opt/crunchy/",
}},
}

template.Spec.Containers = append(template.Spec.Containers, exporterContainer)

// add exporter config volume
passwordVolume := corev1.Volume{
Name: "monitoring-secret",
VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: naming.MonitoringUserSecret(cluster).Name,
},
},
}
template.Spec.Volumes = append(template.Spec.Volumes, passwordVolume)

// add custom exporter config volume
configVolume := corev1.Volume{
Name: "exporter-config",
VolumeSource: corev1.VolumeSource{
Expand Down
15 changes: 6 additions & 9 deletions internal/controller/postgrescluster/pgmonitor_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -121,14 +121,7 @@ func TestAddPGMonitorExporterToInstancePodSpec(t *testing.T) {
expectedENV := []corev1.EnvVar{
{Name: "DATA_SOURCE_URI", Value: fmt.Sprintf("localhost:%d/postgres", *cluster.Spec.Port)},
{Name: "DATA_SOURCE_USER", Value: pgmonitor.MonitoringUser},
{Name: "DATA_SOURCE_PASS", ValueFrom: &corev1.EnvVarSource{
SecretKeyRef: &corev1.SecretKeySelector{
LocalObjectReference: corev1.LocalObjectReference{
Name: naming.MonitoringUserSecret(cluster).Name,
},
Key: "password",
},
}}}
{Name: "DATA_SOURCE_PASS_FILE", Value: "/opt/crunchy/password"}}
assert.DeepEqual(t, container.Env, expectedENV)

assert.Assert(t, container.Ports[0].ContainerPort == int32(9187), "Exporter container port number not set to '9187'.")
Expand Down Expand Up @@ -555,7 +548,7 @@ func TestReconcilePGMonitorExporterStatus(t *testing.T) {
podExecCalled: false,
// Status was generated manually for this test case
// TODO jmckulk: add code to generate status
status: v1beta1.MonitoringStatus{ExporterConfiguration: "79b86d7d69"},
status: v1beta1.MonitoringStatus{ExporterConfiguration: "66c45b8cfd"},
statusChangedAfterReconcile: false,
}} {
t.Run(test.name, func(t *testing.T) {
Expand Down Expand Up @@ -666,6 +659,8 @@ func TestReconcileMonitoringSecret(t *testing.T) {
cluster.UID = types.UID("hippouid")
cluster.Namespace = setupNamespace(t, cc).Name

// If the exporter is disabled then the secret should not exist
// Existing secrets should be removed
t.Run("ExporterDisabled", func(t *testing.T) {
t.Run("NotExisting", func(t *testing.T) {
secret, err := reconciler.reconcileMonitoringSecret(ctx, cluster)
Expand All @@ -688,6 +683,8 @@ func TestReconcileMonitoringSecret(t *testing.T) {
})
})

// If the exporter is enabled then a monitoring secret should exist
// It will need to be created or left in place with existing password
t.Run("ExporterEnabled", func(t *testing.T) {
var (
existing, actual *corev1.Secret
Expand Down
19 changes: 9 additions & 10 deletions internal/pgmonitor/exporter.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ var DefaultValuesForQueries = map[string]string{
func GenerateDefaultExporterQueries(ctx context.Context, cluster *v1beta1.PostgresCluster) string {
log := logging.FromContext(ctx)
var queries string
baseQueries := []string{"backrest", "global", "per_db", "nodemx"}
baseQueries := []string{"backrest", "global", "global_dbsize", "per_db", "nodemx"}
queriesConfigDir := GetQueriesConfigDir(ctx)

// TODO: When we add pgbouncer support we will do something like the following:
Expand Down Expand Up @@ -141,9 +141,6 @@ func ExporterStartCommand(commandFlags []string) []string {
// run function to combine queries files and start postgres_exporter
`start_postgres_exporter "$@"`,

// set directory to watch
`declare -r directory=/conf`,

// Create a file descriptor with a no-op process that will not get
// cleaned up
`exec {fd}<> <(:)`,
Expand All @@ -152,17 +149,19 @@ func ExporterStartCommand(commandFlags []string) []string {
// which uses up a lot of memory
`while read -r -t 3 -u "${fd}" || true; do`,

// If the directory's modify time is newer than our file descriptor's,
// If either directories' modify time is newer than our file descriptor's,
// something must have changed, so kill the postgres_exporter and rerun
// the function to combine queries files and start postgres_exporter
` if [ "${directory}" -nt "/proc/self/fd/${fd}" ] && echo "Something changed..." &&`,
` kill $(head -1 ${POSTGRES_EXPORTER_PIDFILE?}) && start_postgres_exporter "$@"`,
` if ([ "/conf" -nt "/proc/self/fd/${fd}" ] || [ "/opt/crunchy/password" -nt "/proc/self/fd/${fd}" ]) \`,
` && kill $(head -1 ${POSTGRES_EXPORTER_PIDFILE?}) && start_postgres_exporter "$@";`,
` then`,

// We then want to get rid of the old file descriptor, get a fresh one
// When something changes we want to get rid of the old file descriptor, get a fresh one
// and restart the loop
` then`,
` echo "Something changed..."`,
` exec {fd}>&- && exec {fd}<> <(:)`,
` stat --format='Latest queries file dated %%y' "${directory}"`,
` stat --format='Latest queries file dated %y' "/conf"`,
` stat --format='Latest password file dated %y' "/opt/crunchy/password"`,
` fi`,
`done`,
}, "\n")
Expand Down

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- files/exporter-append-queries-configmap.yaml
- files/exporter-append-queries-cluster.yaml
assert:
- files/exporter-append-queries-cluster-checks.yaml
Loading

0 comments on commit be108de

Please sign in to comment.