
Fixed replication slave failing to start #60

Open
wants to merge 2 commits into base: main

Conversation

agronholm
Contributor

Currently, replication slaves fail to start up using the official TimescaleDB images for two reasons:

  1. The init scripts are run on both master and slave nodes, but they should not be, because no database modifications should be attempted on the slave.
  2. The auto-tuning init script modifies server parameters, which can make the slave fail at startup: the slave connects to the master before any such tuning has been applied locally, finds that the master has a different max_worker_processes setting, and bails out.

This PR disables auto-tuning when replication is used, and makes the first init script exit early when it is run on the slave, so that no modifications are attempted on the (read-only) database.
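A minimal sketch of what that init-script guard amounts to (the exact condition in the actual diff may differ; the variable name follows the POSTGRESQL_REPLICATION_MODE convention used in the compose examples later in this thread):

#!/bin/bash
# Skip TimescaleDB initialization on a replication slave, where the database
# is read-only. The guard only skips when a replication mode is set and it
# is not "master".
if [ -n "${POSTGRESQL_REPLICATION_MODE:-}" ] && \
   [ "${POSTGRESQL_REPLICATION_MODE}" != "master" ]; then
    echo "Replication slave detected, skipping TimescaleDB initialization"
    exit 0
fi
# ...the normal TimescaleDB initialization continues below this point...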

@LeeHampton
Contributor

Quick sanity check here, @agronholm. My concern is that, since postgresql.conf itself is not replicated, this could cause the replica to use a default postgresql.conf, which could have a negative impact on performance. Is there something in the Bitnami image's replication setup that makes this not a concern?

If it is a concern, I think we might have to consider fixing this at the timescaledb-tune level, for example by passing in a replica flag which prevents it from changing settings that would be incompatible with streaming replication.

@agronholm
Contributor Author

> Quick sanity check here, @agronholm. My concern is that, since postgresql.conf itself is not replicated, this could cause the replica to use a default postgresql.conf, which could have a negative impact on performance. Is there something in the Bitnami image's replication setup that makes this not a concern?

Both the master and the slaves would then use the default settings, requiring the tuning to be done after installation (or beforehand). But to me this would be preferable, since a non-tuned working installation is better than a tuned non-working one.

> If it is a concern, I think we might have to consider fixing this at the timescaledb-tune level, for example by passing in a replica flag which prevents it from changing settings that would be incompatible with streaming replication.

Sure. Do you know which settings interfere with replication when they differ between the master and the slaves? But even with such a flag, the end result might be suboptimal, since some of those settings need to be tuned to get the best performance.
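For reference: PostgreSQL's hot standby refuses to start if certain resource settings on the standby are lower than on the primary, among them max_connections, max_worker_processes, max_prepared_transactions and max_locks_per_transaction. A quick way to compare them on both nodes, with placeholder host names and credentials:

# Compare the settings that must be >= the master's values on the standby.
for host in master-host slave-host; do
    echo "== $host =="
    psql -h "$host" -U postgres -At -c "
        SELECT name, setting FROM pg_settings
        WHERE name IN ('max_connections', 'max_worker_processes',
                       'max_prepared_transactions', 'max_locks_per_transaction');"
done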

@LeeHampton
Contributor

@agronholm I don't have an immediate answer to the question of which settings interfere with replication when they differ. As a reference, could you provide the exact error that mismatched max_worker_processes produces in this replication scenario?

@agronholm
Contributor Author

It just occurred to me that if there was a way to disable replication until the init scripts were done, that could be a much better solution.

@LeeHampton
Contributor

@agronholm Agreed, running tune on both nodes then setting up replication would probably solve this.

Unfortunately, Bitnami does not seem to offer this level of customization in their Docker image. Their entrypoint script handles replication through a function called postgresql_initialize. It executes custom scripts like the ones modified in this PR through a function called postgresql_custom_init_scripts. The order of execution of those two functions appears to be completely static though, with initialize always getting called before custom_init_scripts. So short of opening a PR against Bitnami's repo to make this customizable, we might not have a good option for the replicate-after-tuning approach.
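Paraphrasing the ordering described above (this is not the actual Bitnami code, just the relevant control flow):

postgresql_initialize            # sets up replication, runs pg_basebackup on the slave
postgresql_custom_init_scripts   # only afterwards runs the /docker-entrypoint-initdb.d scripts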

I still think we might be able to fix this by modifying timescaledb-tune to increase max_worker_processes. If you could provide the exact steps and the error you're getting in this PR, I might be able to help look into a workaround.

Worst case, if there's no easy fix after digging a bit deeper, I think we should go ahead with merging this, because you're right: having an out-of-tune replica is usually better than having one that doesn't work at all.

@LeeHampton
Contributor

Just pinging here, @agronholm, to see if you have any further thoughts on this.

@agronholm
Contributor Author

I've been busy with other matters. One thought that did occur to me is that even if replication could be disabled until the init scripts are done, the slave could still fail to start if it were deployed on a heterogeneous cluster. This probably doesn't happen often, but it's a consideration.

As for the upstream image not supporting that level of customization, I don't think that is much of a concern, since any necessary modifications could be added to TimescaleDB's own Dockerfile. A bigger concern is what to do instead: start the slave node in master mode while the init scripts run, or not start it at all? I see potential problems in both approaches.

@svenklemm removed their request for review on July 30, 2019 06:06
@Rho-Oof

Rho-Oof commented Sep 11, 2019

When running with replication enabled, it's very likely that whoever runs the containers can pre-determine max_worker_processes and mount a postgresql.conf with that setting. With that in mind, I would prefer that the auto-tuning init script simply not attempt to set max_worker_processes when replication is detected, but still set the rest of the parameters. In particular, the tuning around the amount of available memory has had a major impact on the project I'm on.
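A rough sketch of that idea, assuming the tuning script shells out to timescaledb-tune and that appending to the generated postgresql.conf is acceptable; PG_CONF and MASTER_MAX_WORKER_PROCESSES are hypothetical names, not variables from the actual image:

# Hypothetical paths and variables; the real script may differ.
PG_CONF="${PG_CONF:-/path/to/postgresql.conf}"

# Run the tuner as usual...
timescaledb-tune --yes --quiet --conf-path "$PG_CONF"

# ...but when replication is in use, pin max_worker_processes back to a value
# known to match (or exceed) the master's setting. The last occurrence of a
# parameter in postgresql.conf wins, so appending is sufficient.
if [ -n "${POSTGRESQL_REPLICATION_MODE:-}" ]; then
    echo "max_worker_processes = ${MASTER_MAX_WORKER_PROCESSES:-8}" >> "$PG_CONF"
fi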

@agronholm
Contributor Author

Yeah, that sounds like the proper solution. Also, would it be possible for the auto-tuning script to detect the memory limit imposed on the pod, rather than using the host's total memory as it does now?
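One possible approach, sketched under the assumption that the cgroup v1 limit file is visible inside the container and that timescaledb-tune's --memory option accepts a value such as "2048MB"; none of this is in the current image:

# Prefer the container/pod memory limit over the host's total memory.
LIMIT_FILE=/sys/fs/cgroup/memory/memory.limit_in_bytes   # cgroup v2 would be /sys/fs/cgroup/memory.max
if [ -r "$LIMIT_FILE" ]; then
    limit_bytes=$(cat "$LIMIT_FILE")
    # cgroup v1 reports a huge number when no limit is set; treat >1 TiB as "unlimited".
    if [ "$limit_bytes" -lt 1099511627776 ]; then
        timescaledb-tune --yes --memory "$((limit_bytes / 1024 / 1024))MB"
        exit 0
    fi
fi
timescaledb-tune --yes   # fall back to host memory detection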

@LeeHampton removed their request for review on September 19, 2019 18:06
@benoist

benoist commented Oct 11, 2019

@agronholm have you managed to set up replication with a workaround for the time being?
I'm running into the same issue at the moment, where replication won't start properly.

@benoist

benoist commented Oct 11, 2019

I added your changes to the 000_ file and the slave now starts correctly.

@agronholm
Contributor Author

I've come back to this issue since we now need replication.
The following docker-compose file works with the upstream Bitnami image:

version: "3.7"
services:
  master:
    image: bitnami/postgresql:11
    volumes:
      - pg-master-data:/var/lib/postgresql/data
    environment:
      POSTGRESQL_PASSWORD: secret
      POSTGRESQL_REPLICATION_MODE: master
      POSTGRESQL_REPLICATION_USER: my_repl_user
      POSTGRESQL_REPLICATION_PASSWORD: my_repl_password

  slave:
    image: bitnami/postgresql:11
    volumes:
      - pg-slave-data:/var/lib/postgresql/data
    environment:
      POSTGRESQL_PASSWORD: secret
      POSTGRESQL_REPLICATION_MODE: slave
      POSTGRESQL_MASTER_HOST: master
      POSTGRESQL_REPLICATION_USER: my_repl_user
      POSTGRESQL_REPLICATION_PASSWORD: my_repl_password

volumes:
  pg-master-data:
  pg-slave-data:

But with the TimescaleDB image, the slave fails to start after the initial synchronization:

version: "3.7"
services:
  master:
    image: timescale/timescaledb:1.5.1-pg11-bitnami
    volumes:
      - pg-master-data:/var/lib/postgresql/data
    environment:
      POSTGRESQL_PASSWORD: secret
      POSTGRESQL_REPLICATION_MODE: master
      POSTGRESQL_REPLICATION_USER: my_repl_user
      POSTGRESQL_REPLICATION_PASSWORD: my_repl_password

  slave:
    image: timescale/timescaledb:1.5.1-pg11-bitnami
    volumes:
      - pg-slave-data:/var/lib/postgresql/data
    environment:
      POSTGRESQL_REPLICATION_MODE: slave
      POSTGRESQL_MASTER_HOST: master
      POSTGRESQL_REPLICATION_USER: my_repl_user
      POSTGRESQL_REPLICATION_PASSWORD: my_repl_password
      POSTGRESQL_PASSWORD: secret

volumes:
  pg-master-data:
  pg-slave-data:

Here is the log from the slave:

$ docker-compose up slave
Creating timescaletest_slave_1 ... done
Attaching to timescaletest_slave_1
slave_1   | postgresql 14:58:27.48 
slave_1   | postgresql 14:58:27.48 Welcome to the Bitnami postgresql container
slave_1   | postgresql 14:58:27.48 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
slave_1   | postgresql 14:58:27.48 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
slave_1   | postgresql 14:58:27.48 Send us your feedback at containers@bitnami.com
slave_1   | postgresql 14:58:27.49 
slave_1   | postgresql 14:58:27.50 INFO  ==> ** Starting PostgreSQL setup **
slave_1   | postgresql 14:58:27.53 INFO  ==> Validating settings in POSTGRESQL_* env vars..
slave_1   | postgresql 14:58:27.53 INFO  ==> Initializing PostgreSQL database...
slave_1   | postgresql 14:58:27.54 INFO  ==> postgresql.conf file not detected. Generating it...
slave_1   | postgresql 14:58:27.55 INFO  ==> pg_hba.conf file not detected. Generating it...
slave_1   | postgresql 14:58:27.55 INFO  ==> Waiting for replication master to accept connections (60 timeout)...
slave_1   | master:5432 - accepting connections
slave_1   | postgresql 14:58:27.56 INFO  ==> Replicating the initial database
slave_1   | pg_basebackup: initiating base backup, waiting for checkpoint to complete
slave_1   | pg_basebackup: checkpoint completed
slave_1   | pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
slave_1   | pg_basebackup: starting background WAL receiver
slave_1   | pg_basebackup: created temporary replication slot "pg_basebackup_227"
slave_1   |     0/26094 kB (0%), 0/1 tablespace (...ami/postgresql/data/backup_label)
slave_1   | 26104/26104 kB (100%), 0/1 tablespace (...ostgresql/data/global/pg_control)
slave_1   | 26104/26104 kB (100%), 1/1 tablespace                                         
slave_1   | pg_basebackup: write-ahead log end point: 0/20000F8
slave_1   | pg_basebackup: waiting for background process to finish streaming ...
slave_1   | pg_basebackup: base backup completed
slave_1   | postgresql 14:58:34.09 INFO  ==> Configuring replication parameters
slave_1   | postgresql 14:58:34.11 INFO  ==> Configuring fsync
slave_1   | postgresql 14:58:34.12 INFO  ==> Setting up streaming replication slave...
slave_1   | postgresql 14:58:34.13 INFO  ==> Loading custom scripts...
slave_1   | postgresql 14:58:34.13 INFO  ==> Loading user's custom files from /docker-entrypoint-initdb.d ...
slave_1   | postgresql 14:58:34.13 INFO  ==> Starting PostgreSQL in background...
slave_1   | postgresql 14:58:34.34 INFO  ==> Stopping PostgreSQL...
timescaletest_slave_1 exited with code 1

@agronholm
Contributor Author

The only way I could get this docker-compose example to work with TimescaleDB's Bitnami-based image was to do all of the following:

  1. Add NO_TS_TUNE=1 to the environment variables on the master
  2. Substitute 000_init_timescaledb.sh on the slave with the one that skips initialization if POSTGRESQL_REPLICATION_MODE is neither master nor an empty string

Hopefully this is enough to easily reproduce the issue. At the very minimum, the 000_init_timescaledb.sh script MUST be patched to allow any kind of replication to work. The tuning issue can be worked around by setting the NO_TS_TUNE env variable if you feel uncomfortable with my patch.

@agronholm
Contributor Author

@LeeHampton

> My concern is that, since postgresql.conf itself is not replicated, this could cause the replica to use a default postgresql.conf, which could have a negative impact on performance. Is there something in the Bitnami image's replication setup that makes this not a concern?

Is it not a bigger concern that the Bitnami image overwrites postgresql.conf on every restart of the container, effectively erasing all of the automatic tuning? The tuning script seems to have been written under the assumption that either it is run every time the container starts, or the configuration is not regenerated on every start. We hit this problem in production, and I was forced to use a bunch of ALTER SYSTEM commands to get the tuning parameters to stick.
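For reference, the ALTER SYSTEM route looks roughly like this (the values are placeholders); settings changed this way are written to postgresql.auto.conf in the data directory, which is read after postgresql.conf and therefore survives the config file being regenerated:

# Placeholder values; some of these (e.g. shared_buffers, max_worker_processes)
# require a server restart to take effect.
psql -U postgres <<'SQL'
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET work_mem = '16MB';
ALTER SYSTEM SET max_worker_processes = 16;
SQL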

@agronholm
Contributor Author

I've reduced my PR to a more palatable form. This is the minimum change that is required to get replication to work at all. I will probably send another PR to fix the issue of lost tuning parameters.

@benoist

benoist commented Jan 9, 2020

@agronholm Shouldn't it be POSTGRESQL_REPLICATION_MODE?

@agronholm
Contributor Author

> Shouldn't it be POSTGRESQL_REPLICATION_MODE?

No: https://github.com/helm/charts/blob/master/stable/postgresql/templates/statefulset.yaml#L150

@benoist

benoist commented Jan 9, 2020

But the Bitnami base Docker image specifies POSTGRESQL as the prefix:

https://github.com/bitnami/bitnami-docker-postgresql/blob/master/README.md#setting-up-a-streaming-replication

I had to use that to make it work in an image I'm building for myself.
I'm not using Helm charts though; I'm using it with Docker Compose, so maybe that's where the difference is...

@agronholm
Contributor Author

I could not get it to work with the POSTGRESQL prefix.
