Fixed replication slave failing to start #60
base: main
Conversation
Quick sanity check here @agronholm. My concern is that, since the [...]. If it is a concern, I think we might have to consider fixing this at the [...]
Both master and slaves would then use the default settings, requiring one to do the tuning after the installation (or beforehand). But to me this would be preferable since a non-tuned working installation is better than a tuned non-working installation.
Sure. Do you know which settings interfere with replication when they're not the same on both the master and the slaves? But even with this flag, the end result might be suboptimal, since some of those settings need to be tuned to get the best performance.
@agronholm I don't have an immediate answer to the question of which settings interfere with replication when they're not the same. As a reference, can you provide me with the exact error that having mismatched [...] causes?
It just occurred to me that if there was a way to disable replication until the init scripts were done, that could be a much better solution.
@agronholm Agreed, running tune on both nodes and then setting up replication would probably solve this. Unfortunately, Bitnami does not seem to offer this level of customization with their Docker image; their [...]. I still think we might be able to fix this by modifying timescaledb-tune to increase the [...]. Worst case, if there's no easy fix after digging a bit deeper, I think we should go ahead with merging this, because you're right: having an out-of-tune replica is usually better than having one that doesn't work at all.
Just pinging here @agronholm to see if you had any further thoughts on this.
I've been busy with other matters. One thought that did occur to me is that even if replication could be disabled until the init scripts are done, the slave could still fail to start if it were deployed on a heterogeneous cluster. This probably doesn't happen often, but it's a consideration. As for the upstream image not supporting such a level of customization, I don't think that is much of a concern, since any necessary modifications could be added to TimescaleDB's own Dockerfile. A bigger concern is what to do instead: start the slave node in master mode while the init scripts run, or not start it at all? I see potential problems in both approaches.
When running with replication on, it's very likely that the runner can pre-determine the [...]
Yeah, that sounds like the proper solution. Also, would it be possible for the auto-tuning script to detect the memory limit imposed on the pod, rather than using the max host memory as it does now?
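As a rough illustration of that idea, here is a minimal sketch of reading the container's cgroup memory limit instead of the host's total memory. The file paths are the standard cgroup v1/v2 locations, but the helper name and the fallback behaviour are assumptions for this sketch, not anything timescaledb-tune currently does:

```python
import os

# Hypothetical helper: return the memory limit imposed on this container
# via cgroups (v2 first, then v1), or None if no limit is configured.
def container_memory_limit_bytes():
    candidates = [
        "/sys/fs/cgroup/memory.max",                    # cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
    ]
    for path in candidates:
        if os.path.exists(path):
            with open(path) as f:
                raw = f.read().strip()
            if raw == "max":  # cgroup v2 reports "max" when no limit is set
                return None
            value = int(raw)
            # cgroup v1 reports an enormous number when no limit is set
            if value >= 1 << 60:
                return None
            return value
    return None

if __name__ == "__main__":
    limit = container_memory_limit_bytes()
    print("container memory limit:", limit if limit is not None else "unlimited")
```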
@agronholm have you managed to set up replication with a workaround for the time being?
I added your changes for the 000_ file and the slave now starts correctly |
I've come back to this issue since we now need replication. With the plain bitnami/postgresql image, the following docker-compose setup works:

```yaml
version: "3.7"
services:
  master:
    image: bitnami/postgresql:11
    volumes:
      - pg-master-data:/var/lib/postgresql/data
    environment:
      POSTGRESQL_PASSWORD: secret
      POSTGRESQL_REPLICATION_MODE: master
      POSTGRESQL_REPLICATION_USER: my_repl_user
      POSTGRESQL_REPLICATION_PASSWORD: my_repl_password
  slave:
    image: bitnami/postgresql:11
    volumes:
      - pg-slave-data:/var/lib/postgresql/data
    environment:
      POSTGRESQL_PASSWORD: secret
      POSTGRESQL_REPLICATION_MODE: slave
      POSTGRESQL_MASTER_HOST: master
      POSTGRESQL_REPLICATION_USER: my_repl_user
      POSTGRESQL_REPLICATION_PASSWORD: my_repl_password
volumes:
  pg-master-data:
  pg-slave-data:
```

But with the TimescaleDB image, the slave fails to start after the initial synchronization:

```yaml
version: "3.7"
services:
  master:
    image: timescale/timescaledb:1.5.1-pg11-bitnami
    volumes:
      - pg-master-data:/var/lib/postgresql/data
    environment:
      POSTGRESQL_PASSWORD: secret
      POSTGRESQL_REPLICATION_MODE: master
      POSTGRESQL_REPLICATION_USER: my_repl_user
      POSTGRESQL_REPLICATION_PASSWORD: my_repl_password
  slave:
    image: timescale/timescaledb:1.5.1-pg11-bitnami
    volumes:
      - pg-slave-data:/var/lib/postgresql/data
    environment:
      POSTGRESQL_REPLICATION_MODE: slave
      POSTGRESQL_MASTER_HOST: master
      POSTGRESQL_REPLICATION_USER: my_repl_user
      POSTGRESQL_REPLICATION_PASSWORD: my_repl_password
      POSTGRESQL_PASSWORD: secret
volumes:
  pg-master-data:
  pg-slave-data:
```

Here is the log from the slave:
The only way I could get this docker-compose example to work with TimescaleDB's Bitnami-based image was to do all of the following: [...]

Hopefully this is enough to easily reproduce the issue. At the very minimum, the [...]
Is it not a bigger concern that the Bitnami image overwrites postgresql.conf on every restart of the container, effectively erasing all of the automatic tuning? The tuning script seems to have been written with the assumption that it's either launched every time the container is started, or that the configuration is not regenerated every time. We hit this problem in production and I was forced to use a bunch of [...]
I've reduced my PR to a more palatable form. This is the minimum change that is required to get replication to work at all. I will probably send another PR to fix the issue of lost tuning parameters.
@agronholm Shouldn't it be POSTGRESQL_REPLICATION_MODE?
No: https://github.com/helm/charts/blob/master/stable/postgresql/templates/statefulset.yaml#L150
But the Bitnami base Docker image specifies POSTGRESQL as the prefix. I had to use that to make it work in an image I'm building for myself.
I could not get it to work with the [...]
Currently, replication slaves fail to start up using the official TimescaleDB images for two reasons:

1. the init scripts attempt to modify the database, which is read-only on the slave
2. the slave's max_worker_processes setting is lower than the tuned master's setting, so it bails out

This PR disables autotuning when replication is used, and makes the first init script exit at start when it's being run on the slave so as not to attempt any modifications on the (read-only) database.
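For illustration only, here is a minimal sketch of the guard logic described above, assuming the Bitnami-style POSTGRESQL_REPLICATION_MODE environment variable used in the compose files earlier in this thread. The actual init scripts in the image are shell scripts, and the run_tuning helper below is hypothetical; this is just the intended control flow, not the PR's code:

```python
import os
import sys

# Hypothetical stand-in for invoking timescaledb-tune; not part of the actual PR.
def run_tuning() -> None:
    pass

def main() -> None:
    mode = os.environ.get("POSTGRESQL_REPLICATION_MODE", "").lower()

    # On a replication slave the database is read-only, so exit before
    # attempting any modifications at all.
    if mode == "slave":
        print("Replication slave detected; skipping init modifications.")
        sys.exit(0)

    # Whenever replication is enabled, skip autotuning so that the master
    # and the slave end up with compatible (default) settings.
    if mode:
        print("Replication enabled; skipping timescaledb-tune.")
    else:
        run_tuning()

    # ... the rest of the init work would follow here

if __name__ == "__main__":
    main()
```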