Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Log is full of "failed to start a background worker" #7602

Open
cheggerdev opened this issue Jan 19, 2025 · 5 comments
Open

[Bug]: Log is full of "failed to start a background worker" #7602

cheggerdev opened this issue Jan 19, 2025 · 5 comments
Labels
bgw The background worker subsystem, including the scheduler bug

Comments

@cheggerdev
Copy link

cheggerdev commented Jan 19, 2025

What type of bug is this?

Configuration

What subsystems and features are affected?

Background worker

What happened?

The timescaledb log is full of

zabbix-timescaledb-1 | 2025-01-19 09:55:28.045 UTC [37] WARNING: failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1 | 2025-01-19 09:55:29.723 UTC [36] WARNING: failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker

Increasing the workers in config files has no effect in the sense of the launch failures do not disappear.

max_worker_processes = 64 (increased from 32)
timescaledb.max_background_workers = 48 (increased from 8 to 16, then to 32, then to 48)
max_parallel_workers = 4 (number of CPUs)

show timescaledb.telemetry_level; => basic

TimescaleDB version affected

docker-compose image tag latest-pg16

PostgreSQL version used

16

What operating system did you use?

Alpine Linux

What installation method did you use?

Docker

What platform did you run on?

Other

Relevant log output and stack trace

zabbix-timescaledb-1  | 2025-01-19 09:55:28.045 UTC [37] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-19 09:55:29.723 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker

How can we reproduce the bug?

Not sure.
I am using timescaledb with zabbix 7 or zabbix 7 with timescaledb :-) .
I also have pgbackrest configured which means archive_mode is turned on.

I have StartPollers=200 in zabbix_server.conf
I have set max_connections=300 in postgresql.conf

Increasing the number of timescaledb.max_background_workers and of max_worker_processes show no effect to the log output in the sense of the launch failures do not disappear.
@cheggerdev cheggerdev added the bug label Jan 19, 2025
@mkindahl
Copy link
Contributor

@cheggerdev Thank you for the bug report. The error is generated when the scheduler cannot spawn a new job and RegisterDynamicBackgroundWorker returns false. This can happen for different reasons, but the only reason that does not print any other error message is when you've run out of background worker slots (which is controlled by max_worker_processes). The message alone is unfortunately not enough information to figure out what is wrong.

So, can you provide some more information? In particular:

  • Do you see any other events in the log that could be related to this? In particular, what did the same PID print before these messages?
  • Have you restarted the server after changing the parameters?

@mkindahl mkindahl added the bgw The background worker subsystem, including the scheduler label Jan 20, 2025
@cheggerdev
Copy link
Author

So, can you provide some more information? In particular:

* Do you see any other events in the log that could be related to this? In particular, what did the same PID print before these messages?

Log from a fresh (re-)start:

zabbix-timescaledb-1  | 
zabbix-timescaledb-1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
zabbix-timescaledb-1  | 
zabbix-timescaledb-1  | 2025-01-20 11:37:12.517 UTC [1] LOG:  starting PostgreSQL 16.6 on x86_64-pc-linux-musl, compiled by gcc (Alpine 13.2.1_git20240309) 13.2.1 20240309, 64-bit
zabbix-timescaledb-1  | 2025-01-20 11:37:12.517 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
zabbix-timescaledb-1  | 2025-01-20 11:37:12.517 UTC [1] LOG:  listening on IPv6 address "::", port 5432
zabbix-timescaledb-1  | 2025-01-20 11:37:12.604 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
zabbix-timescaledb-1  | 2025-01-20 11:37:12.772 UTC [30] LOG:  database system was shut down at 2025-01-20 10:14:07 UTC
zabbix-timescaledb-1  | 2025-01-20 11:37:13.005 UTC [1] LOG:  database system is ready to accept connections
zabbix-timescaledb-1  | 2025-01-20 11:37:13.014 UTC [34] LOG:  TimescaleDB background worker launcher connected to shared catalogs
zabbix-timescaledb-1  | 2025-01-20 11:37:17.502 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:17.737 UTC [37] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:18.070 UTC [38] LOG:  skipping analyze of "bgw_job_stat" --- lock not available
zabbix-timescaledb-1  | 2025-01-20 11:37:18.415 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:18.510 UTC [37] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:19.941 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:20.059 UTC [37] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:21.464 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:21.771 UTC [37] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:22.881 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:23.068 UTC [39] LOG:  skipping analyze of "bgw_job_stat" --- lock not available
zabbix-timescaledb-1  | 2025-01-20 11:37:23.461 UTC [37] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:24.400 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:24.976 UTC [37] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:26.080 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:26.612 UTC [37] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-20 11:37:27.520 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
* Have you restarted the server after changing the parameters?

Yes, I do.

@cheggerdev
Copy link
Author

Hi,

I got more log output with

ALTER SYSTEM SET timescaledb.bgw_log_level TO 'DEBUG5';
SELECT pg_reload_conf();

and the result:

zabbix-timescaledb-1  | 
zabbix-timescaledb-1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
zabbix-timescaledb-1  | 
zabbix-timescaledb-1  | 2025-01-21 10:02:49.396 UTC [1] LOG:  starting PostgreSQL 16.6 on x86_64-pc-linux-musl, compiled by gcc (Alpine 13.2.1_git20240309) 13.2.1 20240309, 64-bit
zabbix-timescaledb-1  | 2025-01-21 10:02:49.396 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
zabbix-timescaledb-1  | 2025-01-21 10:02:49.396 UTC [1] LOG:  listening on IPv6 address "::", port 5432
zabbix-timescaledb-1  | 2025-01-21 10:02:49.911 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
zabbix-timescaledb-1  | 2025-01-21 10:02:49.998 UTC [29] LOG:  database system was shut down at 2025-01-21 10:01:45 UTC
zabbix-timescaledb-1  | 2025-01-21 10:02:50.070 UTC [1] LOG:  database system is ready to accept connections
zabbix-timescaledb-1  | 2025-01-21 10:02:50.074 UTC [33] LOG:  TimescaleDB background worker launcher connected to shared catalogs
zabbix-timescaledb-1  | 2025-01-21 10:02:50.213 UTC [36] DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.213 UTC [35] DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  updating scheduled jobs list
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  StartSubTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  StartSubTransaction(2) name: next start on failure; blockState: SUBBEGIN; state: INPROGRESS, xid/subid/cid: 0/2/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  CommitSubTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  CommitSubTransaction(2) name: next start on failure; blockState: SUBINPROGRESS; state: INPROGRESS, xid/subid/cid: 0/2/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  database scheduler for database 5 starting
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  scheduler wakeup in database 5
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  starting scheduled job 3
zabbix-timescaledb-1  | 2025-01-21 10:02:50.217 UTC [36] DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.218 UTC [35] DEBUG:  updating scheduled jobs list
zabbix-timescaledb-1  | 2025-01-21 10:02:50.218 UTC [36] DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 16799308/1/1
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  StartSubTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  StartSubTransaction(2) name: next start on failure; blockState: SUBBEGIN; state: INPROGRESS, xid/subid/cid: 0/2/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  CommitSubTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  CommitSubTransaction(2) name: next start on failure; blockState: SUBINPROGRESS; state: INPROGRESS, xid/subid/cid: 0/2/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  StartSubTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  StartSubTransaction(2) name: next start on failure; blockState: SUBBEGIN; state: INPROGRESS, xid/subid/cid: 0/3/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  CommitSubTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  CommitSubTransaction(2) name: next start on failure; blockState: SUBINPROGRESS; state: INPROGRESS, xid/subid/cid: 0/3/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  database scheduler for database 16384 starting
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  scheduler wakeup in database 16384
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  starting scheduled job 3 in 3853 seconds
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  starting scheduled job 1 in 4049 seconds
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  scheduler wakeup in database 16384
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  starting scheduled job 3 in 3853 seconds
zabbix-timescaledb-1  | 2025-01-21 10:02:50.219 UTC [35] DEBUG:  starting scheduled job 1 in 4049 seconds
zabbix-timescaledb-1  | 2025-01-21 10:02:50.352 UTC [36] DEBUG:  launching job 3 "Job History Log Retention Policy [3]"
zabbix-timescaledb-1  | 2025-01-21 10:02:50.353 UTC [36] NOTICE:  unable to register background worker
zabbix-timescaledb-1  | 2025-01-21 10:02:50.353 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-21 10:02:50.353 UTC [36] DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:50.355 UTC [36] DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 16799309/1/3
zabbix-timescaledb-1  | 2025-01-21 10:02:50.775 UTC [36] DEBUG:  starting scheduled job 1 in 4049 seconds
zabbix-timescaledb-1  | 2025-01-21 10:02:51.775 UTC [36] DEBUG:  scheduler wakeup in database 5
zabbix-timescaledb-1  | 2025-01-21 10:02:51.775 UTC [36] DEBUG:  starting scheduled job 3
zabbix-timescaledb-1  | 2025-01-21 10:02:51.775 UTC [36] DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:51.775 UTC [36] DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 16799310/1/1
zabbix-timescaledb-1  | 2025-01-21 10:02:51.972 UTC [36] DEBUG:  launching job 3 "Job History Log Retention Policy [3]"
zabbix-timescaledb-1  | 2025-01-21 10:02:51.972 UTC [36] NOTICE:  unable to register background worker
zabbix-timescaledb-1  | 2025-01-21 10:02:51.972 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker
zabbix-timescaledb-1  | 2025-01-21 10:02:51.972 UTC [36] DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:51.973 UTC [36] DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 16799311/1/3
zabbix-timescaledb-1  | 2025-01-21 10:02:52.292 UTC [36] DEBUG:  starting scheduled job 1 in 4047 seconds
zabbix-timescaledb-1  | 2025-01-21 10:02:53.292 UTC [36] DEBUG:  scheduler wakeup in database 5
zabbix-timescaledb-1  | 2025-01-21 10:02:53.292 UTC [36] DEBUG:  starting scheduled job 3
zabbix-timescaledb-1  | 2025-01-21 10:02:53.292 UTC [36] DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
zabbix-timescaledb-1  | 2025-01-21 10:02:53.293 UTC [36] DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 16799312/1/1
zabbix-timescaledb-1  | 2025-01-21 10:02:53.377 UTC [36] DEBUG:  launching job 3 "Job History Log Retention Policy [3]"
zabbix-timescaledb-1  | 2025-01-21 10:02:53.377 UTC [36] NOTICE:  unable to register background worker
zabbix-timescaledb-1  | 2025-01-21 10:02:53.377 UTC [36] WARNING:  failed to launch job 3 "Job History Log Retention Policy [3]": failed to start a background worker

@mkindahl
Copy link
Contributor

This line is generated when the RegisterDynamicBackgroundWorker call fails, and since you have no additional message preceding it, the only reason is because the slots are exhausted.

zabbix-timescaledb-1  | 2025-01-21 10:02:50.353 UTC [36] NOTICE:  unable to register background worker

I am not sure there is a good way to check the number of slots or slot assignment through the SQL interface, but could you check pg_stat_activity and see how many processes you have there?

@cheggerdev
Copy link
Author

select count(*) from pg_stat_activity;

=> 42 most times
=> 43 sometimes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bgw The background worker subsystem, including the scheduler bug
Projects
None yet
Development

No branches or pull requests

2 participants