Current server name '' doesn't match configured server name. How to set it? #52

Closed
tamasgal opened this issue Sep 12, 2024 · 42 comments

@tamasgal

tamasgal commented Sep 12, 2024

I managed to get the latest Indico version up and running and also fixed issues such as the missing BASE_URL.

The web service is running but does not accept any requests from the load balancer. Flask warns that the current server name '' doesn't match the configured server name 'indico.whatever.com':

indico.whatever.com-web.1.q628m10ce5um@ecap-s026    | /opt/indico/.venv/lib/python3.9/site-packages/flask/app.py:1883: UserWarning: Current server name '' doesn't match configured server name 'indico.whatever.com'
indico.whatever.com-web.1.q628m10ce5um@ecap-s026    |   return self.url_map.bind_to_environ(
indico.whatever.com-web.1.q628m10ce5um@ecap-s026    | 2024-09-12 14:45:57,664  INFO     8881289aac694c73  indico.flask              Received request with invalid url root for http:///
indico.whatever.com-web.1.q628m10ce5um@ecap-s026    | /opt/indico/.venv/lib/python3.9/site-packages/flask/app.py:1883: UserWarning: Current server name 'indico-web:59999' doesn't match configured server name 'indico.whatever.com'

There are also log lines like

... indico.flask              Received request with invalid url root for http:/// ...

How do I set that? I checked #28 but could not find any hint.
I also found https://talk.getindico.io/t/how-to-containerzie-my-application-using-docker/2989/2 which reports the same error message and points to #28 but I don't understand what's missing. The BASE_URL is set correctly.
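
For illustration, the comparison behind this warning can be approximated as below. This is a minimal sketch, not Flask's or Indico's actual code: `expected_host` and `host_matches` are hypothetical helpers, and it assumes the configured server name is the host part of BASE_URL.

```python
from urllib.parse import urlsplit


def expected_host(base_url: str) -> str:
    """Return the host[:port] part of a base URL (hypothetical helper)."""
    return urlsplit(base_url).netloc


def host_matches(request_host: str, base_url: str) -> bool:
    """Compare the host a request arrived with against the configured one."""
    return request_host.lower() == expected_host(base_url).lower()


base_url = "https://indico.whatever.com"
# The two server names seen in the warnings above, plus the expected one
for seen in ("", "indico-web:59999", "indico.whatever.com"):
    print(repr(seen), host_matches(seen, base_url))
```

Under that assumption, only a request that reaches Indico with the public hostname passes the check; the empty name and the internal service name do not.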

@tomasr8
Member

tomasr8 commented Sep 12, 2024

Hmm, it seems a few things are not right. First, the compose file should probably look like this:

  indico-web: &indico-web
-    image: getindico/indico:latest
+    build: worker

This ensures that you install the latest version of Indico rather than the image on Docker Hub, which is a bit outdated now.
I am getting another permission error though:

indico-web-1          | Traceback (most recent call last):
indico-web-1          |   File "/usr/local/lib/python3.12/logging/config.py", line 608, in configure
indico-web-1          |     handler = self.configure_handler(handlers[name])
indico-web-1          |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
indico-web-1          |   File "/usr/local/lib/python3.12/logging/config.py", line 876, in configure_handler
indico-web-1          |     result = factory(**kwargs)
indico-web-1          |              ^^^^^^^^^^^^^^^^^
indico-web-1          |   File "/usr/local/lib/python3.12/logging/__init__.py", line 1231, in __init__
indico-web-1          |     StreamHandler.__init__(self, self._open())
indico-web-1          |                                  ^^^^^^^^^^^^
indico-web-1          |   File "/usr/local/lib/python3.12/logging/__init__.py", line 1263, in _open
indico-web-1          |     return open_func(self.baseFilename, self.mode,
indico-web-1          |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
indico-web-1          | PermissionError: [Errno 13] Permission denied: '/opt/indico/log/celery.log'

I'll try to dig into that tomorrow

@tamasgal
Author

Ah, so I need to use my own build. I see. I am using Docker Swarm so I also need to propagate that image to all the nodes.

@tamasgal
Author

tamasgal commented Sep 12, 2024

Just for the record, here is my Docker Swarm stack file:

version: "3.4"
services:
  indico-web: &indico-web
    image: getindico/indico:latest
    command: /opt/indico/run_indico.sh
    environment:
      - SERVICE_HOSTNAME=indico.whatever.com
      - SERVICE_PORT=80
      - SERVICE_PROTOCOL=http
      - PGHOST=indico-postgres
      - PGUSER=...
      - PGPASSWORD=...
      - PGDATABASE=...
      - PGPORT=5432
      - INDICO_DEFAULT_TIMEZONE=Europe/Berlin
      - INDICO_DEFAULT_LOCALE=en_GB
      - USE_EXTERNAL_DB=y
      - C_FORCE_ROOT=true
      - INDICO_AUTH_PROVIDERS={}
      - INDICO_IDENTITY_PROVIDERS={}
      - INDICO_LOCAL_IDENTITIES=yes
    volumes:
      - 'indicocustom-vol2:/opt/indico/custom'
      - 'indicostatic-vol2:/opt/indico/static'
      - 'indicolog-vol2:/opt/indico/log'
    networks:
      - backend
    configs:
      - source: ecap-indico-conf
        target: /opt/indico/etc/indico.conf
    tmpfs:
      - /opt/indico/tmp

  indico-celery:
    <<: *indico-web
    command: /opt/indico/run_celery.sh
    ports: []
    volumes: []
    networks:
      - backend

  indico-redis:
    image: redis
    networks:
      - backend

  indico-postgres:
    image: postgres:15-bookworm
    environment:
      - POSTGRES_USER=...
      - POSTGRES_PASSWORD=...
      - POSTGRES_DATABASE=...
    volumes:
      - '/raid/ecap-indico/db:/var/lib/postgresql/data'
    networks:
      - backend
    deploy:
      replicas: 1
      placement:
        constraints: [node.labels.db == true]

  indico-nginx:
    image: nginx:latest
    networks:
      - backend
      - ecap-indico
    environment:
      - SERVICE_HOSTNAME=indico.whatever.com
      - SERVICE_PROTOCOL=http
    configs:
      - source: ecap-indico-nginx-conf
        target: /etc/nginx/conf.d/default.conf
    volumes:
      - 'indicocustom-vol2:/opt/indico/custom'
      - 'indicostatic-vol2:/opt/indico/static'

volumes:
  indicocustom-vol2:
   driver: local
   driver_opts:
     type: "nfs4"
     o: addr=...
  indicoarchive-vol2:
   driver: local
   driver_opts:
     type: "nfs4"
     o: addr=...
  indicolog-vol2:
   driver: local
   driver_opts:
     type: "nfs4"
     o: addr=...
  indicostatic-vol2:
   driver: local
   driver_opts:
     type: "nfs4"
     o: addr=...

configs:
  ecap-indico-conf:
    external: true
  ecap-indico-nginx-conf:
    external: true
  ecap-indico-db-conf:
    external: true

networks:
  ecap-indico:
    external: true
  backend:

and the indico.conf which is stored as Docker Config under ecap-indico-conf:

# Database
import os
SQLALCHEMY_DATABASE_URI = f'postgresql://{os.environ["PGUSER"]}:{os.environ["PGPASSWORD"]}@{os.environ["PGHOST"]}:5432/{os.environ["PGDATABASE"]}'
del os

SECRET_KEY = '\xfoo\xbar\xbaz'
BASE_URL = 'https://indico.whatever.com'
USE_PROXY = True
REDIS_CACHE_URL='redis://indico-redis:6379/1'
CELERY_BROKER='redis://indico-redis:6379/0'
DEFAULT_TIMEZONE = 'Europe/Berlin'
DEFAULT_LOCALE = 'en_GB'
ENABLE_ROOMBOOKING = True

CACHE_DIR = '/opt/indico/cache'
TEMP_DIR = '/opt/indico/tmp'
LOG_DIR = '/opt/indico/log'
#ASSETS_DIR = '/opt/indico/assets'
#XELATEX_PATH = '/opt/texlive/bin/x86_64-linux/xelatex'

STORAGE_BACKENDS = {'default': 'fs:/opt/indico/archive'}
ATTACHMENT_STORAGE = 'default'
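
Since indico.conf is plain Python, the database URI above could also be assembled by a small helper that escapes special characters in the credentials and honours PGPORT. The following is only a sketch under those assumptions: `build_db_uri` is a hypothetical name, and the example values are made up.

```python
from urllib.parse import quote_plus


def build_db_uri(env) -> str:
    """Assemble a PostgreSQL URI from PG* variables (hypothetical helper)."""
    user = quote_plus(env["PGUSER"])
    password = quote_plus(env["PGPASSWORD"])
    host = env["PGHOST"]
    port = env.get("PGPORT", "5432")  # fall back to the default port
    database = env["PGDATABASE"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Example with made-up values; in indico.conf one would pass os.environ
example_env = {
    "PGUSER": "indico",
    "PGPASSWORD": "p@ss/word",
    "PGHOST": "indico-postgres",
    "PGDATABASE": "indico",
}
print(build_db_uri(example_env))
```

Escaping matters when the password contains characters such as @ or /, which would otherwise be read as URI delimiters.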

@tamasgal
Author

tamasgal commented Sep 12, 2024

OK, I built and distributed a worker from your current Dockerfile (that went well and installed Indico 3.3.4!) and also adapted the environment variables and configuration file to the examples in your repository. It still complains with UserWarning: Current server name 'indico-web:59999' doesn't match configured server name 'indico.whatever.com', and in addition I now get a Fontconfig error: No writable cache directories.

I am not sure if this is related to the CACHE_DIR = '/opt/indico/cache' setting in indico.conf, since that folder is definitely writable for the indico user:

indico@e5f8285f86c5:~$ ls -al /opt/indico/cache/
total 12
drwxr-x--- 2 indico indico 4096 Sep 12 19:47 .
drwxr-xr-x 1 indico indico 4096 Sep 12 19:58 ..
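
Fontconfig keeps its own cache, so the message may well be independent of Indico's CACHE_DIR. A quick way to check is to test the directories Fontconfig would try. The sketch below assumes the common defaults ($XDG_CACHE_HOME/fontconfig or ~/.cache/fontconfig, and /var/cache/fontconfig); the real list comes from the fonts.conf in the image, so these paths and both helper names are assumptions.

```python
import os
from pathlib import Path


def candidate_cache_dirs(env) -> list[Path]:
    """Assumed default Fontconfig cache locations; check fonts.conf to confirm."""
    home = Path(env.get("HOME", "/"))
    xdg = Path(env.get("XDG_CACHE_HOME") or home / ".cache")
    return [xdg / "fontconfig", Path("/var/cache/fontconfig")]


def is_writable(path: Path) -> bool:
    """True if the directory, or its nearest existing ancestor, is writable."""
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return os.access(probe, os.W_OK | os.X_OK)


for directory in candidate_cache_dirs(os.environ):
    print(directory, "writable" if is_writable(directory) else "NOT writable")
```

Run as the indico user inside the container, this shows whether any of the assumed locations can be written to.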

At this moment, I don't see any other errors from the worker. Here is the full log of the Docker container:

# docker service logs -f ecap-indico_indico-web
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |  count
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | -------
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |      0
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | (1 row)
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Starting Indico...
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | [uWSGI] getting INI configuration from /etc/uwsgi.ini
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | *** Starting uWSGI 2.0.26 (64bit) on [Thu Sep 12 19:58:43 2024] ***
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | compiled with version: 12.2.0 on 12 September 2024 19:48:02
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | os: Linux-5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | nodename: e5f8285f86c5
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | machine: x86_64
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | clock source: unix
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | pcre jit disabled
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | detected number of CPU cores: 128
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | current working directory: /opt/indico
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | detected binary path: /opt/indico/.venv/bin/uwsgi
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | your memory page size is 4096 bytes
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |  *** WARNING: you have enabled harakiri without post buffering. Slow upload could be rejected on post-unbuffered webservers ***
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | detected max file descriptor number: 1048576
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | lock engine: pthread robust mutexes
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | thunder lock: disabled (you can enable it with --thunder-lock)
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | uwsgi socket 0 bound to TCP address 0.0.0.0:59999 fd 3
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Python version: 3.12.6 (main, Sep  9 2024, 18:06:01) [GCC 12.2.0]
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | PEP 405 virtualenv detected: /opt/indico/.venv
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Set PythonHome to /opt/indico/.venv
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Python main interpreter initialized at 0x7f0fe4cc2d18
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | python threads support enabled
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | your server socket listen backlog is limited to 100 connections
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | your mercy for graceful operations on workers is 60 seconds
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | mapped 364600 bytes (356 KB) for 4 cores
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | *** Operational MODE: preforking ***
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | /opt/indico/.venv/lib/python3.12/site-packages/indico/web/flask/app.py:415: UserWarning: Logging config file not found; using defaults. Copy /opt/indico/.venv/lib/python3.12/site-packages/indico/logging.yaml.sample to /opt/indico/etc/logging.yaml to get rid of this warning.
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |   Logger.init(app)
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | /opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/_compat.py:201: Warning: IMPORTANT: We detected the use of uWSGI in preforking mode without thread support. This might lead to crashing workers. Please run uWSGI with both "--enable-threads" and "--py-call-uwsgi-fork-hooks" for full support.
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |   warn(
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Fontconfig error: No writable cache directories
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | WSGI app 0 (mountpoint='') ready in 6 seconds on interpreter 0x7f0fe4cc2d18 pid: 11 (default app)
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | spawned uWSGI master process (pid: 11)
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | spawned uWSGI worker 1 (pid: 37, cores: 1)
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Thu Sep 12 19:58:49 2024 - mem-collector thread started for worker 1
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | spawned 4 offload threads for uWSGI worker 1
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | spawned uWSGI worker 2 (pid: 43, cores: 1)
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Thu Sep 12 19:58:49 2024 - mem-collector thread started for worker 2
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | spawned 4 offload threads for uWSGI worker 2
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | spawned uWSGI worker 3 (pid: 49, cores: 1)
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Thu Sep 12 19:58:49 2024 - mem-collector thread started for worker 3
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | spawned 4 offload threads for uWSGI worker 3
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | spawned uWSGI worker 4 (pid: 55, cores: 1)
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | Thu Sep 12 19:58:49 2024 - mem-collector thread started for worker 4
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | /opt/indico/.venv/lib/python3.12/site-packages/flask/app.py:425: UserWarning: Current server name 'indico-web:59999' doesn't match configured server name 'indico.whatever.foo'
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |   return self.url_map.bind_to_environ(
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | spawned 4 offload threads for uWSGI worker 4
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | /opt/indico/.venv/lib/python3.12/site-packages/flask/app.py:425: UserWarning: Current server name 'indico-web:59999' doesn't match configured server name 'indico.whatever.foo'
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |   return self.url_map.bind_to_environ(
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | /opt/indico/.venv/lib/python3.12/site-packages/flask/app.py:425: UserWarning: Current server name 'indico-web:59999' doesn't match configured server name 'indico.whatever.foo'
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |   return self.url_map.bind_to_environ(
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    | /opt/indico/.venv/lib/python3.12/site-packages/flask/app.py:425: UserWarning: Current server name 'indico-web:59999' doesn't match configured server name 'indico.whatever.foo'
ecap-indico_indico-web.1.la3sceo80keb@ecap-s024    |   return self.url_map.bind_to_environ(

While I am still a bit confused about the nginx Reverse Proxy configuration, when I launch lynx http://indico-web:59999 on the nginx container, I first get a 404 page, then a few seconds later another HTML page showing

This Indico instance can only be accessed via https://indico.whatever.foo

@tomasr8
Member

tomasr8 commented Sep 13, 2024

I pushed some fixes to the PR, namely I swapped the image directive with build and added some missing config keys.

After setting a non-empty SECRET_KEY in indico-prod/indico.conf, I am able to get a working Indico instance accessible via nginx at localhost:8080 (after cleaning up old containers and volumes):

cd indico-prod
docker compose up

Could you check if it works for you as well now?

@tamasgal
Author

OK I will try, thanks so far!

I can see that the only relevant change to my setup is in a0fdb9f

Just to make sure we are on the same page: I am not using docker compose itself but a very similar config based on it, adapted to work with Docker Swarm. The only differences are the volumes (which are NFS shares) and the missing build directive, so I am building the worker image from the Dockerfile in your PR.

@tomasr8
Member

tomasr8 commented Sep 13, 2024

Could you share your nginx.conf? Seems like it's not setting X-Forwarded-Host like we do in the example config.
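
To make the role of X-Forwarded-Host concrete, here is a simplified WSGI sketch of what a proxy-fix middleware does with that header. It is a stand-in written for illustration, not Werkzeug's ProxyFix and not Indico's code; `ForwardedHostMiddleware` and `show_host` are hypothetical names.

```python
class ForwardedHostMiddleware:
    """Simplified stand-in for a proxy-fix middleware (not Werkzeug's ProxyFix)."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        forwarded = environ.get("HTTP_X_FORWARDED_HOST")
        if forwarded:
            # Keep the original value and use the entry added by the nearest proxy
            environ["original.HTTP_HOST"] = environ.get("HTTP_HOST")
            environ["HTTP_HOST"] = forwarded.split(",")[-1].strip()
        return self.app(environ, start_response)


def show_host(environ, start_response):
    """Tiny WSGI app that echoes the host it believes it was called with."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ.get("HTTP_HOST", "").encode()]


app = ForwardedHostMiddleware(show_host)
environ = {
    "HTTP_HOST": "indico-web:59999",
    "HTTP_X_FORWARDED_HOST": "indico.whatever.com",
}
print(app(environ, lambda status, headers: None))
```

Without the header, the application only sees the internal name indico-web:59999, which is the name reported in the warning.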

@tamasgal
Author

I am using the same as in the PR:

server {
  # localhost:8080 is the main entrypoint of this docker-compose setup
  listen 8080;
  listen [::]:8080;

  access_log /dev/stdout combined;
  error_log stderr info;

  root       /var/empty;

  sendfile on;

  # Serve static files
  location ~ ^/(images|fonts)(.*)/(.+?)(__v[0-9a-f]+)?\.([^.]+)$ {
    alias /opt/indico/static/$1$2/$3.$5;
  }

  location ~ ^/(css|dist|images|fonts)/(.*)$ {
    alias /opt/indico/static/$1/$2;
  }

  location / {
    # Pass request to the container running Indico
    proxy_pass http://indico-web:59999;
    # Set headers for Indico to receive the correct base URL
    proxy_set_header X-Forwarded-Host $http_host;
  }
}

with the following service:

  indico-nginx:
    #image: nginx:1.27
    #image: nginx:latest
    image: ghcr.io/nginxinc/nginx-unprivileged:stable-alpine
    networks:
      - backend
      - ecap-indico
    environment:
      - SERVICE_HOSTNAME=indico.whatever.com
      - SERVICE_PROTOCOL=http
    configs:
      - source: ecap-indico-nginx-conf
        target: /etc/nginx/conf.d/default.conf
    volumes:
      - 'indicocustom-vol2:/opt/indico/custom'
      - 'indicostatic-vol2:/opt/indico/static'

I am currently trying to get more logs out of this...

@tamasgal
Author

tamasgal commented Sep 13, 2024

So the only thing I can get out of Indico is this (via telnet from my load balancer):

    <h1>Invalid URL</h1>
    <p>This Indico instance can only be accessed via <a href="https://indico.whatever.com">https://indico.whatever.com</a></p>

In this case I connect directly via telnet, which means that the URL is not set anyway.
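
A bare telnet session only sends a Host header if it is typed by hand, so a scripted probe that always includes it can be more telling. The following is a sketch using only the standard library; `build_probe` and `send_probe` are hypothetical helpers, and the address in the last comment is taken from the HAProxy backend in this thread.

```python
import socket


def build_probe(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request carrying an explicit Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_probe(address: tuple[str, int], host: str, timeout: float = 5.0) -> bytes:
    """Send the probe to address and return the raw response bytes."""
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(build_probe(host))
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


print(build_probe("indico.whatever.com").decode())
# Not run here; the address is the nginx service from the HAProxy backend:
# send_probe(("ecap-indico_indico-nginx", 8080), "indico.whatever.com")
```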

I am using HAProxy, and the configuration of the backend (with global settings for the X-Forward options) is similar to that of all the other services I have (which work fine, including X-Forward with the correct IP):

backend be_indico.whatever.com
    balance roundrobin
    dynamic-cookie-key ECAPINDICO
    cookie SRVID insert dynamic
    option httpchk HEAD /
    default-server check maxconn 20
    server-template ecap-indico- 1 ecap-indico_indico-nginx:8080 check resolvers docker init-addr libc,none

@tamasgal
Author

tamasgal commented Sep 13, 2024

Alright, I have some news. I sorted out the issues with HAProxy (I had actually forgotten to add the dnsrr deploy option in the stack file, so HAProxy was struggling to find the right server).

Anyway, now I get a 502 Bad Gateway from nginx:

ecap-indico_indico-nginx.1.dv0at48j8joh@ecap-s027    | 2024/09/13 11:33:06 [error] 94#94: *147 upstream prematurely closed connection while reading response header from upstream, client: 10.0.17.185, server: , request: "GET /bootstrap HTTP/1.1", upstream: "http://10.1.59.2:59999/bootstrap", host: "indico.whatever.com"
ecap-indico_indico-nginx.1.dv0at48j8joh@ecap-s027    | 10.0.17.185 - admin [13/Sep/2024:11:33:06 +0000] "GET /bootstrap HTTP/1.1" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3 Safari/605.1.15"

And this is what Indico reports: essentially a missing manifest.json, plus a timeout from SMTP. The dist folder where manifest.json is expected does not exist; the static folder only contains this:

indico@2c73b72ab7cb:~$ find /opt/indico/.venv/lib/python3.12/site-packages/indico/web/static/
/opt/indico/.venv/lib/python3.12/site-packages/indico/web/static/
/opt/indico/.venv/lib/python3.12/site-packages/indico/web/static/.no-headers

here are the logs:

--- Logging error ---
Traceback (most recent call last):
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/flask/util.py", line 80, in wrapper
    return obj().process()
           ^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/rh.py", line 307, in process
    res = self._do_process()
          ^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/rh.py", line 275, in _do_process
    rv = self._process()
         ^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/modules/bootstrap/controllers.py", line 40, in _process
    return render_template('bootstrap/bootstrap.html',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/templating.py", line 150, in render_template
    return _render(app, template, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/templating.py", line 131, in _render
    rv = template.render(context)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 1304, in render
    self.environment.handle_exception()
  File "/opt/indico/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 939, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/modules/bootstrap/templates/bootstrap.html", line 3, in top-level template code
    {% from 'forms/_form.html' import form_header, form_row, form_rows, form_fieldset, form_footer %}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/templates/base.html", line 15, in top-level template code
    {{ webpack['exports.js'] }}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/flask/templating.py", line 316, in getitem
    rv = super().getitem(obj, argument)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 468, in getitem
    return obj[argument]
           ~~~^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/werkzeug/local.py", line 318, in __get__
    obj = instance._get_current_object()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/werkzeug/local.py", line 526, in _get_current_object
    return get_name(local())
                    ^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask_webpackext/proxies.py", line 22, in <lambda>
    lambda: current_app.extensions['flask-webpackext'].manifest)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask_webpackext/ext.py", line 87, in manifest
    return self.manifest_loader().load(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/core/webpack.py", line 25, in load
    key = (filepath, os.path.getmtime(filepath))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 67, in getmtime
FileNotFoundError: [Errno 2] No such file or directory: '/opt/indico/.venv/lib/python3.12/site-packages/indico/web/static/dist/manifest.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/smtplib.py", line 398, in getreply
    line = self.file.readline(_MAXLINE + 1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/socket.py", line 720, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/logging/handlers.py", line 1078, in emit
    smtp = smtplib.SMTP(self.mailhost, port, timeout=self.timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/smtplib.py", line 255, in __init__
    (code, msg) = self.connect(host, port)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/smtplib.py", line 343, in connect
    (code, msg) = self.getreply()
                  ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/smtplib.py", line 401, in getreply
    raise SMTPServerDisconnected("Connection unexpectedly closed: "
smtplib.SMTPServerDisconnected: Connection unexpectedly closed: timed out
Call stack:
  File "/opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/integrations/flask.py", line 85, in sentry_patched_wsgi_app
    return SentryWsgiMiddleware(lambda *a, **kw: old_app(self, *a, **kw))(
  File "/opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/integrations/wsgi.py", line 108, in __call__
    rv = self.app(
  File "/opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/integrations/flask.py", line 85, in <lambda>
    return SentryWsgiMiddleware(lambda *a, **kw: old_app(self, *a, **kw))(
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 1498, in __call__
    return self.wsgi_app(environ, start_response)
  File "/opt/indico/.venv/lib/python3.12/site-packages/werkzeug/middleware/proxy_fix.py", line 183, in __call__
    return self.app(environ, start_response)
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 772, in handle_user_exception
    return self.ensure_sync(handler)(e)  # type: ignore[no-any-return]
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/flask/errors.py", line 113, in handle_exception
    Logger.get('flask').exception(str(exc) or 'Uncaught Exception')
  File "/usr/local/lib/python3.12/logging/__init__.py", line 1574, in exception
    self.error(msg, *args, exc_info=exc_info, **kwargs)
  File "/usr/local/lib/python3.12/logging/__init__.py", line 1568, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/local/lib/python3.12/logging/__init__.py", line 1684, in _log
    self.handle(record)
  File "/usr/local/lib/python3.12/logging/__init__.py", line 1700, in handle
    self.callHandlers(record)
  File "/opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/integrations/logging.py", line 100, in sentry_patched_callhandlers
    return old_callhandlers(self, record)
Message: "[Errno 2] No such file or directory: '/opt/indico/.venv/lib/python3.12/site-packages/indico/web/static/dist/manifest.json'"
Arguments: ()
Traceback (most recent call last):
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/flask/util.py", line 80, in wrapper
    return obj().process()
           ^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/rh.py", line 307, in process
    res = self._do_process()
          ^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/rh.py", line 275, in _do_process
    rv = self._process()
         ^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/modules/bootstrap/controllers.py", line 40, in _process
    return render_template('bootstrap/bootstrap.html',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/templating.py", line 150, in render_template
    return _render(app, template, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/templating.py", line 131, in _render
    rv = template.render(context)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 1304, in render
    self.environment.handle_exception()
  File "/opt/indico/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 939, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/modules/bootstrap/templates/bootstrap.html", line 3, in top-level template code
    {% from 'forms/_form.html' import form_header, form_row, form_rows, form_fieldset, form_footer %}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/templates/base.html", line 15, in top-level template code
    {{ webpack['exports.js'] }}
^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/flask/templating.py", line 316, in getitem
    rv = super().getitem(obj, argument)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/jinja2/environment.py", line 468, in getitem
    return obj[argument]
           ~~~^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/werkzeug/local.py", line 318, in __get__
    obj = instance._get_current_object()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/werkzeug/local.py", line 526, in _get_current_object
    return get_name(local())
                    ^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask_webpackext/proxies.py", line 22, in <lambda>
    lambda: current_app.extensions['flask-webpackext'].manifest)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask_webpackext/ext.py", line 87, in manifest
    return self.manifest_loader().load(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/core/webpack.py", line 25, in load
    key = (filepath, os.path.getmtime(filepath))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 67, in getmtime
FileNotFoundError: [Errno 2] No such file or directory: '/opt/indico/.venv/lib/python3.12/site-packages/indico/web/static/dist/manifest.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/integrations/flask.py", line 85, in sentry_patched_wsgi_app
    return SentryWsgiMiddleware(lambda *a, **kw: old_app(self, *a, **kw))(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/integrations/wsgi.py", line 115, in __call__
    reraise(*_capture_exception(hub))
  File "/opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/_compat.py", line 127, in reraise
    raise value
  File "/opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/integrations/wsgi.py", line 108, in __call__
    rv = self.app(
         ^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/sentry_sdk/integrations/flask.py", line 85, in <lambda>
    return SentryWsgiMiddleware(lambda *a, **kw: old_app(self, *a, **kw))(
                                                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 1498, in __call__
    return self.wsgi_app(environ, start_response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/werkzeug/middleware/proxy_fix.py", line 183, in __call__
    return self.app(environ, start_response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 1476, in wsgi_app
    response = self.handle_exception(e)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask/app.py", line 772, in handle_user_exception
    return self.ensure_sync(handler)(e)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/flask/errors.py", line 122, in handle_exception
    return render_error(exc, _('Something went wrong'), message, 500)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/errors.py", line 36, in render_error
    return WPError(title, message).display(), code
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/views.py", line 271, in display
    custom_js = list(current_app.manifest['__custom.js'])
                     ^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/web/flask/wrappers.py", line 163, in manifest
    return current_webpack.manifest
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/flask_webpackext/ext.py", line 87, in manifest
    return self.manifest_loader().load(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/indico/.venv/lib/python3.12/site-packages/indico/core/webpack.py", line 25, in load
    key = (filepath, os.path.getmtime(filepath))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 67, in getmtime
FileNotFoundError: [Errno 2] No such file or directory: '/opt/indico/.venv/lib/python3.12/site-packages/indico/web/static/dist/manifest.json'

@tamasgal
Author

Btw., the Celery process gets killed every 2-3 minutes. Not sure if that helps ;) I am now trying to figure out why manifest.json is missing.

 Fontconfig error: No writable cache directories
 Fontconfig error: No writable cache directories
 Fontconfig error: No writable cache directories
 Fontconfig error: No writable cache directories
 Fontconfig error: No writable cache directories
 Fontconfig error: No writable cache directories

  -------------- celery@092f90843f8c v5.4.0 (opalescent)
 --- ***** -----
 -- ******* ---- Linux-5.4.0-125-generic-x86_64-with-glibc2.36 2024-09-13 11:42:10
 - *** --- * ---
 - ** ---------- [config]
 - ** ---------- .> app:         indico:0x7f8c530933e0
 - ** ---------- .> transport:   redis://indico-redis:6379/0
 - ** ---------- .> results:     redis://indico-redis:6379/0
 - *** --- * --- .> concurrency: 128 (prefork)
 -- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
 --- ***** -----
  -------------- [queues]
                 .> celery           exchange=celery(direct) key=celery


 /opt/indico/run_celery.sh: line 16:     8 Killed                  indico celery ${1:-worker}

@tomasr8
Member

tomasr8 commented Sep 13, 2024

> I am using HAProxy and the configuration of the backend (with global settings for X-Forward options) is similar to all other services I have (which work fine, including X-Forward with the correct IP):

Some nginx logs would probably be useful. What URL do you use to access Indico? It should be the same one that you configured as BASE_URL. I am not able to reproduce this with the compose file. Even when I change the BASE_URL to another host, I can access Indico on that host with no issues.

> and this is what Indico reports. Essentially a missing manifest.json and there is some timeout from SMTP:

That is pretty strange, the latest wheel which gets installed with the Dockerfile contains it. Maybe try rebuilding the image?

From what you posted, nothing stands out as clearly wrong, but your setup is different to what we have here. I'd suggest testing it locally first. If you can come up with a compose file I can run that reproduces the issue I'd be happy to take a look, but like this it's difficult to help :/

@tamasgal
Author

I sorted out all the nginx problems I guess, so currently it's really all about the missing manifest.json. I already rebuilt the image but the file is not there :/

@tamasgal
Author

tamasgal commented Sep 13, 2024

I managed to access the site and fill out the very first administration form, but I get an error and the CSS is missing. I am pretty sure it's related to the volumes I mount. I guess that mounting some of the external folders (which I want to keep persistent) hides existing folders which already have files in them. The static/ folder, for example, is an empty volume I mount to /opt/indico/static, and clearly it should be populated with at least the manifest.json. That file also holds the CSS file locations, as I figured out.

Anyway, this is what I get now after I fill in the first admin credentials:

ecap-indico_indico-web.1.za5ttwiu5y6i@ecap-s021    | Message: '(psycopg2.errors.UndefinedColumn) column "accepted_terms_dt" of relation "users" does not exist\nLINE 1: ...em, is_admin, is_blocked, is_pending, is_deleted, accepted_t...\n                                                             ^\n\n[SQL: INSERT INTO users.users (first_name, last_name, title, affiliation, affiliation_id, phone, address, merged_into_id, is_system, is_admin, is_blocked, is_pending, is_deleted, accepted_terms_dt, signing_secret, picture, picture_metadata, picture_source, created_dt) VALUES (%(first_name)s, %(last_name)s, %(title)s, %(affiliation)s, %(affiliation_id)s, %(phone)s, %(address)s, %(merged_into_id)s, %(is_system)s, %(is_admin)s, %(is_blocked)s, %(is_pending)s, %(is_deleted)s, %(accepted_terms_dt)s, %(signing_secret)s, %(picture)s, %(picture_metadata)s, %(picture_source)s, %(created_dt)s) RETURNING users.users.id]\n[parameters: {\'first_name\': \'Tamas\', \'last_name\': \'Gal\', \'title\': <UserTitle.none: 0>, \'affiliation\': \'ECAP\', \'affiliation_id\': None, \'phone\': \'\', \'address\': \'\', \'merged_into_id\': None, \'is_system\': False, \'is_admin\': True, \'is_blocked\': False, \'is_pending\': False, \'is_deleted\': False, \'accepted_terms_dt\': None, \'signing_secret\': UUID(\'692506cb-2c59-46ea-b368-f18e0c8ca3f0\'), \'picture\': None, \'picture_metadata\': \'null\', \'picture_source\': <ProfilePictureSource.standard: 0>, \'created_dt\': datetime.datetime(2024, 9, 13, 12, 6, 9, 500640)}]\n(Background on this error at: https://sqlalche.me/e/14/f405)'
ecap-indico_indico-web.1.za5ttwiu5y6i@ecap-s021    | Arguments: ()
Screenshot 2024-09-13 at 14 06 34

@tamasgal
Author

Sorry for the spam, but I cleared all the volumes and now I managed to create the administration account:

Indico is now ready and you are logged as Tamas Gal with administrative rights!
Don't forget to tweak [Indico's settings](https://indico.test.ecap.work/admin/) and update your [profile](https://indico.test.ecap.work/user/dashboard/).

The CSS is messed up however ;)

@tamasgal
Author

tamasgal commented Sep 13, 2024

So now the difference boils down to the mapped volumes:

    volumes:
      - 'archive:/opt/indico/archive' # file storage
      - 'customization:/opt/indico/custom'
      - 'static-files:/opt/indico/static'
      - 'indico-logs:/opt/indico/log' # logs

The problem is that during the build of the worker, files are written to /opt/indico/static. In Docker Swarm, if you map a volume to the service, the folder will be "replaced", which means that /opt/indico/static in my case is simply an empty folder.

This means that the static folder should not be mapped in Docker Swarm.

The question is: is that folder modified during the lifetime of Indico, so that it has to be persistent? If yes, we need to manually copy files to the persistent volume after building the image.
I think that only archive/ and custom/ are OK to be empty.
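
A minimal sketch of such a manual copy step, assuming the image's original static files have been saved somewhere the container can still read them. The helper name `populate_if_empty` and both paths in the usage note are placeholders, not Indico defaults. It copies into the volume only when the volume is empty, so an already populated volume is left alone:

```python
import shutil
from pathlib import Path


def populate_if_empty(image_static: Path, volume: Path) -> bool:
    """Copy the static files shipped in the image into an empty volume.

    Returns True if files were copied, False if the volume already had content.
    """
    volume.mkdir(parents=True, exist_ok=True)
    if any(volume.iterdir()):
        # The volume already holds files, so do not overwrite anything.
        return False
    # dirs_exist_ok lets copytree write into the (empty) mount point itself.
    shutil.copytree(image_static, volume, dirs_exist_ok=True)
    return True
```

It could be called once at container start, for example as `populate_if_empty(Path("/opt/indico/static.orig"), Path("/opt/indico/static"))`, where `/opt/indico/static.orig` is a hypothetical location for a copy of the files made at build time.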

@tomasr8
Member

tomasr8 commented Sep 13, 2024

You could try manually running indico setup create-symlinks /opt/indico after the folder is replaced to re-populate it.
Static assets are served by nginx directly so check that the static file volume is mounted on the nginx container as well.

@ThiefMaster
Member

For the DB error you need to run indico db upgrade inside the container.

@tamasgal
Author

Alright, thanks! For now I manually copied the static/* files from the original worker image to a shared volume. Now nginx and Indico see the same files (on the very same drive).

So far I think that Indico is running nicely. The only problem I have is that Celery is killed constantly. I should probably open a new issue about that ;)

@ThiefMaster
Member

Anything in the docker logs on how/why celery is getting killed?

@tamasgal
Author

/opt/indico/run_celery.sh: line 16: 8 Killed indico celery ${1:-worker}

That's all I have now 😕

@tamasgal
Author

Ah, I forgot the beat in

command: /opt/indico/run_celery.sh beat

@tamasgal
Author

I now have

2024-09-13 12:47:26,874  INFO     0000000000000000  -       celery.beat               beat: Starting...
2024-09-13 12:50:00,007  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 12:50:00,016  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)

but Indico complains that

The Celery task scheduler does not seem to be running. This means that email sending and periodic tasks such as event reminders do not work.

I don't see any reporting about the Celery service in the worker. How do I debug this?

@tamasgal
Author

The Celery service still gets killed every few minutes. I don't know why. The Celery beat is running though.

| /opt/indico/run_celery.sh: line 16:     8 Killed                  indico celery ${1:-worker}

I have this in my Docker Swarm config:

  indico-celery: &indico-celery
    <<: *indico-web
    command: /opt/indico/run_celery.sh
    networks:
      - backend

  indico-celery-beat:
    <<: *indico-celery
    command: /opt/indico/run_celery.sh beat

But I am wondering whether the hostname (service name) is picked up correctly, since in the indico.conf I have:

REDIS_CACHE_URL='redis://indico-redis:6379/1'
CELERY_BROKER='redis://indico-redis:6379/0'

@ThiefMaster
Member

You need to have both celery itself and celery beat running. The latter just triggers the scheduled tasks.

@tamasgal
Author

Yep, I do. I just somehow forgot (deleted) the beat argument in the beat service's command 😉

@tamasgal
Author

This is what I see in an endless loop, every few minutes in the indico celery (the celery beat service is running, like posted above):

/opt/indico/run_celery.sh: line 16:     8 Killed                  indico celery ${1:-worker}
 count
-------
     0
(1 row)

Starting Celery...
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
...
...
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories

 -------------- celery@7010aab4d38b v5.4.0 (opalescent)
--- ***** -----
-- ******* ---- Linux-5.4.0-121-generic-x86_64-with-glibc2.36 2024-09-13 13:19:28
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         indico:0x7f379497cdd0
- ** ---------- .> transport:   redis://indico-redis:6379/0
- ** ---------- .> results:     redis://indico-redis:6379/0
- *** --- * --- .> concurrency: 128 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery

@ThiefMaster
Member

ThiefMaster commented Sep 13, 2024

That looks good. The fontconfig error is just noise and safe to ignore for now; it should not be related to the container getting killed.

@tamasgal
Author

But I still have no Celery in Indico:

Screenshot 2024-09-13 at 15 24 24

@tamasgal
Author

I still did not manage to get Indico to reach the Celery task scheduler. How is this supposed to be debugged? I can resolve indico-redis (I used Python to check that):

(.venv) indico@9a27ec55a5b2:~$ python
Python 3.12.6 (main, Sep  9 2024, 18:06:01) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.gethostbyname("indico-redis")
'10.1.69.5'

and below are the logs of Redis. I assume there should be some logging that Indico or Celery are connected etc. but not sure (I am not a Redis expert at all).

Any help is appreciated, I think this is the last step to have a Docker Swarm deployable Indico.

 1:C 14 Sep 2024 05:25:32.938 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:C 14 Sep 2024 05:25:32.938 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 14 Sep 2024 05:25:32.938 * Redis version=7.4.0, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 14 Sep 2024 05:25:32.938 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 14 Sep 2024 05:25:32.939 * monotonic clock: POSIX clock_gettime
1:M 14 Sep 2024 05:25:32.942 * Running mode=standalone, port=6379.
1:M 14 Sep 2024 05:25:32.943 * Server initialized
1:M 14 Sep 2024 05:25:32.943 * Ready to accept connections tcp
1:M 14 Sep 2024 06:25:33.041 * 1 changes in 3600 seconds. Saving...
1:M 14 Sep 2024 06:25:33.042 * Background saving started by pid 21
21:C 14 Sep 2024 06:25:33.047 * DB saved on disk
21:C 14 Sep 2024 06:25:33.048 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 14 Sep 2024 06:25:33.144 * Background saving terminated with success
1:M 14 Sep 2024 07:25:34.042 * 1 changes in 3600 seconds. Saving...
1:M 14 Sep 2024 07:25:34.043 * Background saving started by pid 22
22:C 14 Sep 2024 07:25:34.048 * DB saved on disk
22:C 14 Sep 2024 07:25:34.049 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 14 Sep 2024 07:25:34.145 * Background saving terminated with success
1:M 14 Sep 2024 08:25:35.083 * 1 changes in 3600 seconds. Saving...
1:M 14 Sep 2024 08:25:35.084 * Background saving started by pid 23
23:C 14 Sep 2024 08:25:35.090 * DB saved on disk
23:C 14 Sep 2024 08:25:35.091 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 14 Sep 2024 08:25:35.186 * Background saving terminated with success
1:M 14 Sep 2024 09:25:36.006 * 1 changes in 3600 seconds. Saving...
1:M 14 Sep 2024 09:25:36.007 * Background saving started by pid 24
24:C 14 Sep 2024 09:25:36.013 * DB saved on disk
24:C 14 Sep 2024 09:25:36.013 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 14 Sep 2024 09:25:36.109 * Background saving terminated with success
1:M 14 Sep 2024 10:25:37.074 * 1 changes in 3600 seconds. Saving...
1:M 14 Sep 2024 10:25:37.075 * Background saving started by pid 25
25:C 14 Sep 2024 10:25:37.082 * DB saved on disk
25:C 14 Sep 2024 10:25:37.082 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 14 Sep 2024 10:25:37.177 * Background saving terminated with success
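
Resolving the name only shows that DNS works. A small sketch of the next step, checking that a TCP connection to the Redis port can actually be opened from inside the container. The helper name `can_connect` is made up for illustration; host and port would be the ones from the indico.conf above:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False
```

For example, `can_connect("indico-redis", 6379)` run in the web and Celery containers would show whether the port is reachable from each of them. It says nothing about whether an established connection later gets reset.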

@tomasr8
Member

tomasr8 commented Sep 16, 2024

Do you have the celery beat running?

  indico-celery-beat:
    <<: *indico-celery
    command: /opt/indico/run_celery.sh beat

@tamasgal
Author

Yes, the celery-beat is running, but at startup it complained: psql: error: could not translate host name "indico-postgres" to address: Name or service not known. After that, it started the beat (which has now been running for 3 days).

I attached to the running container and checked if I can resolve indico-postgres and it works fine:

# ping indico-postgres
PING indico-postgres (10.1.69.8) 56(84) bytes of data.
64 bytes from 10.1.69.8 (10.1.69.8): icmp_seq=1 ttl=64 time=0.087 ms
64 bytes from 10.1.69.8 (10.1.69.8): icmp_seq=2 ttl=64 time=0.076 ms
64 bytes from 10.1.69.8 (10.1.69.8): icmp_seq=3 ttl=64 time=0.076 ms

Here are the logs of the celery-beat (the last message blocks repeat until now):

psql: error: could not translate host name "indico-postgres" to address: Name or service not known
Waiting for DB to be ready...
 count
-------
     0
(1 row)

Starting Celery...
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
...
Fontconfig error: No writable cache directories
Fontconfig error: No writable cache directories
2024-09-13 13:18:05,644  INFO     0000000000000000  -       celery.beat               beat: Starting...
2024-09-13 13:20:00,007  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 13:20:00,019  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 13:25:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 13:25:00,006  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 13:30:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task survey_start_notifications (survey_start_notifications)
2024-09-13 13:30:00,007  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task heartbeat (heartbeat)
2024-09-13 13:30:00,011  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 13:30:00,015  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 13:35:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 13:35:00,005  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 13:40:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 13:40:00,006  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 13:45:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 13:45:00,007  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 13:50:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 13:50:00,006  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 13:55:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 13:55:00,005  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 14:00:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 14:00:00,006  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task heartbeat (heartbeat)
2024-09-13 14:00:00,010  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 14:00:00,015  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task survey_start_notifications (survey_start_notifications)
2024-09-13 14:05:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 14:05:00,006  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 14:10:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 14:10:00,006  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 14:15:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 14:15:00,006  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 14:20:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task event_reminders (event_reminders)
2024-09-13 14:20:00,006  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
2024-09-13 14:25:00,000  INFO     0000000000000000  -       celery.beat               Scheduler: Sending due task indico_vc_zoom.task.refresh_token (indico_vc_zoom.task.refresh_token)
...
...

@tamasgal
Author

tamasgal commented Sep 16, 2024

I am confused, but I assume that after the psql error about the failed hostname resolution it prints Waiting for DB to be ready... until the DB is reachable and then connects. At least I guess that the beat would not start otherwise?

So may I assume that the beat is working correctly?

Btw., I just saw a random "cannot connect to indico-redis:6379" error when I registered a new user, but I could not reproduce it afterwards. I was also able to create multiple new users. There are still no errors whatsoever in the Redis logs, but this is what I found in the indico-web logs:

redis.exceptions.ConnectionError: Error while reading from indico-redis:6379 : (104, 'Connection reset by peer')

To me it's unclear why Indico cannot talk to indico-redis, although when attaching to the container, it works.

Here are more logs which indicate communication issues:

ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |   File "/opt/indico/.venv/lib/python3.12/site-packages/redis/connection.py", line 520, in read_response
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |     raise ConnectionError(
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    | redis.exceptions.ConnectionError: Error while reading from indico-redis:6379 : (104, 'Connection reset by peer')
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    | During handling of the above exception, another exception occurred:
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    | Traceback (most recent call last):
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |   File "/usr/local/lib/python3.12/smtplib.py", line 398, in getreply
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |     line = self.file.readline(_MAXLINE + 1)
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |   File "/usr/local/lib/python3.12/socket.py", line 720, in readinto
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |     return self._sock.recv_into(b)
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    |            ^^^^^^^^^^^^^^^^^^^^^^^
ecap-indico_indico-web.1.mlez1frei7wt@ecap-s027    | TimeoutError: timed out

@tamasgal
Author

tamasgal commented Sep 16, 2024

I connected to the Indico Web container and used redis-cli to connect to redis, and it works, so I can confirm that at least the communication from the command line works:

indico@2a004f1e324a:~$ redis-cli -h indico-redis
indico-redis:6379> set foo bar
OK
indico-redis:6379> get foo
"bar"

Update: I also tried the Python virtual environment inside the indico-web container and it works as well:

indico@2a004f1e324a:~$ . /opt/indico/.venv/bin/activate
(.venv) indico@2a004f1e324a:~$ python
Python 3.12.6 (main, Sep  9 2024, 18:06:01) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import redis
>>> r = redis.Redis(host='indico-redis', port=6379, decode_responses=True)
>>> r.get('foo')
'bar'
>>> r.keys()
['foo', 'celery', '_kombu.binding.celery']
>>> r.get("foo")
'bar'
>>> len(r.lrange("celery", 0, -1))
1682

As can be seen in the last command, the celery list (the broker queue) has entries.
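
With the Redis transport, Celery's default queue is stored as a Redis list named after the queue (here `celery`), so its length is the number of messages still waiting for a worker. A rough sketch for watching that number over time; the helper names are hypothetical, and `client` is assumed to be a `redis.Redis` instance like `r` above (only its `llen` method is used):

```python
import time


def backlog_samples(client, queue="celery", samples=3, delay=1.0):
    """Read the length of the queue list several times, `delay` seconds apart."""
    lengths = []
    for i in range(samples):
        lengths.append(client.llen(queue))
        if i < samples - 1:
            time.sleep(delay)
    return lengths


def is_growing(lengths):
    """True if the last sample is larger than the first one."""
    return len(lengths) > 1 and lengths[-1] > lengths[0]
```

A backlog that keeps growing while beat is sending tasks would suggest that no worker is consuming them, which fits a worker that gets killed shortly after starting.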

@tomasr8
Member

tomasr8 commented Sep 17, 2024

> I am confused but I assume that after the psql error regarding the failed hostname resolve, it shows Waiting for DB to be ready... and if the DB is ready, it connected, at least I guess that the beat would not start otherwise?

Yeah that looks correct, it prints Waiting for DB to be ready.. in a loop until it manages to connect via psql.
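
The pattern is roughly the following. This is only an illustration in Python, not the actual entrypoint script (which is a shell script calling psql); the function name and the bounded number of attempts are assumptions added for the sketch:

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], delay: float = 1.0, max_attempts: int = 30) -> bool:
    """Call `check` repeatedly until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return True
        # A failed check (e.g. a name that does not resolve yet) is not fatal.
        print("Waiting for DB to be ready...")
        if attempt < max_attempts:
            time.sleep(delay)
    return False
```

So a single failed hostname lookup at startup is expected to be harmless: the next attempt simply succeeds once the database service is reachable.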

Connection reset by peer is a strange error, maybe it's related to this? https://serverfault.com/a/954504

@tamasgal
Author

I also found that, but it should not be the problem: I am already running in dnsrr endpoint mode. Maybe that was just a rare hiccup (I have not seen it again since then). I have no connection issues in the other (around 30) Docker Stacks I have been running for years 😕

I really would like to know why the Celery is killed every few minutes:

/opt/indico/run_celery.sh: line 16:    10 Killed                  indico celery ${1:-worker}

but I don't see anything in the logs. I also don't know where to get anything out of Indico about the Celery connection issue. All I have is

The Celery task scheduler does not seem to be running. This means that email sending and periodic tasks such as event reminders do not work.

I will now dig into the source code of Indico to see when this message gets printed and try to trace it back.

@tamasgal
Author

tamasgal commented Sep 17, 2024

I think I got it... I had 8GB reserved for the Celery service and just realised that it's running with 100% CPU and also 100% memory. I increased it to 16GB and now the CPU has dropped to around 5%, but memory is still full:

CONTAINER ID   NAME            CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O    PIDS
9062bee49c85   indico-celery   1.07%     16GiB / 16GiB       100.00%   82.5kB / 93.5kB   109MB / 0B   130

Ouf.... ok now I understand the "killed" message 🙈

It's still a bit weird that the memory usage is 100%, I guess it's allocating the full available memory?
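The thread does not establish what fills the memory. One way to look at it from inside the container is to read the cgroup accounting files directly. This is a minimal sketch, assuming a Linux container with the cgroup filesystem mounted at `/sys/fs/cgroup`; `read_cgroup_memory` is a hypothetical helper name:

```python
from pathlib import Path


def read_cgroup_memory(root="/sys/fs/cgroup"):
    """Return (usage_bytes, limit_bytes); limit_bytes is None if cgroup v2 reports "max"."""
    root = Path(root)
    # Try the cgroup v2 file names first, then the cgroup v1 equivalents.
    candidates = [
        (root / "memory.current", root / "memory.max"),
        (
            root / "memory" / "memory.usage_in_bytes",
            root / "memory" / "memory.limit_in_bytes",
        ),
    ]
    for usage_file, limit_file in candidates:
        if usage_file.exists() and limit_file.exists():
            usage = int(usage_file.read_text().strip())
            raw_limit = limit_file.read_text().strip()
            # cgroup v2 writes the literal string "max" when no limit is set;
            # cgroup v1 reports a very large number instead.
            limit = None if raw_limit == "max" else int(raw_limit)
            return usage, limit
    raise FileNotFoundError(f"no cgroup memory files found under {root}")
```

The raw cgroup usage figure includes file cache, so a value close to the limit does not by itself prove that the worker processes hold that much memory.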

Let me close this super long issue, I think we are good to go now ;)

Also let me know if you want to document the Docker Swarm setup.

...and of course many thanks for all the help!

@ThiefMaster
Member

ThiefMaster commented Sep 17, 2024

> and just realised that it's running with 100% CPU and also 100% memory

Interesting... the only case where I've seen this so far (at least for the CPU) was when running celery worker and celery beat in the same docker container (celery worker -B), which is caused by celery/celery#8306

@tamasgal
Author

Ah OK, but that might be related to this dominating memory consumption behaviour? I mean, the 100% CPU and the kill after 2 min very likely mean that 8GB was not enough to set "whatever" up 😉 I am just guessing, but I can imagine that such a process does not like other processes in the same container.

@tomasr8
Member

tomasr8 commented Sep 17, 2024

> Also let me know if you want to document the Docker Swarm setup.

If you want to write something up, I think that'd be great! What do you think @ThiefMaster ?

@ThiefMaster
Member

Is Docker Swarm widely used nowadays? I think it's nice to have something, but I wonder where to put it. I don't think it should be us maintaining this type of documentation.

What do you think about creating a forum post where you explain this? That way it's something that can be easily linked to, while at the same time it's not docs we have to maintain / keep up to date.

@tamasgal
Author

Alright, I'll ping you guys when I put something up and you can link to it if you like :)

I don't know how widely Docker Swarm is used nowadays, but it works really well for me: I deploy and maintain a lot of services for different big physics experiments at our institute, including GitLab (+runners), XWiki, RocketChat and (now also) Indico 😉 It's an easy-to-maintain and very transparent setup.

Anyway, you'll hear from me once I have documented it!
