How to get into a joinable pthread state? #17984

jenshannoschwalm · 2024-12-10T19:33:33Z

EDIT after latest commits, see commit message about it ...

Currently the job dispatcher might have pending jobs. These are not processed any more if we test for dt_control_running() in job.c
We continue dispatching even in DT_CONTROL_STATE_CLEANUP to get all things done.
But how can we know if all jobs are finalized? This might be a way, we increase pending_jobs jobs-to-be-done and decrease for finalized jobs.
In control_shutdown we have to wait until everything is done.

jenshannoschwalm · 2024-12-10T19:37:15Z

@TurboGit @ralfbrown @dterrahe @zisoft i really need help here.

Is this the way to go?
Any other ideas?
The up/down-counter doesn't work correctly, can you have a look?

Anyway, see #17961 #17978 for existing problems.

ralfbrown · 2024-12-10T20:13:18Z

Looks reasonable, but I'm not sure it will actually fix the problems, because the way things are currently setup the shutdown should be working (provided dt_control_shutdown isn't called multiple times) - pthread_join blocks until the given thread exits (if it hasn't yet), and each worker thread exits as it finishes its current job or sleep and finds that an exit was requested before we called pthread_join.

As for the counter not working properly, might you have missed some path on which a job could be terminated? (Or started, for that matter, if you're seeing negative counts.)

jenshannoschwalm · 2024-12-10T20:59:20Z

The point "it should" but the worker threads are definitely not in a correct state to be joined if jobs are pending. Try the reproducers.

If I wait for pending jobs are done, everything is safe and good.

jenshannoschwalm · 2024-12-10T21:18:39Z

Edit: we just were not aware of this as the thread joining part was not processed. Thus late errors as @zisoft demonstrated.

zisoft · 2024-12-11T16:49:04Z

With this PR (compile error fixed) I can 100% reproduce an assertion failure:

select 10 images in lightroom
click the rotate 90 degrees button
wait a few seconds to see the first thumbnails rotate
quit

Assertion failed: (!res), function dt_pthread_rwlock_destroy, file dtpthread.h, line 255.

Stacktrace coming from dt_cache_cleanup(), line 65:

darktable/src/common/cache.c

Lines 48 to 70 in 0acaaab

    
           void dt_cache_cleanup(dt_cache_t *cache) 
        
           { 
        
             g_hash_table_destroy(cache->hashtable); 
        
             for(GList *l = cache->lru; l; l = g_list_next(l)) 
        
             { 
        
               dt_cache_entry_t *entry = l->data; 
        
               if(cache->cleanup) 
        
               { 
        
                 assert(entry->data_size); 
        
                 ASAN_UNPOISON_MEMORY_REGION(entry->data, entry->data_size); 
        
                 cache->cleanup(cache->cleanup_data, entry); 
        
               } 
        
               else 
        
                 dt_free_align(entry->data); 
        
               dt_pthread_rwlock_destroy(&entry->lock); 
        
               g_slice_free1(sizeof(*entry), entry); 
        
             } 
        
             g_list_free(cache->lru); 
        
             dt_pthread_mutex_destroy(&cache->lock); 
        
           }

called from dt_mipmap_cache_cleanup() in dt_cleanup()

jenshannoschwalm · 2024-12-13T07:20:11Z

@zisoft this is exactly the issue I was referring to.

dterrahe · 2024-12-13T14:38:33Z

In dt_cleanup dt_image_cache_cleanup is called before dt_control_shutdown, so there might still be threads wanting to use the caches? If properly mutexed, that might/should not be a problem, but at least when I move the dt_image_cache_cleanup and dt_mipmap_cache_cleanup calls to after the if(init_gui) surrounding dt_control_shutdown, the above crash is not (as easily) reproducable here. Hopefully you got this far, or further, already.

jenshannoschwalm · 2024-12-14T07:18:30Z

This has been very hard work the last days. Some things (debug logs ...) are not necessary, @TurboGit let me know if you want a stripped down version.

As it is possible that working jobs are continuing in the background after user-request to close darktable we could possibly avoid to quit for UI usability as we could use the pending jobs counter ...

@dterrahe thanks for your hint!

TurboGit · 2024-12-14T08:21:08Z

@jenshannoschwalm : Thanks and congrats! Of course I'm a bit worried so close to the release. I'll cut the sources in about 48h now. I don't want to put too much pressure on you after working so hard, so my question what would be a stripped down version? How much work for you?

jenshannoschwalm · 2024-12-14T08:46:12Z

Stripping down? No problem for me ...

You could just ignore the pending_jobs counter commit for now (Use that in 5.2 for UI feedback ...)
I would remove added dt_print() stuff for less code to review although not really relevant
I would prefer to keep the 's' to 'control' renaming in the first commit for readability and making stuff static also seems good&safe

jenshannoschwalm · 2024-12-14T09:01:04Z

Just re-force-pushed with extra checks and debug logs removed as not required any more while being very verbose ...

TurboGit · 2024-12-14T11:39:14Z

Well the job counter commit is pretty simple and looks safe. I'll test this PR.

TurboGit · 2024-12-14T11:45:45Z

I have a lock when I quit early and can reproduce 100% of the time.

Start dt from the console
As soon as I see the splash screen I hit Ctrl+Q multiple time until the UI is displayed
Darktable UI quit but the process does not give back the prompt on the console, you need a Ctrl+C to hard kill it.

jenshannoschwalm · 2024-12-14T12:49:29Z

I have a lock when I quit early and can reproduce 100% of the time.

Fixed with last commit ...

TurboGit · 2024-12-14T15:35:43Z

Still reproduce...

jenshannoschwalm · 2024-12-14T16:36:59Z

Hmm, couldn't reproduce even once. Could be a race condition so use atomics now.

Also added a simple commit for dtpthreads adding textual output for pthread creation and joining.

Tried on wayland and X11, in/excluded gphoto, cups and opencl ... no issue.

zisoft · 2024-12-14T17:33:39Z

Some tests on macOS:

Splash screen disabled:
- quit after GUI came up fully: no issues
- early quit as soon as the main window comes up: no issues
Splash screen enabled
- quit immediately after splash screen comes up: some strange effects with the main window but dt quits without issues
GUI fully started, quit, no issues
GUI fully started, quit while running the rotate 10 images test: This gives a 100% reproducible hard crash with EXC_BAD_ACCESS while accessing gui_module->flags in line 1374 of develop/pixelpipe_hb.c:

darktable/src/develop/pixelpipe_hb.c

Lines 1348 to 1378 in 50f0e50

    
           static gboolean _dev_pixelpipe_process_rec(dt_dev_pixelpipe_t *pipe, 
        
                                                      dt_develop_t *dev, 
        
                                                      void **output, 
        
                                                      void **cl_mem_output, 
        
                                                      dt_iop_buffer_dsc_t **out_format, 
        
                                                      const dt_iop_roi_t *roi_out, 
        
                                                      GList *modules, 
        
                                                      GList *pieces, 
        
                                                      const int pos) 
        
           { 
        
             if(dt_atomic_get_int(&pipe->shutdown)) 
        
               return TRUE; 
        
             dt_iop_roi_t roi_in = *roi_out; 
        
             char module_name[256] = { 0 }; 
        
             void *input = NULL; 
        
             void *cl_mem_input = NULL; 
        
             *cl_mem_output = NULL; 
        
             dt_iop_module_t *module = NULL; 
        
             dt_dev_pixelpipe_iop_t *piece = NULL; 
        
             const dt_dev_pixelpipe_type_t old_pipetype = pipe->type; 
        
             const dt_iop_module_t *gui_module = dt_dev_gui_module(); 
        
             // if a module is active, check if this module allow a fast pipe run 
        
             if(gui_module 
        
                && gui_module->flags() & IOP_FLAGS_ALLOW_FAST_PIPE 
        
                && pipe->type & DT_DEV_PIXELPIPE_BASIC 
        
                && dt_dev_modulegroups_test_activated(darktable.develop)) 
        
               pipe->type |= DT_DEV_PIXELPIPE_FAST; 
        
             else

This is new and introduced by moving the cache_cleanups after if(init_gui)

Note: I'm always testing with --build-type Debug

TurboGit · 2024-12-14T17:51:37Z

No sure if this will help, but:

(gdb) info threads 
  Id   Target Id                                           Frame 
* 1    Thread 0x7fffeea70f00 (LWP 101058) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, 
    futex_word=0x555555558cf0 <darktable+3184>, expected=0, op=393, 
    abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  2    Thread 0x7fffeea5c6c0 (LWP 101061) "pool-spawner"   syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  3    Thread 0x7fffee25b6c0 (LWP 101062) "gmain"          0x00007ffff770d3f6 in __GI_ppoll (fds=0x5555555aadf0, nfds=2, timeout=<optimized out>, sigmask=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:42
  5    Thread 0x7fffed2596c0 (LWP 101064) "gdbus"          0x00007ffff770d3f6 in __GI_ppoll (fds=0x7fff08000c70, nfds=4, timeout=<optimized out>, sigmask=0x0)
    at ../sysdeps/unix/sysv/linux/ppoll.c:42
  6    Thread 0x7fffe7fff6c0 (LWP 101066) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  7    Thread 0x7fffe7dfe6c0 (LWP 101067) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  8    Thread 0x7fffe7bfd6c0 (LWP 101068) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  9    Thread 0x7fffe79fc6c0 (LWP 101069) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  10   Thread 0x7fffe77fb6c0 (LWP 101070) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  11   Thread 0x7fffe75fa6c0 (LWP 101071) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  12   Thread 0x7fffe73f96c0 (LWP 101072) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  13   Thread 0x7fffe71f86c0 (LWP 101073) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  14   Thread 0x7fffe6ff76c0 (LWP 101074) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  15   Thread 0x7fffe6df66c0 (LWP 101075) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  16   Thread 0x7fffe6bf56c0 (LWP 101076) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  17   Thread 0x7fffe69f46c0 (LWP 101077) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  18   Thread 0x7fffe67f36c0 (LWP 101078) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  19   Thread 0x7fffe65f26c0 (LWP 101079) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  20   Thread 0x7fffe63f16c0 (LWP 101080) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  21   Thread 0x7fffe61f06c0 (LWP 101081) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  22   Thread 0x7fffe5fef6c0 (LWP 101082) "darktable"      syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  35   Thread 0x7fffccff96c0 (LWP 101098) "cuda-EvtHandlr" 0x00007ffff770ce9f in __GI___poll (fds=0x7fff28000c20, nfds=11, timeout=100)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  36   Thread 0x7fffc17ff6c0 (LWP 101099) "worker 1"       0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff64006130, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  37   Thread 0x7fffc0ffe6c0 (LWP 101100) "worker 1"       0x00007ffff769891e in __futex_abstimed_wait_common64 (private=<optimized out>, 
    futex_word=0x7fff64711c58, expected=0, op=393, abstime=0x0, cancel=true)
    at ./nptl/futex-internal.c:57
  38   Thread 0x7fffb61ff6c0 (LWP 101101) "worker 1"       0x00007ffff769891e in __futex_abstimed_wait_common64 (private=<optimized out>, 
    futex_word=0x7fff64711c58, expected=0, op=393, abstime=0x0, cancel=true)
    at ./nptl/futex-internal.c:57
  39   Thread 0x7fffb59fe6c0 (LWP 101102) "worker 1"       0x00007ffff769891e in __futex_abstimed_wait_common64 (private=<optimized out>, 
    futex_word=0x7fff64711c58, expected=0, op=393, abstime=0x0, cancel=true)
    at ./nptl/futex-internal.c:57
  40   Thread 0x7fffb51fd6c0 (LWP 101103) "worker 1"       0x00007ffff769891e in __futex_abstimed_wait_common64 (private=<optimized out>, 
    futex_word=0x7fff64711c58, expected=0, op=393, abstime=0x0, cancel=true)
    at ./nptl/futex-internal.c:57
  41   Thread 0x7fffb49fc6c0 (LWP 101104) "worker 1"       0x00007ffff769891e in __futex_abstimed_wait_common64 (private=<optimized out>, 
    futex_word=0x7fff64712968, expected=0, op=393, abstime=0x0, cancel=true)
    at ./nptl/futex-internal.c:57
  44   Thread 0x7fffa9fff6c0 (LWP 101118) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  45   Thread 0x7fffeda5a6c0 (LWP 101119) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  46   Thread 0x7fffa8ffd6c0 (LWP 101120) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  47   Thread 0x7fff9dfff6c0 (LWP 101121) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  48   Thread 0x7fff9d7fe6c0 (LWP 101122) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  49   Thread 0x7fff9cffd6c0 (LWP 101123) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  50   Thread 0x7fff93fff6c0 (LWP 101124) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  51   Thread 0x7fff937fe6c0 (LWP 101125) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  52   Thread 0x7fff92ffd6c0 (LWP 101126) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  53   Thread 0x7fff927fc6c0 (LWP 101127) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  54   Thread 0x7fff91ffb6c0 (LWP 101128) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  55   Thread 0x7fff917fa6c0 (LWP 101129) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  56   Thread 0x7fff90ff96c0 (LWP 101130) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  57   Thread 0x7fff87fff6c0 (LWP 101131) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  58   Thread 0x7fff877fe6c0 (LWP 101132) "darktable"      0x00007ffff7e7ae8e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
  59   Thread 0x7fff613ff6c0 (LWP 101133) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9edde0, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  60   Thread 0x7fff60bfe6c0 (LWP 101134) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ede60, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  61   Thread 0x7fff57fff6c0 (LWP 101135) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9edee0, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  62   Thread 0x7fff577fe6c0 (LWP 101136) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9edf60, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  63   Thread 0x7fff56ffd6c0 (LWP 101137) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9edfe0, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  64   Thread 0x7fff567fc6c0 (LWP 101138) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee060, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  65   Thread 0x7fff55ffb6c0 (LWP 101139) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee0e0, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  66   Thread 0x7fff557fa6c0 (LWP 101140) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee160, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  67   Thread 0x7fff54ff96c0 (LWP 101141) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee1e0, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  68   Thread 0x7fff4bfff6c0 (LWP 101142) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee260, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  69   Thread 0x7fff4b7fe6c0 (LWP 101143) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee2e0, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  70   Thread 0x7fff4affd6c0 (LWP 101144) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee360, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  71   Thread 0x7fff4a7fc6c0 (LWP 101145) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee3e0, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  72   Thread 0x7fff49ffb6c0 (LWP 101146) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee460, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  73   Thread 0x7fff497fa6c0 (LWP 101147) "darktable"      0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, futex_word=0x7fff6e9ee4e0, 
    expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  78   Thread 0x7fff48ff96c0 (LWP 101161) "lua thread"     0x00007ffff769891e in __futex_abstimed_wait_common64 (private=0, 
    futex_word=0x555555558cf4 <darktable+3188>, expected=0, op=393, 
    abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
  79   Thread 0x7fff2dfff6c0 (LWP 101162) "pool-darktable" syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  80   Thread 0x7fff2d7fe6c0 (LWP 101163) "pool-darktable" syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

Maybe due to Lua:

(gdb) bt
#0  0x00007ffff769891e in __futex_abstimed_wait_common64
    (private=0, futex_word=0x555555558cf0 <darktable+3184>, expected=0, op=393, abstime=0x0, cancel=true) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common
    (futex_word=futex_word@entry=0x555555558cf0 <darktable+3184>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2  0x00007ffff769899b in __GI___futex_abstimed_wait_cancelable64
    (futex_word=futex_word@entry=0x555555558cf0 <darktable+3184>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007ffff769b1c0 in __pthread_cond_wait_common
    (cond=0x555555558cc8 <darktable+3144>, mutex=0x555555558ca0 <darktable+3104>, clockid=0, abstime=0x0) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait
    (cond=cond@entry=0x555555558cc8 <darktable+3144>, mutex=mutex@entry=0x555555558ca0 <darktable+3104>) at ./nptl/pthread_cond_wait.c:618
#5  0x00007ffff7ae819f in dt_pthread_cond_wait
    (cond=<optimized out>, mutex=<optimized out>)
    at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/common/dtpthread.h:396
#6  dt_lua_lock_internal
    (function=function@entry=0x7ffff7cc6c30 <__FUNCTION__.0> "dt_lua_finalize", file=file@entry=0x7ffff7c95f50 "/home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/lua/init.c", line=line@entry=265, silent=silent@entry=0)
    at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/lua/lua.c:146
#7  0x00007ffff7ae72bb in dt_lua_finalize ()
    at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/lua/init.c:265
#8  0x00007ffff78f2893 in dt_cleanup ()
    at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/common/darktable.c:2055
#9  0x00005555555551c8 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/obry/dev/builds/c-darktable/x86_64-linux-gnu-default/src/src/main.c:148
(gdb)

TurboGit · 2024-12-14T18:03:48Z

Confirmed, it is due to Lua. Disabling Lua I cannot reproduce the issue.

TurboGit · 2024-12-14T21:17:37Z

I have this code that help exiting while Lua is still not fully initialized, at least the process doesn't lock but I get a crash one out of 20 runs or so:

diff --git a/src/lua/init.c b/src/lua/init.c
index 54e7f79397..ec61ef0afc 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -248,9 +248,10 @@ int luaopen_darktable(lua_State *L)
 void dt_lua_finalize_early()
 {
   darktable.lua_state.ending = true;
-  gboolean running = (darktable.lua_state.loop
-                      && g_atomic_int_get(&_lua_fully_initialized)
-                      && g_main_loop_is_running(darktable.lua_state.loop));
+
+  const gboolean running = (darktable.lua_state.loop
+                            && g_atomic_int_get(&_lua_fully_initialized)
+                            && g_main_loop_is_running(darktable.lua_state.loop));
   if(running)
   {
     dt_lua_lock();
@@ -258,6 +259,8 @@ void dt_lua_finalize_early()
     dt_lua_unlock();
     g_main_context_wakeup(darktable.lua_state.context);
   }
+  else
+    darktable.lua_state.exec_lock = false;
 }
 
 void dt_lua_finalize()

The Lua init/finalize code is complex, we may need help from @wpferguson to find a proper way to exit Lua while still not 100% initialized.

jenshannoschwalm · 2024-12-14T21:35:05Z

I got a crash while images exporting after ctrl-q

jenshannoschwalm · 2024-12-15T07:55:13Z

@TurboGit it's sort of a mess i/we all were not aware of the implications of all the things we now do in the background and wanting a safe darktable exit ...

I just force-pushed:

in 8891c6c there is an additional safety check prohibiting to create a new job if control isnot running any more
b064f9e is just plain renaming maintenance done while analyse and understand ...
7c5b153 fixes image file I/O problems in background jobs.

TurboGit · 2024-12-15T08:21:53Z

@zisoft : Can you check again on your side?

I have tested a bit, no issue on my side except the Lua one I have reported. Yet, the Lua issue is reproducible on master too, so not a regression on this PR.

jenshannoschwalm · 2024-12-15T08:51:48Z

Just added a fix for not waiting on image import if control is not running any more.

jenshannoschwalm · 2024-12-15T08:53:59Z

@TurboGit One of the first PR's i will do for 5.2 is "Only accept Ctrl-q if no jobs are pending" :-)

zisoft · 2024-12-15T11:34:22Z

It crashes in the first loop

  /* wait for kick_on_workers_thread */
  dt_pthread_join(control->kick_on_workers_thread);

  for(int k = 0; k < control->num_threads-1; k++)
    dt_pthread_join(control->thread[k]);          // <-- breakpoint here

  for(int k = 0; k < DT_CTL_WORKER_RESERVED; k++)
    dt_pthread_join(control->thread_res[k]);

If I set a breakpoint and single-step through the loop, most of the steps gives no log output. Then it comes to a step with lots of log output and that is the step which crashes. The last lines of the log (started with -d all):


59.7676 [iop_validate_params] `flip' validated data for type "dt_image_orientation_t", field: orientation (5)
59.7676 [iop_validate_params] `flip' validated data for type "dt_iop_flip_params_t"
59.7676 committed                     [thumbnail]      flip                   1700  enabled  piece hash=9df07eab6adab321
59.7676 synch all modules done        [thumbnail]                                   defaults 0.1505s, history 0.0106s
59.7676 [dt_imageio_export_with_flags]  modules: rawprepare temperature highlights demosaic flip exposure colorin channelmixerrgb sigmoid colorout gamma
59.7676 modify roi OUT                [thumbnail]      rawprepare              100 (   0/   0) 5344x3516 scale=1.0000 --> (   0/   0) 5184x3456 scale=1.0000  ID=37
59.7676 modify roi OUT                [thumbnail]      flip                   1700 (   0/   0) 5184x3456 scale=1.0000 --> (   0/   0) 3456x5184 scale=1.0000  ID=37
59.7676 [export] creating pixelpipe took 0.168 secs (0.330 CPU)
59.7677 [dt_imageio_export] thumbnail imgid 37, 3456x5184 --> 73x110 (scale=0.0212, maxscale=1.0000). upscale=no, hq=no
59.7677 pipe starting             CL1 [thumbnail]                                  (   0/   0)   73x 110 scale=0.0212 --> (   0/   0)   73x 110 scale=0.0212  ID=37, appleamdradeonpro5500mcomputeengine
59.7677 [memory] before pixelpipe process
        max address space (vmpeak):         unknown
        cur address space (vmsize):     41845448 kB
        max used memory   (vmhwm ):         unknown
        cur used memory   (vmrss ):      2067612 kB

crash in line 1374 of pixelpipe.hb

darktable/src/develop/pixelpipe_hb.c

Lines 1348 to 1378 in 50f0e50

    
           static gboolean _dev_pixelpipe_process_rec(dt_dev_pixelpipe_t *pipe, 
        
                                                      dt_develop_t *dev, 
        
                                                      void **output, 
        
                                                      void **cl_mem_output, 
        
                                                      dt_iop_buffer_dsc_t **out_format, 
        
                                                      const dt_iop_roi_t *roi_out, 
        
                                                      GList *modules, 
        
                                                      GList *pieces, 
        
                                                      const int pos) 
        
           { 
        
             if(dt_atomic_get_int(&pipe->shutdown)) 
        
               return TRUE; 
        
             dt_iop_roi_t roi_in = *roi_out; 
        
             char module_name[256] = { 0 }; 
        
             void *input = NULL; 
        
             void *cl_mem_input = NULL; 
        
             *cl_mem_output = NULL; 
        
             dt_iop_module_t *module = NULL; 
        
             dt_dev_pixelpipe_iop_t *piece = NULL; 
        
             const dt_dev_pixelpipe_type_t old_pipetype = pipe->type; 
        
             const dt_iop_module_t *gui_module = dt_dev_gui_module(); 
        
             // if a module is active, check if this module allow a fast pipe run 
        
             if(gui_module 
        
                && gui_module->flags() & IOP_FLAGS_ALLOW_FAST_PIPE 
        
                && pipe->type & DT_DEV_PIXELPIPE_BASIC 
        
                && dt_dev_modulegroups_test_activated(darktable.develop)) 
        
               pipe->type |= DT_DEV_PIXELPIPE_FAST; 
        
             else

jenshannoschwalm · 2024-12-15T14:35:36Z

If I set a breakpoint and single-step through the loop, most of the steps gives no log output. Then it comes to a step with lots of log output and that is the step which crashes. The last lines of the log (started with -d all):

@zisoft As there is just one of the threads involved it's random in what position in the loop it "happens", but as there is no imageio system running any more the flip might fail here. So this has been added to the jobs where ctrl-q is understood as "skip-the-rest'

BTW you run ARM or intel? I am absolutely not sure if there are some peculiar things in the _DEBUG mode, even less on MACs.

Could you re-test?

zisoft · 2024-12-15T15:24:58Z

First try, same crash :(

I'm on an Intel Mac.

As there is just one of the threads involved it's random in what position in the loop it "happens"

ah, ok. That's my experience here, the crash position in the loop is indeed random.

EDIT: ah, sorry, I missed your last commit... now testing again

ralfbrown · 2024-12-15T15:36:33Z

A Lua hang on very early exit is probably the same underlying issue that I encountered while writing the splash screen code, and kept me from displaying any further progress updates after starting up Lua. I have since been able to trace that hang to the async processing of the Lua startup script. No luarc, no hang.... Once the script finishes, it is again OK to call dt_gui_process_events().

zisoft · 2024-12-15T16:12:18Z

ok, some things have changed.
I now get two different crashes randomly. And successful quits as well.

I send screenshots here with log at the bottom and stacktrace on the left.

The first type is again an assertion failure:

And the second type is a EXC_BAD_ACCESS in the dbus routines:

I also removed my luarc but that does not make a difference as these are no early quit attempts.

TurboGit · 2024-12-15T16:21:39Z

At this stage, let's move this to 5.2 too risky for 5.0 despite all the hard work done here.

jenshannoschwalm · 2024-12-16T05:52:01Z

Making this draft for now, as there will be PR's doing maintenance and safety checks to be merged first and this will have to be rebased afterwards.

As we now have a lot of things in background jobs related to images and mipmaps we must keep those systems available until the control system has been fully closed. NOTE: for some reason we can't have dt_imageio_cleanup() after dt_control_shutdown()

1. For code readability use 'control' instead of 's' 2. Simplify dt_control_running() 3. Removed not needed checks on errors 4. Two functions are strictly internal so no exposing via .h files and be static - _control_get_threadid() - _control_job_set_synchronous() 5. Two function were not used at all (so were untested) so the were removed - dt_control_job_set_state_callback() - dt_control_job_wait() 6. gphoto joining after cleanup test 7. Never create a control job if control is not in running state

While running the job dispatchers we keep track on jobs being inserted and executed so we at least know about the number of unprocessed jobs via dt_control_jobs_pending() Also show pending jobs in the log if closing dt.

Strictly prohibit reentrant use of dt_control_quit() and avoid a race condition via atomic instruction.

As we have control jobs on "file actions" we have to make sure that we have 'dt_imageio` available for: export, import, hdr creation, flip and exif refresh inside the job. While working the list of images we test for dt_control_running() and only process the job specific action if so. Also we have some **very** time consuming jobs that can be canceled in the UI, it seems to be much better to understand a ctrl-q or click on close-darktable button as a request-to-cancel so there will be no lengthy background actions after the window has closed. While we aborted file import via ctrl-q we don't want to wait ...

jenshannoschwalm marked this pull request as draft December 10, 2024 19:33

jenshannoschwalm added bugfix pull request fixing a bug priority: high core features are broken and not usable at all, software crashes labels Dec 10, 2024

zisoft mentioned this pull request Dec 11, 2024

jobs maintenance (part 1) #17986

Closed

jenshannoschwalm force-pushed the my_checks branch from a9d0f10 to 0bde20e Compare December 14, 2024 06:40

jenshannoschwalm marked this pull request as ready for review December 14, 2024 07:08

jenshannoschwalm added this to the 5.0 milestone Dec 14, 2024

jenshannoschwalm force-pushed the my_checks branch 2 times, most recently from 71c3b5d to 2dc1dfb Compare December 14, 2024 07:27

jenshannoschwalm force-pushed the my_checks branch from 2dc1dfb to fdaf125 Compare December 14, 2024 09:00

jenshannoschwalm force-pushed the my_checks branch from a72b9ef to 872ac2c Compare December 14, 2024 16:34

jenshannoschwalm force-pushed the my_checks branch from e1ced0b to 7c5b153 Compare December 15, 2024 07:40

jenshannoschwalm force-pushed the my_checks branch from 7c5b153 to 1a287f4 Compare December 15, 2024 08:50

jenshannoschwalm added the difficulty: hard big changes across different parts of the code base label Dec 15, 2024

jenshannoschwalm force-pushed the my_checks branch from 1a287f4 to e7f554a Compare December 15, 2024 14:05

TurboGit modified the milestones: 5.0, 5.2 Dec 15, 2024

jenshannoschwalm marked this pull request as draft December 16, 2024 05:52

jenshannoschwalm mentioned this pull request Dec 16, 2024

Threads, jobs and safety maintenance. #18023

Closed

jenshannoschwalm added 5 commits December 16, 2024 19:02

Implement pending jobs counter

96d33fb

While running the job dispatchers we keep track on jobs being inserted and executed so we at least know about the number of unprocessed jobs via dt_control_jobs_pending() Also show pending jobs in the log if closing dt.

Only accept the first dt_control_quit()

1277650

Strictly prohibit reentrant use of dt_control_quit() and avoid a race condition via atomic instruction.

jenshannoschwalm force-pushed the my_checks branch from 428576f to ecaa37b Compare December 16, 2024 18:45

jenshannoschwalm closed this Dec 20, 2024

jenshannoschwalm deleted the my_checks branch December 20, 2024 04:52

jenshannoschwalm mentioned this pull request Dec 24, 2024

Safe darktable shutdown #18061

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to get into a joinable pthread state? #17984

How to get into a joinable pthread state? #17984

jenshannoschwalm commented Dec 10, 2024 •

edited

Loading

jenshannoschwalm commented Dec 10, 2024

ralfbrown commented Dec 10, 2024 •

edited

Loading

jenshannoschwalm commented Dec 10, 2024

jenshannoschwalm commented Dec 10, 2024 •

edited

Loading

zisoft commented Dec 11, 2024 •

edited

Loading

jenshannoschwalm commented Dec 13, 2024

dterrahe commented Dec 13, 2024

jenshannoschwalm commented Dec 14, 2024 •

edited

Loading

TurboGit commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024

TurboGit commented Dec 14, 2024

TurboGit commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024

TurboGit commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024 •

edited

Loading

zisoft commented Dec 14, 2024 •

edited

Loading

TurboGit commented Dec 14, 2024

TurboGit commented Dec 14, 2024

TurboGit commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024

jenshannoschwalm commented Dec 15, 2024

TurboGit commented Dec 15, 2024

jenshannoschwalm commented Dec 15, 2024

jenshannoschwalm commented Dec 15, 2024

zisoft commented Dec 15, 2024 •

edited

Loading

jenshannoschwalm commented Dec 15, 2024

zisoft commented Dec 15, 2024 •

edited

Loading

ralfbrown commented Dec 15, 2024

zisoft commented Dec 15, 2024

TurboGit commented Dec 15, 2024

jenshannoschwalm commented Dec 16, 2024

How to get into a joinable pthread state? #17984

How to get into a joinable pthread state? #17984

Conversation

jenshannoschwalm commented Dec 10, 2024 • edited Loading

jenshannoschwalm commented Dec 10, 2024

ralfbrown commented Dec 10, 2024 • edited Loading

jenshannoschwalm commented Dec 10, 2024

jenshannoschwalm commented Dec 10, 2024 • edited Loading

zisoft commented Dec 11, 2024 • edited Loading

jenshannoschwalm commented Dec 13, 2024

dterrahe commented Dec 13, 2024

jenshannoschwalm commented Dec 14, 2024 • edited Loading

TurboGit commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024

TurboGit commented Dec 14, 2024

TurboGit commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024

TurboGit commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024 • edited Loading

zisoft commented Dec 14, 2024 • edited Loading

TurboGit commented Dec 14, 2024

TurboGit commented Dec 14, 2024

TurboGit commented Dec 14, 2024

jenshannoschwalm commented Dec 14, 2024

jenshannoschwalm commented Dec 15, 2024

TurboGit commented Dec 15, 2024

jenshannoschwalm commented Dec 15, 2024

jenshannoschwalm commented Dec 15, 2024

zisoft commented Dec 15, 2024 • edited Loading

jenshannoschwalm commented Dec 15, 2024

zisoft commented Dec 15, 2024 • edited Loading

ralfbrown commented Dec 15, 2024

zisoft commented Dec 15, 2024

TurboGit commented Dec 15, 2024

jenshannoschwalm commented Dec 16, 2024

jenshannoschwalm commented Dec 10, 2024 •

edited

Loading

ralfbrown commented Dec 10, 2024 •

edited

Loading

jenshannoschwalm commented Dec 10, 2024 •

edited

Loading

zisoft commented Dec 11, 2024 •

edited

Loading

jenshannoschwalm commented Dec 14, 2024 •

edited

Loading

jenshannoschwalm commented Dec 14, 2024 •

edited

Loading

zisoft commented Dec 14, 2024 •

edited

Loading

zisoft commented Dec 15, 2024 •

edited

Loading

zisoft commented Dec 15, 2024 •

edited

Loading