Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency bug when calling an SDK function with multiple thread #398

Open
josiah-tesfu opened this issue Oct 16, 2023 · 5 comments
Open
Assignees
Labels
bug Something isn't working I2C priority/P2 SPI

Comments

@josiah-tesfu
Copy link

josiah-tesfu commented Oct 16, 2023

When calling an SDK function with multiple threads which use a shared SDK instance, one of the following exceptions occurs:

 /home/pi/.local/lib/python3.9/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function SPI.__del__ at 0x7fb30ab8b0>
  
  Traceback (most recent call last):
    File "/home/pi/.local/lib/python3.9/site-packages/periphery/spi.py", line 209, in close
      os.close(self._fd)
  OSError: [Errno 9] Bad file descriptor
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/home/pi/.local/lib/python3.9/site-packages/periphery/spi.py", line 78, in __del__
      self.close()
    File "/home/pi/.local/lib/python3.9/site-packages/periphery/spi.py", line 211, in close
      raise SPIError(e.errno, "Closing SPI device: " + e.strerror)
  periphery.spi.SPIError: [Errno 9] Closing SPI device: Bad file descriptor
 /home/pi/.local/lib/python3.9/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function I2C.__del__ at 0x7fb30b2670>
  
  Traceback (most recent call last):
    File "/home/pi/.local/lib/python3.9/site-packages/periphery/i2c.py", line 162, in close
      os.close(self._fd)
  OSError: [Errno 9] Bad file descriptor
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/home/pi/.local/lib/python3.9/site-packages/periphery/i2c.py", line 61, in __del__
      self.close()
    File "/home/pi/.local/lib/python3.9/site-packages/periphery/i2c.py", line 164, in close
      raise I2CError(e.errno, "Closing I2C device: " + e.strerror)
  periphery.i2c.I2CError: [Errno 9] Closing I2C device: Bad file descriptor
  
    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

This specific stack trace arises from calling set_rtd in the ADC Service with multiple threads using a shared SDK instance, but other functions called also cause the same exceptions to be raised. The bug seems to arise when multiple SDK instances use a shared hardware resource -- in this case I2C and SPI devices.

When testing for this exception, I found that when the threads each have their own instance of the SDK the tests pass. The tests fail if the threads share an SDK instance, with the rate of failure increasing with more threads. Additionally, the tests pass if the threads share an SDK instance but the SDK function call is protected with a lock.

Test code to reproduce the issue:

Shared SDK instance (subset of the tests fail):

import threading
import pytest
from edgepi.adc.edgepi_adc import EdgePiADC
from edgepi.adc.adc_constants import ADCNum

class PropagatingThread(threading.Thread):
    """Propagating thread to raise exceptions in calling function"""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.exc = None

    def run(self):
        self.exc = None
        try:
            super().run()
        except Exception as e:
            self.exc = e

    def join(self, *args, **kwargs):
        super().join(*args, **kwargs)
        if self.exc:
            raise self.exc
        
shared_sdk = EdgePiADC()

def set_rtd_client():
    """Reset the RTD for ADC"""
    shared_sdk.set_rtd(set_rtd=False, adc_num=ADCNum.ADC_1)

@pytest.mark.parametrize("iteration", range(100))
def test_adc_concurrency(iteration):
    """Test for ADC concurrency bug"""

    threads = [PropagatingThread(target=set_rtd_client) for _ in range(5)]
    for thread in threads: thread.start()
    for thread in threads: thread.join()

Shared SDK instance with lock (all tests pass):

import threading
import pytest
from edgepi.adc.edgepi_adc import EdgePiADC
from edgepi.adc.adc_constants import ADCNum

class PropagatingThread(threading.Thread):
    """Propagating thread to raise exceptions in calling function"""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.exc = None

    def run(self):
        self.exc = None
        try:
            super().run()
        except Exception as e:
            self.exc = e

    def join(self, *args, **kwargs):
        super().join(*args, **kwargs)
        if self.exc:
            raise self.exc
        
shared_sdk = EdgePiADC()
lock = threading.Lock()

def set_rtd_client():
    """Reset the RTD for ADC"""
    with lock:
        shared_sdk.set_rtd(set_rtd=False, adc_num=ADCNum.ADC_1)

@pytest.mark.parametrize("iteration", range(100))
def test_adc_concurrency(iteration):
    """Test for ADC concurrency bug"""

    threads = [PropagatingThread(target=set_rtd_client) for _ in range(30)]
    for thread in threads: thread.start()
    for thread in threads: thread.join()

Non-shared instances (all tests pass):

import threading
import pytest
from edgepi.adc.edgepi_adc import EdgePiADC
from edgepi.adc.adc_constants import ADCNum

class PropagatingThread(threading.Thread):
    """Propagating thread to raise exceptions in calling function"""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.exc = None

    def run(self):
        self.exc = None
        try:
            super().run()
        except Exception as e:
            self.exc = e

    def join(self, *args, **kwargs):
        super().join(*args, **kwargs)
        if self.exc:
            raise self.exc
        

def set_rtd_client():
    """Reset the RTD for ADC"""
    adc = EdgePiADC()
    adc.set_rtd(set_rtd=False, adc_num=ADCNum.ADC_1)

@pytest.mark.parametrize("iteration", range(100))
def test_adc_concurrency(iteration):
    """Test for ADC concurrency bug"""

    threads = [PropagatingThread(target=set_rtd_client) for _ in range(2)]
    for thread in threads: thread.start()
    for thread in threads: thread.join()
@josiah-tesfu josiah-tesfu added bug Something isn't working SPI I2C labels Oct 16, 2023
@farzadpanahi
Copy link
Collaborator

@josiah-tesfu include your test code please for both shared sdk and protected.

@farzadpanahi
Copy link
Collaborator

@sj608 this is a reproducible issue now. @josiah-tesfu has test codes (which should be added to this ticket soon) that can produce the error consistently.

@farzadpanahi
Copy link
Collaborator

farzadpanahi commented Oct 16, 2023

Non-shared instances (all tests pass)

regarding this test, I just want to add that I think the reason that it is passing is not because the sdk is non-shared but because sdk init is slow the threads do not have a good chance to overlap

@farzadpanahi
Copy link
Collaborator

@sjpark608 to make all tests fail just increase number of threads from 5 to 30 here:

    threads = [PropagatingThread(target=set_rtd_client) for _ in range(5)]

@sjpark608
Copy link
Collaborator

sjpark608 commented Oct 17, 2023

This comment is to sum up our brief discussion about thread safety of SDK

  • The root cause of the issue here is from the thread safety of character devices of spi and i2c being limited to file level. What could be happening during the shared sdk test is that one thread may have already closed the device after a transaction while another thread is trying to start the transfer. This throws a bad file descriptor error which indicates accessing of a file that's closed.
  • We've discussed implementing the thread-safe mutex on the SDK level, I will be investigating the feasibility and possible overheads before implementation.

reference_1, reference_2, spidev.h source, reference_3, i2c-dev.h source

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working I2C priority/P2 SPI
Projects
None yet
Development

No branches or pull requests

3 participants