Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle TooManyExecutions when more than 10 executions are running #261

Open
iMicknl opened this issue Sep 14, 2020 · 27 comments
Open

Handle TooManyExecutions when more than 10 executions are running #261

iMicknl opened this issue Sep 14, 2020 · 27 comments
Labels
bug Something isn't working contributions welcome help wanted Extra attention is needed

Comments

@iMicknl
Copy link
Owner

iMicknl commented Sep 14, 2020

Describe the bug
Originally reported on Home Assistant Community

Not all executions will succeed, when more than 10 executions are running at the same time.

To Reproduce
Steps to reproduce the behavior:

  1. Create a scene or automation
  2. Add more than 10 long running actions in a sequence (like movement of covers)

Expected behavior
All executions running in a sequence, with a pause.

Environment (please complete the following information):

  • ha-tahoma version: master
  • Home Assistant version: -
  • Platform: -

Additional context

Protokolldetails ( ERROR )
Logger: homeassistant.components.automation.rolladen_am_wochenende_feiertag_morgens_offen
Source: custom_components/tahoma/tahoma_device.py:155
Integration: Automatisierung (documentation,issues)
First occurred: 12. September 2020, 8:00:00 (8occurrences)
Last logged: 13. September 2020, 8:37:41

Rolladen am Wochenende/Feiertag Morgens öffen: Error executing script. Unexpected error for scene at pos 1: Server busy, please try again later. (Too many executions)
While executing automation automation.rolladen_am_wochenende_feiertag_morgens_offen
Rolladen am Wochenende/Feiertag Morgens öffen: Error executing script. Unexpected error for scene at pos 1: Execution queue is full on gateway: #xxxx-xxxx-xxxx (soft limit: 10)
Traceback (most recent call last): File “/usr/src/homeassistant/homeassistant/helpers/script.py”, line 191, in _async_step await getattr( File “/usr/src/homeassistant/homeassistant/helpers/script.py”, line 400, in _async_scene_step await self._hass.services.async_call( File “/usr/src/homeassistant/homeassistant/core.py”, line 1308, in async_call task.result() File “/usr/src/homeassistant/homeassistant/core.py”, line 1343, in _execute_service await handler.func(service_call) File “/usr/src/homeassistant/homeassistant/helpers/entity_component.py”, line 208, in handle_service await self.hass.helpers.service.entity_service_call( File “/usr/src/homeassistant/homeassistant/helpers/service.py”, line 454, in entity_service_call future.result() # pop exception if have File “/usr/src/homeassistant/homeassistant/helpers/entity.py”, line 583, in async_request_call await coro File “/usr/src/homeassistant/homeassistant/helpers/service.py”, line 485, in _handle_entity_call await result File “/usr/src/homeassistant/homeassistant/components/homeassistant/scene.py”, line 311, in async_activate await async_reproduce_state( File “/usr/src/homeassistant/homeassistant/helpers/state.py”, line 106, in async_reproduce_state await asyncio.gather( File “/usr/src/homeassistant/homeassistant/helpers/state.py”, line 100, in worker await platform.async_reproduce_states( # type: ignore File “/usr/src/homeassistant/homeassistant/components/cover/reproduce_state.py”, line 125, in async_reproduce_states await asyncio.gather( File “/usr/src/homeassistant/homeassistant/components/cover/reproduce_state.py”, line 86, in _async_reproduce_state await hass.services.async_call( File “/usr/src/homeassistant/homeassistant/core.py”, line 1308, in async_call task.result() File “/usr/src/homeassistant/homeassistant/core.py”, line 1343, in _execute_service await handler.func(service_call) File “/usr/src/homeassistant/homeassistant/helpers/entity_component.py”, line 208, in handle_service await self.hass.helpers.service.entity_service_call( File “/usr/src/homeassistant/homeassistant/helpers/service.py”, line 454, in entity_service_call future.result() # pop exception if have File “/usr/src/homeassistant/homeassistant/helpers/entity.py”, line 583, in async_request_call await coro File “/usr/src/homeassistant/homeassistant/helpers/service.py”, line 485, in _handle_entity_call await result File “/config/custom_components/tahoma/cover.py”, line 173, in async_set_cover_position await self.async_execute_command(command, position) File “/config/custom_components/tahoma/tahoma_device.py”, line 155, in async_execute_command exec_id = await self.coordinator.client.execute_command( File “/usr/local/lib/python3.8/site-packages/backoff/_async.py”, line 133, in retry ret = await target(*args, **kwargs) File “/usr/local/lib/python3.8/site-packages/pyhoma/client.py”, line 185, in execute_command return await self.execute_commands(device_url, [command], label) File “/usr/local/lib/python3.8/site-packages/backoff/_async.py”, line 133, in retry ret = await target(*args, **kwargs) File “/usr/local/lib/python3.8/site-packages/pyhoma/client.py”, line 208, in execute_commands response = await self.__post(“exec/apply”, payload) File “/usr/local/lib/python3.8/site-packages/pyhoma/client.py”, line 240, in __post await self.check_response(response) File “/usr/local/lib/python3.8/site-packages/pyhoma/client.py”, line 278, in check_response raise Exception(message if message else result) Exception: Server busy, please try again later. (Too many executions)
@iMicknl iMicknl added the bug Something isn't working label Sep 14, 2020
@iMicknl
Copy link
Owner Author

iMicknl commented Sep 14, 2020

@tetienne, @vlebourl what do you think:

  1. We could tackle this in https://github.com/iMicknl/python-tahoma-api with backoff, where we add a debounce time of x seconds with a few retries max.
  2. We could add retry logic in this integration based on the number of executions in the list. If the current number is higher than 10, we could delay the call until the list of running executions is lower than 10.

Any other suggestions?

@iMicknl
Copy link
Owner Author

iMicknl commented Oct 5, 2020

I am currently pursuing option 1 via iMicknl/python-overkiz-api#48. This is the easiest to implement, however it has the following limitations:

  • After 120 seconds (10 retries) it will still throw the error. This integration would need to do something with it.
  • If multiple commands for the same device are sent to the API when facing the TooManyExecutions exception, it could happen that they are executed out of sync. When both of them will be retried via backoff, it could happen that the last one is lucky and will be executed first.

@kaczkalolo
Copy link

I have 14 covers. What can i do to close/open them all at the same time? i getting exactly the same error :(

@iMicknl iMicknl reopened this Feb 6, 2021
@iMicknl
Copy link
Owner Author

iMicknl commented Feb 6, 2021

@kaczkalolo do you encounter this problem for a while, or is this a new issue? There was another report yesterday (#378).

For now, I would advice to add a delay in your automations in Home Assistant... This will be way more reliable, since this is an issue on the Somfy TaHoma side.

@kaczkalolo
Copy link

kaczkalolo commented Feb 6, 2021 via email

@mstuij
Copy link

mstuij commented Mar 21, 2021

Hi,

I have the same problem but for me there are no 10 long running operations necessary.

At 06:00 rollers at the front-side of the house are openend. This goes always good without errors and always exactly at 06:00.

At 07:00 my rollers at the side of the house are planned to open. This goes almost every time wrong. I get this error:

2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.6s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.3s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.3s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 1.6s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 3.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:00 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.4s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:01 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.2s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:01 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.4s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:02 INFO (MainThread) [backoff] Backing off execute_command(...) for 11.3s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:02 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.0s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:02 INFO (MainThread) [backoff] Backing off execute_command(...) for 0.2s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:02 INFO (MainThread) [backoff] Backing off execute_command(...) for 14.4s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:04 INFO (MainThread) [backoff] Backing off execute_command(...) for 5.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:10 INFO (MainThread) [backoff] Backing off execute_command(...) for 3.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:13 INFO (MainThread) [backoff] Backing off execute_command(...) for 6.3s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:14 INFO (MainThread) [backoff] Backing off execute_command(...) for 8.8s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:17 INFO (MainThread) [backoff] Backing off execute_command(...) for 27.1s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:19 INFO (MainThread) [backoff] Backing off execute_command(...) for 13.7s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:23 INFO (MainThread) [backoff] Backing off execute_command(...) for 4.1s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:27 INFO (MainThread) [backoff] Backing off execute_command(...) for 80.1s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:30 DEBUG (MainThread) [custom_components.tahoma] Finished fetching device events data in 0.037 seconds 2021-03-17 07:00:33 INFO (MainThread) [backoff] Backing off execute_command(...) for 116.1s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:44 INFO (MainThread) [backoff] Backing off execute_command(...) for 3.2s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:00:47 INFO (MainThread) [backoff] Backing off execute_command(...) for 76.8s (pyhoma.exceptions.TooManyExecutionsException: Server busy, please try again later. (Too many executions)) 2021-03-17 07:01:00 DEBUG (MainThread) [custom_components.tahoma] Finished fetching device events data in 0.111 seconds 2021-03-17 07:01:30 DEBUG (MainThread) [custom_components.tahoma.coordinator] EventName.DEVICE_STATE_CHANGED/None (device: io://0203-9950-4411/11186955, state: None -> None) 2021-03-17 07:01:30 DEBUG (MainThread) [custom_components.tahoma] Finished fetching device events data in 0.132 seconds

The rollers that are opened at 06:00 is done by Node-RED by openening a group:

`cover:

  • platform: template
    covers:
    voorkant_cover_group:
    friendly_name: "Voorkant"
    device_class: shutter
    optimistic: false
    open_cover:
    - service: tahoma.set_cover_position_low_speed
    data:
    entity_id: cover.01_master_bedroom, cover.02_voorraam_klein, cover.03_voorraam_groot
    position: 100
    close_cover:
    - service: tahoma.set_cover_position_low_speed
    data:
    entity_id: cover.01_master_bedroom, cover.02_voorraam_klein, cover.03_voorraam_groot
    position: 0
    stop_cover:
    - service: cover.stop_cover
    data:
    entity_id: cover.01_master_bedroom, cover.02_voorraam_klein, cover.03_voorraam_groot
    set_cover_position:
    - service: tahoma.set_cover_position_low_speed
    data:
    entity_id: cover.01_master_bedroom, cover.02_voorraam_klein, cover.03_voorraam_groot
    position: "{{position}}"
    position_template: "{{ state_attr('cover.01_master_bedroom', 'current_position' ) | int }}"
    icon_template: >-
    {% if is_state('sensor.voorkant_cover_group', 'open') %}
    mdi:window-shutter-open
    {% else %}
    mdi:window-shutter
    {% endif %}
    `

The rollers that are opened at 07:00 (which has the problems) is done by Node-RED by openening a group:

`cover:

  • platform: template
    covers:
    westkant_cover_group:
    friendly_name: "West kant"
    device_class: shutter
    optimistic: false
    open_cover:
    - service: tahoma.set_cover_position_low_speed
    data:
    entity_id: cover.04_computer_kamer, cover.05_hal_1e_etage, cover.06_zijraam
    position: 100
    close_cover:
    - service: tahoma.set_cover_position_low_speed
    data:
    entity_id: cover.04_computer_kamer, cover.05_hal_1e_etage, cover.06_zijraam
    position: 0
    stop_cover:
    - service: cover.stop_cover
    data:
    entity_id: cover.04_computer_kamer, cover.05_hal_1e_etage, cover.06_zijraam
    set_cover_position:
    - service: tahoma.set_cover_position_low_speed
    data:
    entity_id: cover.04_computer_kamer, cover.05_hal_1e_etage, cover.06_zijraam
    position: "{{position}}"
    position_template: "{{ state_attr('cover.04_computer_kamer', 'current_position' ) | int }}"
    icon_template: >-
    {% if is_state('sensor.westkant_cover_group', 'open') %}
    mdi:window-shutter-open
    {% else %}
    mdi:window-shutter
    {% endif %}

`

The rollers are openend a minute or 2 after the scheduled time. And also they are NOT openend at the same time. So you see roller 1 is opening, after that number 2 and then number 3.

@rvrignaud
Copy link

Hi,

I started to see this behavior for 13 shutters a few days back.
I now also have this issue:

Jul 07 10:50:49 pi hass[944]: 2021-07-07 10:50:49 INFO (MainThread) [backoff] Backing off execute_commands(...) for 0.2s (pyhoma.exceptions.NotAuthenticatedException: Not authenticated)
Jul 07 10:50:49 pi hass[944]: 2021-07-07 10:50:49 ERROR (MainThread) [custom_components.tahoma.tahoma_entity] Too many requests, try again later : login with REDACTED@gmail.com

I can't even login to https://www.tahomalink.com/ as I get the same error:

(Too many requests, try again later : login with REDACTED@gmail.com)

in a pop up message box.

I'm using Google Assistant to open close all the shutters at once. Not sure how to introduce delay on my side.
Any known workaround ?

@vlebourl
Copy link
Collaborator

vlebourl commented Jul 7, 2021

Hi, I don't think this error is linked to the first one. This one happens when you try to open too many connections to the API in a short period of time. If I'm not mistaken, HA keeps its connection open to reuse it for successive actions.

Waiting 10 minutes before retrying to connect is the only fix I know for that one.

@rvrignaud
Copy link

Hi @vlebourl. Thank you for clarification and sorry to mixed up issues.
I also have tons of Execution queue is full on gateway: #xxxx-xxxx-xxxx (soft limit: 10) and I now never have all my shutters closing and opening at the same time. Even though this has worked for quite some time before.
Any idea how to improve when not using an automation ?

@laszlojakab
Copy link

Is there a way to have an event if this kind of error happens (like in #573)? This exception catched here:

async def async_execute_command(self, command_name: str, *args: Any):
"""Execute device command in async context."""
try:
exec_id = await self.coordinator.client.execute_command(
self.device.deviceurl,
Command(command_name, list(args)),
"Home Assistant",
)
except Exception as exception: # pylint: disable=broad-except
_LOGGER.error(exception)
return

There is only a log message but can't handle this error from outside (eg: from an automation).
More general question: can we have an event (or events) if a command fails (eg: open cover, close cover, etc...)?

@tetienne
Copy link
Collaborator

@laszlojakab Can you try the master branch? I've added event in case of failure. More information here.

@laszlojakab
Copy link

laszlojakab commented Sep 30, 2021

@tetienne Sorry not exactly the same error as in the subject of the issue. I get this error:

2021-09-30 23:03:10 ERROR (MainThread) [custom_components.tahoma.executor] Execution queue is full on gateway: #XXXX-XXXX-XXXX (soft limit: 10)

No event for this kind of error in master branch. As I mentioned this error logged by line 74 in executor.py

@tetienne
Copy link
Collaborator

tetienne commented Oct 4, 2021

@laszlojakab I was able to reproduce your error spamming a script where I open and close many times a cover.

It seems an event is well triggered when a command is not correctly send:

2021-10-04 15:34:13 DEBUG (MainThread) [custom_components.tahoma.coordinator] Event(timestamp=1633354453054, name=<EventName.EXECUTION_STATE_CHANGED: 'ExecutionStateChangedEvent'>, setupoid=741ca89f-a47b-4ad****-894d-d225c06922b1, owner_key=741ca89f-a47b-4ad****-894d-d225c06922b1, type=1, sub_type=1, time_to_next_state=0, failed_commands=None, failure_type_code=None, failure_type=None, condition_groupoid=None, placeoid=None, label=None, metadata=None, camera_id=None, deleted_raw_devices_count=None, protocol_type=None, gateway_id=None, exec_id='4b83103c-ac10-3e01-18e7-2cb65d75c706', deviceurl=None, device_states=[], old_state=<ExecutionState.INITIALIZED: 'INITIALIZED'>, new_state=<ExecutionState.NOT_TRANSMITTED: 'NOT_TRANSMITTED'>)

Or

2021-10-04 15:34:14 DEBUG (MainThread) [custom_components.tahoma.coordinator] Event(timestamp=1633354453894, name=<EventName.EXECUTION_STATE_CHANGED: 'ExecutionStateChangedEvent'>, setupoid=741ca89f-a47b-4ad****-894d-d225c06922b1, owner_key=741ca89f-a47b-4ad****-894d-d225c06922b1, type=1, sub_type=1, time_to_next_state=-1, failed_commands=[{'deviceurl': 'io://****-*****-*****/5978000', 'rank': 0, 'failure_type': 'CMDCANCELLED'}], failure_type_code=<FailureType.CMDCANCELLED: 106>, failure_type='CMDCANCELLED', condition_groupoid=None, placeoid=None, label=None, metadata=None, camera_id=None, deleted_raw_devices_count=None, protocol_type=None, gateway_id=None, exec_id='4b830acf-ac10-3e01-18e7-2cb6cde8f991', deviceurl=None, device_states=[], old_state=<ExecutionState.IN_PROGRESS: 'IN_PROGRESS'>, new_state=<ExecutionState.FAILED: 'FAILED'>)

You can see these logs if you enabled the debug log.

@iMicknl
Copy link
Owner Author

iMicknl commented Oct 4, 2021

@tetienne perhaps his case will broadcast NOT_TRANSMITTED and we only create an HA event when the ExecutionState is failed?

@tetienne
Copy link
Collaborator

tetienne commented Oct 4, 2021

I’m sure if the events are linked. But, we can forward more events if needed indeed.

@rvrignaud
Copy link

Besides the event generated, is there any intention to have a proper fix to this issue ?
This is really introducing lots of problems for me

@tetienne
Copy link
Collaborator

tetienne commented Oct 4, 2021

@rvrignaud That’s not something we can fix easily, if feasible. It means we will have to handle a kind of queue, or something similar. In your case, why not use scenario created on the tahomalink.com side?

@rvrignaud
Copy link

Because I'm using Google Assistant voice queries that are triggering native devices command (for 13 devices for now, soon much more) and not a specific routine. Like "OKG close all the shutters."
I would prefer as much as possible to divert from native GHA interactions

@iMicknl
Copy link
Owner Author

iMicknl commented Oct 4, 2021

@tetienne I wonder why the backoff is not working. Perhaps we need to retry for a longer time.

I am not a huge fan of building a queue in the Data Update Coordinator...

@tetienne
Copy link
Collaborator

tetienne commented Oct 4, 2021

@iMicknl Probably yes, we can add some delay to check if it helps.

@rvrignaud
Copy link

If you need any help in testing new behavior, I can consistently reproduce this one. Happy to help

@laszlojakab
Copy link

laszlojakab commented Mar 6, 2022

Any updates on that issue? Can we get an event (like the errors in overkiz.event) for this event to do some workaround with automations?

@downdrown
Copy link

I'm having the same issue when trying to control 15 blinds simultaneously when using the Overkiz integration. Does anybody know what would be the required delay between the calls for Overkiz to accept the requests? (I mean if i split the calls in 10 / 5) calls for example.

@cesc86
Copy link

cesc86 commented Apr 17, 2022

Same story here trying to control 14 Somfy blinds at once with Overkiz. All I could do was split them in 7 and 7...

@laszlojakab
Copy link

Hi again. The problem still exists. I know the repo is deprecated but I don't want to open a new issue in core for the same error. As I see overkiz.event is not migrated to core so I can't do any workaround in automation for the soft limit error. There is a continue_on_error flag for scripts. It would be nice if we could "catch" the soft limit error with this. With this ability after moving the cover I could check if it is in the expected position or if it is moving. If none of them is true then there was an error so I should set the cover position again. The continue_on_error works only with exceptions.HomeAssistantError type. See here: https://github.com/home-assistant/core/blob/ba8a530d19dc3e93602275961de46d700611f4a1/homeassistant/helpers/script.py#LL493C24-L493C24

Could you please modify the core version of the integartion to raise HomeAssistantError in case of soft limit error?

@iMicknl
Copy link
Owner Author

iMicknl commented May 30, 2023

@laszlojakab would be good to raise this on core repo! Happy to see if we can implement it.

@laszlojakab
Copy link

Added to core.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working contributions welcome help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

9 participants