[JBPM-10197] Improving performance of getTimerByName #2369
This is not really a good approach IMO: it is likely to cause more problems than it solves, and it is completely unnecessary. The performance problems were fixed by storing the EJB timer info in a table. That approach is standard and supported by Java EE environments, and it also supports clustering properly.
See my comment above.
I agree, except on the necessity (if it were not necessary, I wouldn't be coding this ;)). The reason we are proposing this hack is that the EJBTimerMapping solution is far from bulletproof: there are users seeing a lot of EJBTimerMapping entries pointing to a null external timer. In that situation, the system collapses performing the serial search when timers are registered at startup (since there is an exception processing the EJBTimerMapping, the system performs the linear search for every entry with a null id). Any other solution that avoids that search (see the other closed PRs for this JIRA) could lead to duplicate timers, so breaking encapsulation (which we can hide anyway by using a service loader) seems one possibility; the other option is manually rebuilding the EJBTimerMappingInfo cache offline. Anyway, I'm not in favour of committing this. It is just a branch in case the same situation with EJBTimerMappingInfo arises again and we need a specific patch to address it.
We are aware of the problem, but the solution is not to follow up with a hack; it is to properly fix the null entries, which are just an effect of another problem. If that happens, it means something is wrong (the root cause is not clear here). Going down this path is not a solution for other app servers, nor even a reliable one on EAP, since the EJB timer subsystem can be configured in ways that make it not work (for example, using a different DB or another persistence mechanism). So this solution will only work in very specific cases. @martinweiler @porcelli ^^ be aware of this, as this solution should not be merged.
I agree with @elguardian that introducing a solution specific to only a subset of the supported databases does not meet the supportability requirements of the downstream product, and is thus problematic to merge even upstream.
Solution is already provided: https://issues.redhat.com/browse/EAPSUP-1336 Keeping an index of timers that can be lost is a code smell. That wouldn't be a problem if traversing all timers were not so slow, but it is: because the info column is not cached, for N active timers JBPM needs to run N select queries (really slow) just to read from the info column the index JBPM already knows. That issue proposes being able to search by that known index (the usual concept of an external id). Since EJB timers are just a service that provides timers to external apps, the service should manage the real index the external app uses for its timers. This hack is just evidence that TimerMappingInfo should be removed and that an externalId should be added to the EJB timers spec. The goal here is not to avoid getTimersByName (basically what TimerMappingInfo does) but to point out that we cannot have a method that slows down JBPM because we have millions of timers. I tried to convince the EJB timers engineers about this obvious change to the spec (the hack just proves it), but there has been no progress on it.
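To illustrate the cost argument above, here is a minimal, self-contained sketch. It is not jBPM or WildFly code: `FakeTimer`, `findByScan` and `findByIndex` are hypothetical names, and the load counter merely stands in for the per-timer select that reads the info column. It only shows why a scan over N timers costs N info loads while a lookup by a known external id costs none.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class TimerLookupCost {

    /** Hypothetical stand-in for a container timer whose info is loaded on every access. */
    public static final class FakeTimer {
        private final String jobName;
        private final int[] loadCounter; // shared counter, simulates one select per info read

        public FakeTimer(String jobName, int[] loadCounter) {
            this.jobName = jobName;
            this.loadCounter = loadCounter;
        }

        public String getInfo() {
            loadCounter[0]++; // each call models one uncached query against the info column
            return jobName;
        }
    }

    /** Linear search: reads the info of every timer until the job name matches. */
    public static Optional<FakeTimer> findByScan(List<FakeTimer> timers, String jobName) {
        for (FakeTimer timer : timers) {
            if (jobName.equals(timer.getInfo())) {
                return Optional.of(timer);
            }
        }
        return Optional.empty();
    }

    /** Lookup by external id: no timer info has to be read to locate the timer. */
    public static Optional<FakeTimer> findByIndex(Map<String, FakeTimer> index, String jobName) {
        return Optional.ofNullable(index.get(jobName));
    }
}
```

With 1000 timers and the wanted job stored last, `findByScan` triggers 1000 simulated loads, while `findByIndex` triggers none.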
@elguardian @martinweiler @porcelli We are not planning to merge this. As I already mentioned (twice), it is a hack that can be used as a specific patch for specific users, nothing more than that.
SonarCloud Quality Gate failed. 0 Bugs, 43.0% Coverage.
@martinweiler @elguardian |
@fjtirado my concern about changing from BMT to CMT was related to transaction demarcation. Nothing in my observations was about inheriting anything (I am talking about tx demarcation); I never said that. I would suggest staying on topic instead of answering something that no one was questioning. The main problem with CMT is that you lose control of the retry mechanism, and WildFly does have a strategy in place to deal with problems (two policies, actually): one coming from the spec (immediate automatic retry) and the other coming from the app server itself (several retries after some time). That functionality is lost or broken at this point. Please read my observation again (#2362 (comment)); it is the first sentence. As for TimerMappingInfo, it is a solution compliant with the EJB spec that respects the design of EJB. It is a way to exchange information among cluster members, so it is not a cache.
@elguardian Switching to BMT suspends the transaction that schedules the timer. That causes the timer to be scheduled before the transaction that schedules it is committed (if the timer delay is short enough). The broken WildFly retry mechanism can be avoided (using CMT or BMT) by not propagating any exception back to the container (so the container thinks the timer execution has completed successfully).
@fjtirado The tx related to the timer is the one in flight when the timer times out and is consumed, so I'm not sure which suspended tx you are referring to. You can look into the code yourself. Regarding the @Timeout method invocation: the callout will invoke interceptors, including the tx one if there is any. If the bean is not BMT it won't invoke any tx, so BMT won't suspend any transaction in a timeout call, if that is what you are referring to. In any case, the timer service can spawn its own tx. If the timeout fails and you don't propagate the error, you will lose the timer for sure.
@elguardian I'm not referring to the timeout method. I'm referring to this one: https://github.com/kiegroup/jbpm/blob/main/jbpm-services/jbpm-services-ejb/jbpm-services-ejb-timer/src/main/java/org/jbpm/services/ejb/timer/EJBTimerScheduler.java#L202
JIRA:
link
A huge hack, because it makes certain assumptions that would not be needed if the TimerService API were functional for the usage we are giving it (which is an even bigger hack in my opinion ;)), but maybe worthwhile for users having performance issues with getTimerByName().
If the Timer implementation is WildFly and it is using a DB as its persistence mechanism, and if the DB is Oracle or PostgreSQL, we can actually try to reduce the number of timers to be checked in getTimerByName using a bit of reflection and a couple of native queries. The hacky part is isolated in a single class.
The Wildfly hacky assumptions are:
persistence
If assumptions 1), 2) and 3) are changed by a future WildFly release, a linear search will be done (so we are covered in that regard). 4) and 5) are trickier, since the exception will get propagated.
DISCLAIMER
I did not want to do this, but I consider this approach better than potentially creating duplicates (see the closed PRs over the same JIRA) or performing a linear search over thousands of timers, now that we know how WildFly works. If the application server is not WildFly, or the persistence is not on a DB, or WildFly changes the class name, we will revert to linear search. So it is just a hack that boosts performance for a certain WildFly setup and a couple of DBs; it does not imply losing any existing functionality if the setup is not matched.
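The fallback behaviour described above can be sketched roughly as follows. This is not the code in the PR: `findTimer`, the `narrower` function and `UnsupportedSetupException` are hypothetical names, and timers are reduced to plain strings. The assumption is that the environment-specific fast path either returns a reduced candidate list or signals that the setup is not matched, in which case the full list is searched; any other exception propagates to the caller.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class FallbackTimerLookup {

    /** Hypothetical signal that the fast path does not apply to the current setup. */
    public static class UnsupportedSetupException extends RuntimeException {
        public UnsupportedSetupException(String message) {
            super(message);
        }
    }

    /**
     * Tries the environment-specific narrower first. If it reports an unsupported
     * setup, falls back to scanning every timer. Other exceptions are not caught.
     */
    public static Optional<String> findTimer(String jobName,
                                             List<String> allTimers,
                                             Function<String, List<String>> narrower) {
        List<String> candidates;
        try {
            candidates = narrower.apply(jobName); // fast path: reduced candidate set
        } catch (UnsupportedSetupException e) {
            candidates = allTimers;               // setup not matched: linear search over all timers
        }
        for (String timer : candidates) {
            if (timer.equals(jobName)) {
                return Optional.of(timer);
            }
        }
        return Optional.empty();
    }
}
```

Catching only the dedicated exception keeps the two cases apart: an unmatched setup degrades to the existing linear search, while an unexpected failure inside the fast path still surfaces.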
PS
If we are OK with the possibility of assuming WildFly, my idea, before moving to "ready for review" and once this has been tested with an existing problematic setup, is to switch to the ServiceLoader mechanism and try to add as many DBs as possible, so that the classes related to WildFly are not in the EJB timer module but in a separate one that might be added only for a WildFly setup (this will allow including the WildFly dependencies, using instanceof and even calling the deSerialize method directly). Before that, I want to verify that this approach really boosts performance.
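As a rough sketch of the ServiceLoader idea, assuming a hypothetical SPI named `TimerCandidateProvider` (nothing with that name exists in jBPM): the EJB timer module would only know the interface, and a separate WildFly-specific module would register an implementation under `META-INF/services`. If no provider is on the classpath, or none supports the current setup, all timers are returned and the caller performs the usual linear search.

```java
import java.util.List;
import java.util.ServiceLoader;

public class TimerLookupProviders {

    /** Hypothetical SPI that an optional, server-specific module could implement. */
    public interface TimerCandidateProvider {
        /** True if this provider can handle the running server and database. */
        boolean supportsCurrentSetup();

        /** Returns a reduced set of timers worth checking for the given job name. */
        List<String> candidates(String jobName);
    }

    /** Discovers providers through java.util.ServiceLoader. */
    public static List<String> candidatesFor(String jobName, List<String> allTimers) {
        return candidatesFor(jobName, allTimers, ServiceLoader.load(TimerCandidateProvider.class));
    }

    /** Same logic with the providers passed in, which also makes it easy to exercise. */
    static List<String> candidatesFor(String jobName,
                                      List<String> allTimers,
                                      Iterable<TimerCandidateProvider> providers) {
        for (TimerCandidateProvider provider : providers) {
            if (provider.supportsCurrentSetup()) {
                return provider.candidates(jobName);
            }
        }
        return allTimers; // no suitable provider: caller searches every timer
    }
}
```

Keeping the server-specific classes behind such an interface is what would let the separate module depend on WildFly directly instead of relying on reflection.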