Use Case - Streaming Rules Engine in SQL #5

Open
kurtmaile opened this issue May 12, 2023 · 0 comments

Hi Timo!

I loved your recent talk at Data Council 2023, thanks so much. I have listened to it twice, as it is very relevant to the solution we are embarking on with a new client in a challenging data environment.

My CTO is actually speaking with Confluent at Kafka Summit London next Tuesday about this (Immerok was/is being acquired, right?). Anyway... there is an interesting use case below that I need to architect out, and I thought I'd reach out for any thoughts / guidance from someone so knowledgeable in the industry.

So... we are using Flink actively, but a new use case is challenging! It is in the Betting & Gaming industry, around protecting players from depositing too much / beyond their means, where breaches have regulatory consequences.

Betting and gaming is a highly real-time / event-driven environment, with strict and hard-to-achieve compliance requirements for player safety. This use case is about ensuring deposit limits are satisfied directly with the customer, but also detecting anomalies where a customer has managed to get around them.

e.g.:

Stream 1: Customer/Player deposit limit setting change events, e.g. daily limit = 10, weekly = 50, monthly = 150. I should not be allowed to deposit more than these amounts in these periods.

These are slowly changing compared to the payment transactions, and are set for 'periods of time' (daily, weekly, monthly; more timeframes could be added), but we typically only look at deposit transactions as far back as 6 months. These are hard-and-fast limits when applied against the actual deposit transactions made (a rough sketch of this stream as a table is below)... which brings me to Stream 2.
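Purely to make the later requirements concrete, here is roughly how I picture this stream as a Flink SQL table - the name, fields, and connector options are all my assumptions, nothing agreed yet:

CREATE TABLE deposit_limits (
    customer_id   STRING,
    daily_limit   DECIMAL(18, 2),
    weekly_limit  DECIMAL(18, 2),
    monthly_limit DECIMAL(18, 2),
    change_time   TIMESTAMP(3),
    -- watermark so the table can act as a versioned table in event-time joins
    WATERMARK FOR change_time AS change_time - INTERVAL '5' SECOND,
    PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
    'connector' = 'upsert-kafka',
    'topic' = 'deposit-limit-changes',
    'properties.bootstrap.servers' = 'localhost:9092',
    'key.format' = 'json',
    'value.format' = 'json'
);

Modelling it as upsert-kafka keyed by customer keeps the latest limit settings per customer, while event-time (temporal) joins can still see the settings that were in force at a given transaction time.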

Stream 2: Deposit (and withdrawal, but ignore those for now re: net deposits) payment transactions.

This is a much higher-paced stream in B&G. The front end 'should' calculate and catch any deposit requests that breach, but there are plenty of ways to subvert the system due to the typical delay in the detection computations.
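And a similar assumed sketch for this stream (again, every name here is my invention):

CREATE TABLE deposits (
    txn_id      STRING,
    customer_id STRING,
    amount      DECIMAL(18, 2),
    txn_state   STRING,        -- e.g. 'PROCESSED', 'SUCCESSFUL'
    txn_time    TIMESTAMP(3),
    WATERMARK FOR txn_time AS txn_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'deposit-transactions',
    'properties.bootstrap.servers' = 'localhost:9092',
    'scan.startup.mode' = 'latest-offset',
    'format' = 'json'
);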

Req 1: Async breach/anomaly detection over a stream of events (stream-stream join):

When a given deposit transaction event is received in a particular state (e.g. processed, successful), historic aggregates need to be calculated back in time for that customer (e.g. 24 hours sliding, 7 days sliding, 30 days sliding) to check whether any of those bucketed transactions breach the specified limit.

As mentioned, we expect the online/sync process to stop most of this happening, but bad actors with gambling issues can subvert the system and deposit more than they should. We would want to raise such events to Kafka (a rough SQL sketch of this is below).
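Here is one hedged sketch of the detection, assuming the two tables above: an event-time temporal join picks up the limits in force at transaction time, and an event-time OVER window gives the sliding aggregate. As far as I can tell, Flink currently requires all OVER aggregates in a single SELECT to share one window definition, so each timeframe (24h / 7d / 30d) would be its own statement - only the daily one is shown, and breach_events is an assumed Kafka-backed sink table. Treat it as a starting point, not a tested pipeline:

CREATE TEMPORARY VIEW deposits_with_limits AS
SELECT d.txn_id, d.customer_id, d.amount, d.txn_time, l.daily_limit
FROM deposits d
JOIN deposit_limits FOR SYSTEM_TIME AS OF d.txn_time AS l
    ON d.customer_id = l.customer_id
WHERE d.txn_state IN ('PROCESSED', 'SUCCESSFUL');

INSERT INTO breach_events
SELECT txn_id, customer_id, txn_time, deposits_24h, daily_limit
FROM (
    SELECT txn_id, customer_id, txn_time, daily_limit,
           -- sliding 24-hour total of deposits, evaluated at every deposit event
           SUM(amount) OVER (
               PARTITION BY customer_id
               ORDER BY txn_time
               RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW
           ) AS deposits_24h
    FROM deposits_with_limits
)
WHERE deposits_24h > daily_limit;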

This brings me onto key Req 2:

Req 2: Sync on-demand calculation when the customer requests it

Imagine the customer can not only set the limits in the UI but also, at deposit time, see their real-time (sync) position on demand before they execute a deposit: how much CAN they deposit, derived by calculating the 'left over' deposit amount from the limit settings and historic deposits / withdrawals, per time period, as per the logic above.

So the logic is similar to the above, but as it's 'on demand' it is triggered by an API request, not by the arrival of the actual deposit event itself (after the fact). It is almost batch-like, being on the demand of an end user, but all the logic should remain similar? One pattern I'm considering is sketched below.
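The pattern (my suggestion, not an established design): keep the per-customer 'headroom' continuously materialized into an upsert topic or key-value store, and have the deposit API do a point lookup rather than running the aggregation at request time. Reusing the view from Req 1, with daily_headroom as an assumed upsert sink keyed by customer_id:

INSERT INTO daily_headroom
SELECT customer_id,
       daily_limit,
       deposits_24h,
       -- 'left over' amount the customer can still deposit today;
       -- negative means they are already over the limit
       daily_limit - deposits_24h AS remaining_today
FROM (
    SELECT customer_id, daily_limit,
           SUM(amount) OVER (
               PARTITION BY customer_id
               ORDER BY txn_time
               RANGE BETWEEN INTERVAL '24' HOUR PRECEDING AND CURRENT ROW
           ) AS deposits_24h
    FROM deposits_with_limits
);

One caveat with this sketch: the headroom only refreshes when a new deposit event arrives, not as old deposits age out of the 24-hour window, so the API could see a value that is stale (in the customer's favour) until the next event.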

Req 3: How can we generalise a set of pipeline processing logic 'per rule', where a related set of rules may need to fire, followed by a series of actions? Almost a generalised form of the above.

I'm thinking of an old-school 'BRMS', but over streams!

In some instances CEP seemed like it could be used, but I read it didn't scale so well (imagine 500k customers depositing and changing limits, potentially concurrently, at peak time) - not sure if that is a true statement though! In some instances CEP would be overkill anyway.

There seems to be a generic pattern of:

Rule Input Feature Set (scalar vs aggregated) -> Rule Computation & Result (bool + extra derived measures) -> Rule Actions

Some rule 'input features' are simple scalar ones found on the entity and easily mapped (the deposit limit measure); others may need to be derived via an aggregate computation (SUM(deposits) over a timeframe). In this example:

Rule 1: Check Daily Limits Availability
if SUM(deposits) - SUM(withdrawals) OVER 24 HOUR window >= depositLimit.Daily then "NotAvailableToDeposit"

Rule 2: Check Weekly Limits Availability
if SUM(deposits) - SUM(withdrawals) OVER 7 DAY window >= depositLimit.Weekly then "NotAvailableToDeposit"

Rule 3: Check Monthly Limits Availability
if SUM(deposits) - SUM(withdrawals) OVER 30 DAY window >= depositLimit.Monthly then "NotAvailableToDeposit"

(note: the above rules could be reused independently or across composite rules)

Rule 4: Check if any of the above are available

Rule 5: Check the next available deposit date and amount

...these are the types of computations needed (a toy generalisation sketch is below).
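One hedged way to get the 'BRMS over streams' feel in SQL is to treat the rules themselves as data: pre-compute a generic feature row per (customer, timeframe) and join it against a versioned rule/limit stream, so adding a timeframe or limit type becomes a new rule row rather than a new pipeline. A toy sketch with all names invented - feature_rows and limit_rules would be built from the streams above:

-- feature_rows(customer_id, timeframe, net_deposits, event_time)
--   one row per customer and timeframe ('DAILY' | 'WEEKLY' | 'MONTHLY'),
--   with the sliding aggregate already computed upstream
-- limit_rules(customer_id, timeframe, limit_amount, rule_id)
--   versioned rule/limit settings per customer and timeframe
SELECT f.customer_id,
       f.timeframe,
       r.rule_id,
       'NotAvailableToDeposit' AS rule_result
FROM feature_rows f
JOIN limit_rules FOR SYSTEM_TIME AS OF f.event_time AS r
    ON f.customer_id = r.customer_id
   AND f.timeframe   = r.timeframe
WHERE f.net_deposits >= r.limit_amount;

Composite rules like Rule 4 and Rule 5 could then be a second stage over these per-rule results, e.g. grouping by customer and checking whether any timeframe fired.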

Req 4: Ideally it's declarative in SQL!

So in summary, interestingly, there is a version of this which needs to be on-demand, responding to a 'depositRequest' from an end user and returning the overall state; the other executes almost all of the same logic over the stream to check if a customer managed to subvert the system somehow and actually deposit despite the rules saying they shouldn't - and raises an event.

I am in the process of architecting this out in Flink and would LOVE to hear your initial thoughts on it, and I'm happy to have a deeper-dive session. Not sure if you'd have any time before next Tuesday for some thoughts, since we will be speaking to the Confluent team there. They have said they would put us in touch with some Flink product people at the conference.

Regardless, I love your presso and its level of detail, and with you being at Immerok too, well, there could be great synergies in the future!

Thanks heaps. If I don't hear from you, no problem - I know you'd be busy :)

Cheers
Kurt

Req: To build SQL-based pipelines in Flink
