Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update EbeanLocalAccess batchGetUnion SQL query from "UNION ALL" to "IN" #386

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

hongliu2024
Copy link
Contributor

@hongliu2024 hongliu2024 commented Jun 10, 2024

Summary

Based on previous TMS issues and slow query analysis, we identified the UNION ALL type of queries are taking a long time, exhausts DB resources, and can cause slow response times. One sample query is:

SELECT urn, a_clusterdatasetsla, lastmodifiedon, lastmodifiedby FROM metadata_entity_dataset WHERE urn = 'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.RtfActivityEvent,CORP)' AND JSON_EXTRACT(a_clusterdatasetsla, '$.gma_deleted') IS NULL 
UNION ALL 
SELECT urn, a_clusterdatasetsla, lastmodifiedon, lastmodifiedby FROM metadata_entity_dataset WHERE urn = 'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.rtf-store-war-service_call,CORP)' AND JSON_EXTRACT(a_clusterdatasetsla, '$.gma_deleted') IS NULL 
UNION ALL 
SELECT urn, a_clusterdatasetsla, lastmodifiedon, lastmodifiedby FROM metadata_entity_dataset WHERE urn = 'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.redliner-log_event,CORP)' AND JSON_EXTRACT(a_clusterdatasetsla, '$.gma_deleted') IS NULL 
UNION ALL …

This PR is to:

  1. Convert those long UNION ALL logic into using IN, such as:
SELECT urn, a_clusterdatasetsla, lastmodifiedon, lastmodifiedby 
FROM metadata_entity_dataset 
WHERE JSON_EXTRACT(a_clusterdatasetsla, '$.gma_deleted') IS NULL 
AND urn IN (
'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.RtfActivityEvent,CORP)',
'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.rtf-store-war-service_call,CORP)',
'urn:li:dataset:(urn:li:dataPlatform:kafka,METRICS.redliner-log_event,CORP)',
...
)
  1. Revert the previous changes in PR Added querycount to getWithExtraInfo #367 which was to divide UNION ALL queries into small batches. When the change was released, it caused downtime instead because all smaller queries were executed at the same time and caused memory exhaustion. After the issue, the TMS side code was reverted, but the datahub-gma changes wasn't.

Testing Done

Details testings are documented into this document. I'm briefly listing the testing and results here. Please refer to the doc for some analysis and detailed test steps.

  1. Directly run UNION ALL and IN queries on MySQL DB (metagalaxy_stg) to compare execution time.
    Result
Test Time
UNION ALL over 5k urns 2.630s
IN over 5k urns 0.700s
UNION ALL over 10k urns 6.561s
IN over 10k urns 0.880s
  1. Local Testing
  2. EI Testing
  3. DC Testing

Checklist

@codecov-commenter
Copy link

codecov-commenter commented Jun 10, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 67.75%. Comparing base (c79500a) to head (005e5f2).
Report is 70 commits behind head on master.

Files with missing lines Patch % Lines
...linkedin/metadata/dao/utils/SQLStatementUtils.java 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #386      +/-   ##
============================================
+ Coverage     67.74%   67.75%   +0.01%     
- Complexity     1343     1374      +31     
============================================
  Files           132      136       +4     
  Lines          5276     5409     +133     
  Branches        546      563      +17     
============================================
+ Hits           3574     3665      +91     
- Misses         1484     1515      +31     
- Partials        218      229      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hongliu2024 hongliu2024 marked this pull request as ready for review June 28, 2024 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants