Backup restore patch #15367

Closed
wants to merge 77 commits into from

Commits (77)
fb9d6dd
Revert "12142 Task 4 - Add support for adding entities as part of Dat…
pmbrull Aug 23, 2023
b21411b
Revert "12142.task3 Add support for inheriting domain from parent to …
pmbrull Aug 23, 2023
007a368
Align Revert "12142.task3 Add support for inheriting domain from pare…
pmbrull Aug 23, 2023
c30cabf
Revert "12142.task2 - Add domain and data product properties to entit…
pmbrull Aug 23, 2023
7e2ec57
Align Revert "12142.task2 - Add domain and data product properties to…
pmbrull Aug 23, 2023
ffcddb8
Revert "12142.task1 - Add support for Domains (#12143)"
pmbrull Aug 23, 2023
3c24254
Align Revert "12142.task1 - Add support for Domains (#12143)"
pmbrull Aug 23, 2023
d71ea54
Remove migrations for next releases
pmbrull Aug 23, 2023
33f1fdc
Format and allow manual dispatch
pmbrull Aug 23, 2023
d8dd5b9
Run maven CI on workflow dispatch
pmbrull Aug 23, 2023
ba47bfc
Remove Data Product methods
pmbrull Aug 23, 2023
15bada3
Revert "Fix #12779: Add support for SearchIndexes for ElasticSearch a…
pmbrull Aug 23, 2023
8980417
Remove Data Product from migrations
pmbrull Aug 23, 2023
94b4a74
Fix revert issues
pmbrull Aug 23, 2023
0b723b5
Fix revert issues
pmbrull Aug 23, 2023
96040d4
Remove dangling search schemas
pmbrull Aug 23, 2023
f25a165
- LowerCase the UserNames and Email in user_entity (#12942)
mohityadav766 Aug 23, 2023
18f2998
Revert Fixes #10740 Inherited ownership must not overwrite manually e…
mohityadav766 Aug 23, 2023
9d72594
Revert Fixes 11287 Task - Add retention period and customer propertie…
mohityadav766 Aug 23, 2023
495f72d
Add verbosity for validation
pmbrull Aug 23, 2023
4219a4e
Merge remote-tracking branch 'upstream/1.1.2' into 1.1.2
pmbrull Aug 23, 2023
cc4535e
doc(ui): whats new for 1.1.2 (#12962)
Sachin-chaurasiya Aug 23, 2023
02d6b8b
NameHash is unavailable in 1.1.0
mohityadav766 Aug 23, 2023
4e87a6a
[WIP] - Add testSuite query param for listing Ingestion Pipelines (#1…
pmbrull Aug 23, 2023
0e66ae3
Clean Search Service
pmbrull Aug 24, 2023
1f4e6cc
[Feat] Version-bump: Prepare 1.1.2-release (#12985)
dhruvinmaniar123 Aug 24, 2023
99a364b
ui: updated schedule interval to send undefined when none is selected …
ShaileshParmar11 Aug 24, 2023
a792c40
fix(chore): Remove OPENMETADATA_DEBUG Param (#12997)
akash-jain-10 Aug 24, 2023
00d6acc
Name should be quoted before hashing
mohityadav766 Aug 24, 2023
9cb1dd9
fix(release): Version bump to 1.1.3 (#13012)
akash-jain-10 Aug 29, 2023
9176aa5
fix(CI): Workflow for Server Docker Release
akash-jain-10 Aug 29, 2023
d93e031
fix(CI): Dockerfiles version
akash-jain-10 Aug 29, 2023
8f4791c
fix(ui): ingestion deploy failed on 1.1.3 (#13036)
chirag-madlani Aug 30, 2023
6e91ad7
Fix test suite migrations for 1.1.x (#13037)
harshach Aug 30, 2023
c0c135d
Bump version to 1.1.4
pmbrull Aug 30, 2023
51bf175
Fix test suite migrations to 1.1.4 (#13042)
harshach Aug 31, 2023
e8c63a5
rmv fqn field tableau (#13064)
OnkarVO7 Sep 2, 2023
b9770de
Issue 8930 - Update profiler timestamp from seconds to milliseconds (…
TeddyCr Aug 25, 2023
1a1a5ff
Issue-12914 -- Split profiler, data quality and data insight table fr…
TeddyCr Aug 28, 2023
523e957
fix: removed single relationship test between testCase and testSuite …
TeddyCr Aug 31, 2023
5f7bdb0
Issue 12297 bis -- Delete Insert logic in the DI workflow (#13058)
TeddyCr Sep 1, 2023
c3d3aea
fix: summary for logical test suite (#13005)
TeddyCr Sep 1, 2023
2199a2a
fix: flaky report data tests (#13066)
TeddyCr Sep 4, 2023
0b9ced7
Issue 13080 - Added logic to update Summary on test case deletion (#1…
TeddyCr Sep 6, 2023
3495f29
fix: added upsert logic back for system metrics (#13092)
TeddyCr Sep 6, 2023
bde23ac
Merge remote-tracking branch 'upstream/1.1.5' into 1.1.5
TeddyCr Sep 6, 2023
2ba5eaa
fix: alerting for entityFQN filter and testResult filter for test case…
TeddyCr Sep 7, 2023
4544a1a
fix(#12736): OIDC Pulling Keyset multiple times per minute. (#13118)
Sachin-chaurasiya Sep 8, 2023
26eeb25
ui: updated testCaseResult api from put to patch (#13117)
ShaileshParmar11 Sep 8, 2023
2a1efb2
chore(ui): fix signup page styling (#13086)
Ashish8689 Sep 6, 2023
5b12bb7
Fix #13001: Fix query not populating to all tables (#13004)
ulixius9 Sep 5, 2023
fae4a3d
Fix usage count issue (#13109)
ulixius9 Sep 8, 2023
102c9a3
Add used by field in query entity (#13096)
ulixius9 Sep 11, 2023
b4870b2
fix conflict issue
ulixius9 Sep 12, 2023
d210d28
Minor fix: Do not re-run migration for properly formed native test su…
harshach Sep 12, 2023
2d14660
Fix hive metastore testconnection (#13157)
ulixius9 Sep 12, 2023
d2197b3
Upgrade: airflow base image from 2.5.3 to 2.6.3 (#13151)
Anuj359 Sep 12, 2023
fb427e5
Creating make script for code freeze automation (#12976)
Anuj359 Sep 13, 2023
e8eb957
chore(release): Version bump to 1.1.5
akash-jain-10 Sep 13, 2023
bf87def
fix: set engine for each project ID (#13176)
TeddyCr Sep 13, 2023
f6085a5
Merge remote-tracking branch 'upstream/1.1.5' into 1.1.5
TeddyCr Sep 13, 2023
b984fbb
fix: implement percentile computation logic for SingleStore (#13170)
TeddyCr Sep 13, 2023
b9ab88c
Use Collate SQLLineage Package for lineage (#13173)
ulixius9 Sep 13, 2023
7f236cb
Bigquery: Add Table Level Tags, fix dataset issue (#13098)
ayush-shah Sep 13, 2023
110f871
fix: move testSuite summary state update to preDelete (#13180)
TeddyCr Sep 14, 2023
2f239a4
Fix Bigquery Typo and Import Error
ayush-shah Sep 14, 2023
7707c76
fix: updated method signature to match 1.1.5 parent
TeddyCr Sep 14, 2023
229ce1a
fix: use entityLink FQN vs entityFQN when checking against entity FQN…
TeddyCr Sep 14, 2023
c974f19
updated dq timestamp from seconds to milliseconds
ShaileshParmar11 Sep 14, 2023
7476f81
Merge branch '1.1.5' of https://github.com/open-metadata/OpenMetadata…
ShaileshParmar11 Sep 14, 2023
29973a1
fix: add test case result extension for ts migration (#13195)
TeddyCr Sep 14, 2023
c7f5023
check importlib setup (#13200)
ayush-shah Sep 15, 2023
84a09cc
only add collation to hash columns (#13201)
harshach Sep 15, 2023
92d51c4
fix(#13204): after changing the team type action buttons are not disa…
Sachin-chaurasiya Sep 15, 2023
3ac8d77
[FIX] Dockerfile label version to 1.1.5 (#13216)
dhruvinmaniar123 Sep 15, 2023
811c773
fixing dockerfile arg variable (#13217)
dhruvinmaniar123 Sep 15, 2023
077a194
add profiler_data_time_series TABLES_DUMP_ALL
sushi30 Feb 27, 2024
1 change: 1 addition & 0 deletions .github/workflows/cypress-integration-tests-mysql.yml
@@ -15,6 +15,7 @@
name: MySQL Cypress Integration Tests

on:
workflow_dispatch:
push:
branches:
- main
@@ -15,6 +15,7 @@
name: PostgreSQL Cypress Integration Tests

on:
workflow_dispatch:
push:
branches:
- main
2 changes: 1 addition & 1 deletion .github/workflows/docker-openmetadata-db.yml
@@ -31,7 +31,7 @@ jobs:
steps:
- name: Check trigger type
if: ${{ env.input == '' }}
run: echo "input=1.2.0" >> $GITHUB_ENV
run: echo "input=1.1.5" >> $GITHUB_ENV

- name: Check out the Repo
uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/docker-openmetadata-ingestion-base.yml
@@ -31,7 +31,7 @@ jobs:
steps:
- name: Check trigger type
if: ${{ env.input == '' }}
run: echo "input=1.2.0" >> $GITHUB_ENV
run: echo "input=1.1.5" >> $GITHUB_ENV

- name: Check out the Repo
uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/docker-openmetadata-ingestion.yml
@@ -31,7 +31,7 @@ jobs:
steps:
- name: Check trigger type
if: ${{ env.input == '' }}
run: echo "input=1.2.0" >> $GITHUB_ENV
run: echo "input=1.1.5" >> $GITHUB_ENV

- name: Check out the Repo
uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/docker-openmetadata-postgres.yml
@@ -31,7 +31,7 @@ jobs:
steps:
- name: Check trigger type
if: ${{ env.input == '' }}
run: echo "input=1.2.0" >> $GITHUB_ENV
run: echo "input=1.1.5" >> $GITHUB_ENV

- name: Check out the Repo
uses: actions/checkout@v3
4 changes: 2 additions & 2 deletions .github/workflows/docker-openmetadata-server.yml
@@ -63,7 +63,7 @@ jobs:
steps:
- name: Check trigger type
id: check_trigger
run: echo "DOCKER_RELEASE_TAG=1.2.0" >> $GITHUB_OUTPUT
run: echo "DOCKER_RELEASE_TAG=1.1.5" >> $GITHUB_OUTPUT

- name: Download application from Artifact
uses: actions/download-artifact@v2
@@ -128,7 +128,7 @@ jobs:
- name: Check trigger type
id: check_trigger
if: ${{ env.DOCKER_RELEASE_TAG == '' }}
run: echo "DOCKER_RELEASE_TAG=1.2.0" >> $GITHUB_ENV
run: echo "DOCKER_RELEASE_TAG=1.1.5" >> $GITHUB_ENV

- name: Check out the Repo
uses: actions/checkout@v3
3 changes: 2 additions & 1 deletion .github/workflows/maven-build.yml
@@ -12,6 +12,7 @@
name: Maven MySQL Tests CI

on:
workflow_dispatch:
push:
branches:
- main
@@ -116,7 +117,7 @@ jobs:
- name: Build with Maven
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
if: ${{ github.event_name == 'push' }}
if: ${{ github.event_name == 'push' || github.event_name == 'workflow_dispatch' }}
run: mvn -Dsonar.login=${{ secrets.SONAR_TOKEN }} clean test

- name: Clean Up
2 changes: 1 addition & 1 deletion .github/workflows/openmetadata-airflow-apis.yml
@@ -34,6 +34,6 @@ jobs:
run: |
make install_dev install_apis
cd openmetadata-airflow-apis; \
python setup.py install sdist bdist_wheel; \
python setup.py build sdist bdist_wheel; \
twine check dist/*; \
twine upload dist/* --verbose
81 changes: 80 additions & 1 deletion Makefile
@@ -246,4 +246,83 @@ generate-schema-docs: ## Generates markdown files for documenting the JSON Sche
@echo "Generating Schema docs"
python -m pip install "jsonschema2md"
python scripts/generate_docs_schemas.py


# Upgrade release automation scripts below
.PHONY: update_all
update_all: ## Update all release-related files: make update_all RELEASE_VERSION=2.2.2 PY_RELEASE_VERSION=2.2.2.2
@echo "The release version is: $(RELEASE_VERSION)" ; \
echo "The python metadata release version: $(PY_RELEASE_VERSION)" ; \
$(MAKE) update_maven ; \
$(MAKE) update_github_action_paths ; \
$(MAKE) update_python_release_paths ; \
$(MAKE) update_dockerfile_version ; \
$(MAKE) update_ingestion_dockerfile_version ; \

# To run the "update_all" target on its own, uncomment:
#make update_all RELEASE_VERSION=2.2.2 PY_RELEASE_VERSION=2.2.2.2

.PHONY: update_maven
update_maven: ## Update the Maven version in the common and pom.xml files
@echo "Updating Maven projects to version $(RELEASE_VERSION)..."; \
mvn versions:set -DnewVersion=$(RELEASE_VERSION)
# To run the "update_maven" target on its own, uncomment:
#make update_maven RELEASE_VERSION=2.2.2


.PHONY: update_github_action_paths
update_github_action_paths: ## Update the GitHub Actions CI docker workflow files
@echo "Updating docker github action release version to $(RELEASE_VERSION)... "; \
file_paths="docker/docker-compose-quickstart/Dockerfile \
.github/workflows/docker-openmetadata-db.yml \
.github/workflows/docker-openmetadata-ingestion-base.yml \
.github/workflows/docker-openmetadata-ingestion.yml \
.github/workflows/docker-openmetadata-postgres.yml \
.github/workflows/docker-openmetadata-server.yml"; \
for file_path in $$file_paths; do \
python3 scripts/update_version.py 1 $$file_path -s $(RELEASE_VERSION) ; \
done; \
file_paths1="docker/docker-compose-quickstart/Dockerfile"; \
for file_path in $$file_paths1; do \
python3 scripts/update_version.py 4 $$file_path -s $(RELEASE_VERSION) ; \
done
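#NOTE: the leading numeric argument appears to select which version pattern
#scripts/update_version.py rewrites (1 and 4 in this target); this is inferred
#from usage here — check the script for the exact mapping.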

# To run the "update_github_action_paths" target on its own, uncomment:
#make update_github_action_paths RELEASE_VERSION=2.2.2

.PHONY: update_python_release_paths
update_python_release_paths: ## Update the versions in the Python setup.py files
file_paths="ingestion/setup.py \
openmetadata-airflow-apis/setup.py"; \
echo "Updating Python setup file versions to $(PY_RELEASE_VERSION)... "; \
for file_path in $$file_paths; do \
python3 scripts/update_version.py 2 $$file_path -s $(PY_RELEASE_VERSION) ; \
done
# To run the "update_python_release_paths" target on its own, uncomment:
#make update_python_release_paths PY_RELEASE_VERSION=2.2.2.2

.PHONY: update_dockerfile_version
update_dockerfile_version: ## Update the docker compose file versions
@file_paths="docker/docker-compose-ingestion/docker-compose-ingestion-postgres.yml \
docker/docker-compose-ingestion/docker-compose-ingestion.yml \
docker/docker-compose-openmetadata/docker-compose-openmetadata.yml \
docker/docker-compose-quickstart/docker-compose-postgres.yml \
docker/docker-compose-quickstart/docker-compose.yml"; \
echo "Updating docker github action release version to $(RELEASE_VERSION)... "; \
for file_path in $$file_paths; do \
python3 scripts/update_version.py 3 $$file_path -s $(RELEASE_VERSION) ; \
done
# To run the "update_dockerfile_version" target on its own, uncomment:
#make update_dockerfile_version RELEASE_VERSION=2.2.2

.PHONY: update_ingestion_dockerfile_version
update_ingestion_dockerfile_version: ## Update the ingestion Dockerfile versions
@file_paths="ingestion/Dockerfile \
ingestion/operators/docker/Dockerfile"; \
echo "Updating ingestion dockerfile release version to $(PY_RELEASE_VERSION)... "; \
for file_path in $$file_paths; do \
python3 scripts/update_version.py 4 $$file_path -s $(PY_RELEASE_VERSION) ; \
done
# To run the "update_ingestion_dockerfile_version" target on its own, uncomment:
#make update_ingestion_dockerfile_version PY_RELEASE_VERSION=2.2.2.2

# Upgrade release automation scripts above
@@ -20,4 +20,4 @@ SET json = JSON_INSERT(
'$.connection.config.authType.password',
JSON_EXTRACT(json, '$.connection.config.password'))
where serviceType = 'Trino'
AND JSON_EXTRACT(json, '$.connection.config.password') IS NOT NULL;
AND JSON_EXTRACT(json, '$.connection.config.password') IS NOT NULL;

This file was deleted.

@@ -0,0 +1,50 @@
START TRANSACTION;
-- We'll rank all the runs (timestamps) for every day, and delete all the data but the most recent one.
DELETE FROM report_data_time_series WHERE JSON_EXTRACT(json, '$.id') IN (
select ids FROM (
SELECT
(json ->> '$.id') AS ids,
DENSE_RANK() OVER(PARTITION BY `date` ORDER BY `timestamp` DESC) as denseRank
FROM (
SELECT
*
FROM report_data_time_series rdts
WHERE json ->> '$.reportDataType' = 'WebAnalyticEntityViewReportData'
) duplicates
ORDER BY `date` DESC, `timestamp` DESC
) as dense_ranked
WHERE denseRank != 1
);

DELETE FROM report_data_time_series WHERE JSON_EXTRACT(json, '$.id') IN (
select ids FROM (
SELECT
(json ->> '$.id') AS ids,
DENSE_RANK() OVER(PARTITION BY `date` ORDER BY `timestamp` DESC) as denseRank
FROM (
SELECT
*
FROM report_data_time_series rdts
WHERE json ->> '$.reportDataType' = 'EntityReportData'
) duplicates
ORDER BY `date` DESC, `timestamp` DESC
) as dense_ranked
WHERE denseRank != 1
);

DELETE FROM report_data_time_series WHERE JSON_EXTRACT(json, '$.id') IN (
select ids FROM (
SELECT
(json ->> '$.id') AS ids,
DENSE_RANK() OVER(PARTITION BY `date` ORDER BY `timestamp` DESC) as denseRank
FROM (
SELECT
*
FROM report_data_time_series rdts
WHERE json ->> '$.reportDataType' = 'WebAnalyticUserActivityReportData'
) duplicates
ORDER BY `date` DESC, `timestamp` DESC
) as dense_ranked
WHERE denseRank != 1
);
COMMIT;
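A quick sanity check for the cleanup above (a hypothetical query, not part of the migration): after the three deletes, every reportDataType should keep a single run per day, so this should return zero rows.

-- Hypothetical MySQL check: list any (day, reportDataType) pairs that still
-- have more than one run after the dedup above.
SELECT
  `date`,
  json ->> '$.reportDataType' AS reportDataType,
  COUNT(DISTINCT `timestamp`) AS runsPerDay
FROM report_data_time_series
GROUP BY `date`, json ->> '$.reportDataType'
HAVING COUNT(DISTINCT `timestamp`) > 1;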
@@ -1,9 +1,83 @@
-- Update table and column profile timestamps to be in milliseconds
UPDATE entity_extension_time_series
SET json = JSON_INSERT(
JSON_REMOVE(json, '$.timestamp'),
'$.timestamp',
JSON_EXTRACT(json, '$.timestamp') * 1000
)
WHERE
extension IN ('table.tableProfile', 'table.columnProfile', 'testCase.testCaseResult');
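-- e.g. a profile stored with timestamp 1692787200 (seconds) becomes 1692787200000 (milliseconds)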

START TRANSACTION;
-- Create report data time series table and move data from entity_extension_time_series
CREATE TABLE IF NOT EXISTS report_data_time_series (
entityFQNHash VARCHAR(768) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
extension VARCHAR(256) NOT NULL,
jsonSchema VARCHAR(256) NOT NULL,
json JSON NOT NULL,
timestamp BIGINT UNSIGNED GENERATED ALWAYS AS (json ->> '$.timestamp') NOT NULL,
date DATE GENERATED ALWAYS AS (FROM_UNIXTIME((json ->> '$.timestamp') DIV 1000)) NOT NULL,
INDEX report_data_time_series_point_ts (timestamp),
INDEX report_data_time_series_date (date)
);
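-- The timestamp and date columns are generated from the JSON payload itself;
-- e.g. '$.timestamp' = 1692787200000 (ms) yields date '2023-08-23' (with a UTC session time zone)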

INSERT INTO report_data_time_series (entityFQNHash,extension,jsonSchema,json)
SELECT entityFQNHash, extension, jsonSchema, json
FROM entity_extension_time_series WHERE extension = 'reportData.reportDataResult';

DELETE FROM entity_extension_time_series
WHERE extension = 'reportData.reportDataResult';
COMMIT;

START TRANSACTION;
-- Create profiler data time series table and move data from entity_extension_time_series
CREATE TABLE IF NOT EXISTS profiler_data_time_series (
entityFQNHash VARCHAR(768) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
extension VARCHAR(256) NOT NULL,
jsonSchema VARCHAR(256) NOT NULL,
json JSON NOT NULL,
operation VARCHAR(256) GENERATED ALWAYS AS (json ->> '$.operation') NULL,
timestamp BIGINT UNSIGNED GENERATED ALWAYS AS (json ->> '$.timestamp') NOT NULL,
UNIQUE profiler_data_time_series_unique_hash_extension_ts (entityFQNHash, extension, operation, timestamp),
INDEX profiler_data_time_series_combined_id_ts (extension, timestamp)
);
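-- The unique key on (entityFQNHash, extension, operation, timestamp) rejects
-- duplicate profile points, so re-ingested metrics have to be upserted
-- (cf. the "added upsert logic back for system metrics" commit above)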

INSERT INTO profiler_data_time_series (entityFQNHash,extension,jsonSchema,json)
SELECT entityFQNHash, extension, jsonSchema, json
FROM entity_extension_time_series
WHERE extension IN ('table.columnProfile', 'table.tableProfile', 'table.systemProfile');

DELETE FROM entity_extension_time_series
WHERE extension IN ('table.columnProfile', 'table.tableProfile', 'table.systemProfile');
COMMIT;

START TRANSACTION;
-- Create data quality data time series table and move data from entity_extension_time_series
CREATE TABLE IF NOT EXISTS data_quality_data_time_series (
entityFQNHash VARCHAR(768) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
extension VARCHAR(256) NOT NULL,
jsonSchema VARCHAR(256) NOT NULL,
json JSON NOT NULL,
timestamp BIGINT UNSIGNED GENERATED ALWAYS AS (json ->> '$.timestamp') NOT NULL,
UNIQUE data_quality_data_time_series_unique_hash_extension_ts (entityFQNHash, extension, timestamp),
INDEX data_quality_data_time_series_combined_id_ts (extension, timestamp)
);
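-- Same copy-then-delete pattern as the two tables above: copy the rows out of
-- entity_extension_time_series, then remove the originals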

INSERT INTO data_quality_data_time_series (entityFQNHash,extension,jsonSchema,json)
SELECT entityFQNHash, extension, jsonSchema, json
FROM entity_extension_time_series
WHERE extension = 'testCase.testCaseResult';

DELETE FROM entity_extension_time_series
WHERE extension = 'testCase.testCaseResult';
COMMIT;

ALTER TABLE automations_workflow MODIFY COLUMN nameHash VARCHAR(256) COLLATE ascii_bin,MODIFY COLUMN workflowType VARCHAR(256) COLLATE ascii_bin, MODIFY COLUMN status VARCHAR(256) COLLATE ascii_bin;
ALTER TABLE entity_extension MODIFY COLUMN extension VARCHAR(256) COLLATE ascii_bin;
ALTER TABLE entity_extension_time_series MODIFY COLUMN entityFQNHash VARCHAR(768) COLLATE ascii_bin, MODIFY COLUMN jsonSchema VARCHAR(50) COLLATE ascii_bin, MODIFY COLUMN extension VARCHAR(100) COLLATE ascii_bin,
ADD CONSTRAINT entity_extension_time_series_constraint UNIQUE (entityFQNHash, extension, timestamp);
ALTER TABLE field_relationship MODIFY COLUMN fromFQNHash VARCHAR(768) COLLATE ascii_bin, MODIFY COLUMN toFQNHash VARCHAR(768) COLLATE ascii_bin;
ALTER TABLE thread_entity MODIFY COLUMN entityLink VARCHAR(3072) GENERATED ALWAYS AS (json ->> '$.about') NOT NULL, MODIFY COLUMN createdBy VARCHAR(256) GENERATED ALWAYS AS (json ->> '$.createdBy') STORED NOT NULL COLLATE ascii_bin;
ALTER TABLE thread_entity MODIFY COLUMN entityLink VARCHAR(3072) GENERATED ALWAYS AS (json ->> '$.about') NOT NULL;
ALTER TABLE event_subscription_entity MODIFY COLUMN nameHash VARCHAR(256) COLLATE ascii_bin;
ALTER TABLE ingestion_pipeline_entity MODIFY COLUMN fqnHash VARCHAR(768) COLLATE ascii_bin;
ALTER TABLE bot_entity MODIFY COLUMN nameHash VARCHAR(256) COLLATE ascii_bin;
@@ -0,0 +1,53 @@
BEGIN;
-- We'll rank all the runs (timestamps) for every day, and delete all the data but the most recent one.
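-- (In this Postgres variant the day is computed inline from the JSON timestamp,
-- rather than read from a generated date column as in the MySQL migration.)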
DELETE FROM report_data_time_series WHERE (json ->> 'id') IN (
select ids FROM (
SELECT
(json ->> 'id') AS ids,
DENSE_RANK() OVER(PARTITION BY date ORDER BY timestamp DESC) as denseRank
FROM (
SELECT
*,
DATE(TO_TIMESTAMP((json ->> 'timestamp')::bigint/1000)) as date
FROM report_data_time_series rdts
WHERE json ->> 'reportDataType' = 'WebAnalyticEntityViewReportData'
) duplicates
ORDER BY date DESC, timestamp DESC
) as dense_ranked
WHERE denseRank != 1
);

DELETE FROM report_data_time_series WHERE (json ->> 'id') IN (
select ids FROM (
SELECT
(json ->> 'id') AS ids,
DENSE_RANK() OVER(PARTITION BY date ORDER BY timestamp DESC) as denseRank
FROM (
SELECT
*,
DATE(TO_TIMESTAMP((json ->> 'timestamp')::bigint/1000)) as date
FROM report_data_time_series rdts
WHERE json ->> 'reportDataType' = 'EntityReportData'
) duplicates
ORDER BY date DESC, timestamp DESC
) as dense_ranked
WHERE denseRank != 1
);

DELETE FROM report_data_time_series WHERE (json ->> 'id') IN (
select ids FROM (
SELECT
(json ->> 'id') AS ids,
DENSE_RANK() OVER(PARTITION BY date ORDER BY timestamp DESC) as denseRank
FROM (
SELECT
*,
DATE(TO_TIMESTAMP((json ->> 'timestamp')::bigint/1000)) as date
FROM report_data_time_series rdts
WHERE json ->> 'reportDataType' = 'WebAnalyticUserActivityReportData'
) duplicates
ORDER BY date DESC, timestamp DESC
) as dense_ranked
WHERE denseRank != 1
);
COMMIT;