FIX: Load _history tables from temp table #29

rymarczy · 2024-11-08T13:46:04Z

This change modifies how _history tables are loaded.

Rather than load a CSV directly into the _history table, we will first load records into the _load table and then INSERT into the _history table with an ON CONFLICT query.

This will make _history table loads idempotent, in case the same history record is loaded more than once.
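
For illustration, a minimal sketch of the stage-then-insert pattern described above. It is not the code from this PR: it uses Python's built-in sqlite3 as a stand-in for the real database, the table names `demo_load` and `demo_history` and the `(id, seq)` key are hypothetical, and `DO NOTHING` is an assumed conflict action.

```python
import sqlite3


def load_history(conn, rows):
    """Stage rows in a _load table, then copy them into _history idempotently."""
    cur = conn.cursor()
    # Hypothetical history table; the primary key is what ON CONFLICT targets.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS demo_history "
        "(id INTEGER, seq INTEGER, val TEXT, PRIMARY KEY (id, seq))"
    )
    # Hypothetical staging table with no constraints, so raw loads never fail.
    cur.execute("DROP TABLE IF EXISTS demo_load")
    cur.execute("CREATE TABLE demo_load (id INTEGER, seq INTEGER, val TEXT)")
    cur.executemany("INSERT INTO demo_load VALUES (?, ?, ?)", rows)
    # Records already present in the history table are skipped (assumed action).
    # "WHERE true" avoids a parsing ambiguity SQLite has with INSERT ... SELECT.
    cur.execute(
        "INSERT INTO demo_history (id, seq, val) "
        "SELECT id, seq, val FROM demo_load WHERE true "
        "ON CONFLICT (id, seq) DO NOTHING"
    )
    cur.execute("DROP TABLE demo_load")
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM demo_history").fetchone()[0]
```

Calling `load_history` twice with the same rows should leave the history row count unchanged after the second call, which is the idempotent behavior this change is after.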

grejdi-mbta · 2024-11-08T14:02:13Z

src/cubic_loader/qlik/ods_qlik.py

            cdc_df = dataframe_from_merged_csv(merge_csv, dfm_object)

-            key_columns = [col["name"].lower() for col in self.etl_status.last_schema if col["primaryKeyPos"] > 0]
+            # Load records into _history table


Update the comment here.
Q: is _load a new type of table at this point? We need to communicate to folks who are using the database what it is, so they can ignore it if need be.

If it is just a table for loading records, maybe truncate it at the end as well? I want it to be obvious that it shouldn't be used by data analysts.

It's sort of a "temp" table. It gets created by the dmap_import application when it starts to load a QLIK table and then gets dropped at the end. So DB users would only ever see it while the ETL operation is actively running.
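
A rough sketch of that create-then-drop lifecycle, for readers unfamiliar with the pattern. This is an illustration only, not the dmap_import implementation: it assumes Python's built-in sqlite3 as a stand-in database, and `load_table`, `table_exists`, and the `demo_load` table are hypothetical names.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def load_table(conn, name):
    """Create a short-lived staging table and always drop it afterwards."""
    conn.execute(f'DROP TABLE IF EXISTS "{name}"')
    conn.execute(f'CREATE TABLE "{name}" (id INTEGER, val TEXT)')
    try:
        yield name
    finally:
        # Runs on success and on failure, so the table never outlives the load.
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        conn.commit()


def table_exists(conn, name):
    """Return True if a table with this name is currently visible."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None
```

With this shape, the staging table is only visible inside the `with load_table(...)` block, which matches the idea that DB users would only see it while a load is running.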

rymarczy · 2024-11-08T14:56:35Z

src/cubic_loader/qlik/ods_qlik.py

@@ -81,7 +81,7 @@ def get_cdc_gz_csvs(etl_status: TableStatus, table: str) -> List[str]:
    cdc_csvs = s3_list_cdc_gz_objects(S3_ARCHIVE, snapshot_prefix, min_ts=etl_status.last_cdc_ts)

    # filter error files from table folder
-    for csv_file in s3_list_cdc_gz_objects(S3_ERROR, table, min_ts=etl_status.last_cdc_ts):


Also added the fix for the S3 permission error we were seeing.

load history table from temp

8a79735

rymarczy requested a review from grejdi-mbta November 8, 2024 13:46

grejdi-mbta reviewed Nov 8, 2024

fix error prefix

4e021ef

rymarczy commented Nov 8, 2024

rymarczy requested a review from grejdi-mbta November 8, 2024 14:56

grejdi-mbta approved these changes Nov 8, 2024

rymarczy merged commit a02fdea into main Nov 8, 2024
5 checks passed
