add_missing_column_to_table fails as the quotation marks mismatch #469

k-schmid · 2023-10-06T10:04:43Z

Description of the issue

When downloading (and extracting) the latest XML using the bulk download a stumble in to an error regarding new columns. The method add_missing_column_to_table tries to split the psycopg2 undefined column error message such that a column name is found. However the splitting argument "»" and "«" cannot be found.

See the full traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2100, in _exec_insertmany_context
    dialect.do_execute(cursor, sub_stmt, sub_params, context)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 747, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedColumn: column "DatumEndgueltigeStilllegung" of relation "gas_producer" does not exist
LINE 1: ...gDatum", "DatumBeginnVoruebergehendeStilllegung", "DatumEndg...
                                                             ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 206, in add_table_to_database
    df.to_sql(
  File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/generic.py", line 3008, in to_sql
    return sql.to_sql(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 788, in to_sql
    return pandas_sql.to_sql(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1958, in to_sql
    total_inserted = sql_engine.insert_records(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1507, in insert_records
    raise err
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1498, in insert_records
    return table.insert(chunksize=chunksize, method=method)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1059, in insert
    num_inserted = exec_insert(conn, keys, chunk_iter)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 951, in _execute_insert
    result = conn.execute(self.table.insert(), data)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1414, in execute
    return meth(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 485, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1638, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1837, in _execute_context
    return self._exec_insertmany_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2103, in _exec_insertmany_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2325, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2100, in _exec_insertmany_context
    dialect.do_execute(cursor, sub_stmt, sub_params, context)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 747, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column "DatumEndgueltigeStilllegung" of relation "gas_producer" does not exist
LINE 1: ...gDatum", "DatumBeginnVoruebergehendeStilllegung", "DatumEndg...
                                                             ^

[SQL: INSERT INTO gas_producer ("EinheitMastrNummer", "DatumLetzteAktualisierung", "LokationMastrNummer", "NetzbetreiberpruefungStatus", "AnlagenbetreiberMastrNummer", "Land", "Bundesland", "Landkreis", "Gemeinde", "Gemeindeschluessel", "Postleitzahl", "Or ... 368380 characters truncated ... lllegung__326)s, %(DatumEndgueltigeStilllegung__326)s, %(DatenQuelle__326)s, %(DatumDownload__326)s)]
[parameters: {'Land__0': 'Deutschland', 'NetzbetreiberpruefungDatum__0': None, 'NameGaserzeugungseinheit__0': 'KGE-Erdgasspeicher', 'DatumBeginnVoruebergehendeStilllegung__0': None, 'StrasseNichtGefunden__0': None, 'LokationMastrNummer__0': 'GEL903981660167', 'NichtVorhandenInMigriertenEinheiten__0': 0, 'Adresszusatz__0': None, 'Gemeinde__0': 'Gronau (Westf.)', 'Bundesland__0': 'Nordrhein-Westfalen', 'Hausnummer_nv__0': None, 'Postleitzahl__0': '48599', 'DatumLetzteAktualisierung__0': datetime.datetime(2019, 10, 22, 7, 51, 34, 20038), 'Ort__0': 'Gronau', 'AnlagenbetreiberMastrNummer__0': 'ABR920698896597', 'DatumRegistrierungDesBetreiberwechsels__0': None, 'Breitengrad__0': None, 'Gemeindeschluessel__0': '05554020', 'EinheitSystemstatus__0': 'Aktiviert', 'NetzbetreiberpruefungStatus__0': True, 'Landkreis__0': 'Borken', 'GeplantesInbetriebnahmedatum__0': None, 'Technologie__0': None, 'Gemarkung__0': None, 'DatumEndgueltigeStilllegung__0': None, 'Inbetriebnahmedatum__0': datetime.datetime(2012, 10, 1, 0, 0), 'Laengengrad__0': None, 'DatenQuelle__0': 'bulk', 'Hausnummer__0': None, 'DatumDesBetreiberwechsels__0': None, 'EinheitBetriebsstatus__0': 'In Betrieb', 'SpeicherMastrNummer__0': 'GSE917820127322', 'DatumDownload__0': datetime.datetime(2023, 10, 6, 0, 0), 'FlurFlurstuecknummern__0': None, 'Strasse__0': None, 'Erzeugungsleistung__0': None, 'Registrierungsdatum__0': datetime.datetime(2019, 5, 20, 0, 0), 'EinheitMastrNummer__0': 'GEE995046355477', 'HausnummerNichtGefunden__0': None, 'Land__1': 'Deutschland', 'NetzbetreiberpruefungDatum__1': None, 'NameGaserzeugungseinheit__1': 'Trianel Gasspeicher Epe', 'DatumBeginnVoruebergehendeStilllegung__1': None, 'StrasseNichtGefunden__1': None, 'LokationMastrNummer__1': 'GEL963638601943', 'NichtVorhandenInMigriertenEinheiten__1': 0, 'Adresszusatz__1': None, 'Gemeinde__1': 'Gronau (Westf.)', 'Bundesland__1': 'Nordrhein-Westfalen', 'Hausnummer_nv__1': None ... 12653 parameters truncated ... 'Hausnummer__325': None, 'DatumDesBetreiberwechsels__325': None, 'EinheitBetriebsstatus__325': 'In Betrieb', 'SpeicherMastrNummer__325': None, 'DatumDownload__325': datetime.datetime(2023, 10, 6, 0, 0), 'FlurFlurstuecknummern__325': '334', 'Strasse__325': None, 'Erzeugungsleistung__325': 2750.0, 'Registrierungsdatum__325': datetime.datetime(2023, 9, 1, 0, 0), 'EinheitMastrNummer__325': 'GEE937644281314', 'HausnummerNichtGefunden__325': 0.0, 'Land__326': 'Deutschland', 'NetzbetreiberpruefungDatum__326': None, 'NameGaserzeugungseinheit__326': 'Bioraffinerie Kanalhafen', 'DatumBeginnVoruebergehendeStilllegung__326': None, 'StrasseNichtGefunden__326': 0.0, 'LokationMastrNummer__326': None, 'NichtVorhandenInMigriertenEinheiten__326': 0, 'Adresszusatz__326': None, 'Gemeinde__326': 'Rheine', 'Bundesland__326': 'Nordrhein-Westfalen', 'Hausnummer_nv__326': 0.0, 'Postleitzahl__326': '48432', 'DatumLetzteAktualisierung__326': datetime.datetime(2023, 10, 5, 10, 5, 10, 900993), 'Ort__326': 'Rheine', 'AnlagenbetreiberMastrNummer__326': 'ABR928065335663', 'DatumRegistrierungDesBetreiberwechsels__326': None, 'Breitengrad__326': 52.297882, 'Gemeindeschluessel__326': '05566076', 'EinheitSystemstatus__326': 'Aktiviert', 'NetzbetreiberpruefungStatus__326': False, 'Landkreis__326': 'Steinfurt', 'GeplantesInbetriebnahmedatum__326': datetime.datetime(2023, 10, 16, 0, 0), 'Technologie__326': 'Biomethan-Erzeugung', 'Gemarkung__326': None, 'DatumEndgueltigeStilllegung__326': None, 'Inbetriebnahmedatum__326': None, 'Laengengrad__326': 7.498307, 'DatenQuelle__326': 'bulk', 'Hausnummer__326': '111', 'DatumDesBetreiberwechsels__326': None, 'EinheitBetriebsstatus__326': 'In Planung', 'SpeicherMastrNummer__326': None, 'DatumDownload__326': datetime.datetime(2023, 10, 6, 0, 0), 'FlurFlurstuecknummern__326': None, 'Strasse__326': 'Kanalstr.', 'Erzeugungsleistung__326': 15420.0, 'Registrierungsdatum__326': datetime.datetime(2023, 10, 5, 0, 0), 'EinheitMastrNummer__326': 'GEE904412023018', 'HausnummerNichtGefunden__326': 1.0}]
(Background on this error at: https://sqlalche.me/e/20/f405)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/project/Pipelines/mastr_to_db/main.py", line 104, in <module>
    raise e
  File "/opt/project/Pipelines/mastr_to_db/main.py", line 101, in <module>
    main()
  File "/opt/project/Pipelines/mastr_to_db/main.py", line 71, in main
    db.download(data=[
  File "/usr/local/lib/python3.10/site-packages/open_mastr/mastr.py", line 231, in download
    write_mastr_xml_to_database(
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 60, in write_mastr_xml_to_database
    add_table_to_database(
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 218, in add_table_to_database
    add_missing_column_to_table(err, engine, xml_tablename)
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 324, in add_missing_column_to_table
    missing_column = err.args[0].split("»")[1].split("«")[0]
IndexError: list index out of range

Ideas of solution

I think the string splitting should be replaced by a more robust approach. E.g. check which columns are already in the table and which are assumed to be there prior to writing to the table. There are some approach listed here:
https://www.geeksforgeeks.org/get-column-names-from-postgresql-table-using-psycopg2/

Context and Environment

Version used:

open-mastr==0.13.2
SQLAlchemy==2.0.3

Operating system: Docker on iOS
Environment setup and (python) version: 3.10.8

Workflow checklist

I am aware of the workflow in CONTRIBUTING.md

The text was updated successfully, but these errors were encountered:

FlorianK13 · 2023-10-06T11:04:23Z

@k-schmid I remember when writing that code that I was definetly not proud of this 'quick fix' solution - it had to break some day.
I'll look into it.

Instead of getting the column name from the error message, the missing columns are now received by comparing the existing database table with the new dataframe.

FlorianK13 · 2023-10-06T12:33:13Z

@k-schmid #470 should solve the error

k-schmid · 2023-10-06T12:46:35Z

@FlorianK13 thanks a lot for the fast response.

Better have a for loop with a large number which will still stop at some point.

Additionally rename function and change while True loop to for loop

k-schmid · 2023-11-14T18:56:29Z

@FlorianK13 any updates on this?

FlorianK13 · 2023-11-16T07:47:03Z

@chrwm Are we ready for the 0.14.0 release?

…g_column_to_table Alter way to catch missing columns error #469

FlorianK13 · 2023-11-22T13:10:33Z

@k-schmid We have just release v0.14.0 and it should be fixed now

FlorianK13 self-assigned this Oct 6, 2023

FlorianK13 added the 🐛 bug Something isn't working label Oct 6, 2023

FlorianK13 added a commit that referenced this issue Oct 6, 2023

Alter way to catch missing columns error #469

0be8d9f

Instead of getting the column name from the error message, the missing columns are now received by comparing the existing database table with the new dataframe.

FlorianK13 added a commit that referenced this issue Oct 6, 2023

Delete while True loop #469

ea3bbb5

Better have a for loop with a large number which will still stop at some point.

FlorianK13 added a commit that referenced this issue Oct 6, 2023

Move add_missing_columns outside of loop #469

5163caa

Additionally rename function and change while True loop to for loop

FlorianK13 added a commit that referenced this issue Oct 10, 2023

Pass columns list instead of df #469

ef418eb

FlorianK13 linked a pull request Nov 2, 2023 that will close this issue

Alter way to catch missing columns error #469 #470

Merged

5 tasks

chrwm added a commit that referenced this issue Nov 20, 2023

Merge pull request #470 from OpenEnergyPlatform/bugfix-469-add_missin…

4fc6d35

…g_column_to_table Alter way to catch missing columns error #469

FlorianK13 closed this as completed Nov 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add_missing_column_to_table fails as the quotation marks mismatch #469

add_missing_column_to_table fails as the quotation marks mismatch #469

k-schmid commented Oct 6, 2023

FlorianK13 commented Oct 6, 2023

FlorianK13 commented Oct 6, 2023

k-schmid commented Oct 6, 2023

k-schmid commented Nov 14, 2023

FlorianK13 commented Nov 16, 2023

FlorianK13 commented Nov 22, 2023

add_missing_column_to_table fails as the quotation marks mismatch #469

add_missing_column_to_table fails as the quotation marks mismatch #469

Comments

k-schmid commented Oct 6, 2023

Description of the issue

Ideas of solution

Context and Environment

Workflow checklist

FlorianK13 commented Oct 6, 2023

FlorianK13 commented Oct 6, 2023

k-schmid commented Oct 6, 2023

k-schmid commented Nov 14, 2023

FlorianK13 commented Nov 16, 2023

FlorianK13 commented Nov 22, 2023