Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_missing_column_to_table fails as the quotation marks mismatch #469

Closed
1 task
k-schmid opened this issue Oct 6, 2023 · 6 comments · Fixed by #470
Closed
1 task

add_missing_column_to_table fails as the quotation marks mismatch #469

k-schmid opened this issue Oct 6, 2023 · 6 comments · Fixed by #470
Assignees
Labels
🐛 bug Something isn't working

Comments

@k-schmid
Copy link

k-schmid commented Oct 6, 2023

Description of the issue

When downloading (and extracting) the latest XML using the bulk download a stumble in to an error regarding new columns. The method add_missing_column_to_table tries to split the psycopg2 undefined column error message such that a column name is found. However the splitting argument "»" and "«" cannot be found.

See the full traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2100, in _exec_insertmany_context
    dialect.do_execute(cursor, sub_stmt, sub_params, context)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 747, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedColumn: column "DatumEndgueltigeStilllegung" of relation "gas_producer" does not exist
LINE 1: ...gDatum", "DatumBeginnVoruebergehendeStilllegung", "DatumEndg...
                                                             ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 206, in add_table_to_database
    df.to_sql(
  File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/generic.py", line 3008, in to_sql
    return sql.to_sql(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 788, in to_sql
    return pandas_sql.to_sql(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1958, in to_sql
    total_inserted = sql_engine.insert_records(
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1507, in insert_records
    raise err
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1498, in insert_records
    return table.insert(chunksize=chunksize, method=method)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 1059, in insert
    num_inserted = exec_insert(conn, keys, chunk_iter)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/sql.py", line 951, in _execute_insert
    result = conn.execute(self.table.insert(), data)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1414, in execute
    return meth(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 485, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1638, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1837, in _execute_context
    return self._exec_insertmany_context(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2103, in _exec_insertmany_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2325, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2100, in _exec_insertmany_context
    dialect.do_execute(cursor, sub_stmt, sub_params, context)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 747, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column "DatumEndgueltigeStilllegung" of relation "gas_producer" does not exist
LINE 1: ...gDatum", "DatumBeginnVoruebergehendeStilllegung", "DatumEndg...
                                                             ^

[SQL: INSERT INTO gas_producer ("EinheitMastrNummer", "DatumLetzteAktualisierung", "LokationMastrNummer", "NetzbetreiberpruefungStatus", "AnlagenbetreiberMastrNummer", "Land", "Bundesland", "Landkreis", "Gemeinde", "Gemeindeschluessel", "Postleitzahl", "Or ... 368380 characters truncated ... lllegung__326)s, %(DatumEndgueltigeStilllegung__326)s, %(DatenQuelle__326)s, %(DatumDownload__326)s)]
[parameters: {'Land__0': 'Deutschland', 'NetzbetreiberpruefungDatum__0': None, 'NameGaserzeugungseinheit__0': 'KGE-Erdgasspeicher', 'DatumBeginnVoruebergehendeStilllegung__0': None, 'StrasseNichtGefunden__0': None, 'LokationMastrNummer__0': 'GEL903981660167', 'NichtVorhandenInMigriertenEinheiten__0': 0, 'Adresszusatz__0': None, 'Gemeinde__0': 'Gronau (Westf.)', 'Bundesland__0': 'Nordrhein-Westfalen', 'Hausnummer_nv__0': None, 'Postleitzahl__0': '48599', 'DatumLetzteAktualisierung__0': datetime.datetime(2019, 10, 22, 7, 51, 34, 20038), 'Ort__0': 'Gronau', 'AnlagenbetreiberMastrNummer__0': 'ABR920698896597', 'DatumRegistrierungDesBetreiberwechsels__0': None, 'Breitengrad__0': None, 'Gemeindeschluessel__0': '05554020', 'EinheitSystemstatus__0': 'Aktiviert', 'NetzbetreiberpruefungStatus__0': True, 'Landkreis__0': 'Borken', 'GeplantesInbetriebnahmedatum__0': None, 'Technologie__0': None, 'Gemarkung__0': None, 'DatumEndgueltigeStilllegung__0': None, 'Inbetriebnahmedatum__0': datetime.datetime(2012, 10, 1, 0, 0), 'Laengengrad__0': None, 'DatenQuelle__0': 'bulk', 'Hausnummer__0': None, 'DatumDesBetreiberwechsels__0': None, 'EinheitBetriebsstatus__0': 'In Betrieb', 'SpeicherMastrNummer__0': 'GSE917820127322', 'DatumDownload__0': datetime.datetime(2023, 10, 6, 0, 0), 'FlurFlurstuecknummern__0': None, 'Strasse__0': None, 'Erzeugungsleistung__0': None, 'Registrierungsdatum__0': datetime.datetime(2019, 5, 20, 0, 0), 'EinheitMastrNummer__0': 'GEE995046355477', 'HausnummerNichtGefunden__0': None, 'Land__1': 'Deutschland', 'NetzbetreiberpruefungDatum__1': None, 'NameGaserzeugungseinheit__1': 'Trianel Gasspeicher Epe', 'DatumBeginnVoruebergehendeStilllegung__1': None, 'StrasseNichtGefunden__1': None, 'LokationMastrNummer__1': 'GEL963638601943', 'NichtVorhandenInMigriertenEinheiten__1': 0, 'Adresszusatz__1': None, 'Gemeinde__1': 'Gronau (Westf.)', 'Bundesland__1': 'Nordrhein-Westfalen', 'Hausnummer_nv__1': None ... 12653 parameters truncated ... 'Hausnummer__325': None, 'DatumDesBetreiberwechsels__325': None, 'EinheitBetriebsstatus__325': 'In Betrieb', 'SpeicherMastrNummer__325': None, 'DatumDownload__325': datetime.datetime(2023, 10, 6, 0, 0), 'FlurFlurstuecknummern__325': '334', 'Strasse__325': None, 'Erzeugungsleistung__325': 2750.0, 'Registrierungsdatum__325': datetime.datetime(2023, 9, 1, 0, 0), 'EinheitMastrNummer__325': 'GEE937644281314', 'HausnummerNichtGefunden__325': 0.0, 'Land__326': 'Deutschland', 'NetzbetreiberpruefungDatum__326': None, 'NameGaserzeugungseinheit__326': 'Bioraffinerie Kanalhafen', 'DatumBeginnVoruebergehendeStilllegung__326': None, 'StrasseNichtGefunden__326': 0.0, 'LokationMastrNummer__326': None, 'NichtVorhandenInMigriertenEinheiten__326': 0, 'Adresszusatz__326': None, 'Gemeinde__326': 'Rheine', 'Bundesland__326': 'Nordrhein-Westfalen', 'Hausnummer_nv__326': 0.0, 'Postleitzahl__326': '48432', 'DatumLetzteAktualisierung__326': datetime.datetime(2023, 10, 5, 10, 5, 10, 900993), 'Ort__326': 'Rheine', 'AnlagenbetreiberMastrNummer__326': 'ABR928065335663', 'DatumRegistrierungDesBetreiberwechsels__326': None, 'Breitengrad__326': 52.297882, 'Gemeindeschluessel__326': '05566076', 'EinheitSystemstatus__326': 'Aktiviert', 'NetzbetreiberpruefungStatus__326': False, 'Landkreis__326': 'Steinfurt', 'GeplantesInbetriebnahmedatum__326': datetime.datetime(2023, 10, 16, 0, 0), 'Technologie__326': 'Biomethan-Erzeugung', 'Gemarkung__326': None, 'DatumEndgueltigeStilllegung__326': None, 'Inbetriebnahmedatum__326': None, 'Laengengrad__326': 7.498307, 'DatenQuelle__326': 'bulk', 'Hausnummer__326': '111', 'DatumDesBetreiberwechsels__326': None, 'EinheitBetriebsstatus__326': 'In Planung', 'SpeicherMastrNummer__326': None, 'DatumDownload__326': datetime.datetime(2023, 10, 6, 0, 0), 'FlurFlurstuecknummern__326': None, 'Strasse__326': 'Kanalstr.', 'Erzeugungsleistung__326': 15420.0, 'Registrierungsdatum__326': datetime.datetime(2023, 10, 5, 0, 0), 'EinheitMastrNummer__326': 'GEE904412023018', 'HausnummerNichtGefunden__326': 1.0}]
(Background on this error at: https://sqlalche.me/e/20/f405)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/project/Pipelines/mastr_to_db/main.py", line 104, in <module>
    raise e
  File "/opt/project/Pipelines/mastr_to_db/main.py", line 101, in <module>
    main()
  File "/opt/project/Pipelines/mastr_to_db/main.py", line 71, in main
    db.download(data=[
  File "/usr/local/lib/python3.10/site-packages/open_mastr/mastr.py", line 231, in download
    write_mastr_xml_to_database(
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 60, in write_mastr_xml_to_database
    add_table_to_database(
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 218, in add_table_to_database
    add_missing_column_to_table(err, engine, xml_tablename)
  File "/usr/local/lib/python3.10/site-packages/open_mastr/xml_download/utils_write_to_database.py", line 324, in add_missing_column_to_table
    missing_column = err.args[0].split("»")[1].split("«")[0]
IndexError: list index out of range

Ideas of solution

I think the string splitting should be replaced by a more robust approach. E.g. check which columns are already in the table and which are assumed to be there prior to writing to the table. There are some approach listed here:
https://www.geeksforgeeks.org/get-column-names-from-postgresql-table-using-psycopg2/

Context and Environment

  • Version used:
  • open-mastr==0.13.2
  • SQLAlchemy==2.0.3
  • Operating system: Docker on iOS
  • Environment setup and (python) version: 3.10.8

Workflow checklist

@FlorianK13
Copy link
Member

@k-schmid I remember when writing that code that I was definetly not proud of this 'quick fix' solution - it had to break some day.
I'll look into it.

@FlorianK13 FlorianK13 self-assigned this Oct 6, 2023
@FlorianK13 FlorianK13 added the 🐛 bug Something isn't working label Oct 6, 2023
FlorianK13 added a commit that referenced this issue Oct 6, 2023
Instead of getting the column name from the
error message, the missing columns are now received by comparing
the existing database table with the new dataframe.
@FlorianK13
Copy link
Member

@k-schmid #470 should solve the error

@k-schmid
Copy link
Author

k-schmid commented Oct 6, 2023

@FlorianK13 thanks a lot for the fast response.

FlorianK13 added a commit that referenced this issue Oct 6, 2023
Better have a for loop with a large number which will still stop
at some point.
FlorianK13 added a commit that referenced this issue Oct 6, 2023
Additionally rename function and change
while True loop to for loop
FlorianK13 added a commit that referenced this issue Oct 10, 2023
@FlorianK13 FlorianK13 linked a pull request Nov 2, 2023 that will close this issue
5 tasks
@k-schmid
Copy link
Author

@FlorianK13 any updates on this?

@FlorianK13
Copy link
Member

@chrwm Are we ready for the 0.14.0 release?

chrwm added a commit that referenced this issue Nov 20, 2023
…g_column_to_table

Alter way to catch missing columns error #469
@FlorianK13
Copy link
Member

@k-schmid We have just release v0.14.0 and it should be fixed now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working
Projects
None yet
2 participants