Correct usage of the keep_files option in athena.to_iceberg #2756
-
I would like to understand what the `keep_files` parameter in `athena.to_iceberg()` is used for. The parameter is `True` by default, and when using the append mode it creates duplicate records in the target Iceberg table: with each insert, the old temp records are inserted again. Is this expected behaviour or a bug? I'll also check the overwrite mode, but my guess is that it behaves the same. Shouldn't `keep_files` be `False` by default instead?
-
Hi @aockel

`keep_files` controls whether the staging files produced by Athena are retained in the temp path. The default is `True`, which is consistent with other SDK for pandas Athena calls such as `read_sql_query`.

Yes, in most cases you would want it set to `False`, but we want users to explicitly opt in to cleaning the temp path to avoid unintended impact, for example if the same temp path is used by another process.
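A minimal sketch of opting in to cleanup, assuming awswrangler (SDK for pandas) is installed and that `wr.athena.to_iceberg` accepts the `temp_path`, `table_location`, and `keep_files` parameters. To sidestep the shared-temp-path concern above, the hypothetical helper `unique_temp_path` gives each write its own staging prefix under a bucket prefix you supply, so deleting the staging files cannot touch another process's data. `append_to_iceberg` is likewise a hypothetical wrapper, not part of the library, and the `s3://my-bucket/...` paths are placeholders.

```python
from __future__ import annotations

import uuid
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # pandas is only needed for the type hint
    import pandas as pd


def unique_temp_path(bucket_prefix: str) -> str:
    """Hypothetical helper: a staging prefix that no other process shares."""
    # Drop any trailing slash, then add a random per-write directory.
    return f"{bucket_prefix.rstrip('/')}/{uuid.uuid4().hex}/"


def append_to_iceberg(
    df: pd.DataFrame,
    database: str,
    table: str,
    table_location: str,
    bucket_prefix: str,
) -> None:
    """Hypothetical wrapper around wr.athena.to_iceberg with cleanup enabled."""
    # Imported lazily so this module loads without awswrangler or AWS credentials.
    import awswrangler as wr

    wr.athena.to_iceberg(
        df=df,
        database=database,
        table=table,
        table_location=table_location,
        # Assumption: a fresh prefix per write, so cleaning it up is safe.
        temp_path=unique_temp_path(bucket_prefix),
        # Opt in to deleting the staging files after the write.
        keep_files=False,
    )
```

For example, `append_to_iceberg(df, "my_db", "my_table", "s3://my-bucket/iceberg/my_table/", "s3://my-bucket/athena-temp/")` would stage the data under a fresh UUID prefix and, with `keep_files=False`, ask the library to clean it up afterwards. A unique prefix per write also means leftover files from an earlier run are never read again, even if cleanup is skipped.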