Commit
Merge pull request #149 from george0st/change
Use parquets with compression
george0st authored Apr 16, 2024
2 parents 62b1521 + 5a024b8 commit 1ae4029
Showing 32 changed files with 572 additions and 581 deletions.
Binary files modified (contents not shown):
02-data/01-size-100/01-basic-party.csv.gz
02-data/01-size-100/01-basic-party.parquet
02-data/01-size-100/02-basic-contact.csv.gz
02-data/01-size-100/02-basic-contact.parquet
02-data/01-size-100/03-basic-relation.csv.gz
02-data/01-size-100/03-basic-relation.parquet
02-data/01-size-100/04-basic-account.csv.gz
02-data/01-size-100/04-basic-account.parquet
02-data/01-size-100/05-basic-transaction.csv.gz
02-data/01-size-100/05-basic-transaction.parquet
02-data/01-size-100/06-basic-event.csv.gz
02-data/01-size-100/06-basic-event.parquet
02-data/01-size-100/07-basic-communication.csv.gz
02-data/01-size-100/07-basic-communication.parquet
02-data/02-size-1K/01-basic-party.csv.gz
02-data/02-size-1K/01-basic-party.parquet
02-data/02-size-1K/02-basic-contact.csv.gz
02-data/02-size-1K/02-basic-contact.parquet
02-data/02-size-1K/03-basic-relation.csv.gz
02-data/02-size-1K/03-basic-relation.parquet
02-data/02-size-1K/04-basic-account.csv.gz
02-data/02-size-1K/04-basic-account.parquet
02-data/02-size-1K/05-basic-transaction.csv.gz
02-data/02-size-1K/05-basic-transaction.parquet
02-data/02-size-1K/06-basic-event.csv.gz
02-data/02-size-1K/06-basic-event.parquet
02-data/02-size-1K/07-basic-communication.csv.gz
02-data/02-size-1K/07-basic-communication.parquet
566 changes: 279 additions & 287 deletions 03-test/01-size-100.json

582 changes: 291 additions & 291 deletions 03-test/02-size-1k.json

3 changes: 1 addition & 2 deletions docs/todo_list.md
@@ -2,5 +2,4 @@

The list of expected/future improvements:

1. Change 'target' in project to 'adapter' or 'replace'
   - Motivation: ability to use it for a Target or a Source as well
1. Add ability to select the type of output format (CSV or parquet) on the command line
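The last todo item, choosing the output format on the command line, could look roughly like the sketch below. It is only an illustration under stated assumptions: the `--format` option, its `both` default (chosen to mirror the current behaviour of writing both `.csv.gz` and `.parquet` files) and the helper `selected_formats` are hypothetical and do not exist in the repository.

```python
import argparse

# Hypothetical option for the todo item; the generator's real CLI is not
# part of this commit.
FORMATS = ("csv", "parquet", "both")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Data generator output options (sketch)")
    parser.add_argument(
        "--format",
        choices=FORMATS,
        default="both",  # assumption: keep today's behaviour of writing both formats
        help="which output format(s) to write",
    )
    return parser


def selected_formats(fmt: str) -> set[str]:
    """Expand the --format value into the set of formats to write."""
    return {"csv", "parquet"} if fmt == "both" else {fmt}


if __name__ == "__main__":
    args = build_parser().parse_args(["--format", "parquet"])
    print(sorted(selected_formats(args.format)))  # ['parquet']
```

Using `choices` lets argparse reject unknown formats before any data is generated.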
2 changes: 1 addition & 1 deletion generator/base_data.py
@@ -71,7 +71,7 @@ def save(self, path, append: bool, dir: str, compress: bool):
         # write parquet
         table = pa.Table.from_pandas(df)
         if self._parquet_writer is None:
-            self._parquet_writer = pq.ParquetWriter(os.path.join(path, f"{self.name}.parquet"), table.schema)
+            self._parquet_writer = pq.ParquetWriter(os.path.join(path, f"{self.name}.parquet"), table.schema, compression=compression_opts)
         self._parquet_writer.write_table(table=table)
 
         # free memory
