Home
Sam Webster edited this page Sep 22, 2022 · 13 revisions
Clone these repos next to this one:
- https://github.com/samwebster/times-ireland-model.git (a clone of the original with a few small changes to make the output better match that in the next repo)
- https://github.com/MaREI-EPMG/times-ireland-model_gams.git
Currently, the main script expects the ground truth to be in CSV format, so first run:
python utils/dd_to_csv.py ../times-ireland-model_gams/model ground_truth
You should then be able to generate output by running:
python times_excel_reader.py
NOTE: the script currently uses a pickle file, raw_tables.pkl, to skip the slow reading of Excel files during development. If the input files change in any way, remember to delete the pickle to force a re-read.
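The manual step of deleting the pickle could be automated by comparing file modification times. The following is a minimal sketch, not part of the tool: the helper names `cache_is_stale` and `drop_stale_cache` are hypothetical, and it assumes that a newer modification time on any input file is a good enough signal that the cache is out of date.

```python
from pathlib import Path


def cache_is_stale(cache_path, input_paths):
    """Return True if the cache is missing or older than any input file."""
    cache = Path(cache_path)
    if not cache.exists():
        return True
    cache_mtime = cache.stat().st_mtime
    return any(Path(p).stat().st_mtime > cache_mtime for p in input_paths)


def drop_stale_cache(cache_path, input_paths):
    """Delete the cache if it exists and is stale; return True if removed."""
    cache = Path(cache_path)
    if cache.exists() and cache_is_stale(cache, input_paths):
        cache.unlink()
        return True
    return False
```

For example, `drop_stale_cache("raw_tables.pkl", Path("some_input_dir").glob("**/*.xlsx"))` would remove the pickle only when a workbook has been modified since it was written. Here `some_input_dir` is a placeholder for wherever the Excel inputs live.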
To view dataframe contents when debugging:
- Evaluate the dataframe variable in the immediate window/debug console
To compare output CSV to ground truth:
- Use the output/*_missing.csv and output/*_additional.csv files generated for each table when the tool is run
- Use Beyond Compare, with Rules > Alignment > Sorted, Rules > Columns > gear icon > Key
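To illustrate what "missing" and "additional" mean for a single table, the comparison can be sketched as a set difference over rows. This is an illustrative sketch using only the standard library, not the tool's own implementation: `read_rows` and `diff_csv` are hypothetical names, and it assumes both files share the same header and that row order and duplicate rows do not matter.

```python
import csv


def read_rows(path):
    """Read a CSV file and return (header, set of row tuples)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        return header, {tuple(row) for row in reader}


def diff_csv(output_path, truth_path):
    """Return (missing, additional) rows as sorted lists of tuples.

    missing: rows in the ground truth but not in the output.
    additional: rows in the output but not in the ground truth.
    """
    out_header, out_rows = read_rows(output_path)
    truth_header, truth_rows = read_rows(truth_path)
    if out_header != truth_header:
        raise ValueError(f"headers differ: {out_header} vs {truth_header}")
    missing = sorted(truth_rows - out_rows)
    additional = sorted(out_rows - truth_rows)
    return missing, additional
```

Values are compared as the exact strings in the files, so `1` and `1.0` count as different under this sketch.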
To search within a single Excel file:
- In Excel, Ctrl+F then change Within: Workbook, Look in: Values
To search in a folder of Excel files:
- Search output/raw_tables.txt (the raw dataframes extracted from the Excel files) and output/merged_tables.txt (the dataframes just after merging) using any text editor. The extraction code is quite stable and believed to be complete, so it is very likely that the data from the Excel files is at least in the raw_tables file
- Use dnGrep
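If a text editor or dnGrep is not to hand, the same search over the two text dumps can be done with a few lines of Python. This is a minimal sketch: the function name `search_dumps` is hypothetical, and it assumes the dumps are plain text that can be read as UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
def search_dumps(term, paths, ignore_case=True):
    """Yield (path, line_number, line) for every line containing term."""
    needle = term.lower() if ignore_case else term
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for number, line in enumerate(f, start=1):
                haystack = line.lower() if ignore_case else line
                if needle in haystack:
                    yield str(path), number, line.rstrip("\n")
```

For example, `list(search_dumps("SOME_VALUE", ["output/raw_tables.txt", "output/merged_tables.txt"]))` returns every matching line with its file and line number, where `SOME_VALUE` is a placeholder for the process, commodity, or other value being traced.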