Python Model Project Notes July 7 2021

colleenskemp edited this page Jul 7, 2021 · 2 revisions

Key quicklinks: PD / Solution List

PD Solution List co-work

We are going to update all the data sets. TAM and Adoption will be updated next week. We need to design how the integrations will look in Python; this will not look the same as in Excel. Design needed, to follow the hackathon. Evaluate completeness of existing models by comparing key results to Excel outcomes.

Team Meeting Notes

Could Dev help with creating the new hackathon location for the models? Dev - can you make this directory? ACTION Colleen.

https://github.com/ProjectDrawdown/solutions/issues/297 There is a test suite already, which Denise is working on. It takes the Excel spreadsheet, dumps the data into a zip, and then compares it with the Python results. How to run it is not documented yet. Don't recommend adding to what we have.

Neil has run into some issues relating to how the first cost is treated in Excel. First, the start dates are misaligned. Additionally, there is a jump after 30 years, when equipment needs to be replaced, that doesn't seem quite right. Adjustments may be needed to the Excel model. This should be undertaken later down the road in collaboration with the PDD team. 2 Notes being added to

Lots of architecture insights arising in the work on Seaweed Farms. Meeting with Denise to discuss possible design updates informed by these changes next week.

Denise: https://github.com/ProjectDrawdown/solutions/issues/295 Going to talk to Nabil about this at the co-work tomorrow.

From Denise (Slack): When I first tried to set up the python code on my Windows machine, way back in March or so, I hit roadblocks. The details are misty, but I know that some software package or packages were unavailable. I tried updating code to newer versions, and it all worked pretty well. There was one package that needed to be changed: xlrd, a package for reading Excel files, refused to open macro-enabled Excel files in the newer versions --- and in fact it is deprecated entirely according to its author. So I swapped it out for the recommended replacement, openpyxl. It didn't seem to be too hard; I had to change a bit of code here and there, right? I knew there were some gotchas lurking in the depths. I hadn't touched the extraction code... until last week. First time through, tedious but doable. Change all the syntax; replace hundreds of occurrences of tab.cell_value(row, col) with hundreds of occurrences of tab.cell(row, col).value. Annoying, but doable. But then I discovered that xlrd uses 0-based row and column indexing, while openpyxl uses 1-based row and column indexing. I went through and changed all those references, one by one. So I started this weekend happily debugging that effort and ran into another issue: openpyxl returns None for empty cells, where xlrd had returned an empty string. So there were places in the code that had tests like if str(cell_value) == ''... which doesn't do what you want it to do if cell_value is None. One final hurdle, one I really didn't expect: pd.read_excel behaves very differently with the two different libraries. In particular, it interprets the parameters header and skiprows differently. With xlrd, it first skips skiprows forward, then an additional header rows to find the header, then reads the data. With openpyxl, it skips header rows forward to find the header, then skiprows after the header to find the data. Not the same at all.
It took me a while to believe my own eyes --- I was so sure I was looking for off-by-one errors, after all. Approximately three line-by-line passes through the solution extraction code, plus debugging, took me pretty much the entire (long) weekend. If I had known, I'd have tried harder to keep using the old library --- but that would probably have caused problems down the road eventually, anyway. Or so I tell myself :-)
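A minimal sketch of two of the gotchas Denise describes, the 0-based vs 1-based indexing and the None vs empty-string difference. This is not the code from the extraction module. The helper names is_empty and cell_value_0based are hypothetical, and FakeSheet is an invented stand-in for an openpyxl worksheet so the sketch runs without a workbook or the library installed.

```python
from typing import Any


def is_empty(cell_value: Any) -> bool:
    """Treat openpyxl's None and xlrd's '' as the same 'empty cell'."""
    # str(None) is 'None', so the old test str(cell_value) == '' misses it.
    return cell_value is None or str(cell_value) == ""


def cell_value_0based(tab: Any, row: int, col: int) -> Any:
    """Read a cell with xlrd-style 0-based indices from an openpyxl-style sheet.

    openpyxl's Worksheet.cell() is 1-based, so both indices are shifted by one.
    """
    return tab.cell(row=row + 1, column=col + 1).value


class _FakeCell:
    """Hypothetical stand-in for an openpyxl cell: only exposes .value."""

    def __init__(self, value: Any) -> None:
        self.value = value


class FakeSheet:
    """Hypothetical stand-in for an openpyxl worksheet, backed by a list of rows."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    def cell(self, row: int, column: int) -> _FakeCell:
        # Mimic openpyxl: 1-based indices, None for cells with no content.
        return _FakeCell(self._rows[row - 1][column - 1])


sheet = FakeSheet([["Year", "TAM"], [2020, None]])
print(cell_value_0based(sheet, 0, 1))            # top row, second column
print(is_empty(cell_value_0based(sheet, 1, 1)))  # empty cell comes back as None
```

Wrapping the index shift in one helper would keep the existing 0-based call sites unchanged, at the cost of hiding openpyxl's native convention. Whether that trade-off suits the extraction code is a judgment call for the team.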