Initial script for issue #23 health & education electricity cluster #208

Sunishchal · 2020-09-06T11:26:51Z

I don't intend to merge this yet; I'm just making my initial attempt visible so I can get Denton's feedback on the file structure & approach.

Sunishchal · 2020-09-06T12:42:13Z

Questions on how to structure this solution:

Should each "cluster" sheet have its own subdirectory within healthandeducation? (ex: healthandeducation/electricity_cluster/__init__.py)
Unit tests for each cluster can exist within each subdirectory (ex: healthandeducation/electricity_cluster/test_electricity.py)
Should each class still be called Scenario()? Or should they be named after their cluster?
Can the population tables or emissions factors be reused from elsewhere in the repo?

Please excuse poor code hygiene & long lines. I plan to format this before the final commit. Which autoformatter do you recommend, if any?

DentonGentry

I hope we can find a way for there to be just one solution/health_and_education/__init__.py file and one Scenario class, because for the overall system we'd like One Drawdown Solution == One Python Class.

We can have Python modules within solution/health_and_education/electricity and so on, and have solution/health_and_education/__init__.py import those modules, but ideally those submodules would not be visible outside of solution/health_and_education/__init__.py

Can the population tables or emissions factors be reused from elsewhere in the repo?

The other solutions use data/unitadoption_pds_population.csv and data/unitadoption_ref_population.csv. From what Chad mentioned in email, Health and Education has different data than the rest of the solutions currently use. I'm not clear on whether that will be the case forever, or is just a temporary condition that Health and Education has updated data which the rest of the solutions have not incorporated yet.

Depending on that answer we'll either want the population data to live in solution/health_and_education/data, or put it in the toplevel data directory with a name that makes it clear it is currently only used in some of the solutions.

DentonGentry · 2020-09-07T22:39:00Z

solution/healthandeducation/__init__.py

+THISDIR = pathlib.Path(__file__).parents[0]
+
+name = 'Health and Education - Electricity Cluster'
+# solution_category = ac.SOLUTION_CATEGORY.REDUCTION #TODO: Confirm this is a reduction solution


I think Health and Education, like the other special solutions such as Food Waste and Plant Rich Diet, is not a Reduction, Replacement, or Land Use solution. It is its own thing. We might want to add a SOLUTION_CATEGORY of SPECIAL to cover it.

If we find that each should be its own SOLUTION_CATEGORY we can add more.

It is a reduction solution. It has a direct impact on the functional demand of... everything.

If I were to uncomment this line, are there any technical reasons REDUCTION would be incompatible with health & education?
I'm unfamiliar with the advanced_controls.py module, but I don't see SOLUTION_CATEGORY mentioned much in that script, so I think it would be safe to leave as REDUCTION.

It would be safe to leave as REDUCTION. I'd expected these to be their own special category, but they don't have to be.

DentonGentry · 2020-09-07T22:46:04Z

solution/healthandeducation/__init__.py

+
+ # Population scenarios
+ # Population_Tables!C2:L49
+ self.ref1_population = pd.DataFrame(ref1_population_list[1:],


For unit tests in solution/health_and_education/tests, it would absolutely make sense to have things like ref1_population_list embedded in the solution/health_and_education/tests/something_test.py file. The list would most likely hold data which originated in the Excel file and is being used for a test.

For this actual solution file, I don't think embedding the data in the __init__.py file in the form of a ref1_population_list is a good idea. Our UI and workflows for researchers to add or adjust data will expect to be able to write out data files, not Python files. Recommend that these be added as CSV files in solution/health_and_education/data, and read them in using pd.read_csv().
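A rough sketch of what that could look like. The filename `ref1_population.csv` is hypothetical, a temporary directory stands in for solution/health_and_education/data so the snippet runs anywhere, and the single data row is the one visible in the diff, labelled 2014 on the assumption that the first row corresponds to the start of the range(2014, 2061) index.

```python
import pathlib
import tempfile

import pandas as pd

# Stand-in for solution/health_and_education/data; a temp dir keeps this runnable.
datadir = pathlib.Path(tempfile.mkdtemp())
csv_path = datadir / 'ref1_population.csv'  # hypothetical filename

# Header and first row taken from the Population_Tables excerpt in this review
# (first four regions only). The year label 2014 is an assumption.
csv_path.write_text(
    'Year,World,OECD90,Eastern Europe,Asia (Sans Japan)\n'
    '2014,7349.47210,929.27447,407.26543,3957.23614\n'
)

# Researchers edit the CSV; the solution code only reads it.
ref1_population = pd.read_csv(csv_path, index_col='Year')
print(ref1_population.loc[2014, 'World'])
```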

Sounds good. So any data that researchers interact with should be in CSV files (Population_Tables). But for intermediate tables, those should exist as lists in the unit test files (tables 3-14 in Electricity_cluster)?

What about Table 2 (REF 2 TAM)? That one should be coming from other electricity solutions, so I assume it's okay to leave in python lists for now until we identify how to source the electricity solution data from somewhere like solarpvutil.Scenario().tm.ref_tam_per_region()? This relates to question #2 from my email last week.

I think it is fine for unit tests to use inline data.

I'd prefer that the actual solutions not do so, unless we have no other alternative for something. I don't expect the UI to be able to modify *.py files; anything inline in Python source will require someone with an editor to modify the Python source.

I decided to depart from the inline data and just copy and paste the entire electricity_cluster Excel sheet into a CSV for unit testing: https://github.com/ProjectDrawdown/solutions/blob/395ed46819ea6f90bcb29b04c8f3bd03bd1a46a9/solution/health_and_education/tests/expected_elec_cluster.csv

Let me know if this approach is kosher. I saw test_excel_integration.py uses similar methods except the CSV lives inside a zip file. I can follow that approach here too if you prefer.
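For reference, a small sketch of the "expected CSV" style of test. The column name, the numbers, and `compute_table()` are made up for illustration; the real expected_elec_cluster.csv and the cluster code are not reproduced here.

```python
import io

import pandas as pd

# Placeholder contents standing in for a slice of an expected-values CSV.
EXPECTED_CSV = (
    'Year,Demand\n'
    '2014,10.0\n'
    '2015,12.5\n'
)


def compute_table():
    """Hypothetical stand-in for one of the cluster's intermediate tables."""
    return pd.DataFrame({'Demand': [10.0, 12.5]},
                        index=pd.Index([2014, 2015], name='Year'))


def test_table_matches_expected():
    expected = pd.read_csv(io.StringIO(EXPECTED_CSV), index_col='Year')
    # check_exact=False tolerates small floating-point differences.
    pd.testing.assert_frame_equal(compute_table(), expected, check_exact=False)


test_table_matches_expected()
```

Whether the CSV sits loose in the tests directory or inside a zip file only changes how the file is opened; the comparison itself would stay the same.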

DentonGentry · 2020-09-07T22:50:55Z

solution/healthandeducation/__init__.py

@@ -0,0 +1,447 @@
+"""Health & Education solution model for Electricity Cluster


healthandeducation is difficult to understand if you don't already know what it is. I keep seeing the word "hand" in it. Recommend naming the subdirectory health_and_education instead.

Agreed. I was initially following the same "no snake case" convention as the other solutions, but I think it makes sense to add underscores here. I doubt that "hand education" would be an effective climate solution ✋😂

lol... that is pretty good, Dev

DentonGentry · 2020-09-07T22:53:22Z

solution/healthandeducation/__init__.py

+DATADIR = pathlib.Path(__file__).parents[2].joinpath('data')
+THISDIR = pathlib.Path(__file__).parents[0]
+
+name = 'Health and Education - Electricity Cluster'


I'm hoping that solution/health_and_education can handle all of the clusters, such that this __init__.py would handle all of them. Its name would be 'Health and Education' without a specific cluster.

I've moved the bulk of this code out to a new electricity_cluster.py file and simply instantiate the object in __init__.py. Let me know if this is the structure you had in mind. We'll probably need slightly different abstractions later to handle different complexities as I implement more clusters.

I think that will be fine. If necessary it can be put into electricity_cluster/__init__.py and imported as a module, if we find that there need to be more files encapsulated within electricity_cluster/*

DentonGentry · 2020-09-07T22:54:23Z

solution/healthandeducation/__init__.py

+# % impact of educational attainment on uptake of Family Planning:
+fixed_weighting_factor = None
+pct_impact = 0.50
+use_fixed_weight = 'N'


If these need to vary by scenario they'll need to be in advanced_controls.py instead of here. It is fine to have them here for now until we figure out whether they vary by scenario.
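To make the distinction concrete, here is a hedged sketch of the two cases using a plain dataclass. `ClusterAssumptions`, the scenario names, and the 0.25 override are invented for illustration; this does not show how advanced_controls.py actually stores its values.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ClusterAssumptions:
    """Hypothetical container; defaults are the values from the diff above."""
    fixed_weighting_factor: Optional[float] = None
    pct_impact: float = 0.50
    use_fixed_weight: str = 'N'


DEFAULTS = ClusterAssumptions()

# If a value turns out to vary by scenario, each scenario carries its own copy.
# The scenario name and the 0.25 value are invented for illustration.
scenarios = {
    'default': DEFAULTS,
    'example_low_impact': replace(DEFAULTS, pct_impact=0.25),
}

print(scenarios['example_low_impact'])
```

If the values never vary, the module-level constants in the diff are enough; the per-scenario copy only matters once a second scenario needs a different number.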

Noted this down as a comment so I remember to revisit it later. For now, will keep them local to each cluster's .py file.

DentonGentry · 2020-09-07T22:58:25Z

solution/healthandeducation/__init__.py

+ 'Asia (Sans Japan)': 'Y', 
+ 'Middle East and Africa': 'Y', 
+ 'Latin America': 'N'
+}


This is something we'll need to revisit: we're trying very hard to minimize the number of places in the codebase which know the names of the regions, like 'Middle East and Africa'. Right now we've gotten it down to just model/dd.py and model/emissions_factors.py, which has a notion of how clean the electric grid is in each region.

The intent is to let this model implementation function at the level of the whole world, and also be able to support regionalization to model the United States or India or the EU as the top level and have regions within it which are sensible for that purpose.

It may be that lldc_high_nrr_config only makes sense at the level of the entire world, and that when modelling the US, India, or the EU, all of the regions within would be either 'Y' or 'N'.

Shall I plan to add this lldc_high_nrr dictionary as a variable in dd.py (similar to dd.REGION)?

Sure, that would be fine. It will need a good description of what it is, since it won't be as widely used as the other things in that file.

DentonGentry · 2020-09-07T23:00:50Z

solution/healthandeducation/__init__.py

+ # Population_Tables!O2:X49
+ self.ref2_population = pd.DataFrame(ref2_population_list[1:],
+ columns=ref2_population_list[0],
+ index=list(range(2014, 2061)), dtype=np.float64)


Making the data come from a CSV file means one of the columns should be the Year, and use .set_index() on the DataFrame.

The danger of using range(2014, 2061) for the index is that the data is decoupled from the year. Early on we had a bunch of bugs dealing with shifting years out from under the data, and adopted a policy of keeping the year indexes together with the data.

Agreed, it's safer to have an explicit Year column. I saw places in the codebase like test_tam.py which use the range(2014, 2061) method, so I mimicked that.
I've now abstracted this into the CSV file. I didn't use the .set_index() method you mentioned, but achieved the same effect by passing the index_col='Year' argument into pd.read_csv().
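A quick sketch of both points, using placeholder values: `index_col='Year'` and `.set_index('Year')` give the same frame, and a hard-coded range can drift away from the data where a Year column cannot.

```python
import io

import pandas as pd

# Placeholder values; only the shape matters here.
CSV_TEXT = 'Year,World\n2014,1.0\n2015,2.0\n2016,3.0\n'

via_index_col = pd.read_csv(io.StringIO(CSV_TEXT), index_col='Year')
via_set_index = pd.read_csv(io.StringIO(CSV_TEXT)).set_index('Year')
pd.testing.assert_frame_equal(via_index_col, via_set_index)

# If the 2014 row is dropped from the file, the Year column still labels the
# remaining rows correctly...
trimmed = pd.read_csv(io.StringIO('Year,World\n2015,2.0\n2016,3.0\n'),
                      index_col='Year')
print(trimmed.loc[2015, 'World'])     # 2.0

# ...whereas a hard-coded range silently relabels them.
relabelled = pd.DataFrame({'World': [2.0, 3.0]}, index=list(range(2014, 2016)))
print(relabelled.loc[2014, 'World'])  # 2.0, which is really the 2015 value
```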

DentonGentry · 2020-09-07T23:02:24Z

solution/healthandeducation/__init__.py

+ if lldc_high_nrr_config['Asia (Sans Japan)'] == 'N':
+ ref1_elec_gen.loc[:, 'MDC + EE + LAC with higher educational attainment'] = ref1_elec_gen.loc[:, 'MDC + EE + LAC with higher educational attainment'] - self.ref1_tam_high_edu.loc[:, 'China']
+
+ ref1_elec_gen.loc[:, 'Total Electricity Demand in Countries with Low Educational Attainment (TWh), exluding China'] = ref1_elec_gen.loc[:, 'LLDC with low educational attainment, excluding China'] + ref1_elec_gen.loc[:, 'MDC + EE + LAC with low educational attainment, excluding China']


A big reason for wanting these to come from data files is that embedding them within the Python code will make them more difficult to change as the researchers add data.

Does this mean I should remove the Asia (Sans Japan) reference in line 136? How might I abstract that into a data file?
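One possible shape for this, offered as an assumption rather than something settled in this review: keep the per-region flags in a small CSV and have the code iterate over whatever regions the file lists. The filename, the `LLDC_high_NRR` column name, and `load_region_flags()` are hypothetical, and only the three regions visible in the diff are included.

```python
import csv
import io

# Stand-in for a file such as solution/health_and_education/data/lldc_high_nrr.csv
# (hypothetical name). Only the three regions visible in the diff are listed.
CONFIG_CSV = (
    'Region,LLDC_high_NRR\n'
    'Asia (Sans Japan),Y\n'
    'Middle East and Africa,Y\n'
    'Latin America,N\n'
)


def load_region_flags(text):
    """Return {region: bool} from CSV text with Region and LLDC_high_NRR columns."""
    reader = csv.DictReader(io.StringIO(text))
    return {row['Region']: row['LLDC_high_NRR'] == 'Y' for row in reader}


flags = load_region_flags(CONFIG_CSV)

# The model code can then loop over the regions the file lists instead of
# testing lldc_high_nrr_config['Asia (Sans Japan)'] by name.
not_flagged = [region for region, is_high in flags.items() if not is_high]
print(not_flagged)  # ['Latin America']
```

A regionalized model (US, India, EU) could then ship its own file with its own region names, and the Python would not need to change.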

DentonGentry · 2020-09-07T23:05:27Z

solution/healthandeducation/__init__.py

+# Population_Tables!C2:L49
+ref1_population_list = [
+ ['World', 'OECD90', 'Eastern Europe', 'Asia (Sans Japan)', 'Middle East and Africa', 'Latin America', 'China', 'India', 'EU', 'USA'],
+ [7349.47210, 929.27447, 407.26543, 3957.23614, 1421.29910, 634.39696, 1376.04894, 1311.05053, 738.44207, 321.77363],


Recommend these go into a data subdirectory within solution/health_and_education.

Note that there is a data/health directory at the toplevel which we should rename. That holds information about the health of the models, like how many use Custom PDS Adoption and whether they have regional data. At the time, there wasn't a Health and Education sector; there was Women and Girls, so we didn't see the name conflict.

I've moved them to CSVs inside solution/health_and_education/data.

Would you like me to rename the data/health directory, or were you just mentioning this for my understanding?

Just mentioning, I wouldn't rename data/health in this same set of commits.

Sunishchal · 2020-09-12T07:57:23Z

I used many country/region references in the electricity_cluster implementation because I wanted it to be as explicit and similar to the Excel as possible for easy debugging. Shall I plan on changing the column naming convention or using something like .iloc[] which doesn't rely on strings of column names?

Sunishchal · 2020-09-14T10:20:54Z

I've added an initial stab at a unit testing file. I went the route of adding a huge CSV file with all of the intermediate tables from Chad's Excel file to use as the "expected df" (can turn this into a zip file later). Only added a single unit test so far. Let me know if this setup will work and I'll flesh out the rest of the tests.

I already see that the test_electricity.py file has failed the build. This is because I had to resort to a sys.path.append() to get my electricity_cluster module to import, since I was unable to get it to import naturally like the rest of the modules in this repo. Could you help me work around this?

DentonGentry · 2020-09-20T06:00:05Z

Shall I plan on changing the column naming convention or using something like .iloc[] which doesn't rely on strings of column names?

iloc tends to be fragile, especially when using it to address rows. For example, if two researchers both add a line at the end of the file and one of them gets a merge conflict, the resolution will move the new row and any iloc[] trying to reference it will be off-by-one.

We've tried to mostly only use iloc[0], meaning "the first data point no matter what the year is."
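A tiny illustration of the off-by-one, with placeholder values: once a row moves, positional access returns a different year's value while label-based access does not.

```python
import pandas as pd

# Placeholder values, indexed by year.
df = pd.DataFrame({'World': [1.0, 2.0, 3.0]},
                  index=pd.Index([2014, 2015, 2016], name='Year'))

# A row ends up ahead of the existing ones, for example after a merge.
extended = pd.concat([
    pd.DataFrame({'World': [0.5]}, index=pd.Index([2013], name='Year')),
    df,
])

# Positional access shifts by one; label-based access is unaffected.
print(df.iloc[1]['World'], extended.iloc[1]['World'])      # 2.0 1.0
print(df.loc[2015, 'World'], extended.loc[2015, 'World'])  # 2.0 2.0
```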

DentonGentry · 2020-09-20T06:01:36Z

This is because I had to resort to a sys.path.append() to get my electricity_cluster module to import, since I was unable to get it to import naturally like the rest of the modules in this repo. Could you help me work around this?

This may end up being a reason to add it as electricity_cluster/__init__.py instead of electricity_cluster.py, so it will import as a regular module.
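As background, a self-contained sketch of how a directory with an __init__.py resolves as a package. It builds a throwaway tree in a temp directory; `hae_demo`, `clusters`, and `electricity` are hypothetical names, and the snippet is not a diagnosis of the failing build in this PR.

```python
import importlib
import pathlib
import sys
import tempfile

# Build a throwaway tree: <root>/hae_demo/clusters/{__init__.py, electricity.py}.
# All names here are hypothetical.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / 'hae_demo' / 'clusters'
pkg.mkdir(parents=True)
(root / 'hae_demo' / '__init__.py').write_text('')
(pkg / '__init__.py').write_text('from . import electricity\n')
(pkg / 'electricity.py').write_text("NAME = 'electricity'\n")

# Imports resolve against sys.path, so the tree's root has to be on it.
# Here it is added explicitly only because the tree lives in a temp dir.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
try:
    clusters = importlib.import_module('hae_demo.clusters')
finally:
    sys.path.remove(str(root))

print(clusters.electricity.NAME)  # electricity
```

In a checkout, invoking the tests as `python -m pytest` from the repository root puts that root on sys.path, which is one way to reach top-level packages without a sys.path.append() inside the module.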

DentonGentry · 2020-09-20T06:02:04Z

Oops. Didn't intend to close it, reopened.

Sunishchal · 2020-09-26T08:28:26Z

This may end up being a reason to add it as electricity_cluster/__init__.py instead of electricity_cluster.py, so it will import as a regular module.

I made the requested change in this commit. However, I opted to name the directory "clusters" rather than "electricity_cluster" so it can house .py files from all the clusters and share the same init. Let me know if this structure is suitable.

However, I am still unable to import upper level modules like model.dd without having a sys.path.append, as you'll see in this commit. Is this okay to keep or is there some other convention we prefer to follow?

Sunishchal · 2020-09-26T08:31:59Z

iloc tends to be fragile, especially when using it to address rows. For example, if two researchers both add a line at the end of the file and one of them gets a merge conflict, the resolution will move the new row and any iloc[] trying to reference it will be off-by-one.

Good to know. In that case I will keep the verbose strings for column indexing. Point noted about using iloc in case I need to access the first row regardless of year.

brodavi · 2021-06-15T22:12:53Z

@Sunishchal is this PR still valid, or is it obsolete? I see some tests failed, is that something we should try to address?

Sunishchal · 2021-06-15T22:58:59Z

@Sunishchal is this PR still valid, or is it obsolete? I see some tests failed, is that something we should try to address?

@brodavi It's still valid but not ready to merge for a while until we figure out how to integrate this model with the rest of the codebase. I just had a chat with @denised and it sounds like we'll need to spend some time understanding how to do that since this model is quite different from other solutions.

Initial script for issue ProjectDrawdown#23 health & education electricity cluster

6ac8160

Testing TAM output from solarpvutil, commenting due to discrepancy

30cabdf

DentonGentry reviewed Sep 7, 2020

Sunishchal added 2 commits September 12, 2020 01:11

Modified folder structure and added CSV data files

81fbfd4

Added unit testing for electricity cluster

0dce4eb

Sunishchal requested a review from DentonGentry September 14, 2020 10:21

Sunishchal added 2 commits September 19, 2020 02:34

Made formatting adjustments to validation data to ensure unit tests run successfully

e818bdb

Added unit tests for all implemented electricity cluster tables (1-6)

395ed46

DentonGentry closed this Sep 20, 2020

DentonGentry reopened this Sep 20, 2020

Sunishchal added 2 commits September 26, 2020 01:21

Modifying structure to support multiple clusters

45a20eb

Adding path to support upper level module imports

e2a64df

Sunishchal added 10 commits September 26, 2020 01:40

Moving LLDC config to model.dd

f99b4ee

Implemented tables 7 and 8 in electricity cluster

96a02a6

Implemented code & tests for tables 9 - 14 in electricity cluster

1f08fa0

Implemented total CO2 reduction readouts

c714901

Test for final CO2 readouts

9ba2439

Generalized class structure and implemented space heating cluster

e3839b2

Implemented space cooling cluster

c71af0b

Formatting updates

578d892

Instantiation of clean cookstoves cluster

78d9773

Completed clean cookstoves cluster

532bbad

Sunishchal added 10 commits December 29, 2020 01:53

Working implementation for commercial light cluster

c3c8b08

Residential light cluster working implementation

14cc7e5

Working water heating cluster implementation

75d58aa

Working paper cluster implementation

13da3be

Assumptions cleanup

9edcd16

Working implementation for water cluster

918c84b

Working implementation of plastics cluster

ee17959

Modified test spreadsheet

af2ebaa

Removing old init

b5ff5fb

Changing to use newer version of python

aeb111d

denised marked this pull request as draft June 16, 2021 05:22

denised linked an issue Jun 26, 2021 that may be closed by this pull request

Health & Education solutions #23

Open
