Merge pull request #408 from singularity-energy/development

v0.6.0
singularity-energy · Dec 24, 2024 · 6820554
2 parents ea2bf19 + 33057b5
commit 6820554

Showing 43 changed files with 7,406 additions and 6,119 deletions.
diff --git a/CITATION.cff b/CITATION.cff
@@ -22,6 +22,6 @@ authors:
 identifiers:
   - type: doi
     value: 'https://zenodo.org/doi/10.5281/zenodo.7062459'
-version: 0.5.0
+version: 0.6.0
 license: MIT
-date-released: '2024-08-01'
+date-released: '2024-12-24'
diff --git a/Pipfile b/Pipfile
@@ -8,6 +8,7 @@ cvxopt = "*"
 cvxpy = "*"
 dask = "< 2024.3.0"
 osqp = "*"
+ipynb = "*"
 ipykernel = "*"
 notebook = "*"
 numpy = "*"

diff --git a/Pipfile.lock b/Pipfile.lock
diff --git a/README.md b/README.md
@@ -43,7 +43,7 @@ pip install .
 The pipeline can be run as follows:
 ```bash
 cd src/oge
-python data_pipeline.py --year 2022
+python data_pipeline.py --year 2023
 ```
 independently of the installation method you chose.
 
@@ -56,6 +56,16 @@ Parts of the input data used for the Open Grid Emissions dataset is released by
 
 Updated datasets will also be published whenever a new version of the open-grid-emissions repository is released.
 
+### Running the pipeline with early release data
+The OGE pipeline can be used to generate data using Early Release EIA data as soon as it is integrated into the PUDL nightly builds. In order to do that, `constants.current_early_release_year` must be updated to the current early release year (such that `current_early_release_year` is 1 year greater than `latest_validated_year`). Early release data is typically available from EIA in June/July of the following year, and is integrated into PUDL shortly thereafter.
+
+In addition, you will need to download and use the pudl nightly build data until the data becomes available through a stable release. To do so, you need to set your `PUDL_BUILD` environment variable to "nightly". You can do this through the command line using `set PUDL_BUILD=nightly` (for Windows), or by adding the following to the `__init__.py` file in `src/oge`:
+```python
+import os
+
+os.environ["PUDL_BUILD"] = "nightly"
+```
+
 ## Contribute
 There are many ways that you can contribute!
  - Tell us how you are using the dataset or python tools
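
The early-release setup added to the README in the hunk above comes down to two conditions: `current_early_release_year` is exactly one year greater than `latest_validated_year`, and `PUDL_BUILD` is set to "nightly". Below is a minimal sketch of a pre-flight check for both. The helper names `check_early_release_year` and `using_nightly_build` are hypothetical and not part of the OGE codebase, and the years in the usage lines are illustrative only.

```python
import os
from typing import Mapping


def check_early_release_year(
    latest_validated_year: int, current_early_release_year: int
) -> None:
    """Raise if the early release year is not exactly one year after the
    latest validated year (the relationship stated in the README)."""
    if current_early_release_year != latest_validated_year + 1:
        raise ValueError(
            f"current_early_release_year ({current_early_release_year}) must be "
            f"latest_validated_year + 1 ({latest_validated_year + 1})"
        )


def using_nightly_build(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when the PUDL_BUILD environment variable is 'nightly'."""
    return env.get("PUDL_BUILD") == "nightly"


# Illustrative values only; the real ones live in the repository's constants.
check_early_release_year(latest_validated_year=2023, current_early_release_year=2024)
print(using_nightly_build({"PUDL_BUILD": "nightly"}))  # True
```

Passing the environment as a mapping keeps the check testable without changing the real process environment.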

diff --git a/docs/docs/Data Validation/Comparing Data to eGRID.md b/docs/docs/Data Validation/Comparing Data to eGRID.md
@@ -1,7 +1,7 @@
 ---
 stoplight-id: egrid_comparison
 ---
-Although the OGE methodology is based on the EPA's eGRID methodology, there are some important differences. Thus, if comparing OGEI data to eGRID data, it is important to keep the following differences in mind:
+Although the OGE methodology is based on the EPA's eGRID methodology, there are some important differences. Thus, if comparing OGE data to eGRID data, it is important to keep the following differences in mind:
 
 
 <table>

diff --git a/...igning Hourly Profiles to Monthly Data/Shaping Using Fleet-Specific Profiles.md b/...igning Hourly Profiles to Monthly Data/Shaping Using Fleet-Specific Profiles.md
@@ -2,18 +2,18 @@
 stoplight-id: shaping_fleet_data
 ---
 
-## Shaping EIA-only without plant-specific profiles
+## Shaping EIA-only without generator-specific profiles
 
 One of the primary innovations of the Open Grid Emissions Initiative is its approach to assigning an hourly profile to monthly generation and fuel data reported in EIA-923. This is accomplished using hourly regional generation fleet data reported in EIA-930 (also known as the “Hourly Electric Grid Monitor”). For each regional balancing authority, EIA-930 reports the hourly net generation from all generators of each fuel category (e.g. coal, natural gas, hydro, solar, etc) in that region (which we will refer to as a “fleet”). Since we know the total net generation profile of each fleet, as well as the net generation profile reported to CEMS, we can calculate a residual profile that should theoretically reflect the aggregate profile of all generators in a fleet that do not report to CEMS. This fleet-specific profile can then be used to shape the monthly total data reported to EIA-923. Although this method still has several issues (which will be discussed later), we believe this to be the best currently available method for imputing these shapes, since it is based on observed data.
 
-> Definition: a group of all plants in a balancing authority that have the same fuel category (e.g. coal plants in MISO) are referred to as a "fleet"
+> Definition: a group of all subplants in a balancing authority that have the same fuel category (e.g. coal subplants in MISO) are referred to as a "fleet"
 
 
 ## Calculating a residual profile
 
 At its most basic, calculating a residual hourly profile that reflects the generation profile of the part of each fleet that does not report to CEMS involves subtracting the hourly CEMS net generation profile for that fleet from the hourly total EIA-930 profile for that fleet.
 
-To prepare the EIA-930 data for this calculation, it is cleaned and reconciled using a process described elsewhere in this documentation. To prepare the CEMS data for this calculation, first all previously-shaped data (CEMS, partial CEMS subplant, and partial CEMS plant) is added together, and aggregated by balancing authority and fuel category. Each plant may be assigned one of 41 unique primary fuels, but EIA-930 only reports generation totals for 8 broader fuel categories (solar, wind, hydro, nuclear, natural gas, coal, petroleum, and other), each specific energy source type is mapped to one of these broader categories for aggregation.
+To prepare the EIA-930 data for this calculation, it is cleaned and reconciled using a process described elsewhere in this documentation. To prepare the CEMS data for this calculation, first all previously-shaped data (CEMS, partial CEMS subplant, and partial CEMS plant) is added together, and aggregated by balancing authority and fuel category. Each subplant may be assigned one of 41 unique primary fuels, but EIA-930 only reports generation totals for 8 broader fuel categories (solar, wind, hydro, nuclear, natural gas, coal, petroleum, and other), each specific energy source type is mapped to one of these broader categories for aggregation.
 
 In some cases, the CEMS profile for a fleet will be larger than the EIA-930 fleet total. This may be a result of several inconsistencies in the way that generation data is reported to EIA-930 versus EIA-860 and EIA-923. These issues are discussed in more depth below, but include instances when generation is reported to a different balancing authority, is categorized under a different fuel category, or not reported to EIA-930 because the plant it is associated with is connected to the distribution grid, rather than the transmission grid.
 
@@ -35,7 +35,7 @@ If no data is available from neighboring BAs, we instead use a national average
 
 If neither a good-quality residual profile or shifted residual profile are available for a given fleet, we first fall back to using the total EIA-930 fleet profile to estimate the profile. If a complete EIA-930 fleet profile is not available, we fall back to using the fleet profile for those plants that report to CEMS. Both of these approaches assume that plants of a certain fuel type in a given region generally operate in similar patterns (which may not always be the case). The EIA-930 profile is preferred to the CEMS profile because we assume that the average profile of the entire fleet better represents any single member of the fleet than the profile of a non-random sample of the fleet (CEMS data only represents relatively large generators > 25MW).
 
-If all other attempts at imputing a reasonable hourly profile fail, we assign a flat hourly profile to the data (which is functionally the same as using a monthly average value. For certain fuels, like geothermal, nuclear, biomass, and waste, which tend to run as baseload resources, this may be a reasonable assumption.
+If all other attempts at imputing a reasonable hourly profile fail, we assign a flat hourly profile to the data (which is functionally the same as using a monthly average value). For certain fuels, like geothermal, nuclear, biomass, and waste, which tend to run as baseload resources, this may be a reasonable assumption.
 
 
 ## Known issues with using EIA-930 data for hourly profiles
@@ -70,16 +70,12 @@ The primary fuel codes assigned to each plant in the OGE pipeline may not match
 Our understanding is that the data published in EIA-930 only reflects plants have metered telemetry that communicates with the grid operator. In many cases, this may exclude plants that are connected to distribution grids, rather than directly interconnected at the transmission level. Thus, it is possible that the set of plants reported in EIA-930 may not reflect the full set of plants that report to EIA-923 (which includes plants that are connected to distribution grids as well).
 
 ## Shaping the Monthly data
-Once an hourly profile for each fleet-month has been calculated, it is converted to a percentage of the monthly total value. These percentages are then multiplied by each monthly total value to get the hourly profile for each fleet. This approach ensures that the shaped hourly values, when aggregated back to the monthly level, will equal the total reported monthly value that was shaped.
+Once an hourly profile for each fleet-month has been calculated, it is converted to a percentage of the monthly total value. These percentages are then multiplied by each monthly total value to get the hourly profile for each subplant. This approach ensures that the shaped hourly values, when aggregated back to the monthly level, will equal the total reported monthly value that was shaped.
 
-Currently, we only report hourly data for these EIA-only plants at the fleet level, rather than the individual plant level. We do this for several reasons:
-1. Although the hourly CEMS data represents approximately 90% of all electricity-related emissions, the EIA data that is being shaped accounts for approximately 80% of all subplant operating hours in the dataset, meaning that there is a large amount of small plants that need to be shaped. Thus, creating an hourly value for each of these plants would result in a huge dataset that many computers do not have the ability to store in their memory (RAM). Thus, shaping fleet-level data helps keep the size of this dataset managable.
-2. Because the method used to shape the data relies on fleet-level observations (rather than plant-specific imputation), we feel that providing a plant-level hourly value may create a sense of false precision in the end result.
-
-Because aggregating the plant-level data to the fleet level removes the `plant_id_eia` identifier, we create a synthetic `shaped_plant_id` to represent these aggregated plants. These synthetic ids are 6-digit identifiers that follow the format `9BBBFF` where `BBB` is the three-digit ba number identified in [this table](https://github.com/singularity-energy/open-grid-emissions/blob/main/data/manual/ba_reference.csv), and `FF` represents a two-digit number unique to each fuel category, and defined [here](https://github.com/singularity-energy/open-grid-emissions/blob/afb3ddec0dc93003c21f655b90300c17344107f8/src/impute_hourly_profiles.py#L11). A mapping of `plant_id_eia` to `shaped_plant_id` for each plant can be found in the `plant_static_attributes` table that is included in the dataset.
+When exporting hourly, plant-level data, we export hourly data for each individual plant. However, this data is too large to hold in memory, so this data is created for each BA and then exported. When calculating hourly, fleet-level outputs, we first aggregate the EIA-923 data to the fleet level before assigning a shape. 
 
 ## Future Work, Known Issues, and Open Questions
-- Infer missing hourly profiles for hydro generation ([details](https://github.com/singularity-energy/open-grid-emissions/issues/37)
-- Infer hourly profiles for energy storage charge and discharge ([details](https://github.com/singularity-energy/open-grid-emissions/issues/59)
-- Should we model hourly shapes for missing peaker or load following genration? ([details](https://github.com/singularity-energy/open-grid-emissions/issues/96)
-- Improve imputation of missing wind and solar generation profiles ([details](https://github.com/singularity-energy/open-grid-emissions/issues/171)
+- Infer missing hourly profiles for hydro generation ([details](https://github.com/singularity-energy/open-grid-emissions/issues/37))
+- Infer hourly profiles for energy storage charge and discharge ([details](https://github.com/singularity-energy/open-grid-emissions/issues/59))
+- Should we model hourly shapes for missing peaker or load following genration? ([details](https://github.com/singularity-energy/open-grid-emissions/issues/96))
+- Improve imputation of missing wind and solar generation profiles ([details](https://github.com/singularity-energy/open-grid-emissions/issues/171))
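
The residual-profile and monthly-shaping steps described in the documentation diff above can be sketched in plain Python. This is a simplified illustration, not the pipeline's implementation. The function names are hypothetical, the example uses four hours instead of a full month, and falling back to a flat profile when the profile sums to zero is an assumption made here so the example stays well defined. The documentation's handling of hours where CEMS exceeds the EIA-930 total is not reproduced.

```python
from typing import List, Sequence


def residual_profile(
    eia930_hourly: Sequence[float], cems_hourly: Sequence[float]
) -> List[float]:
    """Subtract the hourly CEMS net generation of a fleet from the hourly
    EIA-930 total for that fleet."""
    if len(eia930_hourly) != len(cems_hourly):
        raise ValueError("profiles must cover the same hours")
    return [total - cems for total, cems in zip(eia930_hourly, cems_hourly)]


def shape_monthly_total(profile: Sequence[float], monthly_total: float) -> List[float]:
    """Convert an hourly profile to fractions of its own sum, then multiply
    by the reported monthly total."""
    if not profile:
        raise ValueError("profile must contain at least one hour")
    profile_sum = sum(profile)
    if profile_sum == 0:
        # Assumption for this sketch: use a flat profile when no shape exists.
        return [monthly_total / len(profile)] * len(profile)
    return [monthly_total * hour / profile_sum for hour in profile]


# Hypothetical fleet data (MWh) for four hours.
eia930 = [100.0, 120.0, 150.0, 130.0]
cems = [60.0, 70.0, 90.0, 80.0]

residual = residual_profile(eia930, cems)
shaped = shape_monthly_total(residual, monthly_total=1000.0)

print(residual)  # [40.0, 50.0, 60.0, 50.0]
print(shaped)    # [200.0, 250.0, 300.0, 250.0]
```

Because each hour is scaled by its share of the profile sum, the shaped values add back up to the monthly total that was shaped, which is the property the documentation calls out.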