Keep track of historical load shedding #466

Open
9 tasks
beyarkay opened this issue Aug 31, 2023 · 7 comments

Comments

@beyarkay
Owner

Keeping track of historical loadshedding is technically feasible at the moment, but it isn't easy to accomplish.

Basically all the information is in the git log for the file manually_specified.yaml, but extracting and compiling it would be a pain.

Probably the easiest way to make historical loadshedding data available would be to have a CI/CD script that runs every time the calendars get built. This script should calculate the historical loadshedding (either by updating the previously calculated data or by recalculating everything from scratch) and emit a file containing that information.

For parsing, it would be easiest if that file were formatted in the same way as manually_specified.yaml:

changes:
- stage: 4
  start: 2023-08-31T14:00:00
  finsh: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
- stage: 2
  start: 2023-09-02T05:00:00
  finsh: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
...

Keeping the format the same would mean the main codebase is equally able to calculate historical loadshedding and future loadshedding. However, it shouldn't be too much work to parse some different format, if that format provided some benefits.

Note that YAML is a superset of JSON and also accepts a JSON-like inline ("flow") syntax, so the snippet below is valid YAML while taking up far fewer lines:

changes:
- { stage: 4, start: 2023-08-31T14:00:00, finsh: 2023-09-02T05:00:00, source: https://twitter.com/Eskom_SA/status/1697210092179935262, exclude: coct }
- { stage: 2, start: 2023-09-02T05:00:00, finsh: 2023-09-02T16:00:00, source: https://twitter.com/Eskom_SA/status/1697210092179935262, exclude: coct }

Keeping the format the same is not a hard requirement, but alternatives should be properly motivated.

Here's a high level checklist:

  • Write a script (python/rust) that can run locally and create one file containing the entire history of loadshedding.
  • Add tests for the above script.
    • It should fail gracefully and handle edge cases/date boundaries properly.
    • Ensure that it handles the fact that Cape Town and Eskom often have different schedules.
    • If your script requires network access, ensure it fails gracefully when the network is unavailable.
    • Often the VMs that run the GH actions only download a portion of the repo (since full git history isn't really required). Ensure your script handles this properly by either downloading the full history or otherwise making a plan.
  • Ask @beyarkay for access to the eskom-calendar-dev repo. This is a private mirror of eskom-calendar, used to test CI/CD things. You can also set up your own, but getting the private GitHub keys set up (which allow GH actions to run faster) can be a pain.
  • Integrate your script into the publish-calendars workflow. You'll probably want to add a step after the 'upload to pastebin' step (here). You'll also need to make sure your script writes to the calendars/ directory, as that's the only directory that gets uploaded to GitHub releases.
  • Manually test out the script a few times, updating manually_specified.yaml and asserting that the updated changes get properly integrated.
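
A minimal sketch of the history-walking step from the checklist, assuming Python and a local clone. The helper names (run_git, ensure_full_history, parse_log, file_versions) are hypothetical and not part of eskom-calendar. It un-shallows the clone if needed, then yields each committed version of manually_specified.yaml, oldest first:

```python
import subprocess
from typing import Iterator, List, Tuple

YAML_PATH = "manually_specified.yaml"  # the file named in this issue


def run_git(args: List[str], repo: str = ".") -> str:
    """Run a git command inside `repo` and return its stdout (raises on failure)."""
    done = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return done.stdout


def ensure_full_history(repo: str = ".") -> None:
    """Fetch the remaining history if this is a shallow clone (common on CI)."""
    if run_git(["rev-parse", "--is-shallow-repository"], repo).strip() == "true":
        run_git(["fetch", "--unshallow"], repo)


def parse_log(log_output: str) -> List[str]:
    """Turn `git log --format=%H` output into commit hashes, oldest first."""
    hashes = [line.strip() for line in log_output.splitlines() if line.strip()]
    return list(reversed(hashes))  # git log prints the newest commit first


def file_versions(repo: str = ".", path: str = YAML_PATH) -> Iterator[Tuple[str, str]]:
    """Yield (commit_hash, file_contents) for every commit that touched `path`."""
    ensure_full_history(repo)
    for commit in parse_log(run_git(["log", "--format=%H", "--", path], repo)):
        try:
            yield commit, run_git(["show", f"{commit}:{path}"], repo)
        except subprocess.CalledProcessError:
            continue  # the file does not exist at this commit (e.g. it was deleted)
```

In a GitHub Actions workflow, setting fetch-depth: 0 on the actions/checkout step is another way to get the full history up front.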

If the above are done, then all should be good! @beyarkay will check things over and merge.

@keeganwhite

What do you think of the following for the historical data format (following a similar pattern to manually_specified.yaml):

historical_changes:
  - stage: 4
    start: 2023-08-31T14:00:00
    finsh: 2023-09-02T05:00:00
    source: https://twitter.com/Eskom_SA/status/1697210092179935262
    exclude: coct
    historical: false
  - stage: 2
    start: 2023-09-02T05:00:00
    finsh: 2023-09-02T16:00:00
    source: https://twitter.com/Eskom_SA/status/1697210092179935262
    exclude: coct
    historical: true

where historical is a boolean field that is true for entries that come from historical data and false for new entries.

@beyarkay
Owner Author

beyarkay commented Sep 7, 2023

I'm actually wondering if there are any disadvantages to keeping the format identical, so future changes look the same as historical changes:

historical_changes:
  - stage: 4
    start: 2023-08-31T14:00:00
    finsh: 2023-09-02T05:00:00
    source: https://twitter.com/Eskom_SA/status/1697210092179935262
    exclude: coct
  - stage: 2
    start: 2023-09-02T05:00:00
    finsh: 2023-09-02T16:00:00
    source: https://twitter.com/Eskom_SA/status/1697210092179935262
    exclude: coct

Very happy to hear feedback/opinions on this, but my reasoning is:

  • Keeping the format the same will mean no change is required to parse historical loadshedding. For example, figuring out the loadshedding schedule for western-cape-stellenbosch for the upcoming week would be identical to figuring out the loadshedding schedule for western-cape-stellenbosch for the past week.
  • From a technical perspective, no information is lost (I think?) since historical changes will always be in the past and future changes will always be in the future. It's possible there's an edge case here that makes this not true, but I can't think of one.

Here's a link to the struct that defines the loadshedding change. Removing the rust-specific details, it looks like:

struct Change {
    start: String,
    finsh: String,
    stage: unsigned 8-bit integer,
    source: String,
    include_regex: Option<String>,
    exclude_regex: Option<String>,
    include: Option<String>,
    exclude: Option<String>,
}

include and exclude are really just syntactic sugar: they get converted into explicit regexes which an area name must match if it is affected by the relevant Change. The conversion to include_regex and exclude_regex is done by this function, which basically just turns shorthand like cape-town into a regex like city-of-cape-town-area-\d{1,2}. The regex matching hasn't been as useful as I thought it would be (and I don't think I've ever actually used it in manually_specified.yaml), so I don't think it's worth your time trying to deal with it. Just assume include_regex and exclude_regex don't exist.
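
To make the include/exclude behaviour concrete, here is a rough Python sketch (it is not the Rust code referred to above). The SHORTHAND table and the affects helper are assumptions for illustration: only the city-of-cape-town-area-\d{1,2} pattern comes from the description above, and mapping coct to it is a guess.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shorthand table; the real mapping lives in the Rust codebase.
SHORTHAND = {"coct": r"city-of-cape-town-area-\d{1,2}"}


@dataclass
class Change:
    """Python mirror of the simplified struct, minus the *_regex fields."""
    start: str
    finsh: str
    stage: int
    source: str
    include: Optional[str] = None
    exclude: Optional[str] = None


def _pattern(shorthand: str) -> str:
    # Unknown shorthand is matched literally (an assumption of this sketch).
    return SHORTHAND.get(shorthand, re.escape(shorthand))


def affects(change: Change, area: str) -> bool:
    """Return True if `change` applies to the area called `area`."""
    if change.include is not None:
        return re.fullmatch(_pattern(change.include), area) is not None
    if change.exclude is not None:
        return re.fullmatch(_pattern(change.exclude), area) is None
    return True  # no filter: the change applies everywhere
```

With this, an entry carrying exclude: coct applies to western-cape-stellenbosch but not to city-of-cape-town-area-7, and an entry carrying include: coct is the other way round.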

@keeganwhite

Agreed. I don't see a disadvantage in keeping the format the same.

To explain my thought process for dealing with two files that have overlapping times or conflicting stages, take an older file like this:

  - stage: 3
    start: 2023-09-04T10:00:00
    finsh: 2023-09-04T22:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct
  - stage: 5
    start: 2023-09-04T22:00:00
    finsh: 2023-09-05T05:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct

and a newer file:

  - stage: 5
    start: 2023-09-04T18:00:00
    finsh: 2023-09-04T22:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct
  - stage: 6
    start: 2023-09-04T22:00:00
    finsh: 2023-09-05T05:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct

where the differences are that stage 3 in the first entry becomes stage 5 for part of the overlapping time (18:00 to 22:00), and stage 5 becomes stage 6 for the whole of the second time frame.

Then the output in the historical data would be:

  - stage: 6
    start: 2023-09-04T22:00:00
    finsh: 2023-09-05T05:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct
  - stage: 5
    start: 2023-09-04T18:00:00
    finsh: 2023-09-04T22:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct
  - stage: 3
    start: 2023-09-04T10:00:00
    finsh: 2023-09-04T18:00:00
    source: https://twitter.com/CityofCT/status/1698744757000831345
    include: coct

where the rules for these changes can be summarised as "new entries replace older entries for overlapping times". Essentially, we delete the incorrect entry from the file completely...
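
A minimal Python sketch of that rule, assuming every entry applies to the same areas (include/exclude are ignored) and that entries are plain dicts with ISO-formatted start and finsh strings. The names subtract and merge are hypothetical:

```python
from datetime import datetime
from typing import Any, Dict, List

Entry = Dict[str, Any]


def _t(value: str) -> datetime:
    """Parse an ISO timestamp such as 2023-09-04T18:00:00."""
    return datetime.fromisoformat(value)


def subtract(old: Entry, new: Entry) -> List[Entry]:
    """Return the parts of `old` that `new` does not cover (0, 1 or 2 pieces)."""
    o_start, o_end = _t(old["start"]), _t(old["finsh"])
    n_start, n_end = _t(new["start"]), _t(new["finsh"])
    if n_end <= o_start or n_start >= o_end:
        return [old]  # no overlap, keep the old entry untouched
    pieces = []
    if o_start < n_start:  # the old entry starts before the new one
        pieces.append({**old, "finsh": new["start"]})
    if n_end < o_end:  # the old entry runs past the end of the new one
        pieces.append({**old, "start": new["finsh"]})
    return pieces


def merge(older: List[Entry], newer: List[Entry]) -> List[Entry]:
    """New entries replace older entries wherever their times overlap."""
    remaining = older
    for new in newer:
        remaining = [piece for old in remaining for piece in subtract(old, new)]
    # Newest start first, matching the ordering of the output above.
    return sorted(remaining + newer, key=lambda e: e["start"], reverse=True)
```

Run on the two files above, this keeps stage 3 only for 10:00 to 18:00 and drops the old stage 5 entry entirely, which matches the expected output.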

@beyarkay
Owner Author

beyarkay commented Sep 7, 2023

Yes that looks correct to me. Although attempting to read these is making me remember why I tried to make a schedule visualiser a while back (it's trickier than it might seem at first glance). If I get a chance later on today, I'll write up some test cases (probably formatted as a multi-document yaml file) so that we can get the computer verifying these things for us.

I'll write out some high-level test examples below:

It'll be useful to have a little custom syntax: file1 is older than file2, the caret ^ indicates the current time,
and a series of numbers like _ _ 4 4 0 2 2 2 indicates several stages over
some unit of time:

  • _ _: 2 units where we don't know what loadshedding stage it is,
  • 4 4: 2 units of stage four,
  • 0: followed by no loadshedding for one unit of time,
  • 2 2 2: followed by stage two for 3 units of time.

With the above, we can define some mini-test examples like:

If there are conflicts in the future, the newer file should take precedence:

file1:    2 2 2 2
file2:    4 4 2 2
now:     ^
result:   4 4 2 2

If there are conflicts in the past, the newer file should still take
precedence (sometimes loadshedding will be bumped to stage 6 at 2am, but the
announcement will only be made public at 7am, so we want to catch this edge
case):

file1:  2 2 2 2
file2:  4 4 2 2
now:       ^
result: 4 4 2 2

If the start/finsh boundaries don't align nicely across different
files, then the result should properly figure out the new boundaries:

file1:  2 2 2 2 3 3 3 3
file2:  _ _ 4 4 4 4 2 2
now:             ^
result: 2 2 4 4 4 4 2 2

If an old file says "stage 6 for the rest of time" but a newer file updates to
say "stage 3 for the next week" then there should only be stage 3 for the next
week (it should not be followed by stage 6 for the rest of time):

file1:   6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
file2:   _ _ _ _ 3 3 3 3 _ _ _ _ _ _ _
now:                ^
result:  6 6 6 6 3 3 3 3 _ _ _ _ _ _ _

The above example also shows that there must be an option to specify "unknown
loadshedding". Unfortunately this does happen sometimes and it's unavoidable.

Finally, here's one big example, just to stress test things a bit:

file1:  1 2 2 2 2 _ _ _ _ _ _ _ _ _ _ _ _ _
file2:  _ _ 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
file3:  _ _ _ _ _ 3 2 3 2 _ _ _ _ _ _ _ _ _
file4:  _ _ _ _ _ _ _ _ _ _ _ 1 1 1 1 1 1 1
file5:  _ _ _ _ _ _ _ _ _ _ _ _ _ 0 0 1 1 _
file6:  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2 2
now:                   ^
result: 1 2 6 6 6 3 2 3 2 _ _ 1 1 0 0 1 2 2

I'll try to write these up as YAML files tonight, but this should give you a good
idea. Please do bug me if it looks like I haven't been consistent with the
rules.
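
One reading of the examples above is that a newer file overrides everything from its first known unit onwards, including any unknown units that follow it. Here is a small Python sketch under that assumption. parse, merge and show are hypothetical helpers, files are assumed to be equal-length rows ordered oldest first, and the current time is not needed because past and future conflicts are resolved the same way:

```python
from typing import List, Optional

Timeline = List[Optional[int]]  # None stands for "_" (unknown loadshedding)


def parse(row: str) -> Timeline:
    """Parse a row such as '_ _ 4 4 0 2' into stages, with None for unknown."""
    return [None if token == "_" else int(token) for token in row.split()]


def merge(files: List[Timeline]) -> Timeline:
    """Combine timelines ordered oldest first; newer files take precedence."""
    result: Timeline = [None] * len(files[0])
    for timeline in files:
        known = [i for i, stage in enumerate(timeline) if stage is not None]
        if not known:
            continue  # a file with no information changes nothing
        first = known[0]
        # Everything from the first known unit onwards replaces older data,
        # so "stage 6 for the rest of time" cannot leak past a newer file.
        result[first:] = timeline[first:]
    return result


def show(timeline: Timeline) -> str:
    """Format a timeline back into the '_ _ 4 4' notation."""
    return " ".join("_" if stage is None else str(stage) for stage in timeline)


file1 = parse("2 2 2 2 3 3 3 3")
file2 = parse("_ _ 4 4 4 4 2 2")
print(show(merge([file1, file2])))  # 2 2 4 4 4 4 2 2
```

How an unknown unit in the middle of a newer file should behave is not covered by the examples; this sketch lets it override older data as well.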

@keeganwhite

I will have a look at this and get back to you. One issue (maybe something I missed?) is that the manually_specified.yaml file is formatted differently in some commits. I can get you the commit hash if necessary. It would be easy enough to skip the files that are misbehaving and mark the period between two successful reads as unknown.

For example:

# How to edit this file:
# You should add items to `changes`. For example, here's a template that you
# can copy and paste just below the line `changes:`:
# ```
#  - stage: <STAGE NUMBER HERE>
#    start: <START TIME HERE>
#    finsh: <FINISH TIME HERE>
#    source: <URL TO INFORMATION SOURCE HERE>
#    exclude: <coct if this schedule doesn't apply to cape town>
#    include: <coct if this schedule only applies to cape town>
# ```
# See the README.md for more details
---
changes:
- stage: 4
  start: 2023-08-31T14:00:00
  finsh: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct
- stage: 2
  start: 2023-09-02T05:00:00
  finsh: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1697210092179935262
  exclude: coct

- stage: 4
  start: 2023-08-31T10:00:00
  finsh: 2023-08-31T17:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 2
  start: 2023-08-31T17:00:00
  finsh: 2023-08-31T22:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 4
  start: 2023-08-31T22:00:00
  finsh: 2023-09-01T05:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 2
  start: 2023-09-01T05:00:00
  finsh: 2023-09-01T22:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 4
  start: 2023-09-01T22:00:00
  finsh: 2023-09-03T05:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
- stage: 2
  start: 2023-09-03T05:00:00
  finsh: 2023-09-03T17:00:00
  source: https://twitter.com/CityofCT/status/1697259196931252229
  include: coct
historical_changes: []

as compared to:

# How to edit this file:
# You should add items to `changes`. For example, here's a template that you
# can copy and paste just below the line `changes:`:
# ```
#  - stage: <STAGE NUMBER HERE>
#    start: <START TIME HERE>
#    finsh: <FINISH TIME HERE>
#    source: <URL TO INFORMATION SOURCE HERE>
#    exclude: <coct if this schedule doesn't apply to cape town>
#    include: <coct if this schedule only applies to cape town>
# ```
# See the README.md for more details
---
- stage: 3
  start: 2023-08-27T16:00:00
  finsh: 2023-08-28T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-28T05:00:00
  finsh: 2023-08-28T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-28T16:00:00
  finsh: 2023-08-29T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-29T05:00:00
  finsh: 2023-08-29T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-29T16:00:00
  finsh: 2023-08-30T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-30T05:00:00
  finsh: 2023-08-30T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-30T16:00:00
  finsh: 2023-08-31T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-31T05:00:00
  finsh: 2023-08-31T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-08-31T16:00:00
  finsh: 2023-09-01T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-09-01T05:00:00
  finsh: 2023-09-01T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 3
  start: 2023-09-01T16:00:00
  finsh: 2023-09-02T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-09-02T05:00:00
  finsh: 2023-09-02T16:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct

- stage: 3
  start: 2023-08-27T16:00:00
  finsh: 2023-08-28T00:00:00
  source: https://twitter.com/CityofCT/status/1695804610932273188
  include: coct
historical_changes: []

@keeganwhite

keeganwhite commented Sep 11, 2023

@beyarkay have you had any time to generate some test yaml files? (I am also working on some).

I have started some rudimentary test cases for a data aggregation file I wrote here. I have generated the historical data using this file but I am afraid it will (may) be riddled with errors until some proper testing is done.

@beyarkay
Owner Author

Hey, sorry for the delay.

Yes you're correct, the misbehaving file should be omitted (the one formatted like:

---
- stage: 3
  start: 2023-08-27T16:00:00
  finsh: 2023-08-28T05:00:00
  source: https://twitter.com/Eskom_SA/status/1695790083796828468
  exclude: coct
- stage: 1
  start: 2023-08-28T05:00:00
...

I'm not sure what happened there). The correctly formatted file should have two keys: changes and historical_changes, each of which accepts a list of "change" objects (although historical_changes is deprecated and not used anymore).
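
A rough sketch of that check in Python, assuming the file has already been parsed into Python objects by a YAML library. is_well_formed and REQUIRED_KEYS are hypothetical names, and the set of required keys is an assumption based on the template comments quoted earlier in this thread. Since historical_changes is deprecated, only changes is required here:

```python
from typing import Any

# Assumed minimum set of fields for one "change" object.
REQUIRED_KEYS = {"stage", "start", "finsh", "source"}


def is_well_formed(doc: Any) -> bool:
    """Return True if a parsed file has a `changes` list of change objects."""
    if not isinstance(doc, dict):
        return False  # e.g. the older layout whose top level is a bare list
    changes = doc.get("changes")
    if not isinstance(changes, list):
        return False
    return all(
        isinstance(change, dict) and REQUIRED_KEYS.issubset(change)
        for change in changes
    )
```

A version of the file that fails to parse at all would presumably be skipped in the same way as one that fails this check.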

Busy working on the test files now, should have them uploaded to your PR in a bit.

beyarkay added a commit to keeganwhite/eskom-calendar-historical-data that referenced this issue Sep 21, 2023