Improved Sankey Diagram #844
- include biosphere flows as nodes
- do not aggregate positive and negative flows
@BenPortner Thanks a lot for this amazing PR! I'm really happy to see this contribution. As you mention, AB doesn't currently support BW2.5; we're looking at it behind the scenes, but it's mostly on @cmutel's side at the moment. Personally, I think we should not merge this in its current form, for 2 reasons:
@cmutel and @bsteubing Please have a look, this is very nice work! |
This is impressive and useful work. Here are the parts that I would hope could be addressed before it is merged:
|
Hi @BenPortner, thanks a lot for these excellent ideas. I would very much like to have these features. However, let's first clarify a few content and implementation things. Here are some ideas for further improvement:
Concerning implementation:
|
@marc-vdm
I agree that this is the preferred option. However, this would require building a new conda package for bw2calc, which is not something I can do.
I am afraid I do not understand this one. I tested this without scenarios, only. @cmutel
See my answer to marc.
The new approach is depth-first. I have trouble seeing how importance-first is preferable. The number of calculations should be the same (importance-first calculates the LCA score for all inputs to set an order, depth-first calculates the LCA score for all inputs to decide whether to go deeper).
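To make the ordering difference concrete, here is a minimal sketch on a toy graph. The graph, the scores, and the function names are made up for illustration; this is not the bw2calc or AB code, and in a real traversal each score would come from an LCA calculation.

```python
import heapq

# Hypothetical toy supply chain: node -> list of (input, cumulative score of that input).
GRAPH = {
    "A": [("B", 6.0), ("C", 3.0)],
    "B": [("D", 2.0)],
    "C": [("E", 0.2)],
    "D": [],
    "E": [],
}


def depth_first(root, cutoff, max_depth):
    """Follow each branch to the bottom; descend only if the input's score passes the cutoff."""
    order = []

    def visit(node, depth):
        order.append(node)
        if depth >= max_depth:
            return
        for child, score in GRAPH[node]:
            if abs(score) >= cutoff:
                visit(child, depth + 1)

    visit(root, 0)
    return order


def importance_first(root, cutoff, max_calc):
    """Always expand the pending node with the highest score next."""
    order = []
    heap = [(-float("inf"), root)]  # negated scores turn heapq's min-heap into a max-heap
    while heap and len(order) < max_calc:
        _, node = heapq.heappop(heap)
        order.append(node)
        for child, score in GRAPH[node]:
            if abs(score) >= cutoff:
                heapq.heappush(heap, (-abs(score), child))
    return order


print(depth_first("A", cutoff=1.0, max_depth=10))
print(importance_first("A", cutoff=1.0, max_calc=10))
```

In this toy case both strategies score the same inputs and visit the same nodes, only in a different order. They start to differ once `max_depth` or `max_calc` stops the traversal early.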
The current implementation treats the biosphere differently from the technosphere. The improved implementation treats them equally. Hence, I would argue that the improved implementation is actually helping to achieve this goal ("treating all databases the same").
I don't see how the improved implementation prevents this.
As long as the biosphere and the technosphere are separate matrices, it is not possible to restore metadata from the matrix indices. How is the algorithm supposed to know if
Single-product demands are what I could test with the AB. @bsteubing
This should be easy to implement.
Then these 1000s of processes will link to CO2. The real question is: Who wants to have 1000s of processes in their sankey diagram in the first place? My implementation makes it easier to limit the amount of processes by introducing a depth limitation. With a reasonable number of processes, I am not worried about clearness.
This might be true for CO2, which has the same CF for all categories. But users might be interested in different categories for other impact methods (e.g. toxicity).
The way to implement this is to keep the nodes and make the size very small. Should be easy to implement if desired.
This is how brightway currently works. Likewise, electricity is shown as "going into" process B, even though it really leaves as a by-product. If we are to start switching directions for biosphere flows, we should do it for technosphere by-products, too.
Only those above the cut-off.
There might be a lot of edges in cases where three conditions are fulfilled: In my opinion, all three conditions coming together is rather rare. If this proves to be false, one could
I would say this depends on Chris' willingness to further develop (and package) legacy versions of bw. |
I had a look at the code, things are in places that they shouldn't be, but that's not your fault, it was already like this. We'll need to move some things around I think. Anyway, that's not a problem for now.
Agree, this may be problematic. IMO better to keep this as individual flows. This would be the same as grouping all processes with product name
Could you clarify your example of electricity and process B? As I understood the sankey, process B just has 2 inputs for electricity, one positive and one negative of the same amount. If my understanding is correct, the flows should both still point from the electricity to B. As for applying this to biosphere: I think we should think about this more. The convention in LCA is that both resource extraction (coal mined) and emissions (CO2 emitted) are represented as positive numbers. -Personally I think this convention is wrong: for the technosphere, inputs are negative and outputs are positive. IMO biosphere should follow that convention too, but I don't have the power to change that.- I don't see an immediate way to do this automatically unless we had a list of all biosphere flows classified by their direction (though we could perhaps infer them based on the We could fix this by simply not having arrows in the biosphere edges?
I agree, I don't see this as a problem. Users can currently also make unwieldy graphs by misusing the cutoff/depth options -e.g. with electricity markets with many process inputs-, this should just be user discretion on choosing a proper cutoff/depth. Though we may look into this and refine the automatic cutoff/depth values we provide for starters. |
A couple points of clarification:
Indeed, sorry for my misunderstanding.
They will produce the same results if
Yes, you are right. Sorry.
Well, if you are using a list called |
Thank you for your input! @marc-vdm
Please feel free to move things around as necessary. Sorry if this causes extra work.
B has a negative electricity input = positive electricity output (by-product). A has a (positive) electricity input. My point is that the arrow direction and the sign of the flow amount express the same thing. Currently, AB keeps the flow directions true to the model and indicates negative flow amounts° by changing the color of the arrow to green. Alternatively, one could flip all green arrows, ending up with positive flow amounts only. I think that both approaches are fine, but they shouldn't be mixed. Therefore, if we start switching arrows for biosphere flows, we should do the same for by-products and materials for treatment. °Actually the color indicates negative impacts, but since most processes have positive impacts, a negative impact usually indicates a reversed direction of flow.
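The equivalence between a negative amount and a reversed arrow can be sketched as below. The edge layout (`from`, `to`, `amount`) mirrors the dictionaries used elsewhere in this thread, but `flip_negative` is a hypothetical helper, not something that exists in AB or brightway.

```python
def flip_negative(edge):
    """Return an equivalent edge with a non-negative amount by reversing its direction."""
    if edge["amount"] >= 0:
        return dict(edge)
    return {
        **edge,
        "from": edge["to"],
        "to": edge["from"],
        "amount": -edge["amount"],
    }


# Process B has a negative electricity input, i.e. it supplies electricity as a by-product.
by_product = {"from": "electricity", "to": "B", "amount": -1.0}
print(flip_negative(by_product))
```

Either representation carries the same information. The point above is only that one convention should be applied consistently to technosphere and biosphere edges.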
I agree that this is rather unintuitive. The way of thinking here is as follows: An emission of CO2 is treated as a damage to the environment, as is the extraction of a resource. Hence, both have a positive sign (indicating environmental damage rather than benefit). The sign is not about the flow of mass but about the flow of damages. The same goes for the CO2 flow in the demonstration example: Although CO2 leaves from A, a damage is allocated to A.
Doable if desired. But I don't feel there is a necessity. What do the others think? @cmutel
Makes sense. This should be changed.
Haha, true. My mistake. But do you still see a necessity for keeping biosphere and technosphere edges separate? I don't think it should cause incompatibilities with the bw agenda. |
Personally, I don't think the current labeling is problematic this way. But in the top left, why is the node |
…e d3-dagre draws arrows correctly
@marc-vdm
edges = [
...
{'to': ('apos371', 'e53ed2834a03c816b6e233e0dc7e1dfa'), 'from': ('biosphere3', 'f9749677-9c9f-4678-ab55-c607dfdc2cb9'), 'amount': -0.6000000238418579, 'exc_amount': 0.6000000238418579, 'impact': -0.6000000238418579, 'from_type': 'biosphere'}
...
{'to': ('apos371', 'e53ed2834a03c816b6e233e0dc7e1dfa'), 'from': ('biosphere3', 'f9749677-9c9f-4678-ab55-c607dfdc2cb9'), 'amount': 0.6000000238418579, 'exc_amount': 0.6000000238418579, 'impact': 0.6000000238418579, 'from_type': 'biosphere'}
]
However, for some reason the negative edge is not drawn by d3-dagre. I need to look into this. Edit: fixed. |
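For readers following along, here is a rough sketch of what "one positive and one negative edge per pair of nodes" means in practice. The input format and the helper `aggregate_by_sign` are assumptions for illustration and are not taken from the PR code.

```python
from collections import defaultdict


def aggregate_by_sign(contributions):
    """Sum impacts per (from, to, sign) so that opposite signs are never netted out."""
    totals = defaultdict(float)
    for src, dst, impact in contributions:
        sign = "negative" if impact < 0 else "positive"
        totals[(src, dst, sign)] += impact
    return [
        {"from": src, "to": dst, "impact": value}
        for (src, dst, _sign), value in totals.items()
    ]


# Hypothetical contributions between the same pair of nodes.
contributions = [
    ("co2", "A", 0.5),
    ("co2", "A", -0.5),
    ("co2", "A", 0.25),
]
for edge in aggregate_by_sign(contributions):
    print(edge)
```

Netting these three contributions would leave a single edge of 0.25. Keeping the signs apart yields two edges, which is what the diagram then has to draw.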
…each pair of nodes: one with negative impact and one with positive impact.
Update @marc-vdm @cmutel @bsteubing:
from bw2data import projects, Database
from activity_browser.bwutils.superstructure.graph_traversal import GraphTraversal as newGT
from bw2calc import GraphTraversal as oldGT
import time
projects.set_current("ab")
method = ("ILCD 2.0 2018 midpoint", "climate change", "climate change total")
db = Database("apos371")
acts = [db.random() for _ in range(10)]
# new
start = time.time()
res_new = [newGT(include_biosphere=False, use_keys=False).calculate(demand={act:1}, method=method, cutoff=0.05, max_calc=250) for act in acts]
end = time.time()
print(f"New GraphTraversal needed {end-start:.1f} s for 10 activities.")
# New GraphTraversal needed 39.9 s for 10 activities.
# old
start = time.time()
res_old = [oldGT().calculate(demand={act:1}, method=method, cutoff=0.05, max_calc=250) for act in acts]
end = time.time()
print(f"Old GraphTraversal needed {end-start:.1f} s for 10 activities.")
# Old GraphTraversal needed 42.5 s for 10 activities. |
@BenPortner Nice. I think we need a new library, I would make the node and edge classes dataclasses - they are exactly the use case for this functionality. There is no reason to limit this to one functional unit, you can just iterate over the I changed the name of this class to |
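As a rough idea of what dataclass-based nodes and edges could look like, here is a minimal sketch. The class and field names are hypothetical and only loosely follow the dictionary keys shown earlier in this thread. Note that `from` is a reserved word in Python, so the sketch uses `from_key` and `to_key`.

```python
from dataclasses import dataclass, asdict
from typing import Tuple

Key = Tuple[str, str]  # (database name, activity code)


@dataclass(frozen=True)
class Node:
    key: Key
    amount: float
    cumulative_score: float


@dataclass(frozen=True)
class Edge:
    from_key: Key
    to_key: Key
    amount: float
    impact: float
    from_type: str = "technosphere"


co2 = Node(key=("biosphere3", "co2"), amount=0.6, cumulative_score=0.6)
edge = Edge(
    from_key=co2.key,
    to_key=("apos371", "steel"),
    amount=0.6,
    impact=0.6,
    from_type="biosphere",
)
print(asdict(edge))
```

Frozen dataclasses get `__eq__` and `__hash__` for free, so nodes and edges can be compared, deduplicated in sets, and still converted to plain dictionaries for the JavaScript side.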
…ts necessary. `cumulative_score` removed because identical to `scaled_score`.
…eatment activities
I don't see any advantage in creating yet another library. On the contrary, I think it has a lot of disadvantages for developers (e.g. more dependencies to manage manually) and users (e.g. confusion about where to post issues). I have described these before here. For the reasons stated there, I am opposed to a new library.
Done.
The improved version now supports multiple functional units.
It needs a starting node to keep compatible with the
I personally feel like |
@BenPortner I tried out this branch, and here are some further thoughts: This figure shows the same system in the current Sankey and in this branch's implementation. The first thing I notice is that 1) there is much more information for the user to interpret when EFs are included, and 2) the focus is now on (cumulative) contributions from EFs, where previously it was on (cumulative) contributions from processes. Both are useful perspectives, but we lose the process focus when using this branch. I think there are 2 possible solutions for this: 1) we keep both, allowing the user to choose which implementation to use (perhaps with a checkbox 'include environmental intervention flows'), or 2) we add back the shading (and % direct impact) for processes. The latter would allow the user to see which processes still contribute a lot, which -imo- is harder in the new branch (comparing shading vs thickness of arrow input). Now, when I include some loops in the system and again run it in both branches, we run into some issues:
edit: add files so others can also try this with the same system |
- fix typo in docstring
@marc-vdm Thanks for the testing effort and the extensive report! I looked at the issue and this is what I found:
I don't have a preference for either 1) or 2). I will let you and your team decide what to do.
As a matter of fact, both numbers are right! The GWP 100 of 1 kg of hot-rolled steel in your model is 7.1 kg CO2e. However, to produce 1 kg of hot-rolled steel, you need another kg of hot-rolled steel in upstream processes. The cumulative impact, which is shown in the tooltip of the corresponding node, accounts for this additional demand, i.e. it relates to 2 kg of hot-rolled steel instead of 1. That is why the cumulative impact is double the unit impact.
This is a good catch! In fact, the new version produces the wrong result here. The old version is correct. The error occurs because the new version traverses the circular supply chain until the cutoff criterion is reached. As a consequence, part of the demand for raw steel is cut off. Indeed, if you decrease the cutoff criterion, you will see that the amount of raw steel increases. The old graph traversal does not repeatedly traverse the same edges. Instead, it uses the life cycle inventory to calculate the correct flow amount. However, because it uses the LCI amount, it cannot differentiate between positive and negative contributions of the same activity. This differentiation is the major goal of the new version, hence it cannot use the same approach. For now, I have no solution for fixing the new version. Maybe someone else has an idea? |
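The truncation effect can be illustrated with a deliberately simplified model: a single process that needs a fixed share of its own output, so the exact gross amount is a geometric series. The share of 0.5 and both function names are assumptions for illustration. The real traversal applies the cutoff to impact scores rather than to amounts.

```python
def traversed_amount(loop_share, cutoff):
    """Sum the demand around a self-loop until the next pass falls below the cutoff."""
    total = 0.0
    contribution = 1.0  # the functional unit itself
    while contribution >= cutoff:
        total += contribution
        contribution *= loop_share  # each pass around the loop demands a smaller amount
    return total


def exact_amount(loop_share):
    """Closed form of the full series 1 + a + a^2 + ..., i.e. what solving the inventory gives."""
    return 1.0 / (1.0 - loop_share)


for cutoff in (0.05, 0.001):
    print(cutoff, traversed_amount(0.5, cutoff), exact_amount(0.5))
```

With a loop share of 0.5 the exact amount is 2, while the traversal stops short of it and only approaches 2 as the cutoff shrinks. That matches the observation that the raw steel amount grows when the cutoff is lowered.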
@BenPortner Thanks for the response! You have some interesting points, would you be available for a chat this week or next? I think the point you make about 7 and 14 kg both being right is not entirely correct: without the loop the impact is 3.5, with the loop the impact becomes 7 (you're right that this should double, just not again to 14). I think it'd also be good for me to discuss the last paragraph of your comment, as it seems I still need to learn more about this graph traversal. |
Can you send me an email?
I'm not yet convinced that the numbers are wrong. Without loops the impact of 1 kg hot-rolled steel is 3.5 kg CO2e. With loops the impact of 1 kg of hot-rolled steel becomes 7.0 kg CO2e (see LCIA result). Consequently, the impact of 2 kg of hot-rolled steel is 14.0 kg of CO2e. This is exactly what the cumulative impact shows (because the life cycle inventory amount is 2 kg). |
# Conflicts: # activity_browser/static/javascript/navigator.js (reformat)
Hi @marc-vdm, Just letting you know that I won't have time to finish this, unfortunately. I think the flaw you found is irreparable. It won't be possible to have a Sankey diagram that shows negative and positive flows while at the same time handling circular references correctly. Nevertheless, I believe the biosphere node visualization is a valuable feature. I created a branch here, which keeps this feature but resets the calculation to the old approach. With this, the error you found should be fixed. Unfortunately, I did not have time to test it thoroughly, but I hope you will be able to work from there. Let me know how it goes. Sorry that I cannot help you more! |
Closing as stale, we will re-open this once we have the capacity to actually implement this. |
Description
This PR improves the existing sankey diagram feature in two ways:
Demonstration
current:
improved:
Implementation Details
The central change is the introduction of a new module `activity-browser.bwutils.superstructure.graph_traversal`, which replaces `bw2calc.graph_traversal`. Changes in the other files are minor.

`activity-browser.bwutils.superstructure.graph_traversal.py`:
Basically a complete rewrite of `bw2calc.graph_traversal` version 1.8.1. The rewrite is based on the legacy version because bw 2.5 is not yet stable (at least I couldn't get it to run). I will push to include the new features in brightway so that the activity-browser does not have to maintain a separate version. Important changes to `GraphTraversal.calculate`:
- `max_depth` control parameter
- return values: the number of LCA calculations is no longer counted (`counter` is None)
- `nodes` keys are now actual activity keys instead of LCA matrix indices. This change was necessary because it is not clear from the matrix indices whether the activity is from the biosphere or technosphere, making it impossible to fetch metadata like name, location...
- `from` and `to` fields in `edges` are now activity keys

`activity-browser.bwutils.superstructure.graph_traversal_with_scenario.py`:
Uses new `graph_traversal.py`.

`activity-browser.ui.web.sankey_navigator.py`:
Adapted to account for new function signature and changed keys (reverse dictionaries no longer necessary). For biosphere nodes `categories` is used instead of `location`.

`activity-browser.bwutils.commontasks.py` and `activity-browser.static.css.sankey_navigator.css`:
Adapted to add a green frame to biosphere nodes.
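To illustrate why matrix indices alone are ambiguous, here is a small sketch with made-up index maps. The dictionaries and keys are hypothetical stand-ins for the mappings an LCA object keeps between activity keys and matrix rows or columns. They are not the actual bw2calc attributes.

```python
# Hypothetical index maps: both matrices number their rows/columns from 0.
activity_dict = {("apos371", "steel"): 0, ("apos371", "electricity"): 1}
biosphere_dict = {("biosphere3", "co2"): 0}

# Old style: nodes keyed by matrix index need reverse dictionaries to recover metadata,
# and the index by itself does not say which reverse dictionary to use.
rev_activity = {index: key for key, index in activity_dict.items()}
rev_biosphere = {index: key for key, index in biosphere_dict.items()}
print(rev_activity[0], rev_biosphere[0])  # the same index maps to two different flows

# New style: nodes keyed by the activity key itself, so no reverse lookup is needed.
nodes = {
    ("apos371", "steel"): {"amount": 1.0},
    ("biosphere3", "co2"): {"amount": 0.6},
}
for (database, code), data in nodes.items():
    print(database, code, data["amount"])
```

Keying by `(database, code)` keeps the database name attached to every node, which is the information needed to decide whether to show `location` or `categories`.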