Improved Sankey Diagram #844

BenPortner · 2022-09-13T14:37:42Z

Description

This PR improves the existing sankey diagram feature by two points:

Sankey diagrams now include biosphere nodes. Currently, one has to switch between the "Sankey" and "EF contributions" tab to see, which EFs cause impacts. Furthermore, one has to "guess" how the EF impacts are distributed across the activties. In the improved version, biosphere flows are drawn just like technosphere flows, allowing to see exactly where impacts come from.
Positive and negative flows from the same activity no longer cancel. Currently, a node, which produces +x units in one instance and -x units in another instance does not appear in the Sankey diagram because its total impact is zero. In the new version, both the node and the corresponding flows are drawn, allowing better insight into systems, which use the substitution method to account for by-products.

Demonstration

current:

improved:

Implementation Details

The central change is the introduction of a new module activity-browser.bwutils.superstructure.graph_traversal, which replaces bw2calc.graph_traversal. Changes in the other files are minor.

`activity-browser.bwutils.superstructure.graph_traversal.py`:

Basically a complete rewrite of bw2calc.graph_traversal version 1.8.1. The rewrite is based on the legacy version because bw 2.5 is not yet stable (at least I couldn't get it to run). I will push to include the new features in brightway so that the activity-browser does not have to maintain a separate version. Important changes to GraphTraversal.calculate:

function signature: introduced additional max_depth control parameter
~~return values: the number of LCA calculations is no longer counted (counter is None)~~
return values: nodes keys are now actual activity keys instead of LCA matrix indices. This change was necessary because it is not clear from the matrix indices whether the activity is from the biosphere or technosphere, making it impossible to fetch metadata like name, location...
return values: for the same reason, from and to fields in edges are now activity keys

`activity-browser.bwutils.superstructure.graph_traversal_with_scenario.py`:

Uses new graph_traversal.py.

`activity-browser.ui.web.sankey_navigator.py`:

Adapted to account for new function signature and changed keys (reverse dictionaries no longer necessary). For biosphere nodes categories is used instead of location.

`activity-browser.bwutils.commontasks.py` and `activity-browser.static.css.sankey_navigator.css`:

Adapted to add a green frame to biosphere nodes.

- include biosphere flows as nodes - do not aggregate positive and negative flows

coveralls · 2022-09-13T14:47:37Z

Coverage decreased (-0.4%) to 54.037% when pulling 5986596 on BenPortner:better-sankey into 44ceac1 on LCA-ActivityBrowser:master.

marc-vdm · 2022-09-13T14:59:51Z

@BenPortner Thanks a lot for this amazing PR! I'm really happy to see this contribution.

As you also mention, AB doesn't currently support BW2.5, we're looking at it behind the screens but it's mostly on @cmutel's side at the moment.

Personally, I think we should not merge this in the current form for 2 reasons:

I think it's not wise to use a different version of graph traversal, even temporarily, but rather merge the graph traversal upstream and then implement it in AB.
Less important/small detail: It looks like you write it for superstructure only. We have two implementations, and it looks like it's currently only written for scenarios.

@cmutel and @bsteubing Please have a look, this is very nice work!

cmutel · 2022-09-13T16:05:30Z

This is impressive and useful work. Here are the parts that I would hope could be addressed before it is merged:

I cannot endorse adding more functionality to the AB which overlaps BW functionality, but is different. We already have too much of this, and it makes life difficult. The proposed changes can be added to the bw2legacy branch of bw2calc.
The new GraphTraversal doesn't use the heap, and instead substitutes a breadth-first search with pruning. In my testing, the important-first search is always preferable; if we should switch, then this choice needs to be justified.
Using activity keys is in direct contradiction to packages A and B of the BW strategic plan. bw2calc should be completely independent of whatever database is used, and should be usable in cloud installations where there are only datapackages. An alternative is to have two lists of edges (B/T), and to add a function to the 2.0 branch of bw2data to rehydrate metadata from matrix indices.
I don't understand the justification for limiting to demand dicts with only one product.

bsteubing · 2022-09-13T16:05:51Z

Hi @BenPortner , thanks a lot for these excellent ideas. I would very much like to have these features. However, let's first clarify a few content and implementation things.

Here some ideas for further improvement:

can the biosphere flows be visualized as going to "circles" instead of "rectangles", which are reserved for economic processes? I think that would visually help.
I think we need an option to switch "on/off" the biosphere flows as we might not want to see this in every case
have you tested what this will look like with real-world ecoinvent? I imagine that we have 1000s of processes emitting "CO2".
not sure if this idea is good, but "CO2", e.g., has several compartments and we may just be interested in "CO2" overall...
thinking further, if it becomes a big mess in real-world systems, an alternative implementation could be to "skip" the nodes of the biosphere flows and just have small horizontal arrows leaving the respective processes that indicate the size and name of the biosphere substance
the flow you show above seems to say that CO2 is "going into" process A, but I think this should be the other way around
question: IF a process shows up in the Sankey, does this mean that ALL of its biosphere flows (contributing to a specific impact category) will be shown or ONLY THOSE above the cut-off value. We need to be clear on this and we should test this for processes and impact categories where a single process has MANY biosphere flows. Take for example the plastics datasets (from plasticseurope), which are aggregated (system) processes. Here the process basically represents the life cycle inventory and that means that it will probably have 10s, 100s, or even more biosphere flows relevant to specific impact categories

Concerning implementation:

I tend to agree with @marc-vdm that this should directly be implemented in bw...
however, I recently learned that bw 2.5 may introduce things that break the graph traversal and I think before we take a decision, I'd like to hear what @cmutel has to say here
if implementing this in bw is not a good idea for some reason, then I think we can temporarily host it in the AB

BenPortner · 2022-09-13T17:12:47Z

@marc-vdm

I think it's not wise to use a different version of graph traversal, even temporarily, but rather merge the graph traversal upstream and then implement it in AB.

I agree that this is the preferred option. However, this would require building a new conda package for bw2calc, which is not something I can do.

Less important/small detail: It looks like you write it for superstructure only. We have two implementations, and it looks like it's currently only written for scenarios.

I am afraid I do not understand this one. I tested this without scenarios, only.

@cmutel

I cannot endorse adding more functionality to the AB which overlaps BW functionality, but is different. We already have too much of this, and it makes life difficult. The proposed changes can be added to the bw2legacy branch of bw2calc.

See my answer to marc.

The new GraphTraversal doesn't use the heap, and instead substitutes a breadth-first search with pruning. In my testing, the important-first search is always preferable; if we should switch, then this choice needs to be justified.

The new approach is depth-first. I have trouble seeing how importance first is preferable. The number of calculations should be the same (importance-first calculates the LCA score for all inputs to set an order, depth-first calculates the LCA score for all inputs to decide whether to go deep).

bw2calc should be completely independent of whatever database is used,

The current implementation treats the biosphere differently from the technosphere. The improved implementation treats them equally. Hence, I would argue that the improved implementation is actually helping to achieve this goal ("treating all databases the same").

should be usable in cloud installations where there are only datapackages.

I don't see how the improved implementation prevents this.

An alternative is to have two lists of edges (B/T), and to add a function to the 2.0 branch of bw2data to rehydrate metadata from matrix indices.

As long as the biosphere and the technosphere are separate matrices, it is not possible to restore metadata from the matrix indices. How is the algorithm supposed to know if 1 means the first entry in the biosphere matrix or the technosphere matrix?

I don't understand the justification for limiting to demand dicts with only one product.

Single-product demands are what I could test with the AB.

@bsteubing

can the biosphere flows be visualized as going to "circles" instead of "rectangles", which are reserved for economic processes? I think that would visually help.

I think we need an option to switch "on/off" the biosphere flows as we might not want to see this in every case

This should be easy to implement.

have you tested what this will look like with real-world ecoinvent? I imagine that we have 1000s of processes emitting "CO2".

Then these 1000s of processes will link to CO2. The real question is: Who wants to have 1000s of processes in their sankey diagram in the first place? My implementation makes it easier to limit the amount of processes by introducing a depth limitation. With a reasonable number of processes, I am not worried about clearness.

not sure if this idea is good, but "CO2", e.g., has several compartments and we may just be interested in "CO2" overall...

This might be true for CO2, which has the same CF for all categories. But users might be interested in different categories for other impact methods (e.g. toxicity).

thinking further, if it becomes a big mess in real-world systems, an alternative implementation could be to "skip" the nodes of the biosphere flows and just have small horizontal arrows leaving the respective processes that indicate the size and name of the biosphere substance

The way to implement this is to keep the nodes and make the size very small. Should be easy to implement if desired.

the flow you show above seems to say that CO2 is "going into" process A, but I think this should be the other way around

This is how brightway currently works. Likewise, electricity is shown as "going into" process B, even though it really leaves as a by-product. If we are to start switching directions for biosphere flows, we should do it for technosphere by-products, too.

question: IF a process shows up in the Sankey, does this mean that ALL of its biosphere flows (contributing to a specific impact category) will be shown or ONLY THOSE above the cut-off value

Only those above the cut-off.

we should test this for processes and impact categories where a single process has MANY biosphere flows. Take for example the plastics datasets (from plasticseurope), which are aggregated (system) processes. Here the process basically represents the life cycle inventory and that means that it will probably have 10s, 100s, or even more biosphere flows relevant to specific impact categories.

There might be a lot of edges in cases where three conditions are fulfilled:
a) a node in the diagram is an aggregated dataset and
b) many of the biosphere flows of this node are above the cut-off and
b) the chosen impact method has lots of CFs (e.g. ecotoxicity)

In my opinion, all three conditions coming together is rather rare. If this proves to be false, one could
i) implement a limit to the number of biosphere flows shown or
ii) aggregate the nodes according to some logic

Concerning implementation:

I tend to agree with @marc-vdm that this should directly be implemented in bw...

however, I recently learned that bw 2.5 may introduce things that break the graph traversal and I think before we take a decision, I'd like to hear what @cmutel has to say here

if implementing this in bw is not a good idea for some reason, then I think we can temporarily host it in the AB

I would say this depends on Chris' willingness to further develop (and package) legacy versions of bw.

marc-vdm · 2022-09-14T07:56:33Z

Less important/small detail: It looks like you write it for superstructure only. We have two implementations, and it looks like it's currently only written for scenarios.

I am afraid I do not understand this one. I tested this without scenarios, only.

I had a look at the code, things are in places that they shouldn't be, but that's not your fault, it was already like this. We'll need to move some things around I think. Anyway, that's not a problem for now.

not sure if this idea is good, but "CO2", e.g., has several compartments and we may just be interested in "CO2" overall...

This might be true for CO2, which has the same CF for all categories. But users might be interested in different categories for other impact methods (e.g. toxicity).

Agree, this may be problematic. IMO better to keep this as individual flows. This would be the same as grouping all processes with product name electricity, high voltage -good for process contributions, not for Sankey-.

the flow you show above seems to say that CO2 is "going into" process A, but I think this should be the other way around

This is how brightway currently works. Likewise, electricity is shown as "going into" process B, even though it really leaves as a by-product. If we are to start switching directions for biosphere flows, we should do it for technosphere by-products, too.

Could you clarify on your example of electricity and process B?

From how I understood the sankey process B just has 2 inputs for electricity, one positive and one negative of the same amount. If my understanding is correct, the flows should both still point from the electricity to B.
With splitting pos/neg flows, we get potentially 4 edges between nodes, the arrow should indicate the direction of the edge, the color should indicate the sign (positive/negative).

As for applying this to biosphere: I think we should think about this more. The convention in LCA is that both resource extraction (coal mined) and emissions (CO2 emitted) are represented as positive numbers. -Personally I think this convention is wrong as for the technosphere, inputs are negative and outputs are positive, IMO biosphere should follow that convention too, but I don't have the power to change that.-

I don't see an immediate way to do this automatically unless we had a list of all biosphere flows that were classed on their direction (though we could perhaps infer them based on the categories column.

We could fix this by simply not having arrows in the biosphere edges?
Or alternatively: we could fix this partly by having a checkbox for users with flip biosphere flows, to use at their own discretion?

we should test this for processes and impact categories where a single process has MANY biosphere flows. Take for example the plastics datasets (from plasticseurope), which are aggregated (system) processes. Here the process basically represents the life cycle inventory and that means that it will probably have 10s, 100s, or even more biosphere flows relevant to specific impact categories.

There might be a lot of edges in cases where three conditions are fulfilled: a) a node in the diagram is an aggregated dataset and b) many of the biosphere flows of this node are above the cut-off and b) the chosen impact method has lots of CFs (e.g. ecotoxicity)

In my opinion, all three conditions coming together is rather rare. If this proves to be false, one could i) implement a limit to the number of biosphere flows shown or ii) aggregate the nodes according to some logic

I agree, I don't see this as a problem. Users can currently also make unwieldy graphs by misusing the cutoff/depth options -e.g. with electricity markets with many process inputs-, this should just be user discretion on choosing a proper cutoff/depth. Though we may look into this and refine the automatic cutoff/depth values we provide for starters.

cmutel · 2022-09-14T08:23:07Z

A couple points of clarification:

The new approach is depth-first.

Indeed, sorry for my misunderstanding.

I have trouble seeing how importance first is preferable. The number of calculations should be the same (importance-first calculates the LCA score for all inputs to set an order, depth-first calculates the LCA score for all inputs to decide whether to go deep).

They will produce the same results if max_calc is infinite; otherwise, importance-first is strictly better, as it will produce the most information. Depth first can exhaust the number of calculations permitted in one branch of the supply chain before even seeing other branches.

should be usable in cloud installations where there are only datapackages.

I don't see how the improved implementation prevents this.

Yes, you are right. Sorry.

An alternative is to have two lists of edges (B/T), and to add a function to the 2.0 branch of bw2data to rehydrate metadata from matrix indices.

As long as the biosphere and the technosphere are separate matrices, it is not possible to restore metadata from the matrix indices. How is the algorithm supposed to know if 1 means the first entry in the biosphere matrix or the technosphere matrix?

Well, if you are using a list called biosphere_edges you might have a hint :)

BenPortner · 2022-09-14T10:34:53Z

Thank you for your input!

@marc-vdm

Less important/small detail: It looks like you write it for superstructure only. We have two implementations, and it looks like it's currently only written for scenarios.

I am afraid I do not understand this one. I tested this without scenarios, only.

I had a look at the code, things are in places that they shouldn't be, but that's not your fault, it was already like this. We'll need to move some things around I think. Anyway, that's not a problem for now.

Please feel free to move things around as necessary. Sorry if this causes extra work.

the flow you show above seems to say that CO2 is "going into" process A, but I think this should be the other way around

This is how brightway currently works. Likewise, electricity is shown as "going into" process B, even though it really leaves as a by-product. If we are to start switching directions for biosphere flows, we should do it for technosphere by-products, too.

Could you clarify on your example of electricity and process B?

From how I understood the sankey process B just has 2 inputs for electricity, one positive and one negative of the same amount. If my understanding is correct, the flows should both still point from the electricity to B. With splitting pos/neg flows, we get potentially 4 edges between nodes, the arrow should indicate the direction of the edge, the color should indicate the sign (positive/negative).

B has a negative electricity input = positive electricity output (by-product). A has a (positive) electricity input. My point is that the arrow direction and the sign of the flow amount express the same thing. Currently, AB keeps the flow directions true to the model and indicates negative flow amounts° by changing the color of the arrow to green. Alternatively, one could flip all green arrows, ending up with positive flow amounts only. I think that both approaches are fine, but they shouldn't be mixed. Therefore, if we start switching arrows for biosphere flows, we should do the same for by-products and materials for treatment.

°Actually the color indicates negative impacts, but since most processes have positive impacts, a negative impact usually indicates a reversed direction of flow.

As for applying this to biosphere: I think we should think about this more. The convention in LCA is that both resource extraction (coal mined) and emissions (CO2 emitted) are represented as positive numbers. -Personally I think this convention is wrong as for the technosphere, inputs are negative and outputs are positive, IMO biosphere should follow that convention too, but I don't have the power to change that.-

I don't see an immediate way to do this automatically unless we had a list of all biosphere flows that were classed on their direction (though we could perhaps infer them based on the categories column.

I agree that this is rather unintuitive. The way of thinking here is as follows: An emission of CO2 is treated as a damage to the environment, as is the extraction of a resource. Hence, both have a positive sign (indicating environmental damage rather than benefit). The sign is not about the flow of mass but about the flow of damages. The same goes for the CO2 flow in the demonstration example: Although CO2 leaves from A, a damage is allocated to A.

We could fix this by simply not having arrows in the biosphere edges? Or alternatively: we could fix this partly by having a checkbox for users with flip biosphere flows, to use at their own discretion?

Doable if desired. But I don't feel there is a necessity. What do the others think?

@cmutel

I have trouble seeing how importance first is preferable. The number of calculations should be the same (importance-first calculates the LCA score for all inputs to set an order, depth-first calculates the LCA score for all inputs to decide whether to go deep).

They will produce the same results if max_calc is infinite; otherwise, importance-first is strictly better, as it will produce the most information. Depth first can exhaust the number of calculations permitted in one branch of the supply chain before even seeing other branches.

Makes sense. This should be changed.

An alternative is to have two lists of edges (B/T), and to add a function to the 2.0 branch of bw2data to rehydrate metadata from matrix indices.

As long as the biosphere and the technosphere are separate matrices, it is not possible to restore metadata from the matrix indices. How is the algorithm supposed to know if 1 means the first entry in the biosphere matrix or the technosphere matrix?

Well, if you are using a list called biosphere_edges you might have a hint :)

Haha, true. My mistake. But do you still see a necessity for keeping biosphere and technosphere edges separate? I don't think it should cause incompatibilities with the bw agenda.

BenPortner · 2022-09-15T10:26:35Z

I tried changing the shape of the biosphere nodes to ellipses and it turned out more complicated than expected. Unfortunately, there are problems with overflowing labels. I tried to fix it by centering the labels, but the result is still not very pleasing. Judge for yourselves:

marc-vdm · 2022-09-15T11:16:59Z

Personally, I don't think the current labeling is problematic this way.

But in the top left, why is the node Carbon dioxide, fossil, air, urban close to ground 0% and it's flow Carbon dioxide, fossil 60%? That doesn't sound right to me.

…e d3-dagre draws arrows correctly

BenPortner · 2022-09-15T11:56:43Z

@marc-vdm
Just like the technosphere nodes, the color of the biosphere nodes shows the total impact that the corresponding activity adds to the functional unit (i.e. not just the impact of one edge but the sum of all edges, also the ones below the cutoff). In this particular case, the total impact caused by Carbon dioxide, fossil, air, urban close to ground is zero, because electricity is consumed by A but the same amount of electricity is produced by B , thereby cancelling the consumption of electricity and the carbon emission. As a matter of fact, the edge list correctly reflects this by having a negative and a positive edge between Carbon dioxide... and electricity production...:

edges = [
...
{'to': ('apos371', 'e53ed2834a03c816b6e233e0dc7e1dfa'), 'from': ('biosphere3', 'f9749677-9c9f-4678-ab55-c607dfdc2cb9'), 'amount': -0.6000000238418579, 'exc_amount': 0.6000000238418579, 'impact': -0.6000000238418579, 'from_type': 'biosphere'}
...
{'to': ('apos371', 'e53ed2834a03c816b6e233e0dc7e1dfa'), 'from': ('biosphere3', 'f9749677-9c9f-4678-ab55-c607dfdc2cb9'), 'amount': 0.6000000238418579, 'exc_amount': 0.6000000238418579, 'impact': 0.6000000238418579, 'from_type': 'biosphere'}
]

However, for some reason the negative edge is not drawn by d3-dagre. I need to look into this.

Edit: fixed.

…e pair of nodes

…each pair of nodes: one with negative impact and one with positive impact.

…x `_get_by_index`

BenPortner · 2022-09-16T15:55:02Z

Update @marc-vdm @cmutel @bsteubing:

I did another major rewrite of graph_traversal.py. There are now classes for nodes, edges and their respective lists. This makes the code cleaner and self-documenting.
biosphere nodes and technosphere nodes are now stored in separate lists
added function for importance first traversal and made it the default
added switch use_keys: True means activity keys will be used in node list and edge list, False means matrix indices will be used (note: the latter option is not compatible with ABs downstream processing)
added switch include_biosphere: whether to include biosphere nodes in the node list (and hence sankey diagram) or not
improved speed: Under comparable conditions (excluding biosphere nodes and keys), the new algorithm is slightly faster than the bw2calc version:

from bw2data import projects, Database
from activity_browser.bwutils.superstructure.graph_traversal import GraphTraversal as newGT
from bw2calc import GraphTraversal as oldGT
import time

projects.set_current("ab")
method = ("ILCD 2.0 2018 midpoint", "climate change", "climate change total")
db = Database("apos371")
acts = [db.random() for _ in range(10)]

# new
start = time.time()
res_new = [newGT(include_biosphere=False, use_keys=False).calculate(demand={act:1}, method=method, cutoff=0.05, max_calc=250) for act in acts]
end = time.time()
print(f"New GraphTraversal needed {end-start:.1f} s for 10 activities.")
# New GraphTraversal needed 39.9 s for 10 activities.

# old
start = time.time()
res_old = [oldGT().calculate(demand={act:1}, method=method, cutoff=0.05, max_calc=250) for act in acts]
end= time.time()
print(f"Old GraphTraversal needed {end-start:.1f} s for 10 activities.")
# Old GraphTraversal needed 42.5 s for 10 activities.

cmutel · 2022-09-17T10:35:11Z

@BenPortner Nice.

I think we need a new library, bw_graph_traversal, which should include your work and the multifunctional traversal class. If you decide to create this, I will make PRs/issues with some small changes. It would be a pity to have this hidden away in the AB code.

I would made the node and edge classes dataclasses - they are exactly the use case for this functionality.

There is no reason to limit this to one functional unit, you can just iterate over the keys here and add multiple nodes. The traverse functions don't need a starting node, just take the top one on the heap.

I changed the name of this class to AssumedDiagonalGraphTraversal, as it assumes values exist on the diagonal and are special. This is not generally the case, see above multifunctional merged PR.

…pletely

…ts necessary. `cumulative_score` removed because identical to `scaled_score`.

…eatment activities

BenPortner · 2022-09-20T17:00:38Z

@cmutel

I think we need a new library, bw_graph_traversal

I don't see any advantage in creating yet another library. On the contrary, I think it has a lot of disadvantages for developers (e.g. more dependencies to manage manually) and users (e.g. unclarity where to post issues). I have described these before here. Due to the reasons stated there I am opposed to a new library.

I would made the node and edge classes dataclasses - they are exactly the use case for this functionality.

Done.

There is no reason to limit this to one functional unit, you can just iterate over the keys here and add multiple nodes.

The improved version now supports multiple functional units.

The traverse functions don't need a starting node, just take the top one on the heap.

It needs a starting node to keep compatible with the traverse_depth_first function syntax.

I changed the name of this class to AssumedDiagonalGraphTraversal, as it assumes values exist on the diagonal and are special. This is not generally the case, see above multifunctional merged PR.

I personally feel like AssumedDiagonalGraphTraversal is a rather irritating name. It sounds like the traversal happens along the diagonal of a matrix, rather than describing the implicit assumption this graph traversal class assumes that diagonal values exist and are special . Perhaps it would be better to have a boolean parameter has_diagonal that changes the behavior of GraphTraversal? What do the others think?

marc-vdm · 2022-12-15T08:50:03Z

@BenPortner I was using this branch to actually use this and here are some further thoughts:

This figure shows the same system in current Sankey and this branch implementations.

First thing I notice is that there is 1) much more information for user to interpret when EFs are included, 2) the focus is now on (cumulative) contributions from EFs, where the focus was on (cumulative) contributions from processes. Both are useful perspectives, but we lose the process focus when using this branch. I think there are 2 possible solutions for this: 1) we keep both, allowing user to choose which implementation to use (perhaps with checkbox 'include environmental intervention flows') 2) we add back in the shading (and % direct impact) for processes. The latter would allow user to see which processes still contribute a lot, which -imo- is harder in the new branch (comparing shading vs thickness of arrow input).

Now, when I include some loops in the system, we again run this system in both branches, we run into some issues:

Note here, that the reference flow is still the hot-rolling process, though Dagre gets confused and completely changes the order, but considering that Dagre is dead, there's not much we can do about that.

Most important problem: The total impact at hot-rolling is 200% of normal LCA results. This is the case in both implementations of Sankey. The impact calculated by normal LCA calculations in BW seem to be correct, I calculated everything in excel too, I get the same result as the normal BW results. For some reason sankey results are wrong.
Next, there are differences between flow results between the branches too: if we for example consider the flow of raw steel from steel smelting (bottom of both graphs), the impact are 171% and 107%, a difference that I don't understand. In the original implementation, we get a 'direct' impact of 31%, where in this branch, we get an impact of 19.6% (this process has no methane emissions). I must say I'm rather confused to the differences, and I'm not sure which of the results is correct, though the cumulative impact of the process iron ore mining is the same as the flow iron ore to steel smelting in the original implementation, where it is not in the new branch. Though, as the results are not the same as we get from normal LCA, I don't know which would be correct.

edit: add files so others can also try this with the same system
example.xlsx
example loop.xlsx

- fix typo in docstring

BenPortner · 2023-01-09T16:48:53Z

@marc-vdm Thanks for the testing effort and the extensive report! I looked at the issue and this is what I found:

First thing I notice is that there is 1) much more information for user to interpret when EFs are included, 2) the focus is now on (cumulative) contributions from EFs, where the focus was on (cumulative) contributions from processes. Both are useful perspectives, but we lose the process focus when using this branch. I think there are 2 possible solutions for this: 1) we keep both, allowing user to choose which implementation to use (perhaps with checkbox 'include environmental intervention flows') 2) we add back in the shading (and % direct impact) for processes. The latter would allow user to see which processes still contribute a lot, which -imo- is harder in the new branch (comparing shading vs thickness of arrow input).

I don't have a preference for either 1) or 2). I will let you and your team decide what to do.

Most important problem: The total impact at hot-rolling is 200% of normal LCA results. This is the case in both implementations of Sankey. The impact calculated by normal LCA calculations in BW seem to be correct, I calculated everything in excel too, I get the same result as the normal BW results. For some reason sankey results are wrong.

As a matter of fact, both numbers are right! The GWP 100 of 1 kg of hot-rolled steel in your model is 7.1 kg CO2e. However, to produce 1 kg of hot-rolled steel, you need another kg of hot-rolled steel in upstream processes. The cumulative impact, which is shown in the tooltip of the corresponding node, accounts for this additional demand, i.e. it relates to 2 kg of hot-rolled steel instead of 1. That is why the cumulative impact is double the unit impact.

Next, there are differences between flow results between the branches too: if we for example consider the flow of raw steel from steel smelting (bottom of both graphs), the impact are 171% and 107%, a difference that I don't understand. In the original implementation, we get a 'direct' impact of 31%, where in this branch, we get an impact of 19.6% (this process has no methane emissions). I must say I'm rather confused to the differences, and I'm not sure which of the results is correct, though the cumulative impact of the process iron ore mining is the same as the flow iron ore to steel smelting in the original implementation, where it is not in the new branch. Though, as the results are not the same as we get from normal LCA, I don't know which would be correct.

This is a good catch! In fact, the new version produces the wrong result here. The old version is correct. The error occurs because the new version will traverse the circular supply chain until the cutoff criterion is reached. As a consequence, part of the demand of raw steel is being cut off. In fact, if you decrease the cutoff criterion you will see that the amount of raw steel increases. The old graph traversal does not repeatedly traverse the same edges. Instead, it uses the life cycle inventory to calculate the correct flow amount. However, because it uses the LCI amount, it cannot differentiate between positive and negative contributions of the same activity. This differentiation is the major goal of the new version, hence it cannot use the same approach. For now, I have no solution how to fix the new version. Maybe someone else has an idea?

marc-vdm · 2023-01-10T09:49:58Z

@BenPortner Thanks for the response! You have some interesting points, would you be available for a chat this week or the next?

The point you make about the 7 vs 14kg both being right is not entirely correct I think, without the loop the impact is 3.5, with the loop the impact becomes 7 (you're right that this should double, just not again to 14). Though I think it'd be good for me to also discuss the last paragraph of your comment, I still need to learn more about this graph traversal it seems.

BenPortner · 2023-01-10T10:32:20Z

@marc-vdm

You have some interesting points, would you be available for a chat this week or the next?

Can you send me an email?

The point you make about the 7 vs 14kg both being right is not entirely correct I think, without the loop the impact is 3.5, with the loop the impact becomes 7 (you're right that this should double, just not again to 14).

I'm not yet convinced that the numbers are wrong. Without loops the impact of 1 kg hot-rolled steel is 3.5 kg CO2e. With loops the impact of 1 kg of hot-rolled steel becomes 7.0 kg CO2e (see LCIA result). Consequently, the impact of 2 kg of hot-rolled steel is 14.0 kg of CO2e. This is exactly what the cumulative impact shows (because the life cycle inventory amount is 2 kg).

# Conflicts: # activity_browser/static/javascript/navigator.js (reformat)

BenPortner · 2023-03-31T12:41:26Z

Hi @marc-vdm,

Just letting you know that I won't have time to finish this, unfortunately. I think the flaw you found is irreparable. It won't be possible to have a Sankey diagram that shows negative and positive flows while at the same time handling circular references correctly. Nevertheless, I believe the biosphere node visualization is a valuable feature. I create a branch here, which keeps this but reset the calculation to the old way. With this, the error you found should be fixed. Unfortunately, I did not have time to test it thoroughly but I hope you will be able to work from there. Let me know how it goes.

Sorry that I cannot help you more!
Ben

marc-vdm · 2023-09-22T13:48:43Z

Closing as stale, we will re-open this once we have the capacity to actually implement this.

Benjamin W. Portner added 7 commits September 13, 2022 15:19

improve graph_traversal algorithm:

221890b

- include biosphere flows as nodes - do not aggregate positive and negative flows

fix GraphTraversal.calculate docstring

64e0254

reset GraphTraversal cutoff default value to half percent

8f8c850

remove unused legacy functions

0331723

black reformatting

59a9f7e

remove unnecessary bw2data.databases import

a06e7eb

remove more unnecessary imports

fec7eb4

add back counter variable and max_calc limit

20fd791

add UI component for GraphTraversal.max_depth selection

c2153ee

bsteubing requested review from bsteubing, Zoophobus and nabilahmed739 September 13, 2022 15:17

Benjamin W. Portner added 4 commits September 14, 2022 17:05

make self.counter instance-specific (delete class variable)

e2cab2a

print to console if GraphTraversal reaches max_depth or max_calc

5986596

change biosphere node shape to ellipse

31a44d2

center node labels horizontally and vertically

244e9b9

use ellipse shape for biosphere nodes (instead of css rx/ry) to ensur…

37b02f1

…e d3-dagre draws arrows correctly

Benjamin W. Portner added 2 commits September 15, 2022 14:21

name edges to ensure that multiple edges can be drawn between the sam…

a22c32a

…e pair of nodes

aggregate sankey edges such that there are at most two edges between …

f798b7b

…each pair of nodes: one with negative impact and one with positive impact.

Benjamin W. Portner added 4 commits September 16, 2022 15:32

implement importance first traversal and make default

595bf5c

bug fix: use separate node lists for biosphere and technosphere to fi…

e328fec

…x `_get_by_index`

including biosphere nodes in the GraphTraversal result is now optional

f010cc5

GraphTraversal speed-up: buffer unit_score

e8b059a

Benjamin W. Portner added 11 commits September 19, 2022 11:37

make GTNode and GTEdge dataclasses

014465b

GTNodeSet wraps set() completely, GTEdgeList wraps list() com…

1d85980

…pletely

node.scaled_score uses buffered _unit_score -> no lca, cb argumen…

bc54a25

…ts necessary. `cumulative_score` removed because identical to `scaled_score`.

no need to calculate supply array separately

2693f88

fix bug in edgelist

1d4c4f5

remove supply from API (because contained in LCA object)

62ae3e3

rewrite GTBiosphereNode to not depend on fu_amount

8df596b

minor reformat

01c7b1e

fix bug: must normalize by abs(score) to ensure correct sankey for tr…

4b1232e

…eatment activities

allow more than one entry in demand dict

a8a7e30

add tests for new GraphTraversal class

600566a

marc-vdm mentioned this pull request Nov 4, 2022

Sankey should show 100% of impact in edges #869

Open

- add typing to GraphTraversal.reset

326c199

- fix typo in docstring

Benjamin W. Portner added 2 commits January 10, 2023 11:41

add example from LCA-ActivityBrowser#844 (comment) to tests

5f26555

Merge branch 'master' into better-sankey

3ac6b4e

# Conflicts: # activity_browser/static/javascript/navigator.js (reformat)

marc-vdm closed this Sep 22, 2023

cmutel mentioned this pull request Jun 5, 2024

Migrate to bw_graph_tools #1302

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved Sankey Diagram #844

Improved Sankey Diagram #844

BenPortner commented Sep 13, 2022 •

edited

Loading

coveralls commented Sep 13, 2022 •

edited

Loading

marc-vdm commented Sep 13, 2022

cmutel commented Sep 13, 2022

bsteubing commented Sep 13, 2022

BenPortner commented Sep 13, 2022 •

edited

Loading

marc-vdm commented Sep 14, 2022

cmutel commented Sep 14, 2022

BenPortner commented Sep 14, 2022

BenPortner commented Sep 15, 2022

marc-vdm commented Sep 15, 2022

BenPortner commented Sep 15, 2022 •

edited

Loading

BenPortner commented Sep 16, 2022

cmutel commented Sep 17, 2022

BenPortner commented Sep 20, 2022

marc-vdm commented Dec 15, 2022 •

edited

Loading

BenPortner commented Jan 9, 2023

marc-vdm commented Jan 10, 2023

BenPortner commented Jan 10, 2023 •

edited

Loading

BenPortner commented Mar 31, 2023 •

edited

Loading

marc-vdm commented Sep 22, 2023

Improved Sankey Diagram #844

Improved Sankey Diagram #844

Conversation

BenPortner commented Sep 13, 2022 • edited Loading

Description

Demonstration

Implementation Details

activity-browser.bwutils.superstructure.graph_traversal.py:

activity-browser.bwutils.superstructure.graph_traversal_with_scenario.py:

activity-browser.ui.web.sankey_navigator.py:

activity-browser.bwutils.commontasks.py and activity-browser.static.css.sankey_navigator.css:

coveralls commented Sep 13, 2022 • edited Loading

marc-vdm commented Sep 13, 2022

cmutel commented Sep 13, 2022

bsteubing commented Sep 13, 2022

BenPortner commented Sep 13, 2022 • edited Loading

@marc-vdm

@cmutel

@bsteubing

marc-vdm commented Sep 14, 2022

cmutel commented Sep 14, 2022

BenPortner commented Sep 14, 2022

@marc-vdm

@cmutel

BenPortner commented Sep 15, 2022

marc-vdm commented Sep 15, 2022

BenPortner commented Sep 15, 2022 • edited Loading

BenPortner commented Sep 16, 2022

cmutel commented Sep 17, 2022

BenPortner commented Sep 20, 2022

marc-vdm commented Dec 15, 2022 • edited Loading

BenPortner commented Jan 9, 2023

marc-vdm commented Jan 10, 2023

BenPortner commented Jan 10, 2023 • edited Loading

BenPortner commented Mar 31, 2023 • edited Loading

marc-vdm commented Sep 22, 2023

BenPortner commented Sep 13, 2022 •

edited

Loading

`activity-browser.bwutils.superstructure.graph_traversal.py`:

`activity-browser.bwutils.superstructure.graph_traversal_with_scenario.py`:

`activity-browser.ui.web.sankey_navigator.py`:

`activity-browser.bwutils.commontasks.py` and `activity-browser.static.css.sankey_navigator.css`:

coveralls commented Sep 13, 2022 •

edited

Loading

BenPortner commented Sep 13, 2022 •

edited

Loading

BenPortner commented Sep 15, 2022 •

edited

Loading

marc-vdm commented Dec 15, 2022 •

edited

Loading

BenPortner commented Jan 10, 2023 •

edited

Loading

BenPortner commented Mar 31, 2023 •

edited

Loading