Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved Sankey Diagram #844

Closed
wants to merge 37 commits into from

Conversation

BenPortner
Copy link
Contributor

@BenPortner BenPortner commented Sep 13, 2022

Description

This PR improves the existing sankey diagram feature by two points:

  • Sankey diagrams now include biosphere nodes. Currently, one has to switch between the "Sankey" and "EF contributions" tab to see, which EFs cause impacts. Furthermore, one has to "guess" how the EF impacts are distributed across the activties. In the improved version, biosphere flows are drawn just like technosphere flows, allowing to see exactly where impacts come from.
  • Positive and negative flows from the same activity no longer cancel. Currently, a node, which produces +x units in one instance and -x units in another instance does not appear in the Sankey diagram because its total impact is zero. In the new version, both the node and the corresponding flows are drawn, allowing better insight into systems, which use the substitution method to account for by-products.

Demonstration

current:
image

improved:
image

Implementation Details

The central change is the introduction of a new module activity-browser.bwutils.superstructure.graph_traversal, which replaces bw2calc.graph_traversal. Changes in the other files are minor.

activity-browser.bwutils.superstructure.graph_traversal.py:

Basically a complete rewrite of bw2calc.graph_traversal version 1.8.1. The rewrite is based on the legacy version because bw 2.5 is not yet stable (at least I couldn't get it to run). I will push to include the new features in brightway so that the activity-browser does not have to maintain a separate version. Important changes to GraphTraversal.calculate:

  • function signature: introduced additional max_depth control parameter
  • return values: the number of LCA calculations is no longer counted (counter is None)
  • return values: nodes keys are now actual activity keys instead of LCA matrix indices. This change was necessary because it is not clear from the matrix indices whether the activity is from the biosphere or technosphere, making it impossible to fetch metadata like name, location...
  • return values: for the same reason, from and to fields in edges are now activity keys
activity-browser.bwutils.superstructure.graph_traversal_with_scenario.py:

Uses new graph_traversal.py.

activity-browser.ui.web.sankey_navigator.py:

Adapted to account for new function signature and changed keys (reverse dictionaries no longer necessary). For biosphere nodes categories is used instead of location.

activity-browser.bwutils.commontasks.py and activity-browser.static.css.sankey_navigator.css:

Adapted to add a green frame to biosphere nodes.

@coveralls
Copy link

coveralls commented Sep 13, 2022

Coverage Status

Coverage decreased (-0.4%) to 54.037% when pulling 5986596 on BenPortner:better-sankey into 44ceac1 on LCA-ActivityBrowser:master.

@marc-vdm
Copy link
Member

@BenPortner Thanks a lot for this amazing PR! I'm really happy to see this contribution.

As you also mention, AB doesn't currently support BW2.5, we're looking at it behind the screens but it's mostly on @cmutel's side at the moment.

Personally, I think we should not merge this in the current form for 2 reasons:

  1. I think it's not wise to use a different version of graph traversal, even temporarily, but rather merge the graph traversal upstream and then implement it in AB.
  2. Less important/small detail: It looks like you write it for superstructure only. We have two implementations, and it looks like it's currently only written for scenarios.

@cmutel and @bsteubing Please have a look, this is very nice work!

@cmutel
Copy link
Contributor

cmutel commented Sep 13, 2022

This is impressive and useful work. Here are the parts that I would hope could be addressed before it is merged:

  1. I cannot endorse adding more functionality to the AB which overlaps BW functionality, but is different. We already have too much of this, and it makes life difficult. The proposed changes can be added to the bw2legacy branch of bw2calc.
  2. The new GraphTraversal doesn't use the heap, and instead substitutes a breadth-first search with pruning. In my testing, the important-first search is always preferable; if we should switch, then this choice needs to be justified.
  3. Using activity keys is in direct contradiction to packages A and B of the BW strategic plan. bw2calc should be completely independent of whatever database is used, and should be usable in cloud installations where there are only datapackages. An alternative is to have two lists of edges (B/T), and to add a function to the 2.0 branch of bw2data to rehydrate metadata from matrix indices.
  4. I don't understand the justification for limiting to demand dicts with only one product.

@bsteubing
Copy link
Member

Hi @BenPortner , thanks a lot for these excellent ideas. I would very much like to have these features. However, let's first clarify a few content and implementation things.

Here some ideas for further improvement:

  • can the biosphere flows be visualized as going to "circles" instead of "rectangles", which are reserved for economic processes? I think that would visually help.
  • I think we need an option to switch "on/off" the biosphere flows as we might not want to see this in every case
  • have you tested what this will look like with real-world ecoinvent? I imagine that we have 1000s of processes emitting "CO2".
  • not sure if this idea is good, but "CO2", e.g., has several compartments and we may just be interested in "CO2" overall...
  • thinking further, if it becomes a big mess in real-world systems, an alternative implementation could be to "skip" the nodes of the biosphere flows and just have small horizontal arrows leaving the respective processes that indicate the size and name of the biosphere substance
  • the flow you show above seems to say that CO2 is "going into" process A, but I think this should be the other way around
  • question: IF a process shows up in the Sankey, does this mean that ALL of its biosphere flows (contributing to a specific impact category) will be shown or ONLY THOSE above the cut-off value. We need to be clear on this and we should test this for processes and impact categories where a single process has MANY biosphere flows. Take for example the plastics datasets (from plasticseurope), which are aggregated (system) processes. Here the process basically represents the life cycle inventory and that means that it will probably have 10s, 100s, or even more biosphere flows relevant to specific impact categories

Concerning implementation:

  • I tend to agree with @marc-vdm that this should directly be implemented in bw...
  • however, I recently learned that bw 2.5 may introduce things that break the graph traversal and I think before we take a decision, I'd like to hear what @cmutel has to say here
  • if implementing this in bw is not a good idea for some reason, then I think we can temporarily host it in the AB

@BenPortner
Copy link
Contributor Author

BenPortner commented Sep 13, 2022

@marc-vdm

I think it's not wise to use a different version of graph traversal, even temporarily, but rather merge the graph traversal upstream and then implement it in AB.

I agree that this is the preferred option. However, this would require building a new conda package for bw2calc, which is not something I can do.

Less important/small detail: It looks like you write it for superstructure only. We have two implementations, and it looks like it's currently only written for scenarios.

I am afraid I do not understand this one. I tested this without scenarios, only.

@cmutel

I cannot endorse adding more functionality to the AB which overlaps BW functionality, but is different. We already have too much of this, and it makes life difficult. The proposed changes can be added to the bw2legacy branch of bw2calc.

See my answer to marc.

The new GraphTraversal doesn't use the heap, and instead substitutes a breadth-first search with pruning. In my testing, the important-first search is always preferable; if we should switch, then this choice needs to be justified.

The new approach is depth-first. I have trouble seeing how importance first is preferable. The number of calculations should be the same (importance-first calculates the LCA score for all inputs to set an order, depth-first calculates the LCA score for all inputs to decide whether to go deep).

bw2calc should be completely independent of whatever database is used,

The current implementation treats the biosphere differently from the technosphere. The improved implementation treats them equally. Hence, I would argue that the improved implementation is actually helping to achieve this goal ("treating all databases the same").

should be usable in cloud installations where there are only datapackages.

I don't see how the improved implementation prevents this.

An alternative is to have two lists of edges (B/T), and to add a function to the 2.0 branch of bw2data to rehydrate metadata from matrix indices.

As long as the biosphere and the technosphere are separate matrices, it is not possible to restore metadata from the matrix indices. How is the algorithm supposed to know if 1 means the first entry in the biosphere matrix or the technosphere matrix?

I don't understand the justification for limiting to demand dicts with only one product.

Single-product demands are what I could test with the AB.

@bsteubing

  • can the biosphere flows be visualized as going to "circles" instead of "rectangles", which are reserved for economic processes? I think that would visually help.
  • I think we need an option to switch "on/off" the biosphere flows as we might not want to see this in every case

This should be easy to implement.

have you tested what this will look like with real-world ecoinvent? I imagine that we have 1000s of processes emitting "CO2".

Then these 1000s of processes will link to CO2. The real question is: Who wants to have 1000s of processes in their sankey diagram in the first place? My implementation makes it easier to limit the amount of processes by introducing a depth limitation. With a reasonable number of processes, I am not worried about clearness.

not sure if this idea is good, but "CO2", e.g., has several compartments and we may just be interested in "CO2" overall...

This might be true for CO2, which has the same CF for all categories. But users might be interested in different categories for other impact methods (e.g. toxicity).

thinking further, if it becomes a big mess in real-world systems, an alternative implementation could be to "skip" the nodes of the biosphere flows and just have small horizontal arrows leaving the respective processes that indicate the size and name of the biosphere substance

The way to implement this is to keep the nodes and make the size very small. Should be easy to implement if desired.

the flow you show above seems to say that CO2 is "going into" process A, but I think this should be the other way around

This is how brightway currently works. Likewise, electricity is shown as "going into" process B, even though it really leaves as a by-product. If we are to start switching directions for biosphere flows, we should do it for technosphere by-products, too.

question: IF a process shows up in the Sankey, does this mean that ALL of its biosphere flows (contributing to a specific impact category) will be shown or ONLY THOSE above the cut-off value

Only those above the cut-off.

we should test this for processes and impact categories where a single process has MANY biosphere flows. Take for example the plastics datasets (from plasticseurope), which are aggregated (system) processes. Here the process basically represents the life cycle inventory and that means that it will probably have 10s, 100s, or even more biosphere flows relevant to specific impact categories.

There might be a lot of edges in cases where three conditions are fulfilled:
a) a node in the diagram is an aggregated dataset and
b) many of the biosphere flows of this node are above the cut-off and
b) the chosen impact method has lots of CFs (e.g. ecotoxicity)

In my opinion, all three conditions coming together is rather rare. If this proves to be false, one could
i) implement a limit to the number of biosphere flows shown or
ii) aggregate the nodes according to some logic

Concerning implementation:

  • I tend to agree with @marc-vdm that this should directly be implemented in bw...
  • however, I recently learned that bw 2.5 may introduce things that break the graph traversal and I think before we take a decision, I'd like to hear what @cmutel has to say here
  • if implementing this in bw is not a good idea for some reason, then I think we can temporarily host it in the AB

I would say this depends on Chris' willingness to further develop (and package) legacy versions of bw.

@marc-vdm
Copy link
Member

Less important/small detail: It looks like you write it for superstructure only. We have two implementations, and it looks like it's currently only written for scenarios.

I am afraid I do not understand this one. I tested this without scenarios, only.

I had a look at the code, things are in places that they shouldn't be, but that's not your fault, it was already like this. We'll need to move some things around I think. Anyway, that's not a problem for now.

not sure if this idea is good, but "CO2", e.g., has several compartments and we may just be interested in "CO2" overall...

This might be true for CO2, which has the same CF for all categories. But users might be interested in different categories for other impact methods (e.g. toxicity).

Agree, this may be problematic. IMO better to keep this as individual flows. This would be the same as grouping all processes with product name electricity, high voltage -good for process contributions, not for Sankey-.

the flow you show above seems to say that CO2 is "going into" process A, but I think this should be the other way around

This is how brightway currently works. Likewise, electricity is shown as "going into" process B, even though it really leaves as a by-product. If we are to start switching directions for biosphere flows, we should do it for technosphere by-products, too.

Could you clarify on your example of electricity and process B?

From how I understood the sankey process B just has 2 inputs for electricity, one positive and one negative of the same amount. If my understanding is correct, the flows should both still point from the electricity to B.
With splitting pos/neg flows, we get potentially 4 edges between nodes, the arrow should indicate the direction of the edge, the color should indicate the sign (positive/negative).

As for applying this to biosphere: I think we should think about this more. The convention in LCA is that both resource extraction (coal mined) and emissions (CO2 emitted) are represented as positive numbers. -Personally I think this convention is wrong as for the technosphere, inputs are negative and outputs are positive, IMO biosphere should follow that convention too, but I don't have the power to change that.-

I don't see an immediate way to do this automatically unless we had a list of all biosphere flows that were classed on their direction (though we could perhaps infer them based on the categories column.

We could fix this by simply not having arrows in the biosphere edges?
Or alternatively: we could fix this partly by having a checkbox for users with flip biosphere flows, to use at their own discretion?

we should test this for processes and impact categories where a single process has MANY biosphere flows. Take for example the plastics datasets (from plasticseurope), which are aggregated (system) processes. Here the process basically represents the life cycle inventory and that means that it will probably have 10s, 100s, or even more biosphere flows relevant to specific impact categories.

There might be a lot of edges in cases where three conditions are fulfilled: a) a node in the diagram is an aggregated dataset and b) many of the biosphere flows of this node are above the cut-off and b) the chosen impact method has lots of CFs (e.g. ecotoxicity)

In my opinion, all three conditions coming together is rather rare. If this proves to be false, one could i) implement a limit to the number of biosphere flows shown or ii) aggregate the nodes according to some logic

I agree, I don't see this as a problem. Users can currently also make unwieldy graphs by misusing the cutoff/depth options -e.g. with electricity markets with many process inputs-, this should just be user discretion on choosing a proper cutoff/depth. Though we may look into this and refine the automatic cutoff/depth values we provide for starters.

@cmutel
Copy link
Contributor

cmutel commented Sep 14, 2022

A couple points of clarification:

The new approach is depth-first.

Indeed, sorry for my misunderstanding.

I have trouble seeing how importance first is preferable. The number of calculations should be the same (importance-first calculates the LCA score for all inputs to set an order, depth-first calculates the LCA score for all inputs to decide whether to go deep).

They will produce the same results if max_calc is infinite; otherwise, importance-first is strictly better, as it will produce the most information. Depth first can exhaust the number of calculations permitted in one branch of the supply chain before even seeing other branches.

should be usable in cloud installations where there are only datapackages.

I don't see how the improved implementation prevents this.

Yes, you are right. Sorry.

An alternative is to have two lists of edges (B/T), and to add a function to the 2.0 branch of bw2data to rehydrate metadata from matrix indices.

As long as the biosphere and the technosphere are separate matrices, it is not possible to restore metadata from the matrix indices. How is the algorithm supposed to know if 1 means the first entry in the biosphere matrix or the technosphere matrix?

Well, if you are using a list called biosphere_edges you might have a hint :)

@BenPortner
Copy link
Contributor Author

Thank you for your input!

@marc-vdm

Less important/small detail: It looks like you write it for superstructure only. We have two implementations, and it looks like it's currently only written for scenarios.

I am afraid I do not understand this one. I tested this without scenarios, only.

I had a look at the code, things are in places that they shouldn't be, but that's not your fault, it was already like this. We'll need to move some things around I think. Anyway, that's not a problem for now.

Please feel free to move things around as necessary. Sorry if this causes extra work.

the flow you show above seems to say that CO2 is "going into" process A, but I think this should be the other way around

This is how brightway currently works. Likewise, electricity is shown as "going into" process B, even though it really leaves as a by-product. If we are to start switching directions for biosphere flows, we should do it for technosphere by-products, too.

Could you clarify on your example of electricity and process B?

From how I understood the sankey process B just has 2 inputs for electricity, one positive and one negative of the same amount. If my understanding is correct, the flows should both still point from the electricity to B. With splitting pos/neg flows, we get potentially 4 edges between nodes, the arrow should indicate the direction of the edge, the color should indicate the sign (positive/negative).

B has a negative electricity input = positive electricity output (by-product). A has a (positive) electricity input. My point is that the arrow direction and the sign of the flow amount express the same thing. Currently, AB keeps the flow directions true to the model and indicates negative flow amounts° by changing the color of the arrow to green. Alternatively, one could flip all green arrows, ending up with positive flow amounts only. I think that both approaches are fine, but they shouldn't be mixed. Therefore, if we start switching arrows for biosphere flows, we should do the same for by-products and materials for treatment.

°Actually the color indicates negative impacts, but since most processes have positive impacts, a negative impact usually indicates a reversed direction of flow.

As for applying this to biosphere: I think we should think about this more. The convention in LCA is that both resource extraction (coal mined) and emissions (CO2 emitted) are represented as positive numbers. -Personally I think this convention is wrong as for the technosphere, inputs are negative and outputs are positive, IMO biosphere should follow that convention too, but I don't have the power to change that.-

I don't see an immediate way to do this automatically unless we had a list of all biosphere flows that were classed on their direction (though we could perhaps infer them based on the categories column.

I agree that this is rather unintuitive. The way of thinking here is as follows: An emission of CO2 is treated as a damage to the environment, as is the extraction of a resource. Hence, both have a positive sign (indicating environmental damage rather than benefit). The sign is not about the flow of mass but about the flow of damages. The same goes for the CO2 flow in the demonstration example: Although CO2 leaves from A, a damage is allocated to A.

We could fix this by simply not having arrows in the biosphere edges? Or alternatively: we could fix this partly by having a checkbox for users with flip biosphere flows, to use at their own discretion?

Doable if desired. But I don't feel there is a necessity. What do the others think?

@cmutel

I have trouble seeing how importance first is preferable. The number of calculations should be the same (importance-first calculates the LCA score for all inputs to set an order, depth-first calculates the LCA score for all inputs to decide whether to go deep).

They will produce the same results if max_calc is infinite; otherwise, importance-first is strictly better, as it will produce the most information. Depth first can exhaust the number of calculations permitted in one branch of the supply chain before even seeing other branches.

Makes sense. This should be changed.

An alternative is to have two lists of edges (B/T), and to add a function to the 2.0 branch of bw2data to rehydrate metadata from matrix indices.

As long as the biosphere and the technosphere are separate matrices, it is not possible to restore metadata from the matrix indices. How is the algorithm supposed to know if 1 means the first entry in the biosphere matrix or the technosphere matrix?

Well, if you are using a list called biosphere_edges you might have a hint :)

Haha, true. My mistake. But do you still see a necessity for keeping biosphere and technosphere edges separate? I don't think it should cause incompatibilities with the bw agenda.

@BenPortner
Copy link
Contributor Author

I tried changing the shape of the biosphere nodes to ellipses and it turned out more complicated than expected. Unfortunately, there are problems with overflowing labels. I tried to fix it by centering the labels, but the result is still not very pleasing. Judge for yourselves:

image

@marc-vdm
Copy link
Member

Personally, I don't think the current labeling is problematic this way.

But in the top left, why is the node Carbon dioxide, fossil, air, urban close to ground 0% and it's flow Carbon dioxide, fossil 60%? That doesn't sound right to me.

@BenPortner
Copy link
Contributor Author

BenPortner commented Sep 15, 2022

@marc-vdm
Just like the technosphere nodes, the color of the biosphere nodes shows the total impact that the corresponding activity adds to the functional unit (i.e. not just the impact of one edge but the sum of all edges, also the ones below the cutoff). In this particular case, the total impact caused by Carbon dioxide, fossil, air, urban close to ground is zero, because electricity is consumed by A but the same amount of electricity is produced by B , thereby cancelling the consumption of electricity and the carbon emission. As a matter of fact, the edge list correctly reflects this by having a negative and a positive edge between Carbon dioxide... and electricity production...:

edges = [
...
{'to': ('apos371', 'e53ed2834a03c816b6e233e0dc7e1dfa'), 'from': ('biosphere3', 'f9749677-9c9f-4678-ab55-c607dfdc2cb9'), 'amount': -0.6000000238418579, 'exc_amount': 0.6000000238418579, 'impact': -0.6000000238418579, 'from_type': 'biosphere'}
...
{'to': ('apos371', 'e53ed2834a03c816b6e233e0dc7e1dfa'), 'from': ('biosphere3', 'f9749677-9c9f-4678-ab55-c607dfdc2cb9'), 'amount': 0.6000000238418579, 'exc_amount': 0.6000000238418579, 'impact': 0.6000000238418579, 'from_type': 'biosphere'}
]

However, for some reason the negative edge is not drawn by d3-dagre. I need to look into this.

Edit: fixed.

image

Benjamin W. Portner added 2 commits September 15, 2022 14:21
…each pair of nodes: one with negative impact and one with positive impact.
@BenPortner
Copy link
Contributor Author

Update @marc-vdm @cmutel @bsteubing:

  • I did another major rewrite of graph_traversal.py. There are now classes for nodes, edges and their respective lists. This makes the code cleaner and self-documenting.
  • biosphere nodes and technosphere nodes are now stored in separate lists
  • added function for importance first traversal and made it the default
  • added switch use_keys: True means activity keys will be used in node list and edge list, False means matrix indices will be used (note: the latter option is not compatible with ABs downstream processing)
  • added switch include_biosphere: whether to include biosphere nodes in the node list (and hence sankey diagram) or not
  • improved speed: Under comparable conditions (excluding biosphere nodes and keys), the new algorithm is slightly faster than the bw2calc version:
from bw2data import projects, Database
from activity_browser.bwutils.superstructure.graph_traversal import GraphTraversal as newGT
from bw2calc import GraphTraversal as oldGT
import time

projects.set_current("ab")
method = ("ILCD 2.0 2018 midpoint", "climate change", "climate change total")
db = Database("apos371")
acts = [db.random() for _ in range(10)]

# new
start = time.time()
res_new = [newGT(include_biosphere=False, use_keys=False).calculate(demand={act:1}, method=method, cutoff=0.05, max_calc=250) for act in acts]
end = time.time()
print(f"New GraphTraversal needed {end-start:.1f} s for 10 activities.")
# New GraphTraversal needed 39.9 s for 10 activities.

# old
start = time.time()
res_old = [oldGT().calculate(demand={act:1}, method=method, cutoff=0.05, max_calc=250) for act in acts]
end= time.time()
print(f"Old GraphTraversal needed {end-start:.1f} s for 10 activities.")
# Old GraphTraversal needed 42.5 s for 10 activities.

@cmutel
Copy link
Contributor

cmutel commented Sep 17, 2022

@BenPortner Nice.

I think we need a new library, bw_graph_traversal, which should include your work and the multifunctional traversal class. If you decide to create this, I will make PRs/issues with some small changes. It would be a pity to have this hidden away in the AB code.

I would made the node and edge classes dataclasses - they are exactly the use case for this functionality.

There is no reason to limit this to one functional unit, you can just iterate over the keys here and add multiple nodes. The traverse functions don't need a starting node, just take the top one on the heap.

I changed the name of this class to AssumedDiagonalGraphTraversal, as it assumes values exist on the diagonal and are special. This is not generally the case, see above multifunctional merged PR.

@BenPortner
Copy link
Contributor Author

@cmutel

I think we need a new library, bw_graph_traversal

I don't see any advantage in creating yet another library. On the contrary, I think it has a lot of disadvantages for developers (e.g. more dependencies to manage manually) and users (e.g. unclarity where to post issues). I have described these before here. Due to the reasons stated there I am opposed to a new library.

I would made the node and edge classes dataclasses - they are exactly the use case for this functionality.

Done.

There is no reason to limit this to one functional unit, you can just iterate over the keys here and add multiple nodes.

The improved version now supports multiple functional units.

The traverse functions don't need a starting node, just take the top one on the heap.

It needs a starting node to keep compatible with the traverse_depth_first function syntax.

I changed the name of this class to AssumedDiagonalGraphTraversal, as it assumes values exist on the diagonal and are special. This is not generally the case, see above multifunctional merged PR.

I personally feel like AssumedDiagonalGraphTraversal is a rather irritating name. It sounds like the traversal happens along the diagonal of a matrix, rather than describing the implicit assumption this graph traversal class assumes that diagonal values exist and are special . Perhaps it would be better to have a boolean parameter has_diagonal that changes the behavior of GraphTraversal? What do the others think?

@marc-vdm
Copy link
Member

marc-vdm commented Dec 15, 2022

@BenPortner I was using this branch to actually use this and here are some further thoughts:

This figure shows the same system in current Sankey and this branch implementations.
image

First thing I notice is that there is 1) much more information for user to interpret when EFs are included, 2) the focus is now on (cumulative) contributions from EFs, where the focus was on (cumulative) contributions from processes. Both are useful perspectives, but we lose the process focus when using this branch. I think there are 2 possible solutions for this: 1) we keep both, allowing user to choose which implementation to use (perhaps with checkbox 'include environmental intervention flows') 2) we add back in the shading (and % direct impact) for processes. The latter would allow user to see which processes still contribute a lot, which -imo- is harder in the new branch (comparing shading vs thickness of arrow input).

Now, when I include some loops in the system, we again run this system in both branches, we run into some issues:
image
Note here, that the reference flow is still the hot-rolling process, though Dagre gets confused and completely changes the order, but considering that Dagre is dead, there's not much we can do about that.

  1. Most important problem: The total impact at hot-rolling is 200% of normal LCA results. This is the case in both implementations of Sankey. The impact calculated by normal LCA calculations in BW seem to be correct, I calculated everything in excel too, I get the same result as the normal BW results. For some reason sankey results are wrong.
  2. Next, there are differences between flow results between the branches too: if we for example consider the flow of raw steel from steel smelting (bottom of both graphs), the impact are 171% and 107%, a difference that I don't understand. In the original implementation, we get a 'direct' impact of 31%, where in this branch, we get an impact of 19.6% (this process has no methane emissions). I must say I'm rather confused to the differences, and I'm not sure which of the results is correct, though the cumulative impact of the process iron ore mining is the same as the flow iron ore to steel smelting in the original implementation, where it is not in the new branch. Though, as the results are not the same as we get from normal LCA, I don't know which would be correct.

edit: add files so others can also try this with the same system
example.xlsx
example loop.xlsx

- fix typo in docstring
@BenPortner
Copy link
Contributor Author

@marc-vdm Thanks for the testing effort and the extensive report! I looked at the issue and this is what I found:

First thing I notice is that there is 1) much more information for user to interpret when EFs are included, 2) the focus is now on (cumulative) contributions from EFs, where the focus was on (cumulative) contributions from processes. Both are useful perspectives, but we lose the process focus when using this branch. I think there are 2 possible solutions for this: 1) we keep both, allowing user to choose which implementation to use (perhaps with checkbox 'include environmental intervention flows') 2) we add back in the shading (and % direct impact) for processes. The latter would allow user to see which processes still contribute a lot, which -imo- is harder in the new branch (comparing shading vs thickness of arrow input).

I don't have a preference for either 1) or 2). I will let you and your team decide what to do.

Most important problem: The total impact at hot-rolling is 200% of normal LCA results. This is the case in both implementations of Sankey. The impact calculated by normal LCA calculations in BW seem to be correct, I calculated everything in excel too, I get the same result as the normal BW results. For some reason sankey results are wrong.

As a matter of fact, both numbers are right! The GWP 100 of 1 kg of hot-rolled steel in your model is 7.1 kg CO2e. However, to produce 1 kg of hot-rolled steel, you need another kg of hot-rolled steel in upstream processes. The cumulative impact, which is shown in the tooltip of the corresponding node, accounts for this additional demand, i.e. it relates to 2 kg of hot-rolled steel instead of 1. That is why the cumulative impact is double the unit impact.

Next, there are differences between flow results between the branches too: if we for example consider the flow of raw steel from steel smelting (bottom of both graphs), the impact are 171% and 107%, a difference that I don't understand. In the original implementation, we get a 'direct' impact of 31%, where in this branch, we get an impact of 19.6% (this process has no methane emissions). I must say I'm rather confused to the differences, and I'm not sure which of the results is correct, though the cumulative impact of the process iron ore mining is the same as the flow iron ore to steel smelting in the original implementation, where it is not in the new branch. Though, as the results are not the same as we get from normal LCA, I don't know which would be correct.

This is a good catch! In fact, the new version produces the wrong result here. The old version is correct. The error occurs because the new version will traverse the circular supply chain until the cutoff criterion is reached. As a consequence, part of the demand of raw steel is being cut off. In fact, if you decrease the cutoff criterion you will see that the amount of raw steel increases. The old graph traversal does not repeatedly traverse the same edges. Instead, it uses the life cycle inventory to calculate the correct flow amount. However, because it uses the LCI amount, it cannot differentiate between positive and negative contributions of the same activity. This differentiation is the major goal of the new version, hence it cannot use the same approach. For now, I have no solution how to fix the new version. Maybe someone else has an idea?

@marc-vdm
Copy link
Member

@BenPortner Thanks for the response! You have some interesting points, would you be available for a chat this week or the next?

The point you make about the 7 vs 14kg both being right is not entirely correct I think, without the loop the impact is 3.5, with the loop the impact becomes 7 (you're right that this should double, just not again to 14). Though I think it'd be good for me to also discuss the last paragraph of your comment, I still need to learn more about this graph traversal it seems.

@BenPortner
Copy link
Contributor Author

BenPortner commented Jan 10, 2023

@marc-vdm

You have some interesting points, would you be available for a chat this week or the next?

Can you send me an email?

The point you make about the 7 vs 14kg both being right is not entirely correct I think, without the loop the impact is 3.5, with the loop the impact becomes 7 (you're right that this should double, just not again to 14).

I'm not yet convinced that the numbers are wrong. Without loops the impact of 1 kg hot-rolled steel is 3.5 kg CO2e. With loops the impact of 1 kg of hot-rolled steel becomes 7.0 kg CO2e (see LCIA result). Consequently, the impact of 2 kg of hot-rolled steel is 14.0 kg of CO2e. This is exactly what the cumulative impact shows (because the life cycle inventory amount is 2 kg).

Benjamin W. Portner added 2 commits January 10, 2023 11:41
# Conflicts:
#	activity_browser/static/javascript/navigator.js (reformat)
@BenPortner
Copy link
Contributor Author

BenPortner commented Mar 31, 2023

Hi @marc-vdm,

Just letting you know that I won't have time to finish this, unfortunately. I think the flaw you found is irreparable. It won't be possible to have a Sankey diagram that shows negative and positive flows while at the same time handling circular references correctly. Nevertheless, I believe the biosphere node visualization is a valuable feature. I create a branch here, which keeps this but reset the calculation to the old way. With this, the error you found should be fixed. Unfortunately, I did not have time to test it thoroughly but I hope you will be able to work from there. Let me know how it goes.

Sorry that I cannot help you more!
Ben

@marc-vdm
Copy link
Member

Closing as stale, we will re-open this once we have the capacity to actually implement this.

@marc-vdm marc-vdm closed this Sep 22, 2023
@cmutel cmutel mentioned this pull request Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants