Characterization matrix ... rabbit hole #97

transfluxus · 2024-06-19T09:58:25Z

I am integrating brightway into tool, for which I am copying different pieces of the MultiLCA code together.

That tool allows to specify a json file with activities, their functional units and methods and it will run a MultiLCA with those selected items. However, there are more features that we wanted to integrate, that's why I took the MultiLCA code and extended it so it allows:

use_distributions (as it is in the github repo, tho it's not in the pypi version yet)
allow to split the inventory matrix by columns and calculating the lcia (multiplication with cc_matrix) separately
allow arbitrary python function instead of constants for characterization

So I had to extend the MultiLCA quite a bit to be able to do all those things.

However, recently we ran some experiments and noticed that for large numbers of methods (e.g. 20 or more) the values are for some methods completely off. After a bit off digging, scripting logging I noticed that when I would calculate the hashes of the individual characterization matrices, they would be different than for a MultiLCA class that is in bw.

The error is NOT very consistent yet, so it's still very confusing for many methods so I needed to simplify it more.

So I start this thread with a simple question.

Trying to simplify the setup and understand what's going on, I created this small script, that
creates a lca , loads the characterization_matrix and compares it characterization factors of the same Method:

My question would be, shoudn't those values be more or less the same... because for me they are not :/

import json

import bw2data
from bw2calc import LCA
from bw2data.backends import Activity

PROJECT_NAME = ""
ECOINVENT_DB_NAME = ""

bw2data.projects.set_current(PROJECT_NAME)
db = bw2data.Database(ECOINVENT_DB_NAME)


def cc_matrix_check() -> None:
    code = db.random()["code"]
    act: Activity = db.get(code)
    methods = list(bw2data.methods)
    # shuffle(methods)
    for method in methods:
        lca = LCA({act: 1}, method)
        lca.lci()
        lca.load_lcia_data()
        cc_mm = lca.characterization_matrix.toarray()
        # columns
        cc_mm_s = cc_mm.sum(0)
        # all non zero values (sorted)
        cc_mm_uq0 = sorted([v for v in cc_mm_s if v != 0])
        # values of the Method data (sorted)
        m_char_data = sorted([cv for m, cv in bw2data.Method(method).load()])
        # put those two list together
        zipped = list(zip(cc_mm_uq0, m_char_data))
        print(method)
        print(json.dumps(zipped, indent=2))
        break

cmutel · 2024-07-02T08:39:32Z

copying different pieces of the MultiLCA code together

This feels like an anti-pattern to me - it would be much better to import the library and subclass the elements that you need to change instead. I have seen people do this in the past, including myself, and it has always led to regrets.

use_distributions (as it is in the github repo, tho it's not in the pypi version yet)

It's on pypi, just use pip --pre.

allow to split the inventory matrix by columns and calculating the lcia (multiplication with cc_matrix) separately

You can do this now, e.g. lca.characterization_matrix @ lca.inventory[:, some_column].

allow arbitrary python function instead of constants for characterization

Sure, but then don't pass an LCIA method, just run your function on lca.inventory.

I noticed that when I would calculate the hashes of the individual characterization matrices

I don't think I got this 100%, but there are a few things to be aware of:

The characterization matrix is not every value provided by the impact category - we only put in factors for biosphere flows which are used by the inventory. It's entirely possible for many defined characterization factors to not show up in the matrix.
The order of insertion into the characterization|biosphere|technosphere matrix is deterministic for a given project but not consistent across functional units (as they bring in different background databases). Never assume anything about indices, use lca.dicts to translate to and from database ids.

transfluxus · 2024-07-03T11:04:03Z

copying different pieces of the MultiLCA code together

This feels like an anti-pattern to me - it would be much better to import the library and subclass the elements that you need to change instead. I have seen people do this in the past, including myself, and it has always led to regrets.

I agree, in the version I was using, everything was happening in the MultiLCA constructor, so seemed a good approach to dissect the parts of it and move it into individual functions.

use_distributions (as it is in the github repo, tho it's not in the pypi version yet)

It's on pypi, just use pip --pre.

I will totally try this and refactor my code around that. However for a simple case it crashes at

m_lca = MultiLCA({'test': {24840: 1, 24438: 1}}, {'impact_categories': [
    ('ReCiPe 2016 v1.03, midpoint (E)', 'acidification: terrestrial', 'terrestrial acidification potential (TAP)')]}
                 , [])

m_lca.lci()

site-packages/matrix_utils/utils.py", line 43, in safe_concatenate_indices
    raise AllArraysEmpty
matrix_utils.errors.AllArraysEmpty

And I strongly assume, that it's because I am passing an empty array as data_objs variable to the MultiLCA constructor.
I am not sure what to provide otherwise in a simple scenario.

allow to split the inventory matrix by columns and calculating the lcia (multiplication with cc_matrix) separately

You can do this now, e.g. lca.characterization_matrix @ lca.inventory[:, some_column].

I will have a look, thanks!

allow arbitrary python function instead of constants for characterization

Sure, but then don't pass an LCIA method, just run your function on lca.inventory.

I would actually run it on inventory.sum(1), in my tests that worked, and I assume it's faster, since the functions are only called just once per row/(biosphere flow?) instead of for each biosphere * technosphere

I noticed that when I would calculate the hashes of the individual characterization matrices

I don't think I got this 100%, but there are a few things to be aware of:
* The characterization matrix is _not_ every value provided by the impact category - we only put in factors for biosphere flows which are _used by the inventory_. It's entirely possible for many defined characterization factors to not show up in the matrix.

That is interesting! :) makes a lot of sense, but I did not see that in the code.

* The order of insertion into the characterization|biosphere|technosphere matrix is deterministic for a given project but not consistent across functional units (as they bring in different background databases). Never assume anything about indices, use `lca.dicts` to translate to and from database ids.

Yes that is what I generally do, and I did it in a more rigidusros test, than the one above.

transfluxus · 2024-07-23T09:58:28Z

#100

transfluxus closed this as completed Jul 2, 2024

transfluxus reopened this Jul 3, 2024

transfluxus closed this as completed Jul 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Characterization matrix ... rabbit hole #97

Characterization matrix ... rabbit hole #97

transfluxus commented Jun 19, 2024 •

edited

Loading

cmutel commented Jul 2, 2024

transfluxus commented Jul 3, 2024

transfluxus commented Jul 23, 2024

Characterization matrix ... rabbit hole #97

Characterization matrix ... rabbit hole #97

Comments

transfluxus commented Jun 19, 2024 • edited Loading

cmutel commented Jul 2, 2024

transfluxus commented Jul 3, 2024

transfluxus commented Jul 23, 2024

transfluxus commented Jun 19, 2024 •

edited

Loading