ecospold2matrix Data Flowchart #43

Open
michaelweinold opened this issue Jan 31, 2023 · 0 comments
michaelweinold commented Jan 31, 2023

Ecospold xml Ingestion

```mermaid
flowchart TD
    Ain[IntermediateExchanges.xml] --> A["extract_products()"] --> Aout[products: pd.DataFrame]
    Bin[ActivityIndex.xml <br> ActivityNames.xml] --> B["extract_activities()"] --> Bout[activities: pd.DataFrame]
    C["get_labels()"] --> Cextract1 --> Cout[PRO: pd.DataFrame <br> STR: pd.DataFrame]
    Cin[ElementaryExchanges.xml <br> spold files] --> C --> Cextract2 --> Cout
    Din[spold files] --> D["get_flows()"] --> Dextract["extract_flows()"] --> Dout[inflows: pd.DataFrame <br> outflows: pd.DataFrame <br> elementary_flows: pd.DataFrame]
    Cextract1["build_PRO()"]
    Cextract2["build_STR()"]
```

DataFrame Table

| object | type | columns (e2m) | columns correspondence (bw) or Ecospold | comment |
| --- | --- | --- | --- | --- |
| inflows | pd.DataFrame | fileId <br> sourceActivityId <br> productId <br> amount <br> row_index | .spold file name <br> activityLinkId <br> intermediateExchangeId <br> amount <br> ??? | extracted from .spold files |
| outflows | pd.DataFrame | fileId <br> productId <br> amount <br> productionVolume <br> outputGroup | .spold file name <br> intermediateExchangeId <br> amount <br> productionVolume <br> outputGroup | extracted from .spold files |
| elementary_flows | pd.DataFrame | fileId <br> elementaryExchangeId <br> amount | .spold file name <br> elementaryExchangeId <br> amount | extracted from .spold files |
| activities | pd.DataFrame | activityId <br> activityNameId <br> activityType <br> startDate <br> endDate <br> activityName | id <br> activityNameId <br> activityType <br> startDate <br> endDate <br> activityName | extracted from ActivityIndex.xml with activityName data merged from ActivityNames.xml |
| products | pd.DataFrame | productName <br> unitName <br> productId <br> unitId <br> cpc <br> properties | name <br> unitName <br> id <br> unitId <br> classification == 'cpc' <br> properties | extracted from IntermediateExchanges.xml |
| STR | pd.DataFrame | id <br> name <br> unit <br> cas <br> comp <br> subcomp | id <br> name <br> unitName <br> casNumber <br> compartment <br> subcompartment | extracted from ElementaryExchanges.xml |
| PRO | pd.DataFrame | 'activityId' <br> 'productId' <br> 'activityName' <br> 'ISIC' <br> 'price' <br> 'priceUnit' <br> 'EcoSpoldCategory' <br> 'geography' <br> 'technologyLevel' <br> 'macroEconomicScenario' <br> properties_x <br> 'productionVolume' <br> 'productName' <br> 'unitName' <br> 'cpc' <br> properties_y <br> 'activityNameId' <br> 'activityType' <br> 'startDate' <br> 'endDate' <br> 'activityName_duplicate' | 'id' <br> 'productId' <br> 'activityName' <br> 'ISIC' <br> 'price' <br> 'priceUnit' <br> 'EcoSpoldCategory' <br> 'geography' <br> 'technologyLevel' <br> 'macroEconomicScenario' <br> properties_x <br> 'productionVolume' <br> 'productName' <br> 'unitName' <br> 'cpc' <br> properties_y <br> 'activityNameId' <br> 'activityType' <br> 'startDate' <br> 'endDate' <br> 'activityName_duplicate' | extracted from .spold files |
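
The column correspondence in the table can be read as a rename map from Ecospold field names to e2m column names. The following is a minimal sketch for the `STR` row only; `STR_COLUMN_MAP`, `rename_to_e2m()` and the sample record are hypothetical and are not taken from the ecospold2matrix code base.

```python
import pandas as pd

# Ecospold field name -> e2m column name, copied from the STR row of the table.
STR_COLUMN_MAP = {
    "id": "id",
    "name": "name",
    "unitName": "unit",
    "casNumber": "cas",
    "compartment": "comp",
    "subcompartment": "subcomp",
}


def rename_to_e2m(raw: pd.DataFrame, column_map: dict) -> pd.DataFrame:
    """Keep only the mapped columns and rename them (hypothetical helper)."""
    return raw[list(column_map)].rename(columns=column_map)


# Placeholder record, used only to exercise the helper.
raw_str = pd.DataFrame(
    [
        {
            "id": "s1",
            "name": "stressor A",
            "unitName": "kg",
            "casNumber": "n/a",
            "compartment": "air",
            "subcompartment": "unspecified",
        }
    ]
)

str_e2m = rename_to_e2m(raw_str, STR_COLUMN_MAP)
```

The same pattern would apply to the other rows, with one map per DataFrame.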

Preparation and Cleanup

```mermaid
flowchart TD
    in[activities: pd.DataFrame <br> products: pd.DataFrame] --> F["complement_labels()"] --> out[PRO: pd.DataFrame <br> STR: pd.DataFrame]
```

DataFrame Table

| object | type | columns (e2m) | columns correspondence (bw) or Ecospold | comment |
| --- | --- | --- | --- | --- |
| PRO | pd.DataFrame | all prev. columns <br> 'productionVolume' <br> all cols. from products <br> all cols. from activities | all prev. columns <br> 'productionVolume' <br> all cols. from products <br> all cols. from activities | for merge keys, see below |

Join Table

| left | right | left_key | right_key | added cols. |
| --- | --- | --- | --- | --- |
| PRO | outflows | index = 'abc' | index = 'abc' | 'productionVolume' |
| PRO | products | index = 'abc' | index = 'abc' | all except potential duplicates |
| PRO | activities | index = 'abc' | index = 'abc' | all except potential duplicates |
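
The three joins in the table could be sketched with pandas as follows. This is an illustration under assumptions, not the `complement_labels()` implementation: the index key is only given as the placeholder 'abc' in the table, so the sketch assumes all frames already share one index, and `add_missing_columns()` and the toy data are hypothetical.

```python
import pandas as pd

# Toy frames that share one index (the real key is left as 'abc' in the table).
PRO = pd.DataFrame({"activityName": ["act 1", "act 2"]}, index=["k1", "k2"])
outflows = pd.DataFrame(
    {"amount": [1.0, 1.0], "productionVolume": [10.0, 20.0]}, index=["k1", "k2"]
)
products = pd.DataFrame(
    {"productName": ["prod 1", "prod 2"], "unitName": ["kg", "kWh"]},
    index=["k1", "k2"],
)
activities = pd.DataFrame(
    {"activityType": ["1", "1"], "activityName": ["act 1", "act 2"]},
    index=["k1", "k2"],
)


def add_missing_columns(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Index join that skips columns already present in `left` (hypothetical helper)."""
    new_cols = [c for c in right.columns if c not in left.columns]
    return left.join(right[new_cols], how="left")


# Row 1 of the join table: only 'productionVolume' is taken from outflows.
PRO = PRO.join(outflows[["productionVolume"]], how="left")
# Rows 2 and 3: all columns except potential duplicates.
PRO = add_missing_columns(PRO, products)
PRO = add_missing_columns(PRO, activities)
```

Dropping duplicate columns before the join avoids suffixed names such as `properties_x` and `properties_y`; whether that is the desired behaviour here is a design decision for the maintainers.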

DataFrame Construction (change heading)

```mermaid
flowchart TD
    in[inflows: pd.DataFrame <br> elementary_flows: pd.DataFrame <br> outflows: pd.DataFrame] --> F["build_AF()"] --> out[A: pd.DataFrame <br> F: pd.DataFrame]
```

DataFrame Table

Pivot Table

| output | input | index | columns | values | output index | output cols. |
| --- | --- | --- | --- | --- | --- | --- |
| A | inflows | 'row_index' = 'fileId' + 'productId' | 'fileId' | 'amount' | PRO.index = 'abc' | PRO.index = 'abc' |
| F | elementary_flows | 'elementaryExchangeId' | 'fileId' | 'amount' | STR.index = 'abc' | PRO.index = 'abc' |
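
The pivot step can be illustrated with `pandas.DataFrame.pivot_table` followed by a `reindex` onto the label indices. This is a sketch under assumptions rather than the `build_AF()` code: the toy labels are made up, `row_index` and `fileId` are assumed to use the same labels as `PRO.index`, duplicate entries are assumed to be summed, and missing entries are assumed to be zero.

```python
import pandas as pd

# Hypothetical labels standing in for PRO.index and STR.index.
pro_index = ["p1", "p2", "p3"]
str_index = ["s1", "s2"]

inflows = pd.DataFrame(
    {
        "fileId": ["p1", "p1", "p3"],
        "row_index": ["p2", "p3", "p1"],
        "amount": [0.5, 2.0, 1.5],
    }
)
elementary_flows = pd.DataFrame(
    {
        "fileId": ["p1", "p3"],
        "elementaryExchangeId": ["s1", "s2"],
        "amount": [3.0, 4.0],
    }
)

# A: rows = consumed product, columns = consuming activity file.
A = inflows.pivot_table(
    index="row_index", columns="fileId", values="amount", aggfunc="sum"
)
# Align both axes with the PRO labels; absent combinations become 0.
A = A.reindex(index=pro_index, columns=pro_index).fillna(0.0)

# F: rows = elementary exchange (stressor), columns = activity file.
F = elementary_flows.pivot_table(
    index="elementaryExchangeId", columns="fileId", values="amount", aggfunc="sum"
)
F = F.reindex(index=str_index, columns=pro_index).fillna(0.0)
```

Reindexing after the pivot guarantees that `A` is square and that `F` has one column per process, including processes without any recorded flow.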

Characterization

```mermaid
flowchart TD
    in["LCIA Implementation v3.8.xlsx"] --> F1["if-else"] --> F2["simple_characterisation_matching()"] --> out[A: pd.DataFrame <br> C: pd.DataFrame]
```

Pivot Table

| output | input | index | columns | values | output index | output cols. |
| --- | --- | --- | --- | --- | --- | --- |
| C | C_long | 'impact_label' | 'stressorId' | 'CF' | N/A | N/A |
@michaelweinold michaelweinold added the documentation Improvements or additions to documentation label Jan 31, 2023
@michaelweinold michaelweinold self-assigned this Jan 31, 2023
@michaelweinold michaelweinold changed the title Flowchart ecospold2matrix Data Flowchart Feb 1, 2023
@michaelweinold michaelweinold pinned this issue Feb 1, 2023
@michaelweinold michaelweinold unpinned this issue Jun 4, 2024