Nodes with same output dataset for Partitioned Scenarios #3447

mehrzadai · 2023-12-20T08:02:10Z

I ran into an issue that may be fixed in the future, or that may already have a solution I'm not aware of.
I have a scenario with several categories of big data, e.g. rates, sales, views, and reviews, that I want to join together.
I don't want a separate dataset for each one in my catalog. Instead, I want to save each as one partition of a single dataset, something like this:

concat:
   type : Partitioned
node(views -> concat) , node(rates -> concat) , ...

This way, I can keep the connectivity between nodes and use lazy save/load at the same time.
But currently this fails with:
kedro.pipeline.pipeline.OutputNotUniqueError: Output(s) ['concat'] are returned by more than one nodes. Node outputs must be unique.
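To make the rule concrete, here is a minimal sketch of the kind of check that produces this error. It is not Kedro's actual implementation: `check_unique_outputs`, the local `OutputNotUniqueError` class, and the node names are hypothetical stand-ins, and only the message text follows the error quoted above.

```python
from collections import Counter
from typing import Dict, List


class OutputNotUniqueError(Exception):
    """Local stand-in for kedro.pipeline.pipeline.OutputNotUniqueError."""


def check_unique_outputs(nodes: Dict[str, List[str]]) -> None:
    """Raise if any dataset name is produced by more than one node.

    `nodes` maps a (hypothetical) node name to the dataset names it outputs.
    """
    counts = Counter(out for outputs in nodes.values() for out in outputs)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise OutputNotUniqueError(
            f"Output(s) {duplicates} are returned by more than one nodes. "
            "Node outputs must be unique."
        )


# Two nodes both writing to "concat", as in the scenario above.
nodes = {"views_to_concat": ["concat"], "rates_to_concat": ["concat"]}
try:
    check_unique_outputs(nodes)
    message = ""
except OutputNotUniqueError as err:
    message = str(err)
```

The check only looks at dataset names, so it cannot tell that the two nodes would write to different partitions of the same dataset.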
I can save my partitions as separate datasets, like:

rates:
   type : CSVDataset
views:
   type : CSVDataset
 ...

and then load the partitioned dataset in another node, but that way I lose the connectivity between my nodes.
I think this rule should be relaxed for partitioned datasets, so that each partition can be saved by a different node.
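One possible workaround under the current rule, sketched below, is a single node that collects every category and returns a dictionary of partition id to callable. As far as I understand the Kedro documentation, a partitioned dataset accepts such a dictionary on save and calls each callable only when it writes that partition. The function name `build_partitions`, the sample values, and the wiring shown in the comment are assumptions for illustration. This does not give one node per partition, which is what the request is really about.

```python
from typing import Any, Callable, Dict


def build_partitions(views: Any, rates: Any) -> Dict[str, Callable[[], Any]]:
    """Return one entry per category: partition id -> zero-argument callable.

    Returning callables instead of the data itself lets a partitioned
    writer materialise each partition only when it is about to write it.
    """
    sources = {"views": views, "rates": rates}

    def deferred(data: Any) -> Callable[[], Any]:
        # Bind `data` now so each callable returns its own partition.
        return lambda: data

    return {name: deferred(data) for name, data in sources.items()}


# Hypothetical wiring in a Kedro pipeline (not executed here):
#   node(build_partitions, inputs=["views", "rates"], outputs="concat")

partitions = build_partitions(views=[10, 20, 30], rates=[4.5, 3.0])
```

Because only one node outputs `concat`, the uniqueness rule is satisfied and the upstream `views` and `rates` nodes stay connected to it in the pipeline graph.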


astrojuanlu · 2024-01-10T11:26:44Z

Hi @mehrzadai, thanks for opening this issue and sorry for the delay.

On first inspection your use case makes sense, but it might be problematic for us to introduce a special case for partitioned datasets to allow different nodes to write to a different partition of the same dataset. We'll have a look at this soon.

astrojuanlu · 2024-12-02T14:36:01Z

I'm moving this to a discussion for now, let's continue the conversation there.

mehrzadai added the "Issue: Feature Request" label Dec 20, 2023

astrojuanlu added this to Kedro Framework Jan 10, 2024

astrojuanlu added the "Community" label Jan 10, 2024

merelcht added this to the Something about Incremental- and PartitionedDataset milestone Mar 14, 2024

merelcht removed the "Community" label May 24, 2024

kedro-org locked and limited conversation to collaborators Dec 2, 2024

astrojuanlu converted this issue into discussion #4360 Dec 2, 2024

github-project-automation bot moved this to Done in Kedro Framework Dec 2, 2024

This issue was moved to a discussion.
