This issue was moved to a discussion.

Nodes with same output dataset for Partitioned Scenarios #3447

Closed
mehrzadai opened this issue Dec 20, 2023 · 2 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

mehrzadai commented Dec 20, 2023

I ran into a limitation that may be addressed in the future, or that may already have a solution I'm not aware of.
I have a scenario with several categories of large data (e.g. rates, sales, views, and reviews) that I want to join together.
I don't want a separate dataset for each category in my catalog; instead, I want to save each one as a partition of a single dataset, something like this:

concat:
   type : Partitioned
node(views -> concat) , node(rates -> concat) , ...

This way, I can keep the connectivity between nodes and use lazy save/load at the same time.
But currently this is not allowed:
kedro.pipeline.pipeline.OutputNotUniqueError: Output(s) ['concat'] are returned by more than one nodes. Node outputs must be unique.
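
To make the rule behind this error concrete, here is a minimal plain-Python sketch of an output-uniqueness check. It is not Kedro's implementation; the `duplicate_outputs` helper and the node names `save_views` and `save_rates` are hypothetical and only illustrate why two nodes that both declare `concat` as an output are rejected.

```python
from collections import Counter
from typing import List, Tuple


def duplicate_outputs(nodes: List[Tuple[str, List[str]]]) -> List[str]:
    """Return the output names declared by more than one node.

    `nodes` is a list of (node_name, output_names) pairs. This is a
    hypothetical stand-in for a pipeline definition, not a Kedro API.
    """
    counts = Counter(out for _, outputs in nodes for out in outputs)
    return sorted(name for name, n in counts.items() if n > 1)


# Two nodes both declare `concat` as their output, as in the example above.
nodes = [
    ("save_views", ["concat"]),
    ("save_rates", ["concat"]),
]
duplicates = duplicate_outputs(nodes)
if duplicates:
    print(f"Output(s) {duplicates} are returned by more than one node.")
```

Because dataset names are what link nodes together, a name that is produced twice leaves it ambiguous which node a downstream consumer depends on, which is presumably why the check applies to every dataset type.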
I can save my partitions like this:

rates:
   type : CSVDataset
views:
   type : CSVDataset
 ...

and then load them as a partitioned dataset in another node, but that way I lose the connectivity between my nodes.
I think this rule should be relaxed for partitioned datasets, so that each partition can be saved by a different node.
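
One possible workaround under the current rule, sketched here under assumptions: keep one node per category, but let a single downstream node be the only writer of `concat`. The sketch assumes `concat` is registered in the catalog as a `partitions.PartitionedDataset`, which saves a dictionary mapping partition IDs to data and accepts callables as values for lazy saving. The function name `collect_partitions` is hypothetical.

```python
from typing import Any, Callable, Dict


def collect_partitions(**categories: Any) -> Dict[str, Callable[[], Any]]:
    """Map each category name to a zero-argument callable returning its data.

    Assumption: the dataset receiving this dictionary treats each key as a
    partition ID and calls each value only when that partition is written.
    """
    # Bind `data` as a default argument so each lambda keeps its own value
    # instead of all lambdas sharing the last loop variable.
    return {name: (lambda data=data: data) for name, data in categories.items()}


# Stand-in data; in a pipeline these would be the outputs of upstream nodes.
partitions = collect_partitions(rates=[4.5, 3.0], views=[120, 98])
print(sorted(partitions))  # partition IDs
```

In a pipeline, this function would be wrapped in one node whose inputs are the per-category outputs and whose single output is `concat`. The upstream datasets do not need catalog entries if they can stay in memory, and the nodes remain connected, although this does not give the one-writer-per-partition behaviour requested above.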

mehrzadai added the "Issue: Feature Request" label on Dec 20, 2023
@astrojuanlu
Member

Hi @mehrzadai, thanks for opening this issue and sorry for the delay.

On first inspection your use case makes sense, but it might be problematic for us to introduce a special case for partitioned datasets to allow different nodes to write to a different partition of the same dataset. We'll have a look at this soon.

astrojuanlu added the "Community" label on Jan 10, 2024
merelcht removed the "Community" label on May 24, 2024
@astrojuanlu
Member

I'm moving this to a discussion for now, let's continue the conversation there.

kedro-org locked and limited conversation to collaborators on Dec 2, 2024
astrojuanlu converted this issue into discussion #4360 on Dec 2, 2024
