Return StructuredDataset which is a field in a dataclass #3071

Open: wants to merge 3 commits into base: master

arbaobao
Contributor

@arbaobao arbaobao commented Jan 21, 2025

Tracking issue

Related to #6117

Why are the changes needed?

If we wrap the StructuredDataset in a dataclass, it will fail during the to_flyte_idl conversion.

What changes were proposed in this pull request?

Before returning the Literal, we check the type of python_val._literal_sd. If it is a Python-native StructuredDataset, we convert it into a literals.StructuredDataset.
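
The check described above can be sketched with stdlib-only stand-ins. This is a minimal illustration, not flytekit's code: `NativeSD`, `LiteralSD`, `to_literal`, and the `s3://bucket/data` URI are hypothetical names, and the sketch assumes the failure arises because the native object has no `to_flyte_idl` method.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LiteralSD:
    """Hypothetical stand-in for the IDL-level literals.StructuredDataset."""

    uri: str
    file_format: str

    def to_flyte_idl(self) -> dict:
        # Stand-in serialization; the real method builds a protobuf message.
        return {"uri": self.uri, "format": self.file_format}


@dataclass
class NativeSD:
    """Hypothetical stand-in for the Python-native StructuredDataset."""

    uri: str
    file_format: str
    # May hold either kind of object, which is the situation the PR describes.
    _literal_sd: Optional[Union["NativeSD", LiteralSD]] = None


def to_literal(python_val: NativeSD) -> Optional[LiteralSD]:
    """Return a serializable literal, converting a native object if needed."""
    inner = python_val._literal_sd
    if isinstance(inner, NativeSD):
        # The added check: rebuild a literal from the native object's fields.
        return LiteralSD(uri=inner.uri, file_format=inner.file_format)
    return inner


# Hypothetical URI, used only for illustration.
wrapped = NativeSD(uri="s3://bucket/data", file_format="parquet")
outer = NativeSD(uri="", file_format="", _literal_sd=wrapped)
print(to_literal(outer).to_flyte_idl())
```

Without the `isinstance` branch, `to_literal(outer)` would hand back the native stand-in, and calling `to_flyte_idl()` on it would raise `AttributeError`.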

How was this patch tested?

As described in #6117, an error occurs when the extract task is executed.

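from dataclasses import dataclass

import pandas as pd
from flytekit import task, workflow
from flytekit.types.structured.structured_dataset import StructuredDataset


@dataclass
class Data:
    f: StructuredDataset


@task
def create_data() -> Data:
    return Data(f=StructuredDataset(dataframe=pd.DataFrame({"a": [5]})))


@task
def extract(d: Data) -> StructuredDataset:
    # Returning the dataclass field is the step that fails without this patch.
    return d.f


@workflow
def example_wf() -> None:
    d = create_data()
    f = extract(d=d)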

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

Fixed a bug in StructuredDataset handling within dataclasses during to_flyte_idl conversion. The PR adds proper transformation of python_val._literal_sd instances into Literals.StructuredDataset, enabling tasks to successfully return StructuredDataset objects as dataclass fields.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 1

Signed-off-by: Nelson Chen <asd3431090@gmail.com>
Signed-off-by: Nelson Chen <asd3431090@gmail.com>
Signed-off-by: Nelson Chen <asd3431090@gmail.com>
@flyte-bot
Contributor

flyte-bot commented Jan 21, 2025

Code Review Agent Run #63793c

Actionable Suggestions - 2
  • flytekit/types/structured/structured_dataset.py - 2
Review Details
  • Files reviewed - 2 · Commit Range: 51f6f73..a3df842
    • flytekit/core/type_engine.py
    • flytekit/types/structured/structured_dataset.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


@flyte-bot
Contributor

Changelist by Bito

This pull request implements the following key changes.

Key Change: Bug Fix - Fix StructuredDataset Dataclass Handling

Files Impacted: structured_dataset.py - Added handling for StructuredDataset fields within dataclasses during type transformation

Comment on lines +738 to +742
if isinstance(python_val._literal_sd, StructuredDataset):
    sdt = StructuredDatasetType(format=python_val._literal_sd.file_format)
    metad = literals.StructuredDatasetMetadata(structured_dataset_type=sdt)
    sd_literal = literals.StructuredDataset(uri=python_val._literal_sd.uri, metadata=metad)
    return Literal(scalar=Scalar(structured_dataset=sd_literal))
Contributor


Private member access needs encapsulation

Accessing private member '_literal_sd'. Consider using a public interface or property to access this data.

Code suggestion
Check the AI-generated fix before applying
 -            if literal_type.structured_dataset_type is not None and self._literal_sd is not None:
 -                return self._literal_sd
 -            if literal_type.structured_dataset_type is not None and self._literal_sd is None:
 +            if literal_type.structured_dataset_type is not None and self.literal_sd is not None:
 +                return self.literal_sd
 +            if literal_type.structured_dataset_type is not None and self.literal_sd is None:
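
The general pattern behind this suggestion, exposing a private attribute through a read-only property, can be sketched as follows. `DatasetHandle` and the example value are hypothetical stand-ins, not flytekit's StructuredDataset; whether the real class offers such an accessor is not established in this thread.

```python
from typing import Optional


class DatasetHandle:
    """Hypothetical stand-in class; not flytekit's StructuredDataset."""

    def __init__(self, literal_sd: Optional[object] = None) -> None:
        # Private storage, mirroring the `_literal_sd` naming in the review.
        self._literal_sd = literal_sd

    @property
    def literal_sd(self) -> Optional[object]:
        """Read-only public view of the private attribute."""
        return self._literal_sd


# Hypothetical value, used only for illustration.
handle = DatasetHandle(literal_sd={"uri": "s3://bucket/data"})
print(handle.literal_sd)
```

Because no setter is defined, callers can read `handle.literal_sd` but assigning to it raises `AttributeError`, which keeps the private field under the class's control.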

Code Review Run #63793c


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Comment on lines +738 to +742
if isinstance(python_val._literal_sd, StructuredDataset):
    sdt = StructuredDatasetType(format=python_val._literal_sd.file_format)
    metad = literals.StructuredDatasetMetadata(structured_dataset_type=sdt)
    sd_literal = literals.StructuredDataset(uri=python_val._literal_sd.uri, metadata=metad)
    return Literal(scalar=Scalar(structured_dataset=sd_literal))
Contributor


Consider extracting literal creation logic

The code block for handling StructuredDataset passed through dataclass could be simplified by extracting the literal creation logic into a helper method. This would improve code readability and maintainability.

Code suggestion
Check the AI-generated fix before applying
Suggested change
 - if isinstance(python_val._literal_sd, StructuredDataset):
 -     sdt = StructuredDatasetType(format=python_val._literal_sd.file_format)
 -     metad = literals.StructuredDatasetMetadata(structured_dataset_type=sdt)
 -     sd_literal = literals.StructuredDataset(uri=python_val._literal_sd.uri, metadata=metad)
 -     return Literal(scalar=Scalar(structured_dataset=sd_literal))
 + if isinstance(python_val._literal_sd, StructuredDataset):
 +     return self._create_structured_dataset_literal(python_val._literal_sd.uri, python_val._literal_sd.file_format)
 +
 + def _create_structured_dataset_literal(self, uri: str, file_format: str) -> Literal:
 +     sdt = StructuredDatasetType(format=file_format)
 +     metad = literals.StructuredDatasetMetadata(structured_dataset_type=sdt)
 +     sd_literal = literals.StructuredDataset(uri=uri, metadata=metad)
 +     return Literal(scalar=Scalar(structured_dataset=sd_literal))

Code Review Run #63793c


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Member

@Future-Outlier Future-Outlier left a comment


It looks correct. Can you:

  1. provide a screenshot
  2. add an example to the integration tests (test_remote.py) to test it properly?
