Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

align databricks-iris template to work with kedro-databricks #227

Conversation

JenspederM
Copy link
Contributor

@JenspederM JenspederM commented Jul 14, 2024

Motivation and Context

This PR is made to align the databricks-iris starter to the kedro-databricks plugin.

With these changes, creating a new Kedro project on Databricks should be as easy as:

  • Create a project kedro new --starter="databricks-iris"
  • Install dependencies python -m venv .venv && source ./.venv/bin/activate && pip install --upgrade pip && pip install -r requirements.txt
  • Initialize databricks asset bundle kedro databricks init
  • Bundle pipelines to asset bundle resources kedro databricks bundle
  • Deploy project to Databricks kedro databricks deploy

How has this been tested?

The above has been tested against a private Databricks workspace.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Assigned myself to the PR
  • Added tests to cover my changes

@JenspederM JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch 3 times, most recently from d903972 to 7c101af Compare July 14, 2024 23:22
@JenspederM JenspederM marked this pull request as ready for review July 14, 2024 23:25
@JenspederM JenspederM marked this pull request as draft July 14, 2024 23:30
@JenspederM JenspederM marked this pull request as ready for review July 14, 2024 23:31
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comment. Thanks for creating the plugin I think this is a huge improvement over the manual step deployment guide.

I have left some comment but I suggest separating necessary changes vs README.md, at this stage we are not ready to mention a third-party plugin in our official docs. You can actually create your own custom starter and register it in your plugin so that it can be created with kedro new -s. See the documentations

@JenspederM
Copy link
Contributor Author

Left some comment. Thanks for creating the plugin I think this is a huge improvement over the manual step deployment guide.

... separating necessary changes vs README.md, at this stage we are not ready ...

Does that mean including the plugin

@JenspederM JenspederM closed this Jul 17, 2024
@JenspederM JenspederM reopened this Jul 17, 2024
Signed-off-by: Jens Peder Meldgaard <jenspederm@gmail.com>
Signed-off-by: Jens Peder Meldgaard <jenspederm@gmail.com>
Signed-off-by: Jens Peder Meldgaard <jenspederm@gmail.com>
@JenspederM JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch from 306f00b to c3a5a08 Compare July 18, 2024 13:28
@JenspederM
Copy link
Contributor Author

Hi @noklam,

Thank you for the thorough review! I have made changes according to your comments - I see I was a bit overenthusiastic in regards to kedro-databricks ;)

I have removed all mentions of it now, and simply ensure that the starter would work out of the box with the plugin instead.

Steps to deploy using the plugin is now:

# Create project
kedro new --starter="https://github.com/JenspederM/kedro-starters" --directory="databricks-iris" --checkout="feat/align-databricks-iris-with-kedro-databricks" --name "iris"

# Install dependencies
pip install -r requirements.txt
pip install kedro-databricks

# Initialization
kedro databricks init

# Bundling
kedro databricks bundle

# Deployment
kedro databricks deploy

This is much less invasive than before.

I also made some changes to the plugin, specifically regarding logging to make this more informative and closer aligned with other methods.

Please let me know what you think! :)

@JenspederM JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch from c3a5a08 to 6cfdebd Compare July 18, 2024 13:41
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, I think we are getting closed, added a few comments, I missed something in the last round.

Signed-off-by: Jens Peder Meldgaard <jenspederm@gmail.com>
@JenspederM JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch from 6cfdebd to 9fcd6c3 Compare July 18, 2024 13:50
Signed-off-by: Jens Peder Meldgaard <jenspederm@gmail.com>
@JenspederM
Copy link
Contributor Author

JenspederM commented Jul 18, 2024

FYI @noklam

I implemented the changes that you suggested. You're right, no need for the @pandas. :)

I have just tried the steps from my previous comment with this implementation and everything works as expected.

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one minor styling comment, thanks for making this change! Exciting to see the datarbricks plugin, great work!

Copy link
Contributor

@datajoely datajoely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work @JenspederM appreciate the effort. My comments are more for wider discussion not a firm recommendation!

@@ -2,7 +2,7 @@ ipython>=8.10
jupyterlab>=3.0
notebook
kedro~={{ cookiecutter.kedro_version }}
kedro-datasets[spark.SparkDataset, pandas.ParquetDataset]>=1.0
kedro-datasets[spark, pandas, spark.SparkDataset, pandas.ParquetDataset]>=1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an argument we should encourage the use of databricks.ManagedTableDataSet (even if it's commented out in the catalog) since it highlights Kedro's compatibility with Unity Catalog + Delta lake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree 👆

But perhaps better saved for another PR? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely, this is a really important and frankly overdue piece of work and I'm just thinking about how to take it further, later!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, +1 on this!

@JenspederM
Copy link
Contributor Author

@datajoely, do you regard your comments as things that should be included in this PR or is it more general discussion for a future PR? 😊

I would like to merge this as I can then make an announcement for the plugin without having to specify a branch for the starter 😊

@noklam
Copy link
Contributor

noklam commented Jul 19, 2024

@JenspederM I see the DCO checking is failing, can you try to fix it? If you click into the "Details" button it shows you the instruction to fix it.

Signed-off-by: Jens Peder Meldgaard <jenspederm@gmail.com>
@JenspederM JenspederM force-pushed the feat/align-databricks-iris-with-kedro-databricks branch from 042e734 to 159ec3e Compare July 19, 2024 11:18
@JenspederM
Copy link
Contributor Author

@noklam I just accepted your change here - apparently that doesn't sign..

Should be fixed now :)

@JenspederM
Copy link
Contributor Author

@datajoely, I see you resolved your comments, does that mean you approve of the PR? :)

@JenspederM
Copy link
Contributor Author

@noklam @datajoely any idea why tests aren't running?

@noklam
Copy link
Contributor

noklam commented Jul 19, 2024

I trigger it now, it's a setting thing. CI won't run for first time contributor.

@JenspederM
Copy link
Contributor Author

Ah okay.

Thank you!

@JenspederM
Copy link
Contributor Author

@noklam @datajoely Are we ready to merge? :)

@noklam noklam self-requested a review July 19, 2024 15:19
@noklam noklam merged commit a0fbc12 into kedro-org:main Jul 19, 2024
23 checks passed
@noklam
Copy link
Contributor

noklam commented Jul 19, 2024

@JenspederM congratulations for your first PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants