Releases: databrickslabs/lsql
v0.14.1
* Changes to work with Databricks SDK `v0.38.0` ([#350](https://github.com/databrickslabs/lsql/issues/350)). In this release, we have upgraded the Databricks SDK from version 0.37.0 to version 0.38.0 to ensure compatibility with the latest SDK and to address several issues. The update includes changes to make the code compatible with the new SDK version, removing the need for `.as_dict()` method calls when creating or updating dashboards and utilizing a `sdk_dashboard` variable for interacting with the Databricks workspace. We also updated the dependencies to `databricks-labs-blueprint[yaml]` package version greater than or equal to 0.4.2 and `sqlglot` package version greater than or equal to 22.3.1. The `test_core.py` file has been updated to address multiple issues ([#349](https://github.com/databrickslabs/lsql/issues/349) to [#332](https://github.com/databrickslabs/lsql/issues/332)) related to the Databricks SDK, and the `test_dashboards.py` file has been revised to work with the new SDK version. These changes improve integration with Databricks' Lakeview dashboards, simplify the code, and ensure compatibility with the latest SDK version.
* Specify the minimum required version of `databricks-sdk` as 0.37.0 ([#331](https://github.com/databrickslabs/lsql/issues/331)). In this release, we have updated the minimum required version of the `databricks-sdk` package from 0.29.0 to 0.37.0 in the `pyproject.toml` file to ensure compatibility with the latest version. This change was made necessary by updates in issue [#320](https://github.com/databrickslabs/lsql/issues/320). To accommodate any patch release of `databricks-sdk` with a major and minor version of 0.37, we have updated the dependency constraint to use the `~=` operator, resolving issue [#330](https://github.com/databrickslabs/lsql/issues/330). These changes are intended to enhance the compatibility and stability of our software; the sketch below illustrates the constraint.
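To illustrate what the `~=` specifier allows, here is a minimal sketch; the `packaging` library is used purely for illustration, since lsql itself only declares the constraint in `pyproject.toml`:

```python
from packaging.specifiers import SpecifierSet

# "~=0.37.0" means ">=0.37.0, ==0.37.*": any 0.37 patch release qualifies.
spec = SpecifierSet("~=0.37.0")
assert spec.contains("0.37.0")
assert spec.contains("0.37.5")      # patch releases are allowed
assert not spec.contains("0.38.0")  # minor version bumps are not
```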
v0.14.0
- Added nightly tests run at 4:45am UTC (#318). A new nightly workflow has been added to the codebase, designed to automate a series of jobs every day at 4:45am UTC on the `larger` environment. The workflow includes permissions for writing id-tokens, accessing issues, and reading contents and pull requests. It checks out the code with a full fetch depth, installs Python 3.10, and uses hatch 1.9.4. The key step in this workflow is the execution of nightly tests using the databrickslabs/sandbox/acceptance action, which creates issues if necessary. The workflow utilizes several secrets, including `VAULT_URI`, `GITHUB_TOKEN`, `ARM_CLIENT_ID`, and `ARM_TENANT_ID`, and sets the `TEST_NIGHTLY` environment variable to `true`. Additionally, the workflow is part of a concurrency group called "single-acceptance-job-per-repo", ensuring that only one acceptance job runs at a time per repository.
- Bump codecov/codecov-action from 4 to 5 (#319). In this version update, the Codecov GitHub Action has been upgraded from 4 to 5, bringing improved functionality and new features. This new version utilizes the Codecov Wrapper to encapsulate the CLI, enabling faster updates. Additionally, an opt-out feature has been introduced for tokens in public repositories, allowing contributors and other members to upload coverage reports without requiring access to the Codecov token. The upgrade also includes changes to the arguments: `file` is now deprecated and replaced with `files`, and `plugin` is deprecated and replaced with `plugins`. New arguments have been added, including `binary`, `gcov_args`, `gcov_executable`, `gcov_ignore`, `gcov_include`, `report_type`, `skip_validation`, and `swift_project`. Comprehensive documentation on these changes can be found in the release notes and changelog.
- Fixed `RuntimeBackend` exception handling (#328). In this release, we have made significant improvements to the exception handling in the `RuntimeBackend` component, addressing issues reported in tickets #328, #327, #326, and #325. We have updated the `execute` and `fetch` methods to handle exceptions more gracefully and changed exception handling from catching `Exception` to catching `BaseException` for more comprehensive error handling. Additionally, we have updated the `pyproject.toml` file to use a newer version of the `databricks-labs-pytester` package (0.2.1 to 0.5.0), which may have contributed to the resolution of these issues. Furthermore, the `test_backends.py` file has been updated to improve the readability and user-friendliness of the test output for the functions testing whether a `NotFound`, `BadRequest`, or `Unknown` exception is raised when executing and fetching statements. The `test_runtime_backend_use_statements` function has also been updated to print `PASSED` or `FAILED` instead of returning those values. These changes enhance the robustness of the exception handling mechanism in the `RuntimeBackend` class and update related unit tests. A sketch of the exception-mapping pattern follows this list.
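The `BaseException` mapping described above can be sketched as follows. This is an illustrative reconstruction, not the actual `RuntimeBackend` source: the `_run_in_runtime` helper and the message matching are stand-ins, and the error classes are assumed to come from `databricks.sdk.errors`:

```python
from databricks.sdk.errors import BadRequest, NotFound, Unknown


def _run_in_runtime(sql: str) -> None:
    # Hypothetical stand-in for executing SQL inside a Databricks runtime.
    raise RuntimeError("[TABLE_OR_VIEW_NOT_FOUND] Table `x` cannot be found")


def execute(sql: str) -> None:
    try:
        _run_in_runtime(sql)
    except BaseException as e:  # catch BaseException, not Exception (#328)
        message = str(e)
        if "TABLE_OR_VIEW_NOT_FOUND" in message:
            raise NotFound(message) from e
        if "PARSE_SYNTAX_ERROR" in message:
            raise BadRequest(message) from e
        raise Unknown(message) from e
```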
Dependency updates:
- Bump codecov/codecov-action from 4 to 5 (#319).
Contributors: @nfx, @JCZuurmond, @dependabot[bot]
v0.13.0
- Added `escape_name` function to escape individual SQL names and `escape_full_name` function to escape dot-separated full names (#316). Two new functions, `escape_name` and `escape_full_name`, have been added to the `databricks.labs.lsql.escapes` module for escaping SQL names. The `escape_name` function takes a single name as input and returns it enclosed in backticks, while `escape_full_name` handles dot-separated full names by escaping each individual component. These functions have been ported from the `databrickslabs/ucx` repository and are designed to provide a consistent way to escape names and full names in SQL statements, improving the robustness of the system by preventing issues caused by unescaped special characters in SQL names. The test suite includes various cases, including single names, full names with different combinations of escaped and unescaped components, and special characters, with a specific focus on the scenario where the column name contains a period. A usage sketch follows this list.
- Bump actions/checkout from 4.2.0 to 4.2.1 (#304). In this pull request, the `actions/checkout` dependency is updated from version 4.2.0 to 4.2.1 in the `.github/workflows/release.yml` file. This update includes a new feature where `refs/*` are checked out by commit if provided, falling back to the ref (a change contributed by @orhantoy). This change improves the flexibility of the action, allowing users to specify a commit or branch for checkout. The pull request also introduces a new contributor, @Jcambass, who added a workflow file for publishing releases to an immutable action package. The commits for this release include changes to prepare for the 4.2.1 release, add a workflow file for publishing releases, and check out other `refs/*` by commit if provided, falling back to the ref. This pull request has been reviewed and approved by Dependabot.
- Bump actions/checkout from 4.2.1 to 4.2.2 (#310). This is a pull request to update the `actions/checkout` dependency from version 4.2.1 to 4.2.2, which includes improvements to the `url-helper.ts` file that now utilizes well-known environment variables, and expanded unit test coverage for the `isGhes` function. The `actions/checkout` action is commonly used in GitHub Actions workflows for checking out a repository at a specific commit or branch. The changes in this update are internal to the `actions/checkout` action and should not affect the functionality of the project utilizing this action. The pull request also includes details on the commits and compatibility score for the upgrade, and reviewers can manage and merge the request using Dependabot commands once the changes have been verified.
- Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#307). In this release, the `databrickslabs/sandbox` dependency has been updated from version `acceptance/v0.3.0` to `0.3.1`. This update includes previously tagged commits, bug fixes for git-related libraries, and resolution of the `unsupported protocol scheme` error. The README has been updated with more information on using the `databricks labs sandbox` command, and installation instructions have been improved. Additionally, there have been dependency updates for `go-git` libraries and `golang.org/x/crypto` in the `/go-libs` and `/runtime-packages` directories. New commits in this release allow larger logs from acceptance tests and implement experimental OIDC refresh functionality. Ignore conditions have been applied to prevent conflicts with previous versions of the dependency. This update is recommended for users who want to take advantage of the latest bug fixes and improvements.
- Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 (#315). In this release, the `databrickslabs/sandbox` dependency has been updated from version `acceptance/v0.3.1` to `0.4.2`. This update includes bug fixes, dependency updates, and additional go-git libraries. Specifically, the `Run integration tests` job in the GitHub Actions workflow has been updated to use the new version of the `databrickslabs/sandbox/acceptance` Docker image. The updated version also includes install instructions, usage instructions in the README, and a modification to provide more git-related libraries. Additionally, there were several updates to dependencies, including `golang.org/x/crypto` version `0.16.0` to `0.17.0`. Dependabot, a tool that manages dependencies in GitHub projects, is responsible for the update and provides instructions for resolving any conflicts or merging the changes into the project. This update is intended to improve the functionality and reliability of the `databrickslabs/sandbox` dependency.
- Deprecate `Row.as_dict()` (#309). In this release, we are introducing a deprecation warning for the `as_dict()` method in the `Row` class, which will be removed in favor of the `asDict()` method. This change aims to maintain consistency with Spark's `Row` behavior and prevent subtle bugs when switching between different backends. The deprecation warning is implemented using Python's warnings mechanism, including the new annotation in Python 3.13 for static code analysis. The existing functionality of fetching values from the database through `StatementExecutionExt` remains unchanged. We recommend that clients update their code to use `.asDict()` instead of `.as_dict()` to avoid any disruptions. A new test case, `test_row_as_dict_deprecated()`, has been added to verify the deprecation warning for `Row.as_dict()`. A sketch of the warning mechanism follows this list.
- Minor improvements for `.save_table(mode="overwrite")` (#298). In this release, the `.save_table()` method has been improved, particularly when using the `overwrite` mode. If no rows are supplied, the table will now be truncated, ensuring consistency with the mock backend behavior. This change has been optimized for SQL-based backends, which now perform truncation as part of the insert for the first batch. Type hints on the abstract method have been updated to match the concrete implementations. Unit tests and integration tests have been updated to cover the new functionality, and new methods have been added to test the truncation behavior in overwrite mode. These improvements enhance the consistency and efficiency of the `.save_table()` method when using `overwrite` mode across different backends.
- Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 (#305). In this release, we have updated the requirement for the `databrickslabs/sandbox` package to version `acceptance/v0.3.0` in the `downstreams.yml` file. This update is necessary to use the latest version of the package, which includes several bug fixes and dependency updates. The `databrickslabs/sandbox` package is used in the acceptance tests, which are run as part of the CI/CD pipeline. It provides a set of tools and utilities for developing and testing code in a sandbox environment. The changelog for this version includes the addition of install instructions, more git-related libraries, and a modification of the README to include information about how to use it with the `databricks labs sandbox` command. Specifically, the version of the `databrickslabs/sandbox` package used in the `acceptance` job has been updated from `acceptance/v0.1.4` to `acceptance/v0.3.0`, allowing the integration tests to be run using the latest version of the package. The ignore conditions for this PR ensure that Dependabot will resolve any conflicts that may arise, and a rebase can be manually triggered with the `@dependabot rebase` command.
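Based on the description of #316 above, a minimal usage sketch of the two escaping helpers; the expected outputs are inferred from that description rather than copied from the test suite:

```python
from databricks.labs.lsql.escapes import escape_name, escape_full_name

# A single name is enclosed in backticks as-is, even if it contains a dot.
print(escape_name("column.with.dot"))            # `column.with.dot`

# A dot-separated full name has each component escaped individually.
print(escape_full_name("catalog.schema.table"))  # `catalog`.`schema`.`table`
```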
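The deprecation described in #309 follows the standard `warnings` pattern. The sketch below is an illustrative reconstruction, not the actual `Row` class from `databricks.labs.lsql.core`:

```python
import warnings


class Row(tuple):
    """Illustrative stand-in for the real Row, which tracks column names."""

    def asDict(self) -> dict:
        # Hypothetical conversion; shown only to make the sketch runnable.
        return dict(zip(("col0", "col1"), self))

    def as_dict(self) -> dict:
        warnings.warn(
            "as_dict() is deprecated, use asDict() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.asDict()


row = Row(("a", True))
assert row.as_dict() == row.asDict()  # emits a DeprecationWarning
```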
Dependency updates:
- Bump actions/checkout from 4.2.0 to 4.2.1 (#304).
- Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 (#305).
- Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#307).
- Bump actions/checkout from 4.2.1 to 4.2.2 (#310).
- Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 (#315).
Contributors: @dependabot[bot], @asnare, @nfx, @JCZuurmond, @larsgeorge-db
v0.12.1
- Bump actions/checkout from 4.1.7 to 4.2.0 (#295). In this version 4.2.0 release of the `actions/checkout` library, the team has added `Ref` and `Commit` outputs, which provide the ref and commit that were checked out, respectively. The update also includes dependency updates to `braces`, `minor-npm-dependencies`, `docker/build-push-action`, and `docker/login-action`, all of which were automatically resolved by Dependabot. These updates improve compatibility and stability for users of the library. This release is a result of contributions from new team members @yasonk and @lucacome. Users can find a detailed commit history, pull requests, and release notes in the associated links. The team strongly encourages all users to upgrade to this new version to access the latest features and improvements.
- Set catalog on `SchemaDeployer` to overwrite the default `hive_metastore` (#296). In this release, the default catalog for `SchemaDeployer` has been changed from `hive_metastore` to a user-defined catalog, allowing for more flexibility in deploying resources to different catalogs. A new dependency, `databricks-labs-pytester`, has been added with a version constraint of `>=0.2.1`, which may indicate the introduction of new testing functionality. The `SchemaDeployer` class has been updated to accept a `catalog` parameter, and the tests for deploying and deleting schemas, tables, and views have been updated to reflect these changes. The `test_deploys_schema`, `test_deploys_dataclass`, and `test_deploys_view` tests have been updated to accept an `inventory_catalog` parameter, and the `caplog` fixture is used to capture log messages and assert that they contain the expected messages. Additionally, a new test function, `test_statement_execution_backend_overwrites_table`, has been added to the `tests/integration/test_backends.py` file to test the functionality of the `StatementExecutionBackend` class in overwriting a table in the database and retrieving the correct data. Issue #294 has been resolved, and progress has been made on issue #278, but issue #280 has been marked as technical debt and issue #287 is required for the CI to pass. A sketch of the `catalog` parameter follows this list.
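A minimal sketch of the catalog override described in #296; the constructor signature here is hypothetical and only illustrates a `catalog` parameter defaulting to `hive_metastore`:

```python
class SchemaDeployer:
    """Illustrative stand-in for the real SchemaDeployer in lsql."""

    def __init__(self, sql_backend, schema: str, *, catalog: str = "hive_metastore"):
        self._sql_backend = sql_backend
        self._schema = schema
        self._catalog = catalog  # now user-definable instead of hard-coded

    def deploy_schema(self) -> None:
        self._sql_backend.execute(
            f"CREATE SCHEMA IF NOT EXISTS {self._catalog}.{self._schema}"
        )
```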
Dependency updates:
- Bump actions/checkout from 4.1.7 to 4.2.0 (#295).
Contributors: @dependabot[bot], @JCZuurmond
v0.12.0
- Added method to detect rows are written to the `MockBackend` (#292). In this commit, the `MockBackend` class in the `backends.py` file has been updated with a new method, `has_rows_written_for`, which allows for differentiation between a table that has never been written to and one with zero rows. This method checks if a specific table has been written to by iterating over the table stubs in the `_save_table` attribute and returning `True` if the given full name matches any of the stub full names. Additionally, the class has been supplemented with the `rows_written_for` method, which takes a table name and mode as input and returns a list of rows written to that table in the given mode. Furthermore, several new test cases have been added to test the functionality of the `MockBackend` class, including checking that the `has_rows_written_for` method correctly identifies when there are no rows written, when there are zero rows written, and when rows are written after the first and second write operations. These changes improve the overall testing coverage of the project and aid in testing the functionality of the `MockBackend` class. The new methods are accompanied by documentation strings that explain their purpose and functionality. A usage sketch follows below.
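Based on the description above, a usage sketch of the two `MockBackend` methods; the `Foo` dataclass and the exact assertions are illustrative:

```python
from dataclasses import dataclass

from databricks.labs.lsql.backends import MockBackend


@dataclass
class Foo:  # illustrative row schema
    first: str
    second: bool


mock = MockBackend()
assert not mock.has_rows_written_for("main.default.foo")  # never written to

mock.save_table("main.default.foo", [], Foo)  # zero rows, but written to
assert mock.has_rows_written_for("main.default.foo")
assert mock.rows_written_for("main.default.foo", "append") == []

mock.save_table("main.default.foo", [Foo("a", True)], Foo)
assert len(mock.rows_written_for("main.default.foo", "append")) == 1
```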
Contributors: @JCZuurmond
v0.11.0
- Added filter spec implementation (#276). In this commit, a new `FilterHandler` class has been introduced to handle filter files with the suffix `.filter.json`, which can parse filter specifications in the header of the filter file and validate the filter columns and types. The commit also adds support for three types of filters: `DATE_RANGE_PICKER`, `MULTI_SELECT`, and `DROPDOWN`, which can be linked with multiple visualization widgets. Additionally, a `FilterTile` class has been added alongside the `Tile` class, which represents a filter tile in the dashboard and includes methods to validate the tile, create widgets, and generate filter encodings and queries. The `DashboardMetadata` class has been updated to include a new method, `get_datasets()`, to retrieve the datasets for the dashboard. These changes enhance the functionality of the dashboard by adding support for filtering data using various filter types and linking them with multiple visualization widgets, improving the customization and interactivity of the dashboard and making it more user-friendly and efficient.
- Bugfix: `MockBackend` wasn't mocking `savetable` properly when the mode is `append` (#289). This release includes a bugfix and enhancements for the `MockBackend` component, which is used to mock the `SQLBackend`. The `.savetable()` method failed to function as expected in `append` mode, writing all rows to the same table instead of accumulating them. This bug has been addressed, ensuring that rows accumulate correctly in `append` mode. Additionally, a new test function, `test_mock_backend_save_table_overwrite()`, has been added to demonstrate the corrected behavior of `overwrite` mode, showing that it now replaces only the existing rows for the given table while preserving other tables' contents. The type signature for `.save_table()` has been updated, restricting the `mode` parameter to accept only two string literals: `"append"` and `"overwrite"`. The `MockBackend` behavior has been updated accordingly, and rows are now filtered to exclude any `None` or `NULL` values prior to saving. These improvements to the `MockBackend` functionality and test suite increase reliability when using the `MockBackend` as a testing backend for the system.
- Changed filter spec to use YML instead of JSON (#290). In this release, the filter specification files have been converted from JSON to YAML format, providing a more human-readable format for the filter specifications. The schema for the filter file includes flags for column, columns, type, title, description, order, and id, with the type flag taking on values of `DROPDOWN`, `MULTI_SELECT`, or `DATE_RANGE_PICKER`. This change impacts the `FilterHandler`, the `is_filter` method, and the `_from_dashboard_folder` method, as well as relevant parts of the documentation. Additionally, the parsing methods have been updated to use `yaml.safe_load` instead of `json.loads`, and the `is_filter` method now checks for the `.filter.yml` suffix. A new file, `00_0_date.filter.yml`, has been added to the `tests/integration/dashboards/filter_spec_basic` directory, containing a sample date filter definition. Furthermore, various tests have been added to validate filter specifications, such as checking for an invalid type and for both `column` and `columns` keys being present. These updates aim to enhance readability, maintainability, and ease of use for filter configuration; a sketch of such a spec follows this list.
- Increase testing of generic types storage (#282). A new commit enhances the testing of generic types storage by expanding the test suite to include a list of structs, ensuring more comprehensive testing of the system. The `Foo` struct has been renamed to `Nested` for clarity, and two new structs, `NestedWithDict` and `Nesting`, have been added. The `Nesting` struct contains a `Nested` object, while `NestedWithDict` includes a string and an optional dictionary of strings. A new test case demonstrates appending complex types to a table by creating and saving a table with two rows, each containing a `Nesting` struct. The test then fetches the data and asserts that the expected number of rows are returned, ensuring the proper functioning of the storage system with complex data types.
- Minor changes to avoid redundancy in code and follow code patterns (#279). In this release, we have made significant improvements to the `dashboards.py` file to make the code more concise, maintainable, and in line with the standard library's recommended usage. The `export_to_zipped_csv` method has undergone major changes, including the removal of the `BytesIO` module import and the use of `StringIO` for handling strings as files. The method no longer creates a separate ZIP file for the CSV files, instead using the provided `export_path`. Additionally, the method skips tiles that don't contain queries. We have also introduced a new method, `dataclass_transform`, which transforms a given dataclass into a new one with specific attributes and behavior. This method creates a new dataclass with a custom metaclass and adds a new method, `to_dict()`, which converts instances of the new dataclass to dictionaries. These changes promote code reusability and reduce redundancy in the codebase, making it easier for software engineers to work with.
- New example with bar chart in dashboards-as-code (#281). A new example of a dashboard featuring a bar chart has been added to the `dashboards-as-code` feature using the existing metadata overrides feature to support the new widget type, without bloating the `TileMetadata` structure. An integration test was added to demonstrate the creation of a bar chart, and the resulting dashboard can be seen in the attached screenshot. Additionally, a new SQL file has been added for the `Product Sales` dashboard, showcasing sales data for different product categories. This approach can potentially be used to support other widget types such as Bar, Pivot, Area, etc. The team is encouraged to provide feedback on this proposed solution.
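Based on the schema listed in #290, a sketch of a filter spec; the field values are illustrative, and the YAML is parsed with `yaml.safe_load` as the notes describe:

```python
import yaml

# Illustrative .filter.yml content; the flags follow the schema above.
FILTER_SPEC = """
column: date
type: DATE_RANGE_PICKER
title: Date range
order: 1
id: date_filter
"""

spec = yaml.safe_load(FILTER_SPEC)
assert spec["type"] in {"DROPDOWN", "MULTI_SELECT", "DATE_RANGE_PICKER"}
assert not ("column" in spec and "columns" in spec)  # both keys is invalid
```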
Contributors: @JCZuurmond, @bishwajit-db, @ericvergnaud, @jgarciaf106, @asnare
v0.10.0
- Added functionality to export any dashboards-as-code into CSV (#269). The `DashboardMetadata` class now includes a new method, `export_to_zipped_csv`, which enables exporting any dashboard as CSV files in a ZIP archive. This method accepts `sql_backend` and `export_path` as parameters and exports dashboard queries to CSV files in the specified ZIP archive by iterating through tiles and fetching dashboard queries if the tile is a query. To ensure the proper functioning of this feature, unit tests and manual testing have been conducted. A new test, `test_dashboards_export_to_zipped_csv`, has been added to verify the correct export of dashboard data to a CSV file.
- Added support for generic types in `SqlBackend` (#272). In this release, we've added support for using rich dataclasses, including those with optional and generic types, in the `SqlBackend` of the `StatementExecutionBackend` class. The new functionality is demonstrated in the `test_supports_complex_types` unit test, which creates a `Nested` dataclass containing various complex data types, such as nested dataclasses, `datetime` objects, `dict`, `list`, and optional fields. This enhancement is achieved by updating the `save_table` method to handle the conversion of complex dataclasses to SQL statements. To facilitate type inference, we've introduced a new `StructInference` class that converts Python dataclasses and built-in types to their corresponding SQL Data Definition Language (DDL) representations. This addition simplifies data definition and manipulation operations while maintaining type safety and compatibility with various SQL data types. A sketch of the inference idea follows this list.
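The idea behind `StructInference` can be sketched as follows; the mapping below is illustrative and far smaller than the real one, which also handles optionals, nested dataclasses, lists, and dicts:

```python
import datetime
from dataclasses import dataclass, fields

# Illustrative primitive mapping, not the actual StructInference table.
_PRIMITIVES = {
    str: "STRING",
    int: "LONG",
    bool: "BOOLEAN",
    float: "FLOAT",
    datetime.date: "DATE",
    datetime.datetime: "TIMESTAMP",
}


def ddl_for(klass) -> str:
    parts = (f"{f.name}:{_PRIMITIVES[f.type]}" for f in fields(klass))
    return f"STRUCT<{','.join(parts)}>"


@dataclass
class Nested:  # name borrowed from the test dataclass mentioned above
    name: str
    count: int


print(ddl_for(Nested))  # STRUCT<name:STRING,count:LONG>
```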
Contributors: @jgarciaf106, @nfx
v0.9.3
- Added documentation for exclude flag (#265). A new `exclude` flag has been added to the configuration file for our lab tool, allowing users to specify a path to exclude from formatting during lab execution. This release also includes corrections to grammatical errors in the descriptions of existing flags related to catalog and database settings, such as updating `seperated` to "separate". Additionally, the flag descriptions for `publish` and `open-browser` have been updated for clarification: `publish` now clearly controls whether the dashboard is published after creation, while `open-browser` controls whether the dashboard is opened in a web browser. These changes are aimed at improving user experience and ease of use for our lab tool.
- Fixed dataclass field type in `_row_to_sql` (#266). In this release, we have addressed an issue related to #257 by fixing the dataclass field type in the `_row_to_sql` method of the `backends.py` file. Additionally, we have made updates to the `_schema_for` method to use a new `_field_type` class method. This change resolves a rare problem where `field.type` is a string instead of a type and ensures compatibility with a pull request from an external repository (databrickslabs/ucx#2526). The new `_field_type` method attempts to load the type from `__builtins__` if it's a string and logs a warning if it fails. The `_row_to_sql` method now consistently uses the `_field_type` method to get the field type. This ensures that the library functions seamlessly and consistently, avoiding any potential issues in the future. A sketch of this workaround follows this list.
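An illustrative reconstruction of the `_field_type` workaround described above (not the actual lsql source):

```python
import builtins
import logging

logger = logging.getLogger(__name__)


def _field_type(field):
    # dataclasses.fields() sometimes reports a field's type as a string
    # instead of a type; try resolving it against the builtins module.
    if isinstance(field.type, str):
        resolved = getattr(builtins, field.type, None)
        if resolved is None:
            logger.warning(f"Could not load type: {field.type}")
        return resolved
    return field.type
```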
Contributors: @ericvergnaud
v0.9.2
- Make hatch a prerequisite (#259). In this commit, Eric Vergnaud has introduced a change to make the installation of `hatch` version 1.9.4 a prerequisite for the project to avoid errors related to `pip` command recognition. The Makefile has been updated to handle the installation of hatch automatically, and the `hatch env create` command is now used instead of `pip install hatch==1.7.0`. This change ensures that the development environment is consistent and reliable by maintaining the correct version of hatch and automatically handling its installation. Additionally, the `.venv/bin/python` and `dev` targets have been updated accordingly to reflect these changes. This commit also formats all files using the `make dev fmt` command, which helps maintain consistent code formatting throughout the project.
- Add support for exclusions in `fmt` command (#263). In this release, we have added support for exclusions to the `fmt` command in the `databricks/labs/lsql/cli.py` module. This feature allows users to specify a list of directories or files to exclude while formatting SQL files, which is particularly useful when verifying SQL notebooks in ucx. The `fmt` command now accepts a new optional parameter, `exclude`, which accepts an iterable of strings that specify the relative paths to exclude. Any `sql_file` that is a descendant of any `exclusion` is skipped during formatting. The exclusions are implemented by converting the relative paths into `Path` objects. This change addresses the issue where single-line comments are converted into inlined comments, causing misinterpretation. The added unit test is manually verified, and this pull request fixes issue #261. This feature was authored and co-authored by Eric Vergnaud. A sketch of the exclusion check follows this list.
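The descendant check described above can be sketched as follows; `is_excluded` is our illustrative name, not the actual `cli.py` helper:

```python
from pathlib import Path


def is_excluded(sql_file: Path, exclusions: list[Path]) -> bool:
    # A file is skipped when it is a descendant of any excluded path.
    return any(sql_file.is_relative_to(exclusion) for exclusion in exclusions)


assert is_excluded(Path("queries/ucx/q1.sql"), [Path("queries/ucx")])
assert not is_excluded(Path("queries/other/q2.sql"), [Path("queries/ucx")])
```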
Contributors: @ericvergnaud
v0.9.1
- Fixed dataclass field types (#257). This PR introduces a workaround for a Python bug affecting the `dataclasses.fields()` function, which sometimes returns field types as string type names instead of types. This can cause the ORM to malfunction. The workaround involves checking if the returned `f.type` is a string and, if so, converting it to a type by looking it up in the `__builtins__` dictionary. This change is global and affects the `_schema_for` function in the `backends.py` file, which is responsible for creating a schema for a given dataclass, taking into account any necessary type conversions. This change ensures consistent and accurate type handling in the face of the Python bug, improving the reliability of our ORM.
- Fixed missing EOL when formatting SQL files (#260). In this release, we have addressed an issue related to the inconsistent addition of end-of-line (EOL) characters in formatted SQL files. The `QueryTile.format()` method has been updated to ensure that an EOL character is always added, except when the input query already ends with a newline. This change enhances the reliability of the SQL formatting functionality, making the output format more predictable and improving the overall user experience. The new implementation is demonstrated in the `test_query_format_preserves_eol()` test case, and existing test cases have been updated to check for the presence of EOL characters, further ensuring consistent and correct formatting.
- Fixed normalize case input in cli (#258). In this release, we have updated the `fmt` command in the `cli.py` file to allow users to specify whether they want to normalize the case of SQL files when formatting. The `normalize_case` parameter now defaults to the string `"true"` and checks whether it is in the `STRING_AFFIRMATIVES` list to determine whether to normalize the case of SQL files. Additionally, we have introduced a new optional `normalize_case` parameter in the `format` method of the `dashboards.py` file in the Databricks CLI, which normalizes the identifiers in the query to lower case when set to `True`. We have also added support for a new `normalize_case` parameter in the `QueryTile.format()` method, which prevents the automatic normalization of string input to uppercase when set to `False`. This change allows for more flexibility in handling string input and ensures that the input string is preserved as-is. These updates improve the functionality and usability of the open-source library, providing more control to users over formatting and handling of string input. A sketch of the flag parsing follows this list.
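The affirmative-string check described above can be sketched like this; the exact contents of `STRING_AFFIRMATIVES` and the helper name are illustrative:

```python
# Illustrative values; the real list lives in databricks/labs/lsql/cli.py.
STRING_AFFIRMATIVES = {"yes", "y", "true", "t", "1"}


def parse_bool_flag(value: str = "true") -> bool:
    # CLI flags arrive as strings, so "true", "yes", or "1" all enable it.
    return value.strip().lower() in STRING_AFFIRMATIVES


assert parse_bool_flag("TRUE")
assert not parse_bool_flag("false")
```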
Contributors: @ericvergnaud, @nfx, @JCZuurmond