Releases: dlt-hub/dlt

1.1.0

26 Sep 13:33
d2b6d05

What's Changed

Docs

Verified Sources

  • Custom filter clauses supported, pyarrow/arrowmongo requirement optional for Mongo by @Pipboyguy

New Contributors

Full Changelog: 1.0.0...1.1.0

1.0.0

16 Sep 15:07

This is a major dlt release. Please check the list of breaking changes and deprecations: #1778

Core Library

  • move rest_api, sql_database and filesystem sources to dlt core by @willi-mueller in #1728
  • drops foreign_key, adds nested references (row_key - parent_key) by @rudolfix in #1774
  • deprecates complex data type, changes to json by @rudolfix in #1792
  • Feat/1749 abort load package and raise exception on terminal errors in jobs by @willi-mueller in #1781
  • Feat/1492 extend timestamp config to handle naive timestamps (without timezone) by @donotpush in #1669
  • Fix/1571 Incremental: Optionally load or ignore/exclude/include records with cursor_path missing or None value by @willi-mueller in #1576
  • creates a single source in extract for all resource instances passed as list by @rudolfix in #1535
  • Enable BigQuery schema auto-detection with partitioning and clustering hints by @Pipboyguy in #1806
  • Sqlalchemy destination (merge support and docs still in progress) by @steinitzu in #1734
  • Feat/1730 extend filesystem sftp by @donotpush in #1769
  • Stops dumping secrets to dlt traces. by @willi-mueller in #1797
  • Don't use Custom Embedding Functions on LanceDB by @Pipboyguy in #1771
  • sets default concurrency for blob upload for adlfs to 1 to avoid massive memory usage on large files by @rudolfix in #1779
  • Fix/1790 support incremental load with arrow when cursor column is not nullable by @willi-mueller in #1791
  • controls row group size and empty tables in memory buffer when writing parquet by @rudolfix in #1782
  • fix installation command by @novica in #1741
  • skips tables without jobs when merging delta tables by @rudolfix in #1803
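
The incremental change in #1576 concerns records whose cursor field is absent or None. The following is a minimal sketch of the three possible behaviors in plain Python, not dlt's implementation. The function name `filter_incremental`, the `on_missing` parameter and its three values are assumptions chosen for illustration, and the sketch treats a record as new only when its cursor value is strictly greater than the last seen value.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def filter_incremental(
    rows: Iterable[Dict[str, Any]],
    cursor_path: str,
    last_value: Optional[Any],
    on_missing: str = "raise",  # hypothetical policy: "raise", "include" or "exclude"
) -> Iterator[Dict[str, Any]]:
    """Yield rows newer than last_value, applying a policy to rows without a cursor."""
    for row in rows:
        value = row.get(cursor_path)  # None when the key is absent or explicitly None
        if value is None:
            if on_missing == "raise":
                raise ValueError(f"row has no usable value at {cursor_path!r}: {row}")
            if on_missing == "include":
                yield row
            # "exclude": the row is dropped silently
            continue
        if last_value is None or value > last_value:
            yield row


rows = [
    {"id": 1, "updated_at": 5},
    {"id": 2, "updated_at": None},
    {"id": 3},
    {"id": 4, "updated_at": 1},
]
kept = [r["id"] for r in filter_incremental(rows, "updated_at", 2, on_missing="include")]
```

With `on_missing="include"` the rows lacking a cursor value pass through next to the genuinely new row, while `"exclude"` keeps only the new row.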

Docs

New Contributors

Full Changelog: 0.5.4...1.0.0

0.5.4

28 Aug 20:02
9857029

Core Library

Docs

New Contributors

Full Changelog: 0.5.3...0.5.4

0.5.3

13 Aug 00:20
19c41ea

Core Library

  • Add support for continuously starting load jobs as slots free up in the loader. This will significantly speed up loading packages with many files. by @sh-rp in #1494
  • Add get_delta_tables helper function to optimize and vacuum tables by @jorritsandbrink in #1664
  • Raise/warn on incomplete columns in normalize by @steinitzu in #1504
  • Add enable_dataset_name_normalization option by @VioletM in #1676
  • updates duckdb/motherduck load job to match parquet by column names by @rudolfix in #1674
  • updates duckdb/motherduck load job to fully allow jsonl file format by @rudolfix in #1674
  • removes internal locks when loading parquet from multiple threads (duckdb got fixed) #1674
  • enables multi transactions statements for Motherduck #1674
  • fixes dbt logs line endings
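
The scheduling idea in #1494, starting a new load job as soon as a slot frees up instead of waiting for a whole batch to finish, can be sketched with the standard library. This is an illustration of the general technique only, not dlt's loader. `run_jobs_continuously`, `load_file` and `max_slots` are hypothetical names.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable, List

_EXHAUSTED = object()  # marks the end of the file iterator


def run_jobs_continuously(
    files: Iterable[str],
    load_file: Callable[[str], str],
    max_slots: int = 4,
) -> List[str]:
    """Keep up to max_slots jobs running; refill a slot the moment a job completes."""
    results: List[str] = []
    pending = set()
    file_iter = iter(files)
    with ThreadPoolExecutor(max_workers=max_slots) as pool:
        # Fill the initial slots.
        for file_name in file_iter:
            pending.add(pool.submit(load_file, file_name))
            if len(pending) >= max_slots:
                break
        while pending:
            # Wake up as soon as any single job finishes, not when all of them do.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                results.append(future.result())
                next_file = next(file_iter, _EXHAUSTED)
                if next_file is not _EXHAUSTED:
                    pending.add(pool.submit(load_file, next_file))
    return results


loaded = run_jobs_continuously([f"file_{i}" for i in range(10)], str.upper, max_slots=3)
```

Results arrive in completion order, so callers that need a stable order should sort them or carry the file name along with each result.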

Docs

Verified Sources

  • Column selector added to sql_database by @steinitzu

New Contributors

Full Changelog: 0.5.2...0.5.3

0.5.2

02 Aug 19:18
e00baa0

Core Library

  • Add upsert merge strategy for Postgres and Snowflake, by @jorritsandbrink in #1466
  • Add basic upsert support for delta table format in filesystem destination by @jorritsandbrink in #1600
  • query tagging for snowflake by @rudolfix in #1582
  • Support Open Source ClickHouse Deployments (MergeTree engine and more) by @Pipboyguy in #1496
  • allows nested types in BigQuery via native autodetect_schema by @rudolfix in #1591
  • Enable upsert merge strategy for more SQL destinations (Athena, BigQuery, Databricks, mssql) by @jorritsandbrink in #1628
  • Fix/1512 fixes current.pipeline() access by @rudolfix in #1581
  • feat: add config dataset_name_prefix to set custom staging dataset name by @donotpush in #1563
  • fix: add airflow db reset for all tests by @donotpush in #1559
  • Enable S3 compatible storage for delta table format by @jorritsandbrink in #1586
  • feat/1495 rest_client: renames JSONResponsePaginator to JSONLinkPaginator by @willi-mueller in #1558
  • Feat/1596 adds custom config providers + example of yaml config provider supporting profiles and jinja placeholders by @rudolfix in #1642
  • Feat/1583 rest client session timeout configuration by @willi-mueller in #1590
  • Add clarification for add_limit by @VioletM in #1594
  • Fix/1606 fixes validator incremental step order to keep it always last in the pipe by @rudolfix in #1641
  • Feat/1593 rest_client: allow setting of request kwargs by @willi-mueller in #1609
  • prevent accidental wrapping of sources in resources when using adapters by @sh-rp in #1645
  • Add empty source handling for delta table format on filesystem destination by @jorritsandbrink in #1617
  • Surface original err msg from pydantic as extended_info on DataValidationError by @codingcyclist in #1569
  • fix(dockerfile): remove extra spaces around equals sign in LABEL inst… by @thisisdope in #1573
  • Qdrant uncommitted state restore and test by @steinitzu in #1545
  • fix: suppress alembic logs for tests by @donotpush in #1578
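
Several entries above add an upsert merge strategy. As a rough illustration of what upsert means for the destination table, here is a plain Python sketch under the assumption that an incoming row replaces the existing row with the same primary key and is appended otherwise. The function `upsert` and the sample columns are hypothetical, and the code does not reflect how dlt generates SQL for any destination.

```python
from typing import Any, Dict, List


def upsert(
    target: List[Dict[str, Any]],
    incoming: List[Dict[str, Any]],
    primary_key: str,
) -> List[Dict[str, Any]]:
    """Return a new table: matching keys are updated, unknown keys are inserted."""
    by_key = {row[primary_key]: dict(row) for row in target}
    for row in incoming:
        # Assigning to an existing key updates in place; a new key is appended.
        by_key[row[primary_key]] = dict(row)
    return list(by_key.values())


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
merged = upsert(target, incoming, primary_key="id")
```

The sketch requires a unique primary key in the target, which is also the usual precondition for upsert-style merges.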

Docs

New Contributors

Full Changelog: 0.5.1...0.5.2

0.5.1

08 Jul 15:28
d1e5666

This is a major release (0.4 -> 0.5) in our versioning scheme, so please review the breaking changes below. Most of them are relevant only for platform builders that use dlt internals. Some of the long-deprecated components were removed as well.

Breaking Changes

Breaking Changes (internals)

  • If a function decorated with dlt.source or dlt.resource is passed None for an argument with a default value during a function call, the None is handled exactly as in a regular Python function call. Previously such a None would request argument injection from configuration. Please read more here: (#1430)
  • dlt.config.value and dlt.secrets.value were evaluating to None at runtime. Now they evaluate to a sentinel value. All existing code should be backward compatible. (#1430)
  • The full_refresh flag of dlt.pipeline will be deprecated and replaced with dev_mode. (#1063) and (https://dlthub.com/devel/general-usage/pipeline#do-experiments-with-dev-mode)
  • The default resource extraction sequence has changed from fifo to round_robin. You can switch back to the previous behavior and learn more about what this means here: (https://dlthub.com/docs/reference/performance#resources-extraction-fifo-vs-round-robin)
  • If you create an instance of a SPEC (i.e., SnowflakeCredentials), it will not be marked as resolved even if all required fields are provided. Previously some were resolving and some were not. #1489
  • parse_native_representation never marks config as resolved. Previously some were resolving and some were not. #1489
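
The difference between a None default and a sentinel default (#1430) is easier to see in code. The sketch below is plain Python and does not reproduce dlt's implementation: `CONFIG_VALUE` stands in for dlt.config.value, and `inject_config`, `fetch` and the config dictionary are hypothetical names used only for illustration.

```python
import functools
import inspect
from typing import Any, Callable, Dict


class _ConfigValue:
    """Unique marker meaning 'take this argument from configuration'."""

    def __repr__(self) -> str:
        return "<config value>"


CONFIG_VALUE = _ConfigValue()  # stand-in for a sentinel such as dlt.config.value


def inject_config(config: Dict[str, Any]) -> Callable:
    """Replace arguments left at the sentinel with values from config."""

    def decorator(func: Callable) -> Callable:
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, value in list(bound.arguments.items()):
                # Only the sentinel triggers injection; None is an ordinary value.
                if value is CONFIG_VALUE:
                    bound.arguments[name] = config[name]
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


@inject_config({"api_key": "key-from-config"})
def fetch(api_key: Any = CONFIG_VALUE) -> Any:
    return api_key


injected = fetch()               # sentinel default: value comes from config
explicit_none = fetch(None)      # an explicit None is passed through unchanged
explicit_value = fetch("mine")   # an explicit value always wins
```

Because the sentinel is a distinct object, a caller can pass None deliberately without it being mistaken for a request to look the value up in configuration.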

Core Library

Docs

Verified Sources

We worked intensively on rest_api and sql_database.

0.4.12

29 May 13:14
b4e0491

Core Library

  • feat(pipeline): add an ability to auto truncate staging dataset by @IlyaFaer in #1292
  • Feat/1406 bumps duckdb 0.10 + dbt to <=1.8.x by @rudolfix in #1407
  • Azure service principal credentials support by @steinitzu in #1377
  • Support partitioning hints for athena iceberg by @steinitzu in #1403
  • Add recommended_file_size cap to limit data writer file size and cap BigQuery to 4gb by @steinitzu in #1368
  • limits mssql query size to fit network buffer to prevent errors on large inserts by @rudolfix in #1372
  • allows to bubble up exceptions when standalone resource returns by @rudolfix in #1374
  • Fix: use .get on column in mssql destination for cases where the yaml… by @Daniel-Vetter-Coverwhale in #1380
  • Make path tests Windows compatible by @jorritsandbrink in #1384
  • RESTClient: Added "values" to the data pattern of the rest_api helper by @francescomucio in #1399
  • corrects single entity path detection by @rudolfix in #1394
  • RESTClient: implement AuthConfigBase.bool + update docs by @burnash in #1413
  • Fix: ensure custom session can be provided to rest client by @z3z1ma in #1396
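
The recommended_file_size cap from #1368 limits how large a single data writer file may grow. A minimal sketch of size-based file rotation follows, assuming newline-delimited JSON and a rule that starts a new file before the cap would be exceeded. The real writer may rotate at a different point, and `split_by_size` and `max_bytes` are hypothetical names.

```python
import json
from typing import Any, Dict, Iterable, List


def split_by_size(rows: Iterable[Dict[str, Any]], max_bytes: int) -> List[List[str]]:
    """Group serialized rows into files whose total size stays within max_bytes."""
    files: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for row in rows:
        line = json.dumps(row) + "\n"
        size = len(line.encode("utf-8"))
        # Rotate only when the current file already holds data, so a single
        # oversized row still gets written to a file of its own.
        if current and current_size + size > max_bytes:
            files.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        files.append(current)
    return files


sample_rows = [{"n": i} for i in range(5)]  # each row serializes to 9 bytes
files = split_by_size(sample_rows, max_bytes=20)
```

Here each file holds at most two rows, because a third row would push the file past the 20 byte cap.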

Docs

  • RESTClient: add an example for creating a custom POST paginator by @burnash in #1358
  • Add rest_api verified source documentation by @burnash in #1308
  • Fix typo in Slack Docs by @cybermaxs in #1369
  • RESTClient: docs: add the troubleshooting section by @burnash in #1367
  • Replace weather api example with github in create a pipeline walkthrough by @sultaniman in #1351
  • RESTClient: docs: Fixed snippet definition by @burnash in #1373
  • docs: destination tables: elaborate on example code by @burnash in #1386
  • add naming rules to contributing by @sh-rp in #1291
  • Added info about how to reorder the columns to adjust a schema by @dat-a-man in #1364
  • rest_api: add response_actions documentation by @burnash in #1362
  • Update the tutorial to use rest_client.paginate for pagination by @burnash in #1287
  • fix command to install dlt by @Benjamin0313 in #1404
  • improves sql database docs by @rudolfix in #1383
  • add typing classifier and update maintainers in pyproject by @sh-rp in #1391
  • Updated installation command in destination docs and a few others by @dat-a-man in #1410
  • Update filesystem docs with auto mkdir config by @VioletM in #1416
  • add page to docs for openapi generator by @sh-rp in #1417

New Contributors

Full Changelog: 0.4.11...0.4.12

0.4.11

14 May 16:04
aab21ba

Core Library

  • RESTClient: building blocks (auths, paginators, response extractors etc.) to write REST API pipelines by @burnash
  • Enable merge write disposition for athena Iceberg by @jorritsandbrink in #1315
  • adds std pipe iterator for stdout and stderr by @rudolfix in #1321
  • adds _impl_cls to dlt.resource and dynamic config section to standalone resources with dynamic names by @rudolfix in #1324
  • Accept :memory: mode for credentials parameter in duckdb factory by @sultaniman in #1297
  • allows windows native, UNC and extended paths in filesystem source and destination by @rudolfix in #1335
  • improves union validation: user friendly exceptions by @rudolfix in #1327
  • improves instantiation and shutdown of thread pools for telemetry trackers by @rudolfix in #1340
  • feat(airflow): pass data sources as callables and additional initializers for delayed source evaluation by @IlyaFaer in #1318
  • Fix: ignores table options on ALTER TABLE in BigQuery by @rudolfix in #1306
  • Fix: use correct check for column prop in column schema by @z3z1ma in #1347
  • Streamlit caching and session state store fixes by @sultaniman in #1326
  • implements method to merge columns in two table schemas by @rudolfix in #1348
  • Extend motherduck client configuration to pass custom user agent by @sultaniman in #1284
  • allows fsspec until 2023.1.0 by @rudolfix in #1305
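
The RESTClient entry above mentions paginators as one of its building blocks. To show what a link-following paginator does in general, here is a self-contained sketch that walks fake in-memory pages instead of making HTTP requests. It does not use the RESTClient API: `paginate_by_link`, `fake_fetch`, `FAKE_PAGES` and the response keys `items` and `next` are all assumptions for illustration.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Fake API responses keyed by URL; a real client would issue HTTP requests.
FAKE_PAGES: Dict[str, Dict[str, Any]] = {
    "/items?page=1": {"items": [1, 2], "next": "/items?page=2"},
    "/items?page=2": {"items": [3], "next": None},
}


def fake_fetch(url: str) -> Dict[str, Any]:
    return FAKE_PAGES[url]


def paginate_by_link(
    fetch: Callable[[str], Dict[str, Any]],
    first_url: str,
    data_key: str = "items",
    next_key: str = "next",
) -> Iterator[List[Any]]:
    """Yield one page of data at a time, following the next link in each response."""
    url: Optional[str] = first_url
    while url:
        body = fetch(url)
        yield body.get(data_key, [])
        url = body.get(next_key)  # a missing or empty link ends the loop


pages = list(paginate_by_link(fake_fetch, "/items?page=1"))
```

Keeping the fetch function as a parameter separates the pagination rule from transport and authentication, which is the kind of separation the building blocks approach suggests.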

Docs

Verified Sources

Full Changelog: 0.4.10...0.4.11

0.4.10

30 Apr 19:34
048839d

Core Library

  • Clickhouse destination by @Pipboyguy in #1097
  • fix(filesystem): UNC paths are supported on filesystem source and destination by @IlyaFaer in #1209
  • scd2 extension: pick your active record literal, defaults to NULL by @jorritsandbrink in #1275
  • make missing keys warning conditional on merge strategy by @jorritsandbrink in #1290
  • Fix filesystem layout timestamps with milliseconds by @sultaniman in #1286
  • fallbacks to copy on any OSError when doing hardlink by @rudolfix in #1302
  • configurable anonymous telemetry tracker by @rudolfix in #1301
  • fix athena edge case and adds layout tests for athena by @sh-rp in #1289
  • Streamlit app: do not show a notice if there is no resource state for schema by @sultaniman in #1300
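
The scd2 extension in #1275 lets you choose the literal that marks the active record, with NULL as the default. The sketch below illustrates the general slowly changing dimension (type 2) bookkeeping with a configurable active literal. It is not dlt's implementation: the column names `valid_from` and `valid_to`, the function `apply_scd2` and the assumption that each snapshot is a full extract are chosen for illustration.

```python
from typing import Any, Dict, List, Optional


def apply_scd2(
    history: List[Dict[str, Any]],
    snapshot: List[Dict[str, Any]],
    key: str,
    now: str,
    active_record_literal: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return a new history in which changed or missing records are retired at now."""
    result = [dict(row) for row in history]
    # A record is active when its valid_to equals the chosen literal (None by default).
    active = {row[key]: row for row in result if row["valid_to"] == active_record_literal}
    seen = set()
    for row in snapshot:
        seen.add(row[key])
        current = active.get(row[key])
        if current is not None:
            if all(current.get(col) == val for col, val in row.items()):
                continue  # unchanged: the active record stays as it is
            current["valid_to"] = now  # changed: retire the old version
        result.append({**row, "valid_from": now, "valid_to": active_record_literal})
    for record_key, current in active.items():
        if record_key not in seen:
            current["valid_to"] = now  # absent from the full snapshot: retire
    return result


history = [{"id": 1, "name": "a", "valid_from": "2024-01-01", "valid_to": "9999-12-31"}]
snapshot = [{"id": 1, "name": "b"}]
updated = apply_scd2(history, snapshot, "id", "2024-02-01", active_record_literal="9999-12-31")
```

A far-future literal such as "9999-12-31" makes range queries on valid_to simpler than a NULL check, which is one common reason to override the default.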

Docs

Full Changelog: 0.4.9...0.4.10

0.4.9

25 Apr 05:50
efaedc2

Core Library

Docs

Verified Sources

New Contributors

Full Changelog: 0.4.8...0.4.9