Contributor meetings

Manfred Moser edited this page Dec 17, 2024 · 114 revisions

The Trino project organizes regular meetings with contributors to improve collaboration with maintainers and among contributors. This space is used to plan upcoming meetings and provide meeting minutes after the event.

General information

  • Trino Contributor Congregations are in-person meetings, typically run the day after Trino Fest or Trino Summit.
  • Trino Contributor Calls are virtual events, typically scheduled for every fourth Thursday of the month.
  • Anyone can attend.
  • Attendees can request invites from Manfred Moser (@mosabua) on Slack.
  • Attendees can also just join on the day. Connection details are announced just prior to the event on Slack or on this page.
  • Event dates are announced on Slack and LinkedIn, and added to the Trino events calendar.
  • Topics can be suggested prior to the event by updating this page or during the meeting.
  • 📹 Video recordings are posted on the dedicated YouTube playlist starting with the April 2024 call.
  • Meeting notes and other details are captured on this page.

Trino Contributor Call, 13 Dec 2024

📹 Video recording on YouTube with timestamps

Attendees

mosabua, xkrogen, dprophet, monimiller, [georgewfisher](https://github.com/georgewfisher), robertzych

Topics, notes and action items

0:00 Introduction

0:49 Trino Summit recap

  • including keynote, AI panel,
  • Python UDF demo and explanation,
  • C# client and more

4:54 C# client details

7:47 JSON pushdown in Pinot connector

12:54 Bulk data insertion with SQL

  • And Python in the demo
  • demo from dprophet
  • discussion around how to bring this to the community
  • what next steps are coming in Trino for this use case of inserting lots of data using SQL

26:20 More details about Python user-defined functions

39:13 Spooling protocol and query level retry in FTE

42:21 Closing

Trino Contributor Call, 28 Nov 2024

📹 Video recording on YouTube with timestamps

Attendees

mosabua, wendigo, mosiac1

Topics, notes and action items

0:00 Introduction

0:30 HTTP/2 for internal communication by default

  • Users found issues
  • Deactivating by default in 467
  • Investigation ongoing, let us know

2:40 Spooling protocol

  • Shipped with docs in 466

5:08 Apache Ranger plugin

  • Shipped with docs in 466
  • Blog post coming

5:50 Java client library requirement

  • Plan to update to Java 11 or 17 as requirement for newer version
  • Old versions still available
  • Asking for feedback if that causes any issues for anyone
  • Decompression with spooling gets more performance with Java 22+

9:03 Kinesis, Phoenix, and Kudu connectors

10:12 Trino Summit 2024

Trino Kubernetes operator discussion, 30 Oct 2024

More details in https://trino.io/blog/2024/10/10/operator

Minutes at https://github.com/trinodb/trino-k8s-operator/wiki/Contributor-meetings

Trino Contributor Call, 24 Oct 2024

📹 Video recording on YouTube with timestamps

Attendees

martint, mosabua, wendigo, dprophet, dougb, mosiac1, robert zych (raft)

Topics, notes and action items

0:00 Introduction

0:30 Trino Summit 2024

1:57 Contribution process doc update

3:12 Issues clean up

  • Call for help

4:18 Accumulo connector removal, Kinesis connector

6:50 Java 23 and JDBC driver

  • will file issue about desire to upgrade requirement for JDBC driver to Java 11
  • Java 23 as requirement in 464

10:15 k8s operator meeting

11:05 HTTP/2

  • HTTP/2 for internal communication in 463

12:10 Views, Iceberg views, Coral, different query engines, Substrait and more

  • Iceberg view created from Trino doesn't work in Spark
  • Explanation of a lot of background and details
  • Coral library
  • Substrait project
  • mosabua to file issue about Coral on by default
  • mosabua to file issue about Coral support in Iceberg, Delta Lake

30:21 Pushdown connectors, JSON functions, and Pinot connector

  • pinot pushdown - robert zych/raft
  • working on pushdown via brokers, chat with wendigo
  • long discussion and explanations
  • json - new and old functions

contains(array, json_extract_scalar(jsonColumn, '$.key')) -> json_match(jsonColumn, '$.key in array')

json_array_contains(json_extract(jsonColumn, '$.array'), value) -> json_match(jsonColumn, '$.array[*] = value')
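
The two rewrites above can be sketched as small string-building helpers. This is only an illustration of the mapping discussed in the call, not code from the Pinot connector: the helper names are hypothetical, and the quoting of the filter string (path in double quotes, values in doubled single quotes) is an assumption about Pinot's `json_match` syntax that should be checked against the Pinot documentation.

```python
def sql_string(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def contains_to_json_match(column: str, path: str, values: list[str]) -> str:
    """Hypothetical rewrite of contains(array, json_extract_scalar(column, path))."""
    in_list = ", ".join(sql_string(v) for v in values)
    # Assumed filter form: "$.key" IN ('a', 'b')
    filter_expr = f'"{path}" IN ({in_list})'
    # The filter is itself passed as a string literal, so it is quoted again.
    return f"json_match({column}, {sql_string(filter_expr)})"


def array_contains_to_json_match(column: str, array_path: str, value: str) -> str:
    """Hypothetical rewrite of json_array_contains(json_extract(column, array_path), value)."""
    # Assumed filter form: "$.array[*]" = 'value'
    filter_expr = f'"{array_path}[*]" = {sql_string(value)}'
    return f"json_match({column}, {sql_string(filter_expr)})"


if __name__ == "__main__":
    print(contains_to_json_match("jsonColumn", "$.key", ["a", "b"]))
    print(array_contains_to_json_match("jsonColumn", "$.array", "x"))
```

The point of the rewrite is that a single `json_match` predicate can be handed to Pinot as one filter instead of being evaluated row by row in Trino.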

56:00 LDAP group provider plugin was taken over

57:00 Open API spec for Trino client API

Trino Contributor Call, 26 Sep 2024

📹 Video recording on YouTube with timestamps

Attendees

martint, dain, mosabua, xkrogen, wendigo, nineinchnick, dprophet, mosiac1, osscm, vagaerg, losipiuk, dougb, Saravanan Arumugam

Topics, notes and action items

  • File system switch
    • Breaking change for users done in 458
    • Always ready for feedback
  • Java 23
    • Coming soon
    • Container runtime change is in 459
    • Requirement switch coming in a few releases, looking for reports of any issues
  • Submit proposals for Trino Summit
  • Trino Gateway 11 is out
  • Removal of atop connector - will be 460, no concerns from attendees
  • Removal of localfile connector - is in 459, no concerns from attendees
  • Removal of Raptor connector
  • Project Swift update
    • From wendigo
    • most spooling features are in current releases
    • JDBC and CLI clients support it
    • opt in feature, backward compatible
    • s3/azure/gcs for spooling supported
    • supports server side encryption
    • performance work on clients next
    • Trino Community Broadcast and docs with more info coming soon
    • Implementation help for clients coming
    • work on go, python and js coming
  • Aircompressor
    • 3.0 in 458 uses native binaries and FFI from java 22
    • Looking for feedback and report of any issues
  • Add LDAP Group Provider
    • mosiac1 cleaned up PR
    • looking for help in getting it finalized
    • dain and electrum will help
    • community help also welcome
    • https://github.com/trinodb/trino/pull/20157
    • http group provider potentially to be contributed by vagaerg
  • Richer identity info for access control
    • dprophet talks about attaching more info to user identity for access control/security
    • dain and others provide help and info
    • pablo and team to create proposal with storing attributes and sql functions to retrieve
    • desire to have functions similar to current user and current group, parameterized
    • for row filtering and column masking
    • current role statement - mosabua to add docs
  • SQL statement analysis after run
    • topic from dprophet
    • xkrogen has similar need, uses data lineage from query history info and analyses that
    • what tables and data were accessed by a specific query
    • directly and in expressions
  • Query column limit when running MERGE over Iceberg table
    • problem mentioned by dougb
    • https://github.com/trinodb/trino/issues/15848
    • mosabua to loop in electrum to help on merge statement (discussed with Dain in past)
    • merge explodes and runs into limits earlier
    • separate problem - bytecode generation limit - Dain is potentially looking at it, very complex, there are also solutions further up the stack and some work is done in Project Hummingbird

Trino Contributor Call, 22 Aug 2024

📹 Video recording on YouTube with timestamps

Attendees

mosabua, electrum, colebow, xkrogen, dain, bitsondatadev, lxynov, pettyjamesm, koszti

Topics, notes and action items

Trino Contributor Call, 25 Jul 2024

📹 Video recording on YouTube with timestamps

Attendees

mosabua, dprophet, kmurra, xkrogen, dain, martint, wendigo, bitsondatadev, nineinchnick

Topics, notes and action items

  • Trino Summit call for speakers, sponsors, and registration - https://trino.io/blog/2024/07/11/trino-summit-2024-call-for-speakers
  • Packaging roadmap item in progress - https://github.com/trinodb/trino/issues/22597
  • Contributor matrix coming to the website - https://github.com/trinodb/trino/issues/22625
  • Effort to refresh UI - https://github.com/trinodb/trino/issues/22697
  • Upcoming switch to file system support from Trino and deactivating old Hive/Hadoop code
    • Will switch soon
    • Breaking change
    • Old setup will only be for HDFS
    • Azure, S3, GCS, .. will be on new fs
    • Different properties
  • Lateral column aliasing work
    • initially requested by kmurra
    • kmurra will not be able to dedicate many resources
    • Changes SQL scoping rules, difficult to implement
    • Implementation varies between engines, not part of SQL standard
    • No decision on path forward made
    • kmurra will try to do some more research and write it up for martint
    • kmurra will connect martint to SQL standard body stakeholders
  • New scheduling driver for workers
    • Based on ideas from fair scheduling algorithm from Linux
    • Seems to have problems with some edge conditions
    • martint asking for issue reports and contact if anyone finds problems
    • goal is to replace old scheduling algorithm
    • Part of Project Hummingbird
  • Query rewriting and changes from dprophet and xkrogen
  • AI trained expert system for Trino - idea from bitsondatadev

Trino Contributor Call, 27 Jun 2024

📹 Video recording on YouTube

mosabua, jkylling, dprophet, xkrogen, dain, martint, wendigo, sajjoseph, bitsondatadev, mattheusv

Topics, notes and action items

Trino Contributor Congregation, 14 June 2024

In-person only event in Boston following Trino Fest. Contact Manfred Moser for invite and registration info. 9:00 - 14:00 EST

Attendees

mosabua, dain, colebow, jkylling, dprophet, findinpath, xkrogen, alprusty, marton-bod, osscm, wendigo, andythsu, vishalya, mattstep, georgewfisher, Ishan, Oleg Savin, James Petty, Chenren Shao, Dai Osaki, Ruhollah Farchtchi, Xuanyu Zhan, Y Ethan Guo, Lei Xu, Vijaj Chakilam and others.

Topics, notes and action items

Planning was to talk about the following items:

  • Trino project status and future, looking for contributors and maintainers, martint
  • Trino Helm chart status, mosabua
  • Trino K8s operator project, osscm/Manish and mosabua
  • Trino Gateway update with roadmap and planning discussion, mosabua
  • Desire in Trino Gateway to have unauthenticated REST endpoint on Trino with info about worker count and number of running queries, mosabua and TGW team
  • Iceberg aggregation pushdown presented and discussed by osscm/Manish
  • Incremental refresh on materialized views in Iceberg and beyond, osscm/Manish/Marton
  • Open Policy Agent standard policies for relational data domain, related tooling and more, vagaerg, dprophet
  • HTTP/2 support development, testing, and collaboration, dain
  • Trino connector progress update and discussion, ideally mosabua

Recap notes by mosabua, who will also follow up with lots of action items in the form of GitHub issues and follow-up meetings and conversations.

  • Reassurance of Trino as a project of individuals, the strong commitment to the Apache license and the clarification around the CLA was well received. Manfred will work on updates to the website to make this clear for a larger audience.
  • Forks of Trino
    • Everyone admitted to maintaining a fork
    • The call to get more changes upstream was well received.
    • There was a longer discussion around PR review, expectations of quality and more.
    • Work on rebasing is considerable for everyone and we offered to help with management justification around reducing the delta in the fork by contributing more and faster to Trino itself.
  • Contributor, reviewer, subproject maintainer, and maintainer
    • The call out to get more people to help by reviewing was received and we should follow up.
    • We agreed to try and establish a maintained list of experts for specific parts of the system to allow easier determination who to pull in for reviews.
  • Discussed subproject maintainer setup - well received
    • Explained the need to earn trust and build relationships as contributor and importantly also as reviewer. The list of experts can help people participate, and Dain suggested we create another Slack channel for core dev work rather than random dev as found in the current #dev channel.
  • Releases and breaking changes
    • Correctness fixes also seen as breaking changes
    • Rebasing to newer releases is typically done one release at a time and when issues are found the next newer release is tried. This can result in chains of rebases until a good release is found.
    • We agreed to try and figure out how we can receive that information from other forks
    • We need to discuss how to then use that information and potentially update release notes or some other documentation so that users can determine the quality of a specific release. Of course this is problematic since what constitutes a breaking issue varies widely.
  • Native file system support is slowly being adopted. So far no issues but we called out to report issues and send fixes.
  • Java 22 adoption was questioned quite a bit. Some claim that they will have to hold back until the next Java LTS is used. The fact that this is transparent in the container and pointing that out seemed to help. We explained all the benefits and people reluctantly agreed that the fast pace of release and innovation is better than any alternative, so we just need to fine tune and adjust to feedback and issues.
  • Iceberg - There is clear interest in ensuring Apache Iceberg keeps innovating. More details to follow
  • Kubernetes - Numerous K8s operators exist, and there is a clear desire to have one operator in the Trino project. The choice of implementation language is less clear: Go is better since it is the native K8s language and has more support for operators, while Java might fit better into the Trino community. Apple is interested in contributing their Go-based operator and is also able to provide a subproject maintainer looking after the project. Manfred should organize a follow-up meeting with Apple, Stackable, Ubuntu, Starburst and others to sort out next steps.
  • Packaging
    • Only one company seems to be using the RPM, and they use the RPM in a docker container. Removing RPM could motivate them to refactor to the actual docker container and add their customisations
    • Reproducible builds for Trino also allows anyone to build the exact RPM our build would produce
    • We might be fine with just removing the publishing and update docs initially
    • A second phase could be to move the RPM out into a separate repo (like the trino-packages repo approach from Manfred)
    • We discussed multiple packages with fewer plugins and documenting how to add more for tarball and container. At this stage this looks like unavoidable work since we are adding more connectors and some are large due to FFI usage and native libraries (Lancedb). Core will also get bigger for the same reason and the usage in aircompressor.
  • Trino connector - Bloomberg and Comcast are collaborating already and will send PR soon, Saj was not in attendance so not much more discussion happened
  • HTTP/2 - wendigo provided update that we are close for internal communication, positive feedback from all, potentially need to consider opt in configuration at first to allow wide testing
  • Client protocol
    • Dain and Mateusz talked about thoughts on performance improvements
    • Everything was well received as overdue ;-)
    • V2 protocol is also requested but no reason is really given, v1 improvements and other tooling might be sufficient
  • OPA
    • Bloomberg talked about their efforts and work with Dain and others.
    • They have dedicated engineers on it
    • Location of the code on trinodb would still help them progress faster so we should consider setting these repos up and making the subproject maintainers for the work around OPA standard format, rego scripts and tips to start the whole thing with Trino and OPA
  • Trino Gateway
    • Update of the progress was appreciated
    • Numerous attendees already use it or are planning to use it soon
    • Vision for the project overall is rather large and things are to be determined
    • Team got info from Dain on how to proceed around cluster status and authentication
    • Functionality and library sharing with Trino will have to be looked at closer and is potentially tricky and a lot of work, airlift adoption helps

Trino Contributor Call, 23 May 2024

📹 Video recording with time stamps on YouTube

Attendees

mosabua, electrum, dain, nineinchnick, brianwmunz, dprophet, findinpath, xkrogen, jkylling, alprusty, marton-bod, vgankidi, osscm, vagaerg, walterddr, lxynov, mgorsk1, Praveen Sadhu, Josh Yeh

Topics, notes and action items

Trino Contributor Call, 24 Apr 2024

Attendees

martint, mosabua, sajjoseph, wendigo, nineinchnick, brianwmunz, bitsondatadev, xkrogen, virajjasani, stoty, kmurra

Topics, notes and action items

📹 Video recording with time stamps on YouTube

  • Status of upgrade to required Java 22
    • Trino works with Java 22
    • Trino 444 uses Java 22 in docker container
    • Requirement for Java 22 in general to follow in 446, 447, or 448
    • Java 23 also used in testing
    • No issue reports so far
  • Discuss Phoenix connector usage
    • Security issue caused the Trino project to contemplate removal
    • martint, mosabua and bitsondatadev discuss options about deprecating, reducing impact or removing connector
    • wendigo explains current and past approaches and issues, looking for help
    • stoty and virajjasani explain options and offer to help
    • see https://github.com/trinodb/trino/pull/20739 and others
    • mosabua to follow up on how to proceed with wendigo and martint
    • we are looking for people to help and people who use the connector
    • reasonable for connector to require phoenix 5.2.0 and explicitly declare and manage dependencies for newer hadoop, phoenix and hbase
  • Incremental refresh materialized views (#18673, #20959)
    • mosabua explains that work is ongoing but nothing is there to report
  • Lateral column alias support
    • kmurra explains proposal and discusses with martint
    • various complications such as shadowing aliases, usage in aggregation and window functions, and more
    • kmurra will file issue with research on how it works on other engines and more
    • kmurra to reach out to SQL spec workgroup
    • martint will help with input and eventually review of PR for planner and parser
  • Trino REST API improvement
    • sajjoseph explains nextURI data in HTTP header values for usage with blue/green deployment and more behind load balancer or Trino Gateway
    • he will file issue to discuss more and probably send a PR
    • same for some performance improvements

Trino Contributor Call, 21 Mar 2024

9am PST

Attendees

martint, electrum, dain, mosabua, sajjoseph, jkylling, amoghmargoor, marton-bod, vgankidi, osscm, wendigo, monimiller, oneonestar, alprusty, nineinchnick, Praveen2112, brianwmunz, xingyuanlin, yathi, manoj narayanan, kasun indrasiri and others

Topics, notes and action items

  • Request for speaker submissions for Trino Fest - call out by mosabua
  • Trino Contributor Congregation after Trino Fest in person in Boston - call out to contact mosabua about attendance
  • Discuss plans to move to Java 22, explained by mosabua, wendigo, martint, dain:
    • https://trino.io/blog/2024/03/13/java-22
    • https://github.com/trinodb/trino/issues/20980
    • Build and runtime already work with Java 22
    • One of the next releases will ship docker container with Java 22
    • A few releases later we will switch to Java 22 as requirement, looking for feedback and testers
    • going to include native code via aircompressor using foreign function support
    • Java 22 should reduce locking problems from gzip
    • JDBC driver and CLI will continue to stay with Java 8
    • various other code base segments will see new language and library feature adoption
    • V2 protocol might see adoption of new language and library features
    • preview features will only be adopted if we see significant benefits
    • we will also adopt Java 23, 24, 25 and so on soon after they are released
  • Update on file system lead and related work - update from electrum
    • removal of Hadoop/Hive library usage completed in new native file system support, massive project and refactor
    • new FileSystemAPI is well designed, simple, clean, tested, no unneeded complexity
    • electrum acts as lead
    • not all legacy features moved
    • looking for feedback from testing
    • docs will be updated more by mosabua
    • see https://trino.io/docs/current/object-storage.html
    • old hdfs code will only be for HDFS, other object storage usage will use new
    • currently planning to have all file systems disabled by default and require manual activation, looking for feedback on that behavior
    • s3 security mapping and other improvements in progress or based on input
  • OpenTelemetry
    • proven to be VERY useful
    • great TCB episode, https://trino.io/episodes/57
    • looking for practical experience and PRs to add in other important places
  • Merged separation of IR and AST - discussed by martint
    • separation of IR (internal representation) and AST (abstract syntax tree)
    • massive internal refactor
    • fixed many latent, dormant bugs
    • brings numerous performance improvements
    • great code simplification and improvement
    • separated as of 442 release, after a series of very large PRs,
    • more cleanup work still ongoing, opportunity for future changes on IR, brings performance improvement,
    • avoids lots of duplicate transforms,
    • also removes internal caching needs,
    • still a month or two of work ongoing, looking for feedback after that in terms of people working on separate forks and so on
  • Discuss OkHTTP related regression around redirect - mosabua, electrum and wendigo
    • https://github.com/trinodb/trino/issues/21026
    • Issue was brought up in Trino Gateway sync
    • Two separate fixes now available
    • electrum and mosabua to follow up and sync and help with review and decision making,
    • redirect with auth can be considered a security risk, might want to restore old behavior,
    • might also have to go through clients and update them if possible, needs to potentially be made explicit,
    • also chat about http client in jdbc driver and other clients,
    • jetty client is too heavy, jdbc driver should probably use jvm http client
  • Iceberg Aggregate Pushdown - mosabua and osscm will lead discussion to establish approach and then drive into more features and other connectors
  • Call for reviewers, contributors, and maintainers - from mosabua, contact for guidance
  • mosabua to plan next call in a few weeks
  • Next call will be recorded and available on YouTube channel

Trino Contributor Call, 1 Feb 2024

9am PST

Topics

Attendees

martint, electrum, dain, mosabua, sajjoseph, rice668, alok, i-93, colebow, jkylling, amoghmargoor, marton-bod, vgankidi, osscm

Notes and action items

  • Update on Java 21 provided, all done, no negative feedback from community
  • Test improvements initiative and developer guide for testing is in progress and living document for further improvements
  • Trino Gateway completed release 5, lots more improvements and progress ongoing, dev sync every second Wednesday, release 6 coming soon
  • Trino Kubernetes operator - mosabua to start conversation to move forward similar to Trino Gateway subproject, looking for initial code contributions and then ongoing support and maintenance
  • Default time precision change to 6
    • PR https://github.com/trinodb/trino/pull/20290
    • amoghmargoor will link to github issue
    • mosabua to update to roadmap issue and add tasks with martint and others
    • feature will need a switch since this will NOT be backwards compatible
    • lots of impact on connectors, functions, and so on
    • testing need will be significant
  • Client protocol
    • no active work going on according to dain
    • parallel transfer to clients probably best to use object storage filesystem as proxy, parallel write by trino, parallel read by client
    • mostly aimed at python and related workloads
    • streaming in parallel from memory is not restartable
    • lots of discussion about Arrow/ADBC, not suitable as protocol for Trino since it has a limited scope below what Trino offers in terms of data types and such, also tightly tied to Spark only, very limited interoperability, security issues
    • in the long run Trino might end up with arrow support for import/export or so, only for interoperability due to problems of Arrow,
    • Arrow not suitable for native use in Trino
  • Separate aspect is trino to trino cluster communication
    • jdbc just a current approach
    • potential to create separate protocol for trino to trino
    • communication in cluster is implementation details, arrow not suitable at all, no benefits
  • Parquet column encryption
    • in progress with amoghmargoor,
    • blocked by hadoop dependency,
    • trino filesystem no longer uses hadoop deps
    • amoghmargoor will reimplement in collab with electrum
    • mosabua followed up - details at https://github.com/trinodb/trino/pull/20069
  • Alluxio caching
    • PR for Delta Lake from jkylling is ready for merge
    • PR for docs in progress with mosabua
    • Iceberg and Hive support coming in quick follow up PRs
    • Rubix removal PR is ready as well
    • Potential for everything to land together in 439
    • mosabua to follow up on coordination
  • Iceberg agg pushdown
    • osscm working with findepi on multiple PRs
    • including work in Iceberg project, and ideas for future PRs
    • electrum to help
  • Secondary index and aggregate index, Iceberg
    • osscm working with Iceberg and Trino community
    • rice668 working on another(?) implementation
    • mosabua to connect and start thread with electrum and findepi
  • Rewrite partial top-n node to LimitNode or LastNNode
    • PR from rice668
    • Assumption of all files being sorted and having correct ordering might be problematic
    • martint to chime in with input and potential ideas, https://github.com/trinodb/trino/pull/18384
    • mosabua to help coordinate meeting / sync
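
The client protocol idea noted above, with object storage as a proxy, parallel writes on the Trino side, and parallel reads on the client side, can be sketched as follows. This is a minimal illustration only and not the actual Trino spooling protocol: the in-memory `object_store` dict stands in for S3, Azure, or GCS, and all function names and the JSON segment format are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an object storage bucket (S3, Azure, GCS).
object_store: dict[str, bytes] = {}


def write_segment(query_id: str, index: int, rows: list) -> str:
    """Server side: write one chunk of result rows as an independent segment."""
    key = f"{query_id}/segment-{index}.json"
    object_store[key] = json.dumps(rows).encode()
    return key


def spool_result(query_id: str, rows: list, segment_size: int) -> list[str]:
    """Server side: split the result into segments and write them in parallel."""
    chunks = [rows[i:i + segment_size] for i in range(0, len(rows), segment_size)]
    with ThreadPoolExecutor() as pool:
        # map() returns the segment keys in the original chunk order.
        return list(pool.map(lambda args: write_segment(query_id, *args),
                             enumerate(chunks)))


def fetch_result(keys: list[str]) -> list:
    """Client side: read all segments in parallel and reassemble the rows in order."""
    with ThreadPoolExecutor() as pool:
        segments = pool.map(lambda key: json.loads(object_store[key]), keys)
        return [row for segment in segments for row in segment]


if __name__ == "__main__":
    rows = [[i, f"value-{i}"] for i in range(10)]
    keys = spool_result("query-1", rows, segment_size=3)
    print(keys)
    print(fetch_result(keys) == rows)
```

Because each segment is a separate stored object, a failed read can be retried for that segment alone, which is the contrast with streaming in parallel from memory that the notes describe as not restartable.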

Prior events

Prior events include various calls and the first Trino Contributor Congregation at Trino Summit 2022.