Include ability to add table and column comments to Avro file schema #23638
michaeleengel1023
started this conversation in
Ideas
New here, and maybe this is already possible, but I can't find it. My use case is to use Avro as a file format for sending data to external customers, since all the associated metadata is embedded in the file. Ideally I would create an unpartitioned Avro table in either the Hive or Iceberg connector so that it produces (in theory) a single Avro file. In the DDL for the table, I would add comments to the table and columns containing any valid text. I would then like to take that single Avro file and send it to my customers with the comments in the 'doc' attribute of the record and of each corresponding field. Hope this makes sense. I'd like to see this on any of the connectors that use Avro as a file format.
--Sample DDL for user table.
CREATE TABLE users (
user_id INT COMMENT 'Unique identifier for the user',
first_name STRING COMMENT 'First name of the user',
last_name STRING COMMENT 'Last name of the user',
email STRING COMMENT '{"pii": 1, "description": "Email address of the user"}',
created_at TIMESTAMP COMMENT 'Timestamp indicating when the user was created'
)
COMMENT 'This table stores user information'
STORED AS AVRO
LOCATION '/path/to/your/avro/data';
-- Sample Avro schema:
{
"namespace": "com.example.users",
"type": "record",
"name": "User",
"doc": "This table stores user information",
"fields": [
{"name": "user_id", "type": "int", "doc": "Unique identifier for the user"},
{"name": "first_name", "type": "string", "doc": "First name of the user"},
{"name": "last_name", "type": "string", "doc": "Last name of the user"},
{"name": "email", "type": "string", "doc": "Email address of the user"},
{"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}, "doc": "Timestamp indicating when the user was created"}
]
}
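To make the request a bit more concrete, here is a rough Python sketch of the mapping I have in mind: table and column comments are copied into the 'doc' attributes of the generated Avro schema. This is not connector code. `build_avro_schema` and `SQL_TO_AVRO` are hypothetical names made up for illustration, and the type mapping (in particular TIMESTAMP to a `timestamp-millis` long) is an assumption. It only uses the standard `json` module.

```python
import json

# Hypothetical mapping from the SQL types in the sample DDL to Avro types.
# The TIMESTAMP entry is an assumption; a real connector may choose differently.
SQL_TO_AVRO = {
    "INT": "int",
    "STRING": "string",
    "TIMESTAMP": {"type": "long", "logicalType": "timestamp-millis"},
}


def build_avro_schema(name, namespace, table_comment, columns):
    """Build an Avro record schema, copying comments into 'doc' attributes.

    columns is a list of (column_name, sql_type, comment) tuples;
    a comment of None means the field gets no 'doc' attribute.
    """
    fields = []
    for col_name, sql_type, comment in columns:
        field = {"name": col_name, "type": SQL_TO_AVRO[sql_type]}
        if comment is not None:
            field["doc"] = comment  # column comment -> field-level doc
        fields.append(field)

    schema = {"namespace": namespace, "type": "record", "name": name}
    if table_comment is not None:
        schema["doc"] = table_comment  # table comment -> record-level doc
    schema["fields"] = fields
    return schema


# Columns and comments taken from the sample DDL above.
columns = [
    ("user_id", "INT", "Unique identifier for the user"),
    ("first_name", "STRING", "First name of the user"),
    ("last_name", "STRING", "Last name of the user"),
    ("email", "STRING", '{"pii": 1, "description": "Email address of the user"}'),
    ("created_at", "TIMESTAMP", "Timestamp indicating when the user was created"),
]

schema = build_avro_schema(
    "User", "com.example.users", "This table stores user information", columns
)
print(json.dumps(schema, indent=2))
```

In this sketch a comment is treated as opaque text, so the JSON-style comment on `email` ends up as an escaped string inside 'doc' and a consumer can parse it back out if it wants the `pii` flag.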