Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatible with Postgres system catalog #3560

Open
tisonkun opened this issue Mar 22, 2024 · 17 comments
Open

Compatible with Postgres system catalog #3560

tisonkun opened this issue Mar 22, 2024 · 17 comments

Comments

@tisonkun
Copy link
Collaborator

What type of enhancement is this?

User experience

What does the enhancement do?

Since we want to support users to access GreptimeDB with psql, we'd better be more compatible when possible.

Currently, when you run psql -l against GreptimeDB, it would fail with:

psql: ERROR:  Failed to plan SQL: Error during planning: Table not found: greptime.pg_catalog.pg_database

Implementation challenges

This should be similar to MySQL's information

@tisonkun
Copy link
Collaborator Author

Not quite easy. Seems related to some internal functions also:

Failed to execute query: SELECT d.datname as "Name",
       pg_catalog.pg_get_userbyid(d.datdba) as "Owner",
       pg_catalog.pg_encoding_to_char(d.encoding) as "Encoding",
       pg_catalog.array_to_string(d.datacl, '\n') AS "Access privileges"
FROM pg_catalog.pg_database d
ORDER BY 1; err=0: Table operation error, at src/frontend/src/instance.rs:272:14
1: Failed to plan statement, at /Users/tison/GreptimeWorkspace/greptimedb/src/operator/src/statement.rs:253:14
2: Failed to plan SQL, at src/query/src/planner.rs:90:14
...

@tisonkun
Copy link
Collaborator Author

And this is the query for "show tables" (\dt):

Failed to execute query: SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 't' THEN 'TOAST table' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'partitioned table' WHEN 'I' THEN 'partitioned index' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','p','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname !~ '^pg_toast'
      AND n.nspname <> 'information_schema'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2; err=0: Table operation error, at src/frontend/src/instance.rs:272:14
1: Failed to plan statement, at /Users/tison/GreptimeWorkspace/greptimedb/src/operator/src/statement.rs:253:14
2: Failed to plan SQL, at src/query/src/planner.rs:90:14

@yihong0618
Copy link

I did some research about this before, it needs a lot work, and most of them had been done by risingwave, maybe you can use some code from them, and make less effort for this.

FYI: you can refer this issue.
some of them can easily add in greptime others maybe need to add in pgwire

risingwavelabs/risingwave#2954

@killme2008
Copy link
Contributor

Yes, we are doing a lot of work to be compatible with MySQL, pg is our next goal.

@yihong0618
Copy link

yihong0618 commented Mar 23, 2024

Yes, we are doing a lot of work to be compatible with MySQL, pg is our next goal.

One thing came to my mind that databend, risgingwave and greptime all using rust and all maybe have the need to implement protocols risingwave mainly for pg, and databend mainly for mysql, maybe you can extract something common and help each other, maybe its a triple-win thing.

@sundy-li
Copy link

Let's add pgwrite into databend!

@J0HN50N133
Copy link
Contributor

I would like to get involved in this work.😊

@tisonkun
Copy link
Collaborator Author

@J0HN50N133 Welcome! You can take a look at how we implement the "information_schema" tables as a reference.

But I'm also still unaware of how to add the necessary functions or work it around.

@J0HN50N133
Copy link
Contributor

J0HN50N133 commented May 5, 2024

@tisonkun
BELOW IS ALSO THE PROPOSAL SUMITTED TO OSPP
To support pg_catalog related features. The main challenge is to support the System Information Functions and Operators.

I will try to add a new schema called pg_catalog and related table for trivial cases like \d under src/catalog/pg_catalog and decide how to implement the system function according to what happen in DFLogicalPlanner. Related system functions may be implemented via udf.

Although SHOW TABLES and SHOW DATABASES can also work even under psql, pg users may still depend on pg_catalog
image

Goals

  1. support normal system catalog related query(trivial statements first)
  2. pg_catalog should be optional, users shouldn't pay for this if they don't need pg. Data should be able to build from information_schema.

System Catalogs implementation plan

In the current version, tables following are possible to support:

The first three tables will be implemented first since they are basis.

Implementation

pg_catalog management

  1. add a new schema called pg_catalog and related table for trivial cases like \d under src/catalog/pg_catalog. Create a new SchemaProvider called PgCatalogProvider and mount it under SystemCatalog. Then while finding table for query, catalog_manager will first check system_catalog.

    if let Some(table) = self.system_catalog.table(catalog, schema, table_name) {

  2. Each System table is implemented as a struct under pg_catalog/. To make them work as a Table, we just need to do some adaption between those implementation of pg_catalog.tables and Table.

  3. Since view is supported recently in feat: create view #3807 , view related catalog function should also be supported. The id of view is also implemented as TableId in feat: create view #3807. So we could reuse TableId as oid for view as we will do for table.

  4. There is currently no concept of Namespace in greptime; to workaround, it can be considered equivalent to a database, i.e. each database contains only one namespace. To provide a comparison, the data organization in greptime and postgres is as follows:
    Pg: Database -> Namespace(Schema) -> Table
    Gt: Catalog -> Database(Schema) -> Table (we can't create new Catalog currently)

pg_catalog related system function

Implement them with datafusion udf. Like:

SystemFunction::register(&function_registry);

@J0HN50N133
Copy link
Contributor

J0HN50N133 commented Jul 8, 2024

I'll manage to provide similar features in #2931.

Tasks

  • pg_database
  • pg_namespace
  • pg_class
  • pg_collation, not supported yet, just like the mysql ver. did
  • pg_trigger, we don't support it.
  • pg_type
  • pg_partitioned_table
  • pg_statistic

@J0HN50N133
Copy link
Contributor

J0HN50N133 commented Jul 22, 2024

While supporting \dt, i.e. the following sql statement, I met some problems:

SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 't' THEN 'TOAST table' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'partitioned table' WHEN 'I' THEN 'partitioned index' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','p','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname !~ '^pg_toast'
      AND n.nspname <> 'information_schema'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
  1. n.oid = c.relnamespace, this predicate need a namespace.oid: u32 to check equality, which can be seen as schema's id in greptime. However, there is no such thing like schema_id in current SchemaManager. https://github.com/J0HN50N133/greptimedb/blob/906abd61de5e6da4e2ae5bba3f13fea567509166/src/common/meta/src/key/schema_name.rs#L54
    Should we add this field in SchemaManager? This may break backward compatibility.

  2. pg_get_userbyid: it seems that user authentication is not supported in the open source codebase. Should I just return empty string?

@tisonkun WDYT?

@tisonkun
Copy link
Collaborator Author

@J0HN50N133 Thanks for pushing forward this ticket!

pg_get_userbyid: it seems that user authentication is not supported in the open source codebase. Should I just return empty string?

Yeah. This is reasonable as a first implementation. cc @shuiyisong @zyy17 later we may consider how to integrate it with the plugin system to allow populate user info in this case.

n.oid = c.relnamespace

Adding ID for schema can be a breaking change and introduce overload burden to overcome in this issue.

cc @killme2008 @waynexia please take a look at this information schema related logics. A workaround can be using the schema name as "oid" in this place.

@J0HN50N133
Copy link
Contributor

J0HN50N133 commented Jul 23, 2024

@tisonkun

A workaround can be using the schema name as "oid" in this place.

The shortcomings is the potential incompatibility with existing pg eco.

@tisonkun
Copy link
Collaborator Author

The shortcomings is the potential incompatibility with existing pg eco.

Yeah. Some how the type is different, but comparing should work.

cc @killme2008 @waynexia what's the affect if we try to add id to each schema?

@J0HN50N133
Copy link
Contributor

J0HN50N133 commented Jul 24, 2024

Next I will try to support dbeaver connection to greptime, via pg protocol. The following sql is the statements fail.

SELECT typcategory FROM pg_catalog.pg_type WHERE 1<>1 LIMIT 1
SELECT t.oid,t.*,c.relkind,NULL as base_type_name, d.description
FROM pg_catalog.pg_type t
LEFT OUTER JOIN pg_catalog.pg_class c ON c.oid=t.typrelid
LEFT OUTER JOIN pg_catalog.pg_description d ON t.oid=d.objoid
WHERE t.typname IS NOT NULL
AND (c.relkind IS NULL OR c.relkind = 'c')

@sunng87
Copy link
Member

sunng87 commented Aug 14, 2024

@J0HN50N133 I just found that tables like pg_class, pg_namespace are hidden tables that available under all schemas, which means some client can run sql like:

SELECT c.relname FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relkind in ('r', 'p')

without calling pg_catalog.pg_class.

One idea is to resolve those table like pg_class, pg_namespace and etc. under any schema to pg_catalog.*

@J0HN50N133
Copy link
Contributor

J0HN50N133 commented Aug 15, 2024

@J0HN50N133 I just found that tables like pg_class, pg_namespace are hidden tables that available under all schemas, which means some client can run sql like:

SELECT c.relname FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE n.nspname = 'public' AND c.relkind in ('r', 'p')

without calling pg_catalog.pg_class.

One idea is to resolve those table like pg_class, pg_namespace and etc. under any schema to pg_catalog.*

@sunng87 Yeah, #4543 maybe also relate to this, for example, this hidden table should not appear in MySQL client. Maybe we modify DfTableSourceProvider to solve both problems.

J0HN50N133 added a commit to J0HN50N133/greptimedb that referenced this issue Aug 17, 2024
J0HN50N133 added a commit to J0HN50N133/greptimedb that referenced this issue Aug 25, 2024
github-merge-queue bot pushed a commit that referenced this issue Sep 9, 2024
* fix: table resolving logic related to pg_catalog

refer to
#3560 (comment)
and #4543

* refactor: remove CatalogProtocol type

* fix: sqlness

* fix: forbid create database pg_catalog with mysql client

* refactor: use QueryContext as arguments rather than Channel

* refactor: pass None as default behaviour in information_schema

* test: fix test
CookiePieWw pushed a commit to CookiePieWw/greptimedb that referenced this issue Sep 17, 2024
* fix: table resolving logic related to pg_catalog

refer to
GreptimeTeam#3560 (comment)
and GreptimeTeam#4543

* refactor: remove CatalogProtocol type

* fix: sqlness

* fix: forbid create database pg_catalog with mysql client

* refactor: use QueryContext as arguments rather than Channel

* refactor: pass None as default behaviour in information_schema

* test: fix test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants