Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pscale database dump accidentally generates data files for views #701

Open
orware opened this issue Jun 22, 2023 · 0 comments
Open

pscale database dump accidentally generates data files for views #701

orware opened this issue Jun 22, 2023 · 0 comments
Labels
invalid This doesn't seem right

Comments

@orware
Copy link
Contributor

orware commented Jun 22, 2023

I came across this issue while investigating another one over the past weekend and just wanted to simplify it and report it so it can be addressed down the road.

You can replicate the issue currently by generating a new database and then running the following queries on it to create a few sample tables and an associated view with some data:

CREATE TABLE IF NOT EXISTS `t1` (
  `id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `t1` (`id`) VALUES (1);

CREATE TABLE IF NOT EXISTS `t2` (
  `id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `t2` (`id`) VALUES (1);

CREATE VIEW `t1_t2`  AS SELECT `t1`.`id` AS `t1_id`, `t2`.`id` AS `t2_id` FROM (`t1` join `t2`);

After running the above queries, if you run a pscale database dump command against the dataset you'll end up with the following file listing:

<DATABASE_NAME>.t1.00001.sql
<DATABASE_NAME>.t1_t2.00001.sql
<DATABASE_NAME>.t1_t2-schema.sql
<DATABASE_NAME>.t1-schema.sql
<DATABASE_NAME>.t2.00001.sql
<DATABASE_NAME>.t2-schema.sql
metadata

In this list of files, the <DATABASE_NAME>.t1_t2-schema.sql file is ok since that allows the VIEW to be defined in the new database when you go to use restore-dump.

The problematic file(s) would begin with the <DATABASE_NAME>.t1_t2.00001.sql file.

Currently, the pscale database dump command appears to run a query against the VIEW and then begins to generate data files to include in the folder.

This can be particularly problematic the larger the underlying tables are that are associated with the view, as it would then need to generate many extra data files and make the whole process take longer, in addition to the final result being incorrect due to those extra files being present in the folder.

With those data files associated with the VIEW left in the folder, it would lead to an error like the following when the restore-dump process attempts to restore that "data":

target: <DATABASE_NAME>.-.primary: vttablet: rpc error: code = Unknown desc = Can not modify more than one base table through a join view '<DATABASE_NAME>.t1_t2' (errno 1393)

While putting together this issue and attempting a pscale database restore-dump test, I also encountered a different order of operations type issue, where the VIEW was trying to be created before the dependent tables were present in the database.

A fix for that issue was to add an extra prefix to the filename which seemed to do the trick for this simple situation, but may not work for all situations (e.g. I feel like it could fail for views that depend on other views without additional ordering/introspection being applied):

<DATABASE_NAME>.view.t1_t2-schema.sql

Without that minor adjustment to that filename I would get an error due to the process attempting to create the VIEW before the t2 table had been created due to how pscale database restore-dump seems to iterate through the files by default:

vttablet: rpc error: code = NotFound desc = Table '<DATABASE_NAME>.t2' doesn't exist (errno 1146)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
invalid This doesn't seem right
Projects
None yet
Development

No branches or pull requests

1 participant