[improvement](jdbc catalog) Delete unnecessary schema and optimize insert logic #37244

Merged
merged 1 commit into from
Jul 4, 2024

@zy-kkk zy-kkk commented Jul 3, 2024

pick (#30880)

In the previous design, Doris mirrored MySQL's auto-increment columns and default values so that the null check could be bypassed when writing back to a JDBC External Table. However, because MySQL's default values are not fully consistent with Doris's, this could produce an unsuitable or incorrect default value. To address this, I made the following changes:

  1. For a JDBC External Table, Doris always allows columns to be omitted from an insert. Even if the source database does not allow those columns to be empty, the error should come from the source database, not from Doris itself.
  2. When the target column is non-nullable and the insert uses INSERT INTO tbl VALUES() or INSERT INTO tbl SELECT constants, Doris should check the inserted values against the column's nullability and throw an exception on a mismatch. This check is not applied to INSERT INTO tbl SELECT ... FROM tbl operations.
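
The two rules above can be modeled as a small decision function. This is only an illustrative sketch in Python, not the code in this PR (the Doris FE is written in Java). The names `InsertSource`, `Column`, and `check_jdbc_insert` are hypothetical and do not correspond to Doris classes.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Sequence


class InsertSource(Enum):
    """Hypothetical classification of where the inserted rows come from."""
    VALUES = auto()             # INSERT INTO tbl VALUES (...)
    SELECT_CONSTANTS = auto()   # INSERT INTO tbl SELECT <constants>
    SELECT_FROM_TABLE = auto()  # INSERT INTO tbl SELECT ... FROM tbl


@dataclass(frozen=True)
class Column:
    name: str
    nullable: bool


def check_jdbc_insert(
    table_columns: Sequence[Column],
    target_names: Sequence[str],
    constant_row: Optional[Sequence[object]],
    source: InsertSource,
) -> None:
    """Raise ValueError if a constant NULL targets a non-nullable column.

    Rule 1: columns of the table that are absent from target_names are
    never rejected here; the source database reports any error itself.
    Rule 2: the nullability check runs only for constant inserts.
    """
    if source is InsertSource.SELECT_FROM_TABLE or constant_row is None:
        return  # values are not known at planning time, so no check

    by_name = {col.name: col for col in table_columns}
    for name, value in zip(target_names, constant_row):
        if value is None and not by_name[name].nullable:
            raise ValueError(f"NULL inserted into non-nullable column '{name}'")
```

For a table with a non-nullable `id` and a nullable `name`, `check_jdbc_insert(cols, ["name"], ["a"], InsertSource.VALUES)` passes even though `id` is omitted, while `check_jdbc_insert(cols, ["id"], [None], InsertSource.VALUES)` raises.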

@github-actions github-actions bot added area/planner Issues or PRs related to the query planner kind/test labels Jul 3, 2024
zy-kkk commented Jul 3, 2024

run buildall

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 3, 2024
github-actions bot commented Jul 3, 2024

PR approved by at least one committer and no changes requested.

github-actions bot commented Jul 3, 2024

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 50080 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7823eb9e01102eed2eb894e82f5edf3f01d9fa44, data reload: false

------ Round 1 ----------------------------------
q1	17686	4412	4347	4347
q2	2066	157	148	148
q3	10500	1884	1941	1884
q4	10383	1241	1321	1241
q5	8380	3939	3903	3903
q6	228	149	124	124
q7	1992	1589	1623	1589
q8	9517	2694	2702	2694
q9	14225	10592	10641	10592
q10	8636	3545	3534	3534
q11	408	240	253	240
q12	473	307	310	307
q13	18371	3983	4027	3983
q14	356	331	330	330
q15	501	456	453	453
q16	675	569	573	569
q17	1104	907	914	907
q18	7347	6879	6995	6879
q19	1761	1653	1541	1541
q20	521	332	308	308
q21	4341	4108	4075	4075
q22	518	443	432	432
Total cold run time: 119989 ms
Total hot run time: 50080 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4296	4270	4268	4268
q2	317	228	231	228
q3	4202	4154	4106	4106
q4	2751	2766	2736	2736
q5	7209	7175	7140	7140
q6	240	123	123	123
q7	3212	2801	2811	2801
q8	4293	4379	4434	4379
q9	17343	16968	17032	16968
q10	4237	4286	4223	4223
q11	751	678	694	678
q12	1024	838	857	838
q13	4375	3771	3730	3730
q14	447	415	428	415
q15	497	475	455	455
q16	733	684	667	667
q17	3771	3814	3848	3814
q18	8760	8689	8765	8689
q19	1718	1673	1624	1624
q20	2337	2130	2113	2113
q21	8553	8376	8362	8362
q22	1052	947	936	936
Total cold run time: 82118 ms
Total hot run time: 79293 ms

@doris-robot

TPC-DS: Total hot run time: 203356 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7823eb9e01102eed2eb894e82f5edf3f01d9fa44, data reload: false

query1	939	431	379	379
query2	6533	2766	2698	2698
query3	6922	208	200	200
query4	20945	18031	17908	17908
query5	19738	6477	6519	6477
query6	298	221	237	221
query7	4164	309	309	309
query8	407	402	375	375
query9	3144	2699	2646	2646
query10	422	307	320	307
query11	11366	10788	10644	10644
query12	127	76	80	76
query13	5606	706	695	695
query14	17909	13159	13444	13159
query15	387	237	251	237
query16	6463	298	269	269
query17	1695	1430	884	884
query18	2311	426	419	419
query19	212	149	157	149
query20	76	78	82	78
query21	191	101	97	97
query22	5203	4993	4933	4933
query23	32438	32017	31850	31850
query24	6917	6506	6496	6496
query25	518	424	419	419
query26	526	162	169	162
query27	1883	305	313	305
query28	6158	2397	2338	2338
query29	2877	2722	2759	2722
query30	251	166	168	166
query31	940	797	734	734
query32	74	65	47	47
query33	416	270	259	259
query34	864	480	488	480
query35	1135	943	971	943
query36	1348	1182	1086	1086
query37	91	66	66	66
query38	3013	2944	2908	2908
query39	1365	1340	1343	1340
query40	206	95	95	95
query41	45	45	43	43
query42	83	84	84	84
query43	907	696	630	630
query44	1130	714	724	714
query45	242	238	235	235
query46	1232	965	986	965
query47	1827	1704	1732	1704
query48	1024	725	707	707
query49	626	375	368	368
query50	873	661	637	637
query51	4809	4663	4635	4635
query52	96	85	81	81
query53	470	333	332	332
query54	2667	2465	2472	2465
query55	87	86	84	84
query56	246	217	212	212
query57	1174	1119	1113	1113
query58	219	211	216	211
query59	4114	4031	3882	3882
query60	212	199	220	199
query61	101	99	98	98
query62	857	504	484	484
query63	490	354	352	352
query64	2604	1561	1529	1529
query65	3600	3595	3547	3547
query66	825	391	374	374
query67	15880	15339	15119	15119
query68	12749	658	656	656
query69	589	332	350	332
query70	2183	1482	1362	1362
query71	425	312	314	312
query72	6664	3500	3531	3500
query73	2245	329	327	327
query74	6305	5824	5853	5824
query75	5661	3676	3641	3641
query76	7063	1137	1213	1137
query77	1196	259	268	259
query78	12697	11791	12925	11791
query79	12420	671	661	661
query80	1300	408	409	408
query81	473	240	231	231
query82	562	101	97	97
query83	184	131	135	131
query84	256	73	71	71
query85	932	331	337	331
query86	353	332	267	267
query87	3241	3032	3028	3028
query88	4499	2315	2315	2315
query89	454	285	292	285
query90	2744	219	220	219
query91	174	137	142	137
query92	61	53	52	52
query93	6393	588	618	588
query94	1283	219	213	213
query95	1116	1067	1060	1060
query96	655	329	330	329
query97	6548	6366	6418	6366
query98	202	180	182	180
query99	2935	866	911	866
Total cold run time: 327080 ms
Total hot run time: 203356 ms

@doris-robot

ClickBench: Total hot run time: 30.96 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7823eb9e01102eed2eb894e82f5edf3f01d9fa44, data reload: false

query1	0.02	0.02	0.02
query2	0.06	0.03	0.02
query3	0.24	0.05	0.05
query4	1.77	0.06	0.08
query5	0.54	0.52	0.52
query6	1.23	0.61	0.62
query7	0.02	0.01	0.01
query8	0.04	0.03	0.02
query9	0.53	0.46	0.47
query10	0.54	0.54	0.53
query11	0.12	0.09	0.09
query12	0.11	0.09	0.09
query13	0.61	0.60	0.61
query14	0.78	0.78	0.79
query15	0.80	0.77	0.75
query16	0.36	0.37	0.39
query17	1.02	1.03	1.00
query18	0.22	0.26	0.23
query19	1.89	1.87	1.83
query20	0.02	0.01	0.02
query21	15.47	0.55	0.53
query22	2.20	2.15	1.80
query23	17.30	1.05	1.11
query24	6.10	1.03	0.91
query25	0.42	0.10	0.05
query26	0.59	0.15	0.14
query27	0.04	0.03	0.04
query28	7.13	0.71	0.72
query29	12.62	2.37	2.34
query30	0.57	0.51	0.51
query31	3.07	0.39	0.38
query32	3.42	0.50	0.50
query33	3.05	3.05	3.09
query34	15.25	4.85	4.80
query35	4.85	4.88	4.86
query36	1.06	1.01	1.02
query37	0.06	0.05	0.04
query38	0.04	0.02	0.02
query39	0.02	0.02	0.01
query40	0.15	0.14	0.15
query41	0.06	0.02	0.01
query42	0.02	0.02	0.01
query43	0.03	0.02	0.01
Total cold run time: 104.44 s
Total hot run time: 30.96 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 7823eb9e01102eed2eb894e82f5edf3f01d9fa44 with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.3 seconds inserted 10000000 Rows, about 469K ops/s

@zy-kkk zy-kkk merged commit 4b51731 into apache:branch-2.0 Jul 4, 2024
25 of 27 checks passed
@zy-kkk zy-kkk deleted the jdbc_default_20 branch July 4, 2024 09:55
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
…sert logic (apache#37244)
