[fix](serde)fix string deserialize with unescaped char #37251

Contributor

@amorynan amorynan commented Jul 3, 2024

Before this PR: when Stream Load ingests JSON data that contains escaped characters, strings inside nested types (for example, array elements) are not unescaped.
For example:
[image]
After this PR, escaped characters in nested strings are unescaped in the output:

|   27 | "双引号"    | [""双引号"", "反斜\线"]       |
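
For readers unfamiliar with the issue, here is a minimal sketch of what "unescaping inside a nested type" means. It is an illustration only, written in Python rather than the C++ used by the Doris backend, and it is not the code from this PR: the function names `unescape` and `unescape_nested` and the `escape_char` parameter are hypothetical. It assumes the simplest rule (drop the escape character, keep the character after it) and does not interpret sequences such as `\n` or `\t`. The sample strings in the row above, 双引号 and 反斜\线, mean "double quotes" and "back\slash".

```python
def unescape(text: str, escape_char: str = "\\") -> str:
    """Drop each escape character and keep the character that follows it."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape_char and i + 1 < len(text):
            # Skip the escape character; emit the escaped character literally.
            out.append(text[i + 1])
            i += 2
        else:
            # Ordinary character (or a trailing lone escape): keep as-is.
            out.append(ch)
            i += 1
    return "".join(out)


def unescape_nested(value, escape_char: str = "\\"):
    """Apply unescape() to every string value inside nested lists/dicts."""
    if isinstance(value, str):
        return unescape(value, escape_char)
    if isinstance(value, list):
        return [unescape_nested(v, escape_char) for v in value]
    if isinstance(value, dict):
        # Only values are unescaped here; keys are left untouched.
        return {k: unescape_nested(v, escape_char) for k, v in value.items()}
    return value
```

Under these assumptions, handling only the top-level string column while skipping `unescape_nested` for array elements would produce the kind of mismatch described above: the scalar column is clean, but the array still carries backslashes.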

Issue Number: close #xxx

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Contributor

github-actions bot commented Jul 3, 2024

clang-tidy review says "All clean, LGTM! 👍"

@amorynan
Contributor Author

amorynan commented Jul 3, 2024

run buildall

Contributor

github-actions bot commented Jul 3, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39783 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 21cad0711756cbe1979fb1ccf693d92222ce054f, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17636	4317	4283	4283
q2	2018	194	190	190
q3	10449	1175	1078	1078
q4	10182	761	779	761
q5	7560	2686	2634	2634
q6	220	138	137	137
q7	964	611	594	594
q8	9234	2037	2068	2037
q9	8765	6482	6443	6443
q10	8992	3727	3714	3714
q11	467	231	244	231
q12	437	241	225	225
q13	17777	3016	3008	3008
q14	271	227	240	227
q15	519	478	483	478
q16	509	372	367	367
q17	950	709	685	685
q18	7929	7407	7446	7407
q19	5937	1406	1475	1406
q20	640	310	322	310
q21	4922	3238	3800	3238
q22	387	343	330	330
Total cold run time: 116765 ms
Total hot run time: 39783 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4363	4228	4250	4228
q2	358	254	259	254
q3	3000	2798	2822	2798
q4	2040	1682	1669	1669
q5	5686	5464	5448	5448
q6	226	142	135	135
q7	2136	1819	1871	1819
q8	3270	3405	3379	3379
q9	8684	8617	8810	8617
q10	4134	3941	3795	3795
q11	589	494	481	481
q12	764	641	649	641
q13	16246	3159	3175	3159
q14	313	276	271	271
q15	511	517	499	499
q16	487	431	417	417
q17	1805	1507	1493	1493
q18	8024	8002	7901	7901
q19	5565	1642	1503	1503
q20	2208	1911	1827	1827
q21	5228	5012	4833	4833
q22	585	565	532	532
Total cold run time: 76222 ms
Total hot run time: 55699 ms

@doris-robot

TPC-DS: Total hot run time: 173297 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 21cad0711756cbe1979fb1ccf693d92222ce054f, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	924	393	387	387
query2	6462	2365	2248	2248
query3	6633	214	218	214
query4	19432	17632	17463	17463
query5	3660	490	502	490
query6	254	174	162	162
query7	4597	289	291	289
query8	290	276	295	276
query9	8488	2457	2444	2444
query10	584	334	275	275
query11	10733	10036	10117	10036
query12	120	85	82	82
query13	1643	386	371	371
query14	10514	7793	7680	7680
query15	240	196	192	192
query16	7719	307	308	307
query17	1825	531	524	524
query18	1897	268	266	266
query19	189	145	153	145
query20	86	80	82	80
query21	217	135	126	126
query22	4412	4076	3919	3919
query23	33832	33627	33672	33627
query24	10629	2843	2902	2843
query25	599	401	392	392
query26	712	157	156	156
query27	2302	330	326	326
query28	5881	2195	2192	2192
query29	890	632	630	630
query30	243	155	154	154
query31	985	797	755	755
query32	95	54	54	54
query33	652	296	301	296
query34	880	474	481	474
query35	741	710	661	661
query36	1124	997	993	993
query37	141	79	84	79
query38	2948	2892	2813	2813
query39	911	854	841	841
query40	213	129	131	129
query41	55	52	55	52
query42	115	110	106	106
query43	616	539	551	539
query44	1101	752	752	752
query45	235	164	163	163
query46	1060	708	746	708
query47	1832	1772	1764	1764
query48	379	295	301	295
query49	839	407	412	407
query50	765	379	384	379
query51	6865	6920	6649	6649
query52	106	95	88	88
query53	363	290	295	290
query54	869	462	452	452
query55	75	72	72	72
query56	295	274	270	270
query57	1118	1032	1034	1032
query58	245	255	274	255
query59	3424	3302	2976	2976
query60	296	288	292	288
query61	96	129	92	92
query62	601	445	443	443
query63	329	294	291	291
query64	8552	2272	1761	1761
query65	3132	3087	3119	3087
query66	752	324	323	323
query67	15388	15004	14739	14739
query68	4512	539	531	531
query69	620	398	337	337
query70	1162	1106	1077	1077
query71	380	281	274	274
query72	7071	5261	5232	5232
query73	753	326	322	322
query74	5881	5629	5440	5440
query75	3399	2631	2690	2631
query76	2229	984	931	931
query77	440	313	308	308
query78	9436	9027	8858	8858
query79	2460	511	521	511
query80	2386	467	471	467
query81	585	216	222	216
query82	911	109	106	106
query83	303	169	174	169
query84	274	92	94	92
query85	1946	282	279	279
query86	483	319	322	319
query87	3288	3103	3084	3084
query88	3853	2373	2386	2373
query89	495	385	453	385
query90	1728	186	188	186
query91	133	102	102	102
query92	60	51	47	47
query93	2891	519	513	513
query94	1143	210	216	210
query95	411	333	323	323
query96	600	270	266	266
query97	3179	3008	3021	3008
query98	236	198	196	196
query99	1096	881	840	840
Total cold run time: 268544 ms
Total hot run time: 173297 ms

@doris-robot

ClickBench: Total hot run time: 31.63 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 21cad0711756cbe1979fb1ccf693d92222ce054f, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.08	0.03	0.04
query3	0.22	0.05	0.05
query4	1.67	0.07	0.07
query5	0.50	0.48	0.48
query6	1.14	0.72	0.72
query7	0.02	0.02	0.02
query8	0.05	0.04	0.05
query9	0.55	0.49	0.48
query10	0.54	0.54	0.53
query11	0.15	0.11	0.12
query12	0.15	0.12	0.13
query13	0.61	0.59	0.59
query14	0.76	0.78	0.78
query15	0.83	0.84	0.84
query16	0.38	0.38	0.38
query17	1.05	1.06	1.06
query18	0.22	0.26	0.25
query19	1.96	1.87	1.86
query20	0.01	0.02	0.01
query21	15.43	0.77	0.66
query22	4.68	5.90	2.69
query23	18.31	1.34	1.28
query24	2.11	0.25	0.23
query25	0.17	0.09	0.08
query26	0.27	0.18	0.18
query27	0.08	0.08	0.07
query28	13.20	1.02	1.00
query29	12.65	3.23	3.23
query30	0.25	0.07	0.06
query31	2.85	0.38	0.40
query32	3.28	0.49	0.47
query33	2.89	2.92	2.85
query34	17.15	4.42	4.42
query35	4.47	4.49	4.52
query36	0.66	0.50	0.50
query37	0.20	0.15	0.15
query38	0.15	0.14	0.15
query39	0.04	0.03	0.04
query40	0.17	0.14	0.14
query41	0.09	0.05	0.05
query42	0.05	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 110.13 s
Total hot run time: 31.63 s

@amorynan
Contributor Author

amorynan commented Jul 4, 2024

run p0

Member

@eldenmoon eldenmoon left a comment

LGTM

Contributor

github-actions bot commented Jul 4, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Jul 4, 2024
Contributor

github-actions bot commented Jul 4, 2024

PR approved by anyone and no changes requested.

Contributor

@HappenLee HappenLee left a comment

LGTM

@HappenLee HappenLee merged commit 308dd2b into apache:master Jul 8, 2024
25 of 29 checks passed
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
Before this PR, when Stream Load ingested JSON data containing escaped characters, strings inside nested types were not unescaped.
After this PR they are unescaped in the output, for example:
```
|   27 | "双引号"    | [""双引号"", "反斜\线"]       |
```
yiguolei pushed a commit that referenced this pull request Aug 1, 2024
## Proposed changes
backport: #37251
Issue Number: close #xxx

xiaokang pushed a commit that referenced this pull request Aug 1, 2024
Labels
approved, dev/2.0.14-merged, dev/2.1.6-merged, dev/3.0.1-merged, kind/behavior-changed, reviewed, usercase