[fix](group commit)Fix replay wal fail problem on agg state type #49081

Merged 1 commit on Mar 17, 2025

hust-hhb
Contributor

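When the WAL is replayed during the regression test regression-test/suites/mv_p0/dis_26495/dis_26495.groovy, the BE crashes with a core dump. This PR fixes that crash. The stack trace from the failing run follows: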
*** Query id: 764ad7ba18485829-63a6eb98dcea4981 ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1740808023 (unix time) try "date -d @1740808023" if you are using GNU date ***
*** Current BE git commitID: a386e8b ***
*** SIGSEGV address not mapped to object (@0xD9) received by PID 8864 (TID 23505 OR 0x7f6aa893c640) from PID 217; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F76D2AD4520 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::vectorized::ColumnNullable::insert_range_from_not_nullable(doris::vectorized::IColumn const&, unsigned long, unsigned long) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/columns/column_nullable.cpp:340
5# doris::vectorized::VScanner::_do_projections(doris::vectorized::Block*, doris::vectorized::Block*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/exec/scan/vscanner.cpp:204
6# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/exec/scan/vscanner.cpp:84
7# doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr, std::shared_ptr) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:221
8# std::_Function_handler, std::shared_ptr)::$_1::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
9# doris::ThreadPool::dispatch_thread() in /mnt/hdd01/ci/doris-deploy-branch-3.0-local/be/lib/doris_be
10# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/util/thread.cpp:499
11# start_thread at ./nptl/pthread_create.c:442
12# 0x00007F76D2BB8850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
172.20.48.99 last coredump sql:

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Mar 14, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hust-hhb
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32186 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1f39415b1ff9dbdd1904107bc4970d7a9b0d186f, data reload: false

------ Round 1 ----------------------------------
q1	17604	5244	5026	5026
q2	2042	298	162	162
q3	10415	1356	681	681
q4	10207	1038	543	543
q5	7545	2353	2349	2349
q6	188	166	133	133
q7	894	731	587	587
q8	9320	1240	1028	1028
q9	4883	4885	4835	4835
q10	6804	2300	1896	1896
q11	477	272	252	252
q12	344	353	209	209
q13	17745	3643	3045	3045
q14	229	224	207	207
q15	545	489	498	489
q16	629	610	577	577
q17	576	854	333	333
q18	6797	6529	6257	6257
q19	1091	948	540	540
q20	315	320	191	191
q21	2777	2110	1874	1874
q22	999	977	972	972
Total cold run time: 102426 ms
Total hot run time: 32186 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5056	5075	5062	5062
q2	241	326	232	232
q3	2163	2634	2287	2287
q4	1455	1792	1355	1355
q5	4229	4113	4165	4113
q6	207	166	122	122
q7	1865	1808	1804	1804
q8	2587	2630	2591	2591
q9	7407	7216	7129	7129
q10	2983	3163	2723	2723
q11	564	503	492	492
q12	698	800	605	605
q13	3575	3829	3247	3247
q14	276	291	265	265
q15	527	501	490	490
q16	645	680	634	634
q17	1158	1583	1352	1352
q18	7857	7634	7362	7362
q19	822	852	900	852
q20	1973	2067	1852	1852
q21	5401	4842	4994	4842
q22	1115	1073	1000	1000
Total cold run time: 52804 ms
Total hot run time: 50411 ms
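
The Round 1 totals above can be reproduced from the per-query rows. The sketch below assumes each row is laid out as query, cold run, hot run 1, hot run 2, best hot run (all in ms); that layout is inferred from the numbers and is not stated in the report. The helper names `parse_rows` and `totals` are illustrative only and are not part of the Doris tooling.

```python
# Round 1 rows copied from the TPC-H report above.
# Assumed column layout (inferred, not documented in the report):
#   query, cold run, hot run 1, hot run 2, best hot run  (all in ms)
ROUND_1 = """\
q1 17604 5244 5026 5026
q2 2042 298 162 162
q3 10415 1356 681 681
q4 10207 1038 543 543
q5 7545 2353 2349 2349
q6 188 166 133 133
q7 894 731 587 587
q8 9320 1240 1028 1028
q9 4883 4885 4835 4835
q10 6804 2300 1896 1896
q11 477 272 252 252
q12 344 353 209 209
q13 17745 3643 3045 3045
q14 229 224 207 207
q15 545 489 498 489
q16 629 610 577 577
q17 576 854 333 333
q18 6797 6529 6257 6257
q19 1091 948 540 540
q20 315 320 191 191
q21 2777 2110 1874 1874
q22 999 977 972 972
"""


def parse_rows(text):
    """Split each line into (query name, list of integer timings)."""
    rows = []
    for line in text.strip().splitlines():
        name, *values = line.split()
        rows.append((name, [int(v) for v in values]))
    return rows


def totals(rows):
    """Return (sum of cold runs, sum of the faster hot run per query)."""
    total_cold = sum(values[0] for _, values in rows)
    total_hot = sum(min(values[1], values[2]) for _, values in rows)
    return total_cold, total_hot


rows = parse_rows(ROUND_1)
total_cold, total_hot = totals(rows)
print(f"Total cold run time: {total_cold} ms")
print(f"Total hot run time: {total_hot} ms")
```

Under that assumed layout, the computed sums match the reported Round 1 totals (102426 ms cold, 32186 ms hot), and the last column of every row equals the faster of the two hot runs.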

@doris-robot

TPC-DS: Total hot run time: 184408 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1f39415b1ff9dbdd1904107bc4970d7a9b0d186f, data reload: false

query1	1014	389	375	375
query2	6534	1932	1838	1838
query3	6797	224	220	220
query4	26445	23307	22901	22901
query5	4327	630	492	492
query6	351	229	212	212
query7	4614	497	284	284
query8	295	242	239	239
query9	8613	2622	2613	2613
query10	474	313	259	259
query11	15856	15110	15089	15089
query12	182	112	113	112
query13	1646	537	408	408
query14	10348	6528	6617	6528
query15	208	199	173	173
query16	7693	626	462	462
query17	1193	720	555	555
query18	2009	413	313	313
query19	192	184	160	160
query20	121	119	122	119
query21	208	127	105	105
query22	4284	4430	4002	4002
query23	33868	32881	32963	32881
query24	7739	2348	2387	2348
query25	496	448	382	382
query26	1217	270	151	151
query27	2117	484	325	325
query28	3920	2441	2400	2400
query29	689	561	412	412
query30	277	216	194	194
query31	965	864	764	764
query32	73	65	61	61
query33	548	347	302	302
query34	780	838	486	486
query35	780	810	741	741
query36	952	1019	896	896
query37	120	96	74	74
query38	4224	4191	4092	4092
query39	1427	1388	1446	1388
query40	203	136	104	104
query41	56	53	52	52
query42	117	101	101	101
query43	483	492	479	479
query44	1260	779	765	765
query45	171	170	165	165
query46	818	1024	608	608
query47	1744	1798	1724	1724
query48	375	410	308	308
query49	774	518	408	408
query50	687	738	391	391
query51	4240	4177	4140	4140
query52	103	106	99	99
query53	225	253	187	187
query54	499	493	415	415
query55	82	81	80	80
query56	270	262	257	257
query57	1136	1113	1051	1051
query58	242	237	226	226
query59	2700	2640	2484	2484
query60	286	273	245	245
query61	122	125	119	119
query62	782	738	655	655
query63	232	186	186	186
query64	4196	1039	645	645
query65	4399	4300	4357	4300
query66	1049	398	295	295
query67	15660	15432	15175	15175
query68	8055	872	501	501
query69	475	306	267	267
query70	1177	1113	1113	1113
query71	448	301	257	257
query72	5671	3556	3688	3556
query73	783	716	340	340
query74	9035	9061	8704	8704
query75	3785	3198	2698	2698
query76	3701	1173	748	748
query77	789	365	275	275
query78	10145	10118	9318	9318
query79	2242	814	593	593
query80	625	532	442	442
query81	487	263	229	229
query82	680	128	97	97
query83	170	166	149	149
query84	322	98	72	72
query85	798	349	371	349
query86	385	310	299	299
query87	4425	4524	4289	4289
query88	3480	2277	2251	2251
query89	395	307	284	284
query90	1959	209	212	209
query91	138	138	110	110
query92	78	65	56	56
query93	1548	1068	565	565
query94	656	404	341	341
query95	351	266	263	263
query96	478	568	274	274
query97	3321	3368	3288	3288
query98	229	207	201	201
query99	1336	1394	1250	1250
Total cold run time: 274674 ms
Total hot run time: 184408 ms

@doris-robot

ClickBench: Total hot run time: 31.1 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1f39415b1ff9dbdd1904107bc4970d7a9b0d186f, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.06
query4	1.63	0.10	0.10
query5	0.55	0.58	0.54
query6	1.21	0.73	0.72
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.59	0.52	0.53
query10	0.57	0.61	0.58
query11	0.15	0.11	0.10
query12	0.15	0.12	0.11
query13	0.62	0.61	0.59
query14	2.82	2.81	2.81
query15	0.92	0.84	0.85
query16	0.40	0.37	0.37
query17	1.01	1.04	1.01
query18	0.22	0.20	0.19
query19	1.86	1.83	2.03
query20	0.01	0.01	0.01
query21	15.38	0.88	0.55
query22	0.74	1.05	0.88
query23	14.85	1.38	0.66
query24	7.76	1.15	0.78
query25	0.48	0.24	0.18
query26	0.76	0.15	0.14
query27	0.05	0.05	0.05
query28	9.31	0.89	0.43
query29	12.52	4.04	3.30
query30	0.26	0.09	0.07
query31	2.82	0.57	0.38
query32	3.23	0.55	0.46
query33	3.07	3.00	3.00
query34	15.70	5.13	4.48
query35	4.50	4.55	4.46
query36	0.68	0.49	0.48
query37	0.09	0.06	0.07
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.18	0.14	0.13
query41	0.09	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 105.73 s
Total hot run time: 31.1 s

PR approved by at least one committer and no changes requested.

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Mar 17, 2025
PR approved by anyone and no changes requested.

@dataroaring left a comment

LGTM

@dataroaring merged commit e1262bd into apache:master on Mar 17, 2025
30 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 17, 2025
github-actions bot pushed a commit that referenced this pull request Mar 17, 2025
dataroaring pushed a commit that referenced this pull request Mar 17, 2025
yiguolei pushed a commit that referenced this pull request Mar 18, 2025
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.9-merged dev/3.0.5-merged p0_c reviewed
6 participants