[fix] (inverted index) Disallow variant columns from using inverted index format v1 #43599

Merged
merged 3 commits into from
Nov 12, 2024

Conversation

Contributor

@csun5285 csun5285 commented Nov 11, 2024

What problem does this PR solve?

Problem Summary:

  1. When a variant column's inverted index uses storage format v1, a schema change can leave some segments without their corresponding index files.

  2. With storage format v2, all inverted indexes are stored in a single file, so that file always exists whether or not the variant includes subcolumn indexes.
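
The difference between the two layouts can be illustrated with a small model. This is only a sketch under stated assumptions: the file names (`<segment>_<index>.idx` for v1, `<segment>.idx` for v2), the function names, and the index ids are hypothetical and are not taken from the Doris code base.

```python
def index_files(fmt, segment_id, index_ids):
    """Return the index file names a segment needs under a given format.

    Hypothetical naming, for illustration only:
      v1 -> one file per (segment, index) pair
      v2 -> one file per segment, shared by all of its indexes
    """
    if fmt == "v1":
        return {f"{segment_id}_{index_id}.idx" for index_id in index_ids}
    if fmt == "v2":
        return {f"{segment_id}.idx"}
    raise ValueError(f"unknown inverted index format: {fmt}")


def missing_index_files(fmt, segment_id, schema_index_ids, written_index_ids):
    """Files the current schema expects but the segment never wrote."""
    expected = index_files(fmt, segment_id, schema_index_ids)
    present = index_files(fmt, segment_id, written_index_ids)
    return sorted(expected - present)


# Segment 0 was written with index 10 only; a later schema change
# makes the schema expect a subcolumn index 11 as well.
print(missing_index_files("v1", 0, [10, 11], [10]))  # ['0_11.idx']
print(missing_index_files("v2", 0, [10, 11], [10]))  # []
```

In this model the v1 segment is missing a file that the schema now expects, while the v2 segment's single file is present either way.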

Release note

Creating an inverted index on a variant column is not supported with inverted index file format v1.
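
As a rough illustration of the rule in this release note, the check could look like the sketch below. The `Column` type, the `check_inverted_index` function, the exception name, and the error text are all hypothetical; the actual check in Doris is not shown in this conversation and may differ.

```python
from dataclasses import dataclass


class UnsupportedIndexFormat(ValueError):
    """Raised when an index is requested with a format the column type cannot use."""


@dataclass(frozen=True)
class Column:
    name: str
    type: str  # e.g. "VARIANT", "STRING" (hypothetical representation)


def check_inverted_index(column, storage_format):
    """Reject inverted indexes on variant columns when the format is v1."""
    is_variant = column.type.upper() == "VARIANT"
    is_v1 = storage_format.upper() == "V1"
    if is_variant and is_v1:
        raise UnsupportedIndexFormat(
            f"column '{column.name}': inverted index format v1 is not "
            "supported for variant columns"
        )


# Allowed: variant column with v2, or a non-variant column with v1.
check_inverted_index(Column("payload", "VARIANT"), "V2")
check_inverted_index(Column("title", "STRING"), "V1")
```

Calling `check_inverted_index(Column("payload", "VARIANT"), "V1")` raises `UnsupportedIndexFormat` in this sketch.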

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@csun5285
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 51945 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3a3ff8ca38eb571115f0a390a55e71361dc196ff, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17597	7507	7348	7348
q2	2225	1175	1167	1167
q3	10042	1203	1200	1200
q4	10350	863	794	794
q5	7587	3750	2985	2985
q6	237	150	148	148
q7	1042	616	608	608
q8	9383	2373	2382	2373
q9	12688	12329	12586	12329
q10	7116	2429	2463	2429
q11	464	254	256	254
q12	418	225	220	220
q13	17798	3036	3048	3036
q14	251	208	215	208
q15	582	536	522	522
q16	655	605	617	605
q17	994	532	586	532
q18	7421	6840	6754	6754
q19	1336	932	1101	932
q20	3063	2894	2838	2838
q21	4048	3320	3329	3320
q22	1392	1361	1343	1343
Total cold run time: 116689 ms
Total hot run time: 51945 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	7360	7313	7337	7313
q2	334	237	238	237
q3	2948	2812	2828	2812
q4	2035	1797	1736	1736
q5	5448	5456	5502	5456
q6	221	140	141	140
q7	2140	1768	1740	1740
q8	3343	3438	3479	3438
q9	14138	14223	14109	14109
q10	3551	3429	3482	3429
q11	592	503	499	499
q12	797	587	601	587
q13	10212	3055	3053	3053
q14	291	270	268	268
q15	584	532	540	532
q16	671	632	627	627
q17	1840	1594	1580	1580
q18	7886	7565	7513	7513
q19	1680	1590	1647	1590
q20	2189	1985	1984	1984
q21	5260	5176	5269	5176
q22	653	554	560	554
Total cold run time: 74173 ms
Total hot run time: 64373 ms

@csun5285
Contributor Author

run buildall

@csun5285 csun5285 force-pushed the disable_index_variant branch from 5cccd9c to d7bd773 Compare November 12, 2024 08:18
@csun5285
Contributor Author

run buildall

Member

@eldenmoon eldenmoon left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 12, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Member

@airborne12 airborne12 left a comment

LGTM

@airborne12 airborne12 merged commit 127253a into apache:master Nov 12, 2024
26 of 28 checks passed
csun5285 added a commit to csun5285/doris that referenced this pull request Nov 13, 2024
…ndex format v1 (apache#43599)
py023 pushed a commit to py023/doris that referenced this pull request Nov 13, 2024
…ndex format v1 (apache#43599)
airborne12 pushed a commit that referenced this pull request Nov 13, 2024
csun5285 added a commit to csun5285/doris that referenced this pull request Dec 9, 2024
…ndex format v1 (apache#43599)
csun5285 added a commit to csun5285/doris that referenced this pull request Dec 9, 2024
…ndex format v1 (apache#43599)
csun5285 added a commit to csun5285/doris that referenced this pull request Dec 9, 2024
…ndex format v1 (apache#43599)
airborne12 pushed a commit that referenced this pull request Dec 11, 2024