refactor(rust): Add updated multiscan pipeline #21925

Merged: 13 commits, Mar 28, 2025

Conversation

nameexhaustion
Collaborator

@nameexhaustion nameexhaustion commented Mar 26, 2025

This PR contains an updated multiscan pipeline implementation. It is not hooked up yet: the IR lowering is still missing, and the individual readers have not yet been refactored to use the updated pipeline.

Some improvements include:

  • Replacing monomorphized <T: MultiScanable> with dyn FileReader
  • Generalized operation pushdown into the readers based on ReaderCapabilities (slice/predicate)
  • Faster early stopping for some file formats when scanning with a positive slice, via row_position_on_end
  • Faster negative slice resolution using concurrent row counting
  • Parallelized post-apply pipeline (for missing columns, row index, column casting, reordering etc.)
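
As a rough illustration of the first two points, the sketch below shows how a dyn FileReader trait object could advertise its capabilities so the pipeline can decide where a slice gets applied. This is not the code from the PR: only the names FileReader, ReaderCapabilities and PRE_SLICE come from this conversation, while the FILTER flag, the bit layout, the two reader types, Placement and place_slice are hypothetical.

```rust
/// Hypothetical capability flags. Only the name `PRE_SLICE` appears in the
/// PR discussion; `FILTER` and the bit layout are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ReaderCapabilities(u8);

impl ReaderCapabilities {
    pub const NONE: Self = Self(0);
    pub const PRE_SLICE: Self = Self(1 << 0);
    pub const FILTER: Self = Self(1 << 1);

    /// Combine two capability sets.
    pub const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    /// True if every flag in `other` is also set in `self`.
    pub const fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Hypothetical object-safe reader interface, used as `dyn FileReader`
/// instead of a generic `<T: MultiScanable>` parameter.
pub trait FileReader {
    fn capabilities(&self) -> ReaderCapabilities;
}

/// Stand-in for a built-in reader that accepts slice and predicate pushdown.
pub struct NativeReader;
/// Stand-in for an external reader that accepts no pushdown at all.
pub struct ExternalReader;

impl FileReader for NativeReader {
    fn capabilities(&self) -> ReaderCapabilities {
        ReaderCapabilities::PRE_SLICE.union(ReaderCapabilities::FILTER)
    }
}

impl FileReader for ExternalReader {
    fn capabilities(&self) -> ReaderCapabilities {
        ReaderCapabilities::NONE
    }
}

/// Where the slice ends up being applied.
#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    Reader,
    PostApply,
}

/// Push the slice into the reader if it supports it, otherwise fall back
/// to applying it after the reader has produced its rows.
pub fn place_slice(reader: &dyn FileReader) -> Placement {
    if reader.capabilities().contains(ReaderCapabilities::PRE_SLICE) {
        Placement::Reader
    } else {
        Placement::PostApply
    }
}
```

Because the readers are trait objects, a single Vec<Box<dyn FileReader>> can hold different reader types, and place_slice is compiled once rather than once per reader type.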

The effects of these optimizations differ across file types - I will have more concrete benchmark numbers for them per file type in follow-up PRs.
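
For the negative slice point in the list above, the general idea can be sketched as follows: count the rows of every file concurrently, then turn the negative offset into an absolute row range. This is a simplified sketch and not the PR's implementation. The files are modelled as in-memory vectors, the function names are hypothetical, and the clamping convention (an offset pointing before the first row shortens the slice) is an assumption.

```rust
use std::thread;

/// Count rows per file, one scoped thread per file. Each "file" is just a
/// Vec here; a real reader would inspect file metadata or scan the file.
pub fn count_rows_concurrently(files: &[Vec<u32>]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|file| s.spawn(move || file.len()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("row-count thread panicked"))
            .collect()
    })
}

/// Convert a (possibly negative) offset and a length into an absolute
/// `(start, len)` pair, clamped to `0..=total_rows`.
pub fn resolve_slice(offset: i64, len: usize, total_rows: usize) -> (usize, usize) {
    let total = total_rows as i64;
    // A negative offset counts from the end of the data.
    let start = if offset < 0 { offset.saturating_add(total) } else { offset };
    let stop = start.saturating_add(len as i64);
    let start = start.clamp(0, total);
    let stop = stop.clamp(0, total);
    (start as usize, (stop - start) as usize)
}

/// Resolve a slice against the combined row count of all files.
pub fn resolve_slice_over_files(files: &[Vec<u32>], offset: i64, len: usize) -> (usize, usize) {
    let total_rows: usize = count_rows_concurrently(files).iter().sum();
    resolve_slice(offset, len, total_rows)
}
```

Once the slice is expressed as a non-negative start and a length, it can be handled like any positive slice.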

@coastalwhite

@github-actions bot added the "internal" (An internal refactor or improvement) and "rust" (Related to Rust Polars) labels on Mar 26, 2025
@@ -75,8 +77,10 @@ impl ApplyExtraOps {
scan_source_idx,
hive_parts,
} => {
// This should always be pushed to the reader, or otherwise handled separately.
assert!(pre_slice.is_none());
// Negative slice should have been resolved earlier.
Collaborator Author

Added support for slice in post-apply, as external readers may not support ReaderCapabilities::PRE_SLICE.
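
A minimal sketch of what such a post-apply slice could look like, assuming the reader yields batches in row order and the slice has already been resolved to a non-negative offset and length. The function name and the Vec<i32> batch type are hypothetical stand-ins, not the types used in the PR.

```rust
/// Apply `offset`/`len` to a stream of batches after they have been read,
/// for readers that cannot apply the slice themselves.
pub fn post_apply_slice(batches: Vec<Vec<i32>>, offset: usize, len: usize) -> Vec<Vec<i32>> {
    let end = offset.saturating_add(len);
    let mut position = 0usize; // number of rows seen before the current batch
    let mut out = Vec::new();

    for batch in batches {
        let batch_start = position;
        let batch_end = batch_start + batch.len();
        position = batch_end;

        if batch_end <= offset {
            // Entire batch lies before the slice.
            continue;
        }
        if batch_start >= end {
            // Entire batch lies after the slice, so stop reading early.
            break;
        }

        // Translate the global slice bounds into indices within this batch.
        let lo = offset.saturating_sub(batch_start);
        let hi = end.min(batch_end) - batch_start;
        if lo < hi {
            out.push(batch[lo..hi].to_vec());
        }
    }
    out
}
```

The fallback still has to read and discard every row before the offset, which is why pushing the slice into a reader that supports it is preferable when possible.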

codecov bot commented Mar 26, 2025

Codecov Report

Attention: Patch coverage is 0.26316% with 1137 lines in your changes missing coverage. Please review.

Project coverage is 80.46%. Comparing base (82d57a4) to head (a0f7309).
Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rces/multi_file_reader/reader_pipelines/generic.rs | 0.00% | 543 Missing ⚠️ |
| ...ream/src/nodes/io_sources/multi_file_reader/mod.rs | 0.00% | 159 Missing ⚠️ |
| ..._sources/multi_file_reader/initialization/slice.rs | 0.00% | 141 Missing ⚠️ |
| ...urces/multi_file_reader/post_apply_pipeline/mod.rs | 0.00% | 138 Missing ⚠️ |
| ...io_sources/multi_file_reader/initialization/mod.rs | 0.00% | 80 Missing ⚠️ |
| ..._sources/multi_file_reader/reader_interface/mod.rs | 0.00% | 25 Missing ⚠️ |
| ...rces/multi_file_reader/initialization/predicate.rs | 0.00% | 18 Missing ⚠️ |
| ...es/io_sources/multi_file_reader/extra_ops/apply.rs | 0.00% | 12 Missing ⚠️ |
| crates/polars-utils/src/slice_enum.rs | 0.00% | 12 Missing ⚠️ |
| crates/polars-plan/src/dsl/scan_sources.rs | 0.00% | 5 Missing ⚠️ |

... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21925      +/-   ##
==========================================
- Coverage   80.83%   80.46%   -0.37%     
==========================================
  Files        1629     1635       +6     
  Lines      235097   236228    +1131     
  Branches     2693     2693              
==========================================
+ Hits       190039   190080      +41     
- Misses      44424    45515    +1091     
+ Partials      634      633       -1     

@nameexhaustion nameexhaustion marked this pull request as ready for review March 26, 2025 05:52
@nameexhaustion nameexhaustion marked this pull request as draft March 26, 2025 17:47
@nameexhaustion nameexhaustion marked this pull request as ready for review March 26, 2025 20:15
@nameexhaustion nameexhaustion marked this pull request as draft March 27, 2025 00:26
@nameexhaustion nameexhaustion marked this pull request as ready for review March 27, 2025 05:50
Collaborator

@coastalwhite coastalwhite left a comment

It looks good. Bit difficult to say much more when it is not connected yet.

@ritchie46 ritchie46 merged commit c538836 into pola-rs:main Mar 28, 2025
22 of 23 checks passed
3 participants