Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: Reuse the same file pointer for repeated find output #511

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

tonakai-s
Copy link

@tonakai-s tonakai-s commented Mar 24, 2025

Fixes #439
This PR creates a kind of memoization to the file creation inside the findutils/find command.
Uses a hashmap to know which file paths already has been created and returns the Arc to prevent overwrite.

It also add a new testing function find_using_same_out_multiple_times to these more "complex" cases.
But, the expected outputs are fixed, if the assert would be better by catching the expected value dynamically in case of the test_data/simple structure change, let me know!

@tonakai-s tonakai-s changed the title bugfix: Reuse the same file pointer for repeated find output (#439) bugfix: Reuse the same file pointer for repeated find output Mar 24, 2025
@hanbings
Copy link
Collaborator

Hi, I see some tests failed, could you please check them? Thanks.

Copy link

codecov bot commented Mar 25, 2025

Codecov Report

Attention: Patch coverage is 92.68293% with 3 lines in your changes missing coverage. Please review.

Project coverage is 87.51%. Comparing base (5d264c9) to head (7926e6d).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
src/find/matchers/mod.rs 94.11% 1 Missing and 1 partial ⚠️
src/find/matchers/ls.rs 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #511      +/-   ##
==========================================
+ Coverage   87.46%   87.51%   +0.05%     
==========================================
  Files          31       31              
  Lines        6692     6720      +28     
  Branches      299      300       +1     
==========================================
+ Hits         5853     5881      +28     
+ Misses        627      626       -1     
- Partials      212      213       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tonakai-s
Copy link
Author

I executed the test suite on my windows machine and it failed, and I don't know why certainly.
I will run the fmt and check for solutions...
(Now I understand why other tests not checked all items, sorry about the waste of time)

// Used on file output arguments.
// When the same file is specified multiple times, the same pointer is used.
struct FileMemoizer {
mem: HashMap<String, Arc<File>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a string as the hash key won't work. It needs to handle different spellings of the same path, as well as hard links:

$ touch file
$ ln file link
$ find -fprint file -fprint ./file -fprint link

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, newbie mistake of my part, will fix it.

@@ -256,7 +257,7 @@ impl Ls {
impl Matcher for Ls {
fn matches(&self, file_info: &WalkEntry, matcher_io: &mut MatcherIO) -> bool {
if let Some(file) = &self.output_file {
self.print(file_info, file, true);
self.print(file_info, file.clone(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.print(file_info, file.clone(), true);
self.print(file_info, file.as_ref(), true);

Same everywhere else, there's usually no need to clone the Arc

…Arc<File> clones. test:Add new tests to symbolic and hardlinks.
// Based on path inode or file_index check if the file already has been specified.
// If yes, use the same file pointer.
struct FileMemoizer {
mem: HashMap<u64, Arc<File>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single u64 is not enough to uniquely identify a file. On UNIX-like systems, you need the pair (st_dev, st_ino) (AKA device ID + inode number) at least for a unique ID. Not sure what's necessary on Windows. But using just the inode number is wrong since inode numbers are only unique within a single file system.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I read about it sometime ago but forgot, thanks for the precise explanation.
I will use the std::fs::metadata to take the st_dev and st_ino from the same source, but using std::os::unix::fs::MetadataExt and std::os::windows::fs::MetadataExt.
Will do a manual test to check the assurance of this implementation, but I don't think that has a way to do an automated test of it, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use uucore's FileInformation? Check samefile.rs for an example. (Hopefully it impls Hash)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that makes sense, FileInformation impls PartialEq and Hash, it's more simple than what I was trying to do. Will implement, test, and send a new commit (again).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specifying the same file for output multiple times in find will overwrite the file by default.
3 participants