Close file streams if MinIO upload fails #1531
Conversation
Thanks @fflorent, @jonathanperret, and @vviers.
@paulfitz, should we file an issue in our internal backlog to check S3 and Azure in grist-ee, or do you already happen to know how they handle cleanup?
Just cross-checking with @Spoffy that `uploadStream` calling `destroy` is OK in the attachment use case too. I expect it is. Responsibility seems fuzzy here, though; it would be good to add a docstring clarifying it.
Firstly, some really nice sleuthing to find this issue.

With respect to closing the stream, I think there are two considerations here: one is responsibility, one is attachment archives.

I think the code that creates the stream from the file should take the responsibility to ensure the FD is closed properly.

Attachment archive uploads are one situation where this might be problematic. The entire file stream needs to be consumed so that the next file in the .tar archive can be processed. Destroying the stream here would prevent the .tar unpacking function from doing that. This would likely cause attachment archive uploads to fail if any individual S3 upload failed for any reason. For comparison, the current behaviour is that on a failed upload, the file is skipped and it tries to upload the next file.

Looking at the code, I think the best place to handle this is in the `upload` method (`grist-core/app/server/lib/MinIOExternalStorage.ts`, lines 99 to 104 at fdbf73b), which creates the stream from the file. That does mean this issue might also exist on our other providers @paulfitz, if they're not cleaning up the file stream.

Weirdly, it seems like Node doesn't release the FD when the stream is garbage collected?
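To make that suggestion concrete, here is a minimal sketch of the pattern (illustrative only, not the actual grist-core code; the `upload` signature and the exact minio `putObject` usage are assumptions):

```ts
import * as fs from 'fs';
import { Client } from 'minio';

// Hypothetical provider-level upload: the code that opens the stream is also
// responsible for closing it, whether or not the MinIO call succeeds.
async function upload(s3: Client, bucket: string, key: string, fname: string): Promise<void> {
  const stream = fs.createReadStream(fname);
  try {
    await s3.putObject(bucket, key, stream);
  } finally {
    // Release the underlying file descriptor even if putObject rejected;
    // harmless if the stream was already fully consumed and closed.
    stream.destroy();
  }
}
```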
I agree. Actually our first implementation was in that method; I'll move the code back into it.
I guess it could be debated whether skipping an attachment is better than failing the upload outright, but this doesn't take away from your main point.
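For the archive case discussed above, here is a hypothetical sketch (assuming a `tar-stream`-style unpacker; `uploadSomewhere` is an invented stand-in, not grist-core code) of how a failed upload could skip an entry without destroying the archive stream:

```ts
import * as tar from 'tar-stream';

// Hypothetical stand-in for the per-file upload.
declare function uploadSomewhere(name: string, data: NodeJS.ReadableStream): Promise<void>;

const extract = tar.extract();
extract.on('entry', (header, entry, next) => {
  // The parser can only reach the next entry once this one is fully read.
  entry.on('end', next);
  uploadSomewhere(header.name, entry).catch(() => {
    // On a failed upload, drain the remainder of this entry (skipping the
    // file) instead of destroying the stream, which would abort the archive.
    entry.resume();
  });
});
```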
This is a known behavior of Node streams; see for example nodejs/node#38681. But as far as I can gather, even language runtimes that do make an attempt to release external resources on garbage collection generally warn against relying on this (see e.g. Python's documentation), despite this being a significant footgun.
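As a small aside (my own standalone illustration, not code from this PR), the behaviour is easy to observe:

```ts
import * as fs from 'fs';

// An unconsumed fs.ReadStream holds its file descriptor open; Node does not
// close it when the object becomes unreachable, so the fd stays allocated
// until the stream is fully read or explicitly destroyed.
const stream = fs.createReadStream(__filename);
stream.once('open', (fd: number) => {
  console.log(`holding fd ${fd}; inspect with: ls -l /proc/${process.pid}/fd`);
  stream.destroy(); // releases the descriptor immediately
});
stream.once('close', () => console.log('fd released'));
```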
The branch was force-pushed from 91c5a16 to f814f6e, updated with a change that only touches the file discussed above.
Looks good to me :)
Context
Quoting @vviers in #1526:
We successfully reproduced the bug:

- Run the server as the `grist` user and watch its open file descriptors: `su grist` and `watch "ls -l /proc/$(pgrep -u grist -f 'server.js')/fd | grep persist"`
- With documents stored under `/persist`, the upload should create a copy of the document (a backup suffixed with `-backup`)

➡️ And you should see through the above watch command that each time the server attempts to send the copy of the document, this copy remains open despite being dereferenced from the filesystem:
This leads to the result described in issue #1526.
Proposed solution
We solve the issue by ensuring that the stream opened when uploading the document is closed regardless of whether the MinIO client successfully sent the file (a minimal sketch of the pattern follows below).
Credit goes mostly to @jonathanperret for this solution.
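For illustration, here is a minimal standalone sketch of the invariant this change enforces (the names and the simulated failure are assumptions, not the PR's actual code):

```ts
import * as assert from 'assert';
import * as fs from 'fs';

async function main() {
  const stream = fs.createReadStream(__filename); // any readable file works
  try {
    // Stand-in for a MinIO client call that rejects.
    await Promise.reject(new Error('simulated MinIO failure'));
  } catch (err) {
    // The upload failed; the cleanup below must still run.
  } finally {
    stream.destroy();
  }
  assert.ok(stream.destroyed, 'stream must be destroyed even when the upload fails');
}

main();
```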
Related issues
Fixes #1526
Has this been tested?