Skip to content
This repository was archived by the owner on Jan 10, 2025. It is now read-only.

Fix SIGPIPE when piping large data in magic.rs #134

Closed
wants to merge 1 commit into from
Closed

Fix SIGPIPE when piping large data in magic.rs #134

wants to merge 1 commit into from

Conversation

fspillner
Copy link
Contributor

@fspillner fspillner commented May 2, 2020

Hi.

Here the second separate PR to fix the rendering of large image files, if the large image (>1MB) is piped through the file command in src/magic.rs:

The method write_all fail with the error "Broken pipe (os error 32)", when the data being piped is large enough to be terminated due to SIGPIPE. To workaround, we just extract only the first 4kb of interesting bytes that's sufficient for determining the mime-type of the image.

I could reproduce the behavior in my fish shell:

> cat sample/unicorn.png | file --brief --mime-type -
image/png
> echo $pipestatus
0 0

and with a large image (>1MB):

> cat sample/1mb.png | file --brief --mime-type -
image/png
> echo $pipestatus
141 0

A SIGPIPE signal is sent by Kernel and the command is terminated with pipe status 141. My wild guess is that the command file closes the stdin after reading the 1MB bytes. I need more research on the topic around pipes.

WDYT?
Fabian

Update:

Running strace on file command and found out that the command file stops reading the bytes at 1048576 bytes, here a extract from strace output:

read(3, 0x7f40994dd010, 1048576)        = 1048576

In the man of libmagic you find the documentation:

MAGIC_PARAM_BYTES_MAX        size_t    1048576

So, the file command reads while the MAGIC_PARAM_BYTES_MAX is reached and closes the file descriptor.

@fspillner
Copy link
Contributor Author

fspillner commented May 2, 2020

As alternative we may use / could try the library https://github.com/aahancoc/tree_magic for detecting the mime types.

Copy link
Owner

@swsnr swsnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for discovering and fixing this issue.

Would you mind to add this to CHANGELOG.md and also perhaps write some little smoke tests? I think one test that calls this function with a zero-buffer of exactly the maximum buffer size and another that calls it with lets say a 2 MiB buffer of zeros, and check that neither fails, would be sufficient.

And please do rebase onto master to fix the lint issue and make the PR build succeed.

@fspillner
Copy link
Contributor Author

@lunaryorn I’ll do rebasing and add some smoke tests. That makes sense for me, too. 👍

The method `write_all` fail with "Broken pipe (os error 32)" when the data being piped is large enough to be terminated due to SIGPIPE as the file command reads maximum 1048576 bytes from the data. See MAGIC_PARAM_BYTES_MAX in manpage libmagic(3). To workaround we extract only the first 1mb of interesting bytes that's sufficient for determining the mime-type of the image.
Copy link
Owner

@swsnr swsnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please edit the commit message and wrap all text at 72 or at most 80 characters, see eg these guidelines

Currently it's one long line without line breaks, and that's just impossible to read sanely in git log.

@@ -7,6 +7,8 @@ This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html
To publish a new release run `scripts/release` from the project directory.

## [Unreleased]
### Fixed
- Fix SIGPIPE if rendering large image >1MB (see [GH-134]).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, it's "MiB`, because it's a binary quantity, not a decimal.

@swsnr
Copy link
Owner

swsnr commented May 7, 2020

I just tried this code, and can reproduce the test failure, and it turns out that I misunderstood what this code fixes.

I assumed mdcat would actually crash with SIGPIPE, but in my test it's just write that fails with a broken pipe error. Is that also what you saw?

If so I'd prefer if we just kept writing the whole buffer and specifically ignored the broken pipe error from the write call. To me that seems not only easier, but also more future-proof because it doesn't depend on some specific libmagic constant (which wouldn't exist or perhaps have a different value in alternative file implementations).

@fspillner
Copy link
Contributor Author

Please edit the commit message and wrap all text at 72 or at most 80 characters, see eg these guidelines

Currently it's one long line without line breaks, and that's just impossible to read sanely in git log.

Oh. Next time I should give more attention on the formatting of my commit message.

@fspillner
Copy link
Contributor Author

I assumed mdcat would actually crash with SIGPIPE, but in my test it's just write that fails with a broken pipe error. Is that also what you saw?

Yes. Also it's good to know that Rust doesn't let the application crash :-)

If so I'd prefer if we just kept writing the whole buffer and specifically ignored the broken pipe error from the write call. To me that seems not only easier, but also more future-proof because it doesn't depend on some specific libmagic constant (which wouldn't exist or perhaps have a different value in alternative file implementations).

I can offer an alternative implementation to workaround the problem by bypassing the BrokenPipe error e.g.

process
    .stdin
    .as_mut()
    .expect("Forgot to pipe stdin?")
    .write_all(buffer)
    .or_else(|e| {
        match e.kind() {
            ErrorKind::BrokenPipe => Ok(()),
            _ => Err(e)
        }
    })?;

It doesn't rely on the magic variables anymore.

If you dislike the alternative, I'd appreciate to hear your reason and feel free to close the PR according.

Thanks.

@swsnr
Copy link
Owner

swsnr commented May 8, 2020

@fspillner Yes, that's what I meant; I think I like this more :)

@swsnr swsnr self-assigned this May 15, 2020
@swsnr swsnr closed this in 28723fc May 15, 2020
swsnr added a commit that referenced this pull request May 15, 2020
Some file implementations, notably the libmagic one typically installed
on Linux, close their stdin when they read enough data to determine the
file type.  This triggers a broken pipe error on the rust side which we
can ignore.

That's safer and less complicated than figuring out how much data we can
pipe safely to file.

See GH-134
swsnr added a commit that referenced this pull request May 15, 2020
Some file implementations, notably the libmagic one typically installed
on Linux, close their stdin when they read enough data to determine the
file type.  This triggers a broken pipe error on the rust side which we
can ignore.

That's safer and less complicated than figuring out how much data we can
pipe safely to file.

See GH-134
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants