Avoid hangs in async pony_check properties when using actions #4405
In _Async... there is h.long_test(params.timeout). If I read that correctly, it sets the long test timeout to X, and that timer starts before the property tests do. The property tests therefore think they have X amount of time, but they were started after the long test timer began. I think that is a logical flaw, and long_test should use "X + some value", where X is params.timeout. Am I incorrect in my understanding? That is the logical flaw I was speaking of in the issue.
Logical error aside, I think this is a fine PR, and if the logic error does exist, it can be fixed in another PR.
I'm going to rerun the tests a few times to make sure we don't see the timeout we were seeing before.
So, the call to |
Interestingly, while I wrote all that, the error returned on a musl build, rendering some of my assumptions invalid, it seems.
I've never seen it time out on "a real machine" (that I remember). That's genuinely surprising and has me rethinking assumptions that I had made.
I was able to reproduce the hang on a FreeBSD machine with
It does not happen often, but sometimes. The log was:
Investigating...
I saw you pushed an update to this @mfelsche, any progress? Got fun debugging stories?
@SeanTAllen Well, as it turns out, the narrator is always right. It was a logical flaw in the pony_check implementation. See the update above in the PR description.
Hi @mfelsche, The changelog - fixed label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do. Release notes are added by creating a uniquely named file in the The basic format of the release notes (using markdown) should be:
Thanks.
@mfelsche do we still need the timeout changes?
@mfelsche can you draft a final commit message and post it here so we can use it when we squash and merge this?
Discussed during sync, and @jemc thinks that if we don't need the timeout changes, they should be removed.
and with that hopefully terminate the occasional hangs.
56837f3 to 30ebd5b
The timeout changes have been removed. Here is a commit message for the squash:
This should fix issue #4391 for very slow execution environments.
[update]
I first assumed this was a timeout issue, i.e. that a slow execution environment did not finish within the default 60-second timeout. It turned out to be a logical flaw in how the `_PropertyRunner` handles expecting and completing actions. It was possible for stray actors from old property runs to call into the `_PropertyRunner` instance that was already dealing with another run, thus disturbing the current run. The async Property handling with `complete` and `fail` already had such a safeguard; the `expect_action` and `complete_action`/`fail_action` calls did not yet have one.
[/update]
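To illustrate the kind of safeguard described in the update, here is a minimal sketch in Python, not the pony_check code. It assumes the guard works by tagging each run with a counter and ignoring calls that carry an older counter; the PR text does not say how the actual safeguard is implemented. The class `SketchRunner`, its methods, and the `run_id` parameter are all hypothetical names chosen for this example.

```python
class SketchRunner:
    """Hypothetical stand-in for a property runner (not pony_check's _PropertyRunner)."""

    def __init__(self):
        self.current_run = 0      # assumption: each run gets an increasing id
        self.expected = set()     # actions the current run still waits for
        self.finished_runs = []   # ids of runs whose actions all completed

    def start_run(self, actions):
        """Begin a new run and return the id that callers must pass back."""
        self.current_run += 1
        self.expected = set(actions)
        return self.current_run

    def complete_action(self, name, run_id):
        """Mark an action as done; return False if the call was ignored."""
        if run_id != self.current_run:
            # Stray call from an old run: drop it so it cannot
            # disturb the bookkeeping of the current run.
            return False
        if name not in self.expected:
            return False
        self.expected.remove(name)
        if not self.expected:
            self.finished_runs.append(run_id)
        return True


runner = SketchRunner()
old_run = runner.start_run(["a", "b"])
new_run = runner.start_run(["a", "b"])

# A late call from the old run is ignored.
stale_accepted = runner.complete_action("a", old_run)

# Calls from the current run are processed normally.
runner.complete_action("a", new_run)
runner.complete_action("b", new_run)
```

Without the `run_id` check, the late call from `old_run` would remove `"a"` from the expectations of the newer run, which is the sort of interference between runs that the update describes.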
After investigating #4391, I was not able to find a logical flaw in the `PropertyRunner` or in the way the `UnitTest` timeout and the `TestHelper.long_test()` feature are being used by pony_check. After looking at the logs, the best explanation for the spurious failures is that the default timeout of 1 minute per property was just too low for the specific qemu execution environment. The log lines are all expected, and there should be 4 of them per property execution for a default of 100 samples. But the logs are missing some of those, so the test simply didn't finish in the given 60 seconds.

Hence this PR changes the timeout for the tests to 5 minutes. No test actually checks for this timeout explicitly, so this change will not increase testing time.