Avoid hangs in async pony_check properties when using actions #4405

mfelsche · 2023-08-20T20:08:27Z

This should fix issue #4391 for very slow execution environments.

[update]

After first assuming it was a timeout issue, i.e. that a slow execution environment did not finish within the default 60 secs timeout.
It turned out to be a logical flaw in how the _PropertyRunner handles expecting and completing actions. It was possible to have stray actors from old property runs call into the _PropertyRunner instance that was already dealing with another run, thus disturbing the current run. The async Property handling with complete and fail already had such a safeguard, the expect_action and complete_action/fail_action calls did not yet have this.

[/update]

After investigating #4391 i was not able to find a logical flaw in the PropertyRunner or in the way the UnitTest timeout and the TestHelper.long_test() feature is being used by pony_check. After looking at the logs, the best explanation for the spurious failures is that the timeout of 1 minute per property by default, was just too low for the specific qemu execution environment. The log lines are all expected and there should be 4 of them per property execution for a default of 100 samples. But the logs are short some of those, so the test simply didn't finish in the given 60 seconds.

Hence this PR is changing the timeout for the tests to 5 minutes. There is no test actually checking for this timeout explicitly, so there will be no increase in testing time, coming from this change.

SeanTAllen · 2023-08-20T21:02:57Z

In _Async...

there is

    h.long_test(params.timeout)

If I read that correctly it sets the long test to X which starts before the properties test so the properties tests think they have X amount of time, but were started after the long test value for X. I think that is a logical flaw, and the long_test would do "X + some value" where X is params.timeout.

Am I incorrect in my understanding? That is the logical flaw I was speaking of in the issue.

SeanTAllen · 2023-08-20T21:03:55Z

Logical error aside, I think this is a fine PR and if the logic error does exist, then it can be done in another PR.

SeanTAllen · 2023-08-20T21:04:42Z

I'm going to rerun the tests a few times to make sure we don't see the timeout we were seeing before.

mfelsche · 2023-08-20T21:50:10Z

If I read that correctly it sets the long test to X which starts before the properties test so the properties tests think they have X amount of time, but were started after the long test value for X. I think that is a logical flaw, and the long_test would do "X + some value" where X is params.timeout.

Am I incorrect in my understanding? That is the logical flaw I was speaking of in the issue.

So, the call to h.long_test(params.timeout) immediately starts the timer, before the _PropertyRunner starts generating samples and actually executing the property. This delta is considerably small and inherent to how the runtime schedules the calls, and not controllable from the pony program itself. We might not have such a big delta if we'd handle the timeout in the _PropertyRunner itself, maybe checking it after each sample run and during shrinking? But honestly a timer seems like the best/most convenient option we got (as it is also able to interrupt the machinery if a generator or property takes very long), and there is already one in the pony_test harness, which is currently the way of running pony_check. I would consider it an improvement if the pony_check _PropertyRunner would handle the timeout itself, making it more independent from pony_test.
Summarizing, I would not call it a logical flaw, but a rather unprecise timeout. I can have a stab at this, but i think the added value is considerably small.

mfelsche · 2023-08-20T21:50:47Z

Interestingly while I wrote all that, the error returned on a musl build, rendering some of my assumptions invalid it seems.

SeanTAllen · 2023-08-20T22:13:22Z

ive never seen it timeout on "a real machine" (that I remember). That's genuinely surprising and has me rethinking assumptions that I had made.

mfelsche · 2023-08-23T17:45:53Z

I was able to reproduce the hang on a freebsd machine with

./build/debug/ponyc --debug packages/pony_check && cpuset -l3 ./pony_check --sequential

It does not happen often, but sometimes.

The log was:

**** FAILED: property_runner/async/complete_multi_fail_action
[PROPERTY] Action expected: succeed-once
[PROPERTY] Action completed: succeed-once
[PROPERTY] Action expected: succeed-once
[PROPERTY] Action failed: succeed-once
[PROPERTY] Action completed: succeed-once
[PROPERTY] action 'succeed-once' finished unexpectedly. ignoring.
[PROPERTY] Action failed: succeed-once
[PROPERTY] action 'succeed-once' finished unexpectedly. ignoring.
Test timed out without completing

Investigating...

SeanTAllen · 2023-08-27T19:56:03Z

I saw you pushed an update to this @mfelsche, any progress? Got fun debugging stories?

mfelsche · 2023-08-27T20:09:41Z

@SeanTAllen Well, as it turns out, the narrator is always right. It was a logical flaw in the pony_check implementation. See the update above in the PR description.

ponylang-main · 2023-08-27T20:12:19Z

Hi @mfelsche,

The changelog - fixed label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do.

Release notes are added by creating a uniquely named file in the .release-notes directory. We suggest you call the file 4405.md to match the number of this pull request.

The basic format of the release notes (using markdown) should be:

## Title

End user description of changes, why it's important,
problems it solves etc.

If a breaking change, make sure to include 1 or more
examples what code would look like prior to this change
and how to update it to work after this change.

Thanks.

.release-notes/4405.md

SeanTAllen · 2023-08-27T21:45:44Z

@mfelsche do we still need the timeout changes?

SeanTAllen · 2023-08-29T17:43:39Z

@mfelsche can you draft a file commit message and post here so we can use it when we squash and merge this?

SeanTAllen · 2023-08-29T18:21:34Z

Discussed during sync and @jemc thinks that if we don't need the timeout changes, they should be removed.

and with that hopefully terminate the occasional hangs.

mfelsche · 2023-08-29T20:44:38Z

The timeout changes have been removed.

Here is a commit message for the squash:

Avoid hangs in async pony_check properties when using actions

It was possible for calls from stray actors from previous executions of the 
same property to call into the current execution of the property 
(in `_PropertyRunner`) and thus confuse  the current run.

This is fixed now. Each property run, and also the property executions during 
shrinking, now is uniquely identified, and all calls using actions now need to
provide those unique ids. Inside the `_PropertyRunner` we now check if the call
we receive is from the current expected run or from a previous one, and discard
the call in the latter case.

There is no user-facing API change for the `pony_check` package coming from
this.

ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Aug 20, 2023

SeanTAllen added the do not merge This PR should not be merged at this time label Aug 20, 2023

mfelsche changed the title ~~Increase ponycheck test timeout.~~ Fix hang pony_check when using asnyc properties and actions Aug 27, 2023

mfelsche added the changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge label Aug 27, 2023

mfelsche changed the title ~~Fix hang pony_check when using asnyc properties and actions~~ Fix hang in pony_check when using asnyc properties and actions Aug 27, 2023

SeanTAllen reviewed Aug 27, 2023

View reviewed changes

.release-notes/4405.md Outdated Show resolved Hide resolved

jemc approved these changes Aug 29, 2023

View reviewed changes

Avoid calls from earlier round confuse a later round

30ebd5b

and with that hopefully terminate the occasional hangs.

mfelsche force-pushed the pony_check/timeout branch from 56837f3 to 30ebd5b Compare August 29, 2023 20:34

Add release notes

43de019

SeanTAllen changed the title ~~Fix hang in pony_check when using asnyc properties and actions~~ Fix hang in pony_check when using async properties and actions Aug 29, 2023

SeanTAllen removed do not merge This PR should not be merged at this time discuss during sync Should be discussed during an upcoming sync labels Aug 29, 2023

ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Aug 29, 2023

SeanTAllen changed the title ~~Fix hang in pony_check when using async properties and actions~~ Avoid hangs in async pony_check properties when using actions Aug 29, 2023

SeanTAllen merged commit c97337a into main Aug 29, 2023

SeanTAllen deleted the pony_check/timeout branch August 29, 2023 22:54

ponylang-main removed the discuss during sync Should be discussed during an upcoming sync label Aug 29, 2023

github-actions bot pushed a commit that referenced this pull request Aug 29, 2023

Updates release notes for PR #4405

1382a25

github-actions bot pushed a commit that referenced this pull request Aug 29, 2023

Update CHANGELOG for PR #4405

aa55fac

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid hangs in async pony_check properties when using actions #4405

Avoid hangs in async pony_check properties when using actions #4405

mfelsche commented Aug 20, 2023 •

edited

Loading

SeanTAllen commented Aug 20, 2023

SeanTAllen commented Aug 20, 2023

SeanTAllen commented Aug 20, 2023

mfelsche commented Aug 20, 2023

mfelsche commented Aug 20, 2023

SeanTAllen commented Aug 20, 2023

mfelsche commented Aug 23, 2023 •

edited

Loading

SeanTAllen commented Aug 27, 2023

mfelsche commented Aug 27, 2023

ponylang-main commented Aug 27, 2023

SeanTAllen commented Aug 27, 2023

SeanTAllen commented Aug 29, 2023

SeanTAllen commented Aug 29, 2023

mfelsche commented Aug 29, 2023 •

edited by SeanTAllen

Loading

Avoid hangs in async pony_check properties when using actions #4405

Avoid hangs in async pony_check properties when using actions #4405

Conversation

mfelsche commented Aug 20, 2023 • edited Loading

SeanTAllen commented Aug 20, 2023

SeanTAllen commented Aug 20, 2023

SeanTAllen commented Aug 20, 2023

mfelsche commented Aug 20, 2023

mfelsche commented Aug 20, 2023

SeanTAllen commented Aug 20, 2023

mfelsche commented Aug 23, 2023 • edited Loading

SeanTAllen commented Aug 27, 2023

mfelsche commented Aug 27, 2023

ponylang-main commented Aug 27, 2023

SeanTAllen commented Aug 27, 2023

SeanTAllen commented Aug 29, 2023

SeanTAllen commented Aug 29, 2023

mfelsche commented Aug 29, 2023 • edited by SeanTAllen Loading

mfelsche commented Aug 20, 2023 •

edited

Loading

mfelsche commented Aug 23, 2023 •

edited

Loading

mfelsche commented Aug 29, 2023 •

edited by SeanTAllen

Loading